Experimental and analytical study of high velocity impact on Kevlar/Epoxy composite plates
NASA Astrophysics Data System (ADS)
Sikarwar, Rahul S.; Velmurugan, Raman; Madhu, Velmuri
2012-12-01
In the present study, the impact behavior of Kevlar/Epoxy composite plates is investigated experimentally for different thicknesses and lay-up sequences and compared with analytical results. The effect of thickness and lay-up sequence on energy-absorbing capacity is studied for high-velocity impact. Four lay-up sequences and four thickness values are considered. Initial and residual velocities are measured experimentally to calculate the energy-absorbing capacity of the laminates. The residual velocity of the projectile and the energy absorbed by the laminates are also calculated analytically. The analytical results are found to be in good agreement with the experimental results. The study shows that the 0/90 lay-up sequence is the most effective for impact resistance. The delamination area is largest on the back side of the plate for all thickness values and lay-up sequences, and the back-side delamination area is largest for 0/90/45/-45 laminates compared with the other lay-up sequences.
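The abstract above derives energy absorption from measured initial and residual velocities. A minimal sketch of that calculation follows, assuming the absorbed energy equals the projectile's loss of kinetic energy and that the projectile mass is constant. The function name and the numbers in the usage line are illustrative and are not taken from the paper.

```python
def absorbed_energy(mass_kg, v_initial, v_residual):
    """Kinetic energy lost by the projectile, in joules.

    Assumes (hypothetically) a rigid projectile of constant mass, so that
    E_abs = 0.5 * m * (v_i**2 - v_r**2). Velocities are in m/s.
    """
    if v_residual > v_initial:
        raise ValueError("residual velocity cannot exceed initial velocity")
    return 0.5 * mass_kg * (v_initial ** 2 - v_residual ** 2)


# Illustrative values only: a 10 g projectile slowed from 300 m/s to 200 m/s.
energy_j = absorbed_energy(0.010, 300.0, 200.0)
```

A residual velocity of zero corresponds to a projectile that is stopped by the plate, in which case the whole initial kinetic energy is absorbed.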
Experimental investigation of an RNA sequence space
NASA Technical Reports Server (NTRS)
Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.
1993-01-01
Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.
Beyond the bucket: testing the effect of experimental design on rate and sequence of decay
NASA Astrophysics Data System (ADS)
Gabbott, Sarah; Murdock, Duncan; Purnell, Mark
2016-04-01
Experimental decay has revealed the potential for profound biases in our interpretations of exceptionally preserved fossils, with non-random sequences of character loss distorting the position of fossil taxa in phylogenetic trees. By characterising these sequences we can rewind this distortion and make better-informed interpretations of the affinity of enigmatic fossil taxa. Equally, rate of character loss is crucial for estimating the preservation potential of phylogenetically informative characters, and revealing the mechanisms of preservation themselves. However, experimental decay has been criticised for poorly modeling 'real' conditions, and dismissed as unsophisticated 'bucket science'. Here we test the effect of differing experimental parameters on the rate and sequence of decay. By doing so, we can test the assumption that the results of decay experiments are applicable to informing interpretations of exceptionally preserved fossils from diverse preservational settings. The results of our experiments demonstrate the validity of using the sequence of character loss as a phylogenetic tool, and shed light on the extent to which environment must be considered before making decay-informed interpretations, or reconstructing taphonomic pathways. With careful consideration of experimental design, driven by testable hypotheses, decay experiments are robust and informative - experimental taphonomy needn't kick the bucket just yet.
Design of nucleic acid sequences for DNA computing based on a thermodynamic approach
Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma
2005-01-01
We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the Hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated better with ΔGmin (|R| = 0.90) than with the Hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphical user interface-based program written in Java. PMID:15701762
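The random generate-and-test loop described above can be sketched as follows. This is not the authors' method: it swaps in the simpler Hamming-distance test (the traditional baseline the abstract compares against) for the ΔGmin calculation, and it omits the melting-temperature constraint. All function names, the seed, and the thresholds are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_sequences(n, length, min_dist, seed=0, max_tries=100_000):
    """Generate-and-test: draw random candidates and keep those that pass.

    A candidate passes if it is at least `min_dist` mismatches away from
    every accepted sequence and from each accepted sequence's reverse
    complement (a crude stand-in for a thermodynamic cross-hybridization test).
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(
            hamming(candidate, s) >= min_dist
            and hamming(candidate, reverse_complement(s)) >= min_dist
            for s in accepted
        ):
            accepted.append(candidate)
    return accepted


sequences = design_sequences(n=5, length=12, min_dist=4)
```

Because the test only sees accepted sequences, the result depends on the order in which candidates are drawn; a fixed seed keeps the sketch reproducible.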
Skin Microbiome Surveys Are Strongly Influenced by Experimental Design.
Meisel, Jacquelyn S; Hannigan, Geoffrey D; Tyldsley, Amanda S; SanMiguel, Adam J; Hodkinson, Brendan P; Zheng, Qi; Grice, Elizabeth A
2016-05-01
Culture-independent studies to characterize skin microbiota are increasingly common, due in part to affordable and accessible sequencing and analysis platforms. Compared to culture-based techniques, DNA sequencing of the bacterial 16S ribosomal RNA (rRNA) gene or whole metagenome shotgun (WMS) sequencing provides more precise microbial community characterizations. Most widely used protocols were developed to characterize microbiota of other habitats (i.e., gastrointestinal) and have not been systematically compared for their utility in skin microbiome surveys. Here we establish a resource for the cutaneous research community to guide experimental design in characterizing skin microbiota. We compare two widely sequenced regions of the 16S rRNA gene to WMS sequencing for recapitulating skin microbiome community composition, diversity, and genetic functional enrichment. We show that WMS sequencing most accurately recapitulates microbial communities, but sequencing of hypervariable regions 1-3 of the 16S rRNA gene provides highly similar results. Sequencing of hypervariable region 4 poorly captures skin commensal microbiota, especially Propionibacterium. WMS sequencing, which is resource and cost intensive, provides evidence of a community's functional potential; however, metagenome predictions based on 16S rRNA sequence tags closely approximate WMS genetic functional profiles. This study highlights the importance of experimental design for downstream results in skin microbiome surveys. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Skin microbiome surveys are strongly influenced by experimental design
Meisel, Jacquelyn S.; Hannigan, Geoffrey D.; Tyldsley, Amanda S.; SanMiguel, Adam J.; Hodkinson, Brendan P.; Zheng, Qi; Grice, Elizabeth A.
2016-01-01
Culture-independent studies to characterize skin microbiota are increasingly common, due in part to affordable and accessible sequencing and analysis platforms. Compared to culture-based techniques, DNA sequencing of the bacterial 16S ribosomal RNA (rRNA) gene or whole metagenome shotgun (WMS) sequencing provides more precise microbial community characterizations. Most widely used protocols were developed to characterize microbiota of other habitats (i.e. gastrointestinal), and have not been systematically compared for their utility in skin microbiome surveys. Here we establish a resource for the cutaneous research community to guide experimental design in characterizing skin microbiota. We compare two widely sequenced regions of the 16S rRNA gene to WMS sequencing for recapitulating skin microbiome community composition, diversity, and genetic functional enrichment. We show that WMS sequencing most accurately recapitulates microbial communities, but sequencing of hypervariable regions 1-3 of the 16S rRNA gene provides highly similar results. Sequencing of hypervariable region 4 poorly captures skin commensal microbiota, especially Propionibacterium. WMS sequencing, which is resource- and cost-intensive, provides evidence of a community’s functional potential; however, metagenome predictions based on 16S rRNA sequence tags closely approximate WMS genetic functional profiles. This work highlights the importance of experimental design for downstream results in skin microbiome surveys. PMID:26829039
Discriminative prediction of mammalian enhancers from DNA sequence
Lee, Dongwon; Karchin, Rachel; Beer, Michael A.
2011-01-01
Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers. PMID:21875935
On the auto and cross correlation of PN sequences
NASA Technical Reports Server (NTRS)
Morakis, J. C.
1969-01-01
The autocorrelation and crosscorrelation properties of pseudorandom (PN) sequences are analyzed by using some important properties of PN sequences. These properties make the discussion understandable without the need for a linear-algebraic approach. The analysis is followed by some experimental results.
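As an illustration of the autocorrelation behavior such an analysis concerns, the sketch below generates a length-7 maximal-length PN sequence from a 3-stage shift register and computes its periodic autocorrelation. The register length, the recurrence a[n+3] = a[n] XOR a[n+1], and the function names are illustrative choices and are not taken from the report.

```python
def m_sequence(degree, taps):
    """One period of a shift-register sequence.

    The recurrence is a[n + degree] = XOR of a[n + t] for t in taps.
    With degree=3 and taps=(0, 1) it yields a maximal-length sequence
    of period 2**3 - 1 = 7.
    """
    state = [1] + [0] * (degree - 1)  # any non-zero starting state works
    out = []
    for _ in range(2 ** degree - 1):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out


def periodic_autocorrelation(bits):
    """Periodic autocorrelation of the +/-1 version of a binary sequence."""
    signs = [1 if b else -1 for b in bits]
    n = len(signs)
    return [
        sum(signs[i] * signs[(i + shift) % n] for i in range(n))
        for shift in range(n)
    ]


pn = m_sequence(degree=3, taps=(0, 1))
acf = periodic_autocorrelation(pn)  # peak of 7 at zero shift, -1 at every other shift
```

The two-valued result (the period at zero shift, -1 elsewhere) is the standard property of maximal-length sequences that makes them useful for synchronization and ranging.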
Relation between native ensembles and experimental structures of proteins
Best, Robert B.; Lindorff-Larsen, Kresten; DePristo, Mark A.; Vendruscolo, Michele
2006-01-01
Different experimental structures of the same protein or of proteins with high sequence similarity contain many small variations. Here we construct ensembles of “high-sequence similarity Protein Data Bank” (HSP) structures and consider the extent to which such ensembles represent the structural heterogeneity of the native state in solution. We find that different NMR measurements probing structure and dynamics of given proteins in solution, including order parameters, scalar couplings, and residual dipolar couplings, are remarkably well reproduced by their respective high-sequence similarity Protein Data Bank ensembles; moreover, we show that the effects of uncertainties in structure determination are insufficient to explain the results. These results highlight the importance of accounting for native-state protein dynamics in making comparisons with ensemble-averaged experimental data and suggest that even a modest number of structures of a protein determined under different conditions, or with small variations in sequence, capture a representative subset of the true native-state ensemble. PMID:16829580
Structural analysis of a set of proteins resulting from a bacterial genomics project.
Badger, J; Sauder, J M; Adams, J M; Antonysamy, S; Bain, K; Bergseid, M G; Buchanan, S G; Buchanan, M D; Batiyenko, Y; Christopher, J A; Emtage, S; Eroshkina, A; Feil, I; Furlong, E B; Gajiwala, K S; Gao, X; He, D; Hendle, J; Huber, A; Hoda, K; Kearins, P; Kissinger, C; Laubert, B; Lewis, H A; Lin, J; Loomis, K; Lorimer, D; Louie, G; Maletic, M; Marsh, C D; Miller, I; Molinari, J; Muller-Dieckmann, H J; Newman, J M; Noland, B W; Pagarigan, B; Park, F; Peat, T S; Post, K W; Radojicic, S; Ramos, A; Romero, R; Rutter, M E; Sanderson, W E; Schwinn, K D; Tresser, J; Winhoven, J; Wright, T A; Wu, L; Xu, J; Harris, T J R
2005-09-01
The targets of the Structural GenomiX (SGX) bacterial genomics project were proteins conserved in multiple prokaryotic organisms with no obvious sequence homolog in the Protein Data Bank of known structures. The outcome of this work was 80 structures, covering 60 unique sequences and 49 different genes. Experimental phase determination from proteins incorporating Se-Met was carried out for 45 structures with most of the remainder solved by molecular replacement using members of the experimentally phased set as search models. An automated tool was developed to deposit these structures in the Protein Data Bank, along with the associated X-ray diffraction data (including refined experimental phases) and experimentally confirmed sequences. BLAST comparisons of the SGX structures with structures that had appeared in the Protein Data Bank over the intervening 3.5 years since the SGX target list had been compiled identified homologs for 49 of the 60 unique sequences represented by the SGX structures. This result indicates that, for bacterial structures that are relatively easy to express, purify, and crystallize, the structural coverage of gene space is proceeding rapidly. More distant sequence-structure relationships between the SGX and PDB structures were investigated using PDB-BLAST and Combinatorial Extension (CE). Only one structure, SufD, has a truly unique topology compared to all folds in the PDB. Copyright 2005 Wiley-Liss, Inc.
2012-01-01
Background: RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions: This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates. PMID:22985019
Gallus, Susanne; Lammers, Fritjof
2016-01-01
The autonomous transposable element LINE-1 is a highly abundant element that makes up between 15% and 20% of therian mammal genomes. Since their origin before the divergence of marsupials and placental mammals, LINE-1 elements have contributed actively to the genome landscape. A previous in silico screen of the Tasmanian devil genome revealed a lack of functional coding LINE-1 sequences. In this study we present the results of an in vitro analysis from a partial LINE-1 reverse transcriptase coding sequence in five marsupial species. Our experimental screen supports the in silico findings of the genome-wide degradation of LINE-1 sequences in the Tasmanian devil, and identifies a high frequency of degraded LINE-1 sequences in other Australian marsupials. The comparison between the experimentally obtained LINE-1 sequences and reference genome assemblies suggests that conclusions from in silico analyses of retrotransposition activity can be influenced by incomplete genome assemblies from short reads. PMID:27389686
BlackOPs: increasing confidence in variant detection through mappability filtering.
Cabanski, Christopher R; Wilkerson, Matthew D; Soloway, Matthew; Parker, Joel S; Liu, Jinze; Prins, Jan F; Marron, J S; Perou, Charles M; Hayes, D Neil
2013-10-01
Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
Small-target leak detection for a closed vessel via infrared image sequences
NASA Astrophysics Data System (ADS)
Zhao, Ling; Yang, Hongjiu
2017-03-01
This paper focuses on a leak diagnosis and localization method based on infrared image sequences. The method addresses two problems in leak detection: the high probability of false warnings and the negative effect of marginal information. An experimental model is established for leak diagnosis and localization on infrared image sequences. A differential background prediction based on a kernel regression method is presented to eliminate the negative effect of marginal information on the test vessel. A pipeline filter based on layered voting is designed to reduce the probability of false warnings for leak points. A combined leak diagnosis and localization algorithm is proposed based on infrared image sequences. Experimental results demonstrate the effectiveness and potential of the developed techniques.
Evaluating the role of coherent delocalized phonon-like modes in DNA cyclization
Alexandrov, Ludmil B.; Rasmussen, Kim Ø.; Bishop, Alan R.; ...
2017-08-29
The innate flexibility of a DNA sequence is quantified by the Jacobson-Stockmayer J-factor, which measures the propensity for DNA loop formation. Recent studies of ultra-short DNA sequences revealed a discrepancy of up to six orders of magnitude between experimentally measured and theoretically predicted J-factors. These large differences suggest that, in addition to the elastic moduli of the double helix, other factors contribute to loop formation. We develop a new theoretical model that explores how coherent delocalized phonon-like modes in DNA provide single-stranded “flexible hinges” to assist in loop formation. We also combine the Czapla-Swigon-Olson structural model of DNA with our extended Peyrard-Bishop-Dauxois model and, without changing any of the parameters of the two models, apply this new computational framework to 86 experimentally characterized DNA sequences. Our results demonstrate that the new computational framework can predict J-factors within an order of magnitude of experimental measurements for most ultra-short DNA sequences, while continuing to accurately describe the J-factors of longer sequences. Furthermore, we demonstrate that our computational framework can be used to describe the cyclization of DNA sequences that contain a base pair mismatch. Overall, our results support the conclusion that coherent delocalized phonon-like modes play an important role in DNA cyclization.
Evaluating the role of coherent delocalized phonon-like modes in DNA cyclization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alexandrov, Ludmil B.; Rasmussen, Kim Ø.; Bishop, Alan R.
The innate flexibility of a DNA sequence is quantified by the Jacobson-Stockmayer J-factor, which measures the propensity for DNA loop formation. Recent studies of ultra-short DNA sequences revealed a discrepancy of up to six orders of magnitude between experimentally measured and theoretically predicted J-factors. These large differences suggest that, in addition to the elastic moduli of the double helix, other factors contribute to loop formation. We develop a new theoretical model that explores how coherent delocalized phonon-like modes in DNA provide single-stranded “flexible hinges” to assist in loop formation. We also combine the Czapla-Swigon-Olson structural model of DNA with our extended Peyrard-Bishop-Dauxois model and, without changing any of the parameters of the two models, apply this new computational framework to 86 experimentally characterized DNA sequences. Our results demonstrate that the new computational framework can predict J-factors within an order of magnitude of experimental measurements for most ultra-short DNA sequences, while continuing to accurately describe the J-factors of longer sequences. Furthermore, we demonstrate that our computational framework can be used to describe the cyclization of DNA sequences that contain a base pair mismatch. Overall, our results support the conclusion that coherent delocalized phonon-like modes play an important role in DNA cyclization.
Performing SELEX experiments in silico
NASA Astrophysics Data System (ADS)
Wondergem, J. A. J.; Schiessel, H.; Tompitak, M.
2017-11-01
Due to the sequence-dependent nature of the elasticity of DNA, many protein-DNA complexes and other systems in which DNA molecules must be deformed have preferences for the type of DNA sequence they interact with. SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiments and similar sequence selection experiments have been used extensively to examine the (indirect readout) sequence preferences of, e.g., nucleosomes (protein spools around which DNA is wound for compactification) and DNA rings. We show how recently developed computational and theoretical tools can be used to emulate such experiments in silico. Opening up this possibility comes with several benefits. First, it allows us a better understanding of our models and systems, specifically about the roles played by the simulation temperature and the selection pressure on the sequences. Second, it allows us to compare the predictions made by the model of choice with experimental results. We find agreement on important features between predictions of the rigid base-pair model and experimental results for DNA rings and interesting differences that point out open questions in the field. Finally, our simulations allow application of the SELEX methodology to systems that are experimentally difficult to realize because they come with high energetic costs and are therefore unlikely to form spontaneously, such as very short or overwound DNA rings.
Tracking prominent points in image sequences
NASA Astrophysics Data System (ADS)
Hahn, Michael
1994-03-01
Measuring image motion and inferring scene geometry and camera motion are main aspects of image sequence analysis. The determination of image motion and the structure-from-motion problem are tasks that can be addressed independently or in cooperative processes. In this paper we focus on tracking prominent points. High stability, reliability, and accuracy are criteria for the extraction of prominent points. This implies that tracking should work quite well with those features; unfortunately, the reality looks quite different. In the experimental investigations we processed a long sequence of 128 images. This mono sequence is taken in an outdoor environment at the experimental field of Mercedes Benz in Rastatt. Different tracking schemes are explored and the results with respect to stability and quality are reported.
Experimental investigation of measurement-induced disturbance and time symmetry in quantum physics
NASA Astrophysics Data System (ADS)
Curic, D.; Richardson, M. C.; Thekkadath, G. S.; Flórez, J.; Giner, L.; Lundeen, J. S.
2018-04-01
Unlike regular time evolution governed by the Schrödinger equation, standard quantum measurement appears to violate time-reversal symmetry. Measurement creates random disturbances (e.g., collapse) that prevent back-tracing the quantum state of the system. The effect of these disturbances is explicit in the results of subsequent measurements. In this way, the joint result of sequences of measurements depends on the order in time in which those measurements are performed. One might expect that if the disturbance could be eliminated this time-ordering dependence would vanish. Following a recent theoretical proposal [Bednorz, Franke, and Belzig, New J. Phys. 15, 023043 (2013), 10.1088/1367-2630/15/2/023043], we experimentally investigate this dependence for a kind of measurement that creates an arbitrarily small disturbance: weak measurement. We perform various sequences of a set of polarization weak measurements on photons. We experimentally demonstrate that, although the weak measurements are minimally disturbing, their time ordering affects the outcome of the measurement sequence for quantum systems.
CisSERS: Customizable in silico sequence evaluation for restriction sites
Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus; ...
2016-04-12
High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple FASTA-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a Java-based graphical user interface built around a Perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
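The core idea of in silico restriction-site evaluation can be sketched in a few lines. This is not CisSERS code and does not query REBASE: the EcoRI recognition site (GAATTC, cut after the first G) is hard-coded, the molecule is assumed to be linear, only the given strand is scanned, and the example sequence is made up.

```python
def find_sites(seq, motif):
    """0-based start positions of every (possibly overlapping) motif match."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def fragment_lengths(seq, motif, cut_offset):
    """Fragment sizes after cutting a linear sequence at every site.

    `cut_offset` is the number of bases into the motif at which the
    enzyme cuts the scanned strand (1 for EcoRI: G^AATTC).
    """
    cuts = [pos + cut_offset for pos in find_sites(seq, motif)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 19-base sequence with two EcoRI sites.
example = "AAGAATTCGGGGAATTCTT"
sites = find_sites(example, "GAATTC")
fragments = fragment_lengths(example, "GAATTC", cut_offset=1)
```

Sorting the fragment lengths gives the band order one would expect on a gel, and comparing the lists for two alleles indicates whether a polymorphism creates or destroys a site, which is the basis of a CAPS marker.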
CisSERS: Customizable in silico sequence evaluation for restriction sites
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus
High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple FASTA-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a Java-based graphical user interface built around a Perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
Karaboga, D; Aslan, S
2016-04-27
The great majority of biological sequences share significant similarity with other sequences as a result of evolutionary processes, and identifying these sequence similarities is one of the most challenging problems in bioinformatics. In this paper, we present a discrete artificial bee colony (ABC) algorithm, which is inspired by the intelligent foraging behavior of real honey bees, for the detection of highly conserved residue patterns or motifs within sequences. Experimental studies on three different data sets showed that the proposed discrete model, by adhering to the fundamental scheme of the ABC algorithm, produced competitive or better results than other metaheuristic motif discovery techniques.
Schlötterer, C; Kofler, R; Versace, E; Tobler, R; Franssen, S U
2015-05-01
Evolve and resequence (E&R) is a new approach to investigate the genomic responses to selection during experimental evolution. By using whole genome sequencing of pools of individuals (Pool-Seq), this method can identify selected variants in controlled and replicable experimental settings. Reviewing the current state of the field, we show that E&R can be powerful enough to identify causative genes and possibly even single-nucleotide polymorphisms. We also discuss how the experimental design and the complexity of the trait could result in a large number of false positive candidates. We suggest experimental and analytical strategies to maximize the power of E&R to uncover the genotype-phenotype link and serve as an important research tool for a broad range of evolutionary questions.
Counterbalancing for serial order carryover effects in experimental condition orders.
Brooks, Joseph L
2012-12-01
Reactions of neural, psychological, and social systems are rarely, if ever, independent of previous inputs and states. The potential for serial order carryover effects from one condition to the next in a sequence of experimental trials makes counterbalancing of condition order an essential part of experimental design. Here, a method is proposed for generating counterbalanced sequences for repeated-measures designs including those with multiple observations of each condition on one participant and self-adjacencies of conditions. Condition ordering is reframed as a graph theory problem. Experimental conditions are represented as vertices in a graph and directed edges between them represent temporal relationships between conditions. A counterbalanced trial order results from traversing an Euler circuit through such a graph in which each edge is traversed exactly once. This method can be generalized to counterbalance for higher order serial order carryover effects as well as to create intentional serial order biases. Modern graph theory provides tools for finding other types of paths through such graph representations, providing a tool for generating experimental condition sequences with useful properties. PsycINFO Database Record (c) 2013 APA, all rights reserved.
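The graph construction in this abstract can be sketched as follows, under stated assumptions: conditions are the vertices of a complete directed graph (optionally with self-loops for self-adjacencies), and an Euler circuit is found with Hierholzer's algorithm. The function name `counterbalanced_sequence` and its parameters are hypothetical, and the sketch covers only first-order carryover; it is not the author's published code.

```python
import random


def counterbalanced_sequence(conditions, self_adjacent=True, seed=None):
    """Return a trial order in which every ordered pair of conditions
    occurs exactly once as consecutive trials.

    conditions must be distinct. With self_adjacent=True, pairs such as
    (A, A) are included as self-loops.
    """
    rng = random.Random(seed)
    # Outgoing edges of the complete directed graph, in random order.
    out_edges = {
        a: [b for b in conditions if self_adjacent or a != b]
        for a in conditions
    }
    for targets in out_edges.values():
        rng.shuffle(targets)

    # Hierholzer's algorithm: walk unused edges, backtrack when stuck.
    stack = [conditions[0]]
    circuit = []
    while stack:
        vertex = stack[-1]
        if out_edges[vertex]:
            stack.append(out_edges[vertex].pop())
        else:
            circuit.append(stack.pop())
    return circuit[::-1]
```

Because every vertex in this graph has equal in-degree and out-degree, an Euler circuit always exists. For n conditions the returned order has n*n + 1 trials with self-adjacencies and n*(n-1) + 1 without; different seeds give different but equally counterbalanced orders.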
Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment
2011-01-01
Background: Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. Results: In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. Conclusions: The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research. PMID:21867510
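The divide, solve, and combine steps of vertical decomposition can be sketched in Python as below. This is only an illustration of the data flow: the per-block solver in VDGA is a guide-tree aligner inside a genetic algorithm, whereas `pad_align` here is a deliberately trivial stand-in that right-pads with gaps. All three function names are hypothetical, and the near-equal split points are an assumption.

```python
def split_vertically(sequences, n_parts):
    """Cut every sequence into n_parts contiguous pieces of near-equal length.

    Returns a list of n_parts blocks; block k holds the k-th piece of
    every sequence, in the original sequence order.
    """
    blocks = [[] for _ in range(n_parts)]
    for seq in sequences:
        bounds = [i * len(seq) // n_parts for i in range(n_parts + 1)]
        for k in range(n_parts):
            blocks[k].append(seq[bounds[k]:bounds[k + 1]])
    return blocks


def pad_align(block):
    """Placeholder aligner (assumption): right-pad pieces with gaps so all
    rows in the block have equal length. A real solver would go here."""
    width = max(len(piece) for piece in block)
    return [piece.ljust(width, "-") for piece in block]


def combine(aligned_blocks):
    """Concatenate the aligned blocks row by row into one alignment."""
    return ["".join(parts) for parts in zip(*aligned_blocks)]
```

A call such as `combine([pad_align(b) for b in split_vertically(seqs, 3)])` mirrors the three-division variant mentioned in the abstract; swapping `pad_align` for a real pairwise or progressive aligner is the step that determines alignment quality.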
Wang, Tao; Huang, Jiang-hua; Lin, Lin; Zhan, Chang'an A
2013-01-01
To obtain reliable transient auditory evoked potentials (AEPs) from EEGs recorded using high stimulus rate (HSR) paradigm, it is critical to design the stimulus sequences of appropriate frequency properties. Traditionally, the individual stimulus events in a stimulus sequence occur only at discrete time points dependent on the sampling frequency of the recording system and the duration of stimulus sequence. This dependency likely causes the implementation of suboptimal stimulus sequences, sacrificing the reliability of resulting AEPs. In this paper, we explicate the use of continuous-time stimulus sequence for HSR paradigm, which is independent of the discrete electroencephalogram (EEG) recording system. We employ simulation studies to examine the applicability of the continuous-time stimulus sequences and the impacts of sampling frequency on AEPs in traditional studies using discrete-time design. Results from these studies show that the continuous-time sequences can offer better frequency properties and improve the reliability of recovered AEPs. Furthermore, we find that the errors in the recovered AEPs depend critically on the sampling frequencies of experimental systems, and their relationship can be fitted using a reciprocal function. As such, our study contributes to the literature by demonstrating the applicability and advantages of continuous-time stimulus sequences for HSR paradigm and by revealing the relationship between the reliability of AEPs and sampling frequencies of the experimental systems when discrete-time stimulus sequences are used in traditional manner for the HSR paradigm.
The Design and Analysis of Transposon-Insertion Sequencing Experiments
Chao, Michael C.; Abel, Sören; Davis, Brigid M.; Waldor, Matthew K.
2016-01-01
Preface: Transposon-insertion sequencing (TIS) is a powerful approach that can be widely applied to genome-wide definition of loci that are required for growth in diverse conditions. However, experimental design choices and stochastic biological processes can heavily influence the results of TIS experiments and affect downstream statistical analysis. Here, we discuss TIS experimental parameters and how these factors relate to the benefits and limitations of the various statistical frameworks that can be applied to computational analysis of TIS data. PMID:26775926
Self-sequencing of amino acids and origins of polyfunctional protocells
NASA Technical Reports Server (NTRS)
Fox, S. W.
1984-01-01
The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.
Hills, Ronald D.; Kathuria, Sagar V.; Wallace, Louise A.; Day, Iain J.; Brooks, Charles L.; Matthews, C. Robert
2010-01-01
The thermodynamic hypothesis of Anfinsen postulates that structures and stabilities of globular proteins are determined by their amino acid sequences. Chain topology, however, is known to influence the folding reaction, in that motifs with a preponderance of local interactions typically fold more rapidly than those with a larger fraction of non-local interactions. Together, the topology and sequence can modulate the energy landscape and influence the rate at which the protein folds to the native conformation. To explore the relationship of sequence and topology in the folding of βα–repeat proteins, which are dominated by local interactions, a combined experimental and simulation analysis was performed on two members of the flavodoxin-like, α/β/α sandwich fold. Spo0F and the N-terminal receiver domain of NtrC (NT-NtrC) have similar topologies but low sequence identity, enabling a test of the effects of sequence on folding. Experimental results demonstrated that both response-regulator proteins fold via parallel channels through highly structured sub-millisecond intermediates before accessing their cis prolyl peptide bond-containing native conformations. Global analysis of the experimental results preferentially places these intermediates off the productive folding pathway. Sequence-sensitive Gō-model simulations conclude that frustration in the folding in Spo0F, corresponding to the appearance of the off-pathway intermediate, reflects competition for intra-subdomain van der Waals contacts between its N- and C-terminal subdomains. The extent of transient, premature structure appears to correlate with the number of isoleucine, leucine and valine (ILV) side-chains that form a large sequence-local cluster involving the central β-sheet and helices α2, α3 and α4. The failure to detect the off-pathway species in the simulations of NT-NtrC may reflect the reduced number of ILV side-chains in its corresponding hydrophobic cluster. 
The location of the hydrophobic clusters in the structure may also be related to the differing functional properties of these response regulators. Comparison with the results of previous experimental and simulation analyses on the homologous CheY argues that prematurely-folded unproductive intermediates are a common property of the βα-repeat motif. PMID:20226790
Prediction of enhancer-promoter interactions via natural language processing.
Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui
2018-05-09
Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. We propose a novel computational framework EP2vec to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method in natural language processing. Then, we train a classifier to predict EPIs using the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, which outperforms existing methods. We prove the robustness of sequence embedding features by carrying out sensitivity analysis. Besides, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting an attention mechanism. Last, we show that even superior performance with F1 scores of 0.889 to 0.940 can be achieved by combining sequence embedding features and experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPI identification.
Mitra, Abhishek; Skrzypczak, Magdalena; Ginalski, Krzysztof; Rowicka, Maga
2015-01-01
Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations are causing substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications, and in some other fields (e.g. detection of rare variants). Here, we discuss how both problems with quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer’s, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. 
Alternatively, we discuss how analysis can be repeated from saved sequencing images using the Long Template Protocol to increase accuracy. PMID:25860802
Forlano, M D; Teixeira, K R S; Scofield, A; Elisei, C; Yotoko, K S C; Fernandes, K R; Linhares, G F C; Ewing, S A; Massard, C L
2007-04-10
To characterize phylogenetically the species which causes canine hepatozoonosis at two rural areas of Rio de Janeiro State, Brazil, we used universal or Hepatozoon spp. primer sets for the 18S SSU rRNA coding region. DNA extracts were obtained from blood samples of thirteen naturally infected dogs, from four experimentally infected dogs, and from five puppies infected by vertical transmission from a dam that was experimentally infected. DNA of sporozoites of Hepatozoon americanum was used as positive control. The amplification of DNA extracts from blood of dogs infected with sporozoites of Hepatozoon spp. was observed in the presence of primers to 18S SSU rRNA gene of Hepatozoon spp., whereas DNA of H. americanum sporozoites was amplified in the presence of either universal or Hepatozoon spp.-specific primer sets; the amplified products were approximately 600 bp in size. Cloned PCR products obtained from DNA extracts of blood from two dogs experimentally infected with Hepatozoon sp. were sequenced. The consensus sequence, derived from six sequence data sets, was blasted against sequences of 18S SSU rRNA of Hepatozoon spp. available at GenBank and aligned to homologous sequences to perform the phylogenetic analysis. This analysis clearly showed that our sequence clustered, independently of H. americanum sequences, within a group comprising other Hepatozoon canis sequences. Our results confirmed the hypothesis that the agent causing hepatozoonosis in the areas studied in Brazil is H. canis, supporting previous reports that were based on morphological and morphometric analyses.
Experimental Influences in the Accurate Measurement of Cartilage Thickness in MRI.
Wang, Nian; Badar, Farid; Xia, Yang
2018-01-01
Objective: To study the experimental influences on the measurement of cartilage thickness by magnetic resonance imaging (MRI). Design: The complete thicknesses of healthy and trypsin-degraded cartilage were measured by high-resolution MRI under different conditions, using two intensity-based imaging sequences (ultra-short echo [UTE] and multislice-multiecho [MSME]) and 3 quantitative relaxation imaging sequences (T1, T2, and T1ρ). Other variables included different orientations in the magnet, 2 soaking solutions (saline and phosphate buffered saline [PBS]), and external loading. Results: With cartilage soaked in saline, UTE and T1 methods yielded complete and consistent measurement of cartilage thickness, while the thickness measurements by T2, T1ρ, and MSME methods were orientation dependent. The effect of external loading on cartilage thickness is also sequence and orientation dependent. All variations in cartilage thickness in MRI could be eliminated by soaking in 100 mM PBS or by imaging with the UTE sequence. Conclusions: The appearance of articular cartilage and the measurement accuracy of cartilage thickness in MRI can be influenced by a number of experimental factors in ex vivo MRI, from the use of various pulse sequences and soaking solutions to the health of the tissue. T2-based imaging sequences, both the proton-intensity sequence and the quantitative relaxation sequence, similarly produced the largest variations. With adequate resolution, the accurate measurement of whole cartilage tissue in clinical MRI could be utilized to detect differences between healthy and osteoarthritic cartilage after compression.
Computational and experimental analysis of DNA shuffling
Maheshri, Narendra; Schaffer, David V.
2003-01-01
We describe a computational model of DNA shuffling based on the thermodynamics and kinetics of this process. The model independently tracks a representative ensemble of DNA molecules and records their states at every stage of a shuffling reaction. These data can subsequently be analyzed to yield information on any relevant metric, including reassembly efficiency, crossover number, type and distribution, and DNA sequence length distributions. The predictive ability of the model was validated by comparison to three independent sets of experimental data, and analysis of the simulation results led to several unique insights into the DNA shuffling process. We examine a tradeoff between crossover frequency and reassembly efficiency and illustrate the effects of experimental parameters on this relationship. Furthermore, we discuss conditions that promote the formation of useless “junk” DNA sequences or multimeric sequences containing multiple copies of the reassembled product. This model will therefore aid in the design of optimal shuffling reaction conditions. PMID:12626764
Advances in high throughput DNA sequence data compression.
Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz
2016-06-01
Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
An investigation of Hebbian phase sequences as assembly graphs
Almeida-Filho, Daniel G.; Lopes-dos-Santos, Vitor; Vasconcelos, Nivaldo A. P.; Miranda, José G. V.; Tort, Adriano B. L.; Ribeiro, Sidarta
2014-01-01
Hebb proposed that synapses between neurons that fire synchronously are strengthened, forming cell assemblies and phase sequences. The former, on a shorter scale, are ensembles of synchronized cells that function transiently as a closed processing system; the latter, on a larger scale, correspond to the sequential activation of cell assemblies able to represent percepts and behaviors. Nowadays, the recording of large neuronal populations allows for the detection of multiple cell assemblies. Within Hebb's theory, the next logical step is the analysis of phase sequences. Here we detected phase sequences as consecutive assembly activation patterns, and then analyzed their graph attributes in relation to behavior. We investigated action potentials recorded from the adult rat hippocampus and neocortex before, during and after novel object exploration (experimental periods). Within assembly graphs, each assembly corresponded to a node, and each edge corresponded to the temporal sequence of consecutive node activations. The sum of all assembly activations was proportional to firing rates, but the activity of individual assemblies was not. Assembly repertoire was stable across experimental periods, suggesting that novel experience does not create new assemblies in the adult rat. Assembly graph attributes, on the other hand, varied significantly across behavioral states and experimental periods, and were separable enough to correctly classify experimental periods (Naïve Bayes classifier; maximum AUROCs ranging from 0.55 to 0.99) and behavioral states (waking, slow wave sleep, and rapid eye movement sleep; maximum AUROCs ranging from 0.64 to 0.98). Our findings agree with Hebb's view that assemblies correspond to primitive building blocks of representation, nearly unchanged in the adult, while phase sequences are labile across behavioral states and change after novel experience. The results are compatible with a role for phase sequences in behavior and cognition. 
PMID:24782715
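The assembly-graph construction described above (assemblies as nodes, consecutive activations as directed edges) can be sketched in a few lines of Python. The function names and the input format, a time-ordered list of assembly labels, are assumptions for illustration; whether repeated activations of the same assembly count as self-edges is not stated in the abstract, and this sketch keeps them.

```python
from collections import Counter


def assembly_graph(activations):
    """Build a directed, weighted graph from a time-ordered list of
    assembly activations.

    Returns (nodes, edges), where edges maps (source, target) to the number
    of times target was activated immediately after source.
    """
    nodes = sorted(set(activations))
    edges = Counter(zip(activations, activations[1:]))
    return nodes, edges


def out_degree(edges):
    """Number of distinct successors of each node (one simple graph attribute)."""
    degree = Counter()
    for source, _target in edges:
        degree[source] += 1
    return degree
```

Attributes such as out-degree, computed per experimental period, are the kind of feature that could then be passed to a classifier; the specific attributes used in the study are not listed in the abstract.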
NASA Astrophysics Data System (ADS)
Willans, Mathew J.; Sears, Devin N.; Wasylishen, Roderick E.
2008-03-01
The use of continuous-wave (CW) 1H decoupling has generally provided little improvement in the 13C MAS NMR spectroscopy of paramagnetic organic solids. Recent solid-state 13C NMR studies have demonstrated that at rapid magic-angle spinning rates CW decoupling can result in reductions in signal-to-noise and that 1H decoupling should be omitted when acquiring 13C MAS NMR spectra of paramagnetic solids. However, studies of the effectiveness of modern 1H decoupling sequences are lacking, and the performance of such sequences over a variety of experimental conditions must be investigated before 1H decoupling is discounted altogether. We have studied the performance of several commonly used advanced decoupling pulse sequences, namely the TPPM, SPINAL-64, XiX, and eDROOPY sequences, in 13C MAS NMR experiments performed under four combinations of the magnetic field strength (7.05 or 11.75 T), rotor frequency (15 or 30 kHz), and 1H rf-field strength (71, 100, or 140 kHz). The effectiveness of these sequences has been evaluated by comparing the 13C signal intensity, linewidth at half-height, LWHH, and coherence lifetimes, T2', of the methine carbon of copper(II) bis(DL-alanine) monohydrate, Cu(ala)2·H2O, and methylene carbon of copper(II) bis(DL-2-aminobutyrate), Cu(ambut)2, obtained with the advanced sequences to those obtained without 1H decoupling, with CW decoupling, and for fully deuterium labelled samples. The latter have been used as model compounds with perfect 1H decoupling and provide a measure of the efficiency of the 1H decoupling sequence. Overall, the effectiveness of 1H decoupling depends strongly on the decoupling sequence utilized, the experimental conditions and the sample studied. Of the decoupling sequences studied, the XiX sequence consistently yielded the best results, although any of the advanced decoupling sequences strongly outperformed the CW sequence and provided improvements over no 1H decoupling. Experiments performed at 7.05 T demonstrate that the XiX decoupling sequence is the least sensitive to changes in the 1H transmitter frequency and may explain the superior performance of this decoupling sequence. Overall, the most important factor in the effectiveness of 1H decoupling was the carbon type studied, with the methylene carbon of Cu(ambut)2 being substantially more sensitive to 1H decoupling than the methine carbon of Cu(ala)2·H2O. An analysis of the various broadening mechanisms contributing to 13C linewidths has been performed in order to rationalize the different sensitivities of the two carbon sites under the four experimental conditions.
Schmidt Am Busch, Marcel; Sedano, Audrey; Simonson, Thomas
2010-05-05
Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. We explore this strategy for four SCOP families: small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile Hidden Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
Chen, DaYang; Zhen, HeFu; Qiu, Yong; Liu, Ping; Zeng, Peng; Xia, Jun; Shi, QianYu; Xie, Lin; Zhu, Zhu; Gao, Ya; Huang, GuoDong; Wang, Jian; Yang, HuanMing; Chen, Fang
2018-03-21
Research based on a strategy of single-cell low-coverage whole genome sequencing (SLWGS) has enabled better reproducibility and accuracy for detection of copy number variations (CNVs). The whole genome amplification (WGA) method and sequencing platform are critical factors for successful SLWGS (<0.1 × coverage). In this study, we compared single cell and multiple cells sequencing data produced by the HiSeq2000 and Ion Proton platforms using two WGA kits and then comprehensively evaluated the GC-bias, reproducibility, uniformity and CNV detection among different experimental combinations. Our analysis demonstrated that the PicoPLEX WGA Kit resulted in higher reproducibility, lower sequencing error frequency but more GC-bias than the GenomePlex Single Cell WGA Kit (WGA4 kit) independent of the cell number on the HiSeq2000 platform. While on the Ion Proton platform, the WGA4 kit (both single cell and multiple cells) had higher uniformity and less GC-bias but lower reproducibility than those of the PicoPLEX WGA Kit. Moreover, on these two sequencing platforms, depending on cell number, the performance of the two WGA kits was different for both sensitivity and specificity on CNV detection. The results can help researchers who plan to use SLWGS on single or multiple cells to select appropriate experimental conditions for their applications.
NASA Technical Reports Server (NTRS)
Dost, Ernest F.; Ilcewicz, Larry B.; Avery, William B.; Coxon, Brian R.
1991-01-01
Residual strength of an impacted composite laminate is dependent on details of the damage state. Stacking sequence was varied to judge its effect on damage caused by low-velocity impact. This was done for quasi-isotropic layups of a toughened composite material. Experimental observations on changes in the impact damage state and postimpact compressive performance were presented for seven different laminate stacking sequences. The applicability and limitations of analysis compared to experimental results were also discussed. Postimpact compressive behavior was found to be a strong function of the laminate stacking sequence. This relationship was found to depend on thickness, stacking sequence, size, and location of sublaminates that comprise the impact damage state. The postimpact strength for specimens with a relatively symmetric distribution of damage through the laminate thickness was accurately predicted by models that accounted for sublaminate stability and in-plane stress redistribution. An asymmetric distribution of damage in some laminate stacking sequences tended to alter specimen stability. Geometrically nonlinear finite element analysis was used to predict this behavior.
Jongsma, Marijtje L A; Gerrits, Niels J H M; van Rijn, Clementina M; Quiroga, Rodrigo Quian; Maes, Joseph H R
2012-07-01
The aim of this study was to track recall performance and event-related potentials (ERPs) across multiple trials in a digit-learning task. When a sequence is practiced by repetition, the number of errors typically decreases and a learning curve emerges. Until now, almost all ERP learning and memory research has focused on effects after a single presentation and, therefore, fails to capture the dynamic changes that characterize a learning process. However, the current study used a free-recall task in which a sequence of ten auditory digits was presented repeatedly. Auditory sequences of ten digits were presented in a logical order (control sequences) or in a random order (experimental sequences). Each sequence was presented six times. Participants had to reproduce the sequence after each presentation. EEG recordings were made at the time of the digit presentations. Recall performance for the control sequences was close to asymptote right after the first learning trial, whereas performance for the experimental sequences initially displayed primacy and recency effects. However, these latter effects gradually disappeared over the six repetitions, resulting in near-asymptotic recall performance for all digits. The performance improvement for the middle items of the list was accompanied by an increase in P300 amplitude, implying a close correspondence between this ERP component and the behavioral data. These results, which were discussed in the framework of theories on the functional significance of the P300 amplitude, add to the scarce empirical data on the dynamics of ERP responses in the process of intentional learning. Copyright © 2011 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Singh, Nirbhay N.; Ahrens, Michael G.
1979-01-01
Results at the end of one year showed that the experimental group mastered an average of 32 objectives while the control group averaged 15.5, suggesting that hierarchically sequenced mathematics curricula may provide an effective approach to the teaching of number concepts to the severely and moderately retarded. (Author/DLS)
NASA Astrophysics Data System (ADS)
Carmack, Gay Lynn Dickinson
2000-10-01
This two-part quasi-experimental repeated measures study examined whether computer simulated experiments have an effect on the problem solving skills of high school biology students in a school-within-a-school magnet program. Specifically, the study identified episodes in a simulation sequence where problem solving skills improved. In the Fall academic semester, experimental group students (n = 30) were exposed to two simulations: CaseIt! and EVOLVE!. Control group students participated in an internet research project and a paper Hardy-Weinberg activity. In the Spring academic semester, experimental group students were exposed to three simulations: Genetics Construction Kit, CaseIt! and EVOLVE! . Spring control group students participated in a Drosophila lab, an internet research project, and Advanced Placement lab 8. Results indicate that the Fall and Spring experimental groups experienced significant gains in scientific problem solving after the second simulation in the sequence. These gains were independent of the simulation sequence or the amount of time spent on the simulations. These gains were significantly greater than control group scores in the Fall. The Spring control group significantly outscored all other study groups on both pretest measures. Even so, the Spring experimental group problem solving performance caught up to the Spring control group performance after the third simulation. There were no significant differences between control and experimental groups on content achievement. Results indicate that CSE is as effective as traditional laboratories in promoting scientific problem solving and that CSE is a useful tool for improving students' scientific problem solving skills. Moreover, retention of problem solving skills is enhanced by utilizing more than one simulation.
Best, Katharine; Oakes, Theres; Heather, James M.; Shawe-Taylor, John; Chain, Benny
2015-01-01
The polymerase chain reaction (PCR) is one of the most widely used techniques in molecular biology. In combination with High Throughput Sequencing (HTS), PCR is widely used to quantify transcript abundance for RNA-seq, and in the context of analysis of T and B cell receptor repertoires. In this study, we combine DNA barcoding with HTS to quantify PCR output from individual target molecules. We develop computational tools that simulate both the PCR branching process itself, and the subsequent subsampling which typically occurs during HTS sequencing. We explore the influence of different types of heterogeneity on sequencing output, and compare them to experimental results where the efficiency of amplification is measured by barcodes uniquely identifying each molecule of starting template. Our results demonstrate that the PCR process introduces substantial amplification heterogeneity, independent of primer sequence and bulk experimental conditions. This heterogeneity can be attributed both to inherited differences between different template DNA molecules, and the inherent stochasticity of the PCR process. The results demonstrate that PCR heterogeneity arises even when reaction and substrate conditions are kept as constant as possible, and therefore single molecule barcoding is essential in order to derive reproducible quantitative results from any protocol combining PCR with HTS. PMID:26459131
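The two stages described in this abstract, a PCR branching process followed by subsampling at sequencing, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-cycle efficiency (0.8), cycle count (10), template count (50) and read depth (2000) are made-up values, the function names are hypothetical, and reads are drawn with replacement as an approximation. It is not the authors' simulation tool.

```python
import random

def amplify(n_cycles, efficiency, rng):
    """Simulate one template molecule through PCR as a branching process.

    In each cycle every existing copy is duplicated with probability
    `efficiency` (assumed constant here for simplicity).
    """
    copies = 1
    for _ in range(n_cycles):
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies

def sequence_sample(family_sizes, n_reads, rng):
    """Subsample reads from the amplified pool, as happens during HTS.

    Returns the number of reads observed per barcode (one barcode per
    starting template). Sampling is with replacement, an approximation.
    """
    barcodes = list(range(len(family_sizes)))
    reads = rng.choices(barcodes, weights=family_sizes, k=n_reads)
    counts = [0] * len(family_sizes)
    for barcode in reads:
        counts[barcode] += 1
    return counts

rng = random.Random(0)
# 50 barcoded templates, all amplified under identical assumed conditions
families = [amplify(10, 0.8, rng) for _ in range(50)]
reads = sequence_sample(families, 2000, rng)
print("copies per template: min", min(families), "max", max(families))
print("reads per barcode:   min", min(reads), "max", max(reads))
```

Even with identical efficiency for every template, the printed spread in copies per template comes purely from the stochasticity of the branching process, which is the kind of heterogeneity the abstract attributes to PCR itself.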
Coarse-grained sequences for protein folding and design.
Brown, Scott; Fawzi, Nicolas J; Head-Gordon, Teresa
2003-09-16
We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the alpha/beta ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design.
Coarse-grained sequences for protein folding and design
Brown, Scott; Fawzi, Nicolas J.; Head-Gordon, Teresa
2003-01-01
We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design. PMID:12963815
Evidence for the principle of minimal frustration in the evolution of protein folding landscapes.
Tzul, Franco O; Vasilchuk, Daniel; Makhatadze, George I
2017-02-28
Theoretical and experimental studies have firmly established that protein folding can be described by a funneled energy landscape. This funneled energy landscape is the result of foldable protein sequences evolving following the principle of minimal frustration, which allows proteins to rapidly fold to their native biologically functional conformations. For a protein family with a given functional fold, the principle of minimal frustration suggests that, independent of sequence, all proteins within this family should fold with similar rates. However, depending on the optimal living temperature of the organism, proteins also need to modulate their thermodynamic stability. Consequently, the difference in thermodynamic stability should be primarily caused by differences in the unfolding rates. To test this hypothesis experimentally, we performed comprehensive thermodynamic and kinetic analyses of 15 different proteins from the thioredoxin family. Eight of these thioredoxins were extant proteins from psychrophilic, mesophilic, or thermophilic organisms. The other seven protein sequences were obtained using ancestral sequence reconstruction and can be dated back over 4 billion years. We found that all studied proteins fold with very similar rates but unfold with rates that differ up to three orders of magnitude. The unfolding rates correlate well with the thermodynamic stability of the proteins. Moreover, proteins that unfold slower are more resistant to proteolysis. These results provide direct experimental support to the principle of minimal frustration hypothesis.
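The link between rates and stability in this abstract can be made concrete with a short Python sketch. It assumes simple two-state folding, where the equilibrium constant is the ratio of folding to unfolding rates; the rate values are hypothetical and are not taken from the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def unfolding_free_energy(k_fold, k_unfold, temp_k=298.15):
    """Two-state stability: dG_unfold = R * T * ln(k_fold / k_unfold)."""
    return R_KCAL * temp_k * math.log(k_fold / k_unfold)

# Hypothetical rates (s^-1): identical folding rate, unfolding rates that
# differ by three orders of magnitude, as reported across the family.
stable = unfolding_free_energy(k_fold=100.0, k_unfold=1e-4)
labile = unfolding_free_energy(k_fold=100.0, k_unfold=1e-1)
ddg = stable - labile
print(f"stability difference: {ddg:.2f} kcal/mol")
```

Under these assumptions, a thousandfold difference in unfolding rate at a fixed folding rate corresponds to roughly 4 kcal/mol of stability at room temperature.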
Goren, Moran G; Yosef, Ido; Auster, Oren; Qimron, Udi
2012-10-12
We analyzed sequences of newly inserted repeats in an Escherichia coli CRISPR (clustered regularly interspaced short palindromic repeats) array in vivo and showed that a base previously thought to belong to the repeat is actually derived from a protospacer. Based on further experimental results, we propose to use the term "duplicon" for a repeated sequence in a CRISPR array that serves as a template for a new duplicon. Our findings suggest the possibility of redrawing the borders between repeats, spacers, and protospacer adjacent motifs. Copyright © 2012 Elsevier Ltd. All rights reserved.
Distinguishing computable mixtures of quantum states
NASA Astrophysics Data System (ADS)
Grande, Ignacio H. López; Senno, Gabriel; de la Torre, Gonzalo; Larotonda, Miguel A.; Bendersky, Ariel; Figueira, Santiago; Acín, Antonio
2018-05-01
In this article we extend results from our previous work [Bendersky et al., Phys. Rev. Lett. 116, 230402 (2016), 10.1103/PhysRevLett.116.230402] by providing a protocol to distinguish in finite time and with arbitrarily high success probability any algorithmic mixture of pure states from the maximally mixed state. Moreover, we include an experimental realization, using a modified quantum key distribution setup, where two different random sequences of pure states are prepared; these sequences are indistinguishable according to quantum mechanics, but they become distinguishable when randomness is replaced with pseudorandomness within the experimental preparation process.
Integrated circuit layer image segmentation
NASA Astrophysics Data System (ADS)
Masalskis, Giedrius; Petrauskas, Romas
2010-09-01
In this paper we present IC layer image segmentation techniques which are specifically created for precise metal layer feature extraction. During our research we used many samples of real-life de-processed IC metal layer images which were obtained using an optical light microscope. We have created a sequence of various image processing filters which provides segmentation results of good enough precision for our application. Filter sequences were fine tuned to provide the best possible results depending on the properties of the IC manufacturing process and imaging technology. The proposed IC image segmentation filter sequences were experimentally tested and compared with conventional direct segmentation algorithms.
Equally parsimonious pathways through an RNA sequence space are not equally likely
NASA Technical Reports Server (NTRS)
Lee, Y. H.; DSouza, L. M.; Fox, G. E.
1997-01-01
An experimental system for determining the potential ability of sequences resembling 5S ribosomal RNA (rRNA) to perform as functional 5S rRNAs in vivo in the Escherichia coli cellular environment was devised previously. Presumably, the only 5S rRNA sequences that would have been fixed by ancestral populations are ones that were functionally valid, and hence the actual historical paths taken through RNA sequence space during 5S rRNA evolution would have most likely utilized valid sequences. Herein, we examine the potential validity of all sequence intermediates along alternative equally parsimonious trajectories through RNA sequence space which connect two pairs of sequences that had previously been shown to behave as valid 5S rRNAs in E. coli. The first trajectory requires a total of four changes. The 14 sequence intermediates provide 24 apparently equally parsimonious paths by which the transition could occur. The second trajectory involves three changes, six intermediate sequences, and six potentially equally parsimonious paths. In total, only eight of the 20 sequence intermediates were found to be clearly invalid. As a consequence of the position of these invalid intermediates in the sequence space, seven of the 30 possible paths consisted of exclusively valid sequences. In several cases, the apparent validity/invalidity of the intermediate sequences could not be anticipated on the basis of current knowledge of the 5S rRNA structure. This suggests that the interdependencies in RNA sequence space may be more complex than currently appreciated. If ancestral sequences predicted by parsimony are to be regarded as actual historical sequences, then the present results would suggest that they should also satisfy a validity requirement and that, in at least limited cases, this conjecture can be tested experimentally.
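The counts quoted in this abstract follow from simple combinatorics: two sequences differing at n positions have 2^n - 2 intermediates and n! single-step orderings. The Python sketch below enumerates them and filters out paths that pass through an invalid intermediate. The function names are hypothetical, and the invalid set used in the example is made up, since the abstract does not say which intermediates were invalid.

```python
from itertools import combinations, permutations

def intermediates_and_paths(n_changes):
    """Enumerate intermediates and paths between two sequences that
    differ at `n_changes` positions, changing one position per step."""
    positions = range(n_changes)
    # An intermediate carries a proper, non-empty subset of the changes.
    intermediates = [frozenset(c)
                     for r in range(1, n_changes)
                     for c in combinations(positions, r)]
    # A path is an order in which the changes are introduced; it is
    # recorded as the list of intermediates it visits.
    paths = [[frozenset(order[:i]) for i in range(1, n_changes)]
             for order in permutations(positions)]
    return intermediates, paths

def valid_paths(paths, invalid):
    """Keep only paths that never visit an invalid intermediate."""
    return [p for p in paths if not any(step in invalid for step in p)]

inter4, paths4 = intermediates_and_paths(4)
inter3, paths3 = intermediates_and_paths(3)
print(len(inter4), len(paths4))  # 14 24
print(len(inter3), len(paths3))  # 6 6

# Hypothetical example: the intermediate with only change 0 is invalid.
remaining = valid_paths(paths3, {frozenset({0})})
print(len(remaining))  # 4
```

The totals, 20 intermediates and 30 paths, match the figures in the abstract.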
Sequence verification of synthetic DNA by assembly of sequencing reads
Wilson, Mandy L.; Cai, Yizhi; Hanlon, Regina; Taylor, Samantha; Chevreux, Bastien; Setubal, João C.; Tyler, Brett M.; Peccoud, Jean
2013-01-01
Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine if a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnostic. GenoREAD is available at www.genoread.org. PMID:23042248
Rohs, Remo; Sklenar, Heinz
2004-04-01
The results presented in this paper on methylene blue (MB) binding to DNA with AT alternating base sequence complement the data obtained in two former modeling studies of MB binding to GC alternating DNA. In the light of the large amount of experimental data for both systems, this theoretical study is focused on a detailed energetic analysis and comparison in order to understand their different behavior. Since experimental high-resolution structures of the complexes are not available, the analysis is based on energy minimized structural models of the complexes in different binding modes. For both sequences, four different intercalation structures and two models for MB binding in the minor and major groove have been proposed. Solvent electrostatic effects were included in the energetic analysis by using electrostatic continuum theory, and the dependence of MB binding on salt concentration was investigated by solving the non-linear Poisson-Boltzmann equation. We find that the relative stability of the different complexes is similar for the two sequences, in agreement with the interpretation of spectroscopic data. Subtle differences, however, are seen in energy decompositions and can be attributed to the change from symmetric 5'-YpR-3' intercalation to minor groove binding with increasing salt concentration, which is experimentally observed for the AT sequence at lower salt concentration than for the GC sequence. According to our results, this difference is due to the significantly lower non-electrostatic energy for the minor groove complex with AT alternating DNA, whereas the slightly lower binding energy to this sequence is caused by a higher deformation energy of DNA. The energetic data are in agreement with the conclusions derived from different spectroscopic studies and can also be structurally interpreted on the basis of the modeled complexes. 
The simple static modeling technique and the neglect of entropy terms and of non-electrostatic solute-solvent interactions, which are assumed to be nearly constant for the compared complexes of MB with DNA, seem to be justified by the results.
Tomcho, Jeremy C; Tillman, Magdalena R; Znosko, Brent M
2015-09-01
Predicting the secondary structure of RNA is an intermediate step in predicting RNA three-dimensional structure. Commonly, determining RNA secondary structure from sequence uses free energy minimization and nearest neighbor parameters. Current algorithms utilize a sequence-independent model to predict free energy contributions of dinucleotide bulges. To determine if a sequence-dependent model would be more accurate, short RNA duplexes containing dinucleotide bulges with different sequences and nearest neighbor combinations were optically melted to derive thermodynamic parameters. These data suggested energy contributions of dinucleotide bulges were sequence-dependent, and a sequence-dependent model was derived. This model assigns free energy penalties based on the identity of nucleotides in the bulge (3.06 kcal/mol for two purines, 2.93 kcal/mol for two pyrimidines, 2.71 kcal/mol for 5'-purine-pyrimidine-3', and 2.41 kcal/mol for 5'-pyrimidine-purine-3'). The predictive model also includes a 0.45 kcal/mol penalty for an A-U pair adjacent to the bulge and a -0.28 kcal/mol bonus for a G-U pair adjacent to the bulge. The new sequence-dependent model results in predicted values within, on average, 0.17 kcal/mol of experimental values, a significant improvement over the sequence-independent model. This model and new experimental values can be incorporated into algorithms that predict RNA stability and secondary structure from sequence.
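The sequence-dependent model quoted in this abstract can be written as a small lookup, sketched here in Python. The numerical values come from the abstract. The function name and input format are hypothetical, and applying the A-U and G-U terms once per listed adjacent pair is an assumption, since the abstract does not spell out how two adjacent pairs are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}

def bulge_penalty(bulge, adjacent_pairs=()):
    """Free energy contribution (kcal/mol) of a dinucleotide bulge.

    `bulge` is the two bulged nucleotides written 5'->3', e.g. "AG".
    `adjacent_pairs` lists the closing base pairs, e.g. ["A-U", "G-C"].
    """
    first, second = bulge.upper()
    for nt in (first, second):
        if nt not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown nucleotide: {nt}")
    if first in PURINES and second in PURINES:
        energy = 3.06
    elif first in PYRIMIDINES and second in PYRIMIDINES:
        energy = 2.93
    elif first in PURINES:
        energy = 2.71  # 5'-purine-pyrimidine-3'
    else:
        energy = 2.41  # 5'-pyrimidine-purine-3'
    for pair in adjacent_pairs:
        bases = frozenset(pair.upper().replace("-", ""))
        if bases == {"A", "U"}:
            energy += 0.45  # penalty for an adjacent A-U pair
        elif bases == {"G", "U"}:
            energy -= 0.28  # bonus for an adjacent G-U pair
    return energy

print(round(bulge_penalty("GA", ["A-U"]), 2))
```

A folding algorithm would add this term to the nearest neighbor energies of the surrounding helices; that surrounding calculation is outside the scope of the sketch.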
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
NASA Technical Reports Server (NTRS)
Starbuck, J. Michael; Guerdal, Zafer; Pindera, Marek-Jerzy; Poe, Clarence C.
1990-01-01
Damage states in laminated composites were studied by considering the model problem of a laminated beam subjected to three-point bending. A combination of experimental and theoretical research techniques was used to correlate the experimental results with the analytical stress distributions. The analytical solution procedure was based on the stress formulation approach of the mathematical theory of elasticity. The solution procedure is capable of calculating the ply-level stresses and beam displacements for any laminated beam of finite length using the generalized plane deformation or plane stress state assumption. Prior to conducting the experimental phase, the results from preliminary analyses were examined. Significant effects in the ply-level stress distributions were seen depending on the fiber orientation, aspect ratio, and whether a grouped or interspersed stacking sequence was used. The experimental investigation was conducted to determine the different damage modes in laminated three-point bend specimens. The test matrix consisted of three-point bend specimens of 0 deg unidirectional, cross-ply, and quasi-isotropic stacking sequences. The dependence of the damage initiation loads and ultimate failure loads was studied, and their relation to damage susceptibility and damage tolerance of the beam configuration was discussed. Damage modes were identified by visual inspection of the damaged specimens using an optical microscope. The four fundamental damage mechanisms identified were delaminations, matrix cracking, fiber breakage, and crushing. The correlation study between the experimental results and the analytical results was performed for the midspan deflection, indentation, damage modes, and damage susceptibility.
Compression of next-generation sequencing quality scores using memetic algorithm
2014-01-01
Background: The exponential growth of next-generation sequencing (NGS) derived DNA data poses great challenges to data storage and transmission. Although many compression algorithms have been proposed for DNA reads in NGS data, few methods are designed specifically to handle the quality scores. Results: In this paper we present a memetic algorithm (MA) based NGS quality score data compressor, namely MMQSC. The algorithm extracts raw quality score sequences from FASTQ formatted files, and designs a compression codebook using MA based multimodal optimization. The input data is then compressed in a substitutional manner. Experimental results on five representative NGS data sets show that MMQSC obtains a higher compression ratio than other state-of-the-art methods. In particular, MMQSC is a lossless reference-free compression algorithm, yet obtains an average compression ratio of 22.82% on the experimental data sets. Conclusions: The proposed MMQSC compresses NGS quality score data effectively. It can be utilized to improve the overall compression ratio on FASTQ formatted files. PMID:25474747
Resolution enhancement using a new multiple-pulse decoupling sequence for quadrupolar nuclei.
Delevoye, L; Trébosc, J; Gan, Z; Montagne, L; Amoureux, J-P
2007-05-01
A new decoupling composite pulse sequence is proposed to remove the broadening on spin S=1/2 magic-angle spinning (MAS) spectra arising from the scalar coupling with a quadrupolar nucleus I. It is illustrated on the (31)P spectrum of an aluminophosphate, AlPO(4)-14, which is broadened by the presence of (27)Al/(31)P scalar couplings. The multiple-pulse (MP) sequence has the advantage over the continuous wave (CW) irradiation to efficiently annul the scalar dephasing without reintroducing the dipolar interaction. The MP decoupling sequence is first described in a rotor-synchronised version (RS-MP) where one parameter only needs to be adjusted. It clearly avoids the dipolar recoupling in order to achieve a better resolution than using the CW sequence. In a second improved version, the MP sequence is experimentally studied in the vicinity of the perfect rotor-synchronised conditions. The linewidth at half maximum (FWHM) of 65 Hz using (27)Al CW decoupling decreases to 48 Hz with RS-MP decoupling and to 30 Hz with rotor-asynchronised MP (RA-MP) decoupling. The main phenomena are explained using both experimental results and numerical simulations.
A multiple-alignment based primer design algorithm for genetically highly variable DNA targets
2013-01-01
Background: Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to population diversity and primer localization in relatively conserved regions, in addition to recognized constraints typically considered in primer design. Results: Design constraints include degenerate sites to maximize population coverage, matching of melting temperatures, optimizing de novo sequence length, finding optimal bio-barcodes to allow efficient downstream analyses, and minimizing risk of dimerization. To facilitate primer design addressing these and other constraints, we created a novel computer program (PrimerDesign) that automates this complex procedure. We show its powers and limitations and give examples of successful designs for the analysis of HIV-1 populations. Conclusions: PrimerDesign is useful for researchers who want to design DNA primers and probes for analyzing highly variable DNA populations. It can be used to design primers for PCR, RT-PCR, Sanger sequencing, next-generation sequencing, and other experimental protocols targeting highly variable DNA samples. PMID:23965160
Nanowire-nanopore transistor sensor for DNA detection during translocation
NASA Astrophysics Data System (ADS)
Xie, Ping; Xiong, Qihua; Fang, Ying; Qing, Quan; Lieber, Charles
2011-03-01
Nanopore sequencing, a promising low-cost, high-throughput sequencing technique, was proposed more than a decade ago. Because of the mismatch between the small ionic current signal and the fast translocation speed, and the technical difficulty of large-scale integration of nanopores for direct ionic current sequencing, alternative methods that rely on integrated DNA sensors have been proposed, such as those using capacitive coupling or tunnelling current. None of them has yet been demonstrated experimentally. Here we show, for the first time, an amplified sensor signal recorded experimentally from a nanowire-nanopore field effect transistor sensor during DNA translocation. Independent multi-channel recording was also demonstrated for the first time. Our results suggest that the signal arises from a highly localized potential change caused by DNA translocation under non-balanced buffer conditions. Given that this method may produce a larger signal for smaller nanopores, we hope our experiment can be a starting point for a new generation of nanopore sequencing devices with larger signal, higher bandwidth and large-scale multiplexing capability, and can ultimately realize the goal of low-cost, high-throughput sequencing.
An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Fang, Yu-Hong; Zhao, Yu-Jun; Zhang, Ming
2016-01-01
We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict protein-protein interactions (PPIs) from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using an RVM-based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is clearly better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.
A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine
NASA Astrophysics Data System (ADS)
Cheng, Yixuan; Fan, Wenqing; Huang, Wei; An, Jing
2017-09-01
Dynamically monitoring the behavior of a program is widely used to discriminate between benign programs and malware. The judgment is usually based on dynamic characteristics of the program, such as its API call sequence or API call frequency. The key innovation of this paper is to consider the full Native API sequence and use a support vector machine to detect shellcode. We also use a Markov chain to extract and digitize Native API sequence features. Our experimental results show that the method proposed in this paper has high accuracy and low detection rate.
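One plausible reading of using a Markov chain to digitize an API sequence is to estimate first-order transition probabilities between calls and flatten them into a fixed-length vector. The Python sketch below does that. It is an assumption about the feature construction, not the authors' published procedure; the function name is hypothetical and the call trace is made up for illustration.

```python
def markov_features(calls, vocabulary):
    """Flatten first-order transition probabilities into a feature vector.

    Entry (i, j) of the underlying matrix is the estimated probability
    that call j directly follows call i in the trace. Rows for calls
    that never occur as a predecessor are left as zeros.
    """
    index = {name: i for i, name in enumerate(vocabulary)}
    n = len(vocabulary)
    counts = [[0] * n for _ in range(n)]
    for current, following in zip(calls, calls[1:]):
        counts[index[current]][index[following]] += 1
    features = []
    for row in counts:
        total = sum(row)
        features.extend(c / total if total else 0.0 for c in row)
    return features

vocabulary = ["NtAllocateVirtualMemory", "NtProtectVirtualMemory",
              "NtCreateThreadEx"]
# Hypothetical trace, for illustration only
trace = ["NtAllocateVirtualMemory", "NtProtectVirtualMemory",
         "NtAllocateVirtualMemory", "NtProtectVirtualMemory",
         "NtCreateThreadEx"]
vector = markov_features(trace, vocabulary)
print(vector)
```

A vector of this kind has the same length for every trace, so it could be passed to a standard support vector machine implementation for training and classification.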
BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.
Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie
2017-07-01
Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. Our BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Our BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen. Contact: shuwj@bmi.ac.cn or boxc@bmi.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Wild Birds Use an Ordering Rule to Decode Novel Call Sequences.
Suzuki, Toshitaka N; Wheatcroft, David; Griesser, Michael
2017-08-07
The generative power of human language depends on grammatical rules, such as word ordering, that allow us to produce and comprehend even novel combinations of words [1-3]. Several species of birds and mammals produce sequences of calls [4-6], and, like words in human sentences, their order may influence receiver responses [7]. However, it is unknown whether animals use call ordering to extract meaning from truly novel sequences. Here, we use a novel experimental approach to test this in a wild bird species, the Japanese tit (Parus minor). Japanese tits are attracted to mobbing a predator when they hear conspecific alert and recruitment calls ordered as alert-recruitment sequences [7]. They also approach in response to recruitment calls of heterospecific individuals in mixed-species flocks [8, 9]. Using experimental playbacks, we assess their responses to artificial sequences in which their own alert calls are combined into different orderings with heterospecific recruitment calls. We find that Japanese tits respond similarly to mixed-species alert-recruitment call sequences and to their own alert-recruitment sequences. Importantly, however, tits rarely respond to mixed-species sequences in which the call order is reversed. Thus, Japanese tits extract a compound meaning from novel call sequences using an ordering rule. These results demonstrate a new parallel between animal communication systems and human language, opening new avenues for exploring the evolution of ordering rules and compositionality in animal vocal sequences. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhang, Chongfu; Qiu, Kun; Zhou, Heng; Ling, Yun; Wang, Yawei; Xu, Bo
2010-03-01
In this paper, a tunable optical label based on multiple optical orthogonal codes sequences (MOOCS) for optical packet switching (OPS), termed MOOCS-OPS, is experimentally demonstrated for the first time. The tunable MOOCS-based optical label is implemented using a group of fiber Bragg grating (FBG)-based optical en/decoders and optical switches configured by a Field Programmable Gate Array (FPGA), and the optical label is erased using a Semiconductor Optical Amplifier (SOA). Waveforms of the MOOCS-based optical label and of the optical packet, including the MOOCS-based optical label and the payloads, are obtained; the switching control mechanism and the switching matrix are discussed; and the bit error rate (BER) performance of the system is studied. The experimental results show that the tunable MOOCS-OPS scheme is effective.
Image encryption using random sequence generated from generalized information domain
NASA Astrophysics Data System (ADS)
Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu
2016-05-01
A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
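The permutation-diffusion architecture named in this abstract can be illustrated with a toy Python sketch: a P-box derived from an address sequence shuffles the pixels, then a keystream is mixed in with chaining from an initial value. Python's `random.Random` stands in for the paper's generalized-information-domain generator, which the abstract does not specify; it is not cryptographically secure, and all names here are hypothetical.

```python
import random

def make_pbox(n, rng):
    """Build a permutation by ranking a pseudo-random address sequence."""
    addresses = [rng.random() for _ in range(n)]
    return sorted(range(n), key=addresses.__getitem__)

def encrypt(pixels, seed, iv=0):
    rng = random.Random(seed)  # stand-in generator, NOT secure
    pbox = make_pbox(len(pixels), rng)
    keystream = [rng.randrange(256) for _ in pixels]
    shuffled = [pixels[j] for j in pbox]      # permutation stage
    cipher, previous = [], iv
    for p, k in zip(shuffled, keystream):     # diffusion stage
        previous = p ^ k ^ previous           # chain on previous cipher byte
        cipher.append(previous)
    return cipher

def decrypt(cipher, seed, iv=0):
    rng = random.Random(seed)
    pbox = make_pbox(len(cipher), rng)
    keystream = [rng.randrange(256) for _ in cipher]
    shuffled, previous = [], iv
    for c, k in zip(cipher, keystream):       # undo diffusion
        shuffled.append(c ^ k ^ previous)
        previous = c
    pixels = [0] * len(cipher)
    for i, j in enumerate(pbox):              # undo permutation
        pixels[j] = shuffled[i]
    return pixels

plain = [10, 200, 33, 47, 0, 255, 128, 64]    # a flattened 8-bit image
secret = encrypt(plain, seed=2016, iv=91)
print(decrypt(secret, seed=2016, iv=91) == plain)
```

Because each cipher byte depends on the previous one, changing the initial value changes every output byte, which is the role the abstract assigns to the introduced initial value.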
Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso
2015-07-01
In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches an accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.
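As a rough illustration of the spectral (k-mer) representation this abstract relies on, the sketch below counts overlapping k-mers in a DNA string and converts the counts to relative frequencies. The function names `kmer_spectrum` and `kmer_frequencies` are hypothetical, the choice k = 3 is arbitrary, and the neural gas clustering and signature selection steps of the paper are not reproduced.

```python
from collections import Counter


def kmer_spectrum(seq, k=3):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):  # ignore ambiguous bases such as N
            counts[word] += 1
    return counts


def kmer_frequencies(seq, k=3):
    """Relative k-mer frequencies, so sequences of different length compare."""
    counts = kmer_spectrum(seq, k)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


spectrum = kmer_spectrum("ACGTACGT", k=3)
frequencies = kmer_frequencies("ACGTACGT", k=3)
```

Because the representation never aligns sequences, it can be computed on short fragments as readily as on full-length barcodes, which is the setting the abstract emphasizes.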
Protocols for efficient simulations of long-time protein dynamics using coarse-grained CABS model.
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2014-01-01
Coarse-grained (CG) modeling is a well-acknowledged simulation approach for getting insight into long-time scale protein folding events at reasonable computational cost. Depending on the design of a CG model, the simulation protocols vary from highly case-specific (requiring user-defined assumptions about the folding scenario) to more sophisticated blind prediction methods for which only a protein sequence is required. Here we describe the framework protocol for the simulations of long-term dynamics of globular proteins, with the use of the CABS CG protein model and sequence data. The simulations can start from a random or a selected (e.g., native) structure. The described protocol has been validated using experimental data for protein folding model systems; the prediction results agreed well with the experimental results.
Canale, Aneth S; Venev, Sergey V; Whitfield, Troy W; Caffrey, Daniel R; Marasco, Wayne A; Schiffer, Celia A; Kowalik, Timothy F; Jensen, Jeffrey D; Finberg, Robert W; Zeldovich, Konstantin B; Wang, Jennifer P; Bolon, Daniel N A
2018-04-13
The fitness effects of synonymous mutations can provide insights into biological and evolutionary mechanisms. We analyzed the experimental fitness effects of all single-nucleotide mutations, including synonymous substitutions, at the beginning of the influenza A virus hemagglutinin (HA) gene. Many synonymous substitutions were deleterious both in bulk competition and for individually isolated clones. Investigating protein and RNA levels of a subset of individually expressed HA variants revealed that multiple biochemical properties contribute to the observed experimental fitness effects. Our results indicate that a structural element in the HA segment viral RNA may influence fitness. Examination of naturally evolved sequences in human hosts indicates a preference for the unfolded state of this structural element compared to that found in swine hosts. Our overall results reveal that synonymous mutations may have greater fitness consequences than indicated by simple models of sequence conservation, and we discuss the implications of this finding for commonly used evolutionary tests and analyses. Copyright © 2018. Published by Elsevier Ltd.
Heuristics for multiobjective multiple sequence alignment.
Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B
2016-07-15
Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may provide an undesirable bias in further steps of the analysis and give too simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
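The trade-off between the two objectives in this abstract can be made concrete with a small Pareto filter. In this sketch each candidate alignment is summarized as a pair (substitution score, number of indels); the first is maximized and the second minimized. The function name `pareto_front` and the sample score vectors are hypothetical, and the local search that would generate the candidates is not shown.

```python
def pareto_front(points):
    """Keep the (substitution_score, indel_count) pairs that no other pair dominates.

    A pair q dominates p when q scores at least as high, uses at most as many
    indels, and differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical score vectors for four candidate alignments.
candidates = [(10, 3), (12, 5), (9, 4), (12, 4)]
front = pareto_front(candidates)
```

Here `(12, 5)` is discarded because `(12, 4)` reaches the same score with fewer indels, and `(9, 4)` is discarded because `(10, 3)` is better on both objectives; the two survivors are the alignments a practitioner would choose between.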
The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.
Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A
2016-10-11
Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
2016-11-01
High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Sequencing and functional validation of the JGI Brachypodium distachyon T-DNA collection
USDA-ARS's Scientific Manuscript database
Brachypodium distachyon is a powerful experimental model for the grasses with a large and growing collection of genomic and experimental resources. We have added to these resources by greatly expanding the number of sequence-indexed T-DNA lines. We sequenced 21,165 T-DNA lines, 15,569 of which were ...
Positive Streptobacillus moniliformis PCR in guinea pigs likely due to Leptotrichia spp.
Boot, Ron; Van de Berg, Lia; Reubsaet, Frans A G; Vlemminx, Maurice J
2008-04-30
Streptobacillus moniliformis is a zoonotic bacterium. We obtained positive S. moniliformis PCR results in oral swab samples from guinea pigs from an experimental colony and the breeding colony of origin. Comparison of the DNA sequence of an amplicon with deposited 16S rDNA sequences revealed that Leptotrichia sp. can be the source of a false positive S. moniliformis PCR outcome.
How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity
NASA Technical Reports Server (NTRS)
Fox, G. E.; Wisotzkey, J. D.; Jurtshuk, P. Jr
1992-01-01
16S rRNA (genes coding for rRNA) sequence comparisons were conducted with the following three psychrophilic strains: Bacillus globisporus W25T (T = type strain) and Bacillus psychrophilus W16AT, and W5. These strains exhibited more than 99.5% sequence identity and within experimental uncertainty could be regarded as identical. Their close taxonomic relationship was further documented by phenotypic similarities. In contrast, previously published DNA-DNA hybridization results have convincingly established that these strains do not belong to the same species if current standards are used. These results emphasize the important point that effective identity of 16S rRNA sequences is not necessarily a sufficient criterion to guarantee species identity. Thus, although 16S rRNA sequences can be used routinely to distinguish and establish relationships between genera and well-resolved species, very recently diverged species may not be recognizable.
Hiding message into DNA sequence through DNA coding and chaotic maps.
Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman
2014-09-01
The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance the robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
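Two of the building blocks named in this abstract, DNA coding and the piecewise linear chaotic map, can be sketched in a few lines. The 2-bit rule (00 to A, 01 to C, 10 to G, 11 to T), the parameter values, and the helper names `bytes_to_dna`, `dna_to_bytes`, `pwlcm`, and `hiding_positions` are assumptions for illustration; the abstract does not state which coding rule or map parameters the authors used, and the encryption and complementary-rule embedding steps are not shown.

```python
BASES = "ACGT"  # assumed 2-bit coding rule: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data):
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(dna):
    """Inverse of bytes_to_dna; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def pwlcm(x, p):
    """One step of the piecewise linear chaotic map, with 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x  # the map is symmetric about x = 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def hiding_positions(x0, p, count, host_length, max_steps=100000):
    """Derive `count` distinct positions in a host sequence from the orbit."""
    if count > host_length:
        raise ValueError("cannot pick more positions than the host has")
    positions, seen, x = [], set(), x0
    for _ in range(max_steps):
        if len(positions) == count:
            return positions
        x = pwlcm(x, p)
        pos = int(x * host_length) % host_length
        if pos not in seen:
            seen.add(pos)
            positions.append(pos)
    raise RuntimeError("orbit did not yield enough distinct positions")


encoded = bytes_to_dna(b"Hi")
decoded = dna_to_bytes(encoded)
slots = hiding_positions(x0=0.3141, p=0.27, count=8, host_length=100)
```

In this sketch the pair `(x0, p)` plays the role of a key: a receiver who knows it can regenerate the same positions, while a small change in either value produces a different orbit.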
Object tracking using plenoptic image sequences
NASA Astrophysics Data System (ADS)
Kim, Jae Woo; Bae, Seong-Joon; Park, Seongjin; Kim, Do Hyung
2017-05-01
Object tracking is a very important problem in computer vision research. Among the difficulties of object tracking, the partial occlusion problem is one of the most serious and challenging. To address the problem, we proposed novel approaches to object tracking on plenoptic image sequences. Our approaches take advantage of the refocusing capability that plenoptic images provide. They take as input the sequences of focal stacks constructed from plenoptic image sequences. The proposed image selection algorithms select, from the sequence of focal stacks, the sequence of optimal images that maximizes the tracking accuracy. A focus measure approach and a confidence measure approach were proposed for image selection, and both approaches were validated by experiments using thirteen plenoptic image sequences that include heavily occluded target objects. The experimental results showed that the proposed approaches were satisfactory compared with the conventional 2D object tracking algorithms.
Effect of sequence-dependent rigidity on plectoneme localization in dsDNA
NASA Astrophysics Data System (ADS)
Medalion, Shlomi; Rabin, Yitzhak
2016-04-01
We use Monte-Carlo simulations to study the effect of variable rigidity on plectoneme formation and localization in supercoiled double-stranded DNA. We show that the presence of soft sequences increases the number of plectoneme branches and that the edges of the branches tend to be localized at these sequences. We propose an experimental approach to test our results in vitro, and discuss the possible role played by plectoneme localization in the search process of transcription factors for their targets (promoter regions) on the bacterial genome.
SeqCompress: an algorithm for biological sequence compression.
Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz; Bajwa, Hassan
2014-10-01
The growth of Next Generation Sequencing technologies presents significant research challenges, specifically the design of bioinformatics tools that handle massive amounts of data efficiently. The storage cost of biological sequence data has become a noticeable proportion of the total cost of generation and analysis. In particular, the increase in the DNA sequencing rate is significantly outstripping the rate of increase in disk storage capacity, so that data volumes may exceed the available storage capacity. It is essential to develop algorithms that handle large data sets via better memory management. This article presents a DNA sequence compression algorithm, SeqCompress, that copes with the space complexity of biological sequences. The algorithm is based on lossless data compression and uses a statistical model as well as arithmetic coding to compress DNA sequences. The proposed algorithm is compared with recent specialized compression tools for biological sequences. Experimental results show that the proposed algorithm has better compression gain than other existing algorithms. Copyright © 2014 Elsevier Inc. All rights reserved.
A Statistical Guide to the Design of Deep Mutational Scanning Experiments
Matuszewski, Sebastian; Hildebrandt, Marcel E.; Ghenu, Ana-Hermina; Jensen, Jeffrey D.; Bank, Claudia
2016-01-01
The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. PMID:27412710
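To make the idea of estimating fitness effects from relative abundance across time points concrete, the sketch below fits a straight line to the log ratio of mutant to reference read counts over time and reports its slope as the selection coefficient. This is a generic log-linear estimator chosen for illustration; the abstract does not confirm that it matches the authors' estimator, and the function name `selection_coefficient` and the synthetic counts are hypothetical.

```python
import math


def selection_coefficient(times, mutant_counts, reference_counts):
    """Least-squares slope of ln(mutant / reference) against time.

    Assumes every count is positive; zero counts would need pseudocounts.
    """
    log_ratio = [
        math.log(m / r) for m, r in zip(mutant_counts, reference_counts)
    ]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(log_ratio) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_ratio))
    return sxy / sxx


# Synthetic bulk competition: a mutant declining at s = -0.05 per time unit.
times = [0, 2, 4, 6, 8]
reference = [10000] * len(times)
mutant = [round(500 * math.exp(-0.05 * t)) for t in times]
s_hat = selection_coefficient(times, mutant, reference)
```

The denominator `sxx` grows with the spread of the sampling times, which gives some intuition for the abstract's point that extending the experiment and placing time points near its beginning and end tightens the estimate.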
An improved stochastic fractal search algorithm for 3D protein structure prediction.
Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun
2018-05-03
Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, drug development, and so on. In this paper, three-dimensional structures of proteins are predicted based on the known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but sometimes falls into local minima. To avoid this weakness, Lévy flight and internal feedback information are introduced in ISFS. In the experimental process, simulations are conducted by the ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results show that ISFS performs more efficiently and robustly in terms of finding the global minimum and avoiding getting stuck in local minima.
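The Fibonacci sequences used as benchmarks for AB off-lattice models are strings over the two residue types A and B built by a concatenation recurrence. The sketch below uses the definition commonly seen in the AB-model literature, S0 = "A", S1 = "B", S(i+1) = S(i-1) + S(i); it is an assumption that the paper uses exactly this convention, and the function name `fibonacci_ab` is hypothetical.

```python
def fibonacci_ab(n):
    """Return S(n), where S(0)='A', S(1)='B' and S(i+1) = S(i-1) + S(i)."""
    prev, curr = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, curr = curr, prev + curr  # concatenate, older string first
    return curr


sequence_13 = fibonacci_ab(6)  # lengths follow 1, 1, 2, 3, 5, 8, 13, ...
```

Each string's length is a Fibonacci number, so benchmark instances are often referred to by length (13, 21, 34, 55) rather than by index.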
Automated design of genetic toggle switches with predetermined bistability.
Chen, Shuobing; Zhang, Haoqian; Shi, Handuo; Ji, Weiyue; Feng, Jingchen; Gong, Yan; Yang, Zhenglin; Ouyang, Qi
2012-07-20
Synthetic biology aims to rationally construct biological devices with required functionalities. Methods that automate the design of genetic devices without post-hoc adjustment are therefore highly desired. Here we provide a method to predictably design genetic toggle switches with predetermined bistability. To accomplish this task, a biophysical model that links ribosome binding site (RBS) DNA sequence to toggle switch bistability was first developed by integrating a stochastic model with RBS design method. Then, to parametrize the model, a library of genetic toggle switch mutants was experimentally built, followed by establishing the equivalence between RBS DNA sequences and switch bistability. To test this equivalence, RBS nucleotide sequences for different specified bistabilities were in silico designed and experimentally verified. Results show that the deciphered equivalence is highly predictive for the toggle switch design with predetermined bistability. This method can be generalized to quantitative design of other probabilistic genetic devices in synthetic biology.
Sharma, Neeraj; Sosnay, Patrick R.; Ramalho, Anabela S.; Douville, Christopher; Franca, Arianna; Gottschalk, Laura B.; Park, Jeenah; Lee, Melissa; Vecchio-Pagan, Briana; Raraigh, Karen S.; Amaral, Margarida D.; Karchin, Rachel; Cutting, Garry R.
2015-01-01
Assessment of the functional consequences of variants near splice sites is a major challenge in the diagnostic laboratory. To address this issue, we created expression minigenes (EMGs) to determine the RNA and protein products generated by splice site variants (n = 10) implicated in cystic fibrosis (CF). Experimental results were compared with the splicing predictions of eight in silico tools. EMGs containing the full-length Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) coding sequence and flanking intron sequences generated wild-type transcript and fully processed protein in Human Embryonic Kidney (HEK293) and CF bronchial epithelial (CFBE41o-) cells. Quantification of variant induced aberrant mRNA isoforms was concordant using fragment analysis and pyrosequencing. The splicing patterns of c.1585−1G>A and c.2657+5G>A were comparable to those reported in primary cells from individuals bearing these variants. Bioinformatics predictions were consistent with experimental results for 9/10 variants (MES), 8/10 variants (NNSplice), and 7/10 variants (SSAT and Sroogle). Programs that estimate the consequences of mis-splicing predicted 11/16 (HSF and ASSEDA) and 10/16 (Fsplice and SplicePort) experimentally observed mRNA isoforms. EMGs provide a robust experimental approach for clinical interpretation of splice site variants and refinement of in silico tools. PMID:25066652
Pitre, S; North, C; Alamgir, M; Jessulat, M; Chan, A; Luo, X; Green, J R; Dumontier, M; Dehne, F; Golshani, A
2008-08-01
Protein-protein interaction (PPI) maps provide insight into cellular biology and have received considerable attention in the post-genomic era. While large-scale experimental approaches have generated large collections of experimentally determined PPIs, technical limitations preclude certain PPIs from detection. Recently, we demonstrated that yeast PPIs can be computationally predicted using re-occurring short polypeptide sequences between known interacting protein pairs. However, the computational requirements and low specificity made this method unsuitable for large-scale investigations. Here, we report an improved approach, which exhibits a specificity of approximately 99.95% and executes 16,000 times faster. Importantly, we report the first all-to-all sequence-based computational screen of PPIs in yeast, Saccharomyces cerevisiae, in which we identify 29,589 high confidence interactions of approximately 2 × 10^7 possible pairs. Of these, 14,438 PPIs have not been previously reported and may represent novel interactions. In particular, these results reveal a richer set of membrane protein interactions, not readily amenable to experimental investigations. From the novel PPIs, a novel putative protein complex comprised largely of membrane proteins was revealed. In addition, two novel gene functions were predicted and experimentally confirmed to affect the efficiency of non-homologous end-joining, providing further support for the usefulness of the identified PPIs in biological investigations.
A cost effective 5΄ selective single cell transcriptome profiling approach with improved UMI design
Arguel, Marie-Jeanne; LeBrigand, Kevin; Paquet, Agnès; Ruiz García, Sandra; Zaragosi, Laure-Emmanuelle; Waldmann, Rainer
2017-01-01
Single cell RNA sequencing approaches are instrumental in studies of cell-to-cell variability. 5΄ selective transcriptome profiling approaches allow simultaneous definition of the transcription start site and have advantages over 3΄ selective approaches which just provide internal sequences close to the 3΄ end. The only currently existing 5΄ selective approach requires costly and labor intensive fragmentation and cell barcoding after cDNA amplification. We developed an optimized 5΄ selective workflow where all the cell indexing is done prior to fragmentation. With our protocol, cell indexing can be performed in the Fluidigm C1 microfluidic device, resulting in a significant reduction of cost and labor. We also designed optimized unique molecular identifiers that show less sequence bias and vulnerability towards sequencing errors resulting in an improved accuracy of molecule counting. We provide comprehensive experimental workflows for Illumina and Ion Proton sequencers that allow single cell sequencing in a cost range comparable to qPCR assays. PMID:27940562
Principles of Quantitative MR Imaging with Illustrated Review of Applicable Modular Pulse Diagrams.
Mills, Andrew F; Sakai, Osamu; Anderson, Stephan W; Jara, Hernan
2017-01-01
Continued improvements in diagnostic accuracy using magnetic resonance (MR) imaging will require development of methods for tissue analysis that complement traditional qualitative MR imaging studies. Quantitative MR imaging is based on measurement and interpretation of tissue-specific parameters independent of experimental design, compared with qualitative MR imaging, which relies on interpretation of tissue contrast that results from experimental pulse sequence parameters. Quantitative MR imaging represents a natural next step in the evolution of MR imaging practice, since quantitative MR imaging data can be acquired using currently available qualitative imaging pulse sequences without modifications to imaging equipment. The article presents a review of the basic physical concepts used in MR imaging and how quantitative MR imaging is distinct from qualitative MR imaging. Subsequently, the article reviews the hierarchical organization of major applicable pulse sequences used in this article, with the sequences organized into conventional, hybrid, and multispectral sequences capable of calculating the main tissue parameters of T1, T2, and proton density. While this new concept offers the potential for improved diagnostic accuracy and workflow, awareness of this extension to qualitative imaging is generally low. This article reviews the basic physical concepts in MR imaging, describes commonly measured tissue parameters in quantitative MR imaging, and presents the major available pulse sequences used for quantitative MR imaging, with a focus on the hierarchical organization of these sequences. © RSNA, 2017.
RiboMaker: computational design of conformation-based riboregulation.
Rodrigo, Guillermo; Jaramillo, Alfonso
2014-09-01
The ability to engineer control systems of gene expression is instrumental for synthetic biology. Thus, bioinformatic methods that assist such engineering are appealing because they can guide the sequence design and prevent costly experimental screening. In particular, RNA is an ideal substrate to de novo design regulators of protein expression by following sequence-to-function models. We have implemented a novel algorithm, RiboMaker, aimed at the computational, automated design of bacterial riboregulation. RiboMaker reads the sequence and structure specifications, which codify for a gene regulatory behaviour, and optimizes the sequences of a small regulatory RNA and a 5'-untranslated region for an efficient intermolecular interaction. To this end, it implements an evolutionary design strategy, where random mutations are selected according to a physicochemical model based on free energies. The resulting sequences can then be tested experimentally, providing a new tool for synthetic biology, and also for investigating the riboregulation principles in natural systems. Web server is available at http://ribomaker.jaramillolab.org/. Source code, instructions and examples are freely available for download at http://sourceforge.net/projects/ribomaker/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
The CRISP theory of hippocampal function in episodic memory
Cheng, Sen
2013-01-01
Over the past four decades, a “standard framework” has emerged to explain the neural mechanisms of episodic memory storage. This framework has been instrumental in driving hippocampal research forward and now dominates the design and interpretation of experimental and theoretical studies. It postulates that cortical inputs drive plasticity in the recurrent cornu ammonis 3 (CA3) synapses to rapidly imprint memories as attractor states in CA3. Here we review a range of experimental studies and argue that the evidence against the standard framework is mounting, notwithstanding the considerable evidence in its support. We propose CRISP as an alternative theory to the standard framework. CRISP is based on Context Reset by dentate gyrus (DG), Intrinsic Sequences in CA3, and Pattern completion in cornu ammonis 1 (CA1). Compared to previous models, CRISP uses a radically different mechanism for storing episodic memories in the hippocampus. Neural sequences are intrinsic to CA3, and inputs are mapped onto these intrinsic sequences through synaptic plasticity in the feedforward projections of the hippocampus. Hence, CRISP does not require plasticity in the recurrent CA3 synapses during the storage process. As in other theories, DG and CA1 play supporting roles; however, their functions in CRISP have distinct implications. For instance, CA1 performs pattern completion in the absence of CA3 and DG contributes to episodic memory retrieval, increasing the speed, precision, and robustness of retrieval. We propose the conceptual theory, discuss its implications for experimental results and suggest testable predictions. It appears that CRISP not only accounts for those experimental results that are consistent with the standard framework, but also for results that are at odds with the standard framework. We therefore suggest that CRISP is a viable, and perhaps superior, theory for the hippocampal function in episodic memory. PMID:23653597
3D knee segmentation based on three MRI sequences from different planes.
Zhou, L; Chav, R; Cresson, T; Chartrand, G; de Guise, J
2016-08-01
In clinical practice, knee MRI sequences with 3.5~5 mm slice distance in sagittal, coronal, and axial planes are often requested for the knee examination since their acquisition is faster than that of a high-resolution MRI sequence in a single plane, thereby reducing the probability of motion artifact. In order to take advantage of the three sequences from different planes, a 3D segmentation method based on the combination of three knee models obtained from the three sequences is proposed in this paper. In the method, the sub-segmentation is performed separately with the sagittal, coronal, and axial MRI sequences in the image coordinate system. With each sequence, an initial knee model is hierarchically deformed, and then the three deformed models are mapped to the reference coordinate system defined by the DICOM standard and combined to obtain a patient-specific model. The experimental results verified that the three sub-segmentation results can complement each other, and their integration can compensate for the insufficiency of boundary information caused by the 3.5~5 mm gap between consecutive slices. Therefore, the obtained patient-specific model is substantially more accurate than any single sub-segmentation result.
Reproducibility and quantitation of amplicon sequencing-based detection
Zhou, Jizhong; Wu, Liyou; Deng, Ye; Zhi, Xiaoyang; Jiang, Yi-Huei; Tu, Qichao; Xie, Jianping; Van Nostrand, Joy D; He, Zhili; Yang, Yunfeng
2011-01-01
To determine the reproducibility and quantitation of the amplicon sequencing-based detection approach for analyzing microbial community structure, a total of 24 microbial communities from a long-term global change experimental site were examined. Genomic DNA obtained from each community was used to amplify 16S rRNA genes with two or three barcode tags as technical replicates in the presence of a small quantity (0.1% wt/wt) of genomic DNA from Shewanella oneidensis MR-1 as the control. The technical reproducibility of the amplicon sequencing-based detection approach is quite low, with an average operational taxonomic unit (OTU) overlap of 17.2%±2.3% between two technical replicates, and 8.2%±2.3% among three technical replicates, which is most likely due to problems associated with random sampling processes. Such variations in technical replicates could have substantial effects on estimating β-diversity but less on α-diversity. A high variation was also observed in the control across different samples (for example, 66.7-fold for the forward primer), suggesting that the amplicon sequencing-based detection approach could not be quantitative. In addition, various strategies were examined to improve the comparability of amplicon sequencing data, such as increasing biological replicates, and removing singleton sequences and less-representative OTUs across biological replicates. Finally, as expected, various statistical analyses with preprocessed experimental data revealed clear differences in the composition and structure of microbial communities between warming and non-warming, or between clipping and non-clipping. Taken together, these results suggest that amplicon sequencing-based detection is useful in analyzing microbial community structure even though it is not reproducible and quantitative. 
However, great caution should be taken in experimental design and data interpretation when the amplicon sequencing-based detection approach is used for quantitative analysis of the β-diversity of microbial communities. PMID:21346791
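The abstract above attributes the low OTU overlap between technical replicates to random sampling. A minimal sketch of that effect follows, assuming a hypothetical community with a skewed (1/rank) abundance distribution; the function name, community size, and read depth are illustrative choices and are not taken from the study.

```python
import random

def otu_overlap(n_otus=2000, reads=1000, replicates=2, seed=0):
    """Fraction of observed OTUs that are shared by all replicates.

    Every replicate is an independent random draw of `reads` sequences
    from the same community, so any lack of overlap is sampling noise.
    """
    rng = random.Random(seed)
    # Assumed abundance model: OTU i has relative weight 1/(i+1).
    weights = [1.0 / (i + 1) for i in range(n_otus)]
    samples = [
        set(rng.choices(range(n_otus), weights=weights, k=reads))
        for _ in range(replicates)
    ]
    shared = set.intersection(*samples)
    observed = set.union(*samples)
    return len(shared) / len(observed)

two_way = otu_overlap(replicates=2)
three_way = otu_overlap(replicates=3)
```

With the same seed, the three-replicate draw reuses the first two samples and adds a third, so the shared fraction can only stay the same or fall, which mirrors the direction of the reported drop from two to three replicates.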
RNA-seq Data: Challenges in and Recommendations for Experimental Design and Analysis.
Williams, Alexander G; Thomas, Sean; Wyman, Stacia K; Holloway, Alisha K
2014-10-01
RNA-seq is widely used to determine differential expression of genes or transcripts as well as identify novel transcripts, identify allele-specific expression, and precisely measure translation of transcripts. Thoughtful experimental design and choice of analysis tools are critical to ensure high-quality data and interpretable results. Important considerations for experimental design include number of replicates, whether to collect paired-end or single-end reads, sequence length, and sequencing depth. Common analysis steps in all RNA-seq experiments include quality control, read alignment, assigning reads to genes or transcripts, and estimating gene or transcript abundance. Our aims are two-fold: to make recommendations for common components of experimental design and assess tool capabilities for each of these steps. We also test tools designed to detect differential expression, since this is the most widespread application of RNA-seq. We hope that these analyses will help guide those who are new to RNA-seq and will generate discussion about remaining needs for tool improvement and development. Copyright © 2014 John Wiley & Sons, Inc.
Tomasso, Maria E.; Tarver, Micheal J.; Devarajan, Deepa; Whitten, Steven T.
2016-01-01
The properties of disordered proteins are thought to depend on intrinsic conformational propensities for polyproline II (PP II) structure. While intrinsic PP II propensities have been measured for the common biological amino acids in short peptides, the ability of these experimentally determined propensities to quantitatively reproduce structural behavior in intrinsically disordered proteins (IDPs) has not been established. Presented here are results from molecular simulations of disordered proteins showing that the hydrodynamic radius (R_h) can be predicted from experimental PP II propensities with good agreement, even when charge-based considerations are omitted. The simulations demonstrate that R_h and chain propensity for PP II structure are linked via a simple power-law scaling relationship, which was tested using the experimental R_h of 22 IDPs covering a wide range of peptide lengths, net charge, and sequence composition. Charge effects on R_h were found to be generally weak when compared to PP II effects on R_h. Results from this study indicate that the hydrodynamic dimensions of IDPs are evidence of considerable sequence-dependent backbone propensities for PP II structure that qualitatively, if not quantitatively, match conformational propensities measured in peptides. PMID:26727467
Phenomenological study of decoherence in solid-state spin qubits due to nuclear spin diffusion
NASA Astrophysics Data System (ADS)
Biercuk, Michael J.; Bluhm, Hendrik
2011-06-01
We present a study of the prospects for coherence preservation in solid-state spin qubits using dynamical decoupling protocols. Recent experiments have provided the first demonstrations of multipulse dynamical decoupling sequences in this qubit system, but quantitative analyses of potential coherence improvements have been hampered by a lack of concrete knowledge of the relevant noise processes. We present calculations of qubit coherence under the application of arbitrary dynamical decoupling pulse sequences based on an experimentally validated semiclassical model. This phenomenological approach bundles the details of underlying noise processes into a single experimentally relevant noise power spectral density. Our results show that the dominant features of experimental measurements in a two-electron singlet-triplet spin qubit can be replicated using a 1/ω² noise power spectrum associated with nuclear spin flips in the host material. Beginning with this validation, we address the effects of nuclear programming, high-frequency nuclear spin dynamics, and other high-frequency classical noise sources, with conjectures supported by physical arguments and microscopic calculations where relevant. Our results provide expected performance bounds and identify diagnostic metrics that can be measured experimentally in order to better elucidate the underlying nuclear spin dynamics.
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys. PMID:25148528
Thermodynamic characterization of tandem mismatches found in naturally occurring RNA
Christiansen, Martha E.; Znosko, Brent M.
2009-01-01
Although all sequence symmetric tandem mismatches and some sequence asymmetric tandem mismatches have been thermodynamically characterized and a model has been proposed to predict the stability of previously unmeasured sequence asymmetric tandem mismatches [Christiansen,M.E. and Znosko,B.M. (2008) Biochemistry, 47, 4329–4336], experimental thermodynamic data for frequently occurring tandem mismatches is lacking. Since experimental data is preferred over a predictive model, the thermodynamic parameters for 25 frequently occurring tandem mismatches were determined. These new experimental values, on average, are 1.0 kcal/mol different from the values predicted for these mismatches using the previous model. The data for the sequence asymmetric tandem mismatches reported here were then combined with the data for 72 sequence asymmetric tandem mismatches that were published previously, and the parameters used to predict the thermodynamics of previously unmeasured sequence asymmetric tandem mismatches were updated. The average absolute difference between the measured values and the values predicted using these updated parameters is 0.5 kcal/mol. This updated model improves the prediction for tandem mismatches that were predicted rather poorly by the previous model. This new experimental data and updated predictive model allow for more accurate calculations of the free energy of RNA duplexes containing tandem mismatches, and, furthermore, should allow for improved prediction of secondary structure from sequence. PMID:19509311
Li, Fan; Li, Xinying; Yu, Jianjun; Chen, Lin
2014-09-22
We experimentally demonstrated the transmission of a 79.86-Gb/s discrete-Fourier-transform spread 32 QAM discrete multi-tone (DFT-spread 32 QAM-DMT) signal over 20-km standard single-mode fiber (SSMF) utilizing a directly modulated laser (DML). The experimental results show that DFT-spread effectively reduces the Peak-to-Average Power Ratio (PAPR) of the DMT signal, and also overcomes narrowband interference and high-frequency power attenuation. We compared different types of training sequence (TS) symbols and found that the optimized TS for channel estimation is the symbol with digital BPSK/QPSK modulation format due to its best performance against optical link noise during channel estimation.
Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M
2012-09-17
RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, with both high-throughput and high-resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.
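One simple way to picture a reduction in sequencing depth is binomial thinning of a count table, where each read is kept independently with a fixed probability. The sketch below assumes this thinning model and uses made-up gene counts; it is not the simulation procedure of the study, which drew on negative binomial and exponential distributions.

```python
import random

def thin_counts(counts, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    This mimics sequencing the same library at a lower depth.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < fraction for _ in range(count))
        for count in counts
    ]

# Hypothetical per-gene read counts at full depth (illustrative values).
full_depth = [1200, 45, 0, 310]
reduced = thin_counts(full_depth, 0.15)
```

Low-count genes lose proportionally more information under thinning than highly expressed ones, which is one reason depth and replication trade off differently across the expression range.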
DNA translocation across protein channels: How does a polymer worm through a hole?
NASA Astrophysics Data System (ADS)
Muthukumar, M.
2001-03-01
Free energy barriers control the translocation of polymers through narrow channels. Based on an analogy with the classical nucleation and growth process, we have calculated the translocation time and its dependencies on the length, stiffness, and sequence of the polymer, solution conditions, and the strength of the driving electrochemical potential gradient. Our predictions will be compared with experimental results and prospects of reading polymer sequences.
Denoising Algorithm for CFA Image Sensors Considering Inter-Channel Correlation.
Lee, Min Seok; Park, Sang Wook; Kang, Moon Gi
2017-05-28
In this paper, a spatio-spectral-temporal filter considering an inter-channel correlation is proposed for the denoising of a color filter array (CFA) sequence acquired by CCD/CMOS image sensors. Owing to the alternating under-sampled grid of the CFA pattern, the inter-channel correlation must be considered in the direct denoising process. The proposed filter is applied in the spatial, spectral, and temporal domain, considering the spatio-tempo-spectral correlation. First, nonlocal means (NLM) spatial filtering with patch-based difference (PBD) refinement is performed by considering both the intra-channel correlation and inter-channel correlation to overcome the spatial resolution degradation occurring with the alternating under-sampled pattern. Second, a motion-compensated temporal filter that employs inter-channel correlated motion estimation and compensation is proposed to remove the noise in the temporal domain. Then, a motion adaptive detection value controls the ratio of the spatial filter and the temporal filter. The denoised CFA sequence can thus be obtained without motion artifacts. Experimental results for both simulated and real CFA sequences are presented with visual and numerical comparisons to several state-of-the-art denoising methods combined with a demosaicing method. Experimental results confirmed that the proposed frameworks outperformed the other techniques in terms of the objective criteria and subjective visual perception in CFA sequences.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Akileswaran, L.; Brock, B.J.; Cereghino, J.L.
1999-02-01
A cDNA clone encoding a quinone reductase (QR) from the white rot basidiomycete Phanerochaete chrysosporium was isolated and sequenced. The cDNA consisted of 1,007 nucleotides and a poly(A) tail and encoded a deduced protein containing 271 amino acids. The experimentally determined eight-amino-acid N-terminal sequence of the purified QR protein from P. chrysosporium matched amino acids 72 to 79 of the predicted translation product of the cDNA. The M_r of the predicted translation product, beginning with Pro-72, was essentially identical to the experimentally determined M_r of one monomer of the QR dimer, and this finding suggested that QR is synthesized as a proenzyme. The results of in vitro transcription-translation experiments suggested that QR is synthesized as a proenzyme with a 71-amino-acid leader sequence. This leader sequence contains two potential KEX2 cleavage sites and numerous potential cleavage sites for dipeptidyl aminopeptidase. The QR activity in cultures of P. chrysosporium increased following the addition of 2-dimethoxybenzoquinone, vanillic acid, or several other aromatic compounds. An immunoblot analysis indicated that induction resulted in an increase in the amount of QR protein, and a Northern blot analysis indicated that this regulation occurs at the level of the qr mRNA.
Sources of PCR-induced distortions in high-throughput sequencing data sets
Kebschull, Justus M.; Zador, Anthony M.
2015-01-01
PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
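The PCR stochasticity described above can be pictured with a simple branching process in which every molecule is copied with a fixed probability in each cycle. This is a minimal sketch under that assumption; the efficiency value, cycle count, and function name are illustrative and are not the quantitative model developed in the paper.

```python
import random

def amplify(copies, cycles, efficiency, rng):
    """Simulate PCR as a branching process.

    In every cycle each existing molecule is duplicated with
    probability `efficiency`; otherwise it is carried over unchanged.
    """
    for _ in range(cycles):
        copies += sum(rng.random() < efficiency for _ in range(copies))
    return copies

rng = random.Random(42)
# Ten amplicons that each start from a single molecule: chance events in
# the first cycles are locked in and lead to different final copy numbers.
final_copies = [amplify(1, 10, 0.8, rng) for _ in range(10)]
```

Because an early missed duplication halves everything that follows, the spread in `final_copies` is set mostly in the first few cycles, which is why low-input libraries are the most affected.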
Semiempirical Theories of the Affinities of Negative Atomic Ions
NASA Technical Reports Server (NTRS)
Edie, John W.
1961-01-01
The determination of the electron affinities of negative atomic ions by means of direct experimental investigation is limited. To supplement the meager experimental results, several semiempirical theories have been advanced. One commonly used technique involves extrapolating the electron affinities along the isoelectronic sequences. The most recent of these extrapolations is studied by extending the method to include one more member of the isoelectronic sequence. When the results show that this extension does not increase the accuracy of the calculations, several possible explanations for this situation are explored. A different approach to the problem is suggested by the regularities appearing in the electron affinities. Noting that the regular linear pattern that exists for the ionization potentials of the p electrons as a function of Z repeats itself for different degrees of ionization q, the slopes and intercepts of these curves are extrapolated to the case of the negative ion. The method is placed on a theoretical basis by calculating the Slater parameters as functions of q and n, the number of equivalent p-electrons. These functions are no more than quadratic in q and n. The electron affinities are calculated by extending the linear relations that exist for the neutral atoms and positive ions to the negative ions. The extrapolated slopes are apparently correct, but the intercepts must be slightly altered to agree with experiment. For this purpose one or two experimental affinities (depending on the extrapolation method) are used in each of the two short periods. The two extrapolation methods used are: (A) an isoelectronic sequence extrapolation of the linear pattern as such; (B) the same extrapolation of a linearization of this pattern (configuration centers) combined with an extrapolation of the other terms of the ground configurations. The latter method is preferable, since it requires only one experimental point for each period.
The results agree within experimental error with all data, except with the most recent value of C, which lies 10% lower.
An In Vivo Study of Self-Regulated Study Sequencing in Introductory Psychology Courses
de Leeuw, Joshua R.; Motz, Benjamin A.; Goldstone, Robert L.
2016-01-01
Study sequence can have a profound influence on learning. In this study we investigated how students decide to sequence their study in a naturalistic context and whether their choices result in improved learning. In the study reported here, 2061 undergraduate students enrolled in an Introductory Psychology course completed an online homework tutorial on measures of central tendency, a topic relevant to an exam that counted towards their grades. One group of students was enabled to choose their own study sequence during the tutorial (Self-Regulated group), while the other group of students studied the same materials in sequences chosen by other students (Yoked group). Students who chose their sequence of study showed a clear tendency to block their study by concept, and this tendency was positively associated with subsequent exam performance. In the Yoked group, study sequence had no effect on exam performance. These results suggest that despite findings that blocked study is maladaptive when assigned by an experimenter, it may actually be adaptive when chosen by the learner in a naturalistic context. PMID:27003164
An In Vivo Study of Self-Regulated Study Sequencing in Introductory Psychology Courses.
Carvalho, Paulo F; Braithwaite, David W; de Leeuw, Joshua R; Motz, Benjamin A; Goldstone, Robert L
2016-01-01
Study sequence can have a profound influence on learning. In this study we investigated how students decide to sequence their study in a naturalistic context and whether their choices result in improved learning. In the study reported here, 2061 undergraduate students enrolled in an Introductory Psychology course completed an online homework tutorial on measures of central tendency, a topic relevant to an exam that counted towards their grades. One group of students was enabled to choose their own study sequence during the tutorial (Self-Regulated group), while the other group of students studied the same materials in sequences chosen by other students (Yoked group). Students who chose their sequence of study showed a clear tendency to block their study by concept, and this tendency was positively associated with subsequent exam performance. In the Yoked group, study sequence had no effect on exam performance. These results suggest that despite findings that blocked study is maladaptive when assigned by an experimenter, it may actually be adaptive when chosen by the learner in a naturalistic context.
Song, Junfang; Duc, Céline; Storey, Kate G.; McLean, W. H. Irwin; Brown, Sara J.; Simpson, Gordon G.; Barton, Geoffrey J.
2014-01-01
The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3′ untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3′ polyadenylation sites to within +/− 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3′ UTR re-annotation (including extension of one 3′ UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data. PMID:24722185
Martino, Amanda J.; Rhodes, Matthew E.; Biddle, Jennifer F.; Brandt, Leah D.; Tomsho, Lynn P.; House, Christopher H.
2011-01-01
A degenerate polymerase chain reaction (PCR)-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples. While optimized here for use with Roche 454 technology, the general framework presented may be applicable to other next generation sequencing systems as well (e.g., Illumina, Ion Torrent). The method, which we have called random amplification metagenomic PCR (RAMP), involves the use of specific primers from Roche 454 amplicon sequencing, modified by the addition of a degenerate region at the 3′ end. It utilizes a PCR reaction, which resulted in no amplification from blanks, even after 50 cycles of PCR. After efforts to optimize experimental conditions, the method was tested with DNA extracted from cultured E. coli cells, and genome coverage was estimated after sequencing on three different occasions. Coverage did not vary greatly with the different experimental conditions tested, and was around 62% with a sequencing effort equivalent to a theoretical genome coverage of 14.10×. The GC content of the sequenced amplification product was within 2% of the predicted values for this strain of E. coli. The method was also applied to DNA extracted from marine subsurface samples from ODP Leg 201 site 1229 (Peru Margin), and results of a taxonomic analysis revealed microbial communities dominated by Proteobacteria, Chloroflexi, Firmicutes, Euryarchaeota, and Crenarchaeota, among others. These results were similar to those obtained previously for those samples; however, variations in the proportions of taxa identified illustrate well the generally accepted view that community analysis is sensitive to both the amplification technique used and the method of assigning sequences to taxonomic groups. Overall, we find that RAMP represents a valid methodology for amplifying metagenomes from low-biomass samples. PMID:22319519
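To put the reported 62% coverage at 14.10× sequencing effort in context, it can be compared with the idealized expectation for reads placed uniformly at random along a genome, for which the expected covered fraction at depth c is 1 - exp(-c) (the Lander-Waterman approximation). That model is an assumption introduced here for illustration; it is not stated in the abstract, and it ignores read-length edge effects and amplification bias.

```python
import math

def expected_fraction_covered(depth):
    """Expected covered fraction for uniformly random reads at `depth`."""
    return 1.0 - math.exp(-depth)

def depth_for_fraction(fraction):
    """Uniform-sampling depth that would give the covered `fraction`."""
    return -math.log(1.0 - fraction)

ideal = expected_fraction_covered(14.10)   # idealized, unbiased sampling
effective = depth_for_fraction(0.62)       # depth matching 62% coverage
```

Under this idealization, 14.10× would cover essentially the whole genome, while 62% coverage corresponds to an effective uniform depth of roughly 1×; the gap is one rough indication of how unevenly the amplified reads are distributed.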
Speciation and Neutral Molecular Evolution in One-Dimensional Closed Population
NASA Astrophysics Data System (ADS)
Semovski, Sergei V.; Bukin, Yuri S.; Sherbakov, Dmitry Yu.
Models are presented that are suitable for describing speciation processes arising from reproductive isolation that depends on genetic distance. The main attention is paid to the model of a one-dimensional closed population, which describes the evolution of littoral benthic organisms. To allow the modeling results to be compared with those obtained in experimental phylogenetic studies, all individual-based models described here involve a neutrally evolving and maternally inherited DNA sequence. Sub-samples of the resulting sequences were used for a posteriori phylogenetic inferences, which were then compared to the "true" evolutionary histories.
Kowalsky, Caitlin A; Whitehead, Timothy A
2016-12-01
The comprehensive sequence determinants of binding affinity for type I cohesin toward dockerin from Clostridium thermocellum and Clostridium cellulolyticum were evaluated using deep mutational scanning coupled to yeast surface display. We measured the relative binding affinity to dockerin for 2970 and 2778 single point mutants of C. thermocellum and C. cellulolyticum, respectively, representing over 96% of all possible single point mutants. The interface ΔΔG for each variant was reconstructed from sequencing counts and compared with three independent experimental methods. This reconstruction results in a narrow dynamic range of -0.8 to 0.5 kcal/mol. The computational software packages FoldX and Rosetta were used to predict mutations that disrupt binding by more than 0.4 kcal/mol. The area under the curve of receiver operator curves was 0.82 for FoldX and 0.77 for Rosetta, showing reasonable agreement between predictions and experimental results. Destabilizing mutations to core and rim positions were predicted with higher accuracy than support positions. This benchmark dataset may be useful for developing new computational tools for predicting the mutational effect on binding affinities for protein-protein interactions. Experimental considerations to improve precision and range of the reconstruction method are discussed. Proteins 2016; 84:1914-1928. © 2016 Wiley Periodicals, Inc.
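The area under a receiver operating characteristic curve, as quoted above for FoldX and Rosetta, equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way on toy data; the scores and labels are invented for illustration and are not values from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted destabilization scores, with label 1 marking
# mutations that were experimentally found to disrupt binding.
predicted = [1.9, 0.2, 0.7, 0.1, 1.1, 0.7]
disrupts = [1, 0, 1, 0, 0, 1]
auc = roc_auc(predicted, disrupts)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values near 0.8 indicate useful but imperfect discrimination.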
Flexible theta sequence compression mediated via phase precessing interneurons
Chadwick, Angus; van Rossum, Mark CW; Nolan, Matthew F
2016-01-01
Encoding of behavioral episodes as spike sequences during hippocampal theta oscillations provides a neural substrate for computations on events extended across time and space. However, the mechanisms underlying the numerous and diverse experimentally observed properties of theta sequences remain poorly understood. Here we account for theta sequences using a novel model constrained by the septo-hippocampal circuitry. We show that when spontaneously active interneurons integrate spatial signals and theta frequency pacemaker inputs, they generate phase precessing action potentials that can coordinate theta sequences in place cell populations. We reveal novel constraints on sequence generation, predict cellular properties and neural dynamics that characterize sequence compression, identify circuit organization principles for high capacity sequential representation, and show that theta sequences can be used as substrates for association of conditioned stimuli with recent and upcoming events. Our results suggest mechanisms for flexible sequence compression that are suited to associative learning across an animal’s lifespan. DOI: http://dx.doi.org/10.7554/eLife.20349.001 PMID:27929374
2016-09-09
evaluating 18 mutants using either the A or B conformer is only r = ~ 0.2. Given the poor performance of approximating the observed experimental ...1 Sequence Tolerance of a Highly Stable Single Domain Antibody: Comparison of Computational and Experimental Profiles Mark A. Olson,1 Patricia...unusually high thermal stability is explored by a combined computational and experimental study. Starting with the crystallographic structure
Dridi, M; Rosseel, T; Orton, R; Johnson, P; Lecollinet, S; Muylkens, B; Lambrecht, B; Van Borm, S
2015-10-01
West Nile virus (WNV) occurs as a population of genetic variants (quasispecies) infecting a single animal. Previous low-resolution viral genetic diversity estimates in sampled wild birds and mosquitoes, and in multiple-passage adaptation studies in vivo or in cell culture, suggest that WNV genetic diversification is mostly limited to the mosquito vector. This study investigated genetic diversification of WNV in avian hosts during a single passage using next-generation sequencing. Wild-captured carrion crows were subcutaneously infected using a clonal Middle-East WNV. Blood samples were collected 2 and 4 days post-infection. A reverse-transcription (RT)-PCR approach was used to amplify the WNV genome directly from serum samples prior to next-generation sequencing resulting in an average depth of at least 700 × in each sample. Appropriate controls were sequenced to discriminate biologically relevant low-frequency variants from experimentally introduced errors. The WNV populations in the wild crows showed significant diversification away from the inoculum virus quasispecies structure. By contrast, WNV populations in intracerebrally infected day-old chickens did not diversify from that of the inoculum. Where previous studies concluded that WNV genetic diversification is only experimentally demonstrated in its permissive insect vector species, we have experimentally shown significant diversification of WNV populations in a wild bird reservoir species.
Adaptive compressive learning for prediction of protein-protein interactions from primary sequence.
Zhang, Ya-Nan; Pan, Xiao-Yong; Huang, Yan; Shen, Hong-Bin
2011-08-21
Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to the identification of novel PPIs by integrating experimental biological knowledge, many difficulties remain because sufficient protein structural and functional information is lacking. Methods that predict PPIs from amino acid sequences alone are therefore highly desirable. However, sequence-based predictors often struggle with high dimensionality, which causes over-fitting and high computational complexity, as well as with the redundancy of sequential feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence and has achieved promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequential feature vector into a much lower but more condensed space, taking the sparsity property of the original signal into account. What makes compressed sensing much more attractive in protein sequence analysis is that its compressed signal can be reconstructed from far fewer measurements than what is usually considered necessary in traditional Nyquist sampling theory. Experimental results demonstrate that the proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy for dealing with high-dimensional discrete protein models and has great potential to be extended to many other complicated biological systems. Copyright © 2011 Elsevier Ltd. All rights reserved.
Guidance on Nanomaterial Hazards and Risks
2015-05-21
and at room temperature and 37 C°– solid separation by centrifugation, filtration , or chemical techniques (more experimental techniques combining...members in this potency sequence using bolus in vivo testing, verify the bolus results with selective inhalation testing. The potency of members of...measures in in vitro and limited in vivo experimental systems would facilitate the characterization of dose-response relationships across a set of ENMs
Iterative cross section sequence graph for handwritten character segmentation.
Dawoud, Amer
2007-08-01
The iterative cross section sequence graph (ICSSG) is an algorithm for handwritten character segmentation. It expands the cross section sequence graph concept by applying it iteratively at equally spaced thresholds. The iterative thresholding reduces the effect of information loss associated with image binarization. ICSSG preserves the characters' skeletal structure by preventing the interference of pixels that causes flooding of adjacent characters' segments. Improving the structural quality of the characters' skeleton facilitates better feature extraction and classification, which improves the overall performance of optical character recognition (OCR). Experimental results showed significant improvements in OCR recognition rates compared to other well-established segmentation algorithms.
A Statistical Guide to the Design of Deep Mutational Scanning Experiments.
Matuszewski, Sebastian; Hildebrandt, Marcel E; Ghenu, Ana-Hermina; Jensen, Jeffrey D; Bank, Claudia
2016-09-01
The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. Copyright © 2016 by the Genetics Society of America.
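As a rough illustration of how a fitness effect can be read off time-sampled bulk competition data, the sketch below fits the slope of ln(mutant/wild-type) read counts against time by ordinary least squares. This log-linear estimator is a common textbook choice and is an assumption here; the abstract does not state the authors' exact estimator, and the function name and counts are hypothetical.

```python
import math

def selection_coefficient(times, mutant_counts, wildtype_counts):
    """Least-squares slope of ln(mutant / wild-type) against time.

    Assumes exponential growth, so the log-ratio changes linearly
    with a slope equal to the selection coefficient per unit time.
    """
    ys = [math.log(m / w) for m, w in zip(mutant_counts, wildtype_counts)]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    return sxy / sxx

# Hypothetical, noise-free counts for a mutant with s = -0.1 per time unit.
times = [0, 2, 4, 6]
wildtype = [1000.0, 1000.0, 1000.0, 1000.0]
mutant = [1000.0 * math.exp(-0.1 * t) for t in times]
s_hat = selection_coefficient(times, mutant, wildtype)
```

With real sequencing counts the estimate carries sampling noise, which is where the number and placement of time points discussed in the abstract would matter.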
SubCellProt: predicting protein subcellular localization using machine learning approaches.
Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan
2009-01-01
High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
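The k-NN half of such a classifier can be sketched with amino acid composition as the only feature. This is a minimal illustration under stated assumptions, not SubCellProt's implementation: the training sequences, labels, distance measure, and the vote-fraction "probability" are all hypothetical, and the PNN and consensus steps are omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]

def knn_predict(train, query_seq, k=3):
    """Return (label, vote fraction) from the k nearest training sequences."""
    q = composition(query_seq)
    scored = []
    for seq, label in train:
        v = composition(seq)
        dist = sum((a - b) ** 2 for a, b in zip(q, v))  # squared Euclidean
        scored.append((dist, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    label, n_votes = votes.most_common(1)[0]
    return label, n_votes / k

# Toy training set (hypothetical sequences and localization labels).
train = [
    ("KKKRRKKR", "nucleus"),
    ("KRKRKKRR", "nucleus"),
    ("LLLVVIIL", "membrane"),
    ("LVLIVLLI", "membrane"),
    ("LLIIVVLL", "membrane"),
]
label, confidence = knn_predict(train, "KKRRKRKK", k=3)
```

Sequence order and physicochemical features, which the abstract also mentions, would be appended to the feature vector in the same way.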
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments
NASA Astrophysics Data System (ADS)
Atwal, Gurinder S.; Kinney, Justin B.
2016-03-01
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
A long-term target detection approach in infrared image sequence
NASA Astrophysics Data System (ADS)
Li, Hang; Zhang, Qi; Wang, Xin; Hu, Chao
2016-10-01
An automatic target detection method for long-term infrared (IR) image sequences from a moving platform is proposed. First, based on the principle of maximum entropy (POME), target candidates are iteratively segmented. Then the real target is captured via two different selection approaches. At the beginning of the image sequence, the genuine target with little texture is discriminated from other candidates using a contrast-based confidence measure. On the other hand, when the target becomes larger, we apply an online EM method to estimate and update the distributions of the target's size and position based on the prior detection results, and then recognize the genuine one, which satisfies both the size and position constraints. Experimental results demonstrate that the presented method is accurate, robust and efficient.
NASA Astrophysics Data System (ADS)
Hartmann, Jürgen; Nawroth, Thomas; Dose, Klaus
1984-12-01
Carbodiimide-mediated peptide synthesis in aqueous solution has been studied with respect to self-ordering of amino acids. The copolymerisation of amino acids in the presence of glutamic acid or pyroglutamic acid leads to short pyroglutamyl peptides. Without pyroglutamic acid the formation of higher polymers is favoured. The interactions of the amino acids and the peptides, however, are very complex. Therefore, the experimental results are rather difficult to explain. Some of the experimental results, however, can be explained with the aid of computer simulation programs. Regarding only the tripeptide fraction the copolymerisation of pyroGlu, Ala and Leu, as well as the simulated copolymerisation lead to pyroGlu-Ala-Leu as the main reaction product. The amino acid composition of the insoluble peptides formed during the copolymerisation of Ser, Gly, Ala, Val, Phe, Leu and Ile corresponds in part to the computer-simulated copolymerisation data.
Lapchuk, A; Pashkevich, G A; Prygun, O V; Yurlov, V; Borodin, Y; Kryuchyn, A; Korchovyi, A A; Shylo, S
2015-10-01
The quasi-spiral 2D diffractive optical element (DOE) based on M-sequence of length N=15 is designed and manufactured. The speckle suppression efficiency by the DOE rotation is measured. The speckle suppression coefficients of 10.5, 6, and 4 are obtained for green, violet, and red laser beams, respectively. The results of numerical simulation and experimental data show that the quasi-spiral binary DOE structure can be as effective in speckle reduction as a periodic 2D DOE structure. The numerical simulation and experimental results show that the speckle suppression efficiency of the 2D DOE structure decreases approximately twice at the boundaries of the visible range. It is shown that a replacement of this structure with the bilateral 1D DOE allows obtaining the maximum speckle suppression efficiency in the entire visible range of light.
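For readers unfamiliar with M-sequences, a length N=15 maximal-length binary sequence can be generated with a 4-bit linear feedback shift register. The sketch below is only an illustration: the tap positions and seed are assumptions, and the abstract does not say which generator polynomial was used or how the sequence is mapped onto the quasi-spiral DOE relief.

```python
def m_sequence(taps, nbits, seed=1):
    """Generate one period (2**nbits - 1 bits) of an LFSR sequence.

    taps are 1-based register positions XORed to form the feedback bit.
    The output is maximal-length only for suitable tap choices.
    """
    state = [(seed >> i) & 1 for i in range(nbits)]
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return out

def cyclic_autocorrelation(bits, shift):
    """Autocorrelation of the +/-1 version of bits at a cyclic shift."""
    signs = [1 - 2 * b for b in bits]
    n = len(signs)
    return sum(signs[i] * signs[(i + shift) % n] for i in range(n))

# Taps (4, 3) on a 4-bit register give a period-15 sequence.
seq15 = m_sequence(taps=(4, 3), nbits=4)
```

The two-valued autocorrelation (15 at zero shift, -1 elsewhere) is the property that makes such sequences attractive for decorrelating speckle patterns.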
2010-01-01
Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly".
Conclusions AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer. PMID:20525162
Oakes, Theres; Heather, James M.; Best, Katharine; Byng-Maddick, Rachel; Husovsky, Connor; Ismail, Mazlina; Joshi, Kroopa; Maxwell, Gavin; Noursadeghi, Mahdad; Riddell, Natalie; Ruehl, Tabea; Turner, Carolin T.; Uddin, Imran; Chain, Benny
2017-01-01
The T cell receptor (TCR) repertoire can provide a personalized biomarker for infectious and non-infectious diseases. We describe a protocol for amplifying, sequencing, and analyzing TCRs which is robust, sensitive, and versatile. The key experimental step is ligation of a single-stranded oligonucleotide to the 3′ end of the TCR cDNA. This allows amplification of all possible rearrangements using a single set of primers per locus. It also introduces a unique molecular identifier to label each starting cDNA molecule. This molecular identifier is used to correct for sequence errors and for effects of differential PCR amplification efficiency, thus producing more accurate measures of the true TCR frequency within the sample. This integrated experimental and computational pipeline is applied to the analysis of human memory and naive subpopulations, and results in consistent measures of diversity and inequality. After error correction, the distribution of TCR sequence abundance in all subpopulations followed a power law over a wide range of values. The power law exponent differed between naïve and memory populations, but was consistent between individuals. The integrated experimental and analysis pipeline we describe is appropriate to studies of T cell responses in a broad range of physiological and pathological contexts. PMID:29075258
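The role of the unique molecular identifier (UMI) can be illustrated with a small sketch: reads sharing a UMI are assumed to derive from one starting cDNA molecule, so they are collapsed to a majority-vote consensus and each UMI is counted once. This is a simplified assumption for illustration, not the published pipeline; the function name, barcodes, and CDR3 strings are hypothetical, and errors within the UMI itself are not handled.

```python
from collections import Counter, defaultdict

def collapse_by_umi(reads):
    """Collapse (umi, sequence) reads into per-molecule consensus counts.

    Returns a Counter mapping each consensus sequence to the number of
    distinct UMIs (starting molecules) that support it.
    """
    groups = defaultdict(Counter)
    for umi, seq in reads:
        groups[umi][seq] += 1
    abundance = Counter()
    for seqs in groups.values():
        consensus, _ = seqs.most_common(1)[0]  # majority vote within a UMI
        abundance[consensus] += 1
    return abundance

# Hypothetical reads: six reads, three UMIs, one sequencing error.
reads = [
    ("AAAC", "CASSLG"),
    ("AAAC", "CASSLG"),
    ("AAAC", "CASSLA"),  # minority variant, treated as an error
    ("GGTT", "CASSLG"),
    ("TTCA", "CASSPQ"),
    ("TTCA", "CASSPQ"),
]
abundance = collapse_by_umi(reads)
```

Counting UMIs rather than reads is what removes the effect of unequal PCR amplification: "CASSPQ" has two reads but counts as a single molecule.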
Measurement Marker Recognition In A Time Sequence Of Infrared Images For Biomedical Applications
NASA Astrophysics Data System (ADS)
Fiorini, A. R.; Fumero, R.; Marchesi, R.
1986-03-01
In thermographic measurements, quantitative surface temperature evaluation is often uncertain. The main reason is the lack of available reference points in transient conditions. Reflective markers were used for automatic marker recognition and pixel coordinate computations. An algorithm selects marker icons to match marker references where particular luminance conditions are satisfied. Automatic marker recognition allows luminance compensation and temperature calibration of recorded infrared images. A biomedical application is presented: the dynamic behaviour of the surface temperature distributions is investigated in order to study the performance of two different pumping systems for extracorporeal circulation. Sequences of images are compared and results are discussed. Finally, the algorithm allows monitoring of the experimental environment and alerting to the presence of unusual experimental conditions.
Li, Chuang; Chen, Tao; He, Qiang; Zhu, Yunping; Li, Kenli
2017-03-15
Tandem mass spectrometry-based de novo peptide sequencing is a complex and time-consuming process. The current algorithms for de novo peptide sequencing cannot rapidly and thoroughly process large mass spectrometry datasets. In this paper, we propose MRUniNovo, a novel tool for parallel de novo peptide sequencing. MRUniNovo parallelizes UniNovo based on the Hadoop compute platform. Our experimental results demonstrate that MRUniNovo significantly reduces the computation time of de novo peptide sequencing without sacrificing the correctness and accuracy of the results, and thus can process very large datasets that UniNovo cannot. MRUniNovo is an open source software tool implemented in Java. The source code and the parameter settings are available at http://bioinfo.hupo.org.cn/MRUniNovo/index.php. Contact: s131020002@hnu.edu.cn; taochen1019@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.
Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying
2013-05-01
Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized method. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.
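The notion of an inversion, a stretch of nucleotides followed closely by its inverse complement, can be made concrete with a brute-force search. The sketch below is an assumption-laden illustration: it uses exact Watson-Crick matching only (no G-U wobble pairs), the function names and example sequence are hypothetical, and it does not reproduce the centered or optimized cutting methods of the paper.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(s):
    """Inverse complementary RNA sequence of s."""
    return "".join(COMPLEMENT[b] for b in reversed(s))

def find_inversions(seq, stem_len, max_gap):
    """List (left_start, right_start, gap) for every exact inversion.

    A hit is a stem of stem_len bases whose reverse complement begins
    at most max_gap bases after the stem ends.
    """
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem_len + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for gap in range(max_gap + 1):
            j = i + stem_len + gap
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, j, gap))
    return hits

# Hypothetical hairpin: stem GGGC, loop UUUU, complementary stem GCCC.
hits = find_inversions("AAGGGCUUUUGCCCAA", stem_len=4, max_gap=4)
```

Because each starting position is examined independently, the outer loop is the kind of work that could be distributed across map tasks, as the abstract describes.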
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome
Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B.; Bickel, Peter; Holmes, Ian; Mullikin, James C.; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A.; Rosenbloom, Kate R.; Kent, W. James; Bouffard, Gerard G.; Guan, Xiaobin; Hansen, Nancy F.; Idol, Jacquelyn R.; Maduro, Valerie V.B.; Maskeri, Baishali; McDowell, Jennifer C.; Park, Morgan; Thomas, Pamela J.; Young, Alice C.; Blakesley, Robert W.; Muzny, Donna M.; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Jiang, Huaiyang; Weinstock, George M.; Gibbs, Richard A.; Graves, Tina; Fulton, Robert; Mardis, Elaine R.; Wilson, Richard K.; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B.; Chang, Jean L.; Lindblad-Toh, Kerstin; Lander, Eric S.; Hinrichs, Angie; Trumbower, Heather; Clawson, Hiram; Zweig, Ann; Kuhn, Robert M.; Barber, Galt; Harte, Rachel; Karolchik, Donna; Field, Matthew A.; Moore, Richard A.; Matthewson, Carrie A.; Schein, Jacqueline E.; Marra, Marco A.; Antonarakis, Stylianos E.; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross; Haussler, David; Miller, Webb; Pachter, Lior; Green, Eric D.; Sidow, Arend
2007-01-01
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization. PMID:17567995
A generalized global alignment algorithm.
Huang, Xiaoqiu; Chao, Kun-Mao
2003-01-22
Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
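For context, the standard global alignment recurrence that the generalized model extends can be sketched as follows. This is plain Needleman-Wunsch scoring with a linear gap penalty, not the GAP3 algorithm: it has no notion of skipped "different regions", and the scoring parameters are arbitrary assumptions. Keeping only two rows of the table shows how the score is obtained in time proportional to the product of the lengths and in linear space.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b (linear gap penalty).

    Only the previous and current rows of the dynamic programming
    table are kept, so memory grows with len(b), not len(a) * len(b).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]

score_same = global_alignment_score("GATTACA", "GATTACA")
score_gap = global_alignment_score("ACGT", "ACT")
```

Recovering the alignment itself, rather than just its score, in linear space requires a divide-and-conquer scheme on top of this recurrence.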
How to design a single-cell RNA-sequencing experiment: pitfalls, challenges and perspectives.
Dal Molin, Alessandra; Di Camillo, Barbara
2018-01-31
The sequencing of the transcriptome of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types in heterogeneous cell populations or for the study of stochastic gene expression. In recent years, various experimental methods and computational tools for analysing single-cell RNA-sequencing data have been proposed. However, most of them are tailored to different experimental designs or biological questions, and in many cases, their performance has not been benchmarked yet, thus increasing the difficulty for a researcher to choose the optimal single-cell transcriptome sequencing (scRNA-seq) experiment and analysis workflow. In this review, we aim to provide an overview of the current available experimental and computational methods developed to handle single-cell RNA-sequencing data and, based on their peculiarities, we suggest possible analysis frameworks depending on specific experimental designs. Together, we propose an evaluation of challenges and open questions and future perspectives in the field. In particular, we go through the different steps of scRNA-seq experimental protocols such as cell isolation, messenger RNA capture, reverse transcription, amplification and use of quantitative standards such as spike-ins and Unique Molecular Identifiers (UMIs). We then analyse the current methodological challenges related to preprocessing, alignment, quantification, normalization, batch effect correction and methods to control for confounding effects. © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Issues on machine learning for prediction of classes among molecular sequences of plants and animals
NASA Astrophysics Data System (ADS)
Stehlik, Milan; Pant, Bhasker; Pant, Kumud; Pardasani, K. R.
2012-09-01
Nowadays major laboratories of the world are turning towards in-silico experimentation due to its ease, reproducibility and accuracy. The ethical issues concerning wet lab experimentation are also minimal in in-silico experimentation. But before we turn fully towards dry lab simulations, it is necessary to understand the discrepancies and bottlenecks involved in dry lab experimentation. Before reporting any result obtained from dry lab simulations, it is necessary to perform an in-depth statistical analysis of the data. Keeping this in mind, we present a collaborative effort to correlate the findings and results of various machine learning algorithms and to check the underlying regressions and mutual dependencies, so as to develop optimal classifiers and predictors.
NASA Astrophysics Data System (ADS)
Andronesi, Ovidiu C.; Mintzopoulos, Dionyssios; Struppe, Jochem; Black, Peter M.; Tzika, A. Aria
2008-08-01
We propose a solid-state NMR method that maximizes the advantages of high-resolution magic-angle-spinning (HRMAS) applied to intact biopsies when compared to more conventional liquid-state NMR approaches. Theoretical treatment, numerical simulations and experimental results on intact human brain biopsies are presented. Experimentally, it is proven that an optimized adiabatic TOBSY (TOtal through Bond correlation SpectroscopY) solid-state NMR pulse sequence for two-dimensional 1H-1H homonuclear scalar-coupling longitudinal isotropic mixing provides a 20%-50% improvement in signal-to-noise ratio relative to its liquid-state analogue TOCSY (TOtal Correlation SpectroscopY). For this purpose we have refined the C9_15^1 symmetry-based 13C TOBSY pulse sequence for 1H MRS use and compared it to the MLEV-16 TOCSY sequence. Both sequences were rotor-synchronized and implemented using WURST-8 adiabatic inversion pulses. As discussed theoretically and shown in simulations, the improved magnetization transfer comes from actively removing residual dipolar couplings from the average Hamiltonian. Importantly, the solid-state NMR techniques are tailored to perform measurements at low temperatures where sample degradation is reduced. This is the first demonstration of such a concept for HRMAS metabolic profiling of disease processes, including cancer, from biopsies requiring reduced sample degradation for further genomic analysis.
Mesoscopic modeling of DNA denaturation rates: Sequence dependence and experimental comparison
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dahlen, Oda, E-mail: oda.dahlen@ntnu.no; Erp, Titus S. van, E-mail: titus.van.erp@ntnu.no
Using rare event simulation techniques, we calculated DNA denaturation rate constants for a range of sequences and temperatures for the Peyrard-Bishop-Dauxois (PBD) model with two different parameter sets. We studied a larger variety of sequences compared to previous studies that only consider DNA homopolymers and DNA sequences containing an equal amount of weak AT- and strong GC-base pairs. Our results show that, contrary to previous findings, an even distribution of the strong GC-base pairs does not always result in the fastest possible denaturation. In addition, we applied an adaptation of the PBD model to study hairpin denaturation for which experimental data are available. This is the first quantitative study in which dynamical results from the mesoscopic PBD model have been compared with experiments. Our results show that present parameterized models, although giving good results regarding thermodynamic properties, overestimate denaturation rates by orders of magnitude. We believe that our dynamical approach is, therefore, an important tool for verifying DNA models and for developing next generation models that have higher predictive power than present ones.
Sequence Memory Constraints Give Rise to Language-Like Structure through Iterated Learning
Cornish, Hannah; Dale, Rick; Kirby, Simon; Christiansen, Morten H.
2017-01-01
Human language is composed of sequences of reusable elements. The origins of the sequential structure of language are a hotly debated topic in evolutionary linguistics. In this paper, we show that sets of sequences with language-like statistical properties can emerge from a process of cultural evolution under pressure from chunk-based memory constraints. We employ a novel experimental task that is non-linguistic and non-communicative in nature, in which participants are trained on and later asked to recall a set of sequences one-by-one. Recalled sequences from one participant become training data for the next participant. In this way, we simulate cultural evolution in the laboratory. Our results show a cumulative increase in structure, and by comparing this structure to data from existing linguistic corpora, we demonstrate a close parallel between the sets of sequences that emerge in our experiment and those seen in natural language. PMID:28118370
Initial sequencing and comparative analysis of the mouse genome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan
2002-12-15
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
Sequence Memory Constraints Give Rise to Language-Like Structure through Iterated Learning.
Cornish, Hannah; Dale, Rick; Kirby, Simon; Christiansen, Morten H
2017-01-01
Human language is composed of sequences of reusable elements. The origins of the sequential structure of language are a hotly debated topic in evolutionary linguistics. In this paper, we show that sets of sequences with language-like statistical properties can emerge from a process of cultural evolution under pressure from chunk-based memory constraints. We employ a novel experimental task that is non-linguistic and non-communicative in nature, in which participants are trained on and later asked to recall a set of sequences one-by-one. Recalled sequences from one participant become training data for the next participant. In this way, we simulate cultural evolution in the laboratory. Our results show a cumulative increase in structure, and by comparing this structure to data from existing linguistic corpora, we demonstrate a close parallel between the sets of sequences that emerge in our experiment and those seen in natural language.
Xu, Yilei; Roy-Chowdhury, Amit K
2007-05-01
In this paper, we present a theory for combining the effects of motion, illumination, 3D structure, albedo, and camera parameters in a sequence of images obtained by a perspective camera. We show that the set of all Lambertian reflectance functions of a moving object, at any position, illuminated by arbitrarily distant light sources, lies "close" to a bilinear subspace consisting of nine illumination variables and six motion variables. This result implies that, given an arbitrary video sequence, it is possible to recover the 3D structure, motion, and illumination conditions simultaneously using the bilinear subspace formulation. The derivation builds upon existing work on linear subspace representations of reflectance by generalizing it to moving objects. Lighting can change slowly or suddenly, locally or globally, and can originate from a combination of point and extended sources. We experimentally compare the results of our theory with ground truth data and also provide results on real data by using video sequences of a 3D face and the entire human body with various combinations of motion and illumination directions. We also show results of our theory in estimating 3D motion and illumination model parameters from a video sequence.
Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa
2013-01-01
High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589
de Bruin, Donny; Bossert, Nelli; Aartsma-Rus, Annemieke; Bouwmeester, Dirk
2018-04-06
Short nucleic acid oligomers have found a wide range of applications in experimental physics, biology and medicine, and show potential for the treatment of acquired and genetic diseases. These applications rely heavily on the predictability of hybridization through Watson-Crick base pairing to allow positioning on a nanometer scale, as well as binding to the target transcripts, but also off-target binding to transcripts with partial homology. These effects are of particular importance in the development of therapeutic oligonucleotides, where off-target effects caused by the binding of mismatched sequences need to be avoided. We employ a novel method of probing DNA hybridization using optically active DNA-stabilized silver clusters (Ag-DNA) to measure binding efficiencies through a change in fluorescence intensity. In this way we can determine their location-specific sensitivity to individual mismatches in the sequence. The results reveal a strong dependence of the hybridization on the location of the mismatch, whereby mismatches close to the edges and center show a relatively minor impact. In parallel, we propose a simple model for calculating the annealing ratios of mismatched DNA sequences, which supports our experimental results. The primary result shown in this work is a demonstration of a novel technique to measure DNA hybridization using fluorescent Ag-DNA. With this technique, we investigated the effect of mismatches on the hybridization efficiency, and found a significant dependence on the location of individual mismatches. These effects are strongly influenced by the length of the used oligonucleotides. The novel probe method based on fluorescent Ag-DNA functions as a reliable tool in measuring this behavior. As a secondary result, we formulated a simple model that is consistent with the experimental data.
Merino, Susana; Knirel, Yuriy A.; Regué, Miguel; Tomás, Juan M.
2013-01-01
We experimentally identified the activities of six predicted heptosyltransferases in Actinobacillus pleuropneumoniae genome serotype 5b strain L20 and serotype 3 strain JL03. The initial identification was based on a bioinformatic analysis of the amino acid similarity between these putative heptosyltransferases with others of known function from enteric bacteria and Aeromonas. The putative functions of all the Actinobacillus pleuropneumoniae heptosyltransferases were determined by using surrogate LPS acceptor molecules from well-defined A. hydrophila AH-3 and A. salmonicida A450 mutants. Our results show that heptosyltransferases APL_0981 and APJL_1001 are responsible for the transfer of the terminal outer core D-glycero-D-manno-heptose (D,D-Hep) residue although they are not currently included in the CAZY glycosyltransferase 9 family. The WahF heptosyltransferase group signature sequence [S(T/S)(GA)XXH] differs from the heptosyltransferases consensus signature sequence [D(TS)(GA)XXH], because of the substitution of D261 for S261, being unique. PMID:23383222
Inquiry-based experiments for large-scale introduction to PCR and restriction enzyme digests.
Johanson, Kelly E; Watt, Terry J
2015-01-01
Polymerase chain reaction and restriction endonuclease digest are important techniques that should be included in all Biochemistry and Molecular Biology laboratory curriculums. These techniques are frequently taught at an advanced level, requiring many hours of student and faculty time. Here we present two inquiry-based experiments that are designed for introductory laboratory courses and combine both techniques. In both approaches, students must determine the identity of an unknown DNA sequence, either a gene sequence or a primer sequence, based on a combination of PCR product size and restriction digest pattern. The experimental design is flexible, and can be adapted based on available instructor preparation time and resources, and both approaches can accommodate large numbers of students. We implemented these experiments in our courses with a combined total of 584 students and have an 85% success rate. Overall, students demonstrated an increase in their understanding of the experimental topics, ability to interpret the resulting data, and proficiency in general laboratory skills. © 2015 The International Union of Biochemistry and Molecular Biology.
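The identification step described above rests on predicting fragment sizes from a known sequence and a known recognition site. A minimal sketch of that arithmetic follows; the function name `digest_fragment_sizes`, the 30-bp product, and the choice of EcoRI are illustrative assumptions and are not taken from the article.

```python
def digest_fragment_sizes(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear DNA strand at every
    occurrence of `site`. `cut_offset` is the number of bases into the site
    at which the strand is cut (hypothetical helper for illustration)."""
    seq = seq.upper()
    site = site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries are the strand ends plus every cut position.
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 30-bp PCR product carrying one EcoRI site (GAATTC, cut after the G).
product = "ATGCATGCAT" + "GAATTC" + "CGTACGTACGTACG"
print(digest_fragment_sizes(product, "GAATTC", 1))
```

Comparing the predicted list with the band sizes seen on a gel is what lets a candidate sequence be accepted or ruled out.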
Sustained State-Independent Quantum Contextual Correlations from a Single Ion
NASA Astrophysics Data System (ADS)
Leupold, F. M.; Malinowski, M.; Zhang, C.; Negnevitsky, V.; Alonso, J.; Home, J. P.; Cabello, A.
2018-05-01
We use a single trapped-ion qutrit to demonstrate the quantum-state-independent violation of noncontextuality inequalities using a sequence of randomly chosen quantum nondemolition projective measurements. We concatenate 53 × 10^6 sequential measurements of 13 observables, and unambiguously violate an optimal noncontextual bound. We use the same data set to characterize imperfections including signaling and repeatability of the measurements. The experimental sequence was generated in real time with a quantum random number generator integrated into our control system to select the subsequent observable with a latency below 50 μs, which can be used to constrain contextual hidden-variable models that might describe our results. The state-recycling experimental procedure is resilient to noise and independent of the qutrit state, substantiating the fact that the contextual nature of quantum physics is connected to measurements and not necessarily to designated states. The use of extended sequences of quantum nondemolition measurements finds applications in the fields of sensing and quantum information.
Pal, Debojyoti; Sharma, Deepak; Kumar, Mukesh; Sandur, Santosh K
2016-09-01
S-glutathionylation of proteins plays an important role in various biological processes and is known to be a protective modification during oxidative stress. Since experimental detection of S-glutathionylation is labor intensive and time consuming, a bioinformatics-based approach is a viable alternative. Available methods require relatively longer sequence information, which may prevent prediction if sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon differential association of amino acids with glutathionylated and non-glutathionylated cysteines from a database of experimentally verified sequences. This data was used to calculate position dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of glutathionylation event. Glutathionylation-score (G-score), indicating propensity of a sequence to undergo glutathionylation, was calculated using position-dependent F-scores for each amino-acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with a Matthews correlation coefficient (MCC) value of 0.165. On an independent dataset, our model outperformed the currently available model, in spite of needing much less sequence information. Pentapeptide motifs having high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences was obtained by assigning G-scores and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. Outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation.
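The abstract reports accuracy alongside a Matthews correlation coefficient; the two can diverge sharply on unbalanced data. The sketch below shows how both are computed from confusion-matrix counts. The function names and any counts passed to them are illustrative assumptions; the study's actual counts are not given in the abstract.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: a classifier no better than chance.
print(accuracy(1, 1, 1, 1), mcc(1, 1, 1, 1))
```

An MCC near 0 indicates near-chance agreement and 1 indicates perfect prediction, which is why it is often quoted next to accuracy.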
Robustness of high-fidelity Rydberg gates with single-site addressability
NASA Astrophysics Data System (ADS)
Goerz, Michael H.; Halperin, Eli J.; Aytac, Jon M.; Koch, Christiane P.; Whaley, K. Birgitta
2014-09-01
Controlled-phase (cphase) gates can be realized with trapped neutral atoms by making use of the Rydberg blockade. Achieving the ultrahigh fidelities required for quantum computation with such Rydberg gates, however, is compromised by experimental inaccuracies in pulse amplitudes and timings, as well as by stray fields that cause fluctuations of the Rydberg levels. We report here a comparative study of analytic and numerical pulse sequences for the Rydberg cphase gate that specifically examines the robustness of the gate fidelity with respect to such experimental perturbations. Analytical pulse sequences of both simultaneous and stimulated Raman adiabatic passage (STIRAP) are found to be at best moderately robust under these perturbations. In contrast, optimal control theory is seen to allow generation of numerical pulses that are inherently robust within a predefined tolerance window. The resulting numerical pulse shapes display simple modulation patterns and can be rationalized in terms of an interference between distinct two-photon Rydberg excitation pathways. Pulses of such low complexity should be experimentally feasible, allowing gate fidelities of order 99.90-99.99% to be achievable under realistic experimental conditions.
Lingner, Thomas; Kataya, Amr R. A.; Reumann, Sigrun
2012-01-01
We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals. PMID:22415050
Lingner, Thomas; Kataya, Amr R A; Reumann, Sigrun
2012-02-01
We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals.
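The abstract does not name the three outlier methods it compares. As one plausible illustration only, the sketch below flags scores that sit far in the low tail of a normal distribution fitted to a group, using a z-score cutoff. The function name, the cutoff of -2.0, and the scores are all assumptions.

```python
import statistics


def low_tail_outliers(scores, z_cutoff=-2.0):
    """Return scores whose z-score under a normal fit falls below `z_cutoff`.
    This is a generic z-score rule, not necessarily one of the study's methods."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    if sd == 0:
        return []
    return [s for s in scores if (s - mean) / sd < z_cutoff]


# Hypothetical prediction scores for one orthologous group; one value is suspiciously low.
group_scores = [0.9, 0.92, 0.88, 0.91, 0.89, 0.93, 0.9, 0.87, 0.91, 0.2]
print(low_tail_outliers(group_scores))
```

Because the mean and standard deviation are themselves pulled by the outlier, a rule like this is conservative on small groups, which is one reason several methods might be combined.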
You, Zhu-Hong; Lei, Ying-Ke; Zhu, Lin; Xia, Junfeng; Wang, Bing
2013-01-01
Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although a large amount of PPI data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequence information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with the state-of-the-art Support Vector Machine (SVM) technique. Experimental results demonstrate that the proposed PCA-EELM outperforms the SVM method under 5-fold cross-validation. Besides, PCA-EELM performs faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPI with excellent performance and less time.
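The aggregation step, in which several independently trained classifiers are combined into a consensus by majority voting, can be sketched as follows. The function name and the three prediction lists are illustrative assumptions; the extreme learning machines themselves are not reproduced here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) into one
    consensus list by taking the most common label for each sample."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# Hypothetical outputs of three classifiers on four protein pairs (1 = interacting).
ensemble_outputs = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
print(majority_vote(ensemble_outputs))
```

Using an odd number of binary classifiers avoids ties, and averaging over several random initializations is what reduces the dependence on any single set of initial weights.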
Experimental studies of two-stage centrifugal dust concentrator
NASA Astrophysics Data System (ADS)
Vechkanova, M. V.; Fadin, Yu M.; Ovsyannikov, Yu G.
2018-03-01
The article presents experimental results for a two-stage centrifugal dust concentrator, describes its design, and shows the development of a method of engineering calculation and laboratory investigations. For the experiments, the authors used quartz, ceramic dust and slag. Experimental dispersion analysis of dust particles was obtained by the sedimentation method. A mathematical model of the dust collection process was built using a central composite rotatable design for a four-factor experiment. A sequence of experiments was conducted in accordance with the table of random numbers. Conclusions were drawn.
Data acquisition and processing history for the Explorer 33 (AIMP-D) satellite
NASA Technical Reports Server (NTRS)
Karras, T. J.
1972-01-01
The quality control monitoring system, using accounting and quality control data bases, made it possible to perform an in-depth analysis. Results show that the percentage of useable data files for experimenter analysis was 97.7%; only 0.4% of the data sequences supplied to the experimenter exhibited missing data. The 50 percentile probability delay values (referenced to station record data) indicate that the analog tapes arrived within 11 days, the data were digitized within 4.2 weeks, and the experimenter tapes were delivered in 8.95 weeks or less.
Using ProMED-Mail and MedWorm blogs for cross-domain pattern analysis in epidemic intelligence.
Stewart, Avaré; Denecke, Kerstin
2010-01-01
In this work we motivate the use of medical blog user generated content for gathering facts about disease reporting events to support biosurveillance investigation. Given the characteristics of blogs, the extraction of such events is made more difficult due to noise and data abundance. We address the problem of automatically inferring disease reporting event extraction patterns in this noisier setting. The sublanguage used in outbreak reports is exploited to align with the sequences of disease reporting sentences in blogs. Based on our Cross Domain Pattern Analysis Framework, experimental results show that Phase-Level sequences tend to produce more overlap across the domains than Word-Level sequences. The cross domain alignment process is effective at filtering noisy sequences from blogs and extracting good candidate sequence patterns from an abundance of text.
Mutations that Cause Human Disease: A Computational/Experimental Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beernink, P; Barsky, D; Pesavento, B
International genome sequencing projects have produced billions of nucleotides (letters) of DNA sequence data, including the complete genome sequences of 74 organisms. These genome sequences have created many new scientific opportunities, including the ability to identify sequence variations among individuals within a species. These genetic differences, which are known as single nucleotide polymorphisms (SNPs), are particularly important in understanding the genetic basis for disease susceptibility. Since the report of the complete human genome sequence, over two million human SNPs have been identified, including a large-scale comparison of an entire chromosome from twenty individuals. Of the protein coding SNPs (cSNPs), approximately half lead to a single amino acid change in the encoded protein (non-synonymous coding SNPs). Most of these changes are functionally silent, while the remainder negatively impact the protein and sometimes cause human disease. To date, over 550 SNPs have been found to cause single locus (monogenic) diseases and many others have been associated with polygenic diseases. SNPs have been linked to specific human diseases, including late-onset Parkinson disease, autism, rheumatoid arthritis and cancer. The ability to predict accurately the effects of these SNPs on protein function would represent a major advance toward understanding these diseases. To date several attempts have been made toward predicting the effects of such mutations. The most successful of these is a computational approach called "Sorting Intolerant From Tolerant" (SIFT). This method uses sequence conservation among many similar proteins to predict which residues in a protein are functionally important. However, this method suffers from several limitations. First, a query sequence must have a sufficient number of relatives to infer sequence conservation.
Second, this method does not make use of or provide any information on protein structure, which can be used to understand how an amino acid change affects the protein. The experimental methods that provide the most detailed structural information on proteins are X-ray crystallography and NMR spectroscopy. However, these methods are labor intensive and currently cannot be carried out on a genomic scale. Nonetheless, Structural Genomics projects are being pursued by more than a dozen groups and consortia worldwide and as a result the number of experimentally determined structures is rising exponentially. Based on the expectation that protein structures will continue to be determined at an ever-increasing rate, reliable structure prediction schemes will become increasingly valuable, leading to information on protein function and disease for many different proteins. Given known genetic variability and experimentally determined protein structures, can we accurately predict the effects of single amino acid substitutions? An objective assessment of this question would involve comparing predicted and experimentally determined structures, which thus far has not been rigorously performed. The completed research leveraged existing expertise at LLNL in computational and structural biology, as well as significant computing resources, to address this question.
Theory and procedures for finding a correct kinetic model for the bacteriorhodopsin photocycle.
Hendler, R W; Shrager, R; Bose, S
2001-04-26
In this paper, we present the implementation and results of new methodology based on linear algebra. The theory behind these methods is covered in detail in the Supporting Information, available electronically (Shrager and Hendler). In brief, the methods presented search through all possible forward sequential submodels in order to find candidates that can be used to construct a complete model for the BR-photocycle. The methodology is limited only to forward sequential models. If no such models are compatible with the experimental data, none will be found. The procedures apply objective tests and filters to eliminate possibilities that cannot be correct, thus cutting the total number of candidate sequences to be considered. In the current application, which uses six exponentials, the total sequences were cut from 1950 to 49. The remaining sequences were further screened using known experimental criteria. The approach led to a solution which consists of a pair of sequences, one with 5 exponentials showing BR* → L(f) → M(f) → N → O → BR and the other with three exponentials showing BR* → L(s) → M(s) → BR. The deduced complete kinetic model for the BR photocycle is thus either a single photocycle branched at the L intermediate or a pair of two parallel photocycles. Reasons for preferring the parallel photocycles are presented. Synthetic data constructed on the basis of the parallel photocycles were indistinguishable from the experimental data in a number of analytical tests that were applied.
Synthesis of Methyl Diantilis, a Commercially Important Fragrance
ERIC Educational Resources Information Center
Miles, William H.; Connell, Katelyn B.
2006-01-01
Synthetic sequences in the undergraduate organic chemistry laboratory illustrate important synthetic strategies, reagents, or experimental techniques, oftentimes resulting in the synthesis of commercially important compounds. A fragrance with a 'spicy, carnation, sweet, vanilla', named after carnations (Dianthus caryophyllus), Methyl Diantilis is…
Predicting Protein-Protein Interactions by Combing Various Sequence-Derived.
Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao
2011-09-20
Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information; these features measure the interactions between residues a certain distance apart in the protein sequences from different aspects. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.
Memetic algorithms for de novo motif-finding in biomedical sequences.
Bi, Chengpeng
2012-09-01
The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies are employed in the algorithm design, not only to explore the multiple sequence local alignment space efficiently, but also to uncover the molecular signals effectively. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. In the simulation, it shows that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm compares favorably with, or is superior to, other algorithms.
Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary microRNA sequences. The memetic motif-finding algorithm is effectively designed and implemented, and its applications demonstrate it is not only time-efficient, but also exhibits excellent performance when compared with other popular algorithms. Copyright © 2012 Elsevier B.V. All rights reserved.
2011-01-01
Purpose: To theoretically develop and experimentally validate a formalism based on a fractional order calculus (FC) diffusion model to characterize anomalous diffusion in brain tissues measured with a twice-refocused spin-echo (TRSE) pulse sequence. Materials and Methods: The FC diffusion model is the fractional order generalization of the Bloch-Torrey equation. Using this model, an analytical expression was derived to describe the diffusion-induced signal attenuation in a TRSE pulse sequence. To experimentally validate this expression, a set of diffusion-weighted (DW) images was acquired at 3 Tesla from healthy human brains using a TRSE sequence with twelve b-values ranging from 0 to 2,600 s/mm². For comparison, DW images were also acquired using a Stejskal-Tanner diffusion gradient in a single-shot spin-echo echo planar sequence. For both datasets, a Levenberg-Marquardt fitting algorithm was used to extract three parameters: diffusion coefficient D, fractional order derivative in space β, and a spatial parameter μ (in units of μm). Using adjusted R-squared values and standard deviations, D, β and μ values and the goodness-of-fit in three specific regions of interest (ROI) in white matter, gray matter, and cerebrospinal fluid were evaluated for each of the two datasets. In addition, spatially resolved parametric maps were assessed qualitatively. Results: The analytical expression for the TRSE sequence, derived from the FC diffusion model, accurately characterized the diffusion-induced signal loss in brain tissues at high b-values. In the selected ROIs, the goodness-of-fit and standard deviations for the TRSE dataset were comparable with the results obtained from the Stejskal-Tanner dataset, demonstrating the robustness of the FC model across multiple data acquisition strategies. Qualitatively, the D, β, and μ maps from the TRSE dataset exhibited fewer artifacts, reflecting the improved immunity to eddy currents.
Conclusion: The diffusion-induced signal attenuation in a TRSE pulse sequence can be described by an FC diffusion model at high b-values. This model performs equally well for data acquired from the human brain tissues with a TRSE pulse sequence or a conventional Stejskal-Tanner sequence. PMID:21509877
Quantized phase coding and connected region labeling for absolute phase retrieval.
Chen, Xiangcheng; Wang, Yuwei; Wang, Yajun; Ma, Mengchao; Zeng, Chunnian
2016-12-12
This paper proposes an absolute phase retrieval method for complex object measurement based on quantized phase-coding and connected region labeling. A specific code sequence is embedded into quantized phase of three coded fringes. Connected regions of different codes are labeled and assigned with 3-digit-codes combining the current period and its neighbors. Wrapped phase, more than 36 periods, can be restored with reference to the code sequence. Experimental results verify the capability of the proposed method to measure multiple isolated objects.
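The restoration step described above relies on the standard relation between wrapped and absolute phase: the absolute phase equals the wrapped phase plus 2π times the fringe period index. The sketch below illustrates only that relation; the period indices are computed directly from synthetic data as a stand-in for the indices that the paper decodes from its code sequence, and all function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def wrap(phase):
    """Wrap a phase value into [0, 2*pi)."""
    return phase % TWO_PI

def absolute_phase(wrapped, period_index):
    """Restore absolute phase from a wrapped phase and its period index k."""
    return wrapped + TWO_PI * period_index

# Synthetic, monotonically increasing phase (illustrative values only).
true_phase = [0.5 + 1.3 * i for i in range(12)]
wrapped = [wrap(p) for p in true_phase]

# Stand-in for the period index that a decoding step would supply.
orders = [int(p // TWO_PI) for p in true_phase]

restored = [absolute_phase(w, k) for w, k in zip(wrapped, orders)]
```

In a real measurement the period index is not known in advance; supplying it is the job of the coded fringes and region labeling.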
Lossless Video Sequence Compression Using Adaptive Prediction
NASA Technical Reports Server (NTRS)
Li, Ying; Sayood, Khalid
2007-01-01
We present an adaptive lossless video compression algorithm based on predictive coding. The proposed algorithm exploits temporal, spatial, and spectral redundancies in a backward adaptive fashion with extremely low side information. The computational complexity is further reduced by using a caching strategy. We also study the relationship between the operational domain for the coder (wavelet or spatial) and the amount of temporal and spatial redundancy in the sequence being encoded. Experimental results show that the proposed scheme provides significant improvements in compression efficiencies.
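To make the idea of backward-adaptive predictive coding concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a toy one-dimensional "frame", two hypothetical predictors (previous pixel in the same frame, and co-located pixel in the previous frame), and a simple switching rule. The point it illustrates is that the predictor choice depends only on already-coded samples, so the decoder can repeat the choice without any side information.

```python
def _spatial(cur, i):
    """Previous pixel in the current frame (0 at the frame start)."""
    return cur[i - 1] if i > 0 else 0

def encode(prev_frame, frame):
    """Return integer residuals; the predictor is switched using past data only."""
    residuals = []
    use_temporal = True  # initial choice, known to both encoder and decoder
    for i, x in enumerate(frame):
        pred = prev_frame[i] if use_temporal else _spatial(frame, i)
        residuals.append(x - pred)
        # Pick the predictor that would have been better for this pixel;
        # it is applied to the NEXT pixel, so the decoder can mirror it.
        use_temporal = abs(x - prev_frame[i]) <= abs(x - _spatial(frame, i))
    return residuals

def decode(prev_frame, residuals):
    """Invert encode() exactly by repeating the same adaptation rule."""
    frame = []
    use_temporal = True
    for i, r in enumerate(residuals):
        pred = prev_frame[i] if use_temporal else _spatial(frame, i)
        x = pred + r
        frame.append(x)
        use_temporal = abs(x - prev_frame[i]) <= abs(x - _spatial(frame, i))
    return frame

prev_frame = [10, 12, 13, 40, 41, 42, 42, 9]
frame = [10, 12, 14, 15, 16, 42, 43, 9]
residuals = encode(prev_frame, frame)
reconstructed = decode(prev_frame, residuals)
```

The residuals would then go to an entropy coder, which is omitted here.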
Earth field NMR with chemical shift spectral resolution: theory and proof of concept.
Katz, Itai; Shtirberg, Lazar; Shakour, Gubrail; Blank, Aharon
2012-06-01
A new method for obtaining an NMR signal in the Earth's magnetic field (EF) is presented. The method makes use of a simple pulse sequence with only DC fields which is much less demanding than previous approaches in terms of the pulses' rise and fall times. Furthermore, it offers the possibility of obtaining NMR data with enough spectral resolution to allow retrieving high resolution molecular chemical shift (CS) information - a capability that was not considered possible in EF NMR until now. Details of the pulse sequence, the experimental system, and our specially tailored EF NMR probe are provided. The experimental results demonstrate the capability to differentiate between three types of samples made of common fluorine compounds, based on their CS data. Copyright © 2012 Elsevier Inc. All rights reserved.
Structure and properties of CaMnO3/SrMnO3/BaMnO3 superlattices from first principles
NASA Astrophysics Data System (ADS)
Li, Shen; Oh, Seongshik; Rabe, Karin
2008-03-01
Previous theoretical and experimental studies have shown that three-component, or "tri-color" superlattices can exhibit intrinsic electric polarization due to inversion-symmetry breaking in the layer sequence. In ferromagnetic inversion-symmetry-breaking superlattices, controlled symmetry lowering is similarly expected to lead to interesting new and tunable properties. Here, we present results of first-principles density-functional-theory calculations for short-period CaMnO3/SrMnO3/BaMnO3 superlattices, using VASP. The ground state structure, magnetic ordering, polarization and dielectric response will be presented. The role of epitaxial strain in the individual layers and the role of layer sequence will be explored. Connections to experimental studies and prospects for future work will be discussed.
Aluminum U-groove weld enhancement based on experimental stress analyses
NASA Technical Reports Server (NTRS)
Verderaime, V.; Vaughan, R.
1995-01-01
Though butt-welds are among the most preferred joining methods in aerostructures because of their sealing and assembly integrity and general elastic performance, their inelastic mechanics are generally the least understood. This study investigated experimental strain distributions across a thick aluminum U-grooved weld and identified two weld process considerations for improving the multipass weld strength. The extreme thermal expansion and contraction gradient of the fusion heat input across the tab thickness between the grooves produces severe peaking, which induces a bending moment under uniaxial loading. The filler strain hardening decreased with increasing filler pass sequence. These combined effects reduce the weld strength, and a depeaking index model was developed to select filler pass thicknesses, pass numbers, and sequences to improve the welding process results over the current normal weld schedule.
DOE Office of Scientific and Technical Information (OSTI.GOV)
James, A.E. Jr.; Strecker, E.P.; Miller, F.J. Jr.
1975-07-01
Recent communications have related the diagnosis of small bowel intussusceptions to abnormal accumulations of the radiopharmaceutical ⁹⁹ᵐTc pertechnetate on abdominal scans. Considering the pathophysiological alterations attendant to intussusceptions, we have attempted an experimental model to examine these changes in temporal sequence. This study was initiated to understand the etiology better and to characterize the abnormalities noted on the ⁹⁹ᵐTc pertechnetate abdominal scans.
2013-01-01
Background Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although a large amount of PPI data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. Results We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequence information. Focusing on dimension reduction, an effective feature extraction method, PCA, was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machines removes the dependence of results on initial random weights and improves the prediction performance. Conclusions When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with the state-of-the-art Support Vector Machine (SVM) technique. Experimental results demonstrate that the proposed PCA-EELM outperforms the SVM method under 5-fold cross-validation. Besides, PCA-EELM runs faster than the PCA-SVM based method.
Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs with excellent performance and less time. PMID:23815620
Pan, Xiaoyong; Shen, Hong-Bin
2018-05-02
RNA-binding proteins (RBPs) account for 5∼10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and costly. Instead, computational prediction of the RBP binding sites using patterns learned from existing annotation knowledge is a fast approach. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to the best of our knowledge, only global representations of entire RNA sequences are employed. So far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method iDeepE to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs for multiple subsequences and the padded sequences to learn high-level features, respectively. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates a better performance over state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. Availability: https://github.com/xypan1232/iDeepE. Contact: xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
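The two input preparations named in the abstract (padding for the global network, overlapping fixed-length windows for the local network) can be sketched as follows. This is an illustration only: the padding character, window length, and overlap used here are assumptions, not values taken from iDeepE, and the function names are hypothetical.

```python
def pad_sequence(seq, length, pad_char="N"):
    """Truncate or right-pad a sequence to exactly `length` characters."""
    return seq[:length].ljust(length, pad_char)

def split_overlapping(seq, window, overlap, pad_char="N"):
    """Split a sequence into fixed-length windows that overlap by `overlap`.

    The final window is right-padded if the sequence runs out.
    """
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    pieces = []
    start = 0
    while True:
        pieces.append(seq[start:start + window].ljust(window, pad_char))
        if start + window >= len(seq):
            break
        start += step
    return pieces

rna = "ACGUACGUA"
global_input = pad_sequence(rna, 12)
local_inputs = split_overlapping(rna, window=4, overlap=2)
```

Each element of `local_inputs` would then be one-hot encoded and treated as a separate channel, in line with the abstract's description of subsequences as channels.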
First Pass Annotation of Promoters on Human Chromosome 22
Scherf, Matthias; Klingenhoff, Andreas; Frech, Kornelie; Quandt, Kerstin; Schneider, Ralf; Grote, Korbinian; Frisch, Matthias; Gailus-Durner, Valérie; Seidel, Alexander; Brack-Werner, Ruth; Werner, Thomas
2001-01-01
The publication of the first almost complete sequence of a human chromosome (chromosome 22) is a major milestone in human genomics. Together with the sequence, an excellent annotation of genes was published which certainly will serve as an information resource for numerous future projects. We noted that the annotation did not cover regulatory regions; in particular, no promoter annotation has been provided. Here we present an analysis of the complete published chromosome 22 sequence for promoters. A recent breakthrough in specific in silico prediction of promoter regions enabled us to attempt large-scale prediction of promoter regions on chromosome 22. Scanning of sequence databases revealed only 20 experimentally verified promoters, of which 10 were correctly predicted by our approach. Nearly 40% of our 465 predicted promoter regions are supported by the currently available gene annotation. Promoter finding also provides a biologically meaningful method for “chromosomal scaffolding”, by which long genomic sequences can be divided into segments starting with a gene. As one example, the combination of promoter region prediction with exon/intron structure predictions greatly enhances the specificity of de novo gene finding. The present study demonstrates that it is possible to identify promoters in silico on the chromosomal level with sufficient reliability for experimental planning and indicates that a wealth of information about regulatory regions can be extracted from current large-scale (megabase) sequencing projects. Results are available on-line at http://genomatix.gsf.de/chr22/. PMID:11230158
Buschmann, Dominik; Haberberger, Anna; Kirchner, Benedikt; Spornraft, Melanie; Riedmaier, Irmgard; Schelling, Gustav; Pfaffl, Michael W.
2016-01-01
Small RNA-Seq has emerged as a powerful tool in transcriptomics, gene expression profiling and biomarker discovery. Sequencing cell-free nucleic acids, particularly microRNA (miRNA), from liquid biopsies additionally provides exciting possibilities for molecular diagnostics, and might help establish disease-specific biomarker signatures. The complexity of the small RNA-Seq workflow, however, bears challenges and biases that researchers need to be aware of in order to generate high-quality data. Rigorous standardization and extensive validation are required to guarantee reliability, reproducibility and comparability of research findings. Hypotheses based on flawed experimental conditions can be inconsistent and even misleading. Comparable to the well-established MIQE guidelines for qPCR experiments, this work aims at establishing guidelines for experimental design and pre-analytical sample processing, standardization of library preparation and sequencing reactions, as well as facilitating data analysis. We highlight bottlenecks in small RNA-Seq experiments, point out the importance of stringent quality control and validation, and provide a primer for differential expression analysis and biomarker discovery. Following our recommendations will encourage better sequencing practice, increase experimental transparency and lead to more reproducible small RNA-Seq results. This will ultimately enhance the validity of biomarker signatures, and allow reliable and robust clinical predictions. PMID:27317696
A computational framework to empower probabilistic protein design
Fromer, Menachem; Yanover, Chen
2008-01-01
Motivation: The task of engineering a protein to perform a target biological function is known as protein design. A commonly used paradigm casts this functional design problem as a structural one, assuming a fixed backbone. In probabilistic protein design, positional amino acid probabilities are used to create a random library of sequences to be simultaneously screened for biological activity. Clearly, certain choices of probability distributions will be more successful in yielding functional sequences. However, since the number of sequences is exponential in protein length, computational optimization of the distribution is difficult. Results: In this paper, we develop a computational framework for probabilistic protein design following the structural paradigm. We formulate the distribution of sequences for a structure using the Boltzmann distribution over their free energies. The corresponding probabilistic graphical model is constructed, and we apply belief propagation (BP) to calculate marginal amino acid probabilities. We test this method on a large structural dataset and demonstrate the superiority of BP over previous methods. Nevertheless, since the results obtained by BP are far from optimal, we thoroughly assess the paradigm using high-quality experimental data. We demonstrate that, for small scale sub-problems, BP attains identical results to those produced by exact inference on the paradigmatic model. However, quantitative analysis shows that the distributions predicted significantly differ from the experimental data. These findings, along with the excellent performance we observed using BP on the smaller problems, suggest potential shortcomings of the paradigm. We conclude with a discussion of how it may be improved in the future. Contact: fromer@cs.huji.ac.il PMID:18586717
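For very small problems, the marginal amino acid probabilities under a Boltzmann distribution can be computed by exhaustive enumeration, which is the kind of exact inference that belief propagation approximates on larger problems. The sketch below assumes a made-up energy function and a two-letter alphabet purely for illustration; nothing in it comes from the paper's energy model.

```python
import itertools
import math

def boltzmann_marginals(alphabet, n_positions, energy, kT=1.0):
    """Exact per-position marginals of p(seq) proportional to exp(-E(seq)/kT)."""
    weights = {}
    for seq in itertools.product(alphabet, repeat=n_positions):
        weights[seq] = math.exp(-energy(seq) / kT)
    z = sum(weights.values())  # partition function
    marginals = [{a: 0.0 for a in alphabet} for _ in range(n_positions)]
    for seq, w in weights.items():
        for pos, a in enumerate(seq):
            marginals[pos][a] += w / z
    return marginals

def toy_energy(seq):
    """Hypothetical energy: each 'A' lowers the energy by 1, and each
    pair of identical neighbours raises it by 0.5."""
    e = -1.0 * seq.count("A")
    e += 0.5 * sum(1 for x, y in zip(seq, seq[1:]) if x == y)
    return e

marg = boltzmann_marginals("AG", 3, toy_energy)
```

The enumeration visits every sequence, so its cost grows exponentially with length, which is the difficulty the abstract points out.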
Repp, Bruno H
2004-10-01
In a task that requires in-phase synchronization of finger taps with an isochronous sequence of target tones that is interleaved with a sequence of distractor tones at various fixed phase relationships, the taps tend to be attracted to the distractor tones, especially when the distractor tones closely precede the target tones [Repp, B. H. (2003a). Phase attraction in sensorimotor synchronization with auditory sequences: Effects of single and periodic distractors on synchronization accuracy. Journal of Experimental Psychology: Human Perception and Performance, 29, 290-309]. The present research addressed two related questions about this distractor effect: (1) Is it a function of the absolute temporal separation or of the relative phase of the two stimulus sequences? (2) Is it the result of perceptual grouping (integration) of target and distractor tones or of simultaneous attraction to two independent sequences? In three experiments, distractor effects were compared across two different sequence rates. The results suggest that absolute temporal separation, not relative phase, is the critical variable. Experiment 3 also included an anti-phase tapping task that addressed the second question directly. The results suggest that the attraction of taps to distractor tones is caused mainly by temporal integration of target and distractor tones within a fixed window of 100-150 ms duration, with the earlier-occurring tone being weighted more strongly than the later-occurring one.
Vlachos, Ioannis S; Paraskevopoulou, Maria D; Karagkouni, Dimitra; Georgakilas, Georgios; Vergoulis, Thanasis; Kanellos, Ilias; Anastasopoulos, Ioannis-Laertis; Maniou, Sofia; Karathanou, Konstantina; Kalfakakou, Despina; Fevgas, Athanasios; Dalamagas, Theodore; Hatzigeorgiou, Artemis G
2015-01-01
microRNAs (miRNAs) are short non-coding RNA species, which act as potent gene expression regulators. Accurate identification of miRNA targets is crucial to understanding their function. Currently, hundreds of thousands of miRNA:gene interactions have been experimentally identified. However, this wealth of information is fragmented and hidden in thousands of manuscripts and raw next-generation sequencing data sets. DIANA-TarBase was initially released in 2006 and it was the first database aiming to catalog published experimentally validated miRNA:gene interactions. DIANA-TarBase v7.0 (http://www.microrna.gr/tarbase) aims to provide for the first time hundreds of thousands of high-quality manually curated experimentally validated miRNA:gene interactions, enhanced with detailed meta-data. DIANA-TarBase v7.0 enables users to easily identify positive or negative experimental results, the utilized experimental methodology, and experimental conditions including cell/tissue type and treatment. The new interface also provides advanced information ranging from the binding site location, as identified experimentally as well as in silico, to the primer sequences used for cloning experiments. More than half a million miRNA:gene interactions have been curated from published experiments on 356 different cell types from 24 species, corresponding to 9- to 250-fold more entries than any other relevant database. DIANA-TarBase v7.0 is freely available. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Sequence Matters but How Exactly? A Method for Evaluating Activity Sequences from Data
ERIC Educational Resources Information Center
Doroudi, Shayan; Holstein, Kenneth; Aleven, Vincent; Brunskill, Emma
2016-01-01
How should a wide variety of educational activities be sequenced to maximize student learning? Although some experimental studies have addressed this question, educational data mining methods may be able to evaluate a wider range of possibilities and better handle many simultaneous sequencing constraints. We introduce Sequencing Constraint…
Improving transmission efficiency of large sequence alignment/map (SAM) files.
Sakib, Muhammad Nazmus; Tang, Jijun; Zheng, W Jim; Huang, Chin-Tser
2011-01-01
Research in bioinformatics primarily involves collection and analysis of a large volume of genomic data. Naturally, it demands efficient storage and transfer of this huge amount of data. In recent years, some research has been done to find efficient compression algorithms to reduce the size of various sequencing data. One way to improve the transmission time of large files is to apply a maximum lossless compression on them. In this paper, we present SAMZIP, a specialized encoding scheme, for sequence alignment data in SAM (Sequence Alignment/Map) format, which improves the compression ratio of existing compression tools available. In order to achieve this, we exploit the prior knowledge of the file format and specifications. Our experimental results show that our encoding scheme improves compression ratio, thereby reducing overall transmission time significantly.
Thermodynamics-based models of transcriptional regulation with gene sequence.
Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing
2015-12-01
Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity, which is based on statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model yields more biologically meaningful results.
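A generic textbook-style sketch can show how a thermodynamic binding term feeds a continuous-time expression model. It is not the authors' model: it assumes a single binding site with equilibrium occupancy K·c / (1 + K·c) and first-order mRNA decay, and every parameter name and value is hypothetical.

```python
def occupancy(conc, k_assoc):
    """Equilibrium probability that a single site is bound by the factor."""
    return k_assoc * conc / (1.0 + k_assoc * conc)

def simulate_mrna(conc, k_assoc, alpha, delta, t_end, dt=0.01):
    """Forward-Euler integration of dm/dt = alpha * occupancy - delta * m."""
    occ = occupancy(conc, k_assoc)
    m = 0.0
    for _ in range(int(round(t_end / dt))):
        m += dt * (alpha * occ - delta * m)
    return m

# With these illustrative values the steady state is alpha * occ / delta.
m_final = simulate_mrna(conc=1.0, k_assoc=1.0, alpha=2.0, delta=1.0, t_end=20.0)
```

In a sequence-driven model, `k_assoc` would be derived from the promoter sequence rather than set by hand.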
A laboratory information management system for DNA barcoding workflows.
Vu, Thuy Duong; Eberhardt, Ursula; Szöke, Szániszló; Groenewald, Marizeth; Robert, Vincent
2012-07-01
This paper presents a laboratory information management system for DNA sequences (LIMS) created and based on the needs of a DNA barcoding project at the CBS-KNAW Fungal Biodiversity Centre (Utrecht, the Netherlands). DNA barcoding is a global initiative for species identification through simple DNA sequence markers. We aim at generating barcode data for all strains (or specimens) included in the collection (currently ca. 80 k). The LIMS has been developed to better manage large amounts of sequence data and to keep track of the whole experimental procedure. The system has allowed us to classify strains more efficiently as the quality of sequence data has improved, and as a result, up-to-date taxonomic names have been given to strains and more accurate correlation analyses have been carried out.
Characterization and prediction of residues determining protein functional specificity.
Capra, John A; Singh, Mona
2008-07-01
Within a homologous protein family, proteins may be grouped into subtypes that share specific functions that are not common to the entire family. Often, the amino acids present in a small number of sequence positions determine each protein's particular functional specificity. Knowledge of these specificity determining positions (SDPs) aids in protein function prediction, drug design and experimental analysis. A number of sequence-based computational methods have been introduced for identifying SDPs; however, their further development and evaluation have been hindered by the limited number of known experimentally determined SDPs. We combine several bioinformatics resources to automate a process, typically undertaken manually, to build a dataset of SDPs. The resulting large dataset, which consists of SDPs in enzymes, enables us to characterize SDPs in terms of their physicochemical and evolutionary properties. It also facilitates the large-scale evaluation of sequence-based SDP prediction methods. We present a simple sequence-based SDP prediction method, GroupSim, and show that, surprisingly, it is competitive with a representative set of current methods. We also describe ConsWin, a heuristic that considers sequence conservation of neighboring amino acids, and demonstrate that it improves the performance of all methods tested on our large dataset of enzyme SDPs. Datasets and GroupSim code are available online at http://compbio.cs.princeton.edu/specificity/. Supplementary data are available at Bioinformatics online.
Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N.; Romm, Jane M.; Doheny, Kimberly F.; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min
2012-01-01
DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies. PMID:23103226
Rényi continuous entropy of DNA sequences.
Vinga, Susana; Almeida, Jonas S
2004-12-07
Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words and has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed. Rényi's quadratic entropy, calculated with the Parzen window density estimation method applied to CGR/USM continuous maps of DNA sequences, constitutes a novel technique to evaluate sequence global randomness without some of the former method's drawbacks. The asymptotic behaviour of this new measure was analytically deduced and the calculation of entropies for several synthetic and experimental biological sequences was performed. The results obtained were compared with the distributions of the null model of randomness obtained by simulation. The biological sequences have shown a different p-value according to the kernel resolution of Parzen's method, which might indicate an unknown level of organization of their patterns. This new technique can be very useful in the study of DNA sequence complexity and provide additional tools for DNA entropy estimation. The main MATLAB applications developed and additional material are available at the webpage . Specialized functions can be obtained from the authors.
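The two ingredients named in the abstract can be sketched in a few lines. The chaos game representation (CGR) maps each base to a point in the unit square by moving halfway toward that base's corner. With an isotropic Gaussian Parzen kernel of variance σ², the quadratic entropy has the closed form H₂ = −ln[(1/N²) Σᵢ Σⱼ G(xᵢ − xⱼ; 2σ²)]. The corner assignment, the σ value, and the test sequences below are assumptions for illustration; this is not the authors' MATLAB code.

```python
import math

# One common corner assignment for CGR (an assumption; others are in use).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Chaos game representation: move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = 0.5 * (x + cx), 0.5 * (y + cy)
        points.append((x, y))
    return points

def renyi_quadratic_entropy(points, sigma):
    """H2 = -ln( (1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2) ) in two dimensions."""
    n = len(points)
    norm = 1.0 / (4.0 * math.pi * sigma ** 2)  # 2-D Gaussian, variance 2*sigma^2
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-d2 / (4.0 * sigma ** 2))
    return -math.log(total / n ** 2)

h_repetitive = renyi_quadratic_entropy(cgr_points("ACACACACACACACACACAC"), 0.1)
h_mixed = renyi_quadratic_entropy(cgr_points("ACGTTGCATGACCGTAAGTC"), 0.1)
```

The repetitive sequence collapses onto two clusters in the CGR square and so yields the lower entropy of the two. The double loop is quadratic in sequence length, so this direct form suits only short sequences.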
Scannell, Devin R.; Zill, Oliver A.; Rokas, Antonis; Payen, Celia; Dunham, Maitreya J.; Eisen, Michael B.; Rine, Jasper; Johnston, Mark; Hittinger, Chris Todd
2011-01-01
High-quality, well-annotated genome sequences and standardized laboratory strains fuel experimental and evolutionary research. We present improved genome sequences of three species of Saccharomyces sensu stricto yeasts: S. bayanus var. uvarum (CBS 7001), S. kudriavzevii (IFO 1802T and ZP 591), and S. mikatae (IFO 1815T), and describe their comparison to the genomes of S. cerevisiae and S. paradoxus. The new sequences, derived by assembling millions of short DNA sequence reads together with previously published Sanger shotgun reads, have vastly greater long-range continuity and far fewer gaps than the previously available genome sequences. New gene predictions defined a set of 5261 protein-coding orthologs across the five most commonly studied Saccharomyces yeasts, enabling a re-examination of the tempo and mode of yeast gene evolution and improved inferences of species-specific gains and losses. To facilitate experimental investigations, we generated genetically marked, stable haploid strains for all three of these Saccharomyces species. These nearly complete genome sequences and the collection of genetically marked strains provide a valuable toolset for comparative studies of gene function, metabolism, and evolution, and render Saccharomyces sensu stricto the most experimentally tractable model genus. These resources are freely available and accessible through www.SaccharomycesSensuStricto.org. PMID:22384314
Promoter Sequences Prediction Using Relational Association Rule Mining
Czibula, Gabriela; Bocicor, Maria-Iuliana; Czibula, Istvan Gergely
2012-01-01
In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning based classification models are still being developed to approach the problem of promoter identification in DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting whether or not a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal. PMID:22563233
Single molecule sequencing-guided scaffolding and correction of draft assemblies.
Zhu, Shenglong; Chen, Danny Z; Emrich, Scott J
2017-12-06
Although single molecule sequencing is still improving, the lengths of the generated sequences are inevitably an advantage in genome assembly. Prior work that utilizes long reads to conduct genome assembly has mostly focused on correcting sequencing errors and improving contiguity of de novo assemblies. We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it by a 2-approximation algorithm. Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.
An Adapting Auditory-motor Feedback Loop Can Contribute to Generating Vocal Repetition
Brainard, Michael S.; Jin, Dezhe Z.
2015-01-01
Consecutive repetition of actions is common in behavioral sequences. Although integration of sensory feedback with internal motor programs is important for sequence generation, if and how feedback contributes to repetitive actions is poorly understood. Here we study how auditory feedback contributes to generating repetitive syllable sequences in songbirds. We propose that auditory signals provide positive feedback to ongoing motor commands, but this influence decays as feedback weakens from response adaptation during syllable repetitions. Computational models show that this mechanism explains repeat distributions observed in Bengalese finch song. We experimentally confirmed two predictions of this mechanism in Bengalese finches: removal of auditory feedback by deafening reduces syllable repetitions; and neural responses to auditory playback of repeated syllable sequences gradually adapt in sensory-motor nucleus HVC. Together, our results implicate a positive auditory-feedback loop with adaptation in generating repetitive vocalizations, and suggest sensory adaptation is important for feedback control of motor sequences. PMID:26448054
Spectra library assisted de novo peptide sequencing for HCD and ETD spectra pairs.
Yan, Yan; Zhang, Kaizhong
2016-12-23
De novo peptide sequencing via tandem mass spectrometry (MS/MS) has developed rapidly in recent years. With the use of spectra pairs from the same peptide under different fragmentation modes, the performance of de novo sequencing is greatly improved. Currently, with large numbers of spectra sequenced every day, spectra libraries containing tens of thousands of annotated experimental MS/MS spectra have become available. These libraries provide information on spectral properties and thus have the potential to be used with de novo sequencing to improve its performance. In this study, an improved de novo sequencing method assisted by a spectra library is proposed. It uses spectra libraries as training datasets and introduces significance scores for the features used in our previous de novo sequencing method for HCD and ETD spectra pairs. Two pairs of HCD and ETD spectral datasets were used to test the performance of the proposed method and our previous method. The results show that the proposed method achieves better sequencing accuracy, with higher-ranked correct sequences and less computational time. This paper proposes an advanced de novo sequencing method for HCD and ETD spectra pairs that uses information from spectra libraries and significantly improves on previous similar methods.
Peptide de novo sequencing of mixture tandem mass spectra
Hotta, Stéphanie Yuki Kolbeck; Verano‐Braga, Thiago; Kjeldsen, Frank
2016-01-01
The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance using database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co‐isolation and thus prone to false identifications. The deconvolution approach matched complementary b‐, y‐ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence specific fragment ions of each co‐isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20–35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than those where improvement was obtained by mass spectral deconvolution. Tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications. PMID:27329701
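The complementary-ion matching described in this abstract can be illustrated with a minimal sketch. It assumes singly charged b- and y-ions, for which the two m/z values of a complementary pair sum to the neutral precursor mass plus two proton masses. The function name, the tolerance, and the greedy two-pointer scan are illustrative assumptions, not the authors' implementation.

```python
PROTON = 1.007276  # proton mass in Da


def complementary_pairs(peaks, precursor_mass, tol=0.02):
    """Return (low, high) m/z pairs that sum to precursor_mass + 2 protons.

    peaks: singly charged fragment m/z values (hypothetical input format).
    precursor_mass: neutral monoisotopic mass of one co-isolated peptide.
    tol: absolute tolerance in Da on the summed m/z values.
    """
    target = precursor_mass + 2 * PROTON
    peaks = sorted(peaks)
    pairs = []
    lo, hi = 0, len(peaks) - 1
    # Greedy two-pointer scan over the sorted peak list.
    while lo < hi:
        total = peaks[lo] + peaks[hi]
        if abs(total - target) <= tol:
            pairs.append((peaks[lo], peaks[hi]))
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return pairs
```

Running the scan once per co-isolated precursor mass would assign fragment peaks to each peptide, which is the idea behind building one virtual spectrum per precursor.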
Improved Quality in Aerospace Testing Through the Modern Design of Experiments
NASA Technical Reports Server (NTRS)
DeLoach, R.
2000-01-01
This paper illustrates how, in the presence of systematic error, the quality of an experimental result can be influenced by the order in which the independent variables are set. It is suggested that in typical experimental circumstances in which systematic errors are significant, the common practice of organizing the set point order of independent variables to maximize data acquisition rate results in a test matrix that fails to produce the highest quality research result. With some care to match the volume of data required to satisfy inference error risk tolerances, it is possible to accept a lower rate of data acquisition and still produce results of higher technical quality (lower experimental error) with less cost and in less time than conventional test procedures, simply by optimizing the sequence in which independent variable levels are set.
Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Zhou, Yong; An, Ji-Yong
2017-07-05
Knowledge of drug-target interaction (DTI) plays an important role in discovering new drug candidates. Unfortunately, experimental methods for determining DTI have unavoidable shortcomings: they are time-consuming and expensive. This motivates us to develop an effective computational method to predict DTI based on protein sequence. In this paper, we propose a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence), to predict DTI. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with Relevance Vector Machine (RVM). To evaluate the prediction capacity of PDTPS, experiments were carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets using five-fold cross-validation tests. The proposed PDTPS method achieved average accuracies of 97.73%, 93.12%, 86.78%, and 87.78% on the enzyme, ion channel, GPCR, and nuclear receptor datasets, respectively. The experimental results showed that our method has good prediction performance. Furthermore, to further evaluate the prediction performance of the proposed PDTPS method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The promising comparison results further demonstrate the efficiency and robustness of the proposed PDTPS method. This makes it a useful tool suitable for predicting DTI, as well as for other bioinformatics tasks.
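One common way to derive bi-gram features from a PSSM is to sum, over consecutive residue positions, the product of the score for amino acid m at position i and the score for amino acid n at position i+1. The sketch below shows that formulation in plain Python. The abstract does not give the exact definition or normalization used by PDTPS, so the function and its input layout are assumptions for illustration only.

```python
def bigram_features(pssm):
    """Flattened k*k bi-gram feature vector from a PSSM.

    pssm: list of rows, one per residue position, each row holding k
    per-amino-acid scores (k is 20 for a standard PSSM).
    Feature (m, n) is the sum over i of pssm[i][m] * pssm[i + 1][n].
    """
    k = len(pssm[0])
    feats = [[0.0] * k for _ in range(k)]
    # Walk over consecutive position pairs (i, i + 1).
    for row, nxt in zip(pssm, pssm[1:]):
        for m in range(k):
            for n in range(k):
                feats[m][n] += row[m] * nxt[n]
    # Row-major flattening gives a fixed-length vector for any protein.
    return [value for feat_row in feats for value in feat_row]
```

The appeal of this construction is that proteins of any length map to a vector of the same size (400 values for 20 amino acids), which can then be reduced with PCA before classification.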
Precise control of flexible manipulators
NASA Technical Reports Server (NTRS)
Cannon, R. H., Jr.
1984-01-01
Experimental apparatus were developed for physically testing control systems for pointing flexible structures, such as limber spacecraft, for the case that control actuators cannot be collocated with sensors. Structural damping ratios are less than 0.003, each basic configuration of sensor/actuator noncollocation is available, and inertias can be halved or doubled abruptly during control maneuvers, thereby imposing, in particular, a sudden reversal in the plant's pole-zero sequence. First experimental results are presented, including stable control with both collocation and noncollocation.
MEGANTE: A Web-Based System for Integrated Plant Genome Annotation
Numa, Hisataka; Itoh, Takeshi
2014-01-01
The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demands for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, are also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/. PMID:24253915
Experimental confirmation of long-memory correlations in star-wander data.
Zunino, Luciano; Gulich, Damián; Funes, Gustavo; Ziad, Aziz
2014-07-01
In this Letter we have analyzed the temporal correlations of the angle-of-arrival fluctuations of stellar images. Experimentally measured data were carefully examined by implementing multifractal detrended fluctuation analysis. This algorithm is able to discriminate the presence of fractal and multifractal structures in recorded time sequences. We have confirmed that turbulence-degraded stellar wavefronts are compatible with a long-memory correlated monofractal process. This experimental result is quite significant for the accurate comprehension and modeling of the atmospheric turbulence effects on the stellar images. It can also be of great utility within the adaptive optics field.
Characterising experimental time series using local intrinsic dimension
NASA Astrophysics Data System (ADS)
Buzug, Thorsten M.; von Stamm, Jens; Pfister, Gerd
1995-02-01
Experimental strange attractors are analysed with the averaged local intrinsic dimension proposed by A. Passamante et al. [Phys. Rev. A 39 (1989) 3640] which is based on singular value decomposition of local trajectory matrices. The results are compared to the values of Kaplan-Yorke and the correlation dimension. The attractors, reconstructed with Takens' delay time coordinates from scalar velocity time series, are measured in the hydrodynamic Taylor-Couette system. A period doubling route towards chaos obtained from a very short Taylor-Couette cylinder yields a sequence of experimental time series where the local intrinsic dimension is applied.
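The attractor reconstruction step mentioned in this abstract, Takens' delay-time coordinates, can be sketched as follows. The function name and parameters are illustrative. The local intrinsic dimension analysis itself (singular value decomposition of local trajectory matrices) is not shown.

```python
def delay_embed(series, dim, delay):
    """Build delay vectors [x(i), x(i + delay), ..., x(i + (dim - 1) * delay)].

    series: scalar time series (for example, a velocity signal).
    dim: embedding dimension.
    delay: delay time in samples.
    """
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and delay")
    return [
        [series[i + j * delay] for j in range(dim)]
        for i in range(n_vectors)
    ]
```

Each returned row is one point on the reconstructed trajectory. A local trajectory matrix would then be assembled from the rows that lie near a chosen reference point.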
Reanalysis of RNA-Sequencing Data Reveals Several Additional Fusion Genes with Multiple Isoforms
Kangaspeska, Sara; Hultsch, Susanne; Edgren, Henrik; Nicorici, Daniel; Murumägi, Astrid; Kallioniemi, Olli
2012-01-01
RNA-sequencing and tailored bioinformatic methodologies have paved the way for identification of expressed fusion genes from the chaotic genomes of solid tumors. We have recently successfully exploited RNA-sequencing for the discovery of 24 novel fusion genes in breast cancer. Here, we demonstrate the importance of continuous optimization of the bioinformatic methodology for this purpose, and report the discovery and experimental validation of 13 additional fusion genes from the same samples. Integration of copy number profiling with the RNA-sequencing results revealed that the majority of the gene fusions were promoter-donating events that occurred at copy number transition points or involved high-level DNA-amplifications. Sequencing of genomic fusion break points confirmed that DNA-level rearrangements underlie selected fusion transcripts. Furthermore, a significant portion (>60%) of the fusion genes were alternatively spliced. This illustrates the importance of reanalyzing sequencing data as gene definitions change and bioinformatic methods improve, and highlights the previously unforeseen isoform diversity among fusion transcripts. PMID:23119097
Roe, Daisy; Miles, Christopher; Johnson, Andrew J
2017-07-01
The present paper examines the effect of within-sequence item repetitions in tactile order memory. Employing an immediate serial recall procedure, participants reconstructed a six-item sequence tapped upon their fingers by moving those fingers in the order of original stimulation. In Experiment 1a, within-sequence repetition of an item separated by two intervening items resulted in a significant reduction in recall accuracy for that repeated item (i.e., the Ranschburg effect). In Experiment 1b, within-sequence repetition of an adjacent item resulted in significant recall facilitation for that repeated item. These effects mirror those reported for verbal stimuli (e.g., Henson, 1998a. Item repetition in short-term memory: Ranschburg repeated. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24(5), 1162-1181. doi:10.1037/0278-7393.24.5.1162). These data are the first to demonstrate the Ranschburg effect with non-verbal stimuli and suggest further cross-modal similarities in order memory.
Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao
2012-05-01
Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information. These features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.
A long-term target detection approach in infrared image sequence
NASA Astrophysics Data System (ADS)
Li, Hang; Zhang, Qi; Li, Yuanyuan; Wang, Liqiang
2015-12-01
An automatic target detection method for long-term infrared (IR) image sequences from a moving platform is proposed. First, based on non-linear histogram equalization, target candidates are segmented coarse-to-fine using two self-adaptive thresholds generated in the intensity space. The real target is then captured via two different selection approaches. At the beginning of the image sequence, the genuine target, which has little texture, is discriminated from the other candidates using a contrast-based confidence measure. Later, when the target becomes larger, we apply an online EM method to iteratively estimate and update the distributions of the target's size and position based on the prior detection results, and then recognize the genuine target as the one that satisfies the constraints on both size and position. Experimental results demonstrate that the presented method is accurate, robust and efficient.
Streamlined Genome Sequence Compression using Distributed Source Coding
Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel
2014-01-01
We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552
Face recognition based on matching of local features on 3D dynamic range sequences
NASA Astrophysics Data System (ADS)
Echeagaray-Patrón, B. A.; Kober, Vitaly
2016-09-01
3D face recognition has attracted attention in the last decade due to improvement of technology of 3D image acquisition and its wide range of applications such as access control, surveillance, human-computer interaction and biometric identification systems. Most research on 3D face recognition has focused on analysis of 3D still data. In this work, a new method for face recognition using dynamic 3D range sequences is proposed. Experimental results are presented and discussed using 3D sequences in the presence of pose variation. The performance of the proposed method is compared with that of conventional face recognition algorithms based on descriptors.
Efficient Mining of Interesting Patterns in Large Biological Sequences
Rashid, Md. Mamunur; Karim, Md. Rezaul; Jeong, Byeong-Soo
2012-01-01
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time. PMID:23105928
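The idea that a pattern is interesting when its observed support far exceeds its prior expectation can be illustrated with a simple observed-to-expected ratio. The sketch below assumes an independent background model built from single-symbol frequencies. This is a generic illustration, not the interestingness measure or the index-based mining method proposed in the paper, and all function names are hypothetical.

```python
from collections import Counter


def observed_count(seq, pattern):
    """Count occurrences of pattern in seq, allowing overlaps."""
    width = len(pattern)
    return sum(
        1 for i in range(len(seq) - width + 1) if seq[i:i + width] == pattern
    )


def expected_count(seq, pattern):
    """Expected occurrences if symbols were drawn independently."""
    freqs = Counter(seq)
    prob = 1.0
    for symbol in pattern:
        prob *= freqs[symbol] / len(seq)
    return prob * (len(seq) - len(pattern) + 1)


def surprise_ratio(seq, pattern):
    """Observed over expected support; values far above 1 suggest interest."""
    expected = expected_count(seq, pattern)
    if expected == 0:
        raise ValueError("pattern contains a symbol absent from the sequence")
    return observed_count(seq, pattern) / expected
```

For example, in `"ACGTACGTACGT"` the pattern `"ACG"` occurs 3 times against an expectation of about 0.16, so it scores highly despite its low absolute count.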
Efficient mining of interesting patterns in large biological sequences.
Rashid, Md Mamunur; Karim, Md Rezaul; Jeong, Byeong-Soo; Choi, Ho-Jin
2012-03-01
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read
2010-01-01
Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148
Bashir, Ali; Bansal, Vikas; Bafna, Vineet
2010-06-18
Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with their unique characteristics, pose a number of design challenges, regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 Million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. 
We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.
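As a rough illustration of the depth question raised in this abstract, the sketch below computes how many reads are needed to observe a transcript at least once with a given probability. It assumes each read is an independent draw that hits the transcript with probability p, so the detection probability is 1 - (1 - p)^N. This simplified model is an assumption for illustration only. It is not the corrected estimator derived in the paper, and the function name is hypothetical.

```python
import math


def reads_needed(p, confidence=0.95):
    """Smallest N such that 1 - (1 - p)**N >= confidence.

    p: probability that a single read comes from the transcript.
    confidence: required probability of seeing at least one such read.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Solve (1 - p)**N <= 1 - confidence for N and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
```

Under this model, a transcript that accounts for one read in a million needs on the order of three million reads for 95% detection, which shows why depth requirements grow quickly for rare transcripts.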
Predicting overload-affected fatigue crack growth in steels
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skorupa, M.; Skorupa, A.; Ladecki, B.
1996-12-01
The ability of semi-empirical crack closure models to predict the effect of overloads on fatigue crack growth in low-alloy steels has been investigated. With this purpose, the CORPUS model developed for aircraft metals and spectra has been checked first through comparisons between the simulated and observed results for a low-alloy steel. The CORPUS predictions of crack growth under several types of simple load histories containing overloads appeared generally unconservative, which prompted the authors to formulate a new model, more suitable for steels. With the latter approach, the assumed evolution of the crack opening stress during the delayed retardation stage has been based on experimental results reported for various steels. For all the load sequences considered, the predictions from the proposed model appeared to be by far more accurate than those from CORPUS. Based on the analysis results, the capability of semi-empirical prediction concepts to cover experimentally observed trends that have been reported for sequences with overloads is discussed. Finally, possibilities of improving the model performance are considered.
Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang
2018-02-01
Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, prediction for the growing scale and diversity of sequences has relied heavily on machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach that draws on multiple sequence alignment algorithms, an N-gram probabilistic language model and deep learning. The idea behind the proposed method is that if each group of sequences can be represented by one feature sequence composed of homologous sites, less is lost when that feature sequence is rebuilt after a more relevant sequence is added to the group. On this basis, the question of whether a query sequence belongs to a group of sequences can be recast as calculating the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCR sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are fed into a convolutional neural network to make a prediction. The experimental results show that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family level II) in comparison with the current state-of-the-art methods. The implementation of the proposed work is freely available at: https://github.com/alanFchina/CNN .
Panek, Marina; Čipčić Paljetak, Hana; Barešić, Anja; Perić, Mihaela; Matijašić, Mario; Lojkić, Ivana; Vranešić Bender, Darija; Krznarić, Željko; Verbanac, Donatella
2018-03-23
The information on microbiota composition in the human gastrointestinal tract predominantly originates from the analyses of human faeces by application of next generation sequencing (NGS). However, the detected composition of the faecal bacterial community can be affected by various factors including experimental design and procedures. This study evaluated the performance of different protocols for collection and storage of faecal samples (native and OMNIgene.GUT system) and bacterial DNA extraction (MP Biomedicals, QIAGEN and MO BIO kits), using two NGS platforms for 16S rRNA gene sequencing (Illumina MiSeq and Ion Torrent PGM). OMNIgene.GUT proved to be a reliable and convenient system for collection and storage of faecal samples, although it favoured the Sutterella genus. MP provided superior DNA yield and quality, MO BIO depleted Gram positive organisms, while using QIAGEN with OMNIgene.GUT resulted in the greatest variability compared to the other two kits. MiSeq and IT platforms in their supplier recommended setups provided comparable reproducibility of donor faecal microbiota. The differences included higher diversity observed with MiSeq and increased capacity of MiSeq to detect Akkermansia muciniphila, [Odoribacteraceae], Erysipelotrichaceae and Ruminococcaceae (primarily Faecalibacterium prausnitzii). The results of our study could assist investigators using NGS technologies to make informed decisions on appropriate tools for their experimental pipelines.
Long Read Alignment with Parallel MapReduce Cloud Platform
Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki
2015-01-01
Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms. PMID:26839887
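For readers unfamiliar with the Smith-Waterman step that this abstract builds on, the sketch below computes a local alignment score with the textbook dynamic programming recurrence. It uses a linear gap penalty and illustrative scoring values. It is not the BWA-SW heuristic or the optimized, parallelized variant evaluated in the paper.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Keeps only two rows of the score matrix, so memory is O(len(b)).
    The scoring parameters are illustrative defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                 # start a new local alignment here
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,     # gap in b
                cur[j - 1] + gap,  # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

The quadratic cost of filling this matrix for long reads is what motivates seeding heuristics such as BWA-SW and the parallel execution strategies described in the abstract.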
Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C
2003-01-01
Background Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626
Hockenberry, Adam J; Pah, Adam R; Jewett, Michael C; Amaral, Luís A N
2017-01-01
Studies dating back to the 1970s established that sequence complementarity between the anti-Shine-Dalgarno (aSD) sequence on prokaryotic ribosomes and the 5' untranslated region of mRNAs helps to facilitate translation initiation. The optimal location of aSD sequence binding relative to the start codon, the full extents of the aSD sequence and the functional form of the relationship between aSD sequence complementarity and translation efficiency have not been fully resolved. Here, we investigate these relationships by leveraging the sequence diversity of endogenous genes and recently available genome-wide estimates of translation efficiency. We show that, after accounting for predicted mRNA structure, aSD sequence complementarity increases the translation of endogenous mRNAs by roughly 50%. Further, we observe that this relationship is nonlinear, with translation efficiency maximized for mRNAs with intermediate levels of aSD sequence complementarity. The mechanistic insights that we observe are highly robust: we find nearly identical results in multiple datasets spanning three distantly related bacteria. Further, we verify our main conclusions by re-analysing a controlled experimental dataset. © 2017 The Authors.
Manku, H K; Dhanoa, J K; Kaur, S; Arora, J S; Mukhopadhyay, C S
2017-10-01
MicroRNAs (miRNAs) are small (19-25 bases long), non-coding RNAs that regulate post-transcriptional gene expression by cleaving targeted mRNAs in several eukaryotes. The miRNAs play vital roles in multiple biological and metabolic processes, including developmental timing, signal transduction, cell maintenance and differentiation, diseases and cancers. Experimental identification of microRNAs is expensive and labor-intensive. Alternatively, computational approaches for predicting putative miRNAs from genomic or exomic sequences rely on features of miRNAs, viz. secondary structure, sequence conservation, minimum free energy index (MFEI), etc. To date, not a single miRNA has been identified in buffalo (Bubalus bubalis), an economically important livestock species. The present study aims at predicting the putative miRNAs of buffalo using a comparative computational approach from buffalo whole genome shotgun sequencing data (INSDC: AWWX00000000.1). The sequences were searched with BLAST against known mammalian miRNAs. The candidate miRNAs were then passed through a series of filtration criteria to obtain the set of predicted (putative and novel) bubaline miRNAs. Eight miRNAs were selected based on lowest E-value and validated by real-time PCR (SYBR green chemistry) using RNU6 as endogenous control. The results from different trials of real-time PCR show that, of the 8 selected miRNAs, only 2 (hsa-miR-1277-5p; bta-miR-2285b) are not expressed in bubaline PBMCs. The potential target genes were then predicted from their sequence complementarity using miRanda. This work is the first report on prediction of bubaline miRNAs from whole genome sequencing data followed by experimental validation. The findings could pave the way for future studies of economically important traits in buffalo. Copyright © 2017 Elsevier Ltd. All rights reserved.
FA-SAT Is an Old Satellite DNA Frozen in Several Bilateria Genomes
Chaves, Raquel; Ferreira, Daniela; Mendes-da-Silva, Ana; Meles, Susana; Adega, Filomena
2017-01-01
Abstract In recent years, a growing body of evidence has recognized the tandem repeat sequences, and specifically satellite DNA, as a functional class of sequences in the genomic “dark matter.” Using an original, complementary, and thus an eclectic experimental design, we show that the cat archetypal satellite DNA sequence, FA-SAT, is “frozen” conservatively in several Bilateria genomes. We found different genomic FA-SAT architectures, and the interspersion pattern was conserved. In Carnivora genomes, the FA-SAT-related sequences are also amplified, with the predominance of a specific FA-SAT variant, at the heterochromatic regions. We inspected the cat genome project to locate FA-SAT array flanking regions and revealed an intensive intermingling with transposable elements. Our results also show that FA-SAT-related sequences are transcribed and that the most abundant FA-SAT variant is not always the most transcribed. We thus conclude that the DNA sequences of FA-SAT and their transcripts are “frozen” in these genomes. Future work is needed to disclose any putative function that these sequences may play in these genomes. PMID:29608678
Applying Agrep to r-NSA to solve multiple sequences approximate matching.
Ni, Bing; Wong, Man-Hon; Lam, Chi-Fai David; Leung, Kwong-Sak
2014-01-01
This paper addresses the approximate matching problem in a database consisting of multiple DNA sequences, where the proposed approach applies Agrep to a new truncated suffix array, r-NSA. The construction time of the structure is linear in the database size, and indexing a substring in the structure takes constant time. The number of characters processed in applying Agrep is analysed theoretically, and the theoretical upper bound closely approximates the empirical number of characters, which is obtained by enumerating the characters in the actual structure built. Experiments are carried out using (synthetic) random DNA sequences, as well as (real) genome sequences including Hepatitis-B Virus and X-chromosome. Experimental results show that, compared to the straightforward approach that applies Agrep to multiple sequences individually, the proposed approach solves the matching problem in much shorter time. The speed-up of our approach depends on the sequence patterns, and for highly similar homologous genome sequences, which are the common case in real-life genomes, it can be up to several orders of magnitude.
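The following is a minimal sketch of the bit-parallel approximate matching idea that Agrep is known for (the Wu-Manber extension of shift-and), applied to a single sequence. It does not reproduce the r-NSA index described above; the function name `bitap_search` and the example strings are illustrative assumptions.

```python
def bitap_search(text, pattern, k):
    """Return end positions in `text` where `pattern` matches with <= k edits.

    Illustrative sketch of Wu-Manber bit-parallel matching (the core idea
    behind agrep); it scans one sequence and builds no index.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # masks[c] has bit i set when pattern[i] == c
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # state[d] has bit i set when pattern[:i+1] matches a suffix of the text
    # read so far with at most d edits; before reading any text, a prefix of
    # length <= d can be matched by deleting all of its characters.
    state = [((1 << d) - 1) & full for d in range(k + 1)]
    hits = []
    for pos, c in enumerate(text):
        cm = masks.get(c, 0)
        prev_old = state[0]
        state[0] = ((state[0] << 1) | 1) & cm
        prev_new = state[0]
        for d in range(1, k + 1):
            old = state[d]
            new = ((old << 1) | 1) & cm      # exact match of this character
            new |= prev_old                  # extra character in the text
            new |= (prev_old << 1) | 1       # substituted character
            new |= (prev_new << 1) | 1       # skipped pattern character
            state[d] = new & full
            prev_old, prev_new = old, state[d]
        if state[k] & accept:
            hits.append(pos)
    return hits
```

For example, `bitap_search("ACGTTGCA", "GTTG", 0)` reports the end position of the exact occurrence, and raising `k` to 1 lets the pattern `"GTAG"` match the same region with one substitution.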
An experimental phylogeny to benchmark ancestral sequence reconstruction
Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.
2016-01-01
Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as ‘modern’ sequences that we subject to ASR analyses using various algorithms and benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had a minor effect on the inference of ancestral sequences. PMID:27628687
Evaluation of normalization methods in mammalian microRNA-Seq data
Garmire, Lana Xia; Subramaniam, Shankar
2012-01-01
Simple total tag count normalization is inadequate for microRNA sequencing data generated from the next generation sequencing technology. However, so far systematic evaluation of normalization methods on microRNA sequencing data is lacking. We comprehensively evaluate seven commonly used normalization methods including global normalization, Lowess normalization, Trimmed Mean Method (TMM), quantile normalization, scaling normalization, variance stabilization, and invariant method. We assess these methods on two individual experimental data sets with the empirical statistical metrics of mean square error (MSE) and Kolmogorov-Smirnov (K-S) statistic. Additionally, we evaluate the methods with results from quantitative PCR validation. Our results consistently show that Lowess normalization and quantile normalization perform the best, whereas TMM, a method applied to the RNA-Sequencing normalization, performs the worst. The poor performance of TMM normalization is further evidenced by abnormal results from the test of differential expression (DE) of microRNA-Seq data. Comparing with the models used for DE, the choice of normalization method is the primary factor that affects the results of DE. In summary, Lowess normalization and quantile normalization are recommended for normalizing microRNA-Seq data, whereas the TMM method should be used with caution. PMID:22532701
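As a rough illustration of one of the recommended methods, quantile normalization can be sketched as follows. This is a textbook version written for this listing, not the authors' code; the function name `quantile_normalize` and the naive handling of ties are assumptions.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length count vectors (one list per library).

    Illustrative sketch: each value is replaced by the mean, across
    libraries, of the values sharing its rank. Ties are broken by
    position, which is a simplification.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")
    sorted_samples = [sorted(s) for s in samples]
    # Mean of the r-th smallest value over all libraries
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After normalization every library has the same empirical distribution, which is the property that distinguishes this method from simple total-count scaling.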
Chanu, A; Aboussouan, E; Tamaz, S; Martel, S
2006-01-01
Software architecture for the navigation of a ferromagnetic untethered device in a 1D and 2D phantom environment is briefly described. Navigation is achieved using the real-time capabilities of a Siemens 1.5 T Avanto MRI system coupled with a dedicated software environment and a specially developed 3D tracking pulse sequence. Real-time control of the magnetic core is executed through the implementation of a simple PID controller. 1D and 2D experimental results are presented.
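The abstract does not give the controller details, so the following is only a generic discrete PID sketch. The class name, the gains and the time step are hypothetical and are not taken from the MRI navigation software.

```python
class PID:
    """Generic discrete PID controller (illustrative; gains are hypothetical)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        """Return the control output for one sampling step of length dt."""
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first step, when no previous error exists
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a tracking loop of the kind described, `measurement` would be the position reported by the tracking sequence and the output would set the propulsion gradient, but that mapping is an assumption here.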
NASA Astrophysics Data System (ADS)
Dupont-Nivet, M.; Demur, R.; Westbrook, C. I.; Schwartz, S.
2018-04-01
We report the experimental study of an atom-chip interferometer using ultracold rubidium 87 atoms above the Bose–Einstein condensation threshold. The observed dependence of the contrast decay time with temperature and with the degree of symmetry of the traps during the interferometer sequence is in good agreement with theoretical predictions published in Dupont-Nivet et al (2016 New J. Phys. 18 113012). These results pave the way for precision measurements with trapped thermal atoms.
Simultaneous phylogeny reconstruction and multiple sequence alignment
Yue, Feng; Shi, Jian; Tang, Jijun
2009-01-01
Background A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all the current multiple sequence alignment programs use a guide tree to produce the alignment and experiments showed that good guide trees can significantly improve the multiple alignment quality. Results We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can take simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method are compared with those from other popular multiple sequence alignment tools, including the widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method has good performance in terms of both phylogeny accuracy and alignment quality. Conclusion We present an algorithm to align multiple sequences and reconstruct the phylogenies that minimize the alignment score, which is based on an efficient algorithm to solve the median problems for three sequences. Our extensive experiments suggest that this method is very promising and can produce high quality phylogenies and alignments. PMID:19208110
Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York
USDA-ARS's Scientific Manuscript database
Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental Nicotiana benthamiana plants in upstate New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...
G4RNA: an RNA G-quadruplex database
Garant, Jean-Michel; Luce, Mikael J.; Scott, Michelle S.
2015-01-01
Abstract G-quadruplexes (G4) are tetrahelical structures formed from planar arrangement of guanines in nucleic acids. A simple, regular motif was originally proposed to describe G4-forming sequences. More recently, however, formation of G4 was discovered to depend, at least in part, on the contextual backdrop of neighboring sequences. Prediction of G4 folding is thus becoming more challenging as G4 outlier structures, not described by the originally proposed motif, are increasingly reported. Recent observations thus call for a comprehensive tool, capable of consolidating the expanding information on tested G4s, in order to conduct systematic comparative analyses of G4-promoting sequences. The G4RNA Database we propose was designed to help meet the need for easily-retrievable data on known RNA G4s. A user-friendly, flexible query system allows for data retrieval on experimentally tested sequences, from many separate genes, to assess G4-folding potential. Query output sorts data according to sequence position, G4 likelihood, experimental outcomes and associated bibliographical references. G4RNA also provides an ideal foundation to collect and store additional sequence and experimental data, considering the growing interest G4s currently generate. Database URL: scottgroup.med.usherbrooke.ca/G4RNA PMID:26200754
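The abstract does not spell out the "simple, regular motif". A commonly cited form is four runs of at least three guanines separated by loops of one to seven nucleotides, and the sketch below assumes that form. It is not the G4RNA query system; the function name is hypothetical.

```python
import re

# Assumed canonical motif: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}
G4_CANONICAL = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_canonical_g4(seq):
    """Return (start, end) spans of non-overlapping canonical-motif matches.

    DNA input is accepted by converting T to U before scanning.
    """
    rna = seq.upper().replace("T", "U")
    return [m.span() for m in G4_CANONICAL.finditer(rna)]
```

As the abstract notes, sequences outside this pattern can still fold into G4s, so a regular-expression scan of this kind is at best a first-pass filter.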
Stavrou, Elissaios; Yao, Yansun; Zaug, Joseph M; Bastea, Sorin; Kalkan, Bora; Konôpková, Zuzana; Kunz, Martin
2016-08-12
Magnesium chloride (MgCl2) with the rhombohedral layered CdCl2-type structure (α-MgCl2) has been studied experimentally using synchrotron angle-dispersive powder x-ray diffraction and Raman spectroscopy using a diamond-anvil cell up to 100 GPa at room temperature and theoretically using first-principles density functional calculations. The results reveal a pressure-induced second-order structural phase transition to a hexagonal layered CdI2-type structure (β-MgCl2) at 0.7 GPa: the stacking sequence of the Cl anions is altered, resulting in a reduction of the c-axis length. Theoretical calculations confirm this phase transition sequence and the calculated transition pressure is in excellent agreement with the experiment. Lattice dynamics calculations also reproduce the experimental Raman spectra measured for the ambient and high-pressure phases. According to our experimental results, MgCl2 remains in a 2D layered phase up to 100 GPa and, further, the 6-fold coordination of Mg cations is retained. Theoretical calculations of relative enthalpy suggest that this extensive pressure stability is due to a low enthalpy of the layered structure, ruling out kinetic barrier effects. This observation is unusual, as it contradicts the general structural behavior of highly compressed AB2 compounds.
Stavrou, Elissaios; Yao, Yansun; Zaug, Joseph M.; ...
2016-08-12
We studied magnesium chloride (MgCl2) with the rhombohedral layered CdCl2-type structure (α-MgCl2), experimentally, using synchrotron angle-dispersive powder x-ray diffraction and Raman spectroscopy using a diamond-anvil cell up to 100 GPa at room temperature and theoretically using first-principles density functional calculations. Our results reveal a pressure-induced second-order structural phase transition to a hexagonal layered CdI2-type structure (β-MgCl2) at 0.7 GPa: the stacking sequence of the Cl anions is altered, resulting in a reduction of the c-axis length. Theoretical calculations confirm this phase transition sequence and the calculated transition pressure is in excellent agreement with the experiment. Lattice dynamics calculations also reproduce the experimental Raman spectra measured for the ambient and high-pressure phases. According to our experimental results, MgCl2 remains in a 2D layered phase up to 100 GPa and, further, the 6-fold coordination of Mg cations is retained. Theoretical calculations of relative enthalpy suggest that this extensive pressure stability is due to a low enthalpy of the layered structure, ruling out kinetic barrier effects. Our observation is unusual, as it contradicts the general structural behavior of highly compressed AB2 compounds.
Schadt, Eric E; Edwards, Stephen W; GuhaThakurta, Debraj; Holder, Dan; Ying, Lisa; Svetnik, Vladimir; Leonardson, Amy; Hart, Kyle W; Russell, Archie; Li, Guoya; Cavet, Guy; Castle, John; McDonagh, Paul; Kan, Zhengyan; Chen, Ronghua; Kasarskis, Andrew; Margarint, Mihai; Caceres, Ramon M; Johnson, Jason M; Armour, Christopher D; Garrett-Engele, Philip W; Tsinoremas, Nicholas F; Shoemaker, Daniel D
2004-01-01
Background Computational and microarray-based experimental approaches were used to generate a comprehensive transcript index for the human genome. Oligonucleotide probes designed from approximately 50,000 known and predicted transcript sequences from the human genome were used to survey transcription from a diverse set of 60 tissues and cell lines using ink-jet microarrays. Further, expression activity over at least six conditions was more generally assessed using genomic tiling arrays consisting of probes tiled through a repeat-masked version of the genomic sequence making up chromosomes 20 and 22. Results The combination of microarray data with extensive genome annotations resulted in a set of 28,456 experimentally supported transcripts. This set of high-confidence transcripts represents the first experimentally driven annotation of the human genome. In addition, the results from genomic tiling suggest that a large amount of transcription exists outside of annotated regions of the genome and serves as an example of how this activity could be measured on a genome-wide scale. Conclusions These data represent one of the most comprehensive assessments of transcriptional activity in the human genome and provide an atlas of human gene expression over a unique set of gene predictions. Before the annotation of the human genome is considered complete, however, the previously unannotated transcriptional activity throughout the genome must be fully characterized. PMID:15461792
Protein Sequence Classification with Improved Extreme Learning Machine Algorithms
2014-01-01
Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensemble members. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms. PMID:24795876
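The majority-voting step that combines the ensemble members can be sketched as below. The individual network predictions are stand-in labels, not outputs of a trained ELM, and the function name is an assumption.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the category predicted by the most ensemble members.

    `predictions` holds one category label per network. Ties are not
    handled specially in this sketch.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]
```

With an odd number of members and two categories a tie cannot occur; for more categories a tie-breaking rule would have to be chosen, which the abstract does not specify.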
Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database
2017-01-01
Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and initial value sensitiveness, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack. PMID:28392799
Enhanced spatio-temporal alignment of plantar pressure image sequences using B-splines.
Oliveira, Francisco P M; Tavares, João Manuel R S
2013-03-01
This article presents an enhanced methodology to align plantar pressure image sequences simultaneously in time and space. The temporal alignment of the sequences is accomplished using B-splines in the time modeling, and the spatial alignment can be attained using several geometric transformation models. The methodology was tested on a dataset of 156 real plantar pressure image sequences (3 sequences for each foot of the 26 subjects) that was acquired using a common commercial plate during barefoot walking. In the alignment of image sequences that were synthetically deformed both in time and space, an outstanding accuracy was achieved with the cubic B-splines. This accuracy was significantly better (p < 0.001) than the one obtained using the best solution proposed in our previous work. When applied to align real image sequences with unknown transformation involved, the alignment based on cubic B-splines also achieved superior results than our previous methodology (p < 0.001). The consequences of the temporal alignment on the dynamic center of pressure (COP) displacement was also assessed by computing the intraclass correlation coefficients (ICC) before and after the temporal alignment of the three image sequence trials of each foot of the associated subject at six time instants. The results showed that, generally, the ICCs related to the medio-lateral COP displacement were greater when the sequences were temporally aligned than the ICCs of the original sequences. Based on the experimental findings, one can conclude that the cubic B-splines are a remarkable solution for the temporal alignment of plantar pressure image sequences. These findings also show that the temporal alignment can increase the consistency of the COP displacement on related acquired plantar pressure image sequences.
Radiation effects on MOS devices - dosimetry, annealing, irradiation sequence, and sources
NASA Technical Reports Server (NTRS)
Stassinopoulos, E. G.; Brucker, G. J.; Van Gunten, O.; Knudson, A. R.; Jordan, T. M.
1983-01-01
This paper reports on some investigations of dosimetry, annealing, irradiation sequences, and radioactive sources involved in the determination of radiation effects on MOS devices. Results show that agreement between the experimental and theoretical surface-to-average doses supports the use of thermo-luminescent dosimeters (manganese-activated calcium fluoride) in specifying the surface dose delivered to thin gate insulators of MOS devices. Annealing measurements indicate the existence of at least two energy levels, or activation energies, for recovery of soft oxide MOS devices after irradiation by electrons, protons, and gammas. Damage sensitivities of MOS devices were found to be independent of combinations and sequences of radiation type or energies. Comparison of various gamma sources indicated a small dependence of damage sensitivity on the Cobalt facility, but a more significant dependence in the case of the Cesium source. These results were attributed to differences in the spectral content of the several sources.
Vakil, Eli; Bloch, Ayala; Cohen, Haggar
2017-03-01
The serial reaction time (SRT) task has generated a very large amount of research. Nevertheless the debate continues as to the exact cognitive processes underlying implicit sequence learning. Thus, the first goal of this study is to elucidate the underlying cognitive processes enabling sequence acquisition. We therefore compared reaction time (RT) in sequence learning in a standard manual activated (MA) to that in an ocular activated (OA) version of the task, within a single experimental setting. The second goal is to use eye movement measures to compare anticipation, as an additional indication of sequence learning, between the two versions of the SRT. Performance of the group given the MA version of the task (n = 29) was compared with that of the group given the OA version (n = 30). The results showed that although overall, RT was faster for the OA group, the rate of sequence learning was similar to that of the MA group performing the standard version of the SRT. Because the stimulus-response association is automatic and exists prior to training in the OA task, the decreased reaction time in this version of the task reflects a purer measure of the sequence learning that occurs in the SRT task. The results of this study show that eye tracking anticipation can be measured directly and can serve as a direct measure of sequence learning. Finally, using the OA version of the SRT to study sequence learning presents a significant methodological contribution by making sequence learning studies possible among populations that struggle to perform manual responses.
Faridnasr, Maryam; Ghanbari, Bastam; Sassani, Ardavan
2016-05-01
A novel approach was applied for optimization of a moving-bed biofilm sequencing batch reactor (MBSBR) to treat sugar-industry wastewater (BOD5=500-2500 and COD=750-3750 mg/L) at 2-4 h of cycle time (CT). Although the experimental data showed that MBSBR reached high BOD5 and COD removal performances, it failed to achieve the standard limits at the mentioned CTs. Thus, optimization of the reactor was rendered by kinetic computational modeling and using statistical error indicator normalized root mean square error (NRMSE). The results of NRMSE revealed that Stover-Kincannon (error=6.40%) and Grau (error=6.15%) models provide better fits to the experimental data and may be used for CT optimization in the reactor. The models predicted required CTs of 4.5, 6.5, 7 and 7.5 h for effluent standardization of 500, 1000, 1500 and 2500 mg/L influent BOD5 concentrations, respectively. Similar pattern of the experimental data also confirmed these findings. Copyright © 2016 Elsevier Ltd. All rights reserved.
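The error indicator used to rank the kinetic models can be sketched as follows. Normalizing the RMSE by the range of the observed values is an assumption, since the abstract does not state which normalization was used; the function name is also hypothetical.

```python
import math


def nrmse(observed, predicted):
    """Root mean square error divided by the range of the observations.

    Range normalization is assumed; dividing by the mean is another
    common convention and gives different numbers.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    spread = max(observed) - min(observed)
    if spread == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / spread
```

Multiplying the result by 100 gives a percentage comparable in form to the errors quoted above.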
Peptide de novo sequencing of mixture tandem mass spectra.
Gorshkov, Vladimir; Hotta, Stéphanie Yuki Kolbeck; Verano-Braga, Thiago; Kjeldsen, Frank
2016-09-01
The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance using database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co-isolation and thus prone to false identifications. The deconvolution approach matched complementary b-, y-ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence specific fragment ions of each co-isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20-35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than those where improvement was obtained by mass spectral deconvolution. Tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
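The complementary-ion idea can be illustrated with a small sketch. For singly protonated fragments, a b-ion and its complementary y-ion have m/z values that sum to the neutral peptide mass plus two proton masses. The code below only searches a peak list for such pairs; it is not the authors' deconvolution method, and the function name, the tolerance and the example masses are assumptions.

```python
PROTON_MASS = 1.007276  # Da, approximate mass of a proton


def complementary_pairs(peaks, neutral_mass, tol=0.01):
    """Find pairs of singly charged fragment m/z values that are complementary.

    A pair (b, y) qualifies when b + y is within `tol` of
    neutral_mass + 2 * PROTON_MASS. Uses a two-pointer scan over the
    sorted peak list.
    """
    target = neutral_mass + 2 * PROTON_MASS
    mz = sorted(peaks)
    pairs = []
    lo, hi = 0, len(mz) - 1
    while lo < hi:
        total = mz[lo] + mz[hi]
        if abs(total - target) <= tol:
            pairs.append((mz[lo], mz[hi]))
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return pairs
```

Running the search once per co-isolated precursor mass assigns fragment pairs to each precursor, which is the basis for building one virtual spectrum per peptide.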
Mashiyama, Susan T.; Koupparis, Kyriacos; Caffrey, Conor R.; McKerrow, James H.; Babbitt, Patricia C.
2012-01-01
We performed a genome-level computational study of sequence and structure similarity, the latter using crystal structures and models, of the proteases of Homo sapiens and the human parasite Trypanosoma brucei. Using sequence and structure similarity networks to summarize the results, we constructed global views that show visually the relative abundance and variety of proteases in the degradome landscapes of these two species, and provide insights into evolutionary relationships between proteases. The results also indicate how broadly these sequence sets are covered by three-dimensional structures. These views facilitate cross-species comparisons and offer clues for drug design from knowledge about the sequences and structures of potential drug targets and their homologs. Two protease groups (“M32” and “C51”) that are very different in sequence from human proteases are examined in structural detail, illustrating the application of this global approach in mining new pathogen genomes for potential drug targets. Based on our analyses, a human ACE2 inhibitor was selected for experimental testing on one of these parasite proteases, TbM32, and was shown to inhibit it. These sequence and structure data, along with interactive versions of the protein similarity networks generated in this study, are available at http://babbittlab.ucsf.edu/resources.html. PMID:23236535
Robust dynamical decoupling for quantum computing and quantum memory.
Souza, Alexandre M; Alvarez, Gonzalo A; Suter, Dieter
2011-06-17
Dynamical decoupling (DD) is a popular technique for protecting qubits from the environment. However, unless special care is taken, experimental errors in the control pulses used in this technique can destroy the quantum information instead of preserving it. Here, we investigate techniques for making DD sequences robust against different types of experimental errors while retaining good decoupling efficiency in a fluctuating environment. We present experimental data from solid-state nuclear spin qubits and introduce a new DD sequence that is suitable for quantum computing and quantum memory.
Thermoelectric effect and its dependence on molecular length and sequence in single DNA molecules.
Li, Yueqi; Xiang, Limin; Palma, Julio L; Asai, Yoshihiro; Tao, Nongjian
2016-04-15
Studying the thermoelectric effect in DNA is important for unravelling charge transport mechanisms and for developing relevant applications of DNA molecules. Here we report a study of the thermoelectric effect in single DNA molecules. By varying the molecular length and sequence, we tune the charge transport in DNA to either a hopping- or a tunnelling-dominated regime. The thermoelectric effect is small and insensitive to the molecular length in the hopping regime. In contrast, the thermoelectric effect is large and sensitive to the length in the tunnelling regime. These findings indicate that one may control the thermoelectric effect in DNA by varying its sequence and length. We describe the experimental results in terms of hopping and tunnelling charge transport models.
Thermoelectric effect and its dependence on molecular length and sequence in single DNA molecules
Li, Yueqi; Xiang, Limin; Palma, Julio L.; Asai, Yoshihiro; Tao, Nongjian
2016-01-01
Studying the thermoelectric effect in DNA is important for unravelling charge transport mechanisms and for developing relevant applications of DNA molecules. Here we report a study of the thermoelectric effect in single DNA molecules. By varying the molecular length and sequence, we tune the charge transport in DNA to either a hopping- or a tunnelling-dominated regime. The thermoelectric effect is small and insensitive to the molecular length in the hopping regime. In contrast, the thermoelectric effect is large and sensitive to the length in the tunnelling regime. These findings indicate that one may control the thermoelectric effect in DNA by varying its sequence and length. We describe the experimental results in terms of hopping and tunnelling charge transport models. PMID:27079152
Experimental design and quantitative analysis of microbial community multiomics.
Mallick, Himel; Ma, Siyuan; Franzosa, Eric A; Vatanen, Tommi; Morgan, Xochitl C; Huttenhower, Curtis
2017-11-30
Studies of the microbiome have become increasingly sophisticated, and multiple sequence-based, molecular methods as well as culture-based methods exist for population-scale microbiome profiles. To link the resulting host and microbial data types to human health, several experimental design considerations, data analysis challenges, and statistical epidemiological approaches must be addressed. Here, we survey current best practices for experimental design in microbiome molecular epidemiology, including technologies for generating, analyzing, and integrating microbiome multiomics data. We highlight studies that have identified molecular bioactives that influence human health, and we suggest steps for scaling translational microbiome research to high-throughput target discovery across large populations.
Simultaneous digital super-resolution and nonuniformity correction for infrared imaging systems.
Meza, Pablo; Machuca, Guillermo; Torres, Sergio; Martin, Cesar San; Vera, Esteban
2015-07-20
In this article, we present a novel algorithm to achieve simultaneous digital super-resolution and nonuniformity correction from a sequence of infrared images. We propose to use spatial regularization terms that exploit nonlocal means and the absence of spatial correlation between the scene and the nonuniformity noise sources. We derive an iterative optimization algorithm based on a gradient descent minimization strategy. Results from infrared image sequences corrupted with simulated and real fixed-pattern noise show a competitive performance compared with state-of-the-art methods. A qualitative analysis on the experimental results obtained with images from a variety of infrared cameras indicates that the proposed method provides super-resolution images with significantly less fixed-pattern noise.
CT Image Sequence Restoration Based on Sparse and Low-Rank Decomposition
Gou, Shuiping; Wang, Yueyue; Wang, Zhilong; Peng, Yong; Zhang, Xiaopeng; Jiao, Licheng; Wu, Jianshe
2013-01-01
Blurry organ boundaries and soft tissue structures present a major challenge in biomedical image restoration. In this paper, we propose a low-rank decomposition-based method for computed tomography (CT) image sequence restoration, where the CT image sequence is decomposed into a sparse component and a low-rank component. A Wiener filter with a new point spread function (PSF) is employed to efficiently remove blur in the sparse component; Wiener filtering with a Gaussian PSF is used to recover the average image of the low-rank component. We then obtain the recovered CT image sequence by combining the recovered low-rank image with the recovered sparse image sequence. Our method achieves restoration results with higher contrast, sharper organ boundaries and richer soft tissue structure information, compared with existing CT image restoration methods. The robustness of our method was assessed with numerical experiments using three different low-rank models: Robust Principal Component Analysis (RPCA), Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) and Go Decomposition (GoDec). Experimental results demonstrated that the RPCA model was the most suitable for CT images with little noise, whereas the GoDec model was the best for heavily noisy CT images. PMID:24023764
Predicting RNA pseudoknot folding thermodynamics
Cao, Song; Chen, Shi-Jie
2006-01-01
Based on the experimentally determined atomic coordinates for RNA helices and the self-avoiding walks of the P (phosphate) and C4 (carbon) atoms in the diamond lattice for the polynucleotide loop conformations, we derive a set of conformational entropy parameters for RNA pseudoknots. Based on the entropy parameters, we develop a folding thermodynamics model that enables us to compute the sequence-specific RNA pseudoknot folding free energy landscape and thermodynamics. The model is validated through extensive experimental tests both for the native structures and for the folding thermodynamics. The model predicts strong sequence-dependent helix-loop competitions in the pseudoknot stability and the resultant conformational switches between different hairpin and pseudoknot structures. For instance, for the pseudoknot domain of human telomerase RNA, a native-like and a misfolded hairpin intermediate are found to coexist on the (equilibrium) folding pathways, and the interplay between the stabilities of these intermediates causes the conformational switch that may underlie a human telomerase disease. PMID:16709732
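The link between a folding free energy landscape and the coexistence of competing conformations can be illustrated with a short Boltzmann-population calculation. This is a minimal sketch, not the authors' model: the state names and free energy values below are invented for illustration, and the function `populations` is a hypothetical helper.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def populations(free_energies, temperature_k):
    """Equilibrium population of each conformation from its free energy (kcal/mol)."""
    beta = 1.0 / (R_KCAL * temperature_k)
    g_min = min(free_energies.values())  # shift energies for numerical stability
    weights = {name: math.exp(-beta * (g - g_min))
               for name, g in free_energies.items()}
    z = sum(weights.values())  # partition function (relative to g_min)
    return {name: w / z for name, w in weights.items()}

# Toy free energies in kcal/mol (assumed values, not taken from the paper)
toy_states = {
    "pseudoknot": -10.0,
    "hairpin_native_like": -9.5,
    "hairpin_misfolded": -9.5,
}
pops = populations(toy_states, temperature_k=310.15)
for name, p in pops.items():
    print(f"{name}: {p:.3f}")
```

Two states with equal free energy receive equal populations, which is the sense in which intermediates of similar stability can coexist at equilibrium.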
Soler, Miguel A; de Marco, Ario; Fortuna, Sara
2016-10-10
Nanobodies (VHHs) have proved to be valuable substitutes for conventional antibodies for molecular recognition. Their small size represents a precious advantage for rational mutagenesis based on modelling. Here we address the problem of predicting how Camelidae nanobody sequences can tolerate mutations by developing a simulation protocol based on all-atom molecular dynamics and whole-molecule docking. The method was tested on two sets of nanobodies characterized experimentally for their biophysical features. One set contained point mutations introduced to humanize a wild type sequence; in the second, the CDRs were swapped between single-domain frameworks with Camelidae and human hallmarks. The method resulted in accurate scoring approaches to predict experimental yields and made it possible to identify the structural modifications induced by mutations. This work is a promising tool for the in silico development of single-domain antibodies and opens the opportunity to customize single functional domains of larger macromolecules.
NASA Astrophysics Data System (ADS)
Soler, Miguel A.; De Marco, Ario; Fortuna, Sara
2016-10-01
Nanobodies (VHHs) have proved to be valuable substitutes for conventional antibodies for molecular recognition. Their small size represents a precious advantage for rational mutagenesis based on modelling. Here we address the problem of predicting how Camelidae nanobody sequences can tolerate mutations by developing a simulation protocol based on all-atom molecular dynamics and whole-molecule docking. The method was tested on two sets of nanobodies characterized experimentally for their biophysical features. One set contained point mutations introduced to humanize a wild type sequence; in the second, the CDRs were swapped between single-domain frameworks with Camelidae and human hallmarks. The method resulted in accurate scoring approaches to predict experimental yields and made it possible to identify the structural modifications induced by mutations. This work is a promising tool for the in silico development of single-domain antibodies and opens the opportunity to customize single functional domains of larger macromolecules.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Weizhao; Zhang, Zixuan; Lu, Jie
Carbon fiber composites have received growing attention because of their high performance. One economical method of manufacturing composite parts is a sequence of forming followed by compression molding. In this sequence, the preforming procedure forms the prepreg, which is the composite with the uncured resin, to the product geometry, while the molding process cures the resin. Slip between different prepreg layers is observed in the preforming step, and this paper reports a method to characterize the properties of the interaction between different prepreg layers, which is critical to predictive modeling and design optimization. An experimental setup was established to evaluate the interactions at various industrial production conditions. The experimental results were analyzed for an in-depth understanding of how the temperature, the relative sliding speed, and the fiber orientation affect the tangential interaction between two prepreg layers. The interaction factors measured from these experiments will be implemented in the computational preforming program.
Bomboi, Francesca; Romano, Flavio; Leo, Manuela; Fernandez-Castanon, Javier; Cerbino, Roberto; Bellini, Tommaso; Bordi, Federico; Filetici, Patrizia; Sciortino, Francesco
2016-01-01
DNA is acquiring a primary role in material development, self-assembling by design into complex supramolecular aggregates, the building block of a new-materials world. Using DNA nanoconstructs to translate sophisticated theoretical intuitions into experimental realizations by closely matching idealized models of colloidal particles is a much less explored avenue. Here we experimentally show that an appropriate selection of competing interactions enciphered in multiple DNA sequences results in the successful design of a one-pot DNA hydrogel that melts both on heating and on cooling. The relaxation time, measured by light scattering, slows down dramatically in a limited window of temperatures. The phase diagram displays a peculiar re-entrant shape, the hallmark of the competition between different bonding patterns. Our study shows that it is possible to rationally design biocompatible bulk materials with unconventional phase diagrams and tuneable properties by encoding into DNA sequences both the particle shape and the physics of the collective response. PMID:27767029
Integration, warehousing, and analysis strategies of Omics data.
Gedela, Srinubabu
2011-01-01
"-Omics" is a current suffix for numerous types of large-scale biological data generation procedures, which naturally demand the development of novel algorithms for data storage and analysis. With next generation genome sequencing burgeoning, it is pivotal to decipher a coding site on the genome, a gene's function, and information on transcripts next to the pure availability of sequence information. To explore a genome and downstream molecular processes, we need umpteen results at the various levels of cellular organization by utilizing different experimental designs, data analysis strategies and methodologies. Here comes the need for controlled vocabularies and data integration to annotate, store, and update the flow of experimental data. This chapter explores key methodologies to merge Omics data by semantic data carriers, discusses controlled vocabularies as eXtensible Markup Languages (XML), and provides practical guidance, databases, and software links supporting the integration of Omics data.
Kawano, Yasuhiro; Neeley, Shane; Adachi, Kei; Nakai, Hiroyuki
2013-01-01
Overlapping open reading frames (ORFs) in viral genomes undergo co-evolution; however, how individual amino acids coded by overlapping ORFs are structurally, functionally, and co-evolutionarily constrained remains difficult to address by conventional homologous sequence alignment approaches. We report here a new experimental and computational evolution-based methodology to address this question and report its preliminary application to elucidating a mode of co-evolution of the frame-shifted overlapping ORFs in the adeno-associated virus (AAV) serotype 2 viral genome. These ORFs encode both capsid VP protein and non-structural assembly-activating protein (AAP). To show proof of principle of the new method, we focused on the evolutionarily conserved QVKEVTQ and KSKRSRR motifs, a pair of overlapping heptapeptides in VP and AAP, respectively. In the new method, we first identified a large number of capsid-forming VP3 mutants and functionally competent AAP mutants of these motifs from mutant libraries by experimental directed evolution under no co-evolutionary constraints. We used Illumina sequencing to obtain a large dataset and then statistically assessed the viability of VP and AAP heptapeptide mutants. The obtained heptapeptide information was then integrated into an evolutionary algorithm, with which VP and AAP were co-evolved from random or native nucleotide sequences in silico. As a result, we demonstrate that these two heptapeptide motifs could exhibit high degeneracy if coded by separate nucleotide sequences, and elucidate how overlap-evoked co-evolutionary constraints play a role in making the VP and AAP heptapeptide sequences into the present shape. 
Specifically, we demonstrate that two valine (V) residues and β-strand propensity in QVKEVTQ are structurally important, the strongly negative and hydrophilic nature of KSKRSRR is functionally important, and overlap-evoked co-evolution imposes strong constraints on serine (S) residues in KSKRSRR, despite high degeneracy of the motifs in the absence of co-evolutionary constraints.
SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data
Dotu, Ivan; Adamson, Scott I.; Coleman, Benjamin; Fournier, Cyril; Ricart-Altimiras, Emma; Eyras, Eduardo
2018-01-01
RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. PMID:29596423
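The V-measure used above to score the clustering is the harmonic mean of homogeneity and completeness, both defined through conditional entropies of the reference labels and the cluster assignments. The following is a minimal sketch of that standard definition (with equal weighting of the two terms); the function names are illustrative and nothing here is taken from the SARNAclust code.

```python
import math
from collections import Counter

def _entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution given its counts."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness."""
    n = len(true_labels)
    class_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)
    joint = Counter(zip(true_labels, cluster_labels))

    h_c = _entropy(class_counts.values(), n)
    h_k = _entropy(cluster_counts.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the joint counts
    h_c_given_k = -sum((nck / n) * math.log(nck / cluster_counts[k])
                       for (c, k), nck in joint.items())
    h_k_given_c = -sum((nck / n) * math.log(nck / class_counts[c])
                       for (c, k), nck in joint.items())

    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2.0 * homogeneity * completeness / (homogeneity + completeness)

# Cluster names do not matter, only the grouping does
perfect = v_measure([0, 0, 1, 1], ["b", "b", "a", "a"])
merged = v_measure([0, 0, 1, 1], ["x", "x", "x", "x"])
```

A score near 1 means each cluster contains a single motif and each motif falls into a single cluster; lumping everything into one cluster is complete but not homogeneous, so the score drops to 0.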
Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty.
Eick, Geeta N; Bridgham, Jamie T; Anderson, Douglas P; Harms, Michael J; Thornton, Joseph W
2017-02-01
Hypotheses about the functions of ancient proteins and the effects of historical mutations on them are often tested using ancestral protein reconstruction (APR)-phylogenetic inference of ancestral sequences followed by synthesis and experimental characterization. Usually, some sequence sites are ambiguously reconstructed, with two or more statistically plausible states. The extent to which the inferred functions and mutational effects are robust to uncertainty about the ancestral sequence has not been studied systematically. To address this issue, we reconstructed ancestral proteins in three domain families that have different functions, architectures, and degrees of uncertainty; we then experimentally characterized the functional robustness of these proteins when uncertainty was incorporated using several approaches, including sampling amino acid states from the posterior distribution at each site and incorporating the alternative amino acid state at every ambiguous site in the sequence into a single "worst plausible case" protein. In every case, qualitative conclusions about the ancestral proteins' functions and the effects of key historical mutations were robust to sequence uncertainty, with similar functions observed even when scores of alternate amino acids were incorporated. There was some variation in quantitative descriptors of function among plausible sequences, suggesting that experimentally characterizing robustness is particularly important when quantitative estimates of ancient biochemical parameters are desired. The worst plausible case method appears to provide an efficient strategy for characterizing the functional robustness of ancestral proteins to large amounts of sequence uncertainty. Sampling from the posterior distribution sometimes produced artifactually nonfunctional proteins for sequences reconstructed with substantial ambiguity. © The Author 2016. 
Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
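The three ways of handling reconstruction uncertainty described above (most probable sequence, a single "worst plausible case" sequence, and sampling from the posterior) can be sketched as follows. This is an illustration under stated assumptions: the posterior table is a toy example, the function names are hypothetical, and the 0.2 cutoff for calling a site ambiguous is an assumed value rather than one given in the abstract.

```python
import random

def ml_sequence(posteriors):
    """Most probable amino acid at every site."""
    return "".join(max(site, key=site.get) for site in posteriors)

def alt_all_sequence(posteriors, threshold=0.2):
    """Second-best state at every ambiguous site, best state elsewhere.

    A site counts as ambiguous when its second-best state has posterior
    probability >= threshold (assumed cutoff).
    """
    residues = []
    for site in posteriors:
        ranked = sorted(site, key=site.get, reverse=True)
        if len(ranked) > 1 and site[ranked[1]] >= threshold:
            residues.append(ranked[1])
        else:
            residues.append(ranked[0])
    return "".join(residues)

def sample_sequence(posteriors, rng):
    """Draw one state per site in proportion to its posterior probability."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()), k=1)[0]
        for site in posteriors
    )

# Toy per-site posterior probabilities (invented for illustration)
toy_posteriors = [
    {"M": 1.0},
    {"K": 0.70, "R": 0.30},
    {"L": 0.95, "I": 0.05},
    {"S": 0.55, "T": 0.45},
]
print(ml_sequence(toy_posteriors))       # most probable sequence
print(alt_all_sequence(toy_posteriors))  # "worst plausible case" sequence
print(sample_sequence(toy_posteriors, random.Random(1)))
```

Note that sampling can pick low-probability states at many sites at once, which is one way to read the abstract's remark that sampled sequences were sometimes artifactually nonfunctional.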
Real-time UAV trajectory generation using feature points matching between video image sequences
NASA Astrophysics Data System (ADS)
Byun, Younggi; Song, Jeongheon; Han, Dongyeob
2017-09-01
Unmanned aerial vehicles (UAVs), equipped with navigation systems and video capability, are currently being deployed for intelligence, reconnaissance and surveillance missions. In this paper, we present a systematic approach for generating a UAV trajectory using a video image matching system based on SURF (Speeded-Up Robust Features) and Preemptive RANSAC (Random Sample Consensus). Video image matching to find matching points is one of the most important steps for the accurate generation of the UAV trajectory (a sequence of poses in 3D space). We used the SURF algorithm to find the matching points between video image sequences, and removed mismatches using Preemptive RANSAC, which divides all matching points into outliers and inliers. Only the inliers are used to determine the epipolar geometry for estimating the relative pose (rotation and translation) between image sequences. Experimental results from simulated video image sequences showed that our approach has good potential to be applied to the automatic geo-localization of UAV systems.
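The inlier/outlier split performed by RANSAC can be shown with a deliberately simplified sketch. Two substitutions are made here and should be read as assumptions: plain RANSAC is used instead of the preemptive variant, and the fitted model is a 2D translation between matched points rather than the epipolar geometry used in the paper. The function name and the toy matches are hypothetical.

```python
import random

def ransac_translation(matches, tol=1.0, iterations=200, seed=0):
    """Estimate a 2D translation from point matches and flag the inliers.

    matches: list of ((x1, y1), (x2, y2)) pairs.
    Returns (translation, inlier_indices).
    """
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(iterations):
        # Minimal sample: a single match fully determines a translation
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        # Consensus set: matches that agree with this hypothesis within tol
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if abs(bx - ax - tx) <= tol and abs(by - ay - ty) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = (tx, ty), inliers
    return best_t, best_inliers

# Six consistent matches (shift of +5, -2) and two gross mismatches
good = [((x, y), (x + 5.0, y - 2.0))
        for x, y in [(0, 0), (10, 3), (4, 8), (7, 7), (2, 9), (6, 1)]]
bad = [((1, 1), (40.0, 40.0)), ((3, 5), (-20.0, 9.0))]
translation, inliers = ransac_translation(good + bad)
```

In a pose-estimation pipeline the same loop structure applies, with the translation hypothesis replaced by an essential or fundamental matrix estimated from a larger minimal sample.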
Effects of sequence on DNA wrapping around histones
NASA Astrophysics Data System (ADS)
Ortiz, Vanessa
2011-03-01
A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we have now included histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).
Computational modeling of RNA 3D structures, with the aid of experimental restraints
Magnus, Marcin; Matelska, Dorota; Łach, Grzegorz; Chojnowski, Grzegorz; Boniecki, Michal J; Purta, Elzbieta; Dawson, Wayne; Dunin-Horkawicz, Stanislaw; Bujnicki, Janusz M
2014-01-01
In addition to mRNAs whose primary function is transmission of genetic information from DNA to proteins, numerous other classes of RNA molecules exist, which are involved in a variety of functions, such as catalyzing biochemical reactions or performing regulatory roles. In analogy to proteins, the function of RNAs depends on their structure and dynamics, which are largely determined by the ribonucleotide sequence. Experimental determination of high-resolution RNA structures is both laborious and difficult, and therefore, the majority of known RNAs remain structurally uncharacterized. To address this problem, computational structure prediction methods were developed that simulate either the physical process of RNA structure formation (“Greek science” approach) or utilize information derived from known structures of other RNA molecules (“Babylonian science” approach). All computational methods suffer from various limitations that make them generally unreliable for structure prediction of long RNA sequences. However, in many cases, the limitations of computational and experimental methods can be overcome by combining these two complementary approaches with each other. In this work, we review computational approaches for RNA structure prediction, with emphasis on implementations (particular programs) that can utilize restraints derived from experimental analyses. We also list experimental approaches, whose results can be relatively easily used by computational methods. Finally, we describe case studies where computational and experimental analyses were successfully combined to determine RNA structures that would remain out of reach for each of these approaches applied separately. PMID:24785264
Adaptive precompensators for flexible-link manipulator control
NASA Technical Reports Server (NTRS)
Tzes, Anthony P.; Yurkovich, Stephen
1989-01-01
The application of input precompensators to flexible manipulators is considered. Frequency domain compensators color the input around the flexible mode locations, resulting in a bandstop or notch filter in cascade with the system. Time domain compensators apply a sequence of impulses at prespecified times related to the modal frequencies. The resulting control corresponds to a feedforward term that convolves in real-time the desired reference input with a sequence of impulses and produces a vibration-free output. An adaptive precompensator can be implemented by adding a frequency domain identification scheme, which estimates the modal frequencies online and subsequently updates the bandstop interval or the spacing between the impulses. The combined adaptive input preshaping scheme provides the most rapid slew that results in a vibration-free output. Experimental results are presented to verify the approach.
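The time domain idea, convolving the reference with impulses spaced according to a modal frequency, can be sketched with the textbook two-impulse zero-vibration (ZV) shaper. This is an assumption for illustration: the abstract does not say which impulse sequence the authors use, and the function names and the 2 Hz mode are hypothetical.

```python
import math

def zv_shaper(freq_hz, damping=0.0):
    """Two-impulse zero-vibration shaper as [(time_s, amplitude), ...]."""
    omega_n = 2.0 * math.pi * freq_hz
    root = math.sqrt(1.0 - damping ** 2)
    omega_d = omega_n * root                      # damped natural frequency
    k = math.exp(-damping * math.pi / root)
    # Second impulse arrives half a damped period after the first
    return [(0.0, 1.0 / (1.0 + k)), (math.pi / omega_d, k / (1.0 + k))]

def shape(reference, impulses, dt):
    """Convolve a sampled reference signal with the impulse sequence."""
    out = [0.0] * len(reference)
    for t_i, a_i in impulses:
        delay = int(round(t_i / dt))
        for n in range(delay, len(reference)):
            out[n] += a_i * reference[n - delay]
    return out

# Undamped 2 Hz mode (assumed): impulses of 0.5 at t = 0 s and t = 0.25 s
impulses = zv_shaper(freq_hz=2.0)
shaped = shape([1.0] * 10, impulses, dt=0.05)  # unit step becomes a two-stage staircase
```

In an adaptive setting, `zv_shaper` would simply be called again whenever the online identification scheme updates the frequency estimate, which changes the spacing between the impulses.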
Wavelet Fusion for Concealed Object Detection Using Passive Millimeter Wave Sequence Images
NASA Astrophysics Data System (ADS)
Chen, Y.; Pang, L.; Liu, H.; Xu, X.
2018-04-01
Passive millimeter wave (PMMW) imaging systems can create interpretable imagery of objects concealed under clothing, which is a great advantage for security screening. This paper addresses wavelet fusion for detecting concealed objects in PMMW sequence images. First, based on the characteristics and storage methods of images acquired by a real-time PMMW imager, the sum of squared differences (SSD) is used as the image-correlation parameter to screen the sequence images. Second, the selected images are optimized using a wavelet fusion algorithm. Finally, the concealed objects are detected by mean filtering, threshold segmentation and edge detection. The experimental results show that this method improves the detection of concealed objects by selecting the most relevant images from PMMW sequence images and using wavelet fusion to enhance the information about the concealed objects. The method can be effectively applied to detecting objects concealed on the human body in millimeter wave video.
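The screening step can be sketched as ranking frames by their sum of squared differences to a reference frame and keeping the closest ones. How the paper chooses the reference and the number of frames kept is not stated, so both are assumptions here, as are the function names and the toy 2x2 frames.

```python
def ssd(frame_a, frame_b):
    """Sum of squared pixel differences between two equally sized frames."""
    return sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
    )

def screen_frames(frames, reference_index=0, keep=2):
    """Indices of the reference frame plus the `keep` frames most similar to it."""
    ref = frames[reference_index]
    scored = sorted(
        (ssd(ref, frame), i)
        for i, frame in enumerate(frames)
        if i != reference_index
    )
    return [reference_index] + [i for _, i in scored[:keep]]

# Toy 2x2 "frames" (invented values); frame 2 differs strongly from the rest
frames = [
    [[10, 10], [10, 10]],
    [[10, 11], [10, 10]],
    [[50, 50], [50, 50]],
    [[9, 10], [10, 12]],
]
selected = screen_frames(frames)
```

The selected frames would then be passed to the fusion stage; frames with a large SSD, such as frame 2 here, are left out.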
Motion Estimation Using the Firefly Algorithm in Ultrasonic Image Sequence of Soft Tissue
Chao, Chih-Feng; Horng, Ming-Huwi; Chen, Yu-Chan
2015-01-01
Ultrasonic image sequences of soft tissue are widely used in disease diagnosis; however, speckle noise usually degrades the image quality, and these images typically have a low signal-to-noise ratio. As a result, traditional motion estimation algorithms are not well suited to measuring the motion vectors. In this paper, a new motion estimation algorithm is developed for assessing the velocity field of soft tissue in a sequence of ultrasonic B-mode images. The proposed iterative firefly algorithm (IFA) searches only a few candidate points to obtain the optimal motion vector, and is compared to the traditional iterative full search algorithm (IFSA) in a series of experiments on in vivo ultrasonic image sequences. The experimental results show that the IFA can estimate the motion vector more efficiently than the traditional IFSA method, with almost equal estimation quality. PMID:25873987
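For context, the exhaustive baseline that a candidate-based search tries to avoid can be sketched as a single-pass full-search block matcher. This is not the authors' IFSA or IFA: the cost function (squared differences), block size, search radius and toy frames are all assumptions, and a firefly-style search would evaluate only a subset of the displacements visited here.

```python
def block_cost(prev, curr, top, left, dy, dx, size):
    """Squared-difference cost of matching a block in prev to a shifted block in curr."""
    return sum(
        (prev[top + r][left + c] - curr[top + dy + r][left + dx + c]) ** 2
        for r in range(size)
        for c in range(size)
    )

def full_search(prev, curr, top, left, size, radius):
    """Try every displacement within the radius and return the best (dy, dx)."""
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            inside = (0 <= top + dy and top + dy + size <= len(curr)
                      and 0 <= left + dx and left + dx + size <= len(curr[0]))
            if not inside:
                continue  # skip displacements that leave the frame
            cost = block_cost(prev, curr, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

def make_frame(patch_top, patch_left, n=8):
    """Toy frame: a bright 2x2 patch on a dark background."""
    return [[100 if patch_top <= r < patch_top + 2 and patch_left <= c < patch_left + 2 else 0
             for c in range(n)] for r in range(n)]

prev_frame = make_frame(3, 3)
curr_frame = make_frame(4, 5)   # patch moved down 1 row and right 2 columns
motion = full_search(prev_frame, curr_frame, top=3, left=3, size=2, radius=2)
```

With a radius of 2 the full search evaluates up to 25 displacements per block, which is the cost a sparse candidate search aims to reduce.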
Motion estimation using the firefly algorithm in ultrasonic image sequence of soft tissue.
Chao, Chih-Feng; Horng, Ming-Huwi; Chen, Yu-Chan
2015-01-01
Ultrasonic image sequences of soft tissue are widely used in disease diagnosis; however, speckle noise usually degrades the image quality, and these images typically have a low signal-to-noise ratio. As a result, traditional motion estimation algorithms are not well suited to measuring the motion vectors. In this paper, a new motion estimation algorithm is developed for assessing the velocity field of soft tissue in a sequence of ultrasonic B-mode images. The proposed iterative firefly algorithm (IFA) searches only a few candidate points to obtain the optimal motion vector, and is compared to the traditional iterative full search algorithm (IFSA) in a series of experiments on in vivo ultrasonic image sequences. The experimental results show that the IFA can estimate the motion vector more efficiently than the traditional IFSA method, with almost equal estimation quality.
Denoising time-resolved microscopy image sequences with singular value thresholding.
Furnival, Tom; Leary, Rowan K; Midgley, Paul A
2017-07-01
Time-resolved imaging in microscopy is important for the direct observation of a range of dynamic processes in both the physical and life sciences. However, the image sequences are often corrupted by noise, either as a result of high frame rates or a need to limit the radiation dose received by the sample. Here we exploit both spatial and temporal correlations using low-rank matrix recovery methods to denoise microscopy image sequences. We also make use of an unbiased risk estimator to address the issue of how much thresholding to apply in a robust and automated manner. The performance of the technique is demonstrated using simulated image sequences, as well as experimental scanning transmission electron microscopy data, where surface adatom motion and nanoparticle structural dynamics are recovered at rates of up to 32 frames per second. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Bioinformatic Analysis of the Contribution of Primer Sequences to Aptamer Structures
Ellington, Andrew D.
2009-01-01
Aptamers are nucleic acid molecules selected in vitro to bind a particular ligand. While numerous experimental studies have examined the sequences, structures, and functions of individual aptamers, considerably fewer studies have applied bioinformatics approaches to try to infer more general principles from these individual studies. We have used a large Aptamer Database to parse the contributions of both random and constant regions to the secondary structures of more than 2000 aptamers. We find that the constant, primer-binding regions do not, in general, contribute significantly to aptamer structures. These results suggest that (a) binding function is not contributed to nor constrained by constant regions; (b) in consequence, the landscape of functional binding sequences is sparse but robust, favoring scenarios for short, functional nucleic acid sequences near origins; and (c) many pool designs for the selection of aptamers are likely to prove robust. PMID:18594898
Mapping Base Modifications in DNA by Transverse-Current Sequencing
NASA Astrophysics Data System (ADS)
Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.
2018-02-01
Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.
Structure and Sequence Search on Aptamer-Protein Docking
NASA Astrophysics Data System (ADS)
Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie
2015-03-01
Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in living systems, especially through gene regulation. Moreover, short nucleic acid sequences (aptamers) with specific binding affinity for particular proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific aptamer sequences can be selected that bind specific proteins. Computationally, given the experimentally determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer binding, we use GPU-enabled molecular dynamics simulations both to examine the conformational flexibility of the aptamer in the absence of binding to thrombin and to determine our ability to fold an aptamer. This study should help further de novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.
De novo peptide sequencing using CID and HCD spectra pairs.
Yan, Yan; Kusalik, Anthony J; Wu, Fang-Xiang
2016-10-01
In tandem mass spectrometry (MS/MS), several fragmentation techniques are available, including collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), electron-capture dissociation (ECD), and electron transfer dissociation (ETD). When using pairs of spectra for de novo peptide sequencing, the most popular methods are designed for CID (or HCD) and ECD (or ETD) spectra because of the complementarity between them. Less attention has been paid to the use of CID and HCD spectra pairs. In this study, a new de novo peptide sequencing method is proposed for these spectra pairs. This method includes a CID and HCD spectra merging criterion and a parent mass correction step, along with improvements to our previously proposed algorithm for sequencing merged spectra. Three pairs of spectral datasets were used to investigate and compare the performance of the proposed method with other existing methods designed for single spectrum (HCD or CID) sequencing. Experimental results showed that full-length peptide sequencing accuracy was increased significantly by using spectra pairs in the proposed method, with the highest accuracy reaching 81.31%. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Neural Sequence Generation Using Spatiotemporal Patterns of Inhibition.
Cannon, Jonathan; Kopell, Nancy; Gardner, Timothy; Markowitz, Jeffrey
2015-11-01
Stereotyped sequences of neural activity are thought to underlie reproducible behaviors and cognitive processes ranging from memory recall to arm movement. One of the most prominent theoretical models of neural sequence generation is the synfire chain, in which pulses of synchronized spiking activity propagate robustly along a chain of cells connected by highly redundant feedforward excitation. But recent experimental observations in the avian song production pathway during song generation have shown excitatory activity interacting strongly with the firing patterns of inhibitory neurons, suggesting a process of sequence generation more complex than feedforward excitation. Here we propose a model of sequence generation inspired by these observations in which a pulse travels along a spatially recurrent excitatory chain, passing repeatedly through zones of local feedback inhibition. In this model, synchrony and robust timing are maintained not through redundant excitatory connections, but rather through the interaction between the pulse and the spatiotemporal pattern of inhibition that it creates as it circulates the network. These results suggest that spatially and temporally structured inhibition may play a key role in sequence generation.
Evaluating the protein coding potential of exonized transposable element sequences
Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King
2007-01-01
Background: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results: We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile-based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion: The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258
Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space
Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R.; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J.
2013-01-01
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding. PMID:23592960
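The deconvolution idea in the abstract above can be illustrated with a toy sketch. This is not the authors' pooling design or software: it simply assumes that every clone is placed in a distinct set of pools (its signature) and that a read is assigned to the clone whose signature exactly matches the pools in which the read was seen. The clone names, the pool layout, and the function names `make_pool_design` and `deconvolve` are all hypothetical.

```python
from itertools import combinations

def make_pool_design(clones, n_pools, pools_per_clone):
    """Give every clone a distinct set of pools (its signature)."""
    signatures = combinations(range(n_pools), pools_per_clone)
    design = {}
    for clone, signature in zip(clones, signatures):
        design[clone] = frozenset(signature)
    if len(design) < len(clones):
        raise ValueError("not enough distinct pool signatures for all clones")
    return design

def deconvolve(observed_pools, design):
    """Return the clones whose signature equals the observed pool set."""
    observed = frozenset(observed_pools)
    return [clone for clone, sig in design.items() if sig == observed]

# Six hypothetical clones spread over four pools, two pools per clone.
clones = ["BAC_A", "BAC_B", "BAC_C", "BAC_D", "BAC_E", "BAC_F"]
design = make_pool_design(clones, n_pools=4, pools_per_clone=2)

# A read observed in pools 1 and 2 maps back to a single clone.
hits = deconvolve([1, 2], design)
```

With four pools and two pools per clone there are six distinct signatures, so six clones can be told apart without six separate barcodes. A practical design would also have to tolerate overlapping clones and sequencing errors, which this exact-match sketch ignores.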
Oxidation kinetics of a continuous carbon phase in a nonreactive matrix
NASA Technical Reports Server (NTRS)
Eckel, Andrew J.; Cawley, James D.; Parthasarathy, Triplicane A.
1995-01-01
Analytical solutions for, and experimental results on, the oxidation kinetics of carbon in a pore are presented. Reaction rate, reaction sequence, oxidant partial pressure, total system pressure, pore/crack dimensions, and temperature are analyzed with respect to the influence of each on an overall linear-parabolic rate relationship. Direct measurement of carbon recession is performed using two microcomposite model systems oxidized in the temperature range of 700 to 1200 °C, and for times up to 35 h. Experimental results are evaluated using the derived analytical solutions. Implications for the oxidation resistance of continuous-fiber-reinforced ceramic-matrix composites containing a carbon constituent are discussed.
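For readers unfamiliar with linear-parabolic kinetics, the sketch below solves one common generic form of such a law, x²/k_p + x/k_l = t, for the recession depth x. This form and the names `recession_depth`, `k_linear`, and `k_parabolic` are assumptions made for illustration; the abstract does not give the analytical solutions derived in the paper, and the numbers used here are not measured values.

```python
import math

def recession_depth(t, k_linear, k_parabolic):
    """Solve x**2 / k_parabolic + x / k_linear = t for the positive root x."""
    if t < 0 or k_linear <= 0 or k_parabolic <= 0:
        raise ValueError("t must be non-negative and rate constants positive")
    a = 1.0 / k_parabolic  # coefficient of x**2
    b = 1.0 / k_linear     # coefficient of x
    return (-b + math.sqrt(b * b + 4.0 * a * t)) / (2.0 * a)

# Illustrative, unitless values: x**2 + x = 2 has the positive root x = 1.
x_example = recession_depth(2.0, 1.0, 1.0)

# At short times the linear term dominates, so x is close to k_linear * t.
x_short = recession_depth(1e-6, 2.0, 1.0)
```

At short times the result approaches the linear limit x ≈ k_l·t, and at long times the parabolic limit x ≈ √(k_p·t), which is the behaviour the term "linear-parabolic" describes.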
Sequence-structure relationships in RNA loops: establishing the basis for loop homology modeling.
Schudoma, Christian; May, Patrick; Nikiforova, Viktoria; Walther, Dirk
2010-01-01
The specific function of RNA molecules frequently resides in their seemingly unstructured loop regions. We performed a systematic analysis of RNA loops extracted from experimentally determined three-dimensional structures of RNA molecules. A comprehensive loop-structure data set was created and organized into distinct clusters based on structural and sequence similarity. We detected clear evidence of the hallmark of homology present in the sequence-structure relationships in loops. Loops differing by <25% in sequence identity fold into very similar structures. Thus, our results support the application of homology modeling for RNA loop model building. We established a threshold that may guide the sequence divergence-based selection of template structures for RNA loop homology modeling. Of all possible sequences that are, under the assumption of isosteric relationships, theoretically compatible with actual sequences observed in RNA structures, only a small fraction is contained in the Rfam database of RNA sequences and classes, implying that the actual RNA loop space may consist of a limited number of unique loop structures and conserved sequences. The loop-structure data sets are made available via an online database, RLooM. RLooM also offers functionalities for the modeling of RNA loop structures in support of RNA engineering and design efforts.
Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong
2015-01-01
Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
The promise and challenge of high-throughput sequencing of the antibody repertoire
Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R
2014-01-01
Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474
The neural dynamics of song syntax in songbirds
NASA Astrophysics Data System (ADS)
Jin, Dezhe
2010-03-01
The songbird is "the hydrogen atom" of the neuroscience of complex, learned vocalizations such as human speech. Songs of the Bengalese finch consist of sequences of syllables. While syllables are temporally stereotypical, syllable sequences can vary and follow complex, probabilistic syntactic rules, which are rudimentarily similar to grammars in human language. The songbird brain is accessible to experimental probes, and is understood well enough to construct biologically constrained, predictive computational models. In this talk, I will discuss the structure and dynamics of neural networks underlying the stereotypy of the birdsong syllables and the flexibility of syllable sequences. Recent experiments and computational models suggest that a syllable is encoded in a chain network of projection neurons in premotor nucleus HVC (proper name). Precisely timed spikes propagate along the chain, driving vocalization of the syllable through downstream nuclei. Through a computational model, I show that variable syllable sequences can be generated through spike propagations in a network in HVC in which the syllable-encoding chain networks are connected into a branching chain pattern. The neurons mutually inhibit each other through the inhibitory HVC interneurons, and are driven by external inputs from nuclei upstream of HVC. At a branching point that connects the final group of a chain to the first groups of several chains, the spike activity selects one branch to continue the propagation. The selection is probabilistic, and is due to the winner-take-all mechanism mediated by the inhibition and noise. The model predicts that the syllable sequences statistically follow partially observable Markov models. Experimental results supporting this and other predictions of the model will be presented. We suggest that the syntax of birdsong syllable sequences is embedded in the connection patterns of HVC projection neurons.
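The probabilistic branch selection described above can be mimicked at a purely statistical level. The sketch below is a plain first-order Markov chain over syllables, which is a simplification of the partially observable Markov models mentioned in the abstract and says nothing about the underlying spiking network. The syllable labels, the transition probabilities, and the function name `sample_song` are invented for illustration.

```python
import random

# Hypothetical branching probabilities: after each syllable chain finishes,
# one successor chain is selected at random. "END" terminates the song.
TRANSITIONS = {
    "START": [("a", 1.0)],
    "a": [("b", 0.7), ("c", 0.3)],
    "b": [("a", 0.4), ("END", 0.6)],
    "c": [("b", 1.0)],
}

def sample_song(transitions, rng, max_len=50):
    """Sample one syllable sequence by repeated probabilistic branch selection."""
    state = "START"
    song = []
    while len(song) < max_len:
        successors, weights = zip(*transitions[state])
        state = rng.choices(successors, weights=weights, k=1)[0]
        if state == "END":
            break
        song.append(state)
    return song

rng = random.Random(0)  # fixed seed so the sampled songs are reproducible
songs = [sample_song(TRANSITIONS, rng) for _ in range(5)]
```

Each sampled song starts with "a" and only contains transitions that the table allows, while the order of the remaining syllables varies from song to song.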
Association of coral algal symbionts with a diverse viral community responsive to heat shock.
Brüwer, Jan D; Agrawal, Shobhit; Liew, Yi Jin; Aranda, Manuel; Voolstra, Christian R
2017-08-17
Stony corals provide the structural foundation of coral reef ecosystems and are termed holobionts given they engage in symbioses, in particular with photosynthetic dinoflagellates of the genus Symbiodinium. Besides Symbiodinium, corals also engage with bacteria affecting metabolism, immunity, and resilience of the coral holobiont, but the role of associated viruses is largely unknown. In this regard, the increase of studies using RNA sequencing (RNA-Seq) to assess gene expression provides an opportunity to elucidate viral signatures encompassed within the data via careful delineation of sequence reads and their source of origin. Here, we re-analyzed an RNA-Seq dataset from a cultured coral symbiont (Symbiodinium microadriaticum, Clade A1) across four experimental treatments (control, cold shock, heat shock, dark shock) to characterize associated viral diversity, abundance, and gene expression. Our approach comprised the filtering and removal of host sequence reads, subsequent phylogenetic assignment of sequence reads of putative viral origin, and the assembly and analysis of differentially expressed viral genes. About 15.46% (123 million) of all sequence reads were non-host-related, of which <1% could be classified as archaea, bacteria, or virus. Of these, 18.78% were annotated as virus and comprised a diverse community consistent across experimental treatments. Further, non-host related sequence reads assembled into 56,064 contigs, including 4856 contigs of putative viral origin that featured 43 differentially expressed genes during heat shock. The differentially expressed genes included viral kinases, ubiquitin, and ankyrin repeat proteins (amongst others), which are suggested to help the virus proliferate and inhibit the algal host's antiviral response. Our results suggest that a diverse viral community is associated with coral algal endosymbionts of the genus Symbiodinium, which prompts further research on their ecological role in coral health and resilience.
Bae, Daeryeong; Kim, Shino; Lee, Wonoh; Yi, Jin Woo; Um, Moon Kwang; Seong, Dong Gi
2018-01-01
A fast-cure carbon fiber/epoxy prepreg was thermoformed against a replicated automotive roof panel mold (square-cup) to investigate the effect of the stacking sequence of prepreg layers with unidirectional and plain-woven fabrics and mold geometry with different drawing angles and depths on the fiber deformation and formability of the prepreg. The optimum forming condition was determined via analysis of the material properties of epoxy resin. The non-linear mechanical properties of prepreg at the deformation modes of inter- and intra-ply shear, tensile and bending were measured to be used as input data for the commercial virtual forming simulation software. The prepreg with a stacking sequence containing the plain-woven carbon prepreg on the outer layer of the laminate was successfully thermoformed against a mold with a depth of 20 mm and a tilting angle of 110°. Experimental results for the shear deformations at each corner of the thermoformed square-cup product were compared with the simulation and a similarity in the overall tendency of the shear angle in the path at each corner was observed. The results are expected to contribute to the optimization of parameters on materials, mold design and processing in the thermoforming mass-production process for manufacturing high quality automotive parts with a square-cup geometry. PMID:29883413
Simultaneous flow of water and solutes through geological membranes-I. Experimental investigation
Kharaka, Y.K.; Berry, F.A.P.
1973-01-01
The relative retardation by geological membranes of cations and anions generally present in subsurface waters was investigated using a high pressure and high temperature 'filtration cell'. The solutions were forced through different clays and a disaggregated shale subjected to compaction pressures up to 9500 psi and to temperatures from 20 to 70 °C. The overall efficiencies measured increased with increase of exchange capacity of the material used and with decrease in concentration of the input solution. The efficiency of a given membrane increased with increasing compaction pressure but decreased slightly at higher temperatures for solutions of the same ionic concentration. The results further show that geological membranes are specific for different dissolved species. The retardation sequences varied depending on the material used and on experimental conditions. The sequences for monovalent and divalent cations at laboratory temperatures were generally as follows: Li < Na < NH3 < K < Rb < Cs and Mg < Ca < Sr < Ba. The sequences for anions at room temperature were variable, but at 70 °C, the sequence was: HCO3 < I < B < SO4 < Cl < Br. Monovalent cations, contrary to some field data, were generally retarded with respect to divalent cations. The differences in the filtration ratios among the divalent cations were smaller than those between the monovalent cations. The passage rate of B, HCO3, I and NH3 was greatly increased at 70 °C. © 1973.
Løvoll, Marie; Wiik-Nielsen, Jannicke; Grove, Søren; Wiik-Nielsen, Christer R; Kristoffersen, Anja B; Faller, Randi; Poppe, Trygve; Jung, Joonil; Pedamallu, Chandra S; Nederbragt, Alexander J; Meyerson, Matthew; Rimstad, Espen; Tengs, Torstein
2010-11-10
Cardiomyopathy syndrome (CMS) is a severe disease affecting large farmed Atlantic salmon. Mortality often appears without prior clinical signs, typically shortly prior to slaughter. We recently reported the finding and the complete genomic sequence of a novel piscine reovirus (PRV), which is associated with another cardiac disease in Atlantic salmon: heart and skeletal muscle inflammation (HSMI). In the present work we have studied whether PRV or other infectious agents may be involved in the etiology of CMS. Using high-throughput sequencing on heart samples from natural outbreaks of CMS and from fish experimentally challenged with material from fish diagnosed with CMS, a high number of sequence reads identical to the PRV genome were identified. In addition, a sequence contig from a novel totivirus could also be constructed. Using RT-qPCR, levels of PRV in tissue samples were quantified and the totivirus was detected in all samples tested from CMS fish but not in controls. In situ hybridization supported this pattern, indicating a possible association between CMS and the novel piscine totivirus. Although causality for CMS in Atlantic salmon could not be proven for either of the two viruses, our results are compatible with a hypothesis where, in the experimental challenge studied, PRV behaves as an opportunist whereas the totivirus might be more directly linked with the development of CMS.
The Emergence of Sub-Syllabic Representations
ERIC Educational Resources Information Center
Lee, Yongeun; Goldrick, Matthew
2008-01-01
In a variety of experimental paradigms speakers do not treat all sub-syllabic sequences equally. In languages like English, participants tend to group vowels and codas together to the exclusion of onsets (i.e., /bet/=/b/-/et/). Three possible accounts of these patterns are examined. A hierarchical account attributes these results to the presence…
Teaching/Learning Geometric Transformations in High-School with DGS
ERIC Educational Resources Information Center
Ferrarello, Daniela; Mammana, Maria Flavia; Pennisi, Mario
2014-01-01
In this paper, we present the results of an experimental sequence of classroom activities on geometric transformations, proposed to high-school students. The activity is based on the use of a dynamic geometry system. Planned by a group of four people, two university professors and two research teachers, the activity has been elaborated together…
Dai, Qi; Yang, Yanchun; Wang, Tianming
2008-10-15
Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, a Markov model, or both. Motivated by adding k-word distributions to the Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our results with those of alignment-based and alignment-free methods. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measures perform in phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into the Markov model, are more efficient.
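To make the notion of a k-word distribution concrete, the sketch below counts overlapping k-words in two sequences and compares the resulting frequency vectors with a plain Euclidean distance. This is a generic alignment-free baseline chosen for illustration, not the wre.k.r or S2.k.r measures, whose definitions are not given in the abstract. The function names and the example sequences are hypothetical.

```python
from collections import Counter
import math

def kword_frequencies(seq, k):
    """Relative frequencies of all overlapping k-words in seq."""
    if k < 1 or len(seq) < k:
        raise ValueError("sequence must contain at least one k-word")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def kword_distance(seq_a, seq_b, k):
    """Euclidean distance between the k-word frequency vectors of two sequences."""
    fa = kword_frequencies(seq_a, k)
    fb = kword_frequencies(seq_b, k)
    words = set(fa) | set(fb)
    return math.sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))

# Identical sequences have distance 0; sequences sharing no 2-words are far apart.
d_same = kword_distance("ACGTACGT", "ACGTACGT", 2)
d_diff = kword_distance("AAAA", "CCCC", 2)
```

Because only word counts are compared, no alignment is needed and the cost grows linearly with sequence length, which is the usual motivation for alignment-free measures.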
Levels of Integration in Cognitive Control and Sequence Processing in the Prefrontal Cortex
Bahlmann, Jörg; Korb, Franziska M.; Gratton, Caterina; Friederici, Angela D.
2012-01-01
Cognitive control is necessary to flexibly act in changing environments. Sequence processing is needed in language comprehension to build the syntactic structure in sentences. Functional imaging studies suggest that sequence processing engages the left ventrolateral prefrontal cortex (PFC). In contrast, cognitive control processes additionally recruit bilateral rostral lateral PFC regions. The present study aimed to investigate these two types of processes in one experimental paradigm. Sequence processing was manipulated using two different sequencing rules varying in complexity. Cognitive control was varied with different cue-sets that determined the choice of a sequencing rule. Univariate analyses revealed distinct PFC regions for the two types of processing (i.e. sequence processing: left ventrolateral PFC and cognitive control processing: bilateral dorsolateral and rostral PFC). Moreover, in a common brain network (including left lateral PFC and intraparietal sulcus) no interaction between sequence and cognitive control processing was observed. In contrast, a multivariate pattern analysis revealed an interaction of sequence and cognitive control processing, such that voxels in left lateral PFC and parietal cortex showed different tuning functions for tasks involving different sequencing and cognitive control demands. These results suggest that the difference between the process of rule selection (i.e. cognitive control) and the process of rule-based sequencing (i.e. sequence processing) finds its neuronal underpinnings in distinct activation patterns in lateral PFC. Moreover, the combination of rule selection and rule sequencing can shape the response of neurons in lateral PFC and parietal cortex. PMID:22952762
Constructing storyboards based on hierarchical clustering analysis
NASA Astrophysics Data System (ADS)
Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu
2005-07-01
There is a growing need for quick previews of video content, both to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoiding repeated, computationally intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.
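The keyframe selection described above can be sketched as follows, under simplifying assumptions: frames are represented by small made-up feature vectors rather than wavelet coefficients, clusters are merged by centroid distance until the requested number remains, and the frame nearest each centroid is taken as the keyframe. The function names and the merging rule are illustrative choices, not the method of the paper.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def select_keyframes(features, n_keyframes):
    """Merge frame clusters bottom-up until n_keyframes clusters remain, then
    return, for each cluster, the index of the frame nearest its centroid."""
    if not 1 <= n_keyframes <= len(features):
        raise ValueError("n_keyframes must be between 1 and the number of frames")
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > n_keyframes:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            ci = centroid([features[f] for f in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([features[f] for f in clusters[j]])
                d = sq_dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    keyframes = []
    for members in clusters:
        c = centroid([features[f] for f in members])
        keyframes.append(min(members, key=lambda f: sq_dist(features[f], c)))
    return sorted(keyframes)

# Seven hypothetical frames forming three visually distinct groups.
features = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
            [50.0, 50.0], [51.0, 50.0], [52.0, 50.0],
            [90.0, 0.0]]
storyboard = select_keyframes(features, 3)
```

The feature vectors are computed once and reused at every merge step, which reflects the point in the abstract about avoiding repeated parsing of the video. The pairwise search here is deliberately naive and would be too slow for long videos.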
Measurements and predictions of the 6s6p $^{1,3}P_1$ lifetimes in the Hg isoelectronic sequence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curtis, L. J.; Irving, R. E.; Henderson, M.
2001-04-01
Experimental and theoretical values for the lifetimes of the 6s6p $^1P_1$ and $^3P_1$ levels in the Hg isoelectronic sequence are examined in the context of a data-based isoelectronic systematization. New beam-foil measurements for lifetimes in Pb III and Bi IV are reported and included in a critical evaluation of the available database. These results are combined with ab initio theoretical calculations and linearizing parametrizations to make predictive extrapolations for ions with $84 \le Z \le 92$.
In situ multi-axial loading frame to probe elastomers using X-ray scattering.
Pannier, Yannick; Proudhon, Henry; Mocuta, Cristian; Thiaudière, Dominique; Cantournet, Sabine
2011-11-01
An in situ tensile-shear loading device has been designed to study elastomer crystallization using synchrotron X-ray scattering at the Synchrotron Soleil on the DiffAbs beamline. Elastomer tape specimens of thickness 2 mm can be elongated by up to 500% in the longitudinal direction and sheared by up to 200% in the transverse direction. The device is fully automated and plugged into the TANGO control system of the beamline allowing synchronization between acquisition and loading sequences. Experimental results revealing the evolution of crystallization peaks under load are presented for several tension/shear loading sequences.
CuGene as a tool to view and explore genomic data
NASA Astrophysics Data System (ADS)
Haponiuk, Michał; Pawełkowicz, Magdalena; Przybecki, Zbigniew; Nowak, Robert M.
2017-08-01
Integrated CuGene is an easy-to-use, open-source, on-line tool that can be used to browse, analyze, and query genomic data and annotations. It places annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. It also allows users to upload and display their own experimental results or annotation sets. An important function of the application is the ability to find similarity between sequences by applying four algorithms of differing accuracy. The presented tool was tested on real genomic data and is extensively used by the Polish Consortium of Cucumber Genome Sequencing.
The twilight zone of cis element alignments.
Sebastian, Alvaro; Contreras-Moreira, Bruno
2013-02-01
Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein-DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein-DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.
The twilight zone of cis element alignments
Sebastian, Alvaro; Contreras-Moreira, Bruno
2013-01-01
Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein–DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein–DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments. PMID:23268451
Sequence diversity of wheat mosaic virus isolates.
Stewart, Lucy R
2016-02-02
Wheat mosaic virus (WMoV), transmitted by eriophyid wheat curl mites (Aceria tosichella) is the causal agent of High Plains disease in wheat and maize. WMoV and other members of the genus Emaravirus evaded thorough molecular characterization for many years due to the experimental challenges of mite transmission and manipulating multisegmented negative sense RNA genomes. Recently, the complete genome sequence of a Nebraska isolate of WMoV revealed eight segments, plus a variant sequence of the nucleocapsid protein-encoding segment. Here, near-complete and partial consensus sequences of five more WMoV isolates are reported and compared to the Nebraska isolate: an Ohio maize isolate (GG1), a Kansas barley isolate (KS7), and three Ohio wheat isolates (H1, K1, W1). Results show two distinct groups of WMoV isolates: Ohio wheat isolate RNA segments had 84% or lower nucleotide sequence identity to the NE isolate, whereas GG1 and KS7 had 98% or higher nucleotide sequence identity to the NE isolate. Knowledge of the sequence variability of WMoV isolates is a step toward understanding virus biology, and potentially explaining observed biological variation. Published by Elsevier B.V.
Chamrad, Daniel C; Körting, Gerhard; Schäfer, Heike; Stephan, Christian; Thiele, Herbert; Apweiler, Rolf; Meyer, Helmut E; Marcus, Katrin; Blüggel, Martin
2006-09-01
A novel software tool named PTM-Explorer has been applied to LC-MS/MS datasets acquired within the Human Proteome Organisation (HUPO) Brain Proteome Project (BPP). PTM-Explorer enables automatic identification of peptide MS/MS spectra that were not explained in typical sequence database searches. The main focus was detection of PTMs, but PTM-Explorer also detects unspecific peptide cleavage, mass measurement errors, experimental modifications, amino acid substitutions, transpeptidation products and unknown mass shifts. To avoid a combinatorial problem, the search is restricted to a set of selected protein sequences, which stem from previous protein identifications using a common sequence database search. Prior to application to the HUPO BPP data, PTM-Explorer was evaluated on thoroughly manually characterized LC-MS/MS data sets from Alpha-A-Crystallin gel spots obtained from mouse eye lens. Besides various PTMs including phosphorylation, a wealth of experimental modifications and unspecific cleavage products were successfully detected, completing the primary structure information of the measured proteins. Our results indicate that a large number of MS/MS spectra that currently remain unidentified in standard database searches contain valuable information that can only be elucidated using suitable software tools.
The ENCODE Project at UC Santa Cruz.
Thomas, Daryl J; Rosenbloom, Kate R; Clawson, Hiram; Hinrichs, Angie S; Trumbower, Heather; Raney, Brian J; Karolchik, Donna; Barber, Galt P; Harte, Rachel A; Hillman-Jackson, Jennifer; Kuhn, Robert M; Rhead, Brooke L; Smith, Kayla E; Thakkapallayil, Archana; Zweig, Ann S; Haussler, David; Kent, W James
2007-01-01
The goal of the Encyclopedia Of DNA Elements (ENCODE) Project is to identify all functional elements in the human genome. The pilot phase is for comparison of existing methods and for the development of new methods to rigorously analyze a defined 1% of the human genome sequence. Experimental datasets are focused on the origin of replication, DNase I hypersensitivity, chromatin immunoprecipitation, promoter function, gene structure, pseudogenes, non-protein-coding RNAs, transcribed RNAs, multiple sequence alignment and evolutionarily constrained elements. The ENCODE project at UCSC website (http://genome.ucsc.edu/ENCODE) is the primary portal for the sequence-based data produced as part of the ENCODE project. In the pilot phase of the project, over 30 labs provided experimental results for a total of 56 browser tracks supported by 385 database tables. The site provides researchers with a number of tools that allow them to visualize and analyze the data as well as download data for local analyses. This paper describes the portal to the data, highlights the data that has been made available, and presents the tools that have been developed within the ENCODE project. Access to the data and types of interactive analysis that are possible are illustrated through supplemental examples.
Cavitation-induced fragmentation of an acoustically-levitated droplet
NASA Astrophysics Data System (ADS)
Gonzalez Avila, Silvestre Roberto; Ohl, Claus-Dieter
2015-12-01
In this paper we investigate the initial sequence of events that leads to the fragmentation of millimetre-sized water droplets interacting with a focused ns-laser pulse. The experimental results show complex processes that result from the reflection of an initial shock wave, produced by plasma generation, at the soft boundary of the levitating droplet; furthermore, when the waves reflected from the walls of the droplet refocus, they leave behind a trail of microbubbles that later act as cavitation inception regions. Numerical simulations of a shock wave impacting and reflecting from a soft boundary are also reported; the simulated results show that the lowest pressure inside the droplet occurs at the equatorial plane. The results of the numerical model display good agreement with the experimental results both in time and in space.
NASA Technical Reports Server (NTRS)
Rogers, J. W.
1975-01-01
The results of an experimental investigation on recording information on thermoplastic are given. A typical fabrication configuration, the recording sequence, and the samples which were examined are described. There are basically three configurations which can be used for the recording of information on thermoplastic. The most popular technique uses corona which furnishes free charge. The necessary energy for deformation is derived from a charge layer atop the thermoplastic. The other two techniques simply use a dc potential in place of the corona for deformation energy.
Outer planet Grand Tour missions photometry/polarimetry experiment critical components study
NASA Technical Reports Server (NTRS)
Pellicori, S. F.; Russell, E. E.; Watts, L. A.
1972-01-01
Work performed during this effort was limited to two primary areas of technical concern: optical design optimization, and sensor selection. An optical system concept was established, and various system components were evaluated through experimental test sequences. Photodetectors were investigated for their applicability in meeting OPGT requirements as constrained by the photometry/polarimetry team directives. The most promising (gallium arsenide PMT) was further experimentally tested to ascertain its behavior with respect to anticipated environmental conditions. Results of testing and a summary of the preceding tradeoff study effort are presented.
BPP: a sequence-based algorithm for branch point prediction.
Zhang, Qing; Fan, Xiaodan; Wang, Yejun; Sun, Ming-An; Shao, Jianlin; Guo, Dianjing
2017-10-15
Although high-throughput sequencing methods have been proposed to identify splicing branch points in the human genome, these methods can only detect a small fraction of the branch points subject to the sequencing depth, experimental cost and the expression level of the mRNA. An accurate computational model for branch point prediction is therefore an ongoing objective in human genome research. We here propose a novel branch point prediction algorithm that utilizes information on the branch point sequence and the polypyrimidine tract. Using experimentally validated data, we demonstrate that our proposed method outperforms existing methods. Availability and implementation: https://github.com/zhqingit/BPP. Contact: djguo@cuhk.edu.hk. Supplementary data are available at Bioinformatics online.
USDA-ARS's Scientific Manuscript database
Many studies leverage targeted whole genome sequencing (WGS) experiments in order to identify rare and causal variants within populations. As a natural consequence of experimental design, many of these surveys tend to sequence redundant haplotype segments due to high frequency in the base population...
QCScreen: a software tool for data quality control in LC-HRMS based metabolomics.
Simader, Alexandra Maria; Kluger, Bernhard; Neumann, Nora Katharina Nicole; Bueschl, Christoph; Lemmens, Marc; Lirk, Gerald; Krska, Rudolf; Schuhmacher, Rainer
2015-10-24
Metabolomics experiments often comprise large numbers of biological samples resulting in huge amounts of data. This data needs to be inspected for plausibility before data evaluation to detect putative sources of error e.g. retention time or mass accuracy shifts. Especially in liquid chromatography-high resolution mass spectrometry (LC-HRMS) based metabolomics research, proper quality control checks (e.g. for precision, signal drifts or offsets) are crucial prerequisites to achieve reliable and comparable results within and across experimental measurement sequences. Software tools can support this process. The software tool QCScreen was developed to offer a quick and easy data quality check of LC-HRMS derived data. It allows a flexible investigation and comparison of basic quality-related parameters within user-defined target features and the possibility to automatically evaluate multiple sample types within or across different measurement sequences in a short time. It offers a user-friendly interface that allows an easy selection of processing steps and parameter settings. The generated results include a coloured overview plot of data quality across all analysed samples and targets and, in addition, detailed illustrations of the stability and precision of the chromatographic separation, the mass accuracy and the detector sensitivity. The use of QCScreen is demonstrated with experimental data from metabolomics experiments using selected standard compounds in pure solvent. The application of the software identified problematic features, samples and analytical parameters and suggested which data files or compounds required closer manual inspection. QCScreen is an open source software tool which provides a useful basis for assessing the suitability of LC-HRMS data prior to time consuming, detailed data processing and subsequent statistical analysis. 
It accepts the generic mzXML format and thus can be used with many different LC-HRMS platforms to process both multiple quality control sample types as well as experimental samples in one or more measurement sequences.
Provost, Jean; Gurev, Viatcheslav; Trayanova, Natalia; Konofagou, Elisa E.
2011-01-01
Background: Electromechanical Wave Imaging (EWI) is an entirely non-invasive, ultrasound-based imaging method capable of mapping the electromechanical activation sequence of the ventricles in vivo. Given the broad accessibility of ultrasound scanners in the clinic, the application of EWI could constitute a flexible surrogate for the 3D electrical activation. Objective: The purpose of this report is to reproduce the electromechanical wave (EW) using an anatomically-realistic electromechanical model, and establish the capability of EWI to map the electrical activation sequence in vivo when pacing from different locations. Methods: EWI was performed in one canine during pacing from three different sites. A high-resolution dynamic model of coupled cardiac electromechanics of the canine heart was used to predict the experimentally recorded electromechanical wave. The simulated 3D electrical activation sequence was then compared with the experimental EW. Results: The electrical activation sequence and the EW were highly correlated for all pacing sites. The relationship between the electrical activation and the EW onset was found to be linear with a slope of 1.01 to 1.17 for different pacing schemes and imaging angles. Conclusions: The accurate reproduction of the EW in simulations indicates that the model framework is capable of accurately representing the cardiac electromechanics and thus testing new hypotheses. The one-to-one correspondence between the electrical activation sequence and the EW indicates that EWI could be used to map the cardiac electrical activity. This opens the door for further exploration of the technique in assisting in the early detection, diagnosis and treatment monitoring of rhythm dysfunction. PMID:21185403
Christensen, Signe; Horowitz, Scott; Bardwell, James C.A.; Olsen, Johan G.; Willemoës, Martin; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Winther, Jakob R.
2017-01-01
Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce soluble protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, soluble proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations. PMID:27659562
Johansson, Kristoffer E; Tidemand Johansen, Nicolai; Christensen, Signe; Horowitz, Scott; Bardwell, James C A; Olsen, Johan G; Willemoës, Martin; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Winther, Jakob R
2016-10-23
Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce soluble protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, soluble proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations. Copyright © 2016 Elsevier Ltd. All rights reserved.
Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Chadaram, Sudha; Mande, Sharmila S
2011-11-30
Obtaining accurate estimates of microbial diversity using rDNA profiling is the first step in most metagenomics projects. Consequently, most metagenomic projects spend considerable amounts of time, money and manpower for experimentally cloning, amplifying and sequencing the rDNA content in a metagenomic sample. In the second step, the entire genomic content of the metagenome is extracted, sequenced and analyzed. Since DNA sequences obtained in this second step also contain rDNA fragments, rapid in silico identification of these rDNA fragments would drastically reduce the cost, time and effort of current metagenomic projects by entirely bypassing the experimental steps of primer based rDNA amplification, cloning and sequencing. In this study, we present an algorithm called i-rDNA that can facilitate the rapid detection of 16S rDNA fragments from amongst millions of sequences in metagenomic data sets with high detection sensitivity. Performance evaluation with data sets/database variants simulating typical metagenomic scenarios indicates the significantly high detection sensitivity of i-rDNA. Moreover, i-rDNA can process a million sequences in less than an hour on a simple desktop with modest hardware specifications. In addition to the speed of execution, high sensitivity and low false positive rate, the utility of the algorithmic approach discussed in this paper is immense given that it would help in bypassing the entire experimental step of primer-based rDNA amplification, cloning and sequencing. Application of this algorithmic approach would thus drastically reduce the cost, time and human efforts invested in all metagenomic projects. A web-server for the i-rDNA algorithm is available at http://metagenomics.atc.tcs.com/i-rDNA/
Single-molecule protein sequencing through fingerprinting: computational assessment
NASA Astrophysics Data System (ADS)
Yao, Yao; Docter, Margreet; van Ginkel, Jetty; de Ridder, Dick; Joo, Chirlmin
2015-10-01
Proteins are vital in all biological systems as they constitute the main structural and functional components of cells. Recent advances in mass spectrometry have brought the promise of complete proteomics by helping draft the human proteome. Yet, this commonly used protein sequencing technique has fundamental limitations in sensitivity. Here we propose a method for single-molecule (SM) protein sequencing. A major challenge lies in the fact that proteins are composed of 20 different amino acids, which demands 20 molecular reporters. We computationally demonstrate that it suffices to measure only two types of amino acids to identify proteins and suggest an experimental scheme using SM fluorescence. When achieved, this highly sensitive approach will result in a paradigm shift in proteomics, with major impact in the biological and medical sciences.
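The fingerprinting idea above can be illustrated with a small sketch: each protein is reduced to the ordered string of only two residue types, and proteins are counted as identifiable when that reduced string is unique in the database. The choice of cysteine (C) and lysine (K) as the two labeled residues, the helper names, and the toy sequences are assumptions made for illustration only.

```python
from collections import defaultdict


def fingerprint(protein, labeled=("C", "K")):
    """Keep only the labeled residue types, in their original order."""
    return "".join(aa for aa in protein if aa in labeled)


def unique_fraction(proteins, labeled=("C", "K")):
    """Fraction of proteins whose two-residue fingerprint is unique.

    proteins: dict mapping a protein name to its amino acid sequence.
    """
    groups = defaultdict(list)
    for name, seq in proteins.items():
        groups[fingerprint(seq, labeled)].append(name)
    unique = sum(1 for names in groups.values() if len(names) == 1)
    return unique / len(proteins)


if __name__ == "__main__":
    toy_db = {"p1": "MKCAK", "p2": "MCKKA", "p3": "AKCGK"}  # made-up sequences
    for name, seq in toy_db.items():
        print(name, fingerprint(seq))
    print(unique_fraction(toy_db))
```

In this toy database `p1` and `p3` share a fingerprint, so only `p2` would be identifiable; a real assessment would run the same count over a full proteome and account for measurement errors, which this sketch ignores.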
NASA Astrophysics Data System (ADS)
El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Kobeissy, Firas
2017-01-01
The crucial biological role of proteases has become visible with the development of the degradomics discipline, which is concerned with determining protease/substrate pairs and the resulting breakdown products (BDPs) that can be utilized as putative biomarkers of differing biological and clinical significance. In the field of cancer biology, matrix metalloproteinases (MMPs) have been shown to generate protein BDPs that are indicative of malignant growth, while in the field of neural injury, the calpain-2 and caspase-3 proteases generate BDP fragments that are indicative of different neural cell death mechanisms in different injury scenarios. Advanced proteomic techniques have shown remarkable progress in identifying these BDPs experimentally. In this work, we present a bioinformatics-based prediction method that identifies protease-associated BDPs with high precision and efficiency. The method utilizes state-of-the-art sequence matching and alignment algorithms. It starts by locating consensus sequence occurrences and their variants in any set of protein substrates, generating all fragments resulting from cleavage. The method requires O(mn) space and O(Nmn) time, where N, m, and n are the number of protein sequences, the length of the consensus sequence, and the length of each protein sequence, respectively. Finally, the proposed methodology is validated against βII-spectrin protein, a validated brain injury biomarker.
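The first step described above, locating consensus occurrences and cutting the substrate there, can be sketched as follows. This is a simplified illustration, not the published algorithm: the consensus is written as a regular expression, the cut position is given as an offset from the start of each match, and only the fragments between consecutive cut sites are returned. The function name, the `DE.D` pattern, and the toy substrate are assumptions.

```python
import re


def cleavage_fragments(substrate, consensus, cut_offset):
    """Split a substrate at every occurrence of a consensus pattern.

    consensus:  regular expression for the recognition motif (assumed form).
    cut_offset: number of residues from the motif start after which the
                bond is cut (hypothetical parameter).
    """
    # A lookahead lets overlapping motif occurrences be found as well.
    pattern = re.compile(f"(?=({consensus}))")
    sites = sorted({m.start() + cut_offset for m in pattern.finditer(substrate)})
    # Ignore cuts that would fall outside the sequence or at its ends.
    sites = [s for s in sites if 0 < s < len(substrate)]
    bounds = [0] + sites + [len(substrate)]
    return [substrate[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy substrate with two matches of the illustrative pattern DE.D,
    # cut after the fourth residue of each match.
    print(cleavage_fragments("AAADEVDGGGDETDSS", "DE.D", 4))
```

Using a regular expression is one simple way to cover "variants" of a consensus; a position weight matrix or mismatch-tolerant search would be alternatives.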
The DNA-encoded nucleosome organization of a eukaryotic genome.
Kaplan, Noam; Moore, Irene K; Fondufe-Mittendorf, Yvonne; Gossett, Andrea J; Tillo, Desiree; Field, Yair; LeProust, Emily M; Hughes, Timothy R; Lieb, Jason D; Widom, Jonathan; Segal, Eran
2009-03-19
Nucleosome organization is critical for gene regulation. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers, competition with site-specific DNA-binding proteins, and the DNA sequence preferences of the nucleosomes themselves. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for approximately 40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael
2018-05-25
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
2014-01-01
Background: Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results: Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion: The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters used to select potential miRNA candidates for experimental verification. PMID:24418292
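Since the random-sequence MFE values are reported to follow a normal distribution, the P-value of a candidate reduces to a lower-tail normal probability. The sketch below assumes the mean and standard deviation for the candidate's nucleotide composition have already been obtained (for example by interpolating pre-computed tables, which is not shown); the function name and the example numbers are illustrative.

```python
from math import erf, sqrt


def mfe_p_value(mfe, mean, sd):
    """Probability that a random sequence of the same composition has an
    MFE at least as low as the observed one (lower-tail normal CDF)."""
    z = (mfe - mean) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


if __name__ == "__main__":
    # Illustrative numbers only: candidate MFE of -45 kcal/mol against a
    # random-sequence distribution with mean -30 and standard deviation 5.
    print(mfe_p_value(-45.0, -30.0, 5.0))
```

A small P-value means the candidate folds more stably than expected for its composition, which is the property the abstract proposes as one selection parameter among several.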
Gao, Qing; Srinivasan, Girish; Magin, Richard L; Zhou, Xiaohong Joe
2011-05-01
To theoretically develop and experimentally validate a formalism based on a fractional order calculus (FC) diffusion model to characterize anomalous diffusion in brain tissues measured with a twice-refocused spin-echo (TRSE) pulse sequence. The FC diffusion model is the fractional order generalization of the Bloch-Torrey equation. Using this model, an analytical expression was derived to describe the diffusion-induced signal attenuation in a TRSE pulse sequence. To experimentally validate this expression, a set of diffusion-weighted (DW) images was acquired at 3 Tesla from healthy human brains using a TRSE sequence with twelve b-values ranging from 0 to 2600 s/mm². For comparison, DW images were also acquired using a Stejskal-Tanner diffusion gradient in a single-shot spin-echo echo planar sequence. For both datasets, a Levenberg-Marquardt fitting algorithm was used to extract three parameters: diffusion coefficient D, fractional order derivative in space β, and a spatial parameter μ (in units of μm). Using adjusted R-squared values and standard deviations, D, β, and μ values and the goodness-of-fit in three specific regions of interest (ROIs) in white matter, gray matter, and cerebrospinal fluid, respectively, were evaluated for each of the two datasets. In addition, spatially resolved parametric maps were assessed qualitatively. The analytical expression for the TRSE sequence, derived from the FC diffusion model, accurately characterized the diffusion-induced signal loss in brain tissues at high b-values. In the selected ROIs, the goodness-of-fit and standard deviations for the TRSE dataset were comparable with the results obtained from the Stejskal-Tanner dataset, demonstrating the robustness of the FC model across multiple data acquisition strategies. Qualitatively, the D, β, and μ maps from the TRSE dataset exhibited fewer artifacts, reflecting the improved immunity to eddy currents.
The diffusion-induced signal attenuation in a TRSE pulse sequence can be described by an FC diffusion model at high b-values. This model performs equally well for data acquired from the human brain tissues with a TRSE pulse sequence or a conventional Stejskal-Tanner sequence. Copyright © 2011 Wiley-Liss, Inc.
Verifying Digital Components of Physical Systems: Experimental Evaluation of Test Quality
NASA Astrophysics Data System (ADS)
Laputenko, A. V.; López, J. E.; Yevtushenko, N. V.
2018-03-01
This paper continues the study of high quality test derivation for verifying digital components which are used in various physical systems, such as sensors and data transfer components. For experimental evaluation we have used logic circuits b01-b010 of the ITC'99 benchmark package (Second Release), which, as stated before, describe digital components of physical systems designed for various applications. Test sequences are derived for detecting the most common faults of the reference logic circuit using three different approaches to test derivation. Three widely used fault types, namely stuck-at faults, bridges, and faults which slightly modify the behavior of one gate, are considered as possible faults of the reference behavior. The most interesting test sequences are short test sequences that can provide appropriate guarantees after testing, and thus we experimentally study various approaches to the derivation of so-called complete test suites, which detect all fault types. In the first series of experiments, we compare two approaches for deriving complete test suites. In the first approach, a shortest test sequence is derived for testing each fault. In the second approach, a test sequence is pseudo-randomly generated using appropriate software for logic synthesis and verification (the ABC system in our study) and thus can be longer. However, after deleting sequences detecting the same set of faults, a test suite returned by the second approach is shorter. The latter underlines the fact that in many cases it is useless to spend time and effort on deriving a shortest distinguishing sequence; it is better to use test minimization afterwards. The performed experiments also show that using only randomly generated test sequences is not very efficient, since such sequences do not detect all the faults of any type. After reaching a fault coverage of around 70%, saturation is observed, and the fault coverage cannot be increased any further.
For deriving high quality short test suites, the approach that is the combination of randomly generated sequences together with sequences which are aimed to detect faults not detected by random tests, allows to reach the good fault coverage using shortest test sequences.
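The minimization step mentioned in this abstract (deleting sequences that detect the same set of faults) can be illustrated with a minimal sketch. The function name `minimize_suite` and the input layout are hypothetical, and the greedy set-cover pass in step 2 is an assumed extension, not a procedure described in the abstract.

```python
def minimize_suite(detects):
    """detects: dict mapping a test-sequence name to the set of faults it detects.

    Returns a list of sequence names that together detect every detectable fault.
    """
    # Step 1: keep only the first sequence for each distinct detected-fault set.
    unique = {}
    for name, faults in detects.items():
        unique.setdefault(frozenset(faults), name)

    # Step 2 (assumed extension): greedy set cover over the remaining sequences.
    remaining = set().union(*unique) if unique else set()
    chosen = []
    while remaining:
        best = max(unique, key=lambda f: len(f & remaining))
        chosen.append(unique[best])
        remaining -= best
    return chosen


suite = {
    "t1": {"f1", "f2"},
    "t2": {"f1", "f2"},   # detects the same faults as t1, so it is dropped
    "t3": {"f3"},
    "t4": {"f2", "f3"},
}
print(minimize_suite(suite))
```

Greedy set cover does not guarantee a minimum-size suite; it is shown only because it is short and usually close to minimal.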
Buckling and Failure of Compression-Loaded Composite Laminated Shells With Cutouts
NASA Technical Reports Server (NTRS)
Hilburger, Mark W.
2007-01-01
Results from a numerical and experimental study that illustrate the effects of laminate orthotropy on the buckling and failure response of compression-loaded composite cylindrical shells with a cutout are presented. The effects of orthotropy on the overall response of compression-loaded shells is described. In general, preliminary numerical results appear to accurately predict the buckling and failure characteristics of the shell considered herein. In particular, some of the shells exhibit stable post-local-buckling behavior accompanied by interlaminar material failures near the free edges of the cutout. In contrast another shell with a different laminate stacking sequence appears to exhibit catastrophic interlaminar material failure at the onset of local buckling near the cutout and this behavior correlates well with corresponding experimental results.
A computational proposal for designing structured RNA pools for in vitro selection of RNAs.
Kim, Namhee; Gan, Hin Hark; Schlick, Tamar
2007-04-01
Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.
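As a rough illustration of how a mixing matrix of nucleotide transition rates might be applied to a starting sequence to produce a nonrandom pool, consider the sketch below. The names `mutate` and `make_pool`, the row-per-base matrix layout, and the numeric rates are assumptions for illustration; they are not the five matrix classes defined by the authors.

```python
import random

BASES = "ACGU"


def mutate(seq, mixing, rng):
    """Replace each base b by a base drawn with the probabilities in mixing[b]."""
    return "".join(rng.choices(BASES, weights=mixing[b])[0] for b in seq)


def make_pool(start, mixing, size, seed=0):
    """Generate `size` sequences by applying the mixing matrix to `start`."""
    rng = random.Random(seed)
    return [mutate(start, mixing, rng) for _ in range(size)]


# Illustrative matrix: each row gives P(new base | old base) in the order A, C, G, U.
mixing = {
    "A": [0.70, 0.10, 0.10, 0.10],
    "C": [0.10, 0.70, 0.10, 0.10],
    "G": [0.10, 0.10, 0.70, 0.10],
    "U": [0.10, 0.10, 0.10, 0.70],
}
pool = make_pool("GGGAAACUUCGGUUUCCC", mixing, size=5)
for s in pool:
    print(s)
```

With an identity matrix the pool collapses to copies of the starting sequence, and with uniform rows it becomes a fully random pool; designed pools lie between these extremes.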
A remark on copy number variation detection methods.
Li, Shuo; Dou, Xialiang; Gao, Ruiqi; Ge, Xinzhou; Qian, Minping; Wan, Lin
2018-01-01
Copy number variations (CNVs) are gains and losses of DNA sequence in a genome. High-throughput platforms such as microarrays and next-generation sequencing (NGS) technologies have been applied to detect genome-wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis of copy number losses in 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from the HapMap Project and the 1000 Genomes Project, respectively. We show that the copy number losses reported by the HapMap Project and the 1000 Genomes Project have < 30% overlap, even though these reports are required by their corresponding projects to have cross-platform (e.g. PCR, microarray and high-throughput sequencing) experimental support and state-of-the-art calling methods were employed. On the other hand, copy number losses are found directly from HapMap microarray data by an accurate algorithm, CNVhac; almost all of them have lower read mapping depth in NGS data, and 88% of them are supported by sequences with breakpoints in NGS data. Our results suggest that microarrays are able to call CNVs and that the unnecessary requirement of additional cross-platform support may introduce false negatives. The inconsistency of CNV reports from the HapMap Project and the 1000 Genomes Project might result from the inadequate information contained in microarray data, inconsistent detection criteria, or the filtration effect of cross-platform support. The statistical test on CNVs called by CNVhac shows that microarray data can offer reliable CNV reports and that the majority of CNV candidates can be confirmed by raw sequences. Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform support, so additional experimental information should be applied as needed rather than as a requirement.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samudrala, Ram; Heffron, Fred; McDermott, Jason E.
2009-04-24
The type III secretion system is an essential component for virulence in many Gram-negative bacteria. Though components of the secretion system apparatus are conserved, its substrates, effector proteins, are not. We have used a machine learning approach to identify new secreted effectors. The method integrates evolutionary measures, such as the pattern of homologs in a range of other organisms, and sequence-based features, such as G+C content, amino acid composition and the N-terminal 30 residues of the protein sequence. The method was trained on known effectors from Salmonella typhimurium and validated on a corresponding set of effectors from Pseudomonas syringae, after eliminating effectors with detectable sequence similarity. The method was able to identify all of the known effectors in P. syringae with a specificity of 84% and sensitivity of 82%. The reciprocal validation, training on P. syringae and validating on S. typhimurium, gave similar results with a specificity of 86% when the sensitivity level was 87%. These results show that type III effectors in disparate organisms share common features. We found that maximal performance is attained by including an N-terminal sequence of only 30 residues, which agrees with previous studies indicating that this region contains the secretion signal. We then used the method to define the most important residues in this putative secretion signal. Finally, we present novel predictions of secreted effectors in S. typhimurium, some of which have been experimentally validated, and apply the method to predict secreted effectors in the genetically intractable human pathogen Chlamydia trachomatis. This approach is a novel and effective way to identify secreted effectors in a broad range of pathogenic bacteria for further experimental characterization and provides insight into the nature of the type III secretion signal.
ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis
2011-01-01
Background Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results Both technologies produce highly reproducible profiles within each platform, ChIP-seq generally produces profiles with a better signal-to-noise ratio, and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108
2013-01-01
Background The wild grass Brachypodium distachyon has emerged as a model system for temperate grasses and biofuel plants. However, the global analysis of miRNAs, molecules known to be key for eukaryotic gene regulation, has been limited in B. distachyon to studies examining a few samples or that rely on computational predictions. Similarly an in-depth global analysis of miRNA-mediated target cleavage using parallel analysis of RNA ends (PARE) data is lacking in B. distachyon. Results B. distachyon small RNAs were cloned and deeply sequenced from 17 libraries that represent different tissues and stresses. Using a computational pipeline, we identified 116 miRNAs including not only conserved miRNAs that have not been reported in B. distachyon, but also non-conserved miRNAs that were not found in other plants. To investigate miRNA-mediated cleavage function, four PARE libraries were constructed from key tissues and sequenced to a total depth of approximately 70 million sequences. The roughly 5 million distinct genome-matched sequences that resulted represent an extensive dataset for analyzing small RNA-guided cleavage events. Analysis of the PARE and miRNA data provided experimental evidence for miRNA-mediated cleavage of 264 sites in predicted miRNA targets. In addition, PARE analysis revealed that differentially expressed miRNAs in the same family guide specific target RNA cleavage in a correspondingly tissue-preferential manner. Conclusions B. distachyon miRNAs and target RNAs were experimentally identified and analyzed. Knowledge gained from this study should provide insights into the roles of miRNAs and the regulation of their targets in B. distachyon and related plants. PMID:24367943
Ruff, Kiersten M.; Harmon, Tyler S.; Pappu, Rohit V.
2015-01-01
We report the development and deployment of a coarse-graining method that is well suited for computer simulations of aggregation and phase separation of protein sequences with block-copolymeric architectures. Our algorithm, named CAMELOT for Coarse-grained simulations Aided by MachinE Learning Optimization and Training, leverages information from converged all-atom simulations that is used to determine a suitable resolution and parameterize the coarse-grained model. To parameterize a system-specific coarse-grained model, we use a combination of Boltzmann inversion, non-linear regression, and a Gaussian process Bayesian optimization approach. The accuracy of the coarse-grained model is demonstrated through direct comparisons to results from all-atom simulations. We demonstrate the utility of our coarse-graining approach using the block-copolymeric sequence from the exon 1 encoded sequence of the huntingtin protein. This sequence comprises 17 residues from the N-terminal end of huntingtin (N17) followed by a polyglutamine (polyQ) tract. Simulations based on the CAMELOT approach are used to show that the adsorption and unfolding of the wild type N17 and its sequence variants on the surface of polyQ tracts engender a patchy colloid-like architecture that promotes the formation of linear aggregates. These results provide a plausible explanation for experimental observations, which show that N17 accelerates the formation of linear aggregates in block-copolymeric N17-polyQ sequences. The CAMELOT approach is versatile and is generalizable for simulating the aggregation and phase behavior of a range of block-copolymeric protein sequences. PMID:26723608
ERIC Educational Resources Information Center
Davis, Charles E.; And Others
A coherent system of decision making is described that may be incorporated into an instructional sequence to provide a supplement to the experience-based judgment of the classroom teacher. The elements of this decision process incorporate prior information such as a teacher's past experience, experimental results such as a test score, and…
rpiCOOL: A tool for In Silico RNA-protein interaction detection using random forest.
Akbaripour-Elahabad, Mohammad; Zahiri, Javad; Rafeh, Reza; Eslami, Morteza; Azari, Mahboobeh
2016-08-07
Understanding the principles of RNA-protein interactions (RPIs) is of critical importance for gaining insight into post-transcriptional gene regulation and is useful for guiding studies of many complex diseases. The limitations and difficulties associated with experimental determination of RPIs create an urgent need for computational methods for RPI prediction. In this paper, we propose a machine learning method to detect RNA-protein interactions based on sequence information. We used motif information and repetitive patterns, which have been extracted from experimentally validated RNA-protein interactions, in combination with sequence composition as descriptors to build a model for RPI prediction via a random forest classifier. About 20% of the "sequence motifs" and "nucleotide composition" features were selected as informative features by the feature selection methods. These results suggest that these two feature types contribute effectively to RPI detection. Results of 10-fold cross-validation experiments on three non-redundant benchmark datasets show a better performance of the proposed method in comparison with the current state-of-the-art methods in terms of various performance measures. In addition, the results revealed that the accuracy of the RPI prediction methods could vary considerably across different organisms. We have implemented the proposed method, namely rpiCOOL, as a stand-alone tool with a user-friendly graphical user interface (GUI) that enables researchers to predict RNA-protein interactions. rpiCOOL is freely available at http://biocool.ir/rpicool.html for non-commercial uses. Copyright © 2016 Elsevier Ltd. All rights reserved.
Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies.
DeMaere, Matthew Z; Darling, Aaron E
2018-02-01
Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype. We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.
Calvo, Sarah E; Tucker, Elena J; Compton, Alison G; Kirby, Denise M; Crawford, Gabriel; Burtt, Noel P; Rivas, Manuel A; Guiducci, Candace; Bruno, Damien L; Goldberger, Olga A; Redman, Michelle C; Wiltshire, Esko; Wilson, Callum J; Altshuler, David; Gabriel, Stacey B; Daly, Mark J; Thorburn, David R; Mootha, Vamsi K
2010-01-01
Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing, and experimental validation to uncover the molecular basis of mitochondrial complex I (CI) disorders. We created five pools of DNA from a cohort of 103 patients and then performed deep sequencing of 103 candidate genes to spotlight 151 rare variants predicted to impact protein function. We used confirmatory experiments to establish genetic diagnoses in 22% of previously unsolved cases, and discovered that defects in NUBPL and FOXRED1 can cause CI deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can reveal novel disease-causing mutations in individual patients. PMID:20818383
Current challenges in genome annotation through structural biology and bioinformatics.
Furnham, Nicholas; de Beer, Tjaart A P; Thornton, Janet M
2012-10-01
With the huge volume of genomic sequences being generated by high-throughput sequencing projects, the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component of this process is the use of experimentally determined structures, which provide a means to detect homology that is not discernible from the sequence alone and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry. Copyright © 2012. Published by Elsevier Ltd.
Stresses and deformations in angle-ply composite tubes
NASA Technical Reports Server (NTRS)
Rousseau, Carl Q.; Hyer, Michael W.; Tompkins, Stephen S.
1987-01-01
The stress and deformations in angle-ply composite tubes subjected to axisymmetric thermal loading were investigated both experimentally and analytically. For the theoretical portion a generalized plane strain elasticity analysis was developed. The analysis included mechanical and thermal loading, and temperature-dependent material properties. The elasticity analysis was also used to study the effect of including a thin metallic coating on a graphite-epoxy tube. The stresses in the coatings were found to be quite high, exceeding the yield stress of aluminum. An important finding in the analytical studies was the fact that even tubes with a balanced-symmetric lamination sequence exhibit shear deformation, or twist. For the experimental portion an apparatus was developed to measure torsional and axial response in the temperature range of 140 to 360 K. Eighteen specimens were tested, combining three material systems, eight lamination sequences, and three off-axis ply orientation angles. For the twist response, agreement between analysis and experiment was found to be good. The axial response of the tubes tested was found to be greater than predicted by a factor of three. As a result, it is recommended that the thermally induced axial deformations be investigated, both experimentally and analytically.
Unlocking Short Read Sequencing for Metagenomics
Rodrigue, Sébastien; Materna, Arne C.; Timberlake, Sonia C.; ...
2010-07-28
We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
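The idea of joining overlapping mates into a longer composite read can be sketched as follows. This is a simplified illustration that assumes error-free reads and exact suffix/prefix overlap; it is not the SHERA algorithm, and the function names and the `min_overlap` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def join_pair(fwd, rev, min_overlap=10):
    """Join a forward read and its reverse-strand mate if they overlap.

    Tries the longest overlap first and returns the composite read,
    or None when no exact overlap of at least min_overlap bases exists.
    """
    mate = revcomp(rev)  # bring the mate onto the forward strand
    for k in range(min(len(fwd), len(mate)), min_overlap - 1, -1):
        if fwd[-k:] == mate[:k]:
            return fwd + mate[k:]
    return None


fragment = "ACGTTGCAAGGCTTAACCGGTATC"      # 24-bp library insert
fwd = fragment[:16]                        # read from the left end
rev = revcomp(fragment[8:])                # read from the right end
print(join_pair(fwd, rev, min_overlap=5))
```

A practical tool would score inexact overlaps and use base quality values to resolve mismatches; exact matching is used here only to keep the sketch short.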
El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Hajj, Hazem; Kobeissy, Firas H
2017-01-01
Degradomics is a novel discipline that involves determination of the protease/substrate fragmentation profile, called the substrate degradome, and has recently been applied in different disciplines. A major application of degradomics is its utility in the field of biomarkers, where the breakdown products (BDPs) of different proteases have been investigated. Among the major proteases assessed, calpain and caspase proteases have been associated with the execution phases of pro-apoptotic and pro-necrotic cell death, generating caspase/calpain-specific cleaved fragments. The distinction between calpain and caspase protein fragments has been applied to distinguish injury mechanisms. Advanced proteomics technology has been used to identify these BDPs experimentally. However, it has been a challenge to identify these BDPs with high precision and efficiency, especially when targeting a number of proteins at one time. In this chapter, we present a novel bioinformatic detection method that identifies BDPs accurately and efficiently, with validation against experimental data. This method aims at predicting the consensus sequence occurrences and their variants in a large set of experimentally detected protein sequences based on state-of-the-art sequence matching and alignment algorithms. After detection, the method generates all the potential fragments cleaved by a specific protease. This space- and time-efficient algorithm is flexible enough to handle the different orientations that the consensus sequence and the protein sequence can take before cleaving. It is O(mn) in space complexity and O(Nmn) in time complexity, where N is the number of protein sequences, m the length of the consensus sequence, and n the length of each protein sequence. Ultimately, this knowledge will feed into the development of a novel tool for researchers to detect diverse types of selected BDPs as putative disease markers, contributing to the diagnosis and treatment of related disorders.
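A minimal sketch of consensus-motif scanning followed by fragment generation is given below. It uses a plain sliding-window comparison, which takes O(mn) time per protein, rather than the alignment algorithms of the chapter. The function names, the `X` wildcard convention, the example protein, and the cut position are illustrative assumptions.

```python
def find_sites(protein, consensus):
    """Return start indices where the consensus matches; 'X' matches any residue."""
    m = len(consensus)
    return [
        i
        for i in range(len(protein) - m + 1)
        if all(c == "X" or c == p for c, p in zip(consensus, protein[i:i + m]))
    ]


def cleave(protein, consensus, cut_after):
    """Cut the protein `cut_after` residues into each matched motif.

    Returns the resulting fragments in N- to C-terminal order.
    """
    cuts = [i + cut_after for i in find_sites(protein, consensus)]
    bounds = [0] + cuts + [len(protein)]
    return [protein[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


protein = "MKDEVDGSAADQTDLL"                   # made-up sequence for illustration
print(find_sites(protein, "DXXD"))             # motif with two wildcard positions
print(cleave(protein, "DXXD", cut_after=4))    # cut after the last motif residue
```

The cut after the fourth residue mirrors the common description of caspase cleavage after the aspartate at the end of a DEVD-like motif; other proteases would need a different motif and cut position.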
Image correlation method for DNA sequence alignment.
Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván
2012-01-01
The complexity of searches and the volume of genomic data make sequence alignment one of the most active research areas in bioinformatics. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a pixel of fixed gray intensity. Query and known database sequences are coded to their pixel representation, and sequence alignment is handled as a problem of recognizing an object in a scene. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, a one-million-base-pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator. By doing this, we expect to fully exploit the properties of light in optical correlation. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.
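The pixel-coding and correlation idea can be sketched in one dimension as shown below. This is a simplified illustration, not the 2-dimensional optical correlation simulated in the paper: the gray levels in `GRAY`, the use of zero-mean normalized cross-correlation, and the function names are all assumptions.

```python
import math

# Assumed gray intensity for each base (any four distinct levels would do).
GRAY = {"A": 0.0, "C": 85.0, "G": 170.0, "T": 255.0}


def ncc(x, y):
    """Zero-mean normalized cross-correlation of two equal-length signals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def best_offset(query, database):
    """Slide the query along the database and return (offset, score) of the best match."""
    q = [GRAY[b] for b in query]
    d = [GRAY[b] for b in database]
    scores = [ncc(q, d[i:i + len(q)]) for i in range(len(d) - len(q) + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


database = "TTGACCGTAGGCTAACGT"
query = database[5:12]
print(best_offset(query, database))
```

Normalizing by the mean and variance keeps bright (high-intensity) regions from dominating the score, which a raw product correlation would not do.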
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kimelman, Aya; Levy, Asaf; Sberro, Hila
In the process of clone-based genome sequencing, initial assemblies frequently contain cloning gaps that can be resolved using cloning-independent methods, but the reason for their occurrence is largely unknown. By analyzing 9,328,693 sequencing clones from 393 microbial genomes we systematically mapped more than 15,000 genes residing in cloning gaps and experimentally showed that their expression products are toxic to the Escherichia coli host. A subset of these toxic sequences was further evaluated through a series of functional assays exploring the mechanisms of their toxicity. Among these genes our assays revealed novel toxins and restriction enzymes, and new classes of small non-coding toxic RNAs that reproducibly inhibit E. coli growth. Further analyses also revealed abundant, short toxic DNA fragments that were predicted to suppress E. coli growth by interacting with the replication initiator dnaA. Our results show that cloning gaps, once considered the result of technical problems, actually serve as a rich source for the discovery of biotechnologically valuable functions, and suggest new modes of antimicrobial interventions.
A novel chaotic image encryption scheme using DNA sequence operations
NASA Astrophysics Data System (ADS)
Wang, Xing-Yuan; Zhang, Ying-Qian; Bao, Xue-Mei
2015-10-01
In this paper, we propose a novel image encryption scheme based on DNA (deoxyribonucleic acid) sequence operations and a chaotic system. Firstly, we perform a bitwise exclusive OR operation on the pixels of the plain image using the pseudorandom sequences produced by a spatiotemporal chaos system, i.e., a CML (coupled map lattice). Secondly, a DNA matrix is obtained by encoding the confused image using a DNA encoding rule. Then we generate the new initial conditions of the CML according to this DNA matrix and the previous initial conditions, which makes the encryption result depend closely on every pixel of the plain image. Thirdly, the rows and columns of the DNA matrix are permuted. Then, the permuted DNA matrix is confused once again. Finally, after decoding the confused DNA matrix using a DNA decoding rule, we obtain the ciphered image. Experimental results and theoretical analysis show that the scheme is able to resist various attacks, so it has extraordinarily high security.
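The first two steps of such a scheme (XOR with a chaotic keystream, then DNA encoding) can be sketched as follows. This is a toy illustration under loud assumptions: a single logistic map stands in for the coupled map lattice, the rule 00→A, 01→C, 10→G, 11→T is one arbitrary encoding rule, the permutation and plaintext-dependent re-keying steps are omitted, and all names and parameter values are hypothetical. It is not a secure cipher.

```python
RULE = "ACGT"  # assumed encoding rule: 00->A, 01->C, 10->G, 11->T


def keystream(n, x0=0.3141, r=3.99):
    """Generate n pseudorandom bytes from a logistic map (stand-in for the CML)."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)
    return out


def dna_encode(values):
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(RULE[(v >> s) & 3] for v in values for s in (6, 4, 2, 0))


def dna_decode(dna):
    """Inverse of dna_encode."""
    d = [RULE.index(c) for c in dna]
    return [
        (d[i] << 6) | (d[i + 1] << 4) | (d[i + 2] << 2) | d[i + 3]
        for i in range(0, len(d), 4)
    ]


def encrypt(pixels, x0=0.3141):
    """XOR pixels with the keystream, then DNA-encode the result."""
    ks = keystream(len(pixels), x0)
    return dna_encode([p ^ k for p, k in zip(pixels, ks)])


def decrypt(dna, x0=0.3141):
    """DNA-decode, then XOR with the same keystream to recover the pixels."""
    mixed = dna_decode(dna)
    ks = keystream(len(mixed), x0)
    return [m ^ k for m, k in zip(mixed, ks)]


pixels = [0, 17, 128, 255, 42]
cipher = encrypt(pixels)
print(cipher)
print(decrypt(cipher) == pixels)
```

Here the initial condition `x0` plays the role of the key; in the scheme described above it would additionally be updated from the DNA matrix so that the ciphertext depends on every plaintext pixel.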
Hierarchy and extremes in selections from pools of randomized proteins
Boyer, Sébastien; Biswas, Dipanwita; Kumar Soshee, Ananda; Scaramozzino, Natale; Nizak, Clément; Rivoire, Olivier
2016-01-01
Variation and selection are the core principles of Darwinian evolution, but quantitatively relating the diversity of a population to its capacity to respond to selection is challenging. Here, we examine this problem at a molecular level in the context of populations of partially randomized proteins selected for binding to well-defined targets. We built several minimal protein libraries, screened them in vitro by phage display, and analyzed their response to selection by high-throughput sequencing. A statistical analysis of the results reveals two main findings. First, libraries with the same sequence diversity but built around different “frameworks” typically have vastly different responses; second, the distribution of responses of the best binders in a library follows a simple scaling law. We show how an elementary probabilistic model based on extreme value theory rationalizes the latter finding. Our results have implications for designing synthetic protein libraries, estimating the density of functional biomolecules in sequence space, characterizing diversity in natural populations, and experimentally investigating evolvability (i.e., the potential for future evolution). PMID:26969726
Hierarchy and extremes in selections from pools of randomized proteins.
Boyer, Sébastien; Biswas, Dipanwita; Kumar Soshee, Ananda; Scaramozzino, Natale; Nizak, Clément; Rivoire, Olivier
2016-03-29
Variation and selection are the core principles of Darwinian evolution, but quantitatively relating the diversity of a population to its capacity to respond to selection is challenging. Here, we examine this problem at a molecular level in the context of populations of partially randomized proteins selected for binding to well-defined targets. We built several minimal protein libraries, screened them in vitro by phage display, and analyzed their response to selection by high-throughput sequencing. A statistical analysis of the results reveals two main findings. First, libraries with the same sequence diversity but built around different "frameworks" typically have vastly different responses; second, the distribution of responses of the best binders in a library follows a simple scaling law. We show how an elementary probabilistic model based on extreme value theory rationalizes the latter finding. Our results have implications for designing synthetic protein libraries, estimating the density of functional biomolecules in sequence space, characterizing diversity in natural populations, and experimentally investigating evolvability (i.e., the potential for future evolution).
NASA Astrophysics Data System (ADS)
Mirabi, Mohammad; Fatemi Ghomi, S. M. T.; Jolai, F.
2014-04-01
The flow-shop scheduling problem (FSP) deals with the scheduling of a set of n jobs that visit a set of m machines in the same order. As the FSP is NP-hard, no efficient algorithm is known that reaches the optimal solution of the problem. To minimize the holding, delay and setup costs of large permutation flow-shop scheduling problems with sequence-dependent setup times on each machine, this paper develops a novel hybrid genetic algorithm (HGA) with three genetic operators. The proposed HGA applies a modified approach to generate a pool of initial solutions, and also uses an improved heuristic called the iterated swap procedure to improve the initial solutions. We consider the make-to-order production approach, in which some sequences between jobs are treated as tabu based on the maximum allowable setup cost. In addition, the results are compared to some recently developed heuristics, and computational experiments show that the proposed HGA performs very competitively with respect to accuracy and efficiency of solution.
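To make the problem setting concrete, the sketch below evaluates a job permutation in a flow shop with sequence-dependent setup times and improves it by pairwise swaps. It is a simplified illustration: the objective is makespan rather than the holding, delay and setup costs of the paper, setups are assumed to start as soon as a machine is free, and the swap search may differ from the authors' iterated swap procedure. All names and data layouts are hypothetical.

```python
def makespan(order, proc, setup):
    """Completion time of the last job on the last machine.

    proc[j][k]     : processing time of job j on machine k
    setup[k][i][j] : setup time on machine k when job j directly follows job i
    """
    m = len(proc[0])
    done = [0] * m          # completion time of the latest job on each machine
    prev = None
    for j in order:
        for k in range(m):
            s = setup[k][prev][j] if prev is not None else 0
            ready = done[k] + s                   # machine k is free and set up
            arrive = done[k - 1] if k > 0 else 0  # job j has left machine k-1
            done[k] = max(ready, arrive) + proc[j][k]
        prev = j
    return done[-1]


def swap_improve(order, proc, setup):
    """Repeat pairwise swaps until no swap shortens the makespan."""
    best, best_val = list(order), makespan(order, proc, setup)
    improved = True
    while improved:
        improved = False
        for a in range(len(best) - 1):
            for b in range(a + 1, len(best)):
                cand = best[:]
                cand[a], cand[b] = cand[b], cand[a]
                val = makespan(cand, proc, setup)
                if val < best_val:
                    best, best_val, improved = cand, val, True
    return best, best_val


proc = [[3, 2], [1, 4]]                           # 2 jobs x 2 machines
no_setup = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]   # setup[k][i][j], all zero
print(makespan([0, 1], proc, no_setup))
print(swap_improve([0, 1], proc, no_setup))
```

A local search of this kind is typically embedded in a genetic algorithm to refine individuals after crossover and mutation, which is the general role the abstract assigns to its swap-based heuristic.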
Barbanera, Filippo; Guerrini, Monica; Bertoncini, Franco; Cappelli, Fabio; Muzzeddu, Marco; Dini, Fernando
2011-01-01
In the Alectoris partridges (Phasianidae), hybridization occurs occasionally as a result of the natural breakdown of isolating mechanisms but more frequently as a result of human activity. No genetic record of hybridization is known for the barbary partridge (A. barbara). This species is distributed mostly in North Africa and, in Europe, on the island of Sardinia (Italy) and on Gibraltar. The risk of hybridization between barbary and red-legged partridge (A. rufa: Iberian Peninsula, France, Italy) is high in Sardinia and in Spain. We developed two random amplified polymorphic DNA (RAPD) markers to detect A. barbara × A. rufa hybrid partridges. We tested them on 125 experimental hybrids, sequenced the relative species-specific bands and found that the bands and their corresponding sequences were reliably transmitted through a number of generations (F1, F2, F3, BC1, BC2). Our markers represent a highly valuable tool for the preservation of the A. barbara genome from the pressing threat of A. rufa pollution. © 2010 Blackwell Publishing Ltd.
Chaaraoui, Alexandros Andre; Flórez-Revuelta, Francisco
2014-01-01
This paper presents a novel silhouette-based feature for vision-based human action recognition, which relies on the contour of the silhouette and a radial scheme. Its low dimensionality and ease of extraction make it well suited to real-time scenarios. This feature is used in a learning algorithm that, by means of model fusion of multiple camera streams, builds a bag of key poses, which serves as a dictionary of known poses and allows converting the training sequences into sequences of key poses. These are used to perform action recognition by means of a sequence matching algorithm. Experimentation on three different datasets returns high and stable recognition rates. To the best of our knowledge, this paper presents the highest results so far on the MuHAVi-MAS dataset. Real-time suitability is given, since the method easily performs above video frequency. Therefore, the related requirements that applications such as ambient-assisted living services impose are successfully fulfilled.
Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.
2012-01-01
High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. 
In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309
The quest for rare variants: pooled multiplexed next generation sequencing in plants.
Marroni, Fabio; Pinosio, Sara; Morgante, Michele
2012-01-01
Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, few research groups working in plant sciences have exploited this potentiality, showing that pooled NGS provides results in excellent agreement with those obtained by individual Sanger sequencing. The aim of this review is to convey to the reader the general ideas underlying the use of pooled NGS for the identification of rare variants. To facilitate a thorough understanding of the possibilities of the method, we will explain in detail the possible experimental and analytical approaches and discuss their advantages and disadvantages. We will show that information on allele frequency obtained by pooled NGS can be used to accurately compute basic population genetics indexes such as allele frequency, nucleotide diversity, and Tajima's D. Finally, we will discuss applications and future perspectives of the multiplexed NGS approach.
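As an illustration of how pooled allele frequencies feed into a population genetics index, here is a minimal sketch of nucleotide diversity for biallelic sites. It assumes the standard small-sample correction n/(n-1) with n the number of pooled chromosomes, and it ignores the additional corrections for sequencing depth and sequencing error that pooled data usually need. The function name and parameters are hypothetical and not taken from the review.

```python
def nucleotide_diversity(freqs, n_chromosomes, region_length):
    """Per-site nucleotide diversity from pooled allele frequencies.

    freqs         : estimated frequency of one allele at each variable
                    biallelic site (values between 0 and 1)
    n_chromosomes : number of chromosomes in the pool
    region_length : number of sites surveyed, variable or not
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    correction = n_chromosomes / (n_chromosomes - 1)
    # Expected heterozygosity 2p(1-p) at each site, corrected for sample size
    total = sum(correction * 2.0 * p * (1.0 - p) for p in freqs)
    return total / region_length
```

Rare variants contribute little to this sum, since 2p(1-p) is small when p is close to 0, so the accuracy of the frequency estimates matters most for indexes that weight rare alleles heavily, such as Tajima's D.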
Frequency-locked pulse sequencer for high-frame-rate monochromatic tissue motion imaging.
Azar, Reza Zahiri; Baghani, Ali; Salcudean, Septimiu E; Rohling, Robert
2011-04-01
To overcome the inherent low frame rate of conventional ultrasound, we have previously presented a system that can be implemented on conventional ultrasound scanners for high-frame-rate imaging of monochromatic tissue motion. The system employs a sector subdivision technique in the sequencer to increase the acquisition rate. To eliminate the delays introduced during data acquisition, a motion phase correction algorithm has also been introduced to create in-phase displacement images. Previous experimental results from tissue-mimicking phantoms showed that the system can achieve effective frame rates of up to a few kilohertz on conventional ultrasound systems. In this short communication, we present a new pulse sequencing strategy that facilitates high-frame-rate imaging of monochromatic motion such that the acquired echo signals are inherently in-phase. The sequencer uses the knowledge of the excitation frequency to synchronize the acquisition of the entire imaging plane to that of an external exciter. This sequencing approach eliminates any need for synchronization or phase correction and has applications in tissue elastography, which we demonstrate with tissue-mimicking phantoms. © 2011 IEEE
G2LC: Resources Autoscaling for Real Time Bioinformatics Applications in IaaS
Hu, Rongdong; Liu, Guangming; Jiang, Jingfei; Wang, Lixin
2015-01-01
Cloud computing has started to change the way bioinformatics research is carried out. Researchers who have taken advantage of this technology can process larger amounts of data and speed up scientific discovery. The variability in data volume results in variable computing requirements. Therefore, bioinformatics researchers are pursuing more reliable and efficient methods for conducting sequencing analyses. This paper proposes an automated resource provisioning method, G2LC, for bioinformatics applications in IaaS. It enables applications to output results in real time. Its main purpose is to guarantee application performance while improving resource utilization. Real sequence searching data from BLAST is used to evaluate the effectiveness of G2LC. Experimental results show that G2LC guarantees application performance while saving up to 20.14% of resources. PMID:26504488
Assembly-history dynamics of a pitcher-plant protozoan community in experimental microcosms.
Kadowaki, Kohmei; Inouye, Brian D; Miller, Thomas E
2012-01-01
History drives community assembly through differences both in density (density effects) and in the sequence in which species arrive (sequence effects). Density effects arise from predictable population dynamics, which are free of history, but sequence effects are due to a density-free mechanism, arising solely from the order and timing of immigration events. Few studies have determined how components of immigration history (timing, number of individuals, frequency) alter local dynamics to determine community assembly, beyond addressing when immigration history produces historically contingent assembly. We varied density and sequence effects independently in a two-way factorial design to follow community assembly in a three-species aquatic protozoan community. A superior competitor, Colpoda steinii, mediated alternative community states; early arrival or high introduction density allowed this species to outcompete or suppress the other competitors (Poterioochromonas malhamensis and Eimeriidae gen. sp.). Multivariate analysis showed that density effects caused greater variation in community states, whereas sequence effects altered the mean community composition. A significant interaction between density and sequence effects suggests that we should refine our understanding of priority effects. These results highlight a practical need to understand not only the "ingredients" (species) in ecological communities but their "recipes" as well.
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses
Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna; ...
2016-10-30
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community.
Experimental Evidence and In Silico Identification of Tryptophan Decarboxylase in Citrus Genus.
De Masi, Luigi; Castaldo, Domenico; Pignone, Domenico; Servillo, Luigi; Facchiano, Angelo
2017-02-11
Plant tryptophan decarboxylase (TDC) converts tryptophan into tryptamine, a precursor of indolealkylamine alkaloids. The recent finding of tryptamine metabolites in Citrus plants suggests the existence of TDC activity in this genus. Here, we report for the first time that, in Citrus x limon seedlings, deuterium-labeled tryptophan is decarboxylated into tryptamine, from which deuterated N,N,N-trimethyltryptamine is subsequently formed. These results provide evidence of TDC activity and of the subsequent methylation pathway of the tryptamine produced by tryptophan decarboxylation. In addition, to identify the genetic basis for the presence of TDC, we carried out a sequence similarity search for TDC in the Citrus genomes, using as a probe the TDC sequence reported for the plant Catharanthus roseus. We analyzed the genomes of both Citrus clementina and Citrus sinensis, available in public databases, and identified putative protein sequences of aromatic l-amino acid decarboxylase. Similarly, 42 aromatic l-amino acid decarboxylase sequences from 23 plant species were extracted from public databases. Potential sequence signatures for functional TDC were then identified. With this research, we propose for the first time a putative protein sequence for TDC in the genus Citrus.
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community.
Effects of temperature and mass conservation on the typical chemical sequences of hydrogen oxidation
NASA Astrophysics Data System (ADS)
Nicholson, Schuyler B.; Alaghemandi, Mohammad; Green, Jason R.
2018-01-01
Macroscopic properties of reacting mixtures are necessary to design synthetic strategies, determine yield, and improve the energy and atom efficiency of many chemical processes. The set of time-ordered sequences of chemical species is one representation of the evolution from reactants to products. However, only a fraction of the possible sequences is typical, having the majority of the joint probability and characterizing the succession of chemical nonequilibrium states. Here, we extend a variational measure of typicality and apply it to atomistic simulations of a model for hydrogen oxidation over a range of temperatures. We demonstrate an information-theoretic methodology to identify typical sequences under the constraints of mass conservation. Including these constraints leads to an improved ability to learn the chemical sequence mechanism from experimentally accessible data. From these typical sequences, we show that two quantities defining the variational typical set of sequences—the joint entropy rate and the topological entropy rate—increase linearly with temperature. These results suggest that, away from explosion limits, data over a narrow range of thermodynamic parameters could be sufficient to extrapolate these typical features of combustion chemistry to other conditions.
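For orientation, the two rates named in this abstract have standard information-theoretic definitions, sketched below in generic notation. The symbols are assumptions for illustration and are not taken from the paper, whose variational construction may define the quantities differently. Here $X_1, \dots, X_n$ is a sequence of chemical species of length $n$ and $N_n$ is the number of length-$n$ sequences with nonzero probability.

```latex
% Joint (Shannon) entropy of sequences of length n
H(X_1,\dots,X_n) = -\sum_{x_1,\dots,x_n} p(x_1,\dots,x_n)\,\ln p(x_1,\dots,x_n)

% Entropy rate: average uncertainty per step of the sequence
h = \lim_{n\to\infty} \frac{1}{n}\, H(X_1,\dots,X_n)

% Topological entropy rate: growth rate of the number of allowed sequences
h_{\mathrm{top}} = \lim_{n\to\infty} \frac{1}{n}\,\ln N_n , \qquad h \le h_{\mathrm{top}}
```

The inequality holds because the entropy of any distribution over $N_n$ outcomes is at most $\ln N_n$, with equality when all allowed sequences are equally likely.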
Gönner, Lorenz; Vitay, Julien; Hamker, Fred H.
2017-01-01
Hippocampal place-cell sequences observed during awake immobility often represent previous experience, suggesting a role in memory processes. However, recent reports of goals being overrepresented in sequential activity suggest a role in short-term planning, although a detailed understanding of the origins of hippocampal sequential activity and of its functional role is still lacking. In particular, it is unknown which mechanism could support efficient planning by generating place-cell sequences biased toward known goal locations, in an adaptive and constructive fashion. To address these questions, we propose a model of spatial learning and sequence generation as interdependent processes, integrating cortical contextual coding, synaptic plasticity and neuromodulatory mechanisms into a map-based approach. Following goal learning, sequential activity emerges from continuous attractor network dynamics biased by goal memory inputs. We apply Bayesian decoding on the resulting spike trains, allowing a direct comparison with experimental data. Simulations show that this model (1) explains the generation of never-experienced sequence trajectories in familiar environments, without requiring virtual self-motion signals, (2) accounts for the bias in place-cell sequences toward goal locations, (3) highlights their utility in flexible route planning, and (4) provides specific testable predictions. PMID:29075187
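The Bayesian decoding step mentioned above can be sketched as follows. This is a minimal illustration that assumes the commonly used decoder with independent Poisson spiking and a flat prior over position; the abstract does not state which decoder the authors use. The names `decode_position`, `tuning` and `window` are hypothetical.

```python
import math


def decode_position(spike_counts, tuning, window):
    """Posterior over position bins given spike counts in one time window.

    spike_counts : spike_counts[i] is the number of spikes of cell i
    tuning       : tuning[i][x] is the mean rate (Hz) of cell i in bin x
    window       : length of the decoding window in seconds
    """
    n_bins = len(tuning[0])
    log_post = []
    for x in range(n_bins):
        lp = 0.0
        for n_i, f_i in zip(spike_counts, tuning):
            expected = max(f_i[x], 1e-9) * window  # avoid log(0)
            # Poisson log-likelihood; the log(n_i!) term is the same for
            # every bin, so it is dropped
            lp += n_i * math.log(expected) - expected
        log_post.append(lp)
    peak = max(log_post)  # subtract the peak for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

Applying the function to successive short windows and taking the most probable bin in each gives a decoded trajectory, which is the kind of output that can be compared with decoded sequences from recordings.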
Miura, Naoki; Kucho, Ken-Ichi; Noguchi, Michiko; Miyoshi, Noriaki; Uchiumi, Toshiki; Kawaguchi, Hiroaki; Tanimoto, Akihide
2014-01-01
The microminipig, which weighs less than 10 kg at an early stage of maturity, has been reported as a potential experimental model animal. Its extremely small size and other distinct characteristics suggest the possibility of a number of differences between the genome of the microminipig and that of conventional pigs. In this study, we analyzed the genomes of two healthy microminipigs using the SOLiD™ next-generation sequencing system. We then compared the obtained genomic sequences with a genomic database for the domestic pig (Sus scrofa). The mapping coverage of sequenced tags from the microminipig to conventional pig genomic sequences was greater than 96%, and we detected no clear, substantial genomic variance from these data. The results may indicate that the distinct characteristics of the microminipig derive from small-scale alterations in the genome, such as single nucleotide polymorphisms or translational modifications, rather than large-scale deletion or insertion polymorphisms. Further investigation of the entire genomic sequence of the microminipig with methods enabling deeper coverage is required to elucidate the genetic basis of its distinct phenotypic traits. Copyright © 2014 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.
Monitoring of microbial communities in anaerobic digestion sludge for biogas optimisation.
Lim, Jun Wei; Ge, Tianshu; Tong, Yen Wah
2018-01-01
This study characterised and compared the microbial communities of anaerobic digestion (AD) sludge using three different methods: (1) clone library; (2) pyrosequencing; and (3) terminal restriction fragment length polymorphism (T-RFLP). Although high-throughput sequencing techniques are becoming increasingly popular and affordable, relying on such techniques for frequent monitoring of microbial communities may be a financial burden for some. Furthermore, the depth of microbial analysis revealed by high-throughput sequencing may not be required for monitoring purposes. This study aims to develop a rapid, reliable and economical approach for the monitoring of microbial communities in AD sludge. A combined approach, in which genetic information from clone library sequences was used to assign phylogeny to experimentally determined T-RFs, was developed in this study. To assess the effectiveness of the combined approach, the microbial communities it determined were compared with those characterised by pyrosequencing. Results showed that both pyrosequencing and clone library methods determined the dominant bacteria phyla to be Proteobacteria, Firmicutes, Bacteroidetes, and Thermotogae. Both methods also found that sludge A and B were dominated by acetogenic methanogens followed by hydrogenotrophic methanogens. The number of OTUs detected by T-RFLP was significantly lower than that detected by the clone library. In this study, T-RFLP analysis identified the majority of the dominant species of the archaeal consortia. However, many of the more highly diverse bacteria consortia were missed. Nevertheless, the combined approach accurately predicted the same dominant microbial groups for both sludge A and sludge B as the pyrosequencing results.
Results showed that the combined approach of clone library and T-RFLP accurately predicted the dominant microbial groups and thus is a reliable and more economical way to monitor the evolution of microbial systems in AD sludge. Copyright © 2017 Elsevier Ltd. All rights reserved.
Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.
Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz
2015-01-01
Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Second, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level, cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that the multi-scale sequence feature aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
Kang, Seung-Hui; Park, Chan Hee; Jeung, Hei Cheul; Kim, Ki-Yeol; Rha, Sun Young; Chung, Hyun Cheol
2007-06-01
In array-CGH, various factors may act as variables influencing the result of experiments. Among them, Cot-1 DNA, which has been used as a repetitive sequence-blocking agent, may become an artifact-inducing factor in BAC array-CGH. To identify the effect of Cot-1 DNA on Microarray-CGH experiments, Cot-1 DNA was labeled directly and Microarray-CGH experiments were performed. The results confirmed that probes which hybridized more completely with Cot-1 DNA had a higher sequence similarity to the Alu element. Further, in the sex-mismatched Microarray-CGH experiments, the variation and intensity in the fluorescent signal were reduced in the high intensity probe group in which probes were better hybridized with Cot-1 DNA. Otherwise, those of the low intensity probe group showed no alterations regardless of Cot-1 DNA. These results confirmed by in silico methods that Cot-1 DNA could block repetitive sequences in gDNA and probes. In addition, it was confirmed biologically that the blocking effect of Cot-1 DNA could be presented via its repetitive sequences, especially Alu elements. Thus, in contrast to BAC-array CGH, the use of Cot-1 DNA is advantageous in controlling experimental variation in Microarray-CGH.
Research and Implementation of Tibetan Word Segmentation Based on Syllable Methods
NASA Astrophysics Data System (ADS)
Jiang, Jing; Li, Yachao; Jiang, Tao; Yu, Hongzhi
2018-03-01
Tibetan word segmentation (TWS) is an important problem in Tibetan information processing, and abbreviated word recognition is one of the key and most difficult problems in TWS. Most of the existing methods of Tibetan abbreviated word recognition are rule-based approaches, which need vocabulary support. In this paper, we propose a method based on a sequence tagging model for abbreviated word recognition, and then implement it in TWS systems with sequence labeling models. The experimental results show that our abbreviated word recognition method is fast and effective and can be combined easily with the segmentation model. This significantly improves the performance of Tibetan word segmentation.
Asymmetric scoring functions for proteins
NASA Astrophysics Data System (ADS)
Lezon, Timothy; Holter, Neal; Maritan, Amos; Banavar, Jayanth
2003-03-01
The protein folding problem entails the prediction of the native state structure of a protein given the sequence of amino acids. In a coarse-grained description of a protein, an important ingredient for attempting this task is the determination of the effective energies of interaction between amino acids. We will discuss a simple approach for determining such interaction potentials from a training set of protein sequences and their experimentally determined native state structures. The key new ingredient in our study is the incorporation of the lack of symmetry in the effective interactions between amino acids. Our results, obtained using a set of 513 proteins, and their implications will be discussed.
NASA Astrophysics Data System (ADS)
Debenjak, Andrej; Boškoski, Pavle; Musizza, Bojan; Petrovčič, Janko; Juričić, Đani
2014-05-01
This paper proposes an approach to the estimation of PEM fuel cell impedance by utilizing pseudo-random binary sequence as a perturbation signal and continuous wavelet transform with Morlet mother wavelet. With the approach, the impedance characteristic in the frequency band from 0.1 Hz to 500 Hz is identified in 60 seconds, approximately five times faster compared to the conventional single-sine approach. The proposed approach was experimentally evaluated on a single PEM fuel cell of a larger fuel cell stack. The quality of the results remains at the same level compared to the single-sine approach.
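A pseudo-random binary sequence of the kind used as a perturbation signal is usually generated with a linear feedback shift register. The sketch below uses the common 7-bit PRBS7 register with polynomial x^7 + x^6 + 1 as an assumed example; the abstract does not state which sequence order, clock rate or amplitude the authors used, and the names `prbs7` and `to_signal` are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (period 127 bits)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero, or the register stays at 0")
    bits = []
    for _ in range(n_bits):
        # Feedback is the XOR of the two highest register bits (taps 7 and 6)
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


def to_signal(bits, amplitude):
    """Map bits 1 and 0 to +amplitude and -amplitude around the operating point."""
    return [amplitude if b else -amplitude for b in bits]
```

Because such a signal excites a wide frequency band at once, one measurement can cover the whole range of interest, which is consistent with the shorter measurement time reported relative to stepping through single sine frequencies.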
File compression and encryption based on LLS and arithmetic coding
NASA Astrophysics Data System (ADS)
Yu, Changzhi; Li, Hengjian; Wang, Xiyu
2018-03-01
We propose a file compression model based on arithmetic coding. First, the original symbols to be encoded are input to the encoder one by one. We produce a set of chaotic sequences using the logistic and sine chaos system (LLS), and the values of these chaotic sequences are used to randomly modify the upper and lower limits of the current symbol's probability. To achieve encryption, we modify the upper and lower limits of all character probabilities when encoding each symbol. Experimental results show that the proposed model achieves data encryption with almost the same compression efficiency as standard arithmetic coding.
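The interval limits referred to above are those of ordinary arithmetic coding, which can be sketched as follows. This is a minimal illustration with exact fractions and a fixed probability table; it does not reproduce the chaotic modification of the limits, whose details are not given in the abstract. The function names and the example probabilities are hypothetical.

```python
from fractions import Fraction


def build_intervals(probs):
    """Assign each symbol a sub-interval [low, high) of [0, 1)."""
    intervals = {}
    low = Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals


def encode(message, probs):
    """Narrow [0, 1) once per symbol; any value in the result encodes the message."""
    intervals = build_intervals(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = intervals[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high


def decode(value, length, probs):
    """Recover `length` symbols from a value inside the final interval."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)
```

A scheme like the one described would perturb the `s_low` and `s_high` limits with a key-dependent sequence before each narrowing step, so that a decoder without the same sequence could not follow the interval updates.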
Hill, D E; Liddell, S; Jenkins, M C; Dubey, J P
2001-04-01
Neospora caninum oocysts, passed in the feces of a definitive host (dog), were isolated, and genomic DNA was extracted. A polymerase chain reaction (PCR) targeting the N. caninum-specific Nc 5 genomic sequence was performed using the isolated DNA. A synthesized competitor molecule containing part of the Nc 5 sequence was included in the assay as a check against false-negative PCR results and to quantify N. caninum oocyst DNA in fecal samples. A standard curve of the ratio of fluorescence intensity of PCR-amplified competitor to that of oocyst DNA was constructed to compare oocyst equivalents from fecal samples containing unknown numbers of N. caninum oocysts and to assess the sensitivity of the assay. The specificity of the assay was determined using the Nc 5-specific primers in PCR assays against other parasites likely to be found in canine feces. Genomic DNA sequences from the canine coccidians Hammondia heydorni, Cryptosporidium parvum, Sarcocystis cruzi, S. tenella, and Isospora ohioensis and the canine helminth parasites Strongyloides stercoralis, Toxocara canis, Dipylidium caninum, and Ancylostoma caninum were not amplified. In addition, genomic DNA sequences from oocysts of coccidian parasites that might contaminate dog feces, such as Hammondia hammondi, Toxoplasma gondii, or Eimeria tenella, were not amplified in the PCR assay. The assay should be useful in epidemiological surveys of both domestic and wild canine hosts and in investigations of oocyst biology in experimental infections.
Huang, Shan; Feng, Mengmeng; Li, Jiawen; Liu, Yi; Xiao, Qi
2018-03-03
The authors describe an electrochemical method for the determination of a single-stranded DNA (ssDNA) oligonucleotide with a sequence derived from the genome of hepatitis B virus (HBV). It makes use of circular strand displacement (CSD) and rolling circle amplification (RCA) strategies mediated by a molecular beacon (MB). This ssDNA hybridizes with the loop portion of the MB immobilized on the surface of a gold electrode, while primer DNA hybridizes with the remaining part of the MB sequence. This triggers the MB-mediated CSD. The RCA is then initiated to produce a long DNA strand with multiple tandem-repeat sequences, and this results in a significant increase of the differential pulse voltammetric response of the electrochemical probe Methylene Blue at a rather low working potential of -0.24 V (vs. Ag/AgCl). Under optimal experimental conditions, the assay displays an ultrahigh sensitivity (with a 2.6 aM detection limit) and excellent selectivity. Response is linear in the 10 to 700 aM DNA concentration range. Graphical abstract: Schematic of a voltammetric method for the determination of attomolar levels of target DNA. It is based on molecular beacon mediated circular strand displacement and rolling circle amplification strategies.
Dunbar, Robert C; Berden, Giel; Martens, Jonathan K; Oomens, Jos
2015-09-24
Conformational preferences have been surveyed for divalent metal cation complexes with the dipeptide ligands AlaPhe, PheAla, GlyHis, and HisGly. Density functional theory results for a full set of complexes are presented, and previous experimental infrared spectra, supplemented by a number of newly recorded spectra obtained with infrared multiple photon dissociation spectroscopy, provide experimental verification of the preferred conformations in most cases. The overall structural features of these complexes are shown, and attention is given to comparisons involving peptide sequence, nature of the metal ion, and nature of the side-chain anchor. A regular progression is observed as a function of binding strength, whereby the weakly binding metal ions (Ba²⁺ to Ca²⁺) transition from carboxylate zwitterion (ZW) binding to charge-solvated (CS) binding, while the stronger binding metal ions (Ca²⁺ to Mg²⁺ to Ni²⁺) transition from CS binding to metal-ion-backbone binding (Iminol) by direct metal-nitrogen bonds to the deprotonated amide nitrogens. Two new sequence-dependent reversals are found between ZW and CS binding modes, such that Ba²⁺ and Ca²⁺ prefer ZW binding in the GlyHis case but prefer CS binding in the HisGly case. The overall binding strength for a given metal ion is not strongly dependent on the sequence, but the histidine peptides are significantly more strongly bound (by 50-100 kJ mol⁻¹) than the phenylalanine peptides.
Lee, Sangwoo; Kim, Cheolmin; Kim, Jungkon; Kim, Woo-Keun; Shin, Hyun Suk; Lim, Eun-Suk; Lee, Jin Wuk; Kim, Sunmi; Kim, Ki-Tae; Lee, Sung-Kyu; Choi, Cheol Young; Choi, Kyungho
2015-07-01
Zacco platypus, pale chub, is an indigenous freshwater fish of East Asia including Korea and has many useful characteristics as an indicator species for water pollution. While the utility of Z. platypus as an experimental species has been recognized, genetic-level information is very limited and warrants extensive research. Metallothionein (MT) is a widely used and well-known biomarker for heavy metal exposure in many experimental species. In the present study, we cloned MT in Z. platypus and evaluated its utility as a biomarker for metal exposure. For this purpose, we sequenced complete complementary DNA (cDNA) of MT in Z. platypus and carried out phylogenetic analysis with its sequences. The transcription-level responses of MT gene following the exposure to CdCl2 were also assessed to validate the utility of this gene as an exposure biomarker. Analysis of cDNA sequence of MT gene demonstrated high conformity with those of other fish. MT messenger RNA (mRNA) expression and enzymatic MT content significantly increased following CdCl2 exposure in a concentration-dependent manner. The level of CdCl2 that resulted in significant MT changes in Z. platypus was within the range that was reported from other fish. The MT gene of Z. platypus sequenced in the present study can be used as a useful biomarker for heavy metal exposure in the aquatic environment of Korea and other countries where this freshwater fish species represents the ecosystem.
Using the Tools and Resources of the RCSB Protein Data Bank.
Costanzo, Luigi Di; Ghosh, Sutapa; Zardecki, Christine; Burley, Stephen K
2016-09-07
The Protein Data Bank (PDB) archive is the worldwide repository of experimentally determined three-dimensional structures of large biological molecules found in all three kingdoms of life. Atomic-level structures of these proteins, nucleic acids, and complex assemblies thereof are central to research and education in molecular, cellular, and organismal biology, biochemistry, biophysics, materials science, bioengineering, ecology, and medicine. Several types of information are associated with each PDB archival entry, including atomic coordinates, primary experimental data, polymer sequence(s), and summary metadata. The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) serves as the U.S. data center for the PDB, distributing archival data and supporting both simple and complex queries that return results. These data can be freely downloaded, analyzed, and visualized using RCSB PDB tools and resources to gain a deeper understanding of fundamental biological processes, molecular evolution, human health and disease, and drug discovery. © 2016 by John Wiley & Sons, Inc.
Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design.
Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw
2018-07-01
Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used software for protein design, allows for the effective identification of sequences compatible with a given backbone structure, while molecular dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta design is started on MD-derived structural ensembles and showed that such a combined approach generates 20-30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addition, we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for experimental validation. In sum our results demonstrate that the MD ensemble-based flexible backbone design can be a viable method for protein design, especially for tasks that require a large pool of diverse sequences. Copyright © 2018 Elsevier Inc. All rights reserved.
Diffusion modulation of DNA by toehold exchange
NASA Astrophysics Data System (ADS)
Rodjanapanyakul, Thanapop; Takabatake, Fumi; Abe, Keita; Kawamata, Ibuki; Nomura, Shinichiro M.; Murata, Satoshi
2018-05-01
We propose a method to control the diffusion speed of DNA molecules with a target sequence in a polymer solution. The interaction between solute DNA and diffusion-suppressing DNA that has been anchored to a polymer matrix is modulated by the concentration of a third DNA molecule, called the competitor, through a mechanism called toehold exchange. Experimental results show that the sequence-specific modulation of the diffusion coefficient is successfully achieved. The diffusion coefficient can be modulated up to sixfold by changing the concentration of the competitor. The specificity of the modulation is also verified under the coexistence of a set of DNA with noninteracting base sequences. With this mechanism, we are able to control the diffusion coefficient of individual DNA species by the concentration of another DNA species. This methodology introduces programmability to a DNA-based reaction-diffusion system.
Motion detection and compensation in infrared retinal image sequences.
Scharcanski, J; Schardosim, L R; Santos, D; Stuchi, A
2013-01-01
Infrared image data captured by non-mydriatic digital retinography systems are often used in the diagnosis and treatment of diabetic macular edema (DME). Infrared illumination is less aggressive to the patient's retina, and retinal studies can be carried out without pupil dilation. However, sequences of infrared eye fundus images of static scenes tend to present pixel intensity fluctuations in time, and noise and background illumination changes pose a challenge to most motion detection methods proposed in the literature. In this paper, we present a retinal motion detection method that is adaptive to background noise and illumination changes. Our experimental results indicate that this method is suitable for detecting retinal motion in infrared image sequences and for compensating the detected motion, which is relevant in retinal laser treatment systems for DME. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Guanxi; Tie, Yun; Qi, Lin
2017-07-01
In this paper, we propose a novel approach based on Depth Maps and compute Multi-Scale Histograms of Oriented Gradient (MSHOG) from sequences of depth maps to recognize actions. Each depth frame in a depth video sequence is projected onto three orthogonal Cartesian planes. Under each projection view, the absolute difference between two consecutive projected maps is accumulated through a depth video sequence to form a Depth Map, which is called Depth Motion Trail Images (DMTI). The MSHOG is then computed from the Depth Maps for the representation of an action. In addition, we apply L2-Regularized Collaborative Representation (L2-CRC) to classify actions. We evaluate the proposed approach on the MSR Action3D dataset and the MSRGesture3D dataset. Promising experimental results demonstrate the effectiveness of our proposed method.
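The accumulation step described in this abstract can be illustrated with a minimal sketch. It assumes that each projected map is already available as a 2D list of depth values for a single projection view; the function name `motion_map` and the toy frames are hypothetical, and the projection onto the three planes and the MSHOG descriptor are not reproduced here.

```python
def motion_map(frames):
    """Sum |frame[t+1] - frame[t]| pixel-wise over a sequence of 2D maps.

    `frames` is a list of equally sized 2D lists (one projection view).
    This is an illustrative stand-in, not the authors' implementation.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    acc = [[0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                acc[r][c] += abs(curr[r][c] - prev[r][c])
    return acc


# Toy sequence of three 2x2 projected maps (hypothetical values).
frames = [
    [[0, 0], [0, 0]],
    [[0, 5], [0, 0]],
    [[0, 5], [3, 0]],
]
trail = motion_map(frames)
```

Pixels that never change stay at zero, so the accumulated map highlights where motion occurred over the whole sequence.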
Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I
2018-04-14
Determining the catalytic residues in an enzyme is critical to understanding the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence-structure-function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.
Copyright © 2018 Elsevier Ltd. All rights reserved.
Efficient error correction for next-generation sequencing of viral amplicons
2012-01-01
Background Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm PMID:22759430
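The general idea behind k-mer-based error detection can be sketched as follows. This is not the KEC algorithm from the paper; it is a simplified illustration that assumes erroneous bases produce k-mers seen only rarely across the read set. The function names, the toy reads, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def suspect_positions(read, counts, k, min_count):
    """Return positions not covered by any frequent ("solid") k-mer."""
    solid = [False] * len(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] >= min_count:
            for j in range(i, i + k):
                solid[j] = True
    return [i for i, ok in enumerate(solid) if not ok]


# Five copies of a reference amplicon plus one read with a substitution
# at index 6 (C -> T). All sequences are made up for illustration.
reads = ["ACGTTGCAAGCT"] * 5 + ["ACGTTGTAAGCT"]
counts = kmer_counts(reads, k=4)
flagged = suspect_positions(reads[-1], counts, k=4, min_count=3)
```

A fixed count threshold is the simplest possible choice; the abstract notes that 454 error rates depend on homopolymers and sequence position, which a uniform threshold like this one ignores.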
Comprehensive sequence-flux mapping of a levoglucosan utilization pathway in E. coli
Klesmith, Justin R.; Bacik, John -Paul; Michalczyk, Ryszard; ...
2015-09-14
Synthetic metabolic pathways often suffer from low specific productivity, and new methods that quickly assess pathway functionality for many thousands of variants are urgently needed. Here we present an approach that enables the rapid and parallel determination of sequence effects on flux for complete gene-encoding sequences. We show that this method can be used to determine the effects of over 8000 single point mutants of a pyrolysis oil catabolic pathway implanted in Escherichia coli. Experimental sequence-function data sets predicted whether fitness-enhancing mutations to the enzyme levoglucosan kinase resulted from enhanced catalytic efficiency or enzyme stability. A structure of one design incorporating 38 mutations elucidated the structural basis of high fitness mutations. One design incorporating 15 beneficial mutations supported a 15-fold improvement in growth rate and greater than 24-fold improvement in enzyme activity relative to the starting pathway. Lastly, this technique can be extended to improve a wide variety of designed pathways.
Niland, Courtney N.; Jankowsky, Eckhard; Harris, Michael E.
2016-01-01
Quantification of the specificity of RNA binding proteins and RNA processing enzymes is essential to understanding their fundamental roles in biological processes. High Throughput Sequencing Kinetics (HTS-Kin) uses high throughput sequencing and internal competition kinetics to simultaneously monitor the processing rate constants of thousands of substrates by RNA processing enzymes. This technique has provided unprecedented insight into the substrate specificity of the tRNA processing endonuclease ribonuclease P. Here, we investigate the accuracy and robustness of measurements associated with each step of the HTS-Kin procedure. We examine the effect of substrate concentration on the observed rate constant, determine the optimal kinetic parameters, and provide guidelines for reducing error in amplification of the substrate population. Importantly, we find that high-throughput sequencing and experimental reproducibility contribute their own sources of error, and these are the main sources of imprecision in the quantified results when otherwise optimized guidelines are followed. PMID:27296633
Complete Nucleotide Sequence of Watermelon Chlorotic Stunt Virus Originating from Oman
Khan, Akhtar J.; Akhtar, Sohail; Briddon, Rob W.; Ammara, Um; Al-Matrooshi, Abdulrahman M.; Mansoor, Shahid
2012-01-01
Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6–99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93–98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed. PMID:22852046
Complete nucleotide sequence of watermelon chlorotic stunt virus originating from Oman.
Khan, Akhtar J; Akhtar, Sohail; Briddon, Rob W; Ammara, Um; Al-Matrooshi, Abdulrahman M; Mansoor, Shahid
2012-07-01
Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6-99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93-98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed.
Structure stability of lytic peptides during their interactions with lipid bilayers.
Chen, H M; Lee, C H
2001-10-01
In this work, molecular dynamics simulations were used to examine the consequences of a variety of analogs of cecropin A on lipid bilayers. Analog sequences were constructed by replacing either the N- or C-terminal helix with the other helix in native or reverse sequence order, by making palindromic peptides based on both the N- and C-terminal helices, and by deleting the hinge region. The structure of the peptides was monitored throughout the simulation. The hinge region appeared not to assist in maintaining helical structure but to contribute to motional flexibility. In general, the N-terminal helix of peptides was less stable than the C-terminal one during the interaction with anionic lipid bilayers. Sequences with hydrophobic helices tended to regain helical structure after an initial loss, while sequences with amphipathic helices were less able to do this. The results suggest that hydrophobic design peptides have a high structural stability in an anionic membrane and are candidates for experimental investigation.
2012-01-01
Background Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. Results We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. Conclusions Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/. PMID:23181585
Optical flow estimation on image sequences with differently exposed frames
NASA Astrophysics Data System (ADS)
Bengtsson, Tomas; McKelvey, Tomas; Lindström, Konstantin
2015-09-01
Optical flow (OF) methods are used to estimate dense motion information between consecutive frames in image sequences. In addition to the specific OF estimation method itself, the quality of the input image sequence is of crucial importance to the quality of the resulting flow estimates. For instance, lack of texture in image frames caused by saturation of the camera sensor during exposure can significantly deteriorate the performance. An approach to avoid this negative effect is to use different camera settings when capturing the individual frames. We provide a framework for OF estimation on such sequences that contain differently exposed frames. Information from multiple frames is combined into a total cost functional such that the lack of an active data term for saturated image areas is avoided. Experimental results demonstrate that using alternate camera settings to capture the full dynamic range of an underlying scene can clearly improve the quality of flow estimates. When saturation of image data is significant, the proposed methods show superior performance in terms of lower endpoint errors of the flow vectors compared to a set of baseline methods. Furthermore, we provide some qualitative examples of how and when our method should be used.
Domain fusion analysis by applying relational algebra to protein sequence and domain databases
Truong, Kevin; Ikura, Mitsuhiko
2003-01-01
Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020
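A domain fusion query of the kind described above can be sketched with a self-join in standard SQL. The example below is an assumption-laden illustration, not the paper's schema or queries: the table `protein_domain`, its columns, and the toy rows are hypothetical, and SQLite is used only because it ships with Python. Two separate proteins in a target organism are reported as putatively linked when their domains co-occur in a single "fusion" protein elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein_domain (protein TEXT, organism TEXT, domain TEXT);
INSERT INTO protein_domain VALUES
  ('P1', 'yeast', 'D_A'),
  ('P2', 'yeast', 'D_B'),
  ('P3', 'yeast', 'D_C'),
  ('F1', 'ecoli', 'D_A'),
  ('F1', 'ecoli', 'D_B');
""")

# f1/f2: two different domains found on the same (fusion) protein.
# a/b: distinct proteins in the target organism carrying one domain each.
query = """
SELECT DISTINCT a.protein, b.protein, f1.protein
FROM protein_domain AS f1
JOIN protein_domain AS f2
  ON f1.protein = f2.protein AND f1.domain < f2.domain
JOIN protein_domain AS a
  ON a.domain = f1.domain AND a.protein <> f1.protein
JOIN protein_domain AS b
  ON b.domain = f2.domain AND b.protein <> f1.protein
WHERE a.organism = ? AND b.organism = ? AND a.protein <> b.protein
ORDER BY a.protein, b.protein
"""
links = conn.execute(query, ("yeast", "yeast")).fetchall()
```

This sketch does not filter out promiscuous domains or proteins that already carry both domains themselves, which a real analysis would likely need to handle.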
A Spiking Neural Network System for Robust Sequence Recognition.
Yu, Qiang; Yan, Rui; Tang, Huajin; Tan, Kay Chen; Li, Haizhou
2016-03-01
This paper proposes a biologically plausible network architecture with spiking neurons for sequence recognition. This architecture is a unified and consistent system with functional parts of sensory encoding, learning, and decoding. This is the first systematic model attempting to reveal the neural mechanisms considering both the upstream and the downstream neurons together. The whole system is a consistent temporal framework, where the precise timing of spikes is employed for information processing and cognitive computing. Experimental results show that the system is competent to perform the sequence recognition, being robust to noisy sensory inputs and invariant to changes in the intervals between input stimuli within a certain range. The classification ability of the temporal learning rule used in the system is investigated through two benchmark tasks, on which it outperforms two other widely used learning rules for classification. The results also demonstrate the computational power of spiking neurons over perceptrons for processing spatiotemporal patterns. In summary, the system provides a general way with spiking neurons to encode external stimuli into spatiotemporal spikes, to learn the encoded spike patterns with temporal learning rules, and to decode the sequence order with downstream neurons. The system structure would be beneficial for developments in both hardware and software.
Sequence determinants of improved CRISPR sgRNA design.
Xu, Han; Xiao, Tengfei; Chen, Chen-Hao; Li, Wei; Meyer, Clifford A; Wu, Qiu; Wu, Di; Cong, Le; Zhang, Feng; Liu, Jun S; Brown, Myles; Liu, X Shirley
2015-08-01
The CRISPR/Cas9 system has revolutionized mammalian somatic cell genetics. Genome-wide functional screens using CRISPR/Cas9-mediated knockout or dCas9 fusion-mediated inhibition/activation (CRISPRi/a) are powerful techniques for discovering phenotype-associated gene function. We systematically assessed the DNA sequence features that contribute to single guide RNA (sgRNA) efficiency in CRISPR-based screens. Leveraging the information from multiple designs, we derived a new sequence model for predicting sgRNA efficiency in CRISPR/Cas9 knockout experiments. Our model confirmed known features and suggested new features including a preference for cytosine at the cleavage site. The model was experimentally validated for sgRNA-mediated mutation rate and protein knockout efficiency. Tested on independent data sets, the model achieved significant results in both positive and negative selection conditions and outperformed existing models. We also found that the sequence preference for CRISPRi/a is substantially different from that for CRISPR/Cas9 knockout and propose a new model for predicting sgRNA efficiency in CRISPRi/a experiments. These results facilitate the genome-wide design of improved sgRNA for both knockout and CRISPRi/a studies. © 2015 Xu et al.; Published by Cold Spring Harbor Laboratory Press.
An experimental microcomputer controlled system for synchronized pulsating anti-gravity suit.
Moore, T W; Foley, J; Reddy, B R; Kepics, F; Jaron, D
1987-07-01
An experimental system to deliver synchronized external pressure pulsations to the lower body is described in this technical note. The system is designed using a microcomputer with a real time interface and an electro-pneumatic subsystem capable of delivering pressure pulses to a modified anti-G suit at a fast rate. It is versatile, containing many options for synchronizing, phasing and sequencing of the pressure pulsations and controlling the pressure level in the suit bladders. Details of its software and hardware are described along with the results of initial testing in a Dynamic Flight Simulator on human volunteers.
Behavior of single lap composite bolted joint under traction loading: Experimental investigation
NASA Astrophysics Data System (ADS)
Awadhani, L. V.; Bewoor, Anand
2018-04-01
Composite bolted joints are the preferred connection in composite structures because they facilitate dismantling for replacement and maintenance work. The joint behavior under tractive forces has been studied in order to understand the safety of the designed structure. The main objective of this paper is to investigate the behavior of single-lap joints in carbon fiber reinforced epoxy composites under traction loading conditions. The experiments were designed to identify the effect of bolt diameter, stacking sequence and loading rate on the properties of the joint. The experimental results show that these parameters influence the joint performance significantly.
Grabundzija, Ivana; Messing, Simon A; Thomas, Jainy; Cosby, Rachel L; Bilic, Ilija; Miskey, Csaba; Gogol-Döring, Andreas; Kapitonov, Vladimir; Diem, Tanja; Dalda, Anna; Jurka, Jerzy; Pritham, Ellen J; Dyda, Fred; Izsvák, Zsuzsanna; Ivics, Zoltán
2016-03-02
Helitron transposons capture and mobilize gene fragments in eukaryotes, but experimental evidence for their transposition is lacking in the absence of an isolated active element. Here we reconstruct Helraiser, an ancient element from the bat genome, and use this transposon as an experimental tool to unravel the mechanism of Helitron transposition. A hairpin close to the 3'-end of the transposon functions as a transposition terminator. However, the 3'-end can be bypassed by the transposase, resulting in transduction of flanking sequences to new genomic locations. Helraiser transposition generates covalently closed circular intermediates, suggestive of a replicative transposition mechanism, which provides a powerful means to disseminate captured transcriptional regulatory signals across the genome. Indeed, we document the generation of novel transcripts by Helitron promoter capture both experimentally and by transcriptome analysis in bats. Our results provide mechanistic insight into Helitron transposition, and its impact on diversification of gene function by genome shuffling.
Reilly, Kevin J.; Spencer, Kristie A.
2013-01-01
The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria are consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121
Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana
2016-07-01
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
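The N50 statistic quoted for the genome map can be computed as in the short sketch below. It uses the common definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the summed length); the function name `n50` and the example lengths are illustrative and are not taken from the paper's data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative sum first reaches half of the total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths (arbitrary units), total = 32.
example = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Here the two longest contigs (8 + 8 = 16) already account for half of the total of 32, so `example` is 8.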
Experimental generation of Laguerre-Gaussian beam using digital micromirror device.
Ren, Yu-Xuan; Li, Ming; Huang, Kun; Wu, Jian-Guang; Gao, Hong-Fang; Wang, Zi-Qiang; Li, Yin-Mei
2010-04-01
A digital micromirror device (DMD) modulates laser intensity through computer control of the device. We experimentally investigate the performance of the modulation property of a DMD and optimize the modulation procedure through image correction. Furthermore, Laguerre-Gaussian (LG) beams with different topological charges are generated by projecting a series of forklike gratings onto the DMD. We measure the field distribution with and without correction, the energy of LG beams with different topological charges, and the polarization property in sequence. Experimental results demonstrate that it is possible to generate LG beams with a DMD that allows the use of a high-intensity laser with proper correction to the input images, and that the polarization state of the LG beam differs from that of the input beam.
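The forklike gratings mentioned in this abstract can be illustrated with a short sketch. It assumes a simple binary pattern (mirror on or off) obtained by thresholding the cosine of a tilted plane-wave phase plus an azimuthal phase; the function name, the threshold rule and the parameters are assumptions for illustration, and the authors' image-correction step is not modelled.

```python
import math


def fork_grating(size, charge, period_px):
    """Binary forked grating: 1 where cos(2*pi*x/period + charge*phi) >= 0.

    Returns a size x size list of rows holding 0 (mirror off) or 1 (mirror on).
    """
    half = (size - 1) / 2.0
    pattern = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            phi = math.atan2(y, x)  # azimuthal angle about the pattern centre
            phase = 2 * math.pi * x / period_px + charge * phi
            line.append(1 if math.cos(phase) >= 0 else 0)
        pattern.append(line)
    return pattern


straight = fork_grating(32, charge=0, period_px=8)  # ordinary line grating
forked = fork_grating(32, charge=1, period_px=8)    # one fork dislocation
```

With `charge=0` every row is identical; a non-zero topological charge adds the dislocation at the centre that gives the pattern its fork shape.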
Protein Sectors: Statistical Coupling Analysis versus Conservation
Teşileanu, Tiberiu; Colwell, Lucy J.; Leibler, Stanislas
2015-01-01
Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed “sectors”. The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. PMID:25723535
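Because the abstract argues that per-position conservation alone carries much of the signal, a minimal sketch of one way to score conservation in an alignment may help. It assumes a uniform background over a 20-letter alphabet and ignores gaps; SCA itself uses a different, background-weighted measure, so this is an illustration rather than the authors' method.

```python
import math
from collections import Counter


def column_conservation(alignment, alphabet_size=20):
    """Per-column conservation in bits: log2(alphabet_size) minus the Shannon
    entropy of the observed residue frequencies (gap characters '-' ignored)."""
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no information
            continue
        n = len(residues)
        entropy = -sum((c / n) * math.log2(c / n)
                       for c in Counter(residues).values())
        scores.append(math.log2(alphabet_size) - entropy)
    return scores


# Toy alignment (hypothetical sequences): fully, partly and not conserved columns.
toy_scores = column_conservation(["ACD", "ACE", "AGF", "AGH"])
```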
Altmüller, Janine; Budde, Birgit S; Nürnberg, Peter
2014-02-01
Targeted re-sequencing such as gene panel sequencing (GPS) has become very popular in medical genetics, both for research projects and in diagnostic settings. The technical principles of the different enrichment methods have been reviewed several times before; however, new enrichment products are constantly entering the market, and researchers are often unsure how to make decisions that involve long-term commitments to both an enrichment product and a sequencing technology. This review summarizes important considerations for the experimental design and provides recommendations for choosing the best sequencing strategy for various research projects and diagnostic applications.
Static and Dynamic Properties of DNA Confined in Nanochannels
NASA Astrophysics Data System (ADS)
Gupta, Damini
Next-generation sequencing (NGS) techniques have considerably reduced the cost of high-throughput DNA sequencing. However, it is challenging to detect large-scale genomic variations by NGS due to short read lengths. Genome mapping can easily detect large-scale structural variations because it operates on extremely large intact molecules of DNA with adequate resolution. One of the promising methods of genome mapping is based on confining large DNA molecules inside a nanochannel whose cross-sectional dimensions are approximately 50 nm. Even though this genome mapping technology has been commercialized, the current understanding of the polymer physics of DNA in nanochannel confinement is based on theories and lacks much-needed experimental support. The results of this dissertation are aimed at providing a detailed experimental understanding of equilibrium properties of nanochannel-confined DNA molecules. The results are divided into three parts. In the first part, we evaluate the role of channel shape on thermodynamic properties of channel-confined DNA molecules using a combination of fluorescence microscopy and simulations. Specifically, we show that a high aspect ratio of rectangular channels significantly alters the chain statistics as compared to an equivalent square channel with the same cross-sectional area. In the second part, we present experimental evidence that weak excluded volume effects arise in DNA nanochannel confinement, which form the physical basis for the extended de Gennes regime. We also show how confinement spectroscopy and simulations can be combined to reduce molecular weight dispersity effects arising from shearing, photo-cleavage, and nonuniform staining of DNA. Finally, the third part of the thesis concerns the dynamic properties of nanochannel-confined DNA. We directly measure the center-of-mass diffusivity of single DNA molecules in confinement and show that it is necessary to modify the classical results of de Gennes to account for local chain stiffness of DNA in order to explain the experimental results. In the end, we believe that our findings from the experimental test of the phase diagram for channel-confined DNA, with careful control over molecular weight dispersity, channel geometry, and electrostatic interactions, will provide a firm foundation for the emerging genome mapping technology.
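A center-of-mass diffusivity of the kind described here is often estimated from a tracked position series. The sketch below assumes free one-dimensional diffusion along the channel axis with no drift and no localization error, so that MSD(τ) = 2Dτ; the function name and the toy track are hypothetical and do not reproduce the dissertation's analysis.

```python
def diffusivity_1d(positions, dt, lag=1):
    """Estimate D from a 1D centre-of-mass track using MSD(lag*dt) = 2*D*lag*dt."""
    steps = [positions[i + lag] - positions[i]
             for i in range(len(positions) - lag)]
    if not steps:
        raise ValueError("track is too short for the requested lag")
    msd = sum(s * s for s in steps) / len(steps)  # mean squared displacement
    return msd / (2.0 * lag * dt)


# Toy track (made-up numbers): unit steps every 0.5 time units.
toy_d = diffusivity_1d([0.0, 1.0, 0.0, 1.0, 0.0], dt=0.5)
```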
Enhanced diffusion weighting generated by selective adiabatic pulse trains
NASA Astrophysics Data System (ADS)
Sun, Ziqi; Bartha, Robert
2007-09-01
A theoretical description and experimental validation of the enhanced diffusion weighting generated by selective adiabatic full passage (AFP) pulse trains is provided. Six phantoms (Ph-1 to Ph-6) were studied on a 4 T Varian/Siemens whole body MRI system. Phantoms consisted of 2.8 cm diameter plastic tubes containing a mixture of 10 μm ORGASOL polymer beads and 2 mM Gd-DTPA dissolved in 5% agar (Ph-1) or nickel(II) ammonium sulphate hexahydrate doped (56.3-0.8 mM) water solutions (Ph-2 to Ph-6). A customized localization by adiabatic selective refocusing (LASER) sequence containing slice selective AFP pulse trains and pulsed diffusion gradients applied in the phase encoding direction was used to measure ^1H_2O diffusion. The b-value associated with the LASER sequence was derived using the Bloch-Torrey equation. The apparent diffusion coefficients measured by LASER were comparable to those measured by a conventional pulsed gradient spin-echo (PGSE) sequence for all phantoms. Image signal intensity increased in Ph-1 and decreased in Ph-2 to Ph-6 as AFP pulse train length increased while maintaining a constant echo-time. These experimental results suggest that such AFP pulse trains can enhance contrast between regions containing microscopic magnetic susceptibility variations and homogeneous regions in which dynamic dephasing relaxation mechanisms are dominant.
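The relation between b-value and apparent diffusion coefficient (ADC) used in such comparisons can be sketched as follows. This assumes the standard mono-exponential model S(b) = S0·exp(−b·ADC) with the b-value supplied as an input; the Bloch-Torrey derivation of the b-value for the LASER sequence is not reproduced, and the function name and numbers are illustrative.

```python
import math


def apparent_diffusion_coefficient(s0, sb, b_value):
    """ADC from two signal intensities, assuming S(b) = S0 * exp(-b * ADC)."""
    if s0 <= 0 or sb <= 0 or b_value <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b_value


# Toy numbers (hypothetical): b = 500 s/mm^2 and a signal that decays by 1/e.
toy_adc = apparent_diffusion_coefficient(100.0, 100.0 * math.exp(-1.0), 500.0)
```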
Improved Modeling of Side-Chain–Base Interactions and Plasticity in Protein–DNA Interface Design
Thyme, Summer B.; Baker, David; Bradley, Philip
2012-01-01
Combinatorial sequence optimization for protein design requires libraries of discrete side-chain conformations. The discreteness of these libraries is problematic, particularly for long, polar side chains, since favorable interactions can be missed. Previously, an approach to loop remodeling where protein backbone movement is directed by side-chain rotamers predicted to form interactions previously observed in native complexes (termed “motifs”) was described. Here, we show how such motif libraries can be incorporated into combinatorial sequence optimization protocols and improve native complex recapitulation. Guided by the motif rotamer searches, we made improvements to the underlying energy function, increasing recapitulation of native interactions. To further test the methods, we carried out a comprehensive experimental scan of amino acid preferences in the I-AniI protein–DNA interface and found that many positions tolerated multiple amino acids. This sequence plasticity is not observed in the computational results because of the fixed-backbone approximation of the model. We improved modeling of this diversity by introducing DNA flexibility and reducing the convergence of the simulated annealing algorithm that drives the design process. In addition to serving as a benchmark, this extensive experimental data set provides insight into the types of interactions essential to maintain the function of this potential gene therapy reagent. PMID:22426128
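The simulated annealing search that drives combinatorial sequence design can be sketched generically. The example below is an assumption-laden toy: one letter per position, single-position moves, geometric cooling and a made-up mismatch energy. It is not the design protocol or energy function from the paper, and all names are hypothetical.

```python
import math
import random


def anneal(n_positions, choices, energy, steps=2000,
           t_start=2.0, t_end=0.05, seed=0):
    """Metropolis simulated annealing over one discrete choice per position."""
    rng = random.Random(seed)
    state = [rng.choice(choices) for _ in range(n_positions)]
    current = energy(state)
    best, best_e = list(state), current
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        pos = rng.randrange(n_positions)
        old = state[pos]
        state[pos] = rng.choice(choices)
        trial = energy(state)
        if trial <= current or rng.random() < math.exp((current - trial) / t):
            current = trial  # accept the move
            if current < best_e:
                best, best_e = list(state), current
        else:
            state[pos] = old  # reject and restore
    return best, best_e


def mismatch_energy(state, target="ACDEFG"):
    """Toy energy: number of positions that differ from a made-up target."""
    return sum(a != b for a, b in zip(state, target))


best_seq, best_energy = anneal(6, "ACDEFGHIKL", mismatch_energy)
```

A slower cooling schedule or a higher final temperature makes the search converge less sharply, which is the knob the abstract refers to when it mentions reducing convergence to recover sequence diversity.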
Improved modeling of side-chain--base interactions and plasticity in protein--DNA interface design.
Thyme, Summer B; Baker, David; Bradley, Philip
2012-06-08
Combinatorial sequence optimization for protein design requires libraries of discrete side-chain conformations. The discreteness of these libraries is problematic, particularly for long, polar side chains, since favorable interactions can be missed. Previously, an approach to loop remodeling where protein backbone movement is directed by side-chain rotamers predicted to form interactions previously observed in native complexes (termed "motifs") was described. Here, we show how such motif libraries can be incorporated into combinatorial sequence optimization protocols and improve native complex recapitulation. Guided by the motif rotamer searches, we made improvements to the underlying energy function, increasing recapitulation of native interactions. To further test the methods, we carried out a comprehensive experimental scan of amino acid preferences in the I-AniI protein-DNA interface and found that many positions tolerated multiple amino acids. This sequence plasticity is not observed in the computational results because of the fixed-backbone approximation of the model. We improved modeling of this diversity by introducing DNA flexibility and reducing the convergence of the simulated annealing algorithm that drives the design process. In addition to serving as a benchmark, this extensive experimental data set provides insight into the types of interactions essential to maintain the function of this potential gene therapy reagent. Published by Elsevier Ltd.
Cenik, Can; Chua, Hon Nian; Zhang, Hui; Tarnawsky, Stefan P.; Akef, Abdalla; Derti, Adnan; Tasan, Murat; Moore, Melissa J.; Palazzo, Alexander F.; Roth, Frederick P.
2011-01-01
In higher eukaryotes, messenger RNAs (mRNAs) are exported from the nucleus to the cytoplasm via factors deposited near the 5′ end of the transcript during splicing. The signal sequence coding region (SSCR) can support an alternative mRNA export (ALREX) pathway that does not require splicing. However, most SSCR–containing genes also have introns, so the interplay between these export mechanisms remains unclear. Here we support a model in which the furthest upstream element in a given transcript, be it an intron or an ALREX–promoting SSCR, dictates the mRNA export pathway used. We also experimentally demonstrate that nuclear-encoded mitochondrial genes can use the ALREX pathway. Thus, ALREX can also be supported by nucleotide signals within mitochondrial-targeting sequence coding regions (MSCRs). Finally, we identified and experimentally verified novel motifs associated with the ALREX pathway that are shared by both SSCRs and MSCRs. Our results show strong correlation between 5′ untranslated region (5′UTR) intron presence/absence and sequence features at the beginning of the coding region. They also suggest that genes encoding secretory and mitochondrial proteins share a common regulatory mechanism at the level of mRNA export. PMID:21533221
NASA Astrophysics Data System (ADS)
Azarov, Vladimir I.
2018-01-01
Data available on the 5d^3, 5d^2 6s and 5d^2 6p configurations in the Lu I isoelectronic sequence have been critically reviewed by means of calculations with the orthogonal operators. The study included spectra from Ta III through Hg X. The calculations agree very well with the experimental data. The isoelectronic behavior of parameters and deviations of the experimental levels from the calculated positions, ΔE = E_exp - E_calc, show regular trends. Three missing 5d^2 6s levels have been accurately predicted theoretically and confirmed experimentally: the level (^3P)^2P_3/2 in Pt VIII and the levels (^3P)^4P_5/2 and (^3P)^2P_1/2 in Os VI have been determined in the study. The research suggested revision of the published initial analyses of the Re V and Hg X spectra. The recently completed revised analysis of Re V has confirmed the issues noticed in the initial analysis and has resulted in data that fit very well in the current parametric study. The isoelectronic evolution of the higher order interactions was studied for the first time in the Lu I sequence. The study included the parameters A_c, A_3-A_6 describing two-particle magnetic interaction of the dd-type, the parameter A_mso describing the two-particle magnetic ds-type effect, the parameter T_dds describing 3-particle electrostatic ds-type interaction, and the effective parameters S_1 and S_2 of the dp-type.
Bratzel, Graham; Buehler, Markus J
2012-03-01
Spider silk is a self-assembling biopolymer that outperforms many known materials in terms of its mechanical performance despite being constructed from simple and inferior building blocks. While experimental studies have shown that the molecular structure of silk has a direct influence on the stiffness, toughness, and failure strength of silk, few molecular-level analyses of the nanostructure of silk assemblies, in particular under variations of genetic sequences, have been reported. Here we report atomistic-level structures of the MaSp1 protein from the Nephila clavipes spider dragline silk sequence, obtained using an in silico approach based on replica exchange molecular dynamics (REMD) and explicit water molecular dynamics. We apply this method to study the effects of a systematic variation of the poly-alanine repeat lengths, a parameter controlled by the genetic makeup of silk, on the resulting molecular structure of silk at the nanoscale. Confirming earlier experimental and computational work, a structural analysis reveals that poly-alanine regions in silk predominantly form distinct and orderly β-sheet crystal domains while disorderly regions are formed by glycine-rich repeats that consist of 3(10)-helix type structures and β-turns. Our predictions are directly validated against experimental data based on dihedral angle pair calculations presented in Ramachandran plots combined with an analysis of the secondary structure content. The key result of our study is our finding of a strong dependence of the resulting silk nanostructure on the poly-alanine length. We observe that the wildtype poly-alanine repeat length of six residues defines a critical minimum length that consistently results in clearly defined β-sheet nanocrystals.
For poly-alanine lengths below six, the β-sheet nanocrystals are not well-defined or not visible at all, while for poly-alanine lengths at and above six, the characteristic nanocomposite structure of silk emerges with no significant improvement of the quality of the β-sheet nanocrystal geometry. We present a simple biophysical model that explains these computational observations based on the mechanistic insight gained from the molecular simulations. Our findings set the stage for understanding how variations in the spidroin sequence can be used to engineer the structure and thereby functional properties of this biological superfiber, and present a design strategy for the genetic optimization of spidroins for enhanced mechanical properties. The approach used here may also find application in the design of other self-assembled molecular structures and fibers and in particular biologically inspired or completely synthetic systems. Copyright © 2011 Elsevier Ltd. All rights reserved.
Zhang, Wei; Zhang, Xiaolong; Qiang, Yan; Tian, Qi; Tang, Xiaoxian
2017-01-01
The fast and accurate segmentation of lung nodule image sequences is the basis of subsequent processing and diagnostic analyses. However, previous research investigating nodule segmentation algorithms cannot entirely segment cavitary nodules, and the segmentation of juxta-vascular nodules is inaccurate and inefficient. To solve these problems, we propose a new method for the segmentation of lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise (DBSCAN). First, our method uses three-dimensional computed tomography image features of the average intensity projection combined with multi-scale dot enhancement for preprocessing. Hexagonal clustering and morphological optimized sequential linear iterative clustering (HMSLIC) for sequence image oversegmentation is then proposed to obtain superpixel blocks. The adaptive weight coefficient is then constructed to calculate the distance required between superpixels to achieve precise lung nodules positioning and to obtain the subsequent clustering starting block. Moreover, by fitting the distance and detecting the change in slope, an accurate clustering threshold is obtained. Thereafter, a fast DBSCAN superpixel sequence clustering algorithm, which is optimized by the strategy of only clustering the lung nodules and adaptive threshold, is then used to obtain lung nodule mask sequences. Finally, the lung nodule image sequences are obtained. The experimental results show that our method rapidly, completely and accurately segments various types of lung nodule image sequences. PMID:28880916
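For orientation, the density-based clustering step (DBSCAN) named in this abstract can be sketched in plain form. The version below clusters 2D points with a fixed radius `eps` and a fixed `min_pts`; the paper's superpixel features, adaptive threshold and speed optimizations are not modelled, and the toy coordinates are made up.

```python
def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)  # None means not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # j is a core point, keep expanding
    return labels


# Two tight toy groups and one isolated point (hypothetical coordinates).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
toy_labels = dbscan(toy_points, eps=1.5, min_pts=3)
```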
Distribution of genotype network sizes in sequence-to-structure genotype-phenotype maps.
Manrubia, Susanna; Cuesta, José A
2017-04-01
An essential quantity to ensure evolvability of populations is the navigability of the genotype space. Navigability, understood as the ease with which alternative phenotypes are reached, relies on the existence of sufficiently large and mutually attainable genotype networks. The size of genotype networks (e.g. the number of RNA sequences folding into a particular secondary structure or the number of DNA sequences coding for the same protein structure) is astronomically large in all functional molecules investigated: an exhaustive experimental or computational study of all RNA folds or all protein structures becomes impossible even for moderately long sequences. Here, we analytically derive the distribution of genotype network sizes for a hierarchy of models which successively incorporate features of increasingly realistic sequence-to-structure genotype-phenotype maps. The main feature of these models relies on the characterization of each phenotype through a prototypical sequence whose sites admit a variable fraction of letters of the alphabet. Our models interpolate between two limit distributions: a power-law distribution, when the ordering of sites in the prototypical sequence is strongly constrained, and a lognormal distribution, as suggested for RNA, when different orderings of the same set of sites yield different phenotypes. Our main result is the qualitative and quantitative identification of those features of sequence-to-structure maps that lead to different distributions of genotype network sizes. © 2017 The Author(s).
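The multiplicative structure behind these size distributions can be illustrated with a toy calculation. It assumes, purely for illustration, that a phenotype is described by a prototype whose site i independently admits v_i letters, so the network size is the product of the v_i and its logarithm is a sum over sites. This does not reproduce the hierarchy of models derived in the paper; all names and parameters are hypothetical.

```python
import math
import random


def network_size(versatilities):
    """Number of sequences compatible with a prototype whose site i admits
    versatilities[i] letters: the product over all sites."""
    return math.prod(versatilities)


def sample_log_sizes(n_phenotypes, length, alphabet=4, seed=0):
    """Log network sizes for random prototypes with independent sites."""
    rng = random.Random(seed)
    return [
        math.log(network_size([rng.randint(1, alphabet) for _ in range(length)]))
        for _ in range(n_phenotypes)
    ]


toy_size = network_size([1, 2, 4, 4])       # a four-site prototype
toy_logs = sample_log_sizes(5, length=10)   # five random ten-site prototypes
```

Because the log size is a sum of per-site terms, broad, roughly lognormal size distributions are plausible in this toy setting once sequences are long.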
Zhou, Yu; Pearson, John E; Auerbach, Anthony
2005-12-01
We derive the analytical form of a rate-equilibrium free-energy relationship (with slope Phi) for a bounded, linear chain of coupled reactions having arbitrary connecting rate constants. The results confirm previous simulation studies showing that Phi-values reflect the position of the perturbed reaction within the chain, with reactions occurring earlier in the sequence producing higher Phi-values than those occurring later in the sequence. The derivation includes an expression for the transmission coefficients of the overall reaction based on the rate constants of an arbitrary, discrete, finite Markov chain. The results indicate that experimental Phi-values can be used to calculate the relative heights of the energy barriers between intermediate states of the chain but provide no information about the energies of the wells along the reaction path. Application of the equations to the case of diliganded acetylcholine receptor channel gating suggests that the transition-state ensemble for this reaction is nearly flat. Although this mechanism accounts for many of the basic features of diliganded and unliganded acetylcholine receptor channel gating, the experimental rate-equilibrium free-energy relationships appear to be more linear than those predicted by the theory.
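As a reminder of how an experimental Phi-value is usually extracted, the sketch below fits the slope of log forward rate against log equilibrium constant across a series of perturbations. The least-squares fit, the function name and the synthetic numbers are assumptions for illustration and are not taken from the paper's derivation.

```python
import math


def phi_value(forward_rates, equilibrium_constants):
    """Least-squares slope of log(k_f) versus log(K_eq) over a perturbation series."""
    xs = [math.log(K) for K in equilibrium_constants]
    ys = [math.log(k) for k in forward_rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("equilibrium constants must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic series obeying k_f = 3 * K_eq**0.8 (made-up numbers).
toy_K = [0.1, 1.0, 10.0]
toy_phi = phi_value([3.0 * K ** 0.8 for K in toy_K], toy_K)
```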
Relativistic Many-Body Calculations of n=2 States for the Beryllium Isoelectronic Sequence
NASA Astrophysics Data System (ADS)
Safronova, M. S.; Johnson, W. R.; Safronova, U. I.
1996-05-01
Energies of the ten (2l2l') states of ions of the beryllium isoelectronic sequence are determined to second order in relativistic many-body perturbation theory. Both the second-order Coulomb interaction and the second-order Breit-Coulomb interaction are included. Corrections for the frequency-dependent Breit interaction are taken into account in lowest order only. The effect of the Lamb shift is also estimated and included. Comparisons with other calculations and with experiment are made. Our theoretical results for the 2s-2p_3/2 transitions in U^88+ and Th^86+ (4501.60 eV and 4069.02 eV, respectively) differ only by 0.12 eV for U^88+ and 0.55 eV for Th^86+ from experimental data obtained at the SUPER-EBIT at LLNL (P. Beiersdorfer, D. Knapp, R.E. Marrs, S.R. Elliot and M.H. Chen, Phys. Rev. Lett. 71, 3939 (1993); P. Beiersdorfer, A. Osterheld, S.R. Elliot, M.H. Chen, D. Knapp, and K. Reed, Phys. Rev. A 52, 2693 (1995)). Excellent agreement with experimental results for the splitting of ^3P terms is found.
Role of Poultry in the Spread of Novel H7N9 Influenza Virus in China
Pantin-Jackwood, Mary J.; Miller, Patti J.; Spackman, Erica; Swayne, David E.; Susta, Leonardo; Costa-Hurtado, Mar
2014-01-01
ABSTRACT The recent outbreak of H7N9 influenza in China has resulted in many human cases with a high fatality rate. Poultry are the likely source of infection for humans on the basis of sequence analysis and virus isolations from live bird markets, but it is not clear which species of birds are most likely to be infected and shedding levels of virus sufficient to infect humans. Intranasal inoculation of chickens, Japanese quail, pigeons, Pekin ducks, Mallard ducks, Muscovy ducks, and Embden geese with 10^6 50% egg infective doses of the A/Anhui/1/2013 virus resulted in infection but no clinical disease signs. Virus shedding was much higher and prolonged in quail and chickens than in the other species. Quail effectively transmitted the virus to direct contacts, but pigeons and Pekin ducks did not. In all species, virus was detected at much higher titers from oropharyngeal swabs than cloacal swabs. The hemagglutinin gene from samples collected from selected experimentally infected birds was sequenced, and three amino acid differences were commonly observed when the sequence was compared to the sequence of A/Anhui/1/2013: N123D, N149D, and L217Q. Leucine at position 217 is highly conserved for human isolates and is associated with α2,6-sialic acid binding. Different amino acid combinations were observed, suggesting that the inoculum had viral subpopulations that were selected after passage in birds. These experimental studies corroborate the finding that certain poultry species are reservoirs of the H7N9 influenza virus and that the virus is highly tropic for the upper respiratory tract, so testing of bird species should preferentially be conducted with oropharyngeal swabs for the best sensitivity. IMPORTANCE The recent outbreak of H7N9 influenza in China has resulted in a number of human infections with a high case fatality rate. The source of the viral outbreak is suspected to be poultry, but definitive data on the source of the infection are not available.
This study provides experimental data to show that quail and chickens are susceptible to infection, shed large amounts of virus, and are likely important in the spread of the virus to humans. Other poultry species can be infected and shed virus but are less likely to play a role of transmitting the virus to humans. Pigeons were previously suggested to be a possible source of the virus because of isolation of the virus from several pigeons in poultry markets in China, but experimental studies show that they are generally resistant to infection and are unlikely to play a role in the spread of the virus. PMID:24574407
NASA Astrophysics Data System (ADS)
Poston, Chloe N.; Higgs, Richard E.; You, Jinsam; Gelfanova, Valentina; Hale, John E.; Knierman, Michael D.; Siegel, Robert; Gutierrez, Jesus A.
2014-07-01
De novo sequencing by mass spectrometry (MS) allows for the determination of the complete amino acid (AA) sequence of a given protein based on the mass difference of detected ions from MS/MS fragmentation spectra. The technique relies on obtaining specific masses that can be attributed to characteristic theoretical masses of AAs. A major limitation of de novo sequencing by MS is the inability to distinguish between the isobaric residues leucine (Leu) and isoleucine (Ile). Incorrect identification of Ile as Leu or vice versa often results in loss of activity in recombinant antibodies. This functional ambiguity is commonly resolved with costly and time-consuming AA mutation and peptide sequencing experiments. Here, we describe a set of orthogonal biochemical protocols, which experimentally determine the identity of Ile or Leu residues in monoclonal antibodies (mAb) based on the selectivity that leucine aminopeptidase shows for n-terminal Leu residues and the cleavage preference for Leu by chymotrypsin. The resulting observations are combined with germline frequencies and incorporated into a logistic regression model, called Predictor for Xle Sites (PXleS) to provide a statistical likelihood for the identity of Leu at an ambiguous site. We demonstrate that PXleS can generate a probability for an Xle site in mAbs with 96% accuracy. The implementation of PXleS precludes the expression of several possible sequences and, therefore, reduces the overall time and resources required to go from spectra generation to a biologically active sequence for a mAb when an Ile or Leu residue is in question.
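The way a logistic regression model turns several observations into a likelihood for Leu can be sketched as below. The feature names, weights and intercept are placeholders invented for illustration; they are not the fitted PXleS coefficients, which the abstract does not report.

```python
import math


def leu_probability(features, weights, intercept):
    """Logistic model: P(Leu) = 1 / (1 + exp(-(intercept + sum(w_i * x_i))))."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical features: aminopeptidase signal, chymotrypsin cleavage signal,
# germline Leu frequency. The weights are NOT fitted values.
placeholder_weights = [1.2, 0.8, 2.0]
p_neutral = leu_probability([0.0, 0.0, 0.0], placeholder_weights, intercept=0.0)
p_leu_like = leu_probability([1.0, 1.0, 0.9], placeholder_weights, intercept=-1.5)
```

With no evidence either way the model returns 0.5; evidence consistent with Leu pushes the probability toward 1.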
NASA Technical Reports Server (NTRS)
Murphy, Andrew G.; Browne, David J.; Mirihanage, Wajira U.; Mathiesen, Ragnvald H.
2012-01-01
In the last decade synchrotron X-ray sources have fast become the tool of choice for performing in-situ high resolution imaging during alloy solidification. This paper presents the results of an experimental campaign carried out at the European Synchrotron Radiation Facility, using a Bridgman furnace, to monitor phenomena during solidification of Al-Cu alloys, specifically the onset of equiaxed dendrite coherency. Conventional experimental methods for determining coherency involve measuring the change in viscosity or measuring the change in thermal conductivity across the solidifying melt. Conflicts arise when comparing the results of these experimental techniques to find a relationship between cooling rate and coherency fraction. It has been shown that the ratio of average velocity to the average grain diameter has an inversely proportional relationship to coherency fraction. In-situ observation therefore makes it possible to measure these values directly from acquired image sequences and make comparisons with published results.
Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs
Powell, Bradford C; Hutchison, Clyde A
2006-01-01
Background Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. Results "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Conclusion Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. PMID:16423288
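The first step described here, collecting every ORF above a length cutoff, can be sketched as follows. The sketch assumes an ORF runs from an ATG to the next in-frame stop codon and scans only the three frames of the given strand; the paper does not state its exact ORF definition in the abstract, so these choices and the function name are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for ATG..stop ORFs on the given strand.

    `end` is exclusive and includes the stop codon; the length test counts
    codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


# Toy sequence (made up): ATG plus four GCT codons, then a stop.
toy_seq = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
toy_orfs = find_orfs(toy_seq, min_codons=5)
```

Running the same function on the reverse complement would cover the other three frames.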
Shrimankar, D D; Sathe, S R
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot scale beyond a single SMP node. Programs written in MPI can span more than one SMP node, but this paradigm incurs the overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are of interest for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various values of the input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
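To make the parallelization problem concrete, the sketch below scores an optimal global alignment with dynamic programming, filling the table one anti-diagonal at a time. The abstract does not name the alignment algorithm, so the Needleman-Wunsch recurrence, the scoring values and the wavefront order are assumptions chosen for illustration; the code itself is serial.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    # Sweep anti-diagonals d = i + j. Cells on one diagonal depend only on the
    # two previous diagonals, so this inner loop is what a parallel version
    # would split across threads or tiles.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(table[i - 1][j - 1] + sub,
                              table[i - 1][j] + gap,
                              table[i][j - 1] + gap)
    return table[n][m]


toy_score = nw_score("GATTACA", "GATACA")  # six matches and one gap
```

Grouping cells into tiles trades synchronization cost against idle cores at the start and end of the sweep, which is one reason tile size matters in such experiments.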
2014-01-01
Background Deciphering of the information content of eukaryotic promoters has remained confined to universal landmarks and conserved sequence elements such as enhancers and transcription factor binding motifs, which are considered sufficient for gene activation and regulation. Gene-specific sequences, interspersed between the canonical transacting factor binding sites or adjoining them within a promoter, are generally taken to be devoid of any regulatory information and have therefore been largely ignored. An unanswered question therefore is, do gene-specific sequences within a eukaryotic promoter have a role in gene activation? Here, we present an exhaustive experimental analysis of a gene-specific sequence adjoining the heat shock element (HSE) in the proximal promoter of the small heat shock protein gene, αB-crystallin (cryab). These sequences are highly conserved between the rodents and the humans. Results Using human retinal pigment epithelial cells in culture as the host, we have identified a 10-bp gene-specific promoter sequence (GPS), which, unlike an enhancer, controls expression from the promoter of this gene, only when in appropriate position and orientation. Notably, the data suggests that GPS in comparison with the HSE works in a context-independent fashion. Additionally, when moved upstream, about a nucleosome length of DNA (−154 bp) from the transcription start site (TSS), the activity of the promoter is markedly inhibited, suggesting its involvement in local promoter access. Importantly, we demonstrate that deletion of the GPS results in complete loss of cryab promoter activity in transgenic mice. Conclusions These data suggest that gene-specific sequences such as the GPS, identified here, may have critical roles in regulating gene-specific activity from eukaryotic promoters. PMID:24589182
Shrimankar, D. D.; Sathe, S. R.
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot be scaled beyond a single SMP node, whereas programs written in MPI can span more than one SMP node, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution and that it increases with the number of participating cores. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results are of interest for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give the best performance of our parallel algorithm for various values of the input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
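For orientation, the kind of dynamic-programming recurrence that pairwise optimal alignment rests on can be sketched as follows. The abstract does not name the algorithm or scoring scheme, so the Needleman-Wunsch global alignment with a linear gap penalty shown here, the function name `global_alignment_score`, and the score values are all assumptions; the sketch is serial and does not reproduce the authors' OpenMP/MPI tiling.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of an optimal global alignment (Needleman-Wunsch, linear gaps).

    Only two rows of the DP table are kept, so memory is O(len(b)).
    Scoring parameters are illustrative assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]
```

Each cell depends only on its left, upper, and upper-left neighbours, which is why such tables are commonly split into tiles that can be processed in parallel along anti-diagonals.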
Graph pyramids for protein function prediction
2015-01-01
Background Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is important for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522
Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder
2016-01-01
The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs. 
These residues can be used to make testable hypotheses about the structural basis of receptor function and about the molecular basis of disease-associated single nucleotide polymorphisms. PMID:27028541
Rickert, Keith W.; Grinberg, Luba; Woods, Robert M.; Wilson, Susan; Bowen, Michael A.; Baca, Manuel
2016-01-01
The enormous diversity created by gene recombination and somatic hypermutation makes de novo protein sequencing of monoclonal antibodies a uniquely challenging problem. Modern mass spectrometry-based sequencing will rarely, if ever, provide a single unambiguous sequence for the variable domains. A more likely outcome is computation of an ensemble of highly similar sequences that can satisfy the experimental data. This outcome can result in the need for empirical testing of many candidate sequences, sometimes iteratively, to identify one which can replicate the activity of the parental antibody. Here we describe an improved approach to antibody protein sequencing by using phage display technology to generate a combinatorial library of sequences that satisfy the mass spectrometry data, and selecting for functional candidates that bind antigen. This approach was used to reverse engineer 2 commercially-obtained monoclonal antibodies against murine CD137. Proteomic data enabled us to assign the majority of the variable domain sequences, with the exception of 3–5% of the sequence located within or adjacent to complementarity-determining regions. To efficiently resolve the sequence in these regions, small phage-displayed libraries were generated and subjected to antigen binding selection. Following enrichment of antigen-binding clones, 2 clones were selected for each antibody and recombinantly expressed as antigen-binding fragments (Fabs). In both cases, the reverse-engineered Fabs exhibited antigen binding affinity identical, within error, to that of Fabs produced from the commercial IgGs. This combination of proteomic and protein engineering techniques provides a useful approach to simplifying the technically challenging process of reverse engineering monoclonal antibodies from protein material. PMID:26852694
Computer-based prediction of mitochondria-targeting peptides.
Martelli, Pier Luigi; Savojardo, Castrense; Fariselli, Piero; Tasco, Gianluca; Casadio, Rita
2015-01-01
Computational methods are invaluable when protein sequences, directly derived from genomic data, need functional and structural annotation. Subcellular localization is a feature necessary for understanding the protein role and the compartment where the mature protein is active, and it is very difficult to characterize experimentally. Mitochondrial proteins encoded on the cytosolic ribosomes carry specific patterns in the precursor sequence from which it is possible to recognize a peptide targeting the protein to its final destination. Here we discuss to what extent it is feasible to develop computational methods for detecting mitochondrial targeting peptides in the precursor sequences and benchmark our and other methods on the human mitochondrial proteins endowed with experimentally characterized targeting peptides. Furthermore, we illustrate our newly implemented web server and its usage on the whole human proteome in order to infer mitochondrial targeting peptides, their cleavage sites, and whether or not the targeting peptide regions contain arginine-rich recurrent motifs. In this way, we add some 2,800 further human proteins to the 124 already experimentally annotated with a mitochondrial targeting peptide.
Rapid Fine Conformational Epitope Mapping Using Comprehensive Mutagenesis and Deep Sequencing
Kowalsky, Caitlin A.; Faber, Matthew S.; Nath, Aritro; Dann, Hailey E.; Kelly, Vince W.; Liu, Li; Shanker, Purva; Wagner, Ellen K.; Maynard, Jennifer A.; Chan, Christina; Whitehead, Timothy A.
2015-01-01
Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and demonstrate the method's effectiveness by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day. PMID:26296891
FRESCO: Referential compression of highly similar sequences.
Wandelt, Sebastian; Leser, Ulf
2013-01-01
In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
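The core idea of referential compression described above (store only how the input differs from a known reference) can be shown with a small sketch. This is not FRESCO's algorithm or file format: the k-mer index, the greedy longest-match strategy, the entry layout, and the names `build_index`, `compress`, and `decompress` are assumptions chosen for brevity.

```python
def build_index(reference, k=4):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target, reference, k=4):
    """Greedily encode target as ("ref", pos, length) matches and ("raw", char) literals."""
    index = build_index(reference, k)
    out, i = [], 0
    while i < len(target):
        best_pos, best_len = -1, 0
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target) and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            out.append(("ref", best_pos, best_len))  # copy from the reference
            i += best_len
        else:
            out.append(("raw", target[i]))  # no usable match: store the symbol itself
            i += 1
    return out


def decompress(entries, reference):
    """Rebuild the original string from the entries and the same reference."""
    parts = []
    for entry in entries:
        if entry[0] == "ref":
            _, pos, length = entry
            parts.append(reference[pos:pos + length])
        else:
            parts.append(entry[1])
    return "".join(parts)
```

For two highly similar sequences the output is dominated by a few long `("ref", ...)` entries, which is where the large compression ratios for same-species genomes come from; a sequence identical to the reference collapses to a single entry.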
Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel
2007-01-01
Background Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. 
The first one, performed via ROC (Receiver Operating Curve) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next. Conclusion UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results with respect to NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data and Gencompress for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected. PPMd used with UCD or NCD and UPGMA, on sequence data, is very close in performance to, although worse than, the alignment methods (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences. 
In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page. PMID:17629909
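How a compressor turns into a similarity measure can be sketched with the widely used normalized compression formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. The sketch below uses Python's standard `zlib` as a stand-in compressor, which is an assumption made for portability; it is not one of the best performers reported in the abstract (PPMd, Gencompress), and the function names are illustrative.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression dissimilarity between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 indicate that one string helps compress the other, and values near 1 indicate little shared structure. A matrix of pairwise values can then be handed to a clustering method such as UPGMA or NJ, as in the experiments described above.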
NASA Astrophysics Data System (ADS)
Hellwagner, Johannes; Sharma, Kshama; Tan, Kong Ooi; Wittmann, Johannes J.; Meier, Beat H.; Madhu, P. K.; Ernst, Matthias
2017-06-01
Pulse imperfections like pulse transients and radio-frequency field maladjustment or inhomogeneity are the main sources of performance degradation and limited reproducibility in solid-state nuclear magnetic resonance experiments. We quantitatively analyze the influence of such imperfections on the performance of symmetry-based pulse sequences and describe how they can be compensated. Based on a triple-mode Floquet analysis, we develop a theoretical description of symmetry-based dipolar recoupling sequences, in particular, R26_4^11, calculating first- and second-order effective Hamiltonians using real pulse shapes. We discuss the various origins of effective fields, namely, pulse transients, deviation from the ideal flip angle, and fictitious fields, and develop strategies to counteract them for the restoration of full transfer efficiency. We compare experimental applications of transient-compensated pulses and an asynchronous implementation of the sequence to a supercycle, SR26, which is known to be efficient in compensating higher-order error terms. We are able to show the superiority of R26 compared to the supercycle, SR26, given the ability to reduce experimental error on the pulse sequence by pulse-transient compensation and a complete theoretical understanding of the sequence.
MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.
Grimes, Susan M; Ji, Hanlee P
2014-08-27
Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system, referred to as MendeLIMS, is easily implemented with open source tools and is also highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.
Position specific variation in the rate of evolution in transcription factor binding sites
Moses, Alan M; Chiang, Derek Y; Kellis, Manolis; Lander, Eric S; Eisen, Michael B
2003-01-01
Background The binding sites of sequence specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. Results Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve slower than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms. Conclusion As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. 
The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA. PMID:12946282
fMRI reliability: influences of task and experimental design.
Bennett, Craig M; Miller, Michael B
2013-12-01
As scientists, it is imperative that we understand not only the power of our research tools to yield results, but also their ability to obtain similar results over time. This study is an investigation into how common decisions made during the design and analysis of a functional magnetic resonance imaging (fMRI) study can influence the reliability of the statistical results. To that end, we gathered back-to-back test-retest fMRI data during an experiment involving multiple cognitive tasks (episodic recognition and two-back working memory) and multiple fMRI experimental designs (block, event-related genetic sequence, and event-related m-sequence). Using these data, we were able to investigate the relative influences of task, design, statistical contrast (task vs. rest, target vs. nontarget), and statistical thresholding (unthresholded, thresholded) on fMRI reliability, as measured by the intraclass correlation (ICC) coefficient. We also utilized data from a second study to investigate test-retest reliability after an extended, six-month interval. We found that all of the factors above were statistically significant, but that they had varying levels of influence on the observed ICC values. We also found that these factors could interact, increasing or decreasing the relative reliability of certain Task × Design combinations. The results suggest that fMRI reliability is a complex construct whose value may be increased or decreased by specific combinations of factors.
Template-Based 3D Reconstruction of Non-rigid Deformable Object from Monocular Video
NASA Astrophysics Data System (ADS)
Liu, Yang; Peng, Xiaodong; Zhou, Wugen; Liu, Bo; Gerndt, Andreas
2018-06-01
In this paper, we propose a template-based 3D surface reconstruction system for non-rigid deformable objects from a monocular video sequence. First, we generate a semi-dense template of the target object with a structure-from-motion method using a subsequence of the video. This video can be captured by a rigidly moving camera oriented toward the static target object or by a static camera observing the rigidly moving target object. Then, with the reference template mesh as input and based on the framework of classical template-based methods, we solve an energy minimization problem to obtain the correspondence between the template and every frame, yielding a time-varying mesh that represents the deformation of the object. The energy terms combine photometric cost, temporal and spatial smoothness cost, and an as-rigid-as-possible cost which can enable elastic deformation. In this paper, an easy and controllable solution to generate the semi-dense template for complex objects is presented. In addition, we use an effective iterative Schur-based linear solver for the energy minimization problem. The experimental evaluation presents qualitative reconstruction results for deforming objects with real sequences. Compared with the results obtained using other templates as input, the reconstructions based on our template are more accurate and detailed in certain regions. The experimental results show that the linear solver we used is more efficient than a traditional conjugate-gradient-based solver.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allen, M.D.; Pilch, M.; Brockmann, J.E.
Two experiments, DCH-3 and DCH-4, were performed at the Surtsey test facility to investigate phenomena associated with a high-pressure melt ejection (HPME) reactor accident sequence resulting in direct containment heating (DCH). These experiments were performed using the same experimental apparatus with identical initial conditions, except that the Surtsey test vessel contained air in DCH-3 and argon in DCH-4. Inerting the vessel with argon eliminated chemical reactions between metallic debris and oxygen. Thus, a comparison of the pressure response in DCH-3 and DCH-4 gave an indication of the DCH contribution due to metal/oxygen reactions. 44 refs., 110 figs., 43 tabs.
Scene-based nonuniformity correction with reduced ghosting using a gated LMS algorithm.
Hardie, Russell C; Baxley, Frank; Brys, Brandon; Hytla, Patrick
2009-08-17
In this paper, we present a scene-based nonuniformity correction (NUC) method using a modified adaptive least mean square (LMS) algorithm with a novel gating operation on the updates. The gating is designed to significantly reduce ghosting artifacts produced by many scene-based NUC algorithms by halting updates when temporal variation is lacking. We define the algorithm and present a number of experimental results to demonstrate the efficacy of the proposed method in comparison to several previously published methods including other LMS and constant statistics based methods. The experimental results include simulated imagery and a real infrared image sequence. We show that the proposed method significantly reduces ghosting artifacts, but has a slightly longer convergence time. (c) 2009 Optical Society of America
Spatio-temporal alignment of pedobarographic image sequences.
Oliveira, Francisco P M; Sousa, Andreia; Santos, Rubim; Tavares, João Manuel R S
2011-07-01
This article presents a methodology to align plantar pressure image sequences simultaneously in time and space. The spatial position and orientation of a foot in a sequence are changed to match the foot represented in a second sequence. Simultaneously with the spatial alignment, the temporal scale of the first sequence is transformed with the aim of synchronizing the two input footsteps. Consequently, the spatial correspondence of the foot regions along the sequences as well as the temporal synchronizing is automatically attained, making the study easier and more straightforward. In terms of spatial alignment, the methodology can use one of four possible geometric transformation models: rigid, similarity, affine, or projective. In the temporal alignment, a polynomial transformation up to the 4th degree can be adopted in order to model linear and curved time behaviors. Suitable geometric and temporal transformations are found by minimizing the mean squared error (MSE) between the input sequences. The methodology was tested on a set of real image sequences acquired from a common pedobarographic device. When used in experimental cases generated by applying geometric and temporal control transformations, the methodology revealed high accuracy. In addition, the intra-subject alignment tests from real plantar pressure image sequences showed that the curved temporal models produced better MSE results (P < 0.001) than the linear temporal model. This article represents an important step forward in the alignment of pedobarographic image data, since previous methods can only be applied on static images.
Sequence requirement of the ade6-4095 meiotic recombination hotspot in Schizosaccharomyces pombe.
Foulis, Steven J; Fowler, Kyle R; Steiner, Walter W
2018-02-01
Homologous recombination occurs at a greatly elevated frequency in meiosis compared to mitosis and is initiated by programmed double-strand DNA breaks (DSBs). DSBs do not occur at uniform frequency throughout the genome in most organisms, but occur preferentially at a limited number of sites referred to as hotspots. The locations of hotspots have been determined at nucleotide-level resolution in both the budding and fission yeasts, and while several patterns have emerged regarding preferred locations for DSB hotspots, it remains unclear why particular sites experience DSBs at much higher frequency than other sites with seemingly similar properties. Short sequence motifs, which are often sites for binding of transcription factors, are known to be responsible for a number of hotspots. In this study we identified the minimum sequence required for activity of one such motif, identified in a screen of random sequences capable of producing recombination hotspots. The experimentally determined sequence, GGTCTRGACC, closely matches the previously inferred sequence. Full hotspot activity requires an effective sequence length of 9.5 bp, whereas moderate activity requires an effective sequence length of approximately 8.2 bp and shows significant association with DSB hotspots. In combination with our previous work, this result is consistent with a large number of different sequence motifs capable of producing recombination hotspots, and supports a model in which hotspots can be rapidly regenerated by mutation as they are lost through recombination.
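The motif GGTCTRGACC uses the IUPAC code R, which stands for a purine (A or G). As an illustration only, the sketch below expands such a degenerate motif into a regular expression and reports where it occurs on the given strand; the function names and the partial IUPAC table are choices made for this example, not part of the study.

```python
import re

# Partial IUPAC nucleotide table; only the codes needed here plus a few common ones.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

def motif_to_regex(motif):
    """Expand a degenerate motif such as GGTCTRGACC into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif(seq, motif="GGTCTRGACC"):
    """Return 0-based start positions of the motif on the given strand."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

For example, `find_motif("AAGGTCTAGACCTT")` returns `[2]`, whereas a sequence with C at the R position yields no match.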
Inda, Márcia A; van Batenburg, Marinus F; Roos, Marco; Belloum, Adam S Z; Vasunin, Dmitry; Wibisono, Adianto; van Kampen, Antoine H C; Breit, Timo M
2008-08-08
Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic) features in a (DNA) sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs) in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. Thus, SigWin-detector provides the proof-of-principle for the modular e-Science based concept of integrative bioinformatics experimentation.
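To make the notion of a moving median concrete, here is a generic sliding-window median that keeps the current window sorted and updates it incrementally. It is a textbook approach written for this listing, not the new algorithm the authors developed; the function names are hypothetical.

```python
from bisect import bisect_left, insort

def _median(sorted_values):
    """Median of an already sorted, non-empty list."""
    mid = len(sorted_values) // 2
    if len(sorted_values) % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def moving_median(values, window):
    """Median of every run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    current = sorted(values[:window])
    medians = [_median(current)]
    for i in range(window, len(values)):
        # Drop the value leaving the window, then insert the new one in order.
        del current[bisect_left(current, values[i - window])]
        insort(current, values[i])
        medians.append(_median(current))
    return medians
```

For instance, `moving_median([1, 5, 2, 8, 3], 3)` gives `[2, 5, 3]`. Windows whose median is unusually high or low relative to a null distribution would then be the candidates for enriched regions.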
EGASP: the human ENCODE Genome Annotation Assessment Project
Guigó, Roderic; Flicek, Paul; Abril, Josep F; Reymond, Alexandre; Lagarde, Julien; Denoeud, France; Antonarakis, Stylianos; Ashburner, Michael; Bajic, Vladimir B; Birney, Ewan; Castelo, Robert; Eyras, Eduardo; Ucla, Catherine; Gingeras, Thomas R; Harrow, Jennifer; Hubbard, Tim; Lewis, Suzanna E; Reese, Martin G
2006-01-01
Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusion: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence. PMID:16925836
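Nucleotide-level sensitivity and specificity can be sketched as follows. The example assumes the convention common in gene-finding evaluations, where sensitivity is TP/(TP+FN) and specificity is TP/(TP+FP), i.e. the fraction of predicted coding nucleotides that are annotated as coding; whether EGASP used exactly these definitions should be checked against the paper. The function name and the representation of annotations as sets of positions are hypothetical.

```python
def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) for coding-nucleotide positions.

    annotated, predicted: iterables of genomic positions labelled as coding.
    Specificity here follows the gene-prediction convention TP / (TP + FP).
    """
    annotated, predicted = set(annotated), set(predicted)
    true_pos = len(annotated & predicted)
    sensitivity = true_pos / len(annotated) if annotated else 0.0
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity
```

A prediction covering positions 110 to 209 against an annotation covering 100 to 199 shares 90 of 100 positions with it, so both values come out as 0.9.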
Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach
Meyer, Pablo; Siwo, Geoffrey; Zeevi, Danny; Sharon, Eilon; Norel, Raquel; Segal, Eran; Stolovitzky, Gustavo; Siwo, Geoffrey; Rider, Andrew K.; Tan, Asako; Pinapati, Richard S.; Emrich, Scott; Chawla, Nitesh; Ferdig, Michael T.; Tung, Yi-An; Chen, Yong-Syuan; Chen, Mei-Ju May; Chen, Chien-Yu; Knight, Jason M.; Sahraeian, Sayed Mohammad Ebrahim; Esfahani, Mohammad Shahrokh; Dreos, Rene; Bucher, Philipp; Maier, Ezekiel; Saeys, Yvan; Szczurek, Ewa; Myšičková, Alena; Vingron, Martin; Klein, Holger; Kiełbasa, Szymon M.; Knisley, Jeff; Bonnell, Jeff; Knisley, Debra; Kursa, Miron B.; Rudnicki, Witold R.; Bhattacharjee, Madhuchhanda; Sillanpää, Mikko J.; Yeung, James; Meysman, Pieter; Rodríguez, Aminael Sánchez; Engelen, Kristof; Marchal, Kathleen; Huang, Yezhou; Mordelet, Fantine; Hartemink, Alexander; Pinello, Luca; Yuan, Guo-Cheng
2013-01-01
The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescence protein and inserted in the same genomic site of yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites. PMID:23950146
[Replication of Streptomyces plasmids: the DNA nucleotide sequence of plasmid pSB 24.2].
Bolotin, A P; Sorokin, A V; Aleksandrov, N N; Danilenko, V N; Kozlov, Iu I
1985-11-01
The nucleotide sequence of DNA in plasmid pSB 24.2, a natural deletion derivative of plasmid pSB 24.1 isolated from S. cyanogenus, was studied. The plasmid is 3706 nucleotide pairs in size, with a G-C content of 73 per cent. The analysis of the DNA structure in plasmid pSB 24.2 revealed the protein-encoding sequence of DNA, the continuity of which was significant for replication of the plasmid containing more than 1300 nucleotide pairs. The analysis also revealed two A-T-rich areas of DNA, the G-C composition of which was less than 55 per cent, and a DNA area with a branched pin structure. The results may be of value in investigation of plasmid replication in actinomycetes and experimental cloning of DNA with this plasmid as a vector.
Compression of computer generated phase-shifting hologram sequence using AVC and HEVC
NASA Astrophysics Data System (ADS)
Xing, Yafei; Pesquet-Popescu, Béatrice; Dufaux, Frederic
2013-09-01
With the capability of achieving twice the compression ratio of Advanced Video Coding (AVC) with similar reconstruction quality, High Efficiency Video Coding (HEVC) is expected to become the new leading technique of video coding. In order to reduce the storage and transmission burden of digital holograms, in this paper we propose to use HEVC for compressing the phase-shifting digital hologram sequences (PSDHS). By simulating phase-shifting digital holography (PSDH) interferometry, interference patterns between illuminated three-dimensional (3D) virtual objects and the stepwise phase-changed reference wave are generated as digital holograms. The hologram sequences are obtained by the movement of the virtual objects and compressed by AVC and HEVC. The experimental results show that AVC and HEVC are efficient to compress PSDHS, with HEVC giving better performance. Good compression rate and reconstruction quality can be obtained with bitrate above 15000 kbps.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klesmith, Justin R.; Bacik, John -Paul; Michalczyk, Ryszard
Synthetic metabolic pathways often suffer from low specific productivity, and new methods that quickly assess pathway functionality for many thousands of variants are urgently needed. Here we present an approach that enables the rapid and parallel determination of sequence effects on flux for complete gene-encoding sequences. We show that this method can be used to determine the effects of over 8000 single point mutants of a pyrolysis oil catabolic pathway implanted in Escherichia coli. Experimental sequence-function data sets predicted whether fitness-enhancing mutations to the enzyme levoglucosan kinase resulted from enhanced catalytic efficiency or enzyme stability. A structure of one design incorporating 38 mutations elucidated the structural basis of high fitness mutations. One design incorporating 15 beneficial mutations supported a 15-fold improvement in growth rate and greater than 24-fold improvement in enzyme activity relative to the starting pathway. Lastly, this technique can be extended to improve a wide variety of designed pathways.
Space debris detection in optical image sequences.
Xi, Jiangbo; Wen, Desheng; Ersoy, Okan K; Yi, Hongwei; Yao, Dalei; Song, Zongxi; Xi, Shaobo
2016-10-01
We present a high-accuracy, low false-alarm rate, and low computational-cost methodology for removing stars and noise and detecting space debris with low signal-to-noise ratio (SNR) in optical image sequences. First, time-index filtering and bright star intensity enhancement are implemented to remove stars and noise effectively. Then, a multistage quasi-hypothesis-testing method is proposed to detect the pieces of space debris with continuous and discontinuous trajectories. For this purpose, a time-index image is defined and generated. Experimental results show that the proposed method can detect space debris effectively without any false alarms. When the SNR is higher than or equal to 1.5, the detection probability can reach 100%, and when the SNR is as low as 1.3, 1.2, and 1, it can still achieve 99%, 97%, and 85% detection probabilities, respectively. Additionally, two large sets of image sequences are tested to show that the proposed method performs stably and effectively.
NASA Astrophysics Data System (ADS)
Azmi, N. I. L. Mohd; Ahmad, R.; Zainuddin, Z. M.
2017-09-01
This research explores the Mixed-Model Two-Sided Assembly Line (MMTSAL). There are two interrelated problems in MMTSAL: line balancing and model sequencing. In previous studies, many researchers considered these problems separately, and only a few studied them simultaneously for one-sided lines. In this study, however, the two problems are solved simultaneously to obtain a more efficient solution. A Mixed Integer Linear Programming (MILP) model with the objectives of minimizing total utility work and idle time is generated by considering a variable launching interval and an assignment restriction constraint. The problem is analysed using small-size test cases to validate the integrated model. Numerical experiments were conducted using the General Algebraic Modelling System (GAMS) with the CPLEX solver. Experimental results indicate that integrating the problems of model sequencing and line balancing helps to minimise the proposed objective functions.
NASA Astrophysics Data System (ADS)
Yang, Hongxin; Su, Fulin
2018-01-01
We propose a moving target analysis algorithm using speeded-up robust features (SURF) and regular moment in inverse synthetic aperture radar (ISAR) image sequences. In our study, we first extract interest points from ISAR image sequences by SURF. Different from traditional feature point extraction methods, SURF-based feature points are invariant to scattering intensity, target rotation, and image size. Then, we employ a bilateral feature registering model to match these feature points. The feature registering scheme can not only search the isotropic feature points to link the image sequences but also reduce the number of erroneous matching pairs. After that, the target centroid is detected by regular moment. Consequently, a cost function based on correlation coefficient is adopted to analyze the motion information. Experimental results based on simulated and real data validate the effectiveness and practicability of the proposed method.
Chen, Junjie; Guo, Mingyue; Li, Shumin; Liu, Bin
2017-11-01
As one of the most important tasks in protein sequence analysis, protein remote homology detection is critical for both basic research and practical applications. Here, we present an effective web server for protein remote homology detection called ProtDec-LTR2.0 by combining ProtDec-Learning to Rank (LTR) and pseudo protein representation. Experimental results showed that the detection performance is obviously improved. The web server provides a user-friendly interface to explore the sequence and structure information of candidate proteins and find their conserved domains by launching a multiple sequence alignment tool. The web server is free and open to all users with no login requirement at http://bioinformatics.hitsz.edu.cn/ProtDec-LTR2.0/. Contact: bliu@hit.edu.cn. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Recent research on the high-probability instructional sequence: A brief review.
Lipschultz, Joshua; Wilder, David A
2017-04-01
The high-probability (high-p) instructional sequence consists of the delivery of a series of high-probability instructions immediately before delivery of a low-probability or target instruction. It is commonly used to increase compliance in a variety of populations. Recent research has described variations of the high-p instructional sequence and examined the conditions under which the sequence is most effective. This manuscript reviews the most recent research on the sequence and identifies directions for future research. Recommendations for practitioners regarding the use of the high-p instructional sequence are also provided. © 2017 Society for the Experimental Analysis of Behavior.
Design and Analysis of Single-Cell Sequencing Experiments.
Grün, Dominic; van Oudenaarden, Alexander
2015-11-05
Recent advances in single-cell sequencing hold great potential for exploring biological systems with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mutations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can elucidate the cell type composition of a sample. However, single-cell sequencing comes with major technical challenges and yields complex data output. In this Primer, we provide an overview of available methods and discuss experimental design and single-cell data analysis. We hope that these guidelines will enable a growing number of researchers to leverage the power of single-cell sequencing. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Enayatifar, Rasul; Sadaei, Hossein Javedani; Abdullah, Abdul Hanan; Lee, Malrey; Isnin, Ismail Fauzi
2015-08-01
Currently, many studies have been conducted on securing digital images in order to protect such data while they are sent over the internet. This work proposes a new approach based on a hybrid model of the Tinkerbell chaotic map, deoxyribonucleic acid (DNA) and cellular automata (CA). DNA rules, the DNA sequence XOR operator and CA rules are used simultaneously to encrypt the plain-image pixels. To determine the rule number in the DNA sequence and also in the CA, a two-dimensional Tinkerbell chaotic map is employed. Experimental results and computer simulations both confirm that the proposed scheme not only demonstrates outstanding encryption, but also resists various typical attacks.
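Two of the ingredients named above can be sketched in a few lines: the Tinkerbell map and a DNA-coded XOR. The map equations and the parameter values a=0.9, b=-0.6013, c=2.0, d=0.5 are the commonly quoted ones; everything else (the single coding rule A=00, C=01, G=10, T=11, the way key bytes are derived from the orbit, the initial point, and the omission of the CA stage) is an assumption made for illustration and is not the authors' scheme.

```python
BASES = "ACGT"                        # assumed coding rule: A=00, C=01, G=10, T=11
BITS = {base: i for i, base in enumerate(BASES)}

def tinkerbell(x, y, n, a=0.9, b=-0.6013, c=2.0, d=0.5):
    """Return n successive points of the Tinkerbell map starting after (x, y)."""
    points = []
    for _ in range(n):
        x, y = x * x - y * y + a * x + b * y, 2 * x * y + c * x + d * y
        points.append((x, y))
    return points

def byte_to_dna(value):
    """Encode one byte as four nucleotides, most significant bit pair first."""
    return "".join(BASES[(value >> shift) & 3] for shift in (6, 4, 2, 0))

def dna_to_byte(dna):
    """Decode four nucleotides back into one byte."""
    value = 0
    for base in dna:
        value = (value << 2) | BITS[base]
    return value

def dna_xor(left, right):
    """Nucleotide-wise XOR under the coding rule above."""
    return "".join(BASES[BITS[p] ^ BITS[q]] for p, q in zip(left, right))

def xor_cipher(pixels, x0=-0.72, y0=-0.64):
    """XOR each pixel with a key byte taken from the chaotic orbit (toy key schedule)."""
    keys = [int(abs(x) * 1e6) % 256 for x, _ in tinkerbell(x0, y0, len(pixels))]
    return [dna_to_byte(dna_xor(byte_to_dna(p), byte_to_dna(k)))
            for p, k in zip(pixels, keys)]
```

Because XOR is its own inverse, applying `xor_cipher` twice with the same initial point restores the original pixels. A toy key schedule like this one offers no real security and only shows how the pieces fit together.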
Compressive sensing method for recognizing cat-eye effect targets.
Li, Li; Li, Hui; Dang, Ersheng; Liu, Bo
2013-10-01
This paper proposes a cat-eye effect target recognition method with compressive sensing (CS) and presents a recognition method (sample processing before reconstruction based on compressed sensing, or SPCS) for image processing. In this method, the linear projections of original image sequences are applied to remove dynamic background distractions and extract cat-eye effect targets. Furthermore, the corresponding imaging mechanism for acquiring active and passive image sequences is put forward. This method uses fewer images to recognize cat-eye effect targets, reduces data storage, and translates the traditional target identification, based on original image processing, into measurement vectors processing. The experimental results show that the SPCS method is feasible and superior to the shape-frequency dual criteria method.
Sahin, Deniz; Taflan, Sevket Onur; Yartas, Gizem; Ashktorab, Hassan; Smoot, Duane T
2018-04-25
Background: Gastric cancer is the second most common cancer among the malignant cancer types. Inefficiency of traditional techniques both in diagnosis and therapy of the disease makes the development of alternative and novel techniques indispensable. As an alternative to traditional methods, tumor-specific targeting small peptides can be used to increase the efficiency of the treatment and reduce the side effects related to traditional techniques. The aim of this study is screening and identification of individual peptides specifically targeted to human gastric cancer cells using a phage-displayed peptide library and designing specific peptide sequences by using experimentally eluted peptide sequences. Methods: Here, MKN-45 human gastric cancer cells and HFE-145 human normal gastric epithelial cells were used as the target and control cells, respectively. Five rounds of biopanning with a phage display 12-peptide library were applied following subtraction biopanning with HFE-145 control cells. The selected phage clones were established by enzyme-linked immunosorbent assay and immunofluorescence detection. We first obtained random phage clones after five biopanning rounds and determined the binding levels of each individual clone. Then, we analyzed the frequencies of each amino acid in the best binding clones to determine positively overexpressed amino acids for designing novel peptide sequences. Results: DE532 (VETSQYFRGTLS) phage clone was screened positive, showing specific binding on MKN-45 gastric cancer cells. DE-Obs (HNDLFPSWYHNY) peptide, which was designed by using amino acid frequencies of experimentally selected peptides in the 5th round of biopanning, showed specific binding in MKN-45 cells. Conclusion: Selection and characterization of individual clones may give us specifically binding peptides, but more importantly, data extracted from eluted phage clones may be used to design theoretical peptides with better binding properties than even experimentally selected ones. Both peptides, experimental and designed, may be potential candidates to be developed as useful diagnostic or therapeutic ligand molecules in gastric cancer research. Creative Commons Attribution License
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
NASA Astrophysics Data System (ADS)
Farstad, Jan Magnus Granheim; Netland, Øyvind; Welo, Torgeir
2017-10-01
This paper presents the results from a second series of experiments made to study local plastic deformations of a complex, hollow aluminium extrusion formed in roll bending. The first experimental series, utilizing a single-step roll bending sequence, was presented at the ESAFORM 2016 conference by Farstad et al. In this recent experimental series, the same aluminium extrusion was formed in incremental steps. The objective was to investigate local distortions of the deformed cross section as a result of the different numbers of steps employed to arrive at the final global shape of the extrusion. Moreover, the results of the two experimental series are compared, focusing on identifying differences in both the desired and the undesired deformations taking place as a result of bending and contact stresses. The profiles formed through multiple passes had less undesirable local distortions of the cross-section than the profiles that were formed in a single pass. However, the springback effect was more pronounced, meaning that the released radii of the profiles were higher.
Metabolism and Genetics of Helicobacter pylori: the Genome Era
Marais, Armelle; Mendz, George L.; Hazell, Stuart L.; Mégraud, Francis
1999-01-01
The publication of the complete sequence of Helicobacter pylori 26695 in 1997 and more recently that of strain J99 has provided new insight into the biology of this organism. In this review, we attempt to analyze and interpret the information provided by sequence annotations and to compare these data with those provided by experimental analyses. After a brief description of the general features of the genomes of the two sequenced strains, the principal metabolic pathways are analyzed. In particular, the enzymes encoded by H. pylori involved in fermentative and oxidative metabolism, lipopolysaccharide biosynthesis, nucleotide biosynthesis, aerobic and anaerobic respiration, and iron and nitrogen assimilation are described, and the areas of controversy between the experimental data and those provided by the sequence annotation are discussed. The role of urease, particularly in pH homeostasis, and other specialized mechanisms developed by the bacterium to maintain its internal pH are also considered. The replicational, transcriptional, and translational apparatuses are reviewed, as is the regulatory network. The numerous findings on the metabolism of the bacteria and the paucity of gene expression regulation systems are indicative of the high level of adaptation to the human gastric environment. Arguments in favor of the diversity of H. pylori and molecular data reflecting possible mechanisms involved in this diversity are presented. Finally, we compare the numerous experimental data on the colonization factors and those provided from the genome sequence annotation, in particular for genes involved in motility and adherence of the bacterium to the gastric tissue. PMID:10477311
Sequence and structural analyses of nuclear export signals in the NESdb database
Xu, Darui; Farmer, Alicia; Collett, Garen; Grishin, Nick V.; Chook, Yuh Min
2012-01-01
We compiled >200 nuclear export signal (NES)–containing CRM1 cargoes in a database named NESdb. We analyzed the sequences and three-dimensional structures of natural, experimentally identified NESs and of false-positive NESs that were generated from the database in order to identify properties that might distinguish the two groups of sequences. Analyses of amino acid frequencies, sequence logos, and agreement with existing NES consensus sequences revealed strong preferences for the Φ1-X3-Φ2-X2-Φ3-X-Φ4 pattern and for negatively charged amino acids in the nonhydrophobic positions of experimentally identified NESs but not of false positives. Strong preferences against certain hydrophobic amino acids in the hydrophobic positions were also revealed. These findings led to a new and more precise NES consensus. More important, three-dimensional structures are now available for 68 NESs within 56 different cargo proteins. Analyses of these structures showed that experimentally identified NESs are more likely than the false positives to adopt α-helical conformations that transition to loops at their C-termini and more likely to be surface accessible within their protein domains or be present in disordered or unobserved parts of the structures. Such distinguishing features for real NESs might be useful in future NES prediction efforts. Finally, we also tested CRM1-binding of 40 NESs that were found in the 56 structures. We found that 16 of the NES peptides did not bind CRM1, hence illustrating how NESs are easily misidentified. PMID:22833565
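The Φ1-X3-Φ2-X2-Φ3-X-Φ4 pattern translates directly into a regular expression. The sketch below assumes that Φ may be any of L, I, V, F or M, a common reading of the hydrophobic positions; the abstract does not list the allowed residues, so this set, like the function name, is an assumption. As the abstract itself stresses, a pattern match is only a candidate and is easily a false positive.

```python
import re

HYDROPHOBIC = "LIVFM"   # assumed residue set for the hydrophobic (Phi) positions

# Phi1-X3-Phi2-X2-Phi3-X-Phi4, wrapped in a lookahead so overlapping hits are found.
NES_PATTERN = re.compile(
    "(?=([{0}].{{3}}[{0}].{{2}}[{0}].[{0}]))".format(HYDROPHOBIC)
)

def find_nes_candidates(protein):
    """Return (0-based start, matched segment) for every pattern match."""
    return [(m.start(), m.group(1)) for m in NES_PATTERN.finditer(protein.upper())]
```

For the hypothetical test string `"NSNELALKLAGLDINK"`, the function returns `[(4, "LALKLAGLDI")]`. Filtering such candidates by predicted helicity or surface accessibility, the features highlighted in the abstract, would be a separate step not shown here.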
Training the max-margin sequence model with the relaxed slack variables.
Niu, Lingfeng; Wu, Jianmin; Shi, Yong
2012-09-01
Sequence models are widely used in many applications such as natural language processing, information extraction and optical character recognition. In this paper, we propose a new approach to train the max-margin based sequence model by relaxing the slack variables. With the canonical feature mapping definition, the relaxed problem is solved by training a multiclass Support Vector Machine (SVM). Compared with the state-of-the-art solutions for sequence learning, the new method has the following advantages: firstly, the sequence training problem is transformed into a multiclassification problem, which is more widely studied and already has quite a few off-the-shelf training packages; secondly, this new approach reduces the complexity of training significantly and achieves comparable prediction performance compared with the existing sequence models; thirdly, when the size of training data is limited, by assigning different slack variables to different microlabel pairs, the new method can use the discriminative information more frugally and produce a more reliable model; last but not least, by employing kernels in the intermediate multiclass SVM, nonlinear feature space can be easily explored. Experimental results on the tasks of named entity recognition, information extraction and handwritten letter recognition with public datasets illustrate the efficiency and effectiveness of our method. Copyright © 2012 Elsevier Ltd. All rights reserved.
The siRNA Non-seed Region and Its Target Sequences Are Auxiliary Determinants of Off-Target Effects.
Kamola, Piotr J; Nakano, Yuko; Takahashi, Tomoko; Wilson, Paul A; Ui-Tei, Kumiko
2015-12-01
RNA interference (RNAi) is a powerful tool for post-transcriptional gene silencing. However, the siRNA guide strand may bind unintended off-target transcripts via partial sequence complementarity by a mechanism closely mirroring micro RNA (miRNA) silencing. To better understand these off-target effects, we investigated the correlation between sequence features within various subsections of siRNA guide strands, and its corresponding target sequences, with off-target activities. Our results confirm previous reports that strength of base-pairing in the siRNA seed region is the primary factor determining the efficiency of off-target silencing. However, the degree of downregulation of off-target transcripts with shared seed sequence is not necessarily similar, suggesting that there are additional auxiliary factors that influence the silencing potential. Here, we demonstrate that both the melting temperature (Tm) in a subsection of siRNA non-seed region, and the GC contents of its corresponding target sequences, are negatively correlated with the efficiency of off-target effect. Analysis of experimentally validated miRNA targets demonstrated a similar trend, indicating a putative conserved mechanistic feature of seed region-dependent targeting mechanism. These observations may prove useful as parameters for off-target prediction algorithms and improve siRNA 'specificity' design rules.
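The two sequence features discussed above, GC content and melting temperature, can be computed roughly as follows. The abstract does not say how Tm was calculated or which non-seed positions were used, so the Wallace rule (2 °C per A/T/U, 4 °C per G/C, a coarse estimate meant for short oligonucleotides) and the `start`/`end` positions are stand-ins chosen for this illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA or RNA sequence (0.0 if empty)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def wallace_tm(seq):
    """Rough melting temperature in degrees C: 2 per A/T/U base, 4 per G/C base."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 2 * (len(seq) - gc) + 4 * gc

def region(guide, start, end):
    """Slice of the guide strand using 1-based, inclusive positions."""
    return guide[start - 1:end]
```

A caller could, for example, compare `wallace_tm(region(guide, 8, 15))` across siRNAs that share a seed sequence, bearing in mind that the study may have used a more careful thermodynamic model.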
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses.
Paez-Espino, David; Chen, I-Min A; Palaniappan, Krishna; Ratner, Anna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Huang, Jinghua; Markowitz, Victor M; Nielsen, Torben; Huntemann, Marcel; K Reddy, T B; Pavlopoulos, Georgios A; Sullivan, Matthew B; Campbell, Barbara J; Chen, Feng; McMahon, Katherine; Hallam, Steve J; Denef, Vincent; Cavicchioli, Ricardo; Caffrey, Sean M; Streit, Wolfgang R; Webster, John; Handley, Kim M; Salekdeh, Ghasem H; Tsesmetzis, Nicolas; Setubal, Joao C; Pope, Phillip B; Liu, Wen-Tso; Rivers, Adam R; Ivanova, Natalia N; Kyrpides, Nikos C
2017-01-04
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
A multiplex primer design algorithm for target amplification of continuous genomic regions.
Ozturk, Ahmet Rasit; Can, Tolga
2017-06-19
Targeted Next Generation Sequencing (NGS) assays are cost-efficient and reliable alternatives to Sanger sequencing. For sequencing a very large set of genes, the target enrichment approach is suitable. However, for smaller genomic regions, the target amplification method is more efficient than both the target enrichment method and Sanger sequencing. The major difficulty of the target amplification method is the preparation of amplicons, in terms of the required time, equipment, and labor. Multiplex PCR (MPCR) is a good solution to these problems. We propose a novel method to design MPCR primers for a continuous genomic region, following the best practices of clinically reliable PCR design processes. On an experimental setup with 48 different combinations of factors, we have shown that multiple parameters might affect finding the first feasible solution. Increasing the length of the initial primer candidate selection sequence gives better results, whereas waiting longer to find the first feasible solution does not have a significant impact. We generated MPCR primer designs for the HBB whole gene, MEFV coding regions, and human exons between 2000 bp and 2100 bp long. Our benchmarking experiments show that the proposed MPCR approach is able to produce reliable NGS assay primers for a given sequence in a reasonable amount of time.
BioSAVE: Display of scored annotation within a sequence context
Pollock, Richard F; Adryan, Boris
2008-01-01
Background Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. Results We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for visual display of scored features (annotation) within a sequence context. The program reads sequence and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow for live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Conclusion Possible applications of the program include display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences and many more. The dynamic visualisation of these annotations is useful, e.g., for the determination of cutoff values of predicted features to match experimental data. Program, source code and exemplary files are freely available at the BioSAVE homepage. PMID:18366701
Lebœuf, David; Ciesielski, Jennifer
2012-01-01
Highly functionalized cyclopentenones can be generated stereospecifically by a chemoselective copper(II)-mediated Nazarov/Wagner-Meerwein rearrangement sequence of divinyl ketones. A detailed investigation of this sequence is described including a study of substrate scope and limitations. After the initial 4π electrocyclization, this reaction proceeds via two different sequential [1,2]-shifts, with selectivity that depends upon either migratory ability or the steric bulkiness of the substituents at C1 and C5. This methodology allows the creation of vicinal stereogenic centers, including adjacent quaternary centers. This sequence can also be achieved by using a catalytic amount of copper(II) in combination with NaBAr4f, a weak Lewis acid. During the study of the scope of the reaction, a partial or complete E / Z isomerization of the enone moiety was observed in some cases prior to the cyclization, which resulted in a mixture of diastereomeric products. Use of a Cu(II)-bisoxazoline complex prevented the isomerization, allowing high diastereoselectivity to be obtained in all substrate types. In addition, the reaction sequence was studied by DFT computations at the UB3LYP/6-31G(d,p) level, which are consistent with the proposed sequences observed, including E / Z isomerizations and chemoselective Wagner-Meerwein shifts. PMID:22471833
Alam, Nuhu; Shim, Mi Ja; Lee, Min Woong; Shin, Pyeong Gyun; Yoo, Young Bok; Lee, Tae Soo
2009-09-01
The molecular phylogeny of nine commercially cultivated strains of Pleurotus nebrodensis was studied based on their internal transcribed spacer (ITS) region and RAPD. Sequencing of the ITS region of the selected strains revealed that the total length ranged from 592 to 614 bp. The size of the ITS1 and ITS2 regions varied among the strains from 219 to 228 bp and 211 to 229 bp, respectively. The sequence of ITS2 was more variable than that of ITS1, and the 5.8S sequences were identical. A phylogenetic tree of the ITS region sequences indicated that the selected strains were classified into five clusters. The reciprocal homologies of the ITS region sequences ranged from 99 to 100%. The strains were also analyzed by RAPD with 20 arbitrary primers. Twelve primers efficiently amplified the genomic DNA. The sizes of the polymorphic fragments obtained were in the range of 200 to 2000 bp. RAPD and ITS analysis techniques were able to detect genetic variation among the tested strains. Experimental results suggested that the IUM-1381, IUM-3914, IUM-1495 and AY-581431 strains were genetically very similar. Therefore, all IUM and NCBI gene bank strains of P. nebrodensis were genetically the same, with some variations.
AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.
Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia
2017-03-14
Some applications, especially clinical applications requiring highly accurate sequencing data, have to cope with the problems caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Unlike most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and furthermore it provides a novel function to correct wrong bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which can be commonly found on the flowcell lanes and may cause sequencing errors. Besides normal per cycle quality and base content plotting, AfterQC also provides features like polyX (a long sub-sequence of the same base X) filtering, automatic trimming and K-MER based strand bias profiling. For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates the sequencer's bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support. It can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder for all included FastQ files to be processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent.
Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate the sequencing errors for pair-end sequencing data to provide much cleaner outputs, and consequently help to reduce the false-positive variants, especially for the low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and require no argument in most cases.
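The polyX filtering mentioned in this abstract (discarding reads that contain a long run of one base) can be sketched as follows. This is a minimal illustration and not AfterQC's actual code or interface; the function names and the `max_run` threshold are hypothetical.

```python
from itertools import groupby


def longest_homopolymer(read: str) -> int:
    """Length of the longest run of a single repeated base in the read."""
    return max((sum(1 for _ in run) for _, run in groupby(read)), default=0)


def filter_polyx(reads, max_run=35):
    """Keep only reads whose longest single-base run is shorter than max_run.

    max_run is a hypothetical threshold chosen for illustration.
    """
    return [read for read in reads if longest_homopolymer(read) < max_run]


reads = ["ACGTACGTAC", "A" * 40, "ACG" + "T" * 5 + "A"]
kept = filter_polyx(reads, max_run=35)
print(len(kept), "of", len(reads), "reads kept")
```

A real tool would apply this per FastQ record alongside quality-based filters; the sketch operates on bare sequence strings only.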
Comprehensive analysis of orthologous protein domains using the HOPS database.
Storm, Christian E V; Sonnhammer, Erik L L
2003-10-01
One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains, and there is a wealth of protein sequences from genomes that are not completely sequenced. A comprehensive set of protein domain families is found in the Pfam database. We wanted to apply orthology detection to Pfam families, but first some issues needed to be addressed. First, orthology detection becomes impractical and unreliable when too many species are included. Second, shorter domains contain less information. It is therefore important to assess the quality of the orthology assignment and avoid very short domains altogether. We present a database of orthologous protein domains in Pfam called HOPS: Hierarchical grouping of Orthologous and Paralogous Sequences. Orthology is inferred in a hierarchic system of phylogenetic subgroups using ortholog bootstrapping. To avoid the frequent errors stemming from horizontally transferred genes in bacteria, the analysis is presently limited to eukaryotic genes. The results are accessible in the graphical browser NIFAS, a Java tool originally developed for analyzing phylogenetic relations within Pfam families. The method was tested on a set of curated orthologs with experimentally verified function. In comparison to tree reconciliation with a complete species tree, our approach finds significantly more orthologs in the test set. Examples for investigating gene fusions and domain recombination using HOPS are given.
Kanost, Michael R.; Arrese, Estela L.; Cao, Xiaolong; Chen, Yun-Ru; Chellapilla, Sanjay; Goldsmith, Marian R; Grosse-Wilde, Ewald; Heckel, David G.; Herndon, Nicolae; Jiang, Haobo; Papanicolaou, Alexie; Qu, Jiaxin; Soulages, Jose L.; Vogel, Heiko; Walters, James; Waterhouse, Robert M.; Ahn, Seung-Joon; Almeida, Francisca C.; An, Chunju; Aqrawi, Peshtewani; Bretschneider, Anne; Bryant, William B.; Bucks, Sascha; Chao, Hsu; Chevignon, Germain; Christen, Jayne M.; Clarke, David F.; Dittmer, Neal T.; Ferguson, Laura C.F.; Garavelou, Spyridoula; Gordon, Karl H.J.; Gunaratna, Ramesh T.; Han, Yi; Hauser, Frank; He, Yan; Heidel-Fischer, Hanna; Hirsh, Ariana; Hu, Yingxia; Jiang, Hongbo; Kalra, Divya; Klinner, Christian; König, Christopher; Kovar, Christie; Kroll, Ashley R.; Kuwar, Suyog S.; Lee, Sandy L.; Lehman, Rüdiger; Li, Kai; Li, Zhaofei; Liang, Hanquan; Lovelace, Shanna; Lu, Zhiqiang; Mansfield, Jennifer H.; McCulloch, Kyle J.; Mathew, Tittu; Morton, Brian; Muzny, Donna M.; Neunemann, David; Ongeri, Fiona; Pauchet, Yannick; Pu, Ling-Ling; Pyrousis, Ioannis; Rao, Xiang-Jun; Redding, Amanda; Roesel, Charles; Sanchez-Gracia, Alejandro; Schaack, Sarah; Shukla, Aditi; Tetreau, Guillaume; Wang, Yang; Xiong, Guang-Hua; Traut, Walther; Walsh, Tom K.; Worley, Kim C.; Wu, Di; Wu, Wenbi; Wu, Yuan-Qing; Zhang, Xiufeng; Zou, Zhen; Zucker, Hannah; Briscoe, Adriana D.; Burmester, Thorsten; Clem, Rollie J.; Feyereisen, René; Grimmelikhuijzen, Cornelis J.P; Hamodrakas, Stavros J.; Hansson, Bill S.; Huguet, Elisabeth; Jermiin, Lars S.; Lan, Que; Lehman, Herman K.; Lorenzen, Marce; Merzendorfer, Hans; Michalopoulos, Ioannis; Morton, David B.; Muthukrishnan, Subbaratnam; Oakeshott, John G.; Palmer, Will; Park, Yoonseong; Passarelli, A. Lorena; Rozas, Julio; Schwartz, Lawrence M.; Smith, Wendy; Southgate, Agnes; Vilcinskas, Andreas; Vogt, Richard; Wang, Ping; Werren, John; Yu, Xiao-Qiang; Zhou, Jing-Jiang; Brown, Susan J.; Scherer, Steven E.; Richards, Stephen; Blissard, Gary W.
2016-01-01
Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set is comprised of 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects. PMID:27522922
GPU-based cloud service for Smith-Waterman algorithm using frequency distance filtration scheme.
Lee, Sheng-Ta; Lin, Chun-Yuan; Hung, Che Lun
2013-01-01
As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can improve computations efficiently by using the computational power of massive computing hardware such as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm with a frequency-based filtration method on GPUs, which avoids unnecessary comparisons rather than merely accelerating all comparisons and expending computational resources on the unnecessary ones. A user friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that reducing unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection without time constraints.
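The idea of filtering before aligning can be illustrated on the CPU with a short sketch. It assumes a simple form of frequency distance (the sum of absolute differences between residue counts) and a linear gap penalty; the abstract does not give the exact filtration rule, scoring scheme, or GPU kernel design used in CUDA-SWf, so the names, the scoring parameters, and the `max_fd` threshold below are all assumptions.

```python
from collections import Counter


def frequency_distance(a: str, b: str) -> int:
    """Sum of absolute differences between the residue counts of a and b."""
    count_a, count_b = Counter(a), Counter(b)
    return sum(abs(count_a[k] - count_b[k]) for k in set(count_a) | set(count_b))


def smith_waterman_score(a: str, b: str, match=2, mismatch=-1, gap=-1) -> int:
    """Best local alignment score with a linear gap penalty (score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def filtered_search(query: str, database, max_fd: int):
    """Align only database sequences that pass the cheap frequency filter."""
    hits = []
    for seq in database:
        if frequency_distance(query, seq) > max_fd:
            continue  # skipped without running the quadratic-time alignment
        hits.append((seq, smith_waterman_score(query, seq)))
    return hits


print(filtered_search("ACGT", ["ACGT", "ACGA", "TTTTTTTT"], max_fd=2))
```

The filter costs time linear in the sequence lengths, while the alignment is quadratic, which is why skipping dissimilar candidates can save work. For local alignment the filter is a heuristic and may discard sequences that contain a good local match.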
Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N
2009-01-01
Background Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biologic motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion We present a theoretic framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximate matched frequent subgraphs. By doing so, we significantly expanded the analytical power of sophisticated data mining algorithms in dealing with large volume of complicated and noisy protein structure data. And without loss of generality, choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148
Human action classification using procrustes shape theory
NASA Astrophysics Data System (ADS)
Cho, Wanhyun; Kim, Sangkyoon; Park, Soonyoung; Lee, Myungeun
2015-02-01
In this paper, we propose a new method that can classify a human action using Procrustes shape theory. First, we extract a pre-shape configuration vector of landmarks from each frame of an image sequence representing an arbitrary human action, and then we derive the Procrustes fit vector for the pre-shape configuration vector. Second, we extract a set of pre-shape vectors from training samples stored in a database, and we compute a Procrustes mean shape vector for these pre-shape vectors. Third, we extract a sequence of pre-shape vectors from the input video, and we project this sequence of pre-shape vectors onto the tangent space with respect to the pole, taken as a sequence of mean shape vectors corresponding to a target video. We then calculate the Procrustes distance between the two sequences of projected pre-shape vectors on the tangent space and the mean shape vectors. Finally, we classify the input video into the human action class with the minimum Procrustes distance. We assess the performance of the proposed method using one public dataset, namely the Weizmann human action dataset. Experimental results reveal that the proposed method performs very well on this dataset.
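The core quantity in this abstract, the Procrustes distance between two landmark configurations, can be sketched for planar shapes using complex numbers. The sketch assumes 2D landmarks in matching order and uses the standard full Procrustes distance between pre-shapes; it leaves out the tangent-space projection and the frame-sequence handling described by the authors, and the function names are hypothetical.

```python
import math


def pre_shape(landmarks):
    """Center 2D landmarks and scale them to unit norm (as complex numbers)."""
    points = [complex(x, y) for x, y in landmarks]
    centroid = sum(points) / len(points)
    centered = [p - centroid for p in points]
    norm = math.sqrt(sum(abs(p) ** 2 for p in centered))
    if norm == 0:
        raise ValueError("all landmarks coincide")
    return [p / norm for p in centered]


def full_procrustes_distance(shape_a, shape_b) -> float:
    """Full Procrustes distance, invariant to translation, scale and rotation."""
    z, w = pre_shape(shape_a), pre_shape(shape_b)
    inner = sum(zi.conjugate() * wi for zi, wi in zip(z, w))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))


def classify(shape, class_means):
    """Return the label whose mean shape is nearest in Procrustes distance."""
    return min(class_means,
               key=lambda label: full_procrustes_distance(shape, class_means[label]))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
bar = [(0, 0), (4, 0), (4, 1), (0, 1)]
# The square rotated by 90 degrees, scaled by 2 and shifted by (3, 3).
moved = [(3, 3), (3, 5), (1, 5), (1, 3)]
print(classify(moved, {"square": square, "bar": bar}))
```

Because the distance ignores position, size and orientation, the transformed square is still matched to the square class, which is the property that makes the measure attractive for silhouette-based action recognition.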
Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg
2017-01-01
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404
Zhou, Mu; Zhang, Qiao; Xu, Kunjie; Tian, Zengshan; Wang, Yanmeng; He, Wei
2015-01-01
Due to the wide deployment of wireless local area networks (WLAN), received signal strength (RSS)-based indoor WLAN localization has attracted considerable attention in both academia and industry. In this paper, we propose a novel page rank-based indoor mapping and localization (PRIMAL) by using the gene-sequenced unlabeled WLAN RSS for simultaneous localization and mapping (SLAM). Specifically, first of all, based on the observation of the motion patterns of the people in the target environment, we use the Allen logic to construct the mobility graph to characterize the connectivity among different areas of interest. Second, the concept of gene sequencing is utilized to assemble the sporadically-collected RSS sequences into a signal graph based on the transition relations among different RSS sequences. Third, we apply the graph drawing approach to exhibit both the mobility graph and signal graph in a more readable manner. Finally, the page rank (PR) algorithm is proposed to construct the mapping from the signal graph into the mobility graph. The experimental results show that the proposed approach achieves satisfactory localization accuracy and meanwhile avoids the intensive time and labor cost involved in the conventional location fingerprinting-based indoor WLAN localization. PMID:26404274
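For readers unfamiliar with the page rank algorithm named in this abstract, a minimal power-iteration sketch on a small directed graph is shown below. It illustrates standard PageRank only; how PRIMAL uses the ranks to map the signal graph onto the mobility graph is not described in the abstract and is not reproduced here. The graph, node names, and parameter values are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    graph maps each node to a list of nodes it links to; every linked node
    must also appear as a key. Rank held by nodes without outgoing links is
    spread evenly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in graph[v]:
                new_rank[target] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


# Hypothetical mobility graph: a hallway connected to two rooms.
mobility = {"hall": ["room1", "room2"], "room1": ["hall"], "room2": ["hall"]}
ranks = pagerank(mobility)
print(max(ranks, key=ranks.get))
```

Nodes with similar rank in two graphs are natural candidates for correspondence, which is one plausible way rank values could support graph-to-graph mapping.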
Recognizing human actions by learning and matching shape-motion prototype trees.
Jiang, Zhuolin; Lin, Zhe; Davis, Larry S
2012-03-01
A shape-motion prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, an action prototype tree is learned in a joint shape and motion space via hierarchical K-means clustering and each training sequence is represented as a labeled prototype sequence; then a look-up table of prototype-to-prototype distances is generated. During testing, based on a joint probability model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint probability, which is efficiently performed by searching the learned prototype tree; then actions are recognized using dynamic prototype sequence matching. Distance measures used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in challenging situations (such as moving cameras, dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 92.86 percent on a large gesture data set (with dynamic backgrounds), 100 percent on the Weizmann action data set, 95.77 percent on the KTH action data set, 88 percent on the UCF sports data set, and 87.27 percent on the CMU action data set.
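The combination of a precomputed prototype-to-prototype distance table with dynamic sequence matching can be sketched with classic dynamic time warping. This is an assumed stand-in: the abstract does not specify the authors' exact matching algorithm, and the table values and names below are made up for illustration.

```python
def dtw_distance(seq_a, seq_b, table):
    """Dynamic time warping cost between two prototype-index sequences.

    table[p][q] is a precomputed distance between prototypes p and q, so each
    cell of the alignment grid costs one look-up instead of a frame-to-frame
    feature comparison.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = table[seq_a[i - 1]][seq_b[j - 1]]
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical distance table for three prototypes.
table = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
print(dtw_distance([0, 1, 2], [0, 0, 1, 2, 2], table))
```

The second sequence is a slowed-down version of the first, so the warping cost is zero; this tolerance to speed variation is what allows automatic alignment of action sequences of different lengths.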
Can Chunk Size Differences Explain Developmental Changes in Lexical Learning?
Smalle, Eleonore H. M.; Bogaerts, Louisa; Simonis, Morgane; Duyck, Wouter; Page, Michael P. A.; Edwards, Martin G.; Szmalec, Arnaud
2016-01-01
In three experiments, we investigated Hebb repetition learning (HRL) differences between children and adults, as a function of the type of item (lexical vs. sub-lexical) and the level of item-overlap between sequences. In a first experiment, it was shown that when non-repeating and repeating (Hebb) sequences of words were all permutations of the same words, HRL was slower than when the sequences shared no words. This item-overlap effect was observed in both children and adults. In a second experiment, we used syllable sequences and we observed reduced HRL due to item-overlap only in children. The findings are explained within a chunking account of the HRL effect on the basis of which we hypothesize that children, compared with adults, chunk syllable sequences in smaller units. By hypothesis, small chunks are more prone to interference from anagram representations included in the filler sequences, potentially explaining the item-overlap effect in children. This hypothesis was tested in a third experiment with adults where we experimentally manipulated the chunk size by embedding pauses in the syllable sequences. Interestingly, we showed that imposing a small chunk size caused adults to show the same behavioral effects as those observed in children. Departing from the analogy between verbal HRL and lexical development, the results are discussed in light of the less-is-more hypothesis of age-related differences in language acquisition. PMID:26779065
Onozawa, Masahiro; Zhang, Zhenhua; Kim, Yoo Jung; Goldberg, Liat; Varga, Tamas; Bergsagel, P Leif; Kuehl, W Michael; Aplan, Peter D
2014-05-27
We used the I-SceI endonuclease to produce DNA double-strand breaks (DSBs) and observed that a fraction of these DSBs were repaired by insertion of sequences, which we termed "templated sequence insertions" (TSIs), derived from distant regions of the genome. These TSIs were derived from genic, retrotransposon, or telomere sequences and were not deleted from the donor site in the genome, leading to the hypothesis that they were derived from reverse-transcribed RNA. Cotransfection of RNA and an I-SceI expression vector demonstrated insertion of RNA-derived sequences at the DNA-DSB site, and TSIs were suppressed by reverse-transcriptase inhibitors. Both observations support the hypothesis that TSIs were derived from RNA templates. In addition, similar insertions were detected at sites of DNA DSBs induced by transcription activator-like effector nuclease proteins. Whole-genome sequencing of myeloma cell lines revealed additional TSIs, demonstrating that repair of DNA DSBs via insertion was not restricted to experimentally produced DNA DSBs. Analysis of publicly available databases revealed that many of these TSIs are polymorphic in the human genome. Taken together, these results indicate that insertional events should be considered as alternatives to gross chromosomal rearrangements in the interpretation of whole-genome sequence data and that this mutagenic form of DNA repair may play a role in genetic disease, exon shuffling, and mammalian evolution.
A combined computational-experimental analyses of selected metabolic enzymes in Pseudomonas species.
Perumal, Deepak; Lim, Chu Sing; Chow, Vincent T K; Sakharkar, Kishore R; Sakharkar, Meena K
2008-09-10
Comparative genomic analysis has revolutionized our ability to predict the metabolic subsystems that occur in newly sequenced genomes, and to explore the functional roles of the set of genes within each subsystem. These computational predictions can considerably reduce the volume of experimental studies required to assess basic metabolic properties of multiple bacterial species. However, experimental validations are still required to resolve the apparent inconsistencies in the predictions by multiple resources. Here, we present combined computational-experimental analyses on eight completely sequenced Pseudomonas species. Comparative pathway analyses reveal that several pathways within the Pseudomonas species show high plasticity and versatility. Potential bypasses in 11 metabolic pathways were identified. We further confirmed the presence of the enzyme O-acetyl homoserine (thiol) lyase (EC: 2.5.1.49) in P. syringae pv. tomato that revealed inconsistent annotations in KEGG and in the recently published SYSTOMONAS database. These analyses connect and integrate systematic data generation, computational data interpretation, and experimental validation and represent a synergistic and powerful means for conducting biological research.
Alertness Modulates Conflict Adaptation and Feature Integration in an Opposite Way
Chen, Jia; Huang, Xiting; Chen, Antao
2013-01-01
Previous studies show that the congruency sequence effect can result from both the conflict adaptation effect (CAE) and feature integration effect which can be observed as the repetition priming effect (RPE) and feature overlap effect (FOE) depending on different experimental conditions. Evidence from neuroimaging studies suggests that a close correlation exists between the neural mechanisms of alertness-related modulations and the congruency sequence effect. However, little is known about whether and how alertness mediates the congruency sequence effect. In Experiment 1, the Attentional Networks Test (ANT) and a modified flanker task were used to evaluate whether the alertness of the attentional functions had a correlation with the CAE and RPE. In Experiment 2, the ANT and another modified flanker task were used to investigate whether alertness of the attentional functions correlates with the CAE and FOE. In Experiment 1, through the correlative analysis, we found a significant positive correlation between alertness and the CAE, and a negative correlation between the alertness and the RPE. Moreover, a significant negative correlation existed between CAE and RPE. In Experiment 2, we found a marginally significant negative correlation between the CAE and the RPE, but the correlation between alertness and FOE, CAE and FOE was not significant. These results suggest that alertness can modulate conflict adaptation and feature integration in an opposite way. Participants at the high alerting level group may tend to use the top-down cognitive processing strategy, whereas participants at the low alerting level group tend to use the bottom-up processing strategy. PMID:24250824
DOE Office of Scientific and Technical Information (OSTI.GOV)
Riback, Joshua A.; Bowman, Micayla A.; Zmyslowski, Adam M.
A substantial fraction of the proteome is intrinsically disordered, and even well-folded proteins adopt non-native geometries during synthesis, folding, transport, and turnover. Characterization of intrinsically disordered proteins (IDPs) is challenging, in part because of a lack of accurate physical models and the difficulty of interpreting experimental results. We have developed a general method to extract the dimensions and solvent quality (self-interactions) of IDPs from a single small-angle x-ray scattering measurement. We applied this procedure to a variety of IDPs and found that even IDPs with low net charge and high hydrophobicity remain highly expanded in water, contrary to the general expectation that protein-like sequences collapse in water. Our results suggest that the unfolded state of most foldable sequences is expanded; we conjecture that this property was selected by evolution to minimize misfolding and aggregation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacobina, C.B.; Silva, E.R.C. da; Lima, A.M.N.
This paper investigates the PWM operation of a four switch three phase inverter (FSTPI), in the case of digital implementation. Different switching sequence strategies for vector control are described and a digital scalar method is also presented. The influence of different switching patterns on the output voltage symmetry, current waveform and switching frequency is examined. The results obtained by employing the vector and scalar strategies are compared and a relationship between them is established. This comparison is based on an analytical study and is corroborated both by computer simulations and by experimental results. The vector approach eases the understanding and analysis of the FSTPI, as well as the choice of a PWM pattern. However, similar results may be obtained through the scalar approach, which has a simpler implementation. The experimental results of the use of the FSTPI and digital PWM to control an induction motor are presented.
Nonexponential Decoherence and Subdiffusion in Atom-Optics Kicked Rotor.
Sarkar, Sumit; Paul, Sanku; Vishwakarma, Chetan; Kumar, Sunil; Verma, Gunjan; Sainath, M; Rapol, Umakant D; Santhanam, M S
2017-04-28
Quantum systems lose coherence upon interaction with the environment and tend towards classical states. Quantum coherence is known to exponentially decay in time so that macroscopic quantum superpositions are generally unsustainable. In this work, slower than exponential decay of coherences is experimentally realized in an atom-optics kicked rotor system subjected to nonstationary Lévy noise in the applied kick sequence. The slower coherence decay manifests in the form of quantum subdiffusion that can be controlled through the Lévy exponent. The experimental results are in good agreement with the analytical estimates and numerical simulations for the mean energy growth and momentum profiles of an atom-optics kicked rotor.
Counterbalancing for Serial Order Carryover Effects in Experimental Condition Orders
ERIC Educational Resources Information Center
Brooks, Joseph L.
2012-01-01
Reactions of neural, psychological, and social systems are rarely, if ever, independent of previous inputs and states. The potential for serial order carryover effects from one condition to the next in a sequence of experimental trials makes counterbalancing of condition order an essential part of experimental design. Here, a method is proposed…
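The abstract above is truncated, so the method it proposes is not visible here. As a hedged illustration of counterbalancing against first-order carryover in general, the following Python sketch builds a classic balanced Latin square (Williams design) for an even number of conditions. The function names `balanced_latin_square` and `carryover_counts` are illustrative and are not taken from the source.

```python
from collections import Counter


def balanced_latin_square(n):
    """Return n condition orders in which every condition immediately
    precedes every other condition exactly once (requires even n)."""
    if n < 2 or n % 2 != 0:
        raise ValueError("this simple construction needs an even n >= 2")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    for k in range(1, n):
        if k % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    # Every later row is the first row shifted cyclically by one condition.
    return [[(c + shift) % n for c in first] for shift in range(n)]


def carryover_counts(orders):
    """Count how often condition a is immediately followed by condition b."""
    return Counter((a, b) for row in orders for a, b in zip(row, row[1:]))


if __name__ == "__main__":
    for row in balanced_latin_square(4):
        print(row)
```

Each row is one participant's condition order; checking `carryover_counts` confirms that every ordered pair of distinct conditions occurs equally often as an immediate succession.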
Motion estimation of magnetic resonance cardiac images using the Wigner-Ville and Hough transforms
NASA Astrophysics Data System (ADS)
Carranza, N.; Cristóbal, G.; Bayerl, P.; Neumann, H.
2007-12-01
Myocardial motion analysis and quantification is of utmost importance for analyzing contractile heart abnormalities and it can be a symptom of a coronary artery disease. A fundamental problem in processing sequences of images is the computation of the optical flow, which is an approximation of the real image motion. This paper presents a new algorithm for optical flow estimation based on a spatiotemporal-frequency (STF) approach. More specifically it relies on the computation of the Wigner-Ville distribution (WVD) and the Hough Transform (HT) of the motion sequences. The latter is a well-known line and shape detection method that is highly robust against incomplete data and noise. The rationale of using the HT in this context is that it provides a value of the displacement field from the STF representation. In addition, a probabilistic approach based on Gaussian mixtures has been implemented in order to improve the accuracy of the motion detection. Experimental results in the case of synthetic sequences are compared with an implementation of the variational technique for local and global motion estimation, where it is shown that the results are accurate and robust to noise degradations. Results obtained with real cardiac magnetic resonance images are presented.
Burden, S; Lin, Y-X; Zhang, R
2005-03-01
Although a great deal of research has been undertaken in the area of promoter prediction, prediction techniques are still not fully developed. Many algorithms tend to exhibit poor specificity, generating many false positives, or poor sensitivity. The neural network prediction program NNPP2.2 is one such example. To improve the NNPP2.2 prediction technique, the distance between the transcription start site (TSS) associated with the promoter and the translation start site (TLS) of the subsequent gene coding region has been studied for Escherichia coli K12 bacteria. An empirical probability distribution that is consistent for all E. coli promoters has been established. This information is combined with the results from NNPP2.2 to create a new technique called TLS-NNPP, which improves the specificity of promoter prediction. The technique is shown to be effective using E. coli DNA sequences; however, it is applicable to any organism for which a set of promoters has been experimentally defined. The data used in this project and the prediction results for the tested sequences can be obtained from http://www.uow.edu.au/~yanxia/E_Coli_paper/SBurden_Results.xls. Contact: alh98@uow.edu.au.
Tracking Algorithm of Multiple Pedestrians Based on Particle Filters in Video Sequences
Liu, Yun; Wang, Chuanxu; Zhang, Shujun; Cui, Xuehong
2016-01-01
Pedestrian tracking is a critical problem in the field of computer vision. Particle filters have proven very useful in pedestrian tracking for nonlinear and non-Gaussian estimation problems. However, pedestrian tracking in complex environments still faces many problems due to changes in pedestrian posture and scale, moving backgrounds, mutual occlusion, and the presence of pedestrians. To surmount these difficulties, this paper presents a tracking algorithm for multiple pedestrians based on particle filters in video sequences. The algorithm acquires confidence values of the object and the background by extracting a priori knowledge, thereby achieving multipedestrian detection; it incorporates color and texture features into the particle filter to obtain better observation results and then automatically adjusts the weight of each feature according to the current tracking environment. During tracking, the algorithm handles severe occlusion to prevent the drift and loss caused by object occlusion, and it associates detection results with particle states to provide a discrimination method for object disappearance and emergence, thereby achieving robust tracking of multiple pedestrians. Experimental verification and analysis on video sequences demonstrate that the proposed algorithm improves tracking performance and yields better tracking results. PMID:27847514
Reactivation, Replay, and Preplay: How It Might All Fit Together
Buhry, Laure; Azizi, Amir H.; Cheng, Sen
2011-01-01
Sequential activation of neurons that occurs during “offline” states, such as sleep or awake rest, is correlated with neural sequences recorded during preceding exploration phases. This so-called reactivation, or replay, has been observed in a number of different brain regions such as the striatum, prefrontal cortex, primary visual cortex and, most prominently, the hippocampus. Reactivation largely co-occurs with hippocampal sharp-waves/ripples, brief high-frequency bursts in the local field potential. Here, we first review the mounting evidence for the hypothesis that reactivation is the neural mechanism for memory consolidation during sleep. We then discuss recent results that suggest that offline sequential activity in the waking state might not be a simple repetition of previously experienced sequences. Some offline sequential activity occurs before animals are exposed to a novel environment for the first time, and some sequences activated offline correspond to trajectories never experienced by the animal. We propose a conceptual framework for the dynamics of offline sequential activity that can parsimoniously describe a broad spectrum of experimental results. These results point to a potentially broader role of offline sequential activity in cognitive functions such as maintenance of spatial representation, learning, or planning. PMID:21918724
Lehmann, Jason S.; Matthias, Michael A.; Vinetz, Joseph M.; Fouts, Derrick E.
2014-01-01
Leptospirosis, caused by pathogenic spirochetes belonging to the genus Leptospira, is a zoonosis with important impacts on human and animal health worldwide. Research on the mechanisms of Leptospira pathogenesis has been hindered due to slow growth of infectious strains, poor transformability, and a paucity of genetic tools. As a result of second generation sequencing technologies, there has been an acceleration of leptospiral genome sequencing efforts in the past decade, which has enabled a concomitant increase in functional genomics analyses of Leptospira pathogenesis. A pathogenomics approach, by coupling of pan-genomic analysis of multiple isolates with sequencing of experimentally attenuated highly pathogenic Leptospira, has resulted in the functional inference of virulence factors. The global Leptospira Genome Project supported by the U.S. National Institute of Allergy and Infectious Diseases to which key scientific contributions have been made from the international leptospirosis research community has provided a new roadmap for comprehensive studies of Leptospira and leptospirosis well into the future. This review describes functional genomics approaches to apply the data generated by the Leptospira Genome Project towards deepening our knowledge of virulence factors of Leptospira using the emerging discipline of pathogenomics. PMID:25437801
The R package 'RLumModel': Simulating charge transfer in quartz
NASA Astrophysics Data System (ADS)
Friedrich, Johannes; Kreutzer, Sebastian; Schmidt, Christoph
2017-04-01
Kinetic models of quartz luminescence have gained an important role for predicting experimental results and for understanding charge transfers in (natural) quartz as well as in other dosimetric materials, e.g., Al2O3:C. We present the R package 'RLumModel', offering an easy-to-use tool for simulating quartz luminescence signals (TL, OSL, LM-OSL and RF) based on five integrated and published parameter sets as well as the possibility of using user-defined parameters. Simulation commands can be created using (a) the Risø Sequence Editor, (b) a built-in SAR sequence generator or (c) self-explanatory keywords for customised sequences. Results can be analysed seamlessly using the R package 'Luminescence' along with a visualisation of concentrations of electrons and holes in every trap/centre as well as in the valence and conduction band during all stages of the simulation. Modelling luminescence signals can help in understanding charge transfer processes occurring in nature or during measurements in the laboratory. This will lead to a better understanding of several processes concerning geoscientific questions, because quartz is the second most abundant mineral in the Earth's continental crust.
GAMUT: GPU accelerated microRNA analysis to uncover target genes through CUDA-miRanda
2014-01-01
Background: Non-coding sequences such as microRNAs have important roles in disease processes. Computational microRNA target identification (CMTI) is becoming increasingly important since traditional experimental methods for target identification pose many difficulties. These methods are time-consuming, costly, and often need guidance from computational methods to narrow down candidate genes anyway. However, most CMTI methods are computationally demanding, since they need to handle not only several million query microRNA and reference RNA pairs, but also several million nucleotide comparisons within each given pair. Thus, the need to perform microRNA identification at such large scale has increased the demand for parallel computing. Methods: Although most CMTI programs (e.g., the miRanda algorithm) are based on a modified Smith-Waterman (SW) algorithm, the existing parallel SW implementations (e.g., CUDASW++ 2.0/3.0, SWIPE) are unable to meet this demand in CMTI tasks. We present CUDA-miRanda, a fast microRNA target identification algorithm that takes advantage of massively parallel computing on Graphics Processing Units (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). CUDA-miRanda specifically focuses on the local alignment of short (i.e., ≤ 32 nucleotides) sequences against longer reference sequences (e.g., 20K nucleotides). Moreover, the proposed algorithm is able to report multiple alignments (up to 191 top scores) and the corresponding traceback sequences for any given (query sequence, reference sequence) pair. Results: Speeds over 5.36 Giga Cell Updates Per Second (GCUPs) are achieved on a server with 4 NVIDIA Tesla M2090 GPUs. Compared to the original miRanda algorithm, which is evaluated on an Intel Xeon E5620@2.4 GHz CPU, the experimental results show up to 166 times performance gains in terms of execution time. In addition, we have verified that the exact same targets were predicted in both CUDA-miRanda and the original miRanda implementations through multiple test datasets. Conclusions: We offer a GPU-based alternative to high performance compute (HPC) that can be developed locally at a relatively small cost. The community of GPU developers in the biomedical research community, particularly for genome analysis, is still growing. With increasing shared resources, this community will be able to advance CMTI in a very significant manner. Our source code is available at https://sourceforge.net/projects/cudamiranda/. PMID:25077821
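To make the workload described above concrete, here is a minimal CPU sketch in Python of a plain Smith-Waterman local alignment score, together with a helper that turns a run time into Giga Cell Updates Per Second. It is not the modified scoring used by miRanda or CUDA-miRanda: the identity-based match/mismatch scores, the linear gap penalty, and the names `smith_waterman_score` and `gcups` are assumptions chosen only for illustration.

```python
def smith_waterman_score(query, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between query and ref.

    Uses a generic identity scoring scheme with a linear gap penalty
    (an assumption; miRanda scores complementarity differently).
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    prev = [0] * (len(ref) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, ref_len, n_pairs, seconds):
    """Giga cell updates per second for n_pairs alignments of the given sizes."""
    return query_len * ref_len * n_pairs / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
    print(gcups(32, 20000, 1_000_000, 128.0))
```

Each (query, reference) pair fills a matrix of `len(query) * len(ref)` cells, which is why the cell-update rate is the natural throughput measure and why the problem parallelises well across pairs.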
Cumulative Weighing of Time in Intertemporal Tradeoffs
2016-01-01
We examine preferences for sequences of delayed monetary gains. In the experimental literature, two prominent models have been advanced as psychological descriptions of preferences for sequences. In one model, the instantaneous utilities of the outcomes in a sequence are discounted as a function of their delays, and assembled into a discounted utility of the sequence. In the other model, the accumulated utility of the outcomes in a sequence is considered along with utility or disutility from improvement in outcome utilities and utility or disutility from the spreading of outcome utilities. Drawing on three threads of evidence concerning preferences for sequences of monetary gains, we propose that the accumulated utility of the outcomes in a sequence is traded off against the duration of utility accumulation. In our first experiment, aggregate choice behavior provides qualitative support for the tradeoff model. In three subsequent experiments, one of which was incentivized, disaggregate choice behavior provides quantitative support for the tradeoff model in Bayesian model contests. One thread of evidence motivating the tradeoff model is that, when, in the choice between two single dated outcomes, it is conveyed that receiving less sooner means receiving nothing later, preference for receiving more later increases, but when it is conveyed that receiving more later means receiving nothing sooner, preference is left unchanged. Our results show that this asymmetric hidden-zero effect is indeed driven by those supporting the tradeoff model. The tradeoff model also accommodates all remaining evidence on preferences for sequences of monetary gains. PMID:27560853
Optimal control design of turbo spin‐echo sequences with applications to parallel‐transmit systems
Hoogduin, Hans; Hajnal, Joseph V.; van den Berg, Cornelis A. T.; Luijten, Peter R.; Malik, Shaihan J.
2016-01-01
Purpose: The design of turbo spin‐echo sequences is modeled as a dynamic optimization problem which includes the case of inhomogeneous transmit radiofrequency fields. This problem is efficiently solved by optimal control techniques making it possible to design patient‐specific sequences online. Theory and Methods: The extended phase graph formalism is employed to model the signal evolution. The design problem is cast as an optimal control problem and an efficient numerical procedure for its solution is given. The numerical and experimental tests address standard multiecho sequences and pTx configurations. Results: Standard, analytically derived flip angle trains are recovered by the numerical optimal control approach. New sequences are designed where constraints on radiofrequency total and peak power are included. In the case of parallel transmit application, the method is able to calculate the optimal echo train for two‐dimensional and three‐dimensional turbo spin echo sequences in the order of 10 s with a single central processing unit (CPU) implementation. The image contrast is maintained through the whole field of view despite inhomogeneities of the radiofrequency fields. Conclusion: The optimal control design sheds new light on the sequence design process and makes it possible to design sequences in an online, patient‐specific fashion. Magn Reson Med 77:361–373, 2017. © 2016 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine PMID:26800383
NASA Astrophysics Data System (ADS)
Mielke, Steven P.; Grønbech-Jensen, Niels; Krishnan, V. V.; Fink, William H.; Benham, Craig J.
2005-09-01
The topological state of DNA in vivo is dynamically regulated by a number of processes that involve interactions with bound proteins. In one such process, the tracking of RNA polymerase along the double helix during transcription, restriction of rotational motion of the polymerase and associated structures, generates waves of overtwist downstream and undertwist upstream from the site of transcription. The resulting superhelical stress is often sufficient to drive double-stranded DNA into a denatured state at locations such as promoters and origins of replication, where sequence-specific duplex opening is a prerequisite for biological function. In this way, transcription and other events that actively supercoil the DNA provide a mechanism for dynamically coupling genetic activity with regulatory and other cellular processes. Although computer modeling has provided insight into the equilibrium dynamics of DNA supercoiling, to date no model has appeared for simulating sequence-dependent DNA strand separation under the nonequilibrium conditions imposed by the dynamic introduction of torsional stress. Here, we introduce such a model and present results from an initial set of computer simulations in which the sequences of dynamically superhelical, 147 base pair DNA circles were systematically altered in order to probe the accuracy with which the model can predict location, extent, and time of stress-induced duplex denaturation. The results agree both with well-tested statistical mechanical calculations and with available experimental information. Additionally, we find that sites susceptible to denaturation show a propensity for localizing to supercoil apices, suggesting that base sequence determines locations of strand separation not only through the energetics of interstrand interactions, but also by influencing the geometry of supercoiling.
Deep Recurrent Neural Networks for Human Activity Recognition
Murad, Abdulmajid
2017-01-01
Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs. PMID:29113103
Deep Recurrent Neural Networks for Human Activity Recognition.
Murad, Abdulmajid; Pyun, Jae-Young
2017-11-06
Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs.
Zou, Jiaqi; Li, Na
2013-09-01
Proper design of nucleic acid sequences is crucial for many applications. We have previously established a thermodynamics-based quantitative model to help design aptamer-based nucleic acid probes by predicting equilibrium concentrations of all interacting species. To facilitate customization of this thermodynamic model for different applications, here we present a generic and easy-to-use platform to implement the algorithm of the model with Microsoft® Excel formulas and VBA (Visual Basic for Applications) macros. Two Excel spreadsheets have been developed: one for applications involving only nucleic acid species, the other for applications involving both nucleic acid and non-nucleic acid species. The spreadsheets take the nucleic acid sequences and the initial concentrations of all species as input, guide the user to retrieve the necessary thermodynamic constants, and finally calculate equilibrium concentrations for all species in various bound and unbound conformations. The validity of both spreadsheets has been verified by comparing the modeling results with the experimental results on nucleic acid sequences reported in the literature. The Excel-based platform described here will allow biomedical researchers to rationalize the sequence design of nucleic acid probes using thermodynamics-based modeling even without relevant theoretical and computational skills. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
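The published model handles many interacting species and is implemented in Excel and VBA; none of that is reproduced here. As a much simpler sketch of the underlying idea, the Python snippet below solves the equilibrium of a single two-species binding reaction A + B ⇌ AB from initial concentrations and a dissociation constant. The function names, the 1 M standard state, and the default temperature are assumptions made for this example only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal, temp_k=310.15):
    """Dissociation constant (M) from the free energy of association.

    Assumes a 1 M standard state; a more negative delta_g gives a smaller Kd.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


def complex_concentration(a0, b0, kd):
    """Equilibrium [AB] for A + B <-> AB, given totals a0, b0 and Kd.

    Solves [AB]^2 - (a0 + b0 + kd)[AB] + a0*b0 = 0 and keeps the root
    that does not exceed either total concentration.
    """
    s = a0 + b0 + kd
    return (s - math.sqrt(s * s - 4.0 * a0 * b0)) / 2.0


if __name__ == "__main__":
    ab = complex_concentration(1.0e-6, 1.0e-6, 0.5e-6)
    print(ab, 1.0e-6 - ab)  # bound complex and remaining free A
```

With more than two species the mass-balance equations no longer reduce to one quadratic and have to be solved numerically, which is the kind of bookkeeping the spreadsheets described above automate.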
Nguyen, Kieu T H; Adamkiewicz, Marta A; Hebert, Lauren E; Zygiel, Emily M; Boyle, Holly R; Martone, Christina M; Meléndez-Ríos, Carola B; Noren, Karen A; Noren, Christopher J; Hall, Marilena Fitzsimons
2014-10-01
A target-unrelated peptide (TUP) can arise in phage display selection experiments as a result of a propagation advantage exhibited by the phage clone displaying the peptide. We previously characterized HAIYPRH, from the M13-based Ph.D.-7 phage display library, as a propagation-related TUP resulting from a G→A mutation in the Shine-Dalgarno sequence of gene II. This mutant was shown to propagate in Escherichia coli at a dramatically faster rate than phage bearing the wild-type Shine-Dalgarno sequence. We now report 27 additional fast-propagating clones displaying 24 different peptides and carrying 14 unique mutations. Most of these mutations are found either in or upstream of the gene II Shine-Dalgarno sequence, but still within the mRNA transcript of gene II. All 27 clones propagate at significantly higher rates than normal library phage, most within experimental error of wild-type M13 propagation, suggesting that mutations arise to compensate for the reduced virulence caused by the insertion of a lacZα cassette proximal to the replication origin of the phage used to construct the library. We also describe an efficient and convenient assay to diagnose propagation-related TUPs among peptide sequences selected by phage display. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Bardy, Fabrice; Dillon, Harvey; Van Dun, Bram
2014-04-01
Rapid presentation of stimuli in an evoked response paradigm can lead to overlap of multiple responses and consequently difficulties interpreting waveform morphology. This paper presents a deconvolution method allowing overlapping multiple responses to be disentangled. The deconvolution technique uses a least-squared error approach. A methodology is proposed to optimize the stimulus sequence associated with the deconvolution technique under low-jitter conditions. It controls the condition number of the matrices involved in recovering the responses. Simulations were performed using the proposed deconvolution technique. Multiple overlapping responses can be recovered perfectly in noiseless conditions. In the presence of noise, the amount of error introduced by the technique can be controlled a priori by the condition number of the matrix associated with the used stimulus sequence. The simulation results indicate the need for a minimum amount of jitter, as well as a sufficient number of overlap combinations to obtain optimum results. An aperiodic model is recommended to improve reconstruction. We propose a deconvolution technique allowing multiple overlapping responses to be extracted and a method of choosing the stimulus sequence optimal for response recovery. This technique may allow audiologists, psychologists, and electrophysiologists to optimize their experimental designs involving rapidly presented stimuli, and to recover evoked overlapping responses. Copyright © 2013 International Federation of Clinical Neurophysiology. All rights reserved.
NASA Astrophysics Data System (ADS)
Manu, V. S.; Veglia, Gianluigi
2016-12-01
Identity operation in the form of π pulses is widely used in NMR spectroscopy. For an isolated single spin system, a sequence of even number of π pulses performs an identity operation, leaving the spin state essentially unaltered. For multi-spin systems, trains of π pulses with appropriate phases and time delays modulate the spin Hamiltonian to perform operations such as decoupling and recoupling. However, experimental imperfections often jeopardize the outcome, leading to severe losses in sensitivity. Here, we demonstrate that a newly designed Genetic Algorithm (GA) is able to optimize a train of π pulses, resulting in a robust identity operation. As proof-of-concept, we optimized the recoupling sequence in the transferred-echo double-resonance (TEDOR) pulse sequence, a key experiment in biological magic angle spinning (MAS) solid-state NMR for measuring multiple carbon-nitrogen distances. The GA modified TEDOR (GMO-TEDOR) experiment with improved recoupling efficiency results in a net gain of sensitivity up to 28% as tested on a uniformly 13C, 15N labeled microcrystalline ubiquitin sample. The robust identity operation achieved via GA paves the way for the optimization of several other pulse sequences used for both solid- and liquid-state NMR used for decoupling, recoupling, and relaxation experiments.
2012-01-01
Background: Detecting the borders between coding and non-coding regions is an essential step in genome annotation, and information entropy measures are useful for describing the signals in genome sequences. However, the accuracy of previous entropy-segmentation methods for finding borders still needs to be improved. Methods: In this study, we first applied a new recursive entropic segmentation method to DNA sequences to obtain preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results: Compared with previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences. Conclusions: This paper presents a new segmentation method for prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, compared with the A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences. PMID:23282225
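The divergence named in the abstract above can be illustrated with a small Python sketch. It computes the Jensen-Rényi divergence between the symbol compositions of two segments and finds the single cut that maximises it. This is a simplification under stated assumptions: it works on the raw symbols of the input string rather than the 22-symbol alphabet, makes one cut instead of recursing, applies no significance test, and uses an order alpha = 0.5 picked for the example. All function names are illustrative.

```python
import math
from collections import Counter


def renyi_entropy(probs, alpha=0.5):
    """Renyi entropy of order alpha (natural log); alpha must not equal 1."""
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)


def _distribution(seq, symbols):
    """Relative frequency of each symbol in seq, in a fixed symbol order."""
    counts = Counter(seq)
    return [counts[s] / len(seq) for s in symbols]


def jensen_renyi(left, right, alpha=0.5):
    """Length-weighted Jensen-Renyi divergence between two segments."""
    symbols = sorted(set(left) | set(right))
    w_left = len(left) / (len(left) + len(right))
    w_right = 1.0 - w_left
    p_left = _distribution(left, symbols)
    p_right = _distribution(right, symbols)
    mixture = [w_left * a + w_right * b for a, b in zip(p_left, p_right)]
    return (renyi_entropy(mixture, alpha)
            - w_left * renyi_entropy(p_left, alpha)
            - w_right * renyi_entropy(p_right, alpha))


def best_cut(seq, alpha=0.5):
    """Index i maximising the divergence between seq[:i] and seq[i:]."""
    return max(range(1, len(seq)),
               key=lambda i: jensen_renyi(seq[:i], seq[i:], alpha))


if __name__ == "__main__":
    print(best_cut("AAAAAAAACCCCCCCC"))
```

A recursive segmentation would apply `best_cut` again to each resulting segment for as long as the divergence at the proposed cut passes a chosen significance threshold.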
Integrated design, execution, and analysis of arrayed and pooled CRISPR genome-editing experiments.
Canver, Matthew C; Haeussler, Maximilian; Bauer, Daniel E; Orkin, Stuart H; Sanjana, Neville E; Shalem, Ophir; Yuan, Guo-Cheng; Zhang, Feng; Concordet, Jean-Paul; Pinello, Luca
2018-05-01
CRISPR (clustered regularly interspaced short palindromic repeats) genome-editing experiments offer enormous potential for the evaluation of genomic loci using arrayed single guide RNAs (sgRNAs) or pooled sgRNA libraries. Numerous computational tools are available to help design sgRNAs with optimal on-target efficiency and minimal off-target potential. In addition, computational tools have been developed to analyze deep-sequencing data resulting from genome-editing experiments. However, these tools are typically developed in isolation and oftentimes are not readily translatable into laboratory-based experiments. Here, we present a protocol that describes in detail both the computational and benchtop implementation of an arrayed and/or pooled CRISPR genome-editing experiment. This protocol provides instructions for sgRNA design with CRISPOR (computational tool for the design, evaluation, and cloning of sgRNA sequences), experimental implementation, and analysis of the resulting high-throughput sequencing data with CRISPResso (computational tool for analysis of genome-editing outcomes from deep-sequencing data). This protocol allows for design and execution of arrayed and pooled CRISPR experiments in 4-5 weeks by non-experts, as well as computational data analysis that can be performed in 1-2 d by both computational and noncomputational biologists alike using web-based and/or command-line versions.
Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.
Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt
2008-07-01
MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.
Zhang, Pin; Liang, Yanmei; Chang, Shengjiang; Fan, Hailun
2013-08-01
Accurate segmentation of renal tissues in abdominal computed tomography (CT) image sequences is an indispensable step for computer-aided diagnosis and pathology detection in clinical applications. In this study, the goal is to develop a radiology tool to extract renal tissues in CT sequences for the management of renal diagnosis and treatments. In this paper, the authors propose a new graph-cuts-based active contours model with an adaptive width of narrow band for kidney extraction in CT image sequences. Based on graph cuts and contextual continuity, the segmentation is carried out slice-by-slice. In the first stage, the middle two adjacent slices in a CT sequence are segmented interactively based on the graph cuts approach. Subsequently, the deformable contour evolves toward the renal boundaries by the proposed model for the kidney extraction of the remaining slices. In this model, the energy function combining boundary with regional information is optimized in the constructed graph and the adaptive search range is determined by contextual continuity and the object size. In addition, in order to reduce the complexity of the min-cut computation, the nodes in the graph only have n-links for fewer edges. A total of 30 CT image sequences with normal and pathological renal tissues are used to evaluate the accuracy and effectiveness of our method. The experimental results reveal that the average dice similarity coefficient of these image sequences ranges from 92.37% to 95.71% and the corresponding standard deviation for each dataset ranges from 2.18% to 3.87%. In addition, the average automatic segmentation time for one kidney in each slice is about 0.36 s. Integrating the graph-cuts-based active contours model with contextual continuity, the algorithm takes advantage of energy minimization and the characteristics of image sequences. The proposed method achieves effective results for kidney segmentation in CT sequences.
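The dice similarity coefficient used as the accuracy measure above is twice the overlap of two masks divided by the sum of their sizes. A minimal sketch, assuming binary masks stored as equally shaped nested lists of 0/1 (the function name and the convention for two empty masks are illustrative choices, not taken from the paper):

```python
def dice_coefficient(predicted, reference):
    """Dice similarity of two binary masks given as equally shaped 2D lists."""
    overlap = size_pred = size_ref = 0
    for row_pred, row_ref in zip(predicted, reference):
        for p, r in zip(row_pred, row_ref):
            size_pred += bool(p)
            size_ref += bool(r)
            overlap += bool(p and r)
    if size_pred + size_ref == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * overlap / (size_pred + size_ref)
```

Averaging this value over the slices of one CT sequence would give a per-sequence score comparable in kind to the percentages reported above.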
Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J
2015-09-15
Since the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs. Contact: stelo@cs.ucr.edu or timothy.close@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins.
Liu, Yu-Cheng; Yang, Meng-Han; Lin, Win-Li; Huang, Chien-Kang; Oyang, Yen-Jen
2009-12-03
Proteins are dynamic macromolecules which may undergo conformational transitions upon changes in environment. As it has been observed in laboratories that protein flexibility is correlated to essential biological functions, scientists have been designing various types of predictors for identifying structurally flexible regions in proteins. In this respect, there are two major categories of predictors. One category of predictors attempts to identify conformationally flexible regions through analysis of protein tertiary structures. Another category of predictors works completely based on analysis of the polypeptide sequences. As the availability of protein tertiary structures is generally limited, the design of predictors that work completely based on sequence information is crucial for advances of molecular biology research. In this article, we propose a novel approach to design a sequence-based predictor for identifying conformationally ambivalent regions in proteins. The novelty in the design stems from incorporating two classifiers based on two distinctive supervised learning algorithms that provide complementary prediction powers. Experimental results show that the overall performance delivered by the hybrid predictor proposed in this article is superior to the performance delivered by the existing predictors. Furthermore, the case study presented in this article demonstrates that the proposed hybrid predictor is capable of providing the biologists with valuable clues about the functional sites in a protein chain. The proposed hybrid predictor provides the users with two optional modes, namely, the high-sensitivity mode and the high-specificity mode. The experimental results with an independent testing data set show that the proposed hybrid predictor is capable of delivering sensitivity of 0.710 and specificity of 0.608 under the high-sensitivity mode, while delivering sensitivity of 0.451 and specificity of 0.787 under the high-specificity mode. 
Though experimental results show that the hybrid approach designed to exploit the complementary prediction powers of distinctive supervised learning algorithms works more effectively than conventional approaches, there exists a large room for further improvement with respect to the achieved performance. In this respect, it is of interest to investigate the effects of exploiting additional physiochemical properties that are related to conformational ambivalence. Furthermore, it is of interest to investigate the effects of incorporating lately-developed machine learning approaches, e.g. the random forest design and the multi-stage design. As conformational transition plays a key role in carrying out several essential types of biological functions, the design of more advanced predictors for identifying conformationally ambivalent regions in proteins deserves our continuous attention.
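The two operating modes reported above trade sensitivity against specificity. One common way to obtain such a trade-off is to move the decision threshold applied to a classifier score. The sketch below only illustrates that idea; the abstract does not say how the hybrid predictor switches modes, and the function name and inputs are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when a residue is called positive if score >= threshold.

    labels: iterable of 1 (ambivalent) or 0 (not ambivalent); scores: classifier outputs.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Lowering the threshold moves the classifier toward a high-sensitivity mode, and raising it moves toward a high-specificity mode.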
DNA-DNA hybridization values and their relationship to whole-genome sequence similarities.
Goris, Johan; Konstantinidis, Konstantinos T; Klappenbach, Joel A; Coenye, Tom; Vandamme, Peter; Tiedje, James M
2007-01-01
DNA-DNA hybridization (DDH) values have been used by bacterial taxonomists since the 1960s to determine relatedness between strains and are still the most important criterion in the delineation of bacterial species. Since the extent of hybridization between a pair of strains is ultimately governed by their respective genomic sequences, we examined the quantitative relationship between DDH values and genome sequence-derived parameters, such as the average nucleotide identity (ANI) of common genes and the percentage of conserved DNA. A total of 124 DDH values were determined for 28 strains for which genome sequences were available. The strains belong to six important and diverse groups of bacteria for which the intra-group 16S rRNA gene sequence identity was greater than 94 %. The results revealed a close relationship between DDH values and ANI and between DNA-DNA hybridization and the percentage of conserved DNA for each pair of strains. The recommended cut-off point of 70 % DDH for species delineation corresponded to 95 % ANI and 69 % conserved DNA. When the analysis was restricted to the protein-coding portion of the genome, 70 % DDH corresponded to 85 % conserved genes for a pair of strains. These results reveal extensive gene diversity within the current concept of "species". Examination of reciprocal values indicated that the level of experimental error associated with the DDH method is too high to reveal the subtle differences in genome size among the strains sampled. It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available.
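The 95 % ANI cut-off reported above can be applied once an ANI value is available. The sketch below is a simplified illustration: it assumes the shared genes have already been aligned pairwise, skips gap columns, and takes an unweighted mean of per-gene identities, none of which is specified in the abstract. The function names are hypothetical.

```python
ANI_SPECIES_CUTOFF = 95.0  # percent ANI, reported above as corresponding to 70 % DDH


def percent_identity(aligned_a, aligned_b):
    """Percent identical positions of two aligned sequences, ignoring gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def average_nucleotide_identity(aligned_gene_pairs):
    """Unweighted mean identity over pairs of aligned shared genes (an assumption)."""
    identities = [percent_identity(a, b) for a, b in aligned_gene_pairs]
    return sum(identities) / len(identities)


def same_species(ani):
    """Apply the 95 % ANI cut-off for species delineation."""
    return ani >= ANI_SPECIES_CUTOFF
```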
Protein subcellular localization prediction using artificial intelligence technology.
Nair, Rajesh; Rost, Burkhard
2008-01-01
Proteins perform many important tasks in living organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. The plethora of aspects of the role of any particular protein is referred to as its "function." One aspect of protein function that has been the target of intensive research by computational biologists is its subcellular localization. Proteins must be localized in the same subcellular compartment to cooperate toward a common physiological function. Aberrant subcellular localization of proteins can result in several diseases, including kidney stones, cancer, and Alzheimer's disease. To date, sequence homology remains the most widely used method for inferring the function of a protein. However, the application of advanced artificial intelligence (AI)-based techniques in recent years has resulted in significant improvements in our ability to predict the subcellular localization of a protein. The prediction accuracy has risen steadily over the years, in large part due to the application of AI-based methods such as hidden Markov models (HMMs), neural networks (NNs), and support vector machines (SVMs), although the availability of larger experimental datasets has also played a role. Automatic methods that mine textual information from the biological literature and molecular biology databases have considerably sped up the process of annotation for proteins for which some information regarding function is available in the literature. State-of-the-art methods based on NNs and HMMs can predict the presence of N-terminal sorting signals extremely accurately. Ab initio methods that predict subcellular localization for any protein sequence using only the native amino acid sequence and features predicted from the native sequence have shown the most remarkable improvements. The prediction accuracy of these methods has increased by over 30% in the past decade. 
The accuracy of these methods is now on par with high-throughput methods for predicting localization, and they are beginning to play an important role in directing experimental research. In this chapter, we review some of the most important methods for the prediction of subcellular localization.
Du, Q S; Ma, Y; Xie, N Z; Huang, R B
2014-01-01
In the design of peptide inhibitors, the huge possible variety of peptide sequences is of great concern. In step with the fast accumulation of peptide experimental data and databases, a statistical method is suggested for peptide inhibitor design. In the two-level peptide prediction network (2L-QSAR), one level is the physicochemical properties of amino acids and the other level is the peptide sequence position. The activity contributions of amino acids are functions of the physicochemical properties and the sequence positions. In the prediction equation, two weight coefficient sets {ak} and {bl} are assigned to the physicochemical properties and to the sequence positions, respectively. After the two coefficient sets are optimized based on the experimental data of known peptide inhibitors using the iterative double least square (IDLS) procedure, the coefficients are used to evaluate the bioactivities of newly designed peptide inhibitors. The two-level prediction network can be applied to peptide inhibitor design that may aim for different target proteins, or different positions of a protein. A notable advantage of the two-level statistical algorithm is that there is no need for host protein structural information. It may also provide useful insight into the amino acid properties and the roles of sequence positions.
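One plausible reading of the two-level prediction equation is a bilinear form in which the property weights {ak} score each residue and the position weights {bl} scale that score. The sketch below encodes that reading only; the exact equation and the IDLS fitting loop are not given in the abstract, and the property table, weights, and function name are all hypothetical.

```python
def predict_activity(peptide, properties, a, b):
    """Bilinear two-level score: sum over positions l of b[l] * sum over k of a[k] * P[k](residue).

    peptide: string of one-letter residues, len(peptide) == len(b).
    properties: dict mapping residue -> list of property values, one per entry of a.
    """
    if len(peptide) != len(b):
        raise ValueError("one position weight is needed per residue")
    total = 0.0
    for position, residue in enumerate(peptide):
        residue_score = sum(a_k * p for a_k, p in zip(a, properties[residue]))
        total += b[position] * residue_score
    return total


# Hypothetical two-property table (hydropathy-like value, charge) for illustration only.
EXAMPLE_PROPERTIES = {
    "A": [1.8, 0.0],
    "K": [-3.9, 1.0],
    "D": [-3.5, -1.0],
}
```

With this form, fixing {bl} makes the model linear in {ak} and vice versa, which is what would make an alternating ("double") least-squares fit possible.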
Solanki, Prem K; Rabin, Yoed
2018-01-01
This study presents experimental results and an analysis approach for polarized light effects associated with thermomechanical stress during cooling of glass promoting solutions, with applications to cryopreservation and tissue banking in a process known as vitrification. Polarized light means have been previously integrated into the cryomacroscope, a visualization device to detect physical effects associated with cryopreservation success, such as crystallization, fracture formation, and contamination. The experimental study concerns vitrification in a cuvette, which is a rectangular container. Polarized light modeling in the cuvette is based on subdividing the tridimensional (3D) domain into a series of planar (2D) problems, for which a mathematical solution is available in the literature. The current analysis is based on tracking the accumulated changes in light polarization and magnitude, as it passes through the sequence of planar problems. Results of this study show qualitative agreement in light intensity history and distribution between experimental data and simulated results. The simulated results help explain differences between 2D and 3D effects in photoelasticity, most notably, the counterintuitive observation that high stress areas may correlate with low light intensity regions based on the particular experimental conditions. Finally, it is suggested that polarized-light analysis must always be accompanied by thermomechanical stress modeling in order to explain 3D effects. PMID:29912973
deFUME: Dynamic exploration of functional metagenomic sequencing data.
van der Helm, Eric; Geertz-Hansen, Henrik Marcus; Genee, Hans Jasper; Malla, Sailesh; Sommer, Morten Otto Alexander
2015-07-31
Functional metagenomic selections represent a powerful technique that is widely applied for identification of novel genes from complex metagenomic sources. However, whereas hundreds to thousands of clones can be easily generated and sequenced over a few days of experiments, analyzing the data is time consuming and constitutes a major bottleneck for experimental researchers in the field. Here we present the deFUME web server, an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, tailored to meet the requirements of non-bioinformaticians. The web-server integrates multiple analysis steps into one single workflow: read assembly, open reading frame prediction, and annotation with BLAST, InterPro and GO classifiers. Analysis results are visualized in an online dynamic web-interface. The deFUME webserver provides a fast track from raw sequence to a comprehensive visual data overview that facilitates effortless inspection of gene function, clustering and distribution. The webserver is available at cbs.dtu.dk/services/deFUME/ and the source code is distributed at github.com/EvdH0/deFUME.
Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics
Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas
2014-01-01
Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large numbers of newly discovered proteins. The existing classification results are unsatisfactory due to the huge number of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727
Szeinbaum, Nadia; Kellum, Cailin E; Glass, Jennifer B; Janda, J Michael; DiChristina, Thomas J
2018-04-01
Previously, experimental DNA-DNA hybridization (DDH) between Shewanella haliotis JCM 14758T and Shewanella algae JCM 21037T had suggested that the two strains could be considered different species, despite minimal phenotypic differences. The recent isolation of Shewanella sp. MN-01, with 99 % 16S rRNA gene identity to S. algae and S. haliotis, revealed a potential taxonomic problem between these two species. In this study, we reassessed the nomenclature of S. haliotis and S. algae using available whole-genome sequences. The whole-genome sequence of S. haliotis JCM 14758T and ten S. algae strains showed ≥97.7 % average nucleotide identity and >78.9 % digital DDH, clearly above the recommended species thresholds. According to the rules of priority and in view of the results obtained, S. haliotis is to be considered a later heterotypic synonym of S. algae. Because the whole-genome sequence of Shewanella sp. strain MN-01 shares >99 % ANI with S. algae JCM 14758T, it can be confidently identified as S. algae.
Sequence variation of the feline immunodeficiency virus genome and its clinical relevance.
Stickney, A L; Dunowska, M; Cave, N J
2013-06-08
The ongoing evolution of feline immunodeficiency virus (FIV) has resulted in the existence of a diverse continuum of viruses. FIV isolates differ with regard to their mutation and replication rates, plasma viral loads, cell tropism and the ability to induce apoptosis. Clinical disease in FIV-infected cats is also inconsistent. Genomic sequence variation of FIV is likely to be responsible for some of the variation in viral behaviour. The specific genetic sequences that influence these key viral properties remain to be determined. With knowledge of the specific key determinants of pathogenicity, there is the potential for veterinarians in the future to apply this information for prognostic purposes. Genomic sequence variation of FIV also presents an obstacle to effective vaccine development. Most challenge studies demonstrate acceptable efficacy of a dual-subtype FIV vaccine (Fel-O-Vax FIV) against FIV infection under experimental settings; however, vaccine efficacy in the field still remains to be proven. It is important that we discover the key determinants of immunity induced by this vaccine; such data would complement vaccine field efficacy studies and provide the basis to make informed recommendations on its use.
Myohara, Maroko; Niva, Cintia Carla; Lee, Jae Min
2006-08-01
To identify genes specifically activated during annelid regeneration, suppression subtractive hybridization was performed with cDNAs from regenerating and intact Enchytraeus japonensis, a terrestrial oligochaete that can regenerate a complete organism from small body fragments within 4-5 days. Filter array screening subsequently revealed that about 38% of the forward-subtracted cDNA clones contained genes that were upregulated during regeneration. Two hundred seventy-nine of these clones were sequenced and found to contain 165 different sequences (79 known and 86 unknown). Nine clones were fully sequenced and four of these sequences were matched to known genes for glutamine synthetase, glucosidase 1, retinal protein 4, and phosphoribosylaminoimidazole carboxylase, respectively. The remaining five clones encoded an unknown open-reading frame. The expression levels of these genes were highest during blastema formation. Our present results, therefore, demonstrate the great potential of annelids as a new experimental subject for the exploration of unknown genes that play critical roles in animal regeneration.
Bioinformatic prediction and in vivo validation of residue-residue interactions in human proteins
NASA Astrophysics Data System (ADS)
Jordan, Daniel; Davis, Erica; Katsanis, Nicholas; Sunyaev, Shamil
2014-03-01
Identifying residue-residue interactions in protein molecules is important for understanding both protein structure and function in the context of evolutionary dynamics and medical genetics. Such interactions can be difficult to predict using existing empirical or physical potentials, especially when residues are far from each other in sequence space. Using a multiple sequence alignment of 46 diverse vertebrate species we explore the space of allowed sequences for orthologous protein families. Amino acid changes that are known to damage protein function allow us to identify specific changes that are likely to have interacting partners. We fit the parameters of the continuous-time Markov process used in the alignment to conclude that these interactions are primarily pairwise, rather than higher order. Candidates for sites under pairwise epistasis are predicted, which can then be tested by experiment. We report the results of an initial round of in vivo experiments in a zebrafish model that verify the presence of multiple pairwise interactions predicted by our model. These experimentally validated interactions are novel, distant in sequence, and are not readily explained by known biochemical or biophysical features.
Exploiting three kinds of interface propensities to identify protein binding sites.
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2009-08-01
Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. In this study, we present a building block of proteins called order profiles to use the evolutionary information of the protein sequence frequency profiles and apply this building block to produce a class of propensities called order profile interface propensities. For comparison, we revisit the usage of residue interface propensities and binary profile interface propensities for protein binding site prediction. Each kind of propensity, combined with sequence profiles and accessible surface areas, is input into an SVM. When tested on four types of complexes (hetero-permanent complexes, hetero-transient complexes, homo-permanent complexes and homo-transient complexes), experimental results show that the order profile interface propensities are better than residue interface propensities and binary profile interface propensities. Therefore, the order profile is a suitable profile-level building block of protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, the prediction of domain boundaries, the design of knowledge-based potentials and protein remote homology detection.
Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri
2016-01-01
Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue-amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein-DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774
Predicting the binding preference of transcription factors to individual DNA k-mers.
Alleyne, Trevis M; Peña-Castillo, Lourdes; Badis, Gwenael; Talukder, Shaheynoor; Berger, Michael F; Gehrke, Andrew R; Philippakis, Anthony A; Bulyk, Martha L; Morris, Quaid D; Hughes, Timothy R
2009-04-15
Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
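The nearest-neighbour approach mentioned above can be illustrated with a short sketch: a query homeodomain inherits the k-mer preference profile of the characterized protein whose functionally important residues match it best. The similarity measure (fraction of identical residues), the data layout, and all names are assumptions for illustration, not the authors' method.

```python
def residue_identity(residues_a, residues_b):
    """Fraction of identical residues at the chosen functionally important positions."""
    if len(residues_a) != len(residues_b):
        raise ValueError("residue strings must cover the same positions")
    return sum(x == y for x, y in zip(residues_a, residues_b)) / len(residues_a)


def predict_profile(query_residues, characterized):
    """Return (name, profile) of the most similar characterized protein.

    characterized: dict name -> (key_residues, profile), where profile maps
    k-mers to relative binding preferences measured for that protein.
    """
    name, (_, profile) = max(
        characterized.items(),
        key=lambda item: residue_identity(query_residues, item[1][0]),
    )
    return name, profile
```

Ties are broken by dictionary order here; a fuller treatment might average the profiles of equally close neighbours.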
Boosting antibody developability through rational sequence optimization.
Seeliger, Daniel; Schulz, Patrick; Litzenburger, Tobias; Spitz, Julia; Hoerer, Stefan; Blech, Michaela; Enenkel, Barbara; Studts, Joey M; Garidel, Patrick; Karow, Anne R
2015-01-01
The application of monoclonal antibodies as commercial therapeutics poses substantial demands on stability and properties of an antibody. Therapeutic molecules that exhibit favorable properties increase the success rate in development. However, it is not yet fully understood how the protein sequence of an antibody translates into favorable in vitro molecule properties. In this work, computational design strategies based on heuristic sequence analysis were used to systematically modify an antibody that exhibited a tendency to precipitate in vitro. The resulting series of closely related antibodies showed improved stability as assessed by biophysical methods and long-term stability experiments. As a notable observation, expression levels also improved in comparison with the wild-type candidate. The methods employed to optimize the protein sequences, as well as the biophysical data used to determine the effect on stability under conditions commonly used in the formulation of therapeutic proteins, are described. Together, the experimental and computational data led to consistent conclusions regarding the effect of the introduced mutations. Our approach exemplifies how computational methods can be used to guide antibody optimization for increased stability.
Chen, Dana; Orenstein, Yaron; Golodnitsky, Rada; Pellach, Michal; Avrahami, Dorit; Wachtel, Chaim; Ovadia-Shochat, Avital; Shir-Shapira, Hila; Kedmi, Adi; Juven-Gershon, Tamar; Shamir, Ron; Gerber, Doron
2016-01-01
Transcription factors (TFs) alter gene expression in response to changes in the environment through sequence-specific interactions with the DNA. These interactions are best portrayed as a landscape of TF binding affinities. Current methods to study sequence-specific binding preferences suffer from limited dynamic range, sequence bias, lack of specificity and limited throughput. We have developed a microfluidic-based device for SELEX Affinity Landscape MAPping (SELMAP) of TF binding, which allows high-throughput measurement of 16 proteins in parallel. We used it to measure the relative affinities of Pho4, AtERF2 and Btd full-length proteins to millions of different DNA binding sites, and detected both high and low-affinity interactions in equilibrium conditions, generating a comprehensive landscape of the relative TF affinities to all possible DNA 6-mers, and even DNA 10-mers with increased sequencing depth. Low quantities of both the TFs and DNA oligomers were sufficient for obtaining high-quality results, significantly reducing experimental costs. SELMAP allows in-depth screening of hundreds of TFs, and provides a means for better understanding of the regulatory processes that govern gene expression. PMID:27628341
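A landscape of relative preferences over all possible k-mers can be approximated from sequencing reads by comparing k-mer frequencies in the bound fraction with those in the input library. The sketch below shows that generic enrichment calculation; it is not the SELMAP analysis pipeline, and the pseudocount, the single-strand counting, and the function names are assumptions.

```python
from collections import Counter
from itertools import product


def count_kmers(sequences, k):
    """Count every overlapping k-mer on the given strand of each sequence."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def relative_enrichment(bound, library, k, pseudocount=1.0):
    """Ratio of smoothed k-mer frequencies (bound / library) for all 4**k k-mers."""
    bound_counts, library_counts = count_kmers(bound, k), count_kmers(library, k)
    n_bound, n_library = sum(bound_counts.values()), sum(library_counts.values())
    n_kmers = 4 ** k
    scores = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        f_bound = (bound_counts[kmer] + pseudocount) / (n_bound + pseudocount * n_kmers)
        f_library = (library_counts[kmer] + pseudocount) / (n_library + pseudocount * n_kmers)
        scores[kmer] = f_bound / f_library
    return scores
```

For k = 6 this yields 4096 scores per protein; covering 10-mers needs far more reads per k-mer, which is consistent with the remark above about increased sequencing depth.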
NASA Astrophysics Data System (ADS)
Papanicolaou, G. C.; Pappa, E. J.; Portan, D. V.; Kotrotsos, A.; Kollia, E.
2018-02-01
The aim of the present investigation was to study the effect of both the stacking sequence and surface treatment on the thermal conductivity of multilayered hybrid nanocomposites. Four types of multilayered hybrid nanocomposites were manufactured and tested: Nitinol-CNTs (carbon nanotubes)-Acrylic resin; Nitinol-Acrylic resin-CNTs; Surface treated Nitinol-CNTs-Acrylic resin; and Surface treated Nitinol-Acrylic resin-CNTs. Surface treatment of the Nitinol plies was carried out by electrochemical anodization. Surface topography of the anodized Nitinol sheets was investigated through Scanning Electron Microscopy (SEM). It was found that the overall thermal response of the manufactured multilayered nanocomposites was greatly influenced by both the anodization and the stacking sequence. A theoretical model for the prediction of the overall thermal conductivity has been developed considering the nature of the different layers, their stacking sequence, and the interfacial thermal resistance. Thermal conductivity and Differential Scanning Calorimetry (DSC) measurements were conducted to verify the overall thermal conductivities predicted by the model. In all cases, a good agreement between theoretical predictions and experimental results was found.
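The abstract does not give the authors' model, so the sketch below uses the textbook series-resistance approximation for one-dimensional heat flow through a stack of layers as a stand-in. In it, each layer contributes a resistance of thickness divided by conductivity, and each interface adds a contact resistance. The function name, argument layout, and numerical values are assumptions for illustration only.

```python
def effective_conductivity(layers, interface_resistances=()):
    """Through-thickness conductivity of a layered stack (series model).

    layers: list of (thickness_m, conductivity_W_per_mK) tuples.
    interface_resistances: area-specific contact resistances in m^2*K/W,
        one per interface (hypothetical values in this sketch).
    Returns the effective conductivity in W/(m*K).
    """
    if not layers:
        raise ValueError("at least one layer is required")
    total_thickness = sum(thickness for thickness, _ in layers)
    # Area-specific resistance of each layer is thickness / conductivity.
    total_resistance = sum(thickness / k for thickness, k in layers)
    total_resistance += sum(interface_resistances)
    return total_thickness / total_resistance

# Illustrative numbers only: a 1 mm metal ply on a 1 mm polymer ply.
stack = [(1e-3, 10.0), (1e-3, 0.2)]
k_ideal = effective_conductivity(stack)
k_with_contact = effective_conductivity(stack, interface_resistances=[1e-3])
```

In this pure series model the sum of the layer resistances does not depend on layer order. Any stacking-sequence effect would therefore have to enter through order-dependent terms such as the interfacial resistances, which is one plausible reading of why the abstract's model includes them.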
Effect of stacking sequence on mechanical properties neem wood veneer plastic composites
NASA Astrophysics Data System (ADS)
Nagamadhu, M.; Kumar, G. C. Mohan; Jeyaraj, P.
2018-04-01
This study experimentally investigates the effect of wood veneer stacking sequence on the mechanical properties of neem wood polymer composite (WPC). Wood laminated samples were fabricated by the conventional hand layup technique in a mold, cured under pressure at room temperature, and then post-cured at elevated temperature. Initially, tensile, flexural, and impact tests were conducted to understand the effect of the weight fraction of fiber on mechanical properties. The mechanical properties increased with the weight fraction of fiber. Moreover, the stacking sequence of neem wood plays an important role, as it has a significant impact on the mechanical properties. The results indicated that the 0°/0° WPC shows the highest mechanical properties compared to the other sequences (90°/90°, 0°/90°, 45°/90°, 45°/45°). Fourier Transform Infrared Spectroscopy (FTIR) analysis was carried out to identify chemical compounds in both raw neem wood and the neem wood epoxy composite. The microstructure of raw/neat neem wood and the interfacial bonding characteristics of the neem wood composite were investigated using scanning electron microscopy images.
NASA Astrophysics Data System (ADS)
Bonanno, A.; Bozzo, G.; Sapia, P.
2017-11-01
In this work, we present a coherent sequence of experiments on electromagnetic (EM) induction and eddy currents, appropriate for university undergraduate students, based on a magnet falling through a drilled aluminum disk. The sequence, leveraging the didactical interplay between the EM and mechanical aspects of the experiments, allows us to exploit the students’ awareness of mechanics to elicit their comprehension of EM phenomena. The proposed experiments feature two kinds of measurements: (i) kinematic measurements (performed by means of high-speed video analysis) give information on the system’s kinematics and, via appropriate numerical data processing, allow us to get dynamic information, in particular on energy dissipation; (ii) induced electromotive force (EMF) measurements (by using a homemade multi-coil sensor connected to a cheap data acquisition system) allow us to quantitatively determine the inductive effects of the moving magnet on its neighborhood. The comparison between experimental results and the predictions from an appropriate theoretical model (of the dissipative coupling between the moving magnet and the conducting disk) offers many educational hints on relevant topics related to EM induction, such as Maxwell’s displacement current, magnetic field flux variation, and the conceptual link between induced EMF and induced currents. Moreover, the didactical activity gives students the opportunity to be trained in video analysis, data acquisition and numerical data processing.
Jeon, Junhyun; Choi, Jaeyoung; Lee, Gir-Won; Dean, Ralph A; Lee, Yong-Hwan
2013-01-01
Knowledge of mutation processes is central to interpreting genetic analysis data as well as understanding the underlying nature of almost all evolutionary phenomena. However, studies on the genome-wide mutational spectrum and dynamics in fungal pathogens are scarce, hindering our understanding of their evolution and biology. Here, we explored changes in the phenotypes and genome sequences of the rice blast fungus Magnaporthe oryzae during forced in vitro evolution by weekly transfer of cultures on artificial media. Through a combination of experimental evolution with high-throughput sequencing technology, we found that mutations accumulate rapidly prior to visible phenotypic changes and that both genetic drift and selection seem to contribute to shaping the mutational landscape, suggesting a buffering capacity of the fungal genome against mutations. Inference of mutational effects on phenotypes through the use of T-DNA insertion mutants suggested that at least some of the DNA sequence mutations are likely associated with the observed phenotypic changes. Furthermore, our data suggest oxidative damage and UV as major sources of mutation during subcultures. Taken together, our work revealed important properties of the original source of variation in the genome of the rice blast fungus. We believe that these results provide not only insights into the stability of pathogenicity and genome evolution in plant pathogenic fungi but also a model in which the evolution of fungal pathogens in natura can be comparatively investigated.
Evaluation of pleural and pericardial effusions by magnetic resonance imaging.
Tscholakoff, D; Sechtem, U; de Geer, G; Schmidt, H; Higgins, C B
1987-08-01
MR examinations of 36 patients with pleural and/or pericardial effusions were retrospectively evaluated. The purpose of this study was to determine whether MR imaging is capable of differentiating between pleural and pericardial effusions of different compositions using standard electrocardiogram (ECG)-gated and non-gated spin echo pulse sequences. Additional data were obtained from experimental pleural effusions in 10 dogs. The results of this study indicate that old hemorrhages into the pleural or pericardial space can be differentiated from other pleural or pericardial effusions. However, further differentiation between transudates, exudates and sanguineous effusions is not possible on MR images acquired with standard spin echo pulse sequences. Respiratory and cardiac motion are responsible for signal loss, particularly on first echo images. This was documented in experiments in dogs with induced effusions of known composition; "negative" T2 values consistent with fluid motion during imaging sequences were observed in 80% of cases. However, postmortem studies of the dogs with experimental effusions showed differences between effusions with low protein concentrations and higher protein concentrations. We conclude from our study that characterization of pleural and pericardial effusions on standard ECG-gated and non-gated MR examinations is limited to the positive identification of hemorrhage. Motion of the fluid due to cardiac and respiratory activity causes artifactual and unpredictable changes in intensity values, negating the more subtle differences in intensity associated with increasing protein content.
Carl, Michael; Bydder, Graeme M; Du, Jiang
2016-08-01
The long repetition time and inversion time with inversion recovery preparation ultrashort echo time (UTE) often cause prohibitively long scan times. We present an optimized method for long T2 signal suppression in which several k-space spokes are acquired after each inversion preparation. Using Bloch equations, the sequence parameters such as TI and flip angle were optimized to suppress the long T2 water and fat signals and to maximize short T2 contrast. Volunteer imaging was performed on a healthy male volunteer. Inversion recovery preparation was performed using a Silver-Hoult adiabatic inversion pulse together with a three-dimensional (3D) UTE (3D Cones) acquisition. The theoretical signal curves generally agreed with the experimentally measured region of interest curves. The multispoke inversion recovery method showed good muscle and fatty bone marrow suppression, and highlighted short T2 signals such as those from the femoral and tibial cortex. Inversion recovery 3D UTE imaging with multiple spoke acquisitions can be used to effectively suppress long T2 signals and highlight short T2 signals within clinical scan times. Theoretical modeling can be used to determine sequence parameters to optimize long T2 signal suppression and maximize short T2 signals. Experimental results on a volunteer confirmed the theoretical predictions. Magn Reson Med 76:577-582, 2016. © 2015 Wiley Periodicals, Inc.
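For orientation, the simplest version of the parameter choice described above can be written in closed form. The sketch below computes the inversion time that nulls a tissue with a given T1, assuming an ideal inversion and a single readout per repetition, so that Mz(TI) = M0(1 - 2e^(-TI/T1) + e^(-TR/T1)) = 0. This is a textbook approximation, not the multispoke Bloch-equation optimization of the abstract, and the function name and example T1 value are assumptions.

```python
import math

def null_inversion_time(t1_ms, tr_ms=None):
    """Inversion time (ms) at which longitudinal magnetization crosses zero.

    Assumes an ideal 180-degree inversion and one readout per repetition.
    With tr_ms=None the repetition time is treated as much longer than T1,
    giving TI = T1 * ln(2).
    """
    if t1_ms <= 0:
        raise ValueError("T1 must be positive")
    if tr_ms is None:
        return t1_ms * math.log(2.0)
    if tr_ms <= 0:
        raise ValueError("TR must be positive")
    # Solve 1 - 2*exp(-TI/T1) + exp(-TR/T1) = 0 for TI.
    return t1_ms * (math.log(2.0) - math.log1p(math.exp(-tr_ms / t1_ms)))

# Hypothetical tissue with T1 = 1000 ms.
ti_long_tr = null_inversion_time(1000.0)
ti_short_tr = null_inversion_time(1000.0, tr_ms=2000.0)
```

Shortening TR lowers the nulling TI in this model. Once several spokes are acquired per inversion, each excitation further perturbs the recovery curve, which is presumably why the abstract reports optimizing TI and flip angle numerically rather than with a closed-form expression.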
A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies.
Michnick, S W; Shakhnovich, E
1998-01-01
Nucleation-growth theory predicts that fast-folding peptide sequences fold to their native structure via structures in a transition-state ensemble that share a small number of native contacts (the folding nucleus). Experimental and theoretical studies of proteins suggest that residues participating in folding nuclei are conserved among homologs. We attempted to determine if this is true in proteins with highly diverged sequences but identical folds (superfamilies). We describe a strategy based on comparisons of residue conservation in natural superfamily sequences with simulated sequences (generated with a Monte-Carlo sequence design strategy) for the same proteins. The basic assumptions of the strategy were that natural sequences will conserve residues needed for folding and stability plus function, the simulated sequences contain no functional conservation, and nucleus residues make native contacts with each other. Based on these assumptions, we identified seven potential nucleus residues in ubiquitin superfamily members. Non-nucleus conserved residues were also identified; these are proposed to be involved in stabilizing native interactions. We found that all superfamily members conserved the same potential nucleus residue positions, except those for which the structural topology is significantly different. Our results suggest that the conservation of the nucleus of a specific fold can be predicted by comparing designed simulated sequences with natural highly diverged sequences that fold to the same structure. We suggest that such a strategy could be used to help plan protein folding and design experiments, to identify new superfamily members, and to subdivide superfamilies further into classes having a similar folding mechanism.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Chien-Chi; Chain, Patrick S. G.
Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
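Since read quality "often declines as the read is extended", a common quality-control step is to trim low-quality bases from the 3' end. The Python sketch below shows one simple form of that step for Phred+33 encoded FASTQ qualities. It is not FaQCs' algorithm or interface: the function names, the thresholds, and the rule of trimming until the last base reaches the cutoff are assumptions chosen for brevity.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_string]

def trim_3prime(sequence, quality_string, min_quality=20):
    """Remove trailing bases whose quality is below min_quality.

    Returns the trimmed (sequence, quality_string) pair.
    """
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string differ in length")
    scores = phred33_scores(quality_string)
    end = len(scores)
    # Walk back from the 3' end until a base meets the quality cutoff.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]

def filter_reads(records, min_quality=20, min_length=30):
    """Yield trimmed (name, sequence, quality) records that stay long enough.

    records: iterable of (name, sequence, quality_string) tuples.
    """
    for name, sequence, quality in records:
        trimmed_seq, trimmed_qual = trim_3prime(sequence, quality, min_quality)
        if len(trimmed_seq) >= min_length:
            yield name, trimmed_seq, trimmed_qual
```

For example, `trim_3prime("ACGTACGT", "IIIII###")` keeps the first five bases, because `I` decodes to Q40 and `#` to Q2. Production tools typically use more robust criteria, such as sliding windows or running-sum trimming, than this last-base rule.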
Colwell, Rita
2018-05-14
Rita Colwell on "Experimental Reservoirs of Human Pathogens: The Vibrio cholerae paradigm" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
High-pressure structural study of MnF2
Stavrou, Elissaios; Yao, Yansun; Goncharov, Alexander F.; ...
2015-02-01
In this study, manganese fluoride (MnF2) with the tetragonal rutile-type structure has been studied using synchrotron angle-dispersive powder x-ray diffraction and Raman spectroscopy in a diamond anvil cell up to 60 GPa at room temperature combined with first-principles density functional calculations. The experimental data reveal two pressure-induced structural phase transitions with the following sequence: rutile → SrI2 type (3 GPa) → α-PbCl2 type (13 GPa). Complete structural information, including interatomic distances, has been determined in the case of MnF2 including the exact structure of the debated first high-pressure phase. First-principles density functional calculations confirm this phase transition sequence, and the two calculated transition pressures are in excellent agreement with the experiment. Lattice dynamics calculations also reproduce the experimental Raman spectra measured for the ambient and high-pressure phases. The results are discussed in line with the possible practical use of rutile-type fluorides in general and specifically MnF2 as a model compound to reveal the HP structural behavior of rutile-type SiO2 (Stishovite).
A fast bilinear structure from motion algorithm using a video sequence and inertial sensors.
Ramachandran, Mahesh; Veeraraghavan, Ashok; Chellappa, Rama
2011-01-01
In this paper, we study the benefits of the availability of a specific form of additional information: the vertical direction (gravity) and the height of the camera, both of which can be conveniently measured using inertial sensors and a monocular video sequence for 3D urban modeling. We show that in the presence of this information, the SfM equations can be rewritten in a bilinear form. This allows us to derive a fast, robust, and scalable SfM algorithm for large scale applications. The SfM algorithm developed in this paper is experimentally demonstrated to have favorable properties compared to the sparse bundle adjustment algorithm. We provide experimental evidence indicating that the proposed algorithm converges in many cases to solutions with lower error than state-of-the-art implementations of bundle adjustment. We also demonstrate that for the case of large reconstruction problems, the proposed algorithm takes less time to reach its solution compared to bundle adjustment. We also present SfM results using our algorithm on the Google StreetView research data set.
Brizuela, Leonardo; Richardson, Aaron; Marsischky, Gerald; Labaer, Joshua
2002-01-01
Thanks to the results of the multiple completed and ongoing genome sequencing projects and to the newly available recombination-based cloning techniques, it is now possible to build gene repositories with no precedent in their composition, formatting, and potential. This new type of gene repository is necessary to address the challenges imposed by the post-genomic era, i.e., experimentation on a genome-wide scale. We are building the FLEXGene (Full Length EXpression-ready) repository. This unique resource will contain clones representing the complete ORFeome of different organisms, including Homo sapiens as well as several pathogens and model organisms. It will consist of a comprehensive, characterized (sequence-verified), and arrayed gene repository. This resource will allow full exploitation of the genomic information by enabling genome-wide scale experimentation at the level of functional/phenotypic assays as well as at the level of protein expression, purification, and analysis. Here we describe the rationale and construction of this resource and focus on the data obtained from the Saccharomyces cerevisiae project.
Vozárová, Z; Kamencayová, M; Glasa, M; Subr, Z
2013-01-01
Plum pox virus (PPV) isolates of the strain PPV-M prevalently infect peaches under natural conditions in Middle Europe. Complete genome sequences obtained from subisolates of a PPV-M isolate maintained experimentally over a 6-year period in different Prunus host species and passaged in Nicotiana benthamiana were compared with the aim of highlighting the mutations potentially connected with virus-host adaptation. The results showed that the lowest number of non-silent mutations accumulated in PPV-M maintained in peach (the original host species); approximately two times higher diversity was recorded in plum, apricot and N. benthamiana, indicating the genetic determination of the PPV host preference. The sequence variability of Prunus subisolates was distributed more or less evenly along the PPV genome and no amino acid motif could be outlined as responsible for the host adaptation. In N. benthamiana the mutations accumulated notably in the P1 and P3 genes, indicating their non-essentiality in the infection of this experimental host plant.