large sequence-dependent effects: Topics by Science.gov

Sample records for large sequence-dependent effects

Hybrid Pareto artificial bee colony algorithm for multi-objective single machine group scheduling problem with sequence-dependent setup times and learning effects.

PubMed

Yue, Lei; Guan, Zailin; Saif, Ullah; Zhang, Fei; Wang, Hao

2016-01-01

Group scheduling is important for efficient and cost-effective production systems. However, setup times arise between groups, and they must be reduced by sequencing the groups efficiently. This research addresses a sequence-dependent group scheduling problem that aims to minimize the makespan and the total weighted tardiness simultaneously. In most production scheduling problems, the processing time of jobs is assumed to be fixed. However, the actual processing time of jobs may be reduced by the "learning effect". The integration of the sequence-dependent group scheduling problem with learning effects has rarely been considered in the literature. Therefore, this research considers a single machine group scheduling problem with sequence-dependent setup times and learning effects simultaneously. A novel hybrid Pareto artificial bee colony algorithm (HPABC) incorporating some steps of the genetic algorithm is proposed to obtain Pareto solutions for this problem. Furthermore, test problems of five different sizes (small, small medium, medium, large medium, large) are solved using the proposed HPABC. The Taguchi method is used to tune the effective parameters of the proposed HPABC for each problem category. The performance of HPABC is compared with three well-known multi-objective optimization algorithms: the improved strength Pareto evolutionary algorithm (SPEA2), the non-dominated sorting genetic algorithm II (NSGAII) and the particle swarm optimization algorithm (PSO). Results indicate that HPABC outperforms SPEA2, NSGAII and PSO and gives better Pareto optimal solutions in terms of diversity and quality for almost all instances of the different problem sizes.
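As an illustration of the two objectives named in this abstract, the sketch below evaluates one candidate group sequence. It is not the authors' HPABC; it only scores a given ordering. The position-based learning model (actual time = base time × position^a with a ≤ 0), the data layout, and the use of setup[g][g] as the initial setup are assumptions made for the example and do not come from the record.

```python
from typing import Dict, List, Sequence, Tuple


def evaluate_sequence(
    group_order: Sequence[int],
    jobs: Dict[int, List[Tuple[float, float, float]]],
    setup: List[List[float]],
    learning_index: float = -0.2,
) -> Tuple[float, float]:
    """Return (makespan, total weighted tardiness) for one group sequence.

    jobs[g] lists (base_time, due_date, weight) in processing order.
    setup[a][b] is the setup time when group b follows group a.
    setup[g][g] is used as the initial setup of the first group (assumption).
    """
    clock = 0.0
    tardiness = 0.0
    position = 0  # position of the job in the overall sequence
    prev = None
    for g in group_order:
        # Sequence-dependent setup: depends on the previous group.
        clock += setup[g][g] if prev is None else setup[prev][g]
        for base, due, weight in jobs[g]:
            position += 1
            # Assumed learning effect: later positions run faster.
            clock += base * position ** learning_index
            tardiness += weight * max(0.0, clock - due)
        prev = g
    return clock, tardiness
```

A multi-objective search such as the one described would call a function like this for every candidate ordering and keep the non-dominated (makespan, tardiness) pairs.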
Toxic plants: Effects on reproduction and fetal and embryonic development in livestock

USDA-ARS's Scientific Manuscript database

Reproductive success is dependent on a large number of carefully orchestrated biological events that must occur in a specifically timed sequence. The interference with one or more of these sequences or events may result in total reproductive failure or a more subtle reduction in reproductive potent...
Correlation of Local Effects of DNA Sequence and Position of Beta-Alanine Inserts with Polyamide-DNA Complex Binding Affinities and Kinetics

PubMed Central

Wang, Shuo; Nanjunda, Rupesh; Aston, Karl; Bashkin, James K.; Wilson, W. David

2012-01-01

In order to better understand the effects of β-alanine (β) substitution and the number of heterocycles on DNA binding affinity and selectivity, the interactions of an eight-ring hairpin polyamide (PA) and two β derivatives as well as a six-heterocycle analog have been investigated with their cognate DNA sequence, 5′-TGGCTT-3′. Binding selectivity and the effects of β have been investigated with the cognate and five mutant DNAs. A set of powerful and complementary methods have been employed for both energetic and structural evaluations: UV-melting, biosensor-surface plasmon resonance, isothermal titration calorimetry, circular dichroism and a DNA ligation ladder global structure assay. The reduced number of heterocycles in the six-ring PA weakens the binding affinity; however, the smaller PA aggregates significantly less than the larger PAs, and allows us to obtain the binding thermodynamics. The PA-DNA binding enthalpy is large and negative with a large negative ΔCp, and is the primary driving component of the Gibbs free energy. The complete SPR binding results clearly show that β substitutions can substantially weaken the binding affinity of hairpin PAs in a position-dependent manner. More importantly, the changes in PA binding to the mutant DNAs further confirm the position-dependent effects on PA-DNA interaction affinity. Comparison of mutant DNA sequences also shows a different effect in recognition of T•A versus A•T base pairs. The effects of DNA mutations on binding of a single PA as well as the effects of the position of β substitution on binding tell a clear and very important story about sequence dependent binding of PAs to DNA. PMID:23167504
Thermoelectric effect and its dependence on molecular length and sequence in single DNA molecules.

PubMed

Li, Yueqi; Xiang, Limin; Palma, Julio L; Asai, Yoshihiro; Tao, Nongjian

2016-04-15

Studying the thermoelectric effect in DNA is important for unravelling charge transport mechanisms and for developing relevant applications of DNA molecules. Here we report a study of the thermoelectric effect in single DNA molecules. By varying the molecular length and sequence, we tune the charge transport in DNA to either a hopping- or a tunnelling-dominated regime. The thermoelectric effect is small and insensitive to the molecular length in the hopping regime. In contrast, the thermoelectric effect is large and sensitive to the length in the tunnelling regime. These findings indicate that one may control the thermoelectric effect in DNA by varying its sequence and length. We describe the experimental results in terms of hopping and tunnelling charge transport models.
Thermoelectric effect and its dependence on molecular length and sequence in single DNA molecules

PubMed Central

Li, Yueqi; Xiang, Limin; Palma, Julio L.; Asai, Yoshihiro; Tao, Nongjian

2016-01-01

Studying the thermoelectric effect in DNA is important for unravelling charge transport mechanisms and for developing relevant applications of DNA molecules. Here we report a study of the thermoelectric effect in single DNA molecules. By varying the molecular length and sequence, we tune the charge transport in DNA to either a hopping- or a tunnelling-dominated regime. The thermoelectric effect is small and insensitive to the molecular length in the hopping regime. In contrast, the thermoelectric effect is large and sensitive to the length in the tunnelling regime. These findings indicate that one may control the thermoelectric effect in DNA by varying its sequence and length. We describe the experimental results in terms of hopping and tunnelling charge transport models. PMID:27079152
A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM).

PubMed

Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi

2013-11-20

With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition on one map. By modifying the conventional SOM, we developed batch-learning SOM (BLSOM), which allowed classification of sequence fragments (e.g., 1 kb) according to phylotypes, solely depending on oligonucleotide composition. Metagenomics studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in life sciences. BLSOM is most suitable for phylogenetic assignment of metagenomic sequences, because fragmental sequences can be clustered according to phylotypes, solely depending on oligonucleotide composition. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species, and by mapping metagenomic sequences on these large-scale BLSOMs, we can predict phylotypes of individual metagenomic sequences, revealing a microbial community structure of uncultured microorganisms, including viruses. BLSOM has shown that influenza viruses isolated from humans and birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes of virus sequences and for surveilling potentially hazardous strains when introduced into humans from non-human sources.
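The oligonucleotide composition that this abstract uses as the sole input for clustering can be sketched as a normalized k-mer frequency vector. This is only the feature-extraction step, not BLSOM itself; the function name, the default k = 4, and the choice to skip windows containing non-ACGT symbols are assumptions for illustration.

```python
from collections import Counter
from itertools import product


def oligo_composition(seq: str, k: int = 4) -> list:
    """Return normalized k-mer frequencies in fixed lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only pure A/C/G/T windows contribute (windows with N are ignored).
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]
```

Each sequence fragment (for example 1 kb) would yield one such vector of length 4^k, and those vectors are what a self-organizing map would then cluster.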
Utilization of RNA polymerase I promoter and terminator sequences to develop a DNA transfection system for the study of hepatitis C virus internal ribosomal entry site-dependent translation.

PubMed

Oem, Jae-Ku; Xiang, Zhonghua; Zhou, Yan; Babiuk, Lorne A; Liu, Qiang

2007-09-01

Hepatitis C virus (HCV) causes severe liver diseases in a large population worldwide. HCV protein translation is controlled by an internal ribosomal entry site (IRES) within the 5'-untranslated region (UTR). HCV IRES-dependent translation is critical for HCV-associated pathogenesis. The objective was to develop a plasmid DNA transfection system by using RNA polymerase I promoter and terminator sequences for studying HCV IRES-dependent translation. A gene cassette containing HCV 5'-UTR, Renilla luciferase reporter gene, and HCV 3'-UTR was inserted between RNA polymerase I promoter and terminator sequences. HCV IRES-directed translation was determined by luciferase assay after transfection. Transfection of the RNA polymerase I-HCV IRES plasmid into human hepatoma Huh-7 and HepG2 cells resulted in luciferase gene expression. Deletion of the IIIf domain in HCV IRES dramatically reduced luciferase activity. Our results indicated that the plasmid vector system based on RNA polymerase I promoter and terminator sequences represents an effective approach for the study of HCV IRES-dependent translation.
A generic, cost-effective, and scalable cell lineage analysis platform

PubMed Central

Biezuner, Tamir; Spiro, Adam; Raz, Ofir; Amir, Shiran; Milo, Lilach; Adar, Rivka; Chapal-Ilani, Noa; Berman, Veronika; Fried, Yael; Ainbinder, Elena; Cohen, Galit; Barr, Haim M.; Halaban, Ruth; Shapiro, Ehud

2016-01-01

Advances in single-cell genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells. Current sequencing-based methods for cell lineage analysis depend on low-resolution bulk analysis or rely on extensive single-cell sequencing, which is not scalable and could be biased by functional dependencies. Here we show an integrated biochemical-computational platform for generic single-cell lineage analysis that is retrospective, cost-effective, and scalable. It consists of a biochemical-computational pipeline that inputs individual cells, produces targeted single-cell sequencing data, and uses it to generate a lineage tree of the input cells. We validated the platform by applying it to cells sampled from an ex vivo grown tree and analyzed its feasibility landscape by computer simulations. We conclude that the platform may serve as a generic tool for lineage analysis and thus pave the way toward large-scale human cell lineage discovery. PMID:27558250
A sequence-dependent rigid-base model of DNA

NASA Astrophysics Data System (ADS)

Gonzalez, O.; Petkevičiutė, D.; Maddocks, J. H.

2013-02-01

A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. 
For example, due to the presence of frustration, the model can successfully predict the nonlocal changes in the minimum energy configuration of an oligomer that are consequent upon a local change of sequence at the level of a single point mutation.
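The Kullback-Leibler divergence mentioned in this abstract has a closed form when both densities are Gaussian. The sketch below shows the univariate case only, as a simplified illustration; the models in the record are multivariate, where the same comparison involves mean vectors and covariance matrices. The function name and arguments are hypothetical.

```python
from math import log


def kl_gaussian_1d(mu_p: float, sigma_p: float,
                   mu_q: float, sigma_q: float) -> float:
    """KL(p || q) in nats for univariate Gaussians p and q."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )
```

The divergence is zero only when the two densities coincide and is not symmetric in its arguments, so the order in which a reference density and an approximating model are compared matters.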
A sequence-dependent rigid-base model of DNA.

PubMed

Gonzalez, O; Petkevičiūtė, D; Maddocks, J H

2013-02-07

A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. 
For example, due to the presence of frustration, the model can successfully predict the nonlocal changes in the minimum energy configuration of an oligomer that are consequent upon a local change of sequence at the level of a single point mutation.
Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling.

PubMed

Tao, Ran; Zeng, Donglin; Franceschini, Nora; North, Kari E; Boerwinkle, Eric; Lin, Dan-Yu

2015-06-01

High-throughput DNA sequencing allows for the genotyping of common and rare variants for genetic association studies. At the present time and for the foreseeable future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.
Heterogeneous Suppression of Sequential Effects in Random Sequence Generation, but Not in Operant Learning.

PubMed

Shteingart, Hanan; Loewenstein, Yonatan

2016-01-01

There is a long history of experiments in which participants are instructed to generate a long sequence of binary random numbers. The scope of this line of research has shifted over the years from identifying the basic psychological principles and/or the heuristics that lead to deviations from randomness, to one of predicting future choices. In this paper, we used generalized linear regression and the framework of Reinforcement Learning in order to address both points. In particular, we used logistic regression analysis in order to characterize the temporal sequence of participants' choices. Surprisingly, a population analysis indicated that the contribution of the most recent trial has only a weak effect on behavior, compared to earlier trials, a result that seems irreconcilable with standard sequential effects that decay monotonically with the delay. However, when considering each participant separately, we found that the magnitudes of the sequential effect are a monotonically decreasing function of the delay, yet these individual sequential effects are largely averaged out in a population analysis because of heterogeneity. The substantial behavioral heterogeneity in this task is further demonstrated quantitatively by considering the predictive power of the model. We show that a heterogeneous model of sequential dependencies captures the structure available in random sequence generation. Finally, we show that the results of the logistic regression analysis can be interpreted in the framework of reinforcement learning, allowing us to compare the sequential effects in the random sequence generation task to those in an operant learning task. We show that in contrast to the random sequence generation task, sequential effects in operant learning are far more homogeneous across the population. These results suggest that in the random sequence generation task, different participants adopt different cognitive strategies to suppress sequential dependencies when generating the "random" sequences.
Large-Scale Genomic Analysis of Codon Usage in Dengue Virus and Evaluation of Its Phylogenetic Dependence

PubMed Central

Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro

2014-01-01

The increasing number of dengue virus (DENV) genome sequences available allows identifying the contributing factors to DENV evolution. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3) as well as the effective number of codons (ENC, ENCp) versus GC3 plots revealed mutational bias and purifying selection pressures as the major forces influencing the codon usage, but with distinct pressure on specific nucleotide positions in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed similar clustering patterns to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the virus geographic origin. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution. PMID:25136631
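The position-specific GC measures named in this abstract (GC1, GC2, GC3) can be computed from an in-frame coding sequence as sketched below. The function name and the handling of a trailing partial codon (it is ignored) are assumptions for the example; the study's actual pipeline is not described in the record.

```python
def gc_by_codon_position(cds: str) -> tuple:
    """Return (GC1, GC2, GC3) fractions for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    fractions = []
    for offset in range(3):
        # Every third base starting at this codon position.
        bases = cds[offset:usable:3]
        gc = sum(b in "GC" for b in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)
```

For the two codons ATG and GCC, for instance, positions 1 and 2 each hold one G or C out of two bases, while position 3 holds G and C, so GC3 is 1.0.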
Reducing DNA context dependence in bacterial promoters

PubMed Central

Carr, Swati B.; Densmore, Douglas M.

2017-01-01

Variation in the DNA sequence upstream of bacterial promoters is known to affect the expression levels of the products they regulate, sometimes dramatically. While neutral synthetic insulator sequences have been found to buffer promoters from upstream DNA context, there are no established methods for designing effective insulator sequences with predictable effects on expression levels. We address this problem with Degenerate Insulation Screening (DIS), a novel method based on a randomized 36-nucleotide insulator library and a simple, high-throughput, flow-cytometry-based screen that randomly samples from a library of 4^36 potential insulated promoters. The results of this screen can then be compared against a reference uninsulated device to select a set of insulated promoters providing a precise level of expression. We verify this method by insulating the constitutive, inducible, and repressible promoters of a four transcriptional-unit inverter (NOT-gate) circuit, finding both that order dependence is largely eliminated by insulation and that circuit performance is also significantly improved, with a 5.8-fold mean improvement in on/off ratio. PMID:28422998
Effect of Temporal Pattern of Radiation in Intensity Modulated Radiotherapy on Cell Cycle Progression and Apoptosis of ACHN Renal Cell Carcinoma Cell Line.

PubMed

Khorramizadeh, Maryam; Saberi, Alihossein; Tahmasebi-Birgani, Mohammadjavad; Shokrani, Parvaneh; Amouhedari, Alireza

The existence of a hypersensitive radiation response to doses below 1 Gy is well established for many normal and tumor cell lines. The aim of this study was to ascertain the impact of temporal patterns modeling IMRT on survival, cell cycle and apoptosis of the human RCC cell line ACHN, so as to provide a radiobiological basis for optimizing IMRT plans for this disease. The ACHN renal cell carcinoma cell line was used in this study. The impact of the triangle, V, small-large or large-small temporal patterns, in the presence and absence of a threshold dose of hyper-radiosensitivity at the beginning of the patterns, was studied using soft agar clonogenic assays. Cell cycle and apoptosis analyses were performed after irradiation with the temporal patterns. For triangle and small-large dose sequences, survival fraction was significantly reduced after irradiation with or without a threshold dose of hyper-radiosensitivity at the beginning of the patterns. In all of the dose patterns, cell cycle distributions and the percentage of apoptotic cells at 24 h after irradiation with or without a priming dose of hyper-radiosensitivity showed no significant difference. However, apoptotic cells were increased when beams with the smallest dose were applied at the beginning of the dose pattern, as in the triangle and small-large dose sequences. These data show that the biologic effects of a single fraction may differ in clinical settings depending on the size and sequence of the partial fractions. Doses at the beginning but not at the end of sequences may change the cytotoxic effects of radiation.
DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation

PubMed Central

Boedicker, James Q.; Garcia, Hernan G.; Johnson, Stephanie; Phillips, Rob

2014-01-01

As the chief informational molecule of life, DNA is subject to extensive physical manipulations. The energy required to deform double-helical DNA depends on sequence, and this mechanical code of DNA influences gene regulation, such as through nucleosome positioning. Here we examine the sequence-dependent flexibility of DNA in bacterial transcription factor-mediated looping, a context for which the role of sequence remains poorly understood. Using a suite of synthetic constructs repressed by the Lac repressor and two well-known sequences that show large flexibility differences in vitro, we make precise statistical mechanical predictions as to how DNA sequence influences loop formation and test these predictions using in vivo transcription and in vitro single-molecule assays. Surprisingly, sequence-dependent flexibility does not affect in vivo gene regulation. By theoretically and experimentally quantifying the relative contributions of sequence and the DNA-bending protein HU to DNA mechanical properties, we reveal that bending by HU dominates DNA mechanics and masks intrinsic sequence-dependent flexibility. Such a quantitative understanding of how mechanical regulatory information is encoded in the genome will be a key step towards a predictive understanding of gene regulation at single-base pair resolution. PMID:24231252
Generating intrinsically disordered protein conformational ensembles from a Markov chain

NASA Astrophysics Data System (ADS)

Cukier, Robert I.

2018-03-01

Intrinsically disordered proteins (IDPs) sample a diverse conformational space. They are important to signaling and regulatory pathways in cells. An entropy penalty must be paid when an IDP becomes ordered upon interaction with another protein or a ligand. Thus, the degree of conformational disorder of an IDP is of interest. We create a dichotomic Markov model that can explore entropic features of an IDP. The Markov condition introduces local (neighbor residues in a protein sequence) rotamer dependences that arise from van der Waals and other chemical constraints. A protein sequence of length N is characterized by its (information) entropy and mutual information, MI_MC, the latter providing a measure of the dependence among the random variables describing the rotamer probabilities of the residues that comprise the sequence. For a Markov chain, the MI_MC is proportional to the pair mutual information MI, which depends on the singlet and pair probabilities of neighbor residue rotamer sampling. All 2^N sequence states are generated, along with their probabilities, and contrasted with the probabilities under the assumption of independent residues. An efficient method to generate realizations of the chain is also provided. The chain entropy, MI_MC, and state probabilities provide the ingredients to distinguish different scenarios using the terminologies: MoRF (molecular recognition feature), not-MoRF, and not-IDP. A MoRF corresponds to large entropy and large MI_MC (strong dependence among the residues' rotamer sampling), a not-MoRF corresponds to large entropy but small MI_MC, and not-IDP corresponds to low entropy irrespective of the MI_MC. We show that MoRFs are most appropriate as descriptors of IDPs. They provide a reasonable number of high-population states that reflect the dependences between neighbor residues, thus classifying them as IDPs, yet without very large entropy that might lead to a too high entropy penalty.
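The enumeration of all sequence states for a two-state (dichotomic) chain can be sketched as follows. This is a minimal illustration assuming a homogeneous first-order Markov chain with a hypothetical initial distribution and transition matrix; the parameter values used in the paper are not given in the record, and the function names are invented for the example.

```python
from itertools import product
from math import log2


def chain_state_probabilities(n, initial, transition):
    """Map each of the 2**n binary states to its Markov-chain probability.

    initial[s] is the probability that the first residue is in state s;
    transition[a][b] is the probability of state b given neighbor state a.
    Requires n >= 1.
    """
    probs = {}
    for state in product((0, 1), repeat=n):
        p = initial[state[0]]
        # Multiply in one transition probability per neighboring pair.
        for a, b in zip(state, state[1:]):
            p *= transition[a][b]
        probs[state] = p
    return probs


def entropy_bits(probs):
    """Shannon entropy (in bits) of a state-probability mapping."""
    return -sum(p * log2(p) for p in probs.values() if p > 0)
```

With independent, equiprobable residues the entropy is N bits; neighbor dependence concentrates probability on fewer states and lowers the entropy, which is the kind of contrast the abstract draws between the chain and the independent-residue assumption.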
Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes

PubMed Central

Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney

2012-01-01

RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amenable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
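The counting rule described in this abstract, where abundance is the number of unique barcode pairs rather than the number of reads, can be sketched as below. The input layout (tuples of cDNA identifier, left barcode, right barcode) and the function name are assumptions for illustration, and the error-tolerant barcode decoding mentioned in the abstract is deliberately left out.

```python
from collections import defaultdict


def digital_counts(reads):
    """Count unique barcode pairs per cDNA sequence.

    reads: iterable of (cdna_id, left_barcode, right_barcode) tuples.
    PCR duplicates share a barcode pair and are therefore counted once.
    """
    seen = defaultdict(set)
    for cdna, left, right in reads:
        seen[cdna].add((left, right))
    return {cdna: len(pairs) for cdna, pairs in seen.items()}
```

Five reads carrying the same barcode pair contribute a count of one, so uneven amplification of a molecule does not inflate its estimated abundance.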
The zero age main sequence of WIMP burners

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fairbairn, Malcolm; Scott, Pat; Edsjoe, Joakim

2008-02-15

We modify a stellar structure code to estimate the effect upon the main sequence of the accretion of weakly-interacting dark matter onto stars and its subsequent annihilation. The effect upon the stars depends upon whether the energy generation rate from dark matter annihilation is large enough to shut off the nuclear burning in the star. Main sequence weakly-interacting massive particles (WIMP) burners look much like proto-stars moving on the Hayashi track, although they are in principle completely stable. We make some brief comments about where such stars could be found, how they might be observed and more detailed simulations which are currently in progress. Finally we comment on whether or not it is possible to link the paradoxically hot, young stars found at the galactic center with WIMP burners.
Sequence-dependent DNA deformability studied using molecular dynamics simulations.

PubMed

Fujii, Satoshi; Kono, Hidetoshi; Takenaka, Shigeori; Go, Nobuhiro; Sarai, Akinori

2007-01-01

Proteins recognize specific DNA sequences not only through direct contact between amino acids and bases, but also indirectly based on the sequence-dependent conformation and deformability of the DNA (indirect readout). We used molecular dynamics simulations to analyze the sequence-dependent DNA conformations of all 136 possible tetrameric sequences sandwiched between CGCG sequences. The deformability of dimeric steps obtained from the simulations is consistent with that derived from crystal structures. The simulation results further showed that the conformation and deformability of the tetramers can depend strongly on the flanking base pairs. The conformations of xATx tetramers show the most rigidity and are not affected by the flanking base pairs, whereas the xYRx tetramers show the greatest flexibility and change their conformations depending on the base pairs at both ends, suggesting that tetramers with the same central dimer can show different deformabilities. These results suggest that analysis of dimeric steps alone may overlook some conformational features of DNA and provide insight into the mechanism of indirect readout during protein-DNA recognition. Moreover, the sequence dependence of DNA conformation and deformability may be used to estimate the contribution of indirect readout to the specificity of protein-DNA recognition as well as nucleosome positioning and large-scale behavior of nucleic acids.
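The figure of 136 tetramers follows from treating a sequence and its reverse complement as the same double-stranded step: of the 256 possible tetramers, 16 are their own reverse complement, giving (256 + 16) / 2 = 136. The sketch below checks this count by enumeration; the function name is hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def unique_duplex_kmers(k: int) -> set:
    """Return one canonical representative per k-mer/reverse-complement pair."""
    canonical = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        revcomp = kmer.translate(COMPLEMENT)[::-1]
        # Keep the lexicographically smaller of the two strands.
        canonical.add(min(kmer, revcomp))
    return canonical
```

Calling the function with k = 4 yields the 136 unique tetramers, and k = 2 yields the 10 unique dimeric steps.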

Increased Sensitivity of Diagnostic Mutation Detection by Re-analysis Incorporating Local Reassembly of Sequence Reads.

PubMed

Watson, Christopher M; Camm, Nick; Crinnion, Laura A; Clokie, Samuel; Robinson, Rachel L; Adlard, Julian; Charlton, Ruth; Markham, Alexander F; Carr, Ian M; Bonthron, David T

2017-12-01

Diagnostic genetic testing programmes based on next-generation DNA sequencing have resulted in the accrual of large datasets of targeted raw sequence data. Most diagnostic laboratories process these data through an automated variant-calling pipeline. Validation of the chosen analytical methods typically depends on confirming the detection of known sequence variants. Despite improvements in short-read alignment methods, current pipelines are known to be comparatively poor at detecting large insertion/deletion mutations. We performed clinical validation of a local reassembly tool, ABRA (assembly-based realigner), through retrospective reanalysis of a cohort of more than 2000 hereditary cancer cases. ABRA enabled detection of a 96-bp deletion, 4-bp insertion mutation in PMS2 that had been initially identified using a comparative read-depth approach. We applied an updated pipeline incorporating ABRA to the entire cohort of 2000 cases and identified one previously undetected pathogenic variant, a 23-bp duplication in PTEN. We demonstrate the effect of read length on the ability to detect insertion/deletion variants by comparing HiSeq2500 (2 × 101-bp) and NextSeq500 (2 × 151-bp) sequence data for a range of variants and thereby show that the limitations of shorter read lengths can be mitigated using appropriate informatics tools. This work highlights the need for ongoing development of diagnostic pipelines to maximize test sensitivity. We also draw attention to the large differences in computational infrastructure required to perform day-to-day versus large-scale reprocessing tasks.
Rhesus monkeys lack a consistent peak-end effect.

PubMed

Xu, Eric R; Knight, Emily J; Kralik, Jerald D

2011-12-01

In humans, the order of receiving sequential rewards can significantly influence the overall subjective utility of an outcome. For example, people subjectively rate receiving a large reward by itself significantly higher than receiving the same large reward followed by a smaller one (Do, Rupert, & Wolford, 2008). This result is called the peak-end effect. A comparative analysis of order effects can help determine the generality of such effects across primates, and we therefore examined the influence of reward-quality order on decision making in three rhesus macaque monkeys (Macaca mulatta). When given the choice between a high-low reward sequence and a low-high sequence, all three monkeys preferred receiving the high-value reward first. Follow-up experiments showed that for two of the three monkeys their choices depended specifically on reward-quality order and could not be accounted for by delay discounting. These results provide evidence for the influence of outcome order on decision making in rhesus monkeys. Unlike humans, who usually discount choices when a low-value reward comes last, rhesus monkeys show no such peak-end effect.
Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity

PubMed Central

Elias-Kirma, Shani; Nir, Ronit; Segal, Eran

2017-01-01

Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However due to the small number of IRESs known, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in human and viruses, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured IRES activity of synthetically designed sequences to confirm our prediction of increasing activity as a function of the number of short IRES elements. PMID:28922394
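
As a rough illustration of what k-mer features that capture "sequence, number and position" might look like, the sketch below tabulates, for every k-mer in a sequence, its count and its mean relative position. This is an assumed feature layout for illustration; the function name and the normalization are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def kmer_features(seq: str, k: int) -> dict[str, dict[str, float]]:
    """Count each k-mer and record its mean relative start position.

    Relative position is the start index divided by the last possible
    start index, so 0.0 is the 5' end and 1.0 is the 3' end.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    last_start = max(len(seq) - k, 1)  # avoid division by zero
    return {
        kmer: {
            "count": len(starts),
            "mean_rel_pos": sum(starts) / len(starts) / last_start,
        }
        for kmer, starts in positions.items()
    }


# Toy RNA sequence (made up) with a poly-U element at both ends.
features = kmer_features("UUUUCAUUUU", k=4)
print(features["UUUU"])  # {'count': 2, 'mean_rel_pos': 0.5}
```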
Engineering Environmentally-Stable Proteases to Specifically Neutralize Protein Toxins

DTIC Science & Technology

2013-10-01

acids. These sites constitute a variable environment, with the effect of mutations largely isolated to effects on interactions with the P4 side chain. 2...desires to cut. We observe, however, sequence-specific cleavage is much more subtle, depending upon how side chain interactions influence not only...first five substrate amino acids on the acyl side of the scissile bond (denoted P1 through P5, numbering from the scissile bond toward the N-terminus
Engineering Environmentally-Stable Proteases to Specifically Neutralize Protein Toxins

DTIC Science & Technology

2012-10-14

effect of mutations largely isolated to effects on interactions with the P4 side chain. 2) Most mutations at some sites (e.g. 126, 128) decrease...to cut. We observe, however, sequence-specific cleavage is much more subtle, depending upon how side chain interactions influence not only ground...five substrate amino acids on the acyl side of the scissile bond (denoted P1 through P5, numbering from the scissile bond toward the N-terminus of the
The Release 6 reference sequence of the Drosophila melanogaster genome

DOE PAGES

Hoskins, Roger A.; Carlson, Joseph W.; Wan, Kenneth H.; ...

2015-01-14

Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. In conclusion, further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.
Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

PubMed

Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

2012-01-01

RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.
Power law tails in phylogenetic systems.

PubMed

Qin, Chongli; Colwell, Lucy J

2018-01-23

Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence-distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
Sequence-specific epigenetic effects of the maternal somatic genome on developmental rearrangements of the zygotic genome in Paramecium primaurelia.

PubMed Central

Meyer, E; Butler, A; Dubrana, K; Duharcourt, S; Caron, F

1997-01-01

In ciliates, the germ line genome is extensively rearranged during the development of the somatic macronucleus from a mitotic product of the zygotic nucleus. Germ line chromosomes are fragmented in specific regions, and a large number of internal sequence elements are eliminated. It was previously shown that transformation of the vegetative macronucleus of Paramecium primaurelia with a plasmid containing a subtelomeric surface antigen gene can affect the processing of the homologous germ line genomic region during development of a new macronucleus in sexual progeny of transformed clones. The gene and telomere-proximal flanking sequences are deleted from the new macronuclear genome, although the germ line genome remains wild type. Here we show that plasmids containing nonoverlapping segments of the same genomic region are able to induce similar terminal deletions; the locations of deletion end points depend on the particular sequence used. Transformation of the maternal macronucleus with a sequence internal to a macronuclear chromosome also causes the occurrence of internal deletions between short direct repeats composed of alternating thymines and adenines. The epigenetic influence of maternal macronuclear sequences on developmental rearrangements of the zygotic genome thus appears to be both sequence specific and general, suggesting that this trans-nucleus effect is mediated by pairing of homologous sequences. PMID:9199294
DNA Sequence-Dependent Ionic Currents in Ultra-Small Solid-State Nanopores

PubMed Central

Comer, Jeffrey

2016-01-01

Measurements of ionic currents through nanopores partially blocked by DNA have emerged as a powerful method for characterization of the DNA nucleotide sequence. Although the effect of the nucleotide sequence on the nanopore blockade current has been experimentally demonstrated, prediction and interpretation of such measurements remain a formidable challenge. Using atomic resolution computational approaches, here we show how the sequence, molecular conformation, and pore geometry affect the blockade ionic current in model solid-state nanopores. We demonstrate that the blockade current from a DNA molecule is determined by the chemical identities and conformations of at least three consecutive nucleotides. We find the blockade currents produced by the nucleotide triplets to vary considerably with their nucleotide sequence despite having nearly identical molecular conformations. Encouragingly, we find blockade current differences as large as 25% for single-base substitutions in ultra small (1.6 nm × 1.1 nm cross section; 2 nm length) solid-state nanopores. Despite the complex dependence of the blockade current on the sequence and conformation of the DNA triplets, we find that, under many conditions, the number of thymine bases is positively correlated with the current, whereas the number of purine bases and the presence of both purine and pyrimidines in the triplet are negatively correlated with the current. Based on these observations, we construct a simple theoretical model that relates the ion current to the base content of a solid-state nanopore. Furthermore, we show that compact conformations of DNA in narrow pores provide the greatest signal-to-noise ratio for single base detection, whereas reduction of the nanopore length increases the ionic current noise. Thus, the sequence dependence of nanopore blockade current can be theoretically rationalized, although the predictions will likely need to be customized for each nanopore type. PMID:27103233
The sequence and de novo assembly of the giant panda genome

PubMed Central

Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

2013-01-01

Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809
A Nonparametric Approach For Representing Interannual Dependence In Monthly Streamflow Sequences

NASA Astrophysics Data System (ADS)

Sharma, A.; Oneill, R.

The estimation of risks associated with water management plans requires generation of synthetic streamflow sequences. The mathematical algorithms used to generate these sequences at monthly time scales are found lacking in two main respects: an inability to preserve dependence attributes, particularly at large (seasonal to interannual) time lags; and a poor representation of observed distributional characteristics, in particular strong asymmetry or multimodality in the probability density function. Proposed here is an alternative that naturally incorporates both observed dependence and distributional attributes in the generated sequences. Use of a nonparametric framework provides an effective means for representing the observed probability distribution, while the use of a "variable kernel" ensures accurate modeling of streamflow data sets that contain a substantial number of zero flow values. A careful selection of prior flows imparts the appropriate short-term memory, while use of an "aggregate" flow variable allows representation of interannual dependence. The nonparametric simulation model is applied to monthly flows from the Beaver River near Beaver, Utah, USA, and the Burrendong dam inflows, New South Wales, Australia. Results indicate that while the use of traditional simulation approaches leads to an inaccurate representation of dependence at long (annual and interannual) time scales, the proposed model can simulate both short and long-term dependence. As a result, the proposed model ensures a significantly improved representation of reservoir storage statistics, particularly for systems influenced by long droughts. It is important to note that the proposed method offers a simpler and better alternative to conventional disaggregation models as: (a) a separate annual flow series is not required, (b) stringent assumptions relating annual and monthly flows are not needed, and (c) the method does not require the specification of a "water year", instead ensuring that the sum of any sequence of flows lasting twelve months will result in the type of dependence that is observed in the historical annual flow series.
Cold shock protein YB-1 is involved in hypoxia-dependent gene transcription.

PubMed

Rauen, Thomas; Frye, Bjoern C; Wang, Jialin; Raffetseder, Ute; Alidousty, Christina; En-Nia, Abdelaziz; Floege, Jürgen; Mertens, Peter R

2016-09-16

Hypoxia-dependent gene regulation is largely orchestrated by hypoxia-inducible factors (HIFs), which associate with defined nucleotide sequences of hypoxia-responsive elements (HREs). Comparison of the regulatory HRE within the 3' enhancer of the human erythropoietin (EPO) gene with known binding motifs for cold shock protein Y-box (YB) protein-1 yielded strong similarities within the Y-box element and 3' adjacent sequences. DNA binding assays confirmed YB-1 binding to both single- and double-stranded HRE templates. Under hypoxia, we observed nuclear shuttling of YB-1, and co-immunoprecipitation assays demonstrated that YB-1 and HIF-1α physically interact with each other. Cellular YB-1 depletion using siRNA significantly induced hypoxia-dependent EPO production at both the promoter and mRNA levels. Vice versa, overexpressed YB-1 significantly reduced EPO-HRE-dependent gene transcription, whereas this effect was minor under normoxia. HIF-1α overexpression induced hypoxia-dependent gene transcription through the same element and, accordingly, co-expression with YB-1 reduced HIF-1α-mediated EPO induction under hypoxic conditions. Taken together, we identified YB-1 as a novel binding factor for HREs that participates in fine-tuning of the hypoxia transcriptome.
Stimulus-Dependent Flexibility in Non-Human Auditory Pitch Processing

ERIC Educational Resources Information Center

Bregman, Micah R.; Patel, Aniruddh D.; Gentner, Timothy Q.

2012-01-01

Songbirds and humans share many parallels in vocal learning and auditory sequence processing. However, the two groups differ notably in their abilities to recognize acoustic sequences shifted in absolute pitch (pitch height). Whereas humans maintain accurate recognition of words or melodies over large pitch height changes, songbirds are…
Recombination-dependent replication and gene conversion homogenize repeat sequences and diversify plastid genome structure.

PubMed

Ruhlman, Tracey A; Zhang, Jin; Blazier, John C; Sabir, Jamal S M; Jansen, Robert K

2017-04-01

There is a misinterpretation in the literature regarding the variable orientation of the small single copy region of plastid genomes (plastomes). The common phenomenon of small and large single copy inversion, hypothesized to occur through intramolecular recombination between inverted repeats (IR) in a circular, single unit-genome, in fact, more likely occurs through recombination-dependent replication (RDR) of linear plastome templates. If RDR can be primed through both intra- and intermolecular recombination, then this mechanism could not only create inversion isomers of so-called single copy regions, but also an array of alternative sequence arrangements. We used Illumina paired-end and PacBio single-molecule real-time (SMRT) sequences to characterize repeat structure in the plastome of Monsonia emarginata (Geraniaceae). We used OrgConv and inspected nucleotide alignments to infer ancestral nucleotides and identify gene conversion among repeats and mapped long (>1 kb) SMRT reads against the unit-genome assembly to identify alternative sequence arrangements. Although M. emarginata lacks the canonical IR, we found that large repeats (>1 kilobase; kb) represent ∼22% of the plastome nucleotide content. Among the largest repeats (>2 kb), we identified GC-biased gene conversion, and mapping filtered, long SMRT reads to the M. emarginata unit-genome assembly revealed alternative, substoichiometric sequence arrangements. We offer a model based on RDR and gene conversion between long repeated sequences in the M. emarginata plastome and provide support that both intra- and intermolecular recombination between large repeats, particularly in repeat-rich plastomes, varies unit-genome structure while homogenizing the nucleotide sequence of repeats.
The Effects of CBI Lesson Sequence Type and Field Dependence on Learning from Computer-Based Cooperative Instruction in Web

ERIC Educational Resources Information Center

Ipek, Ismail

2010-01-01

The purpose of this study was to investigate the effects of CBI lesson sequence type and cognitive style of field dependence on learning from Computer-Based Cooperative Instruction (CBCI) in WEB on the dependent measures, achievement, reading comprehension and reading rate. Eighty-seven college undergraduate students were randomly assigned to…
A Maxwell Demon Model Connecting Information and Thermodynamics

NASA Astrophysics Data System (ADS)

Peng, Pei-Yan; Duan, Chang-Kui

2016-08-01

In the past decade, several theoretical Maxwell's demon models have been proposed that exhibit effects such as refrigeration and doing work at the cost of information, and some experiments have been done to realise these effects. Here we propose a model with a two-level demon, information represented by a sequence of bits, and two heat reservoirs. Which reservoir the demon interacts with depends on the bit. If the information is pure, one reservoir will be refrigerated; on the other hand, information can be erased if the temperature difference is large. Genuine examples of such a system are discussed.
Does TATA matter? A structural exploration of the selectivity determinants in its complexes with TATA box-binding protein.

PubMed Central

Pastor, N; Pardo, L; Weinstein, H

1997-01-01

The binding of the TATA box-binding protein (TBP) to a TATA sequence in DNA is essential for eukaryotic basal transcription. TBP binds in the minor groove of DNA, causing a large distortion of the DNA helix. Given the apparent stereochemical equivalence of AT and TA basepairs in the minor groove, DNA deformability must play a significant role in binding site selection, because not all AT-rich sequences are bound effectively by TBP. To gain insight into the precise role that the properties of the TATA sequence have in determining the specificity of the DNA substrates of TBP, the solution structure and dynamics of seven DNA dodecamers have been studied by using molecular dynamics simulations. The analysis of the structural properties of basepair steps in these TATA sequences suggests a reason for the preference for alternating pyrimidine-purine (YR) sequences, but indicates that these properties cannot be the sole determinant of the sequence specificity of TBP. Rather, recognition depends on the interplay between the inherent deformability of the DNA and steric complementarity at the molecular interface. PMID:9251783
A highly efficient method for extracting next-generation sequencing quality RNA from adipose tissue of recalcitrant animal species.

PubMed

Sharma, Davinder; Golla, Naresh; Singh, Dheer; Onteru, Suneel K

2018-03-01

Next-generation sequencing (NGS) based RNA sequencing (RNA-Seq) and transcriptome profiling offer an opportunity to unveil complex biological processes. Successful RNA-Seq and transcriptome profiling require a large amount of high-quality RNA. However, NGS-quality RNA isolation is extremely difficult from recalcitrant adipose tissue (AT) with high lipid content and low cell numbers. Further, the amount and biochemical composition of AT lipid vary depending on the animal species, which can pose different degrees of resistance to RNA extraction. Currently available approaches may work effectively in one species but can be almost unproductive in another species. Herein, we report a two-step protocol for the extraction of NGS-quality RNA from AT across a broad range of animal species.

RNA-ID, a highly sensitive and robust method to identify cis-regulatory sequences using superfolder GFP and a fluorescence-based assay.

PubMed

Dean, Kimberly M; Grayhack, Elizabeth J

2012-12-01

We have developed a robust and sensitive method, called RNA-ID, to screen for cis-regulatory sequences in RNA using fluorescence-activated cell sorting (FACS) of yeast cells bearing a reporter in which expression of both superfolder green fluorescent protein (GFP) and yeast codon-optimized mCherry red fluorescent protein (RFP) is driven by the bidirectional GAL1,10 promoter. This method recapitulates previously reported progressive inhibition of translation mediated by increasing numbers of CGA codon pairs, and restoration of expression by introduction of a tRNA with an anticodon that base pairs exactly with the CGA codon. This method also reproduces effects of paromomycin and context on stop codon read-through. Five key features of this method contribute to its effectiveness as a selection for regulatory sequences: The system exhibits greater than a 250-fold dynamic range, a quantitative and dose-dependent response to known inhibitory sequences, exquisite resolution that allows nearly complete physical separation of distinct populations, and a reproducible signal between different cells transformed with the identical reporter, all of which are coupled with simple methods involving ligation-independent cloning, to create large libraries. Moreover, we provide evidence that there are sequences within a 9-nt library that cause reduced GFP fluorescence, suggesting that there are novel cis-regulatory sequences to be found even in this short sequence space. This method is widely applicable to the study of both RNA-mediated and codon-mediated effects on expression.
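
To make the notion of "increasing numbers of CGA codon pairs" concrete, the sketch below counts adjacent in-frame CGA-CGA pairs in a coding sequence. It is a minimal illustration with hypothetical function names; counting overlapping pairs (so three consecutive CGA codons give two pairs) is an assumption here, not a convention taken from the paper.

```python
def codons(cds: str, frame: int = 0) -> list[str]:
    """Split a coding sequence into complete codons in the given frame."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA input
    return [cds[i:i + 3] for i in range(frame, len(cds) - 2, 3)]


def count_adjacent_pairs(cds: str, codon: str = "CGA") -> int:
    """Count adjacent in-frame occurrences of the same codon.

    Pairs may overlap: CGA-CGA-CGA is counted as two pairs.
    """
    triplets = codons(cds)
    return sum(
        1 for first, second in zip(triplets, triplets[1:])
        if first == codon and second == codon
    )


# Made-up reporter inserts, for illustration only.
print(count_adjacent_pairs("ATGCGACGACGATAA"))  # 2
print(count_adjacent_pairs("ATGCGAGCCCGATAA"))  # 0 (CGA codons not adjacent)
```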
Prediction of Human Activity by Discovering Temporal Sequence Patterns.

PubMed

Li, Kang; Fu, Yun

2014-08-01

Early prediction of ongoing human activity has become more valuable in a large variety of time-critical applications. To build an effective representation for prediction, human activities can be characterized by a complex temporal composition of constituent simple actions and interacting objects. Different from early detection on short-duration simple actions, we propose a novel framework for long-duration complex activity prediction by discovering three key aspects of activity: Causality, Context-cue, and Predictability. The major contributions of our work include: (1) a general framework is proposed to systematically address the problem of complex activity prediction by mining temporal sequence patterns; (2) probabilistic suffix tree (PST) is introduced to model causal relationships between constituent actions, where both large and small order Markov dependencies between action units are captured; (3) the context-cue, especially interactive objects information, is modeled through sequential pattern mining (SPM), where a series of action and object co-occurrence are encoded as a complex symbolic sequence; (4) we also present a predictive accumulative function (PAF) to depict the predictability of each kind of activity. The effectiveness of our approach is evaluated on two experimental scenarios with two data sets for each: action-only prediction and context-aware prediction. Our method achieves superior performance for predicting global activity classes and local action units.
Spatial and Temporal Coordination of Bone Marrow-Derived Cell Activity During Arteriogenesis: Regulation of the Endogenous Response and Therapeutic Implications

PubMed Central

Meisner, Joshua K.; Price, Richard J.

2010-01-01

Arterial occlusive disease (AOD) is the leading cause of morbidity and mortality through the developed world, which creates a significant need for effective therapies to halt disease progression. Despite success of animal and small-scale human therapeutic arteriogenesis studies, this promising concept for treating AOD has yielded largely disappointing results in large-scale clinical trials. One reason for this lack of successful translation is that endogenous arteriogenesis is highly dependent on a poorly understood sequence of events and interactions between bone marrow derived cells (BMCs) and vascular cells, which makes designing effective therapies difficult. We contend that the process follows a complex, ordered sequence of events with multiple, specific BMC populations recruited at specific times and locations. Here we present the evidence suggesting roles for multiple BMC populations from neutrophils and mast cells to progenitor cells and propose how and where these cell populations fit within the sequence of events during arteriogenesis. Disruptions in these various BMC populations can impair the arteriogenesis process in patterns that characterize specific patient populations. We propose that an improved understanding of how arteriogenesis functions as a system can reveal individual BMC populations and functions that can be targeted for overcoming particular impairments in collateral vessel development. PMID:21044213
PHYSICO: An UNIX based Standalone Procedure for Computation of Individual and Group Properties of Protein Sequences.

PubMed

Gupta, Parth Sarthi Sen; Banerjee, Shyamashree; Islam, Rifat Nawaz Ul; Mondal, Sudipta; Mondal, Buddhadev; Bandyopadhyay, Amal K

2014-01-01

In the genomic and proteomic era, efficient and automated analyses of sequence properties of protein have become an important task in bioinformatics. There are general public licensed (GPL) software tools to perform a part of the job. However, computations of mean properties of large number of orthologous sequences are not possible from the above mentioned GPL sets. Further, there is no GPL software or server which can calculate window dependent sequence properties for a large number of sequences in a single run. With a view to overcome above limitations, we have developed a standalone procedure i.e. PHYSICO, which performs various stages of computation in a single run based on the type of input provided either in RAW-FASTA or BLOCK-FASTA format and makes excel output for: a) Composition, Class composition, Mean molecular weight, Isoelectric point, Aliphatic index and GRAVY, b) column based compositions, variability and difference matrix, c) 25 kinds of window dependent sequence properties. The program is fast, efficient, error free and user friendly. Calculation of mean and standard deviation of homologous sequences sets, for comparison purpose when relevant, is another attribute of the program; a property seldom seen in existing GPL software. PHYSICO is freely available for non-commercial/academic user in formal request to the corresponding author akbanerjee@biotech.buruniv.ac.in.
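As a rough illustration of the kinds of quantities listed in the abstract above (composition, GRAVY, aliphatic index, and a window-dependent property), the following Python sketch computes them for a single non-empty protein sequence. It is not PHYSICO's code and makes no claim about how PHYSICO works internally: the function names are hypothetical, GRAVY is assumed to be the mean Kyte-Doolittle hydropathy, and the aliphatic index is assumed to follow Ikai's commonly cited formula on mole percentages.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each residue type present in a non-empty sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[r] for r in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index (assumed Ikai formula) from mole percentages."""
    fractions = composition(seq)

    def percent(residue):
        return 100.0 * fractions.get(residue, 0.0)

    return percent("A") + 2.9 * percent("V") + 3.9 * (percent("I") + percent("L"))


def windowed_gravy(seq, window):
    """Sliding-window GRAVY: one value per window start position."""
    seq = seq.upper()
    return [gravy(seq[i:i + window]) for i in range(len(seq) - window + 1)]
```

Averaging such per-sequence values over a set of homologous sequences would give the kind of group mean and standard deviation the abstract mentions; that step is omitted here.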
PHYSICO: An UNIX based Standalone Procedure for Computation of Individual and Group Properties of Protein Sequences

PubMed Central

Gupta, Parth Sarthi Sen; Banerjee, Shyamashree; Islam, Rifat Nawaz Ul; Mondal, Sudipta; Mondal, Buddhadev; Bandyopadhyay, Amal K

2014-01-01

In the genomic and proteomic era, efficient and automated analyses of sequence properties of protein have become an important task in bioinformatics. There are general public licensed (GPL) software tools to perform a part of the job. However, computations of mean properties of large number of orthologous sequences are not possible from the above mentioned GPL sets. Further, there is no GPL software or server which can calculate window dependent sequence properties for a large number of sequences in a single run. With a view to overcome above limitations, we have developed a standalone procedure i.e. PHYSICO, which performs various stages of computation in a single run based on the type of input provided either in RAW-FASTA or BLOCK-FASTA format and makes excel output for: a) Composition, Class composition, Mean molecular weight, Isoelectric point, Aliphatic index and GRAVY, b) column based compositions, variability and difference matrix, c) 25 kinds of window dependent sequence properties. The program is fast, efficient, error free and user friendly. Calculation of mean and standard deviation of homologous sequences sets, for comparison purpose when relevant, is another attribute of the program; a property seldom seen in existing GPL software. Availability: PHYSICO is freely available for non-commercial/academic user in formal request to the corresponding author akbanerjee@biotech.buruniv.ac.in. PMID:24616564
Consolidating the effects of waking and sleep on motor-sequence learning.

PubMed

Brawn, Timothy P; Fenn, Kimberly M; Nusbaum, Howard C; Margoliash, Daniel

2010-10-20

Sleep is widely believed to play a critical role in memory consolidation. Sleep-dependent consolidation has been studied extensively in humans using an explicit motor-sequence learning paradigm. In this task, performance has been reported to remain stable across wakefulness and improve significantly after sleep, making motor-sequence learning the definitive example of sleep-dependent enhancement. Recent work, however, has shown that enhancement disappears when the task is modified to reduce task-related inhibition that develops over a training session, thus questioning whether sleep actively consolidates motor learning. Here we use the same motor-sequence task to demonstrate sleep-dependent consolidation for motor-sequence learning and explain the discrepancies in results across studies. We show that when training begins in the morning, motor-sequence performance deteriorates across wakefulness and recovers after sleep, whereas performance remains stable across both sleep and subsequent waking with evening training. This pattern of results challenges an influential model of memory consolidation defined by a time-dependent stabilization phase and a sleep-dependent enhancement phase. Moreover, the present results support a new account of the behavioral effects of waking and sleep on explicit motor-sequence learning that is consistent across a wide range of tasks. These observations indicate that current theories of memory consolidation that have been formulated to explain sleep-dependent performance enhancements are insufficient to explain the range of behavioral changes associated with sleep.
A rare variant in APOC3 is associated with plasma triglyceride and VLDL levels in Europeans.

PubMed

Timpson, Nicholas J; Walter, Klaudia; Min, Josine L; Tachmazidou, Ioanna; Malerba, Giovanni; Shin, So-Youn; Chen, Lu; Futema, Marta; Southam, Lorraine; Iotchkova, Valentina; Cocca, Massimiliano; Huang, Jie; Memari, Yasin; McCarthy, Shane; Danecek, Petr; Muddyman, Dawn; Mangino, Massimo; Menni, Cristina; Perry, John R B; Ring, Susan M; Gaye, Amadou; Dedoussis, George; Farmaki, Aliki-Eleni; Burton, Paul; Talmud, Philippa J; Gambaro, Giovanni; Spector, Tim D; Smith, George Davey; Durbin, Richard; Richards, J Brent; Humphries, Steve E; Zeggini, Eleftheria; Soranzo, Nicole

2014-09-16

The analysis of rich catalogues of genetic variation from population-based sequencing provides an opportunity to screen for functional effects. Here we report a rare variant in APOC3 (rs138326449-A, minor allele frequency ~0.25% (UK)) associated with plasma triglyceride (TG) levels (-1.43 s.d. (s.e.=0.27) per minor allele, P-value=$8.0 \times 10^{-8}$) discovered in 3,202 individuals with low read-depth, whole-genome sequence. We replicate this in 12,831 participants from five additional samples of Northern and Southern European origin (-1.0 s.d. (s.e.=0.173), P-value=$7.32 \times 10^{-9}$). This is consistent with an effect between 0.5 and 1.5 mmol l$^{-1}$ dependent on population. We show that a single predicted splice donor variant is responsible for association signals and is independent of known common variants. Analyses suggest an independent relationship between rs138326449 and high-density lipoprotein (HDL) levels. This represents one of the first examples of a rare, large effect variant identified from whole-genome sequencing at a population scale.
Efficient engineering of chromosomal ribosome binding site libraries in mismatch repair proficient Escherichia coli.

PubMed

Oesterle, Sabine; Gerngross, Daniel; Schmitt, Steven; Roberts, Tania Michelle; Panke, Sven

2017-09-26

Multiplexed gene expression optimization via modulation of gene translation efficiency through ribosome binding site (RBS) engineering is a valuable approach for optimizing artificial properties in bacteria, ranging from genetic circuits to production pathways. Established algorithms design smart RBS-libraries based on a single partially-degenerate sequence that efficiently samples the entire space of translation initiation rates. However, the sequence space that is accessible when integrating the library by CRISPR/Cas9-based genome editing is severely restricted by DNA mismatch repair (MMR) systems. MMR efficiency depends on the type and length of the mismatch and thus effectively removes potential library members from the pool. Rather than working in MMR-deficient strains, which accumulate off-target mutations, or depending on temporary MMR inactivation, which requires additional steps, we eliminate this limitation by developing a pre-selection rule of genome-library-optimized-sequences (GLOS) that enables introducing large functional diversity into MMR-proficient strains with sequences that are no longer subject to MMR-processing. We implement several GLOS-libraries in Escherichia coli and show that GLOS-libraries indeed retain diversity during genome editing and that such libraries can be used in complex genome editing operations such as concomitant deletions. We argue that this approach allows for stable and efficient fine tuning of chromosomal functions with minimal effort.
Comprehensive analysis of RNA-protein interactions by high-throughput sequencing-RNA affinity profiling.

PubMed

Tome, Jacob M; Ozer, Abdullah; Pagano, John M; Gheba, Dan; Schroth, Gary P; Lis, John T

2014-06-01

RNA-protein interactions play critical roles in gene regulation, but methods to quantitatively analyze these interactions at a large scale are lacking. We have developed a high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay by adapting a high-throughput DNA sequencer to quantify the binding of fluorescently labeled protein to millions of RNAs anchored to sequenced cDNA templates. Using HiTS-RAP, we measured the affinity of mutagenized libraries of GFP-binding and NELF-E-binding aptamers to their respective targets and identified critical regions of interaction. Mutations additively affected the affinity of the NELF-E-binding aptamer, whose interaction depended mainly on a single-stranded RNA motif, but not that of the GFP aptamer, whose interaction depended primarily on secondary structure.
Effects of Aftershock Declustering in Risk Modeling: Case Study of a Subduction Sequence in Mexico

NASA Astrophysics Data System (ADS)

Kane, D. L.; Nyst, M.

2014-12-01

Earthquake hazard and risk models often assume that earthquake rates can be represented by a stationary Poisson process, and that aftershocks observed in historical seismicity catalogs represent a deviation from stationarity that must be corrected before earthquake rates are estimated. Algorithms for classifying individual earthquakes as independent mainshocks or as aftershocks vary widely, and analysis of a single catalog can produce considerably different earthquake rates depending on the declustering method implemented. As these rates are propagated through hazard and risk models, the modeled results will vary due to the assumptions implied by these choices. In particular, the removal of large aftershocks following a mainshock may lead to an underestimation of the rate of damaging earthquakes and potential damage due to a large aftershock may be excluded from the model. We present a case study based on the 1907 - 1911 sequence of nine 6.9 <= Mw <= 7.9 earthquakes along the Cocos - North American plate subduction boundary in Mexico in order to illustrate the variability in risk under various declustering approaches. Previous studies have suggested that subduction zone earthquakes in Mexico tend to occur in clusters, and this particular sequence includes events that would be labeled as aftershocks in some declustering approaches yet are large enough to produce significant damage. We model the ground motion for each event, determine damage ratios using modern exposure data, and then compare the variability in the modeled damage from using the full catalog or one of several declustered catalogs containing only "independent" events. We also consider the effects of progressive damage caused by each subsequent event and how this might increase or decrease the total losses expected from this sequence.
A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing.

PubMed

Oono, Ryoko

2017-01-01

High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions 'how and why are communities different?' This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences.
A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing

PubMed Central

2017-01-01

High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions ‘how and why are communities different?’ This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences. PMID:29253889
Scaling exponents for ordered maxima

DOE PAGES

Ben-Naim, E.; Krapivsky, P. L.; Lemons, N. W.

2015-12-22

We study extreme value statistics of multiple sequences of random variables. For each sequence with $N$ variables, independently drawn from the same distribution, the running maximum is defined as the largest variable to date. We compare the running maxima of $m$ independent sequences and investigate the probability $S_N$ that the maxima are perfectly ordered, that is, the running maximum of the first sequence is always larger than that of the second sequence, which is always larger than the running maximum of the third sequence, and so on. The probability $S_N$ is universal: it does not depend on the distribution from which the random variables are drawn. For two sequences, $S_N \sim N^{-1/2}$, and in general, the decay is algebraic, $S_N \sim N^{-\sigma_m}$, for large $N$. We analytically obtain the exponent $\sigma_3 \cong 1.302931$ as root of a transcendental equation. Moreover, the exponents $\sigma_m$ grow with $m$, and we show that $\sigma_m \sim m$ for large $m$.
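The definitions in the abstract above (running maximum, perfectly ordered maxima, and the ordering probability) can be sketched as a small Monte Carlo experiment. This is only an illustrative sketch, not the authors' analytical method: the function names, the choice of uniform random variables, and the trial count are all assumptions made for the example.

```python
import random
from itertools import accumulate


def running_maxima(values):
    """Largest value seen so far at each position of the sequence."""
    return list(accumulate(values, max))


def perfectly_ordered(sequences):
    """True if, at every step, each sequence's running maximum is strictly
    larger than the running maximum of the next sequence in the list."""
    maxima = [running_maxima(s) for s in sequences]
    return all(
        upper > lower
        for first, second in zip(maxima, maxima[1:])
        for upper, lower in zip(first, second)
    )


def estimate_ordering_probability(n, m=2, trials=20000, seed=0):
    """Monte Carlo estimate of the ordering probability for m sequences of
    n independent uniform variables (the uniform choice is an assumption)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sequences = [[rng.random() for _ in range(n)] for _ in range(m)]
        hits += perfectly_ordered(sequences)
    return hits / trials
```

Calling `estimate_ordering_probability(n, m=2)` for increasing `n` and plotting the estimates against `n` on log-log axes is one way to compare the simulated decay with the algebraic scaling quoted in the abstract.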
Pms2 Suppresses Large Expansions of the (GAA·TTC)n Sequence in Neuronal Tissues

PubMed Central

Bourn, Rebecka L.; De Biase, Irene; Pinto, Ricardo Mouro; Sandi, Chiranjeevi; Al-Mahdawi, Sahar; Pook, Mark A.; Bidichandani, Sanjay I.

2012-01-01

Expanded trinucleotide repeat sequences are the cause of several inherited neurodegenerative diseases. Disease pathogenesis is correlated with several features of somatic instability of these sequences, including further large expansions in postmitotic tissues. The presence of somatic expansions in postmitotic tissues is consistent with DNA repair being a major determinant of somatic instability. Indeed, proteins in the mismatch repair (MMR) pathway are required for instability of the expanded (CAG·CTG)n sequence, likely via recognition of intrastrand hairpins by MutSβ. It is not clear if or how MMR would affect instability of disease-causing expanded trinucleotide repeat sequences that adopt secondary structures other than hairpins, such as the triplex/R-loop forming (GAA·TTC)n sequence that causes Friedreich ataxia. We analyzed somatic instability in transgenic mice that carry an expanded (GAA·TTC)n sequence in the context of the human FXN locus and lack the individual MMR proteins Msh2, Msh6 or Pms2. The absence of Msh2 or Msh6 resulted in a dramatic reduction in somatic mutations, indicating that mammalian MMR promotes instability of the (GAA·TTC)n sequence via MutSα. The absence of Pms2 resulted in increased accumulation of large expansions in the nervous system (cerebellum, cerebrum, and dorsal root ganglia) but not in non-neuronal tissues (heart and kidney), without affecting the prevalence of contractions. Pms2 suppressed large expansions specifically in tissues showing MutSα-dependent somatic instability, suggesting that they may act on the same lesion or structure associated with the expanded (GAA·TTC)n sequence. We conclude that Pms2 specifically suppresses large expansions of a pathogenic trinucleotide repeat sequence in neuronal tissues, possibly acting independently of the canonical MMR pathway. PMID:23071719
Pms2 suppresses large expansions of the (GAA·TTC)n sequence in neuronal tissues.

PubMed

Bourn, Rebecka L; De Biase, Irene; Pinto, Ricardo Mouro; Sandi, Chiranjeevi; Al-Mahdawi, Sahar; Pook, Mark A; Bidichandani, Sanjay I

2012-01-01

Expanded trinucleotide repeat sequences are the cause of several inherited neurodegenerative diseases. Disease pathogenesis is correlated with several features of somatic instability of these sequences, including further large expansions in postmitotic tissues. The presence of somatic expansions in postmitotic tissues is consistent with DNA repair being a major determinant of somatic instability. Indeed, proteins in the mismatch repair (MMR) pathway are required for instability of the expanded (CAG·CTG)(n) sequence, likely via recognition of intrastrand hairpins by MutSβ. It is not clear if or how MMR would affect instability of disease-causing expanded trinucleotide repeat sequences that adopt secondary structures other than hairpins, such as the triplex/R-loop forming (GAA·TTC)(n) sequence that causes Friedreich ataxia. We analyzed somatic instability in transgenic mice that carry an expanded (GAA·TTC)(n) sequence in the context of the human FXN locus and lack the individual MMR proteins Msh2, Msh6 or Pms2. The absence of Msh2 or Msh6 resulted in a dramatic reduction in somatic mutations, indicating that mammalian MMR promotes instability of the (GAA·TTC)(n) sequence via MutSα. The absence of Pms2 resulted in increased accumulation of large expansions in the nervous system (cerebellum, cerebrum, and dorsal root ganglia) but not in non-neuronal tissues (heart and kidney), without affecting the prevalence of contractions. Pms2 suppressed large expansions specifically in tissues showing MutSα-dependent somatic instability, suggesting that they may act on the same lesion or structure associated with the expanded (GAA·TTC)(n) sequence. We conclude that Pms2 specifically suppresses large expansions of a pathogenic trinucleotide repeat sequence in neuronal tissues, possibly acting independently of the canonical MMR pathway.
CLAST: CUDA implemented large-scale alignment search tool.

PubMed

Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken

2014-12-11

Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.
Insights into Hox protein function from a large scale combinatorial analysis of protein domains.

PubMed

Merabet, Samir; Litim-Mecheri, Isma; Karlsson, Daniel; Dixit, Richa; Saadaoui, Mehdi; Monier, Bruno; Brun, Christine; Thor, Stefan; Vijayraghavan, K; Perrin, Laurent; Pradel, Jacques; Graba, Yacine

2011-10-01

Protein function is encoded within protein sequence and protein domains. However, how protein domains cooperate within a protein to modulate overall activity and how this impacts functional diversification at the molecular and organism levels remains largely unaddressed. Focusing on three domains of the central class Drosophila Hox transcription factor AbdominalA (AbdA), we used combinatorial domain mutations and most known AbdA developmental functions as biological readouts to investigate how protein domains collectively shape protein activity. The results uncover redundancy, interactivity, and multifunctionality of protein domains as salient features underlying overall AbdA protein activity, providing means to apprehend functional diversity and accounting for the robustness of Hox-controlled developmental programs. Importantly, the results highlight context-dependency in protein domain usage and interaction, allowing major modifications in domains to be tolerated without general functional loss. The non-pleiotropic effect of domain mutation suggests that protein modification may contribute more broadly to molecular changes underlying morphological diversification during evolution, so far thought to rely largely on modification in gene cis-regulatory sequences.
Insights into Hox Protein Function from a Large Scale Combinatorial Analysis of Protein Domains

PubMed Central

Karlsson, Daniel; Dixit, Richa; Saadaoui, Mehdi; Monier, Bruno; Brun, Christine; Thor, Stefan; Vijayraghavan, K.; Perrin, Laurent; Pradel, Jacques; Graba, Yacine

2011-01-01

Protein function is encoded within protein sequence and protein domains. However, how protein domains cooperate within a protein to modulate overall activity and how this impacts functional diversification at the molecular and organism levels remains largely unaddressed. Focusing on three domains of the central class Drosophila Hox transcription factor AbdominalA (AbdA), we used combinatorial domain mutations and most known AbdA developmental functions as biological readouts to investigate how protein domains collectively shape protein activity. The results uncover redundancy, interactivity, and multifunctionality of protein domains as salient features underlying overall AbdA protein activity, providing means to apprehend functional diversity and accounting for the robustness of Hox-controlled developmental programs. Importantly, the results highlight context-dependency in protein domain usage and interaction, allowing major modifications in domains to be tolerated without general functional loss. The non-pleiotropic effect of domain mutation suggests that protein modification may contribute more broadly to molecular changes underlying morphological diversification during evolution, so far thought to rely largely on modification in gene cis-regulatory sequences. PMID:22046139
The contribution of alu elements to mutagenic DNA double-strand break repair.

PubMed

Morales, Maria E; White, Travis B; Streva, Vincent A; DeFreece, Cecily B; Hedges, Dale J; Deininger, Prescott L

2015-03-01

Alu elements make up the largest family of human mobile elements, numbering 1.1 million copies and comprising 11% of the human genome. As a consequence of evolution and genetic drift, Alu elements of various sequence divergence exist throughout the human genome. Alu/Alu recombination has been shown to cause approximately 0.5% of new human genetic diseases and contribute to extensive genomic structural variation. To begin understanding the molecular mechanisms leading to these rearrangements in mammalian cells, we constructed Alu/Alu recombination reporter cell lines containing Alu elements ranging in sequence divergence from 0%-30% that allow detection of both Alu/Alu recombination and large non-homologous end joining (NHEJ) deletions that range from 1.0 to 1.9 kb in size. Introduction of as little as 0.7% sequence divergence between Alu elements resulted in a significant reduction in recombination, which indicates even small degrees of sequence divergence reduce the efficiency of homology-directed DNA double-strand break (DSB) repair. Further reduction in recombination was observed in a sequence divergence-dependent manner for diverged Alu/Alu recombination constructs with up to 10% sequence divergence. With greater levels of sequence divergence (15%-30%), we observed a significant increase in DSB repair due to a shift from Alu/Alu recombination to variable-length NHEJ which removes sequence between the two Alu elements. This increase in NHEJ deletions depends on the presence of Alu sequence homeology (similar but not identical sequences). Analysis of recombination products revealed that Alu/Alu recombination junctions occur more frequently in the first 100 bp of the Alu element within our reporter assay, just as they do in genomic Alu/Alu recombination events. This is the first extensive study characterizing the influence of Alu element sequence divergence on DNA repair, which will inform predictions regarding the effect of Alu element sequence divergence on both the rate and nature of DNA repair events.
Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

PubMed

Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir

2018-01-01

Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity behave distinctly. These findings have significant implications for future work on protein comparison and will help clarify the semantic similarity between proteins.
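For readers unfamiliar with the statistic discussed in the abstract above, the following Python sketch computes Pearson's correlation coefficient from two equal-length lists of scores. The function name and the toy numbers are hypothetical and are not data from the study; the sketch only illustrates why a single linear coefficient can fail to describe a relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Toy data (made up): a perfectly linear pair and a purely quadratic pair.
linear = pearson_r([1, 2, 3], [2, 4, 6])
quadratic = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
```

In the toy data, the linear pair gives a coefficient of 1, while the quadratic pair gives 0 even though the second list is fully determined by the first. That is one simple way a coefficient near zero can coexist with a real, nonlinear relationship.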

Deconstruction of the Ras switching cycle through saturation mutagenesis

PubMed Central

Bandaru, Pradeep; Shah, Neel H; Bhattacharyya, Moitrayee; Barton, John P; Kondo, Yasushi; Cofsky, Joshua C; Gee, Christine L; Chakraborty, Arup K; Kortemme, Tanja; Ranganathan, Rama; Kuriyan, John

2017-01-01

Ras proteins are highly conserved signaling molecules that exhibit regulated, nucleotide-dependent switching between active and inactive states. The high conservation of Ras requires mechanistic explanation, especially given the general mutational tolerance of proteins. Here, we use deep mutational scanning, biochemical analysis and molecular simulations to understand constraints on Ras sequence. Ras exhibits global sensitivity to mutation when regulated by a GTPase activating protein and a nucleotide exchange factor. Removing the regulators shifts the distribution of mutational effects to be largely neutral, and reveals hotspots of activating mutations in residues that restrain Ras dynamics and promote the inactive state. Evolutionary analysis, combined with structural and mutational data, argue that Ras has co-evolved with its regulators in the vertebrate lineage. Overall, our results show that sequence conservation in Ras depends strongly on the biochemical network in which it operates, providing a framework for understanding the origin of global selection pressures on proteins. DOI: http://dx.doi.org/10.7554/eLife.27810.001 PMID:28686159
The ion-induced folding of the hammerhead ribozyme: core sequence changes that perturb folding into the active conformation.

PubMed Central

Bassi, G S; Murchie, A I; Lilley, D M

1996-01-01

The hammerhead ribozyme undergoes an ion-dependent folding process into the active conformation. We find that the folding can be blocked at specific stages by changes of sequence or functionality within the core. In the absence of added metal ions, the global structure of the hammerhead is extended, with a large angle subtended between stems I and II. No core sequence changes appear to alter this geometry, consistent with an unstructured core under these conditions. Upon addition of low concentrations of magnesium ions, the hammerhead folds by an association of stems II and III, to include a large angle between them. This stage is inhibited or altered by mutations within the oligopurine sequence lying between stems II and III, and folding is completely prevented by an A14G mutation. Further increase in magnesium ion concentration brings about a second stage of folding in the natural sequence hammerhead, involving a reorientation of stem I, which rotates around into the same direction as stem II. Because this transition occurs over the same range of magnesium ion concentration over which the hammerhead ribozyme becomes active, it is likely that the final conformation is most closely related to the active form of the structure. Magnesium ion-dependent folding into this conformation is prevented by changes at G5, notably removal of the 2'-hydroxyl group and replacement of the base by cytidine. The ability to dissect the folding process by means of sequence changes suggests that two separate ion-dependent stages are involved in the folding of the hammerhead ribozyme into the active conformation. PMID:8752086
Fungal diversity in grape must and wine fermentation assessed by massive sequencing, quantitative PCR and DGGE

PubMed Central

Wang, Chunxiao; García-Fernández, David; Mas, Albert; Esteve-Zarzoso, Braulio

2015-01-01

The diversity of fungi in grape must and during wine fermentation was investigated in this study by culture-dependent and culture-independent techniques. Carignan and Grenache grapes were harvested from three vineyards in the Priorat region (Spain) in 2012, and nine samples were selected from the grape must after crushing and during wine fermentation. From culture-dependent techniques, 362 isolates were randomly selected and identified by 5.8S-ITS-RFLP and 26S-D1/D2 sequencing. Meanwhile, genomic DNA was extracted directly from the nine samples and analyzed by qPCR, DGGE and massive sequencing. The results indicated that grape must after crushing harbored a high species richness of fungi with Aspergillus tubingensis, Aureobasidium pullulans, or Starmerella bacillaris as the dominant species. As fermentation proceeded, the species richness decreased, and yeasts such as Hanseniaspora uvarum, Starmerella bacillaris and Saccharomyces cerevisiae successively occupied the must samples. The “terroir” characteristics of the fungus population are more related to the location of the vineyard than to grape variety. Sulfur dioxide treatment caused a low effect on yeast diversity by similarity analysis. Because of the existence of large population of fungi on grape berries, massive sequencing was more appropriate to understand the fungal community in grape must after crushing than the other techniques used in this study. Suitable target sequences and databases were necessary for accurate evaluation of the community and the identification of species by the 454 pyrosequencing of amplicons. PMID:26557110
Implementation into earthquake sequence simulations of a rate- and state-dependent friction law incorporating pressure solution creep

NASA Astrophysics Data System (ADS)

Noda, H.

2016-05-01

Pressure solution creep (PSC) is an important elementary process in rock friction at high temperatures where solubilities of rock-forming minerals are significantly large. It significantly changes the frictional resistance and enhances time-dependent strengthening. A recent microphysical model for PSC-involved friction of clay-quartz mixtures, which can explain a transition between dilatant and non-dilatant deformation (d-nd transition), was modified here and implemented in dynamic earthquake sequence simulations. The original model resulted in essentially a kind of rate- and state-dependent friction (RSF) law, but assumed a constant friction coefficient for clay resulting in zero instantaneous rate dependency in the dilatant regime. In this study, an instantaneous rate dependency for the clay friction coefficient was introduced, consistent with experiments, resulting in a friction law suitable for earthquake sequence simulations. In addition, a term for time-dependent strengthening due to PSC was added which makes the friction law logarithmically rate-weakening in the dilatant regime. The width of the zone in which clasts overlap or, equivalently, the interface porosity involved in PSC plays a role as the state variable. Such a concrete physical meaning of the state variable is a great advantage in future modelling studies incorporating other physical processes such as hydraulic effects. Earthquake sequence simulations with different pore pressure distributions demonstrated that excess pore pressure at depth causes deeper rupture propagation with smaller slip per event and a shorter recurrence interval. The simulated ruptures were arrested a few kilometres below the point of pre-seismic peak stress at the d-nd transition and did not propagate spontaneously into the region of pre-seismic non-dilatant deformation. PSC weakens the fault against slow deformation and thus such a region cannot produce a dynamic stress drop. 
Dynamic rupture propagation further down to brittle-plastic transition, evidenced by geological observations, would require even smaller frictional resistance at coseismic slip rate, suggesting the importance of implementation of dynamic weakening activated at coseismic slip rates for more realistic simulation of earthquake sequences. The present models produced much smaller afterslip at deeper parts of arrested ruptures than those with logarithmic RSF laws because of a more significant rate-strengthening effect due to linearly viscous PSC. Detailed investigation of afterslip would give a clue to understand the deformation mechanism which controls shear resistance of the fault in a region of arrest of earthquake ruptures.
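For orientation, the classical logarithmic RSF law that the abstract uses as a point of comparison can be written as below. This is the standard textbook form with the aging law for the state variable, given here as an assumption for illustration; it is not the PSC-based law developed in the paper, and the symbols (a, b, V_0, D_c, θ) are the conventional ones rather than names taken from the abstract.

```latex
\begin{align}
  \mu &= \mu_0 + a \ln\frac{V}{V_0} + b \ln\frac{V_0\,\theta}{D_c},
  &
  \frac{d\theta}{dt} &= 1 - \frac{V\theta}{D_c}, \\
  % steady state: d\theta/dt = 0 gives \theta_{ss} = D_c / V
  \mu_{ss} &= \mu_0 + (a - b) \ln\frac{V}{V_0}.
\end{align}
```

In this form the fault is rate-weakening at steady state when a - b < 0 and rate-strengthening when a - b > 0, with a strength change that is only logarithmic in slip rate. A linearly viscous term, as the abstract attributes to PSC, grows much faster with rate, which is consistent with the smaller afterslip reported.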
The length but not the sequence of peptide linker modules exerts the primary influence on the conformations of protein domains in cellulosome multi-enzyme complexes.

PubMed

Różycki, Bartosz; Cazade, Pierre-André; O'Mahony, Shane; Thompson, Damien; Cieplak, Marek

2017-08-16

Cellulosomes are large multi-protein catalysts produced by various anaerobic microorganisms to efficiently degrade plant cell-wall polysaccharides down into simple sugars. X-ray and physicochemical structural characterisations show that cellulosomes are composed of numerous protein domains that are connected by unstructured polypeptide segments, yet the properties and possible roles of these 'linker' peptides are largely unknown. We have performed coarse-grained and all-atom molecular dynamics computer simulations of a number of cellulosomal linkers of different lengths and compositions. Our data demonstrates that the effective stiffness of the linker peptides, as quantified by the equilibrium fluctuations in the end-to-end distances, depends primarily on the length of the linker and less so on the specific amino acid sequence. The presence of excluded volume - provided by the domains that are connected - dampens the motion of the linker residues and reduces the effective stiffness of the linkers. Simultaneously, the presence of the linkers alters the conformations of the protein domains that are connected. We demonstrate that short, stiff linkers induce significant rearrangements in the folded domains of the mini-cellulosome composed of endoglucanase Cel8A in complex with scaffoldin ScafT (Cel8A-ScafT) of Clostridium thermocellum as well as in a two-cohesin system derived from the scaffoldin ScaB of Acetivibrio cellulolyticus. We give experimentally testable predictions on structural changes in protein domains that depend on the length of linkers.
The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.

PubMed

Vrbik, Irene; Stephens, David A; Roger, Michel; Brenner, Bluma G

2015-11-04

In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values which may vary widely depending on the application. This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, thus no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data, closely agree with those found using the threshold approach, while only requiring a fraction of the time to complete the analysis. Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.
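The idea of letting a gap in the sorted pairwise distances set the clustering cut-off can be sketched as follows. This is a deliberately simplified illustration, not the published Gap Procedure: the distance (`hamming_fraction`), the choice of the single largest gap, and the single-linkage grouping are all assumptions made for the example.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def gap_threshold(distances: list[float]) -> float:
    """Midpoint of the largest gap between consecutive sorted distances."""
    if len(distances) < 2:
        raise ValueError("need at least two pairwise distances")
    ordered = sorted(distances)
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(ordered, ordered[1:])]
    _, lo, hi = max(gaps)
    return (lo + hi) / 2


def cluster_by_gap(seqs: list[str]) -> list[set[int]]:
    """Group sequence indices whose pairwise distance falls below the gap cut-off."""
    pairs = list(combinations(range(len(seqs)), 2))
    dist = {(i, j): hamming_fraction(seqs[i], seqs[j]) for i, j in pairs}
    cut = gap_threshold(list(dist.values()))

    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), d in dist.items():
        if d < cut:
            parent[find(i)] = find(j)

    clusters: dict[int, set[int]] = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), set()).add(i)
    return sorted(clusters.values(), key=min)
```

No threshold is passed in by the caller; the cut-off comes from the data, which is the property the abstract emphasises. A real data set would need a proper evolutionary distance and a more careful rule for choosing among several gaps.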
A rare variant in APOC3 is associated with plasma triglyceride and VLDL levels in Europeans

PubMed Central

Timpson, Nicholas J.; Walter, Klaudia; Min, Josine L.; Tachmazidou, Ioanna; Malerba, Giovanni; Shin, So-Youn; Chen, Lu; Futema, Marta; Southam, Lorraine; Iotchkova, Valentina; Cocca, Massimiliano; Huang, Jie; Memari, Yasin; McCarthy, Shane; Danecek, Petr; Muddyman, Dawn; Mangino, Massimo; Menni, Cristina; Perry, John R. B.; Ring, Susan M.; Gaye, Amadou; Dedoussis, George; Farmaki, Aliki-Eleni; Burton, Paul; Talmud, Philippa J.; Gambaro, Giovanni; Spector, Tim D.; Smith, George Davey; Durbin, Richard; Richards, J Brent; Humphries, Steve E.; Zeggini, Eleftheria; Soranzo, Nicole; Al Turki, Saeed; Anderson, Carl; Anney, Richard; Antony, Dinu; Soler Artigas, Maria; Ayub, Muhammad; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebhattin; Clapham, Peter; Clement, Gail; Coates, Guy; Collier, David; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Danecek, Petr; Davey Smith, George; Day-Williams, Aaron; Day, Ian N. 
M.; Down, Thomas; Du, Yuanping; Dunham, Ian; Durbin, Richard; Edkins, Sarah; Ellis, Peter; Evans, David; Faroogi, Sadaf; Fatemifar, Ghazaleh; Fitzpatrick, David R.; Flicek, Paul; Flyod, James; Foley, A Reghan; Franklin, Christopher S; Futema, Marta; Gallagher, Louise; Gaunt, Tom; Geihs, Matthias; Geschwind, Daniel; Greenwood, Celia; Griffin, Heather; Grozeva, Detelina; Guo, Xueqin; Guo, Xiaosen; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey; Holmans, Peter; Howie, Bryan; Huang, Jie; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Jackson, David K.; Jamshidi, Yalda; Jing, Tian; Joyce, Chris; Kaye, Jane; Keane, Thomas; Keogh, Julia; Kemp, John; Kennedy, Karen; Kolb-Kokocinski, Anja; Lachance, Genevieve; Langford, Cordelia; Lawson, Daniel; Lee, Irene; Lek, Monkol; Liang, Jieqin; Lin, Hong; Li, Rui; Li, Yingrui; Liu, Ryan; Lönnqvist, Jouko; Lopes, Margarida; Lotchkova, Valentina; MacArthur, Daniel; Marchini, Jonathan; Maslen, John; Massimo, Mangino; Mathieson, Iain; Marenne, Gaëlle; McCarthy, Shane; McGuffin, Peter; McIntosh, Andrew; McKechanie, Andrew G.; McQuillin, Andrew; Memari, Yasin; Metrustry, Sarah; Min, Josine; Mitchison, Hannah; Moayyeri, Alireza; Morris, James; Muddyman, Dawn; Muntoni, Francesco; Northstone, Kate; O'Donnovan, Michael; Onoufriadis, Alexandros; O'Rahilly, Stephen; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Perry, John; Pietilainen, Olli; Plagnol, Vincent; Quaye, Lydia; Quail, Michael A.; Raymond, Lucy; Rehnström, Karola; Richards, Brent; Ring, Susan; Ritchie, Graham R. 
S.; Roberts, Nicola; Savage, David B.; Scambler, Peter; Schiffels, Stephen; Schmidts, Miriam; Schoenmakers, Nadia; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shihab, Hasheem; Shin, So-Youn; Skuse, David; Small, Kerrin; Soranzo, Nicole; Southam, Lorraine; Spasic-Boskovic, Olivera; Spector, Tim; St Clair, David; Stalker, Jim; Stevens, Elizabeth; St Pourcian, Beate; Sun, Jianping; Surdulescu, Gabriela; Suvisaari, Jaana; Tachmazidou, Ionna; Timpson, Nicholas; Tobin, Martin D.; Valdes, Ana; Van Kogelenberg, Margriet; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walter, Klaudia; Walters, James T. R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Elanor; Whyte, Tamieka; Williams, Hywel; Williamson, Kathleen A.; Wilson, Crispian; Wilson, Scott G.; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zeggini, Eleftheria; Zhang, Fend; Zhang, Pingbo; Zheng, Hou-Feng

2014-01-01

The analysis of rich catalogues of genetic variation from population-based sequencing provides an opportunity to screen for functional effects. Here we report a rare variant in APOC3 (rs138326449-A, minor allele frequency ~0.25% (UK)) associated with plasma triglyceride (TG) levels (−1.43 s.d. (s.e.=0.27) per minor allele (P-value=8.0 × 10⁻⁸)) discovered in 3,202 individuals with low read-depth, whole-genome sequence. We replicate this in 12,831 participants from five additional samples of Northern and Southern European origin (−1.0 s.d. (s.e.=0.173), P-value=7.32 × 10⁻⁹). This is consistent with an effect between 0.5 and 1.5 mmol l⁻¹ dependent on population. We show that a single predicted splice donor variant is responsible for association signals and is independent of known common variants. Analyses suggest an independent relationship between rs138326449 and high-density lipoprotein (HDL) levels. This represents one of the first examples of a rare, large effect variant identified from whole-genome sequencing at a population scale. PMID:25225788
Single Machine Scheduling and Due Date Assignment with Past-Sequence-Dependent Setup Time and Position-Dependent Processing Time

PubMed Central

Zhao, Chuan-Li; Hsu, Hua-Feng

2014-01-01

This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm. PMID:25258727
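The two modelling ingredients in this abstract, a p-s-d setup time and a position-dependent processing time, can be made concrete with a short sketch. The function name, the proportionality constant `gamma`, and the table `proc[job][position]` are hypothetical; the sketch also assumes that the setup is proportional to the actual processing time already spent, which is one common reading of "the length of the already processed jobs".

```python
def completion_times(sequence, proc, gamma):
    """Completion time of each job in a single-machine schedule.

    sequence: job ids in processing order
    proc:     proc[job][r] is the processing time of `job` when it is
              placed in position r (0-based); this encodes the position effect
    gamma:    proportionality constant of the p-s-d setup time (assumed name)
    """
    clock = 0.0
    processed = 0.0  # total processing time of jobs already scheduled
    done = []
    for r, job in enumerate(sequence):
        setup = gamma * processed  # setup grows with the work already done
        p = proc[job][r]
        clock += setup + p
        processed += p
        done.append(clock)
    return done


# Three jobs whose processing times shrink in later positions.
proc = {"A": [4, 3, 2], "B": [2, 2, 1], "C": [6, 5, 4]}
times = completion_times(["B", "A", "C"], proc, gamma=0.5)
```

Because the setup before a job depends on everything scheduled ahead of it, swapping two jobs changes the setups of all later jobs. That coupling is why the paper works with assignment-problem and dynamic-programming formulations rather than a simple sorting rule.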
Single machine scheduling and due date assignment with past-sequence-dependent setup time and position-dependent processing time.

PubMed

Zhao, Chuan-Li; Hsu, Chou-Jung; Hsu, Hua-Feng

2014-01-01

This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm.
Nature and evolution of B chromosomes in plants: A non-coding but information-rich part of plant genomes.

PubMed

Puertas, M J

2002-01-01

This work reviews recent advances providing insights on the origin and evolution of B chromosomes (Bs) in representative plant species. Brachycome dichromosomatica has large and micro Bs. Both carry an inactive ribosomal gene cluster. The large Bs contain the B-specific Bd49 family, mainly located at the centromere. Multiple copies are present in the A chromosomes (As) of related species, whereas only a few copies exist in B. dichromosomatica As. The micro Bs share sequences with the As, the large Bs and have the B-specific repeats Bdm29 and Bdm54. It seems that the large and micro Bs are related in origin. It is very unlikely that the Bs originated by simple excision from the As. Rye Bs are composed of sequences predominantly shared with the As. B-specific sequences are located at the heterochromatic end of the long arm. Probably, they originated from the As after many rearrangements, with a tendency for duplication. The E3900 family derives from a Ty3 gypsy retrotransposon, but the D1100 family shows no evidence of genic origin. The overall composition of maize As and Bs is similar suggesting a common origin. Several B-specific sequences have been found, the most studied being pZmBs, which is located at the B centromere. It shows partial homology to the centromere of chromosome 4 and to the knobs. It is not known whether the B centromere derives from centromere 4, or whether both have a more distant common origin. The dynamics of Bs in populations depends on their non-Mendelian mechanisms of transmission, their effects on carrier fitness and on A genes modulating their parasitic properties. Three representative examples are reviewed. The Bs of Allium schoenoprasum are transmitted at a mean lower than Mendelian and adversely affect vigour and fertility. However, there is a differential selection operating in favour of B-containing seedlings. Rye Bs undergo strong drive, which is counteracted by harmful effects on fertility and instabilities at meiosis. 
Both nondisjunction and meiotic behaviour, and consequently the establishment of B polymorphisms, mainly depend on the Bs themselves. B nondisjunction in maize is controlled by the B, but the As control preferential fertilisation. Considering the non-equilibrium model, the Bs of Allium seem to have been neutralised by the A genome, the As of maize provide defence against B attack, whereas the Bs of rye are only slightly neutralized. Copyright 2002 S. Karger AG, Basel
CRISPR interference and priming varies with individual spacer sequences

PubMed Central

Xue, Chaoyou; Seetharam, Arun S.; Musharova, Olga; Severinov, Konstantin; Brouns, Stan J. J.; Severin, Andrew J.; Sashital, Dipali G.

2015-01-01

CRISPR–Cas (clustered regularly interspaced short palindromic repeats-CRISPR associated) systems allow bacteria to adapt to infection by acquiring ‘spacer’ sequences from invader DNA into genomic CRISPR loci. Cas proteins use RNAs derived from these loci to target cognate sequences for destruction through CRISPR interference. Mutations in the protospacer adjacent motif (PAM) and seed regions block interference but promote rapid ‘primed’ adaptation. Here, we use multiple spacer sequences to reexamine the PAM and seed sequence requirements for interference and priming in the Escherichia coli Type I-E CRISPR–Cas system. Surprisingly, CRISPR interference is far more tolerant of mutations in the seed and the PAM than previously reported, and this mutational tolerance, as well as priming activity, is highly dependent on spacer sequence. We identify a large number of functional PAMs that can promote interference, priming or both activities, depending on the associated spacer sequence. Functional PAMs are preferentially acquired during unprimed ‘naïve’ adaptation, leading to a rapid priming response following infection. Our results provide numerous insights into the importance of both spacer and target sequences for interference and priming, and reveal that priming is a major pathway for adaptation during initial infection. PMID:26586800
Validation of high-resolution DNA melting analysis for mutation scanning of the CDKL5 gene: identification of novel mutations.

PubMed

Raymond, Laure; Diebold, Bertrand; Leroux, Céline; Maurey, Hélène; Drouin-Garraud, Valérie; Delahaye, Andre; Dulac, Olivier; Metreau, Julia; Melikishvili, Gia; Toutain, Annick; Rivier, François; Bahi-Buisson, Nadia; Bienvenu, Thierry

2013-01-01

Mutations in the cyclin-dependent kinase-like 5 gene (CDKL5) have been predominantly described in epileptic encephalopathies in females, including infantile spasms with Rett-like features. Up to now, detection of mutations in this gene was made by laborious, expensive and/or time consuming methods. Here, we decided to validate high-resolution melting analysis (HRMA) for mutation scanning of the CDKL5 gene. Firstly, using a large DNA bank consisting of 34 samples carrying different mutations and polymorphisms, we validated our analytical conditions to analyse the different exons and flanking intronic sequences of the CDKL5 gene by HRMA. Secondly, we screened CDKL5 by both HRMA and denaturing high performance liquid chromatography (dHPLC) in a cohort of 135 patients with early-onset seizures. Our results showed that point mutations and small insertions and deletions can be reliably detected by HRMA. Compared to dHPLC, HRMA profiles are more discriminated, thereby decreasing unnecessary sequencing. In this study, we identified eleven novel sequence variations including four pathogenic mutations (2.96% prevalence). HRMA appears cost-effective, easy to set up, highly sensitive, non-toxic and rapid for mutation screening, ideally suited for large genes with heterogeneous mutations located along the whole coding sequence, such as the CDKL5 gene. Copyright © 2012 Elsevier B.V. All rights reserved.
Effects of Site-Specific Guanine C8-Modifications on an Intramolecular DNA G-Quadruplex

PubMed Central

Lech, Christopher Jacques; Cheow Lim, Joefina Kim; Wen Lim, Jocelyn Mei; Amrane, Samir; Heddi, Brahim; Phan, Anh Tuân

2011-01-01

Understanding the fundamentals of G-quadruplex formation is important both for targeting G-quadruplexes formed by natural sequences and for engineering new G-quadruplexes with desired properties. Using a combination of experimental and computational techniques, we have investigated the effects of site-specific substitution of a guanine with C8-modified guanine derivatives, including 8-bromo-guanine, 8-O-methyl-guanine, 8-amino-guanine, and 8-oxo-guanine, within a well-defined (3 + 1) human telomeric G-quadruplex platform. The effects of substitutions on the stability of the G-quadruplex were found to depend on the type and position of the modification among different guanines in the structure. An interesting modification-dependent NMR chemical-shift effect was observed across basepairing within a guanine tetrad. This effect was reproduced by ab initio quantum mechanical computations, which showed that the observed variation in imino proton chemical shift is largely influenced by changes in hydrogen-bond geometry within the guanine tetrad. PMID:22004753
Rotational evolution of slow-rotator sequence stars

NASA Astrophysics Data System (ADS)

Lanzafame, A. C.; Spada, F.

2015-12-01

Context. The observed relationship between mass, age and rotation in open clusters shows the progressive development of a slow-rotator sequence among stars possessing a radiative interior and a convective envelope during their pre-main sequence and main-sequence evolution. After 0.6 Gyr, most cluster members of this type have settled on this sequence. Aims: The observed clustering on this sequence suggests that it corresponds to some equilibrium or asymptotic condition that still lacks a complete theoretical interpretation, and which is crucial to our understanding of the stellar angular momentum evolution. Methods: We couple a rotational evolution model, which takes internal differential rotation into account, with classical and new proposals for the wind braking law, and fit models to the data using a Monte Carlo Markov chain (MCMC) method tailored to the problem at hand. We explore to what extent these models are able to reproduce the mass and time dependence of the stellar rotational evolution on the slow-rotator sequence. Results: The description of the evolution of the slow-rotator sequence requires taking the transfer of angular momentum from the radiative core to the convective envelope into account. We find that, in the mass range 0.85-1.10 M⊙, the core-envelope coupling timescale for stars in the slow-rotator sequence scales as M^-7.28. Quasi-solid body rotation is achieved only after 1-2 Gyr, depending on stellar mass, which implies that observing small deviations from the Skumanich law (P ∝ √t) would require period data of older open clusters than is available to date. The observed evolution in the 0.1-2.5 Gyr age range and in the 0.85-1.10 M⊙ mass range is best reproduced by assuming an empirical mass dependence of the wind angular momentum loss proportional to the convective turnover timescale and to the stellar moment of inertia. 
Period isochrones based on our MCMC fit provide a tool for inferring stellar ages of solar-like main-sequence stars from their mass and rotation period that is largely independent of the wind braking model adopted. These effectively represent gyro-chronology relationships that take the physics of the two-zone model for the stellar angular momentum evolution into account.
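The two scalings quoted in this abstract, the Skumanich law (period proportional to the square root of age) and the steep mass dependence of the core-envelope coupling timescale, can be explored numerically. The sketch below only rescales from a reference value supplied by the caller; the function names and the choice of reference mass are assumptions, and no absolute calibration from the paper is implied.

```python
import math


def skumanich_period(p_ref: float, t_ref: float, t: float) -> float:
    """Rotation period at age t, scaled from (p_ref, t_ref) with P proportional to sqrt(t)."""
    return p_ref * math.sqrt(t / t_ref)


def coupling_timescale_ratio(mass: float, mass_ref: float = 1.0,
                             exponent: float = -7.28) -> float:
    """Coupling timescale relative to a star of mass_ref, using tau proportional to M**exponent.

    The exponent is the value quoted in the abstract for 0.85-1.10 solar masses;
    applying it outside that range is not supported by the source.
    """
    return (mass / mass_ref) ** exponent


# A 0.85 solar-mass star versus a 1.0 solar-mass star.
ratio = coupling_timescale_ratio(0.85)
```

With this exponent a star at the low-mass end of the fitted range couples its core and envelope roughly three times more slowly than a solar-mass star, which illustrates why the time needed to reach quasi-solid body rotation depends on stellar mass.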
Three ingredients for Improved global aftershock forecasts: Tectonic region, time-dependent catalog incompleteness, and inter-sequence variability

USGS Publications Warehouse

Page, Morgan T.; Van Der Elst, Nicholas; Hardebeck, Jeanne L.; Felzer, Karen; Michael, Andrew J.

2016-01-01

Following a large earthquake, seismic hazard can be orders of magnitude higher than the long‐term average as a result of aftershock triggering. Because of this heightened hazard, emergency managers and the public demand rapid, authoritative, and reliable aftershock forecasts. In the past, U.S. Geological Survey (USGS) aftershock forecasts following large global earthquakes have been released on an ad hoc basis with inconsistent methods, and in some cases aftershock parameters adapted from California. To remedy this, the USGS is currently developing an automated aftershock product based on the Reasenberg and Jones (1989) method that will generate more accurate forecasts. To better capture spatial variations in aftershock productivity and decay, we estimate regional aftershock parameters for sequences within the García et al. (2012) tectonic regions. We find that regional variations for mean aftershock productivity reach almost a factor of 10. We also develop a method to account for the time‐dependent magnitude of completeness following large events in the catalog. In addition to estimating average sequence parameters within regions, we develop an inverse method to estimate the intersequence parameter variability. This allows for a more complete quantification of the forecast uncertainties and Bayesian updating of the forecast as sequence‐specific information becomes available.
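The Reasenberg and Jones (1989) model referred to here describes the rate of aftershocks of magnitude M or larger at time t after a mainshock of magnitude Mm as 10^(a + b(Mm − M)) · (t + c)^(−p). A minimal sketch of a forecast built on that formula follows. The default parameter values are the commonly quoted generic California values and are used purely as an assumption; the regional parameters estimated in the study are not given in the abstract, and the function names are illustrative.

```python
import math


def expected_count(t1, t2, mag, mainshock_mag,
                   a=-1.67, b=0.91, c=0.05, p=1.08):
    """Expected number of aftershocks >= mag between t1 and t2 days after the mainshock.

    Integrates the rate 10**(a + b*(mainshock_mag - mag)) * (t + c)**(-p) in closed form.
    """
    productivity = 10 ** (a + b * (mainshock_mag - mag))
    if math.isclose(p, 1.0):
        decay = math.log((t2 + c) / (t1 + c))
    else:
        decay = ((t2 + c) ** (1 - p) - (t1 + c) ** (1 - p)) / (1 - p)
    return productivity * decay


def prob_at_least_one(t1, t2, mag, mainshock_mag, **params):
    """Poisson probability of one or more aftershocks >= mag in the window."""
    return 1.0 - math.exp(-expected_count(t1, t2, mag, mainshock_mag, **params))


# Chance of an M>=6 aftershock in the first week after an M7 mainshock,
# under the assumed generic parameters.
week_prob = prob_at_least_one(0.0, 7.0, 6.0, 7.0)
```

Swapping in region-specific values of a and p changes the forecast directly, which is the motivation the abstract gives for estimating parameters per tectonic region. Quantifying the variability between sequences would correspond to treating these parameters as distributions rather than fixed numbers.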
Continuous aesthetic judgment of image sequences.

PubMed

Khaw, Mel W; Freedberg, David

2018-05-18

Perceptual judgments are said to be reference-dependent as they change on the basis of recent experiences. Here we quantify sequence effects within two types of aesthetic judgments: (i) individual ratings of single images (during self-paced trials) and (ii) continuous ratings of image sequences. As in the case of known contrast effects, trial-by-trial aesthetic responses are negatively correlated with judgments made toward the preceding image. During continuous judgment, a different type of bias is observed. The onset of change within a sequence introduces a persistent increase in ratings (relative to when the same images are judged in isolation). Furthermore, subjects indicate adjustment patterns and choices that selectively favor sequences that are rich in change. Sequence effects in aesthetic judgments thus differ greatly depending on the continuity and arrangement of presented stimuli. The effects highlighted here are important in understanding sustained aesthetic responses over time, such as those elicited during choreographic and musical arrangements. In contrast, standard measurements of aesthetic responses (over trials) may represent a series of distinct aesthetic experiences (e.g., viewing artworks in a museum). Copyright © 2018 Elsevier B.V. All rights reserved.
A model of human motor sequence learning explains facilitation and interference effects based on spike-timing dependent plasticity.

PubMed

Wang, Quan; Rothkopf, Constantin A; Triesch, Jochen

2017-08-01

The ability to learn sequential behaviors is a fundamental property of our brains. Yet a long stream of studies including recent experiments investigating motor sequence learning in adult human subjects have produced a number of puzzling and seemingly contradictory results. In particular, when subjects have to learn multiple action sequences, learning is sometimes impaired by proactive and retroactive interference effects. In other situations, however, learning is accelerated as reflected in facilitation and transfer effects. At present it is unclear what the underlying neural mechanisms are that give rise to these diverse findings. Here we show that a recently developed recurrent neural network model readily reproduces this diverse set of findings. The self-organizing recurrent neural network (SORN) model is a network of recurrently connected threshold units that combines a simplified form of spike-timing dependent plasticity (STDP) with homeostatic plasticity mechanisms ensuring network stability, namely intrinsic plasticity (IP) and synaptic normalization (SN). When trained on sequence learning tasks modeled after recent experiments we find that it reproduces the full range of interference, facilitation, and transfer effects. We show how these effects are rooted in the network's changing internal representation of the different sequences across learning and how they depend on an interaction of training schedule and task similarity. Furthermore, since learning in the model is based on fundamental neuronal plasticity mechanisms, the model reveals how these plasticity mechanisms are ultimately responsible for the network's sequence learning abilities. In particular, we find that all three plasticity mechanisms are essential for the network to learn effective internal models of the different training sequences. This ability to form effective internal models is also the basis for the observed interference and facilitation effects. 
This suggests that STDP, IP, and SN may be the driving forces behind our ability to learn complex action sequences.
Deriving high-resolution protein backbone structure propensities from all crystal data using the information maximization device.

PubMed

Solis, Armando D

2014-01-01

The most informative probability distribution functions (PDFs) describing the Ramachandran phi-psi dihedral angle pair, a fundamental descriptor of backbone conformation of protein molecules, are derived from high-resolution X-ray crystal structures using an information-theoretic approach. The Information Maximization Device (IMD) is established, based on fundamental information-theoretic concepts, and then applied specifically to derive highly resolved phi-psi maps for all 20 single amino acid and all 8000 triplet sequences at an optimal resolution determined by the volume of current data. The paper shows that utilizing the latent information contained in all viable high-resolution crystal structures found in the Protein Data Bank (PDB), totaling more than 77,000 chains, permits the derivation of a large number of optimized sequence-dependent PDFs. This work demonstrates the effectiveness of the IMD and the superiority of the resulting PDFs by extensive fold recognition experiments and rigorous comparisons with previously published triplet PDFs. Because it automatically optimizes PDFs, IMD results in improved performance of knowledge-based potentials, which rely on such PDFs. Furthermore, it provides an easy computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision. The high-resolution phi-psi maps derived in this work are available for download.
TESTING SCALING RELATIONS FOR SOLAR-LIKE OSCILLATIONS FROM THE MAIN SEQUENCE TO RED GIANTS USING KEPLER DATA

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huber, D.; Bedding, T. R.; Stello, D.

2011-12-20

We have analyzed solar-like oscillations in ~1700 stars observed by the Kepler Mission, spanning from the main sequence to the red clump. Using evolutionary models, we test asteroseismic scaling relations for the frequency of maximum power (ν_max), the large frequency separation (Δν), and oscillation amplitudes. We show that the difference of the Δν-ν_max relation for unevolved and evolved stars can be explained by different distributions in effective temperature and stellar mass, in agreement with what is expected from scaling relations. For oscillation amplitudes, we show that neither (L/M)^s scaling nor the revised scaling relation by Kjeldsen and Bedding is accurate for red-giant stars, and demonstrate that a revised scaling relation with a separate luminosity-mass dependence can be used to calculate amplitudes from the main sequence to red giants to a precision of ~25%. The residuals show an offset particularly for unevolved stars, suggesting that an additional physical dependency is necessary to fully reproduce the observed amplitudes. We investigate correlations between amplitudes and stellar activity, and find evidence that the effect of amplitude suppression is most pronounced for subgiant stars. Finally, we test the location of the cool edge of the instability strip in the Hertzsprung-Russell diagram using solar-like oscillations and find the detections in the hottest stars compatible with a domain of hybrid stochastically excited and opacity driven pulsation.
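For readers unfamiliar with the relations being tested, the following is a minimal sketch of the commonly quoted asteroseismic scaling relations, Δν ∝ M^(1/2) R^(-3/2) and ν_max ∝ M R^(-2) Teff^(-1/2). The solar reference values, the function name, and the example star are assumptions chosen for illustration; they are not taken from this record.

```python
# Solar reference values: assumed here for illustration only.
DNU_SUN = 135.1      # large frequency separation, microhertz
NUMAX_SUN = 3090.0   # frequency of maximum power, microhertz
TEFF_SUN = 5777.0    # effective temperature, kelvin


def scaling_relations(mass, radius, teff):
    """Return (delta_nu, nu_max) in microhertz.

    mass and radius are in solar units, teff is in kelvin.
    """
    delta_nu = DNU_SUN * mass ** 0.5 * radius ** -1.5
    nu_max = NUMAX_SUN * mass * radius ** -2 * (teff / TEFF_SUN) ** -0.5
    return delta_nu, nu_max


# Hypothetical red giant: 1.2 solar masses, 10 solar radii, 4800 K.
dnu, numax = scaling_relations(1.2, 10.0, 4800.0)
print(f"delta_nu = {dnu:.2f} uHz, nu_max = {numax:.1f} uHz")
```

Both quantities drop steeply as the radius grows, which is why evolved stars oscillate at much lower frequencies than main-sequence stars.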
Chromatin accessibility and guide sequence secondary structure affect CRISPR-Cas9 gene editing efficiency.

PubMed

Jensen, Kristopher Torp; Fløe, Lasse; Petersen, Trine Skov; Huang, Jinrong; Xu, Fengping; Bolund, Lars; Luo, Yonglun; Lin, Lin

2017-07-01

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated protein 9 (CRISPR-Cas9) systems have emerged as the method of choice for genome editing, but large variations in on-target efficiencies continue to limit their applicability. Here, we investigate the effect of chromatin accessibility on Cas9-mediated gene editing efficiency for 20 gRNAs targeting 10 genomic loci in HEK293T cells using both SpCas9 and the eSpCas9(1.1) variant. Our study indicates that gene editing is more efficient in euchromatin than in heterochromatin, and we validate this finding in HeLa cells and in human fibroblasts. Furthermore, we investigate the gRNA sequence determinants of CRISPR-Cas9 activity using a surrogate reporter system and find that the efficiency of Cas9-mediated gene editing is dependent on guide sequence secondary structure formation. This knowledge can aid in the further improvement of tools for gRNA design. © 2017 Federation of European Biochemical Societies.

An efficient study design to test parent-of-origin effects in family trios.

PubMed

Yu, Xiaobo; Chen, Gao; Feng, Rui

2017-11-01

Increasing evidence has shown that genes may cause prenatal, neonatal, and pediatric diseases depending on their parental origins. Statistical models that incorporate parent-of-origin effects (POEs) can improve the power of detecting disease-associated genes and help explain the missing heritability of diseases. In many studies, children have been sequenced for genome-wide association testing, but it may be unaffordable to also sequence their parents and evaluate POEs. Motivated by this reality, we proposed a budget-friendly study design of sequencing children and only genotyping their parents through single nucleotide polymorphism array. We developed a powerful likelihood-based method, which takes into account both sequence reads and linkage disequilibrium to infer the parental origins of children's alleles and estimate their POEs on the outcome. We evaluated the performance of our proposed method and compared it with an existing method using only genotypes, through extensive simulations. Our method showed higher power than the genotype-based method. When either the mean read depth or the pair-end length was reasonably large, our method achieved ideal power. When single parents' genotypes were unavailable or parental genotypes at the testing locus were not typed, both methods lost power compared with when complete data were available, but the power loss from our method was smaller than that from the genotype-based method. We also extended our method to accommodate mixed genotype, low-, and high-coverage sequence data from children and their parents. In the presence of sequence errors, low-coverage parental sequence data may lead to lower power than parental genotype data. © 2017 WILEY PERIODICALS, INC.
Conditional Probabilities of Large Earthquake Sequences in California from the Physics-based Rupture Simulator RSQSim

NASA Astrophysics Data System (ADS)

Gilchrist, J. J.; Jordan, T. H.; Shaw, B. E.; Milner, K. R.; Richards-Dinger, K. B.; Dieterich, J. H.

2017-12-01

Within the SCEC Collaboratory for Interseismic Simulation and Modeling (CISM), we are developing physics-based forecasting models for earthquake ruptures in California. We employ the 3D boundary element code RSQSim (Rate-State Earthquake Simulator of Dieterich & Richards-Dinger, 2010) to generate synthetic catalogs with tens of millions of events that span up to a million years each. This code models rupture nucleation by rate- and state-dependent friction and Coulomb stress transfer in complex, fully interacting fault systems. The Uniform California Earthquake Rupture Forecast Version 3 (UCERF3) fault and deformation models are used to specify the fault geometry and long-term slip rates. We have employed the Blue Waters supercomputer to generate long catalogs of simulated California seismicity from which we calculate the forecasting statistics for large events. We have performed probabilistic seismic hazard analysis with RSQSim catalogs that were calibrated with system-wide parameters and found a remarkably good agreement with UCERF3 (Milner et al., this meeting). We build on this analysis, comparing the conditional probabilities of sequences of large events from RSQSim and UCERF3. In making these comparisons, we consider the epistemic uncertainties associated with the RSQSim parameters (e.g., rate- and state-frictional parameters), as well as the effects of model-tuning (e.g., adjusting the RSQSim parameters to match UCERF3 recurrence rates). The comparisons illustrate how physics-based rupture simulators might assist forecasters in understanding the short-term hazards of large aftershocks and multi-event sequences associated with complex, multi-fault ruptures.
Complete convergence of randomly weighted END sequences and its application.

PubMed

Li, Penghua; Li, Xiaoqin; Wu, Kehan

2017-01-01

We investigate the complete convergence of partial sums of randomly weighted extended negatively dependent (END) random variables. Some results of complete moment convergence, complete convergence and the strong law of large numbers for this dependent structure are obtained. As an application, we study the convergence of the state observers of linear-time-invariant systems. Our results extend the corresponding earlier ones.
A General Conditional Large Deviation Principle

DOE PAGES

La Cour, Brian R.; Schieve, William C.

2015-07-18

Given a sequence of Borel probability measures on a Hausdorff space which satisfy a large deviation principle (LDP), we consider the corresponding sequence of measures formed by conditioning on a set B. If the large deviation rate function I is good and effectively continuous, and the conditioning set has the property that (1) $\overline{B^\circ} = \overline{B}$ and (2) $I(x) < \infty$ for all $x \in \overline{B}$, then the sequence of conditional measures satisfies an LDP with the good, effectively continuous rate function $I_B$, where $I_B(x) = I(x) - \inf I(B)$ if $x \in \overline{B}$ and $I_B(x) = \infty$ otherwise.
Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes

PubMed Central

Shringarpure, Suyash S.; Carroll, Andrew; De La Vega, Francisco M.; Bustamante, Carlos D.

2015-01-01

Population scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remains a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in a parallel manner, the data was processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future. PMID:26110529
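The per-sample figure quoted above can be checked directly from the totals in the abstract. In this short sketch, the per-sample cost over all jobs and the share of spending on jobs that did not complete are derived quantities, not values reported in the record.

```python
samples = 2535
total_completed = 18590.0  # USD, completed jobs (from the abstract)
total_all_jobs = 21805.0   # USD, all jobs (from the abstract)

# Per-sample cost for completed jobs; the abstract reports $7.33.
per_sample_completed = total_completed / samples

# Derived here, not reported in the abstract.
per_sample_all = total_all_jobs / samples
overhead_fraction = (total_all_jobs - total_completed) / total_all_jobs

print(f"completed jobs: ${per_sample_completed:.2f} per sample")
print(f"all jobs:       ${per_sample_all:.2f} per sample")
print(f"share spent on non-completed jobs: {overhead_fraction:.1%}")
```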
Pooled-DNA Sequencing for Elucidating New Genomic Risk Factors, Rare Variants Underlying Alzheimer's Disease.

PubMed

Jin, Sheng Chih; Benitez, Bruno A; Deming, Yuetiva; Cruchaga, Carlos

2016-01-01

Analyses of genome-wide association studies (GWAS) for complex disorders usually identify common variants with a relatively small effect size that only explain a small proportion of phenotypic heritability. Several studies have suggested that a significant fraction of heritability may be explained by low-frequency (minor allele frequency (MAF) of 1-5 %) and rare variants that are not contained in the commercial GWAS genotyping arrays (Schork et al., Curr Opin Genet Dev 19:212, 2009). Rare variants can also have relatively large effects on risk for developing human diseases or disease phenotype (Cruchaga et al., PLoS One 7:e31039, 2012). However, it is necessary to perform next-generation sequencing (NGS) studies in a large population (>4,000 samples) to detect a significant rare-variant association. Several NGS methods, such as custom capture sequencing and amplicon-based sequencing, are designed to screen a small proportion of the genome, but most of these methods are limited in the number of samples that can be multiplexed (i.e., most sequencing kits only provide 96 distinct indexes). Additionally, the sequencing library preparation for 4,000 samples remains expensive, and thus conducting NGS studies with the aforementioned methods is not feasible for most research laboratories. The need for low-cost large scale rare-variant detection makes pooled-DNA sequencing an ideally efficient and cost-effective technique to identify rare variants in target regions by sequencing hundreds to thousands of samples. Our recent work has demonstrated that pooled-DNA sequencing can accurately detect rare variants in targeted regions in multiple DNA samples with high sensitivity and specificity (Jin et al., Alzheimers Res Ther 4:34, 2012).
In these studies we used a well-established pooled-DNA sequencing approach and a computational package, SPLINTER (short indel prediction by large deviation inference and nonlinear true frequency estimation by recursion) (Vallania et al., Genome Res 20:1711, 2010), for accurate identification of rare variants in large DNA pools. Given an average sequencing coverage of 30× per haploid genome, SPLINTER can detect rare variants and short indels up to 4 base pairs (bp) with high sensitivity and specificity (up to 1 haploid allele in a pool as large as 500 individuals). Step-by-step instructions on how to conduct pooled-DNA sequencing experiments and data analyses are described in this chapter.
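The detection limit quoted above can be turned into rough numbers. The sketch below assumes diploid individuals and perfectly even coverage, neither of which is stated in the abstract, and it only computes expected read counts. It does not model SPLINTER itself.

```python
individuals = 500          # pool size quoted in the abstract
ploidy = 2                 # assumption: diploid samples
coverage_per_haploid = 30  # 30x per haploid genome, from the abstract

haploid_genomes = individuals * ploidy
# Frequency of a variant carried by a single haploid allele in the pool.
min_allele_freq = 1 / haploid_genomes
# Total read depth needed at each position across the whole pool.
total_depth = coverage_per_haploid * haploid_genomes
# Expected number of reads supporting that single-allele variant.
expected_variant_reads = total_depth * min_allele_freq

print(f"haploid genomes in pool: {haploid_genomes}")
print(f"minimum allele frequency: {min_allele_freq:.4f}")
print(f"total depth per position: {total_depth}x")
print(f"expected variant reads:   {expected_variant_reads:.0f}")
```

Under these assumptions a singleton variant is supported by about as many reads as the per-haploid coverage, which is why the coverage requirement is stated per haploid genome rather than per pool.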
A novel gammaherpesvirus in a large flying fox (Pteropus vampyrus) with blepharitis.

PubMed

Paige Brock, A; Cortés-Hinojosa, Galaxia; Plummer, Caryn E; Conway, Julia A; Roff, Shannon R; Childress, April L; Wellehan, James F X

2013-05-01

A novel gammaherpesvirus was identified in a large flying fox (Pteropus vampyrus) with conjunctivitis, blepharitis, and meibomianitis by nested polymerase chain reaction and sequencing. Polymerase chain reaction amplification and sequencing of 472 base pairs of the DNA-dependent DNA polymerase gene were used to identify a novel herpesvirus. Bayesian and maximum likelihood phylogenetic analyses indicated that the virus is a member of the genus Percavirus in the subfamily Gammaherpesvirinae. Additional research is needed regarding the association of this virus with conjunctivitis and other ocular pathology. This virus may be useful as a biomarker of stress and may be a useful model of virus recrudescence in Pteropus spp.
Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference

PubMed Central

Karcher, Michael D.; Palacios, Julia A.; Bedford, Trevor; Suchard, Marc A.; Minin, Vladimir N.

2016-01-01

Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals’ genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples. PMID:26938243
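To make the idea of preferential sampling concrete, here is a minimal sketch that draws sampling times from an inhomogeneous Poisson process by thinning. The intensity is assumed to be proportional to effective population size, and the seasonal trajectory, the constant `beta`, and all function names are hypothetical. The functional form used in the paper may differ.

```python
import math
import random


def effective_pop_size(t):
    """Hypothetical seasonal trajectory N_e(t); not from the paper."""
    return 100.0 + 80.0 * math.sin(2.0 * math.pi * t)


def sample_times(beta, t_end, n_max, rng):
    """Draw event times on [0, t_end] with intensity beta * N_e(t).

    Uses thinning: propose from a homogeneous process with rate
    beta * n_max, then keep each proposal with probability
    N_e(t) / n_max. n_max must be an upper bound on N_e(t).
    """
    lam_max = beta * n_max
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(lam_max)
        if t > t_end:
            return times
        if rng.random() < beta * effective_pop_size(t) / lam_max:
            times.append(t)


rng = random.Random(0)
times = sample_times(beta=0.5, t_end=2.0, n_max=180.0, rng=rng)

# Samples concentrate where the population is large (first half of each cycle).
high = sum(1 for t in times if (t % 1.0) < 0.5)
low = len(times) - high
print(f"{len(times)} samples: {high} in high-N_e half-cycles, {low} in low")
```

Because sampling times cluster where the population is large, the times themselves carry information about population size, which is the signal the proposed model exploits.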
Complex Sequencing Rules of Birdsong Can be Explained by Simple Hidden Markov Processes

PubMed Central

Katahira, Kentaro; Suzuki, Kenta; Okanoya, Kazuo; Okada, Masato

2011-01-01

Complex sequencing rules observed in birdsongs provide an opportunity to investigate the neural mechanism for generating complex sequential behaviors. To relate the findings from studying birdsongs to other sequential behaviors such as human speech and musical performance, it is crucial to characterize the statistical properties of the sequencing rules in birdsongs. However, the properties of the sequencing rules in birdsongs have not yet been fully addressed. In this study, we investigate the statistical properties of the complex birdsong of the Bengalese finch (Lonchura striata var. domestica). Based on manually annotated syllable labels, we first show that there are significant higher-order context dependencies in Bengalese finch songs, that is, which syllable appears next depends on more than one previous syllable. We then analyze acoustic features of the song and show that higher-order context dependencies can be explained using first-order hidden state transition dynamics with redundant hidden states. This model corresponds to hidden Markov models (HMMs), well known statistical models with a large range of application for time series modeling. The song annotation with these models with first-order hidden state dynamics agreed well with manual annotation, the score was comparable to that of a second-order HMM, and surpassed the zeroth-order model (the Gaussian mixture model; GMM), which does not use context information. Our results imply that the hierarchical representation with hidden state dynamics may underlie the neural implementation for generating complex behavioral sequences with higher-order dependencies. PMID:21915345
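The central claim, that first-order hidden dynamics with redundant states can produce higher-order dependencies in the observed syllables, can be illustrated with a toy example. The song below is hypothetical and deterministic for clarity. It is not the model fitted in the paper: two hidden states, `a1` and `a2`, both emit the syllable "a".

```python
from collections import Counter, defaultdict

# Hypothetical first-order transitions between hidden states.
NEXT_STATE = {"a1": "b", "b": "a2", "a2": "c", "c": "a1"}
# Redundant hidden states: a1 and a2 emit the same syllable.
EMIT = {"a1": "a", "a2": "a", "b": "b", "c": "c"}


def sing(n, state="a1"):
    """Generate n syllables by walking the hidden-state chain."""
    out = []
    for _ in range(n):
        out.append(EMIT[state])
        state = NEXT_STATE[state]
    return out


song = sing(40)  # a b a c a b a c ...

first_order = defaultdict(Counter)   # next syllable given one previous
second_order = defaultdict(Counter)  # next syllable given two previous
for i in range(2, len(song)):
    first_order[song[i - 1]][song[i]] += 1
    second_order[(song[i - 2], song[i - 1])][song[i]] += 1

print("after 'a':     ", dict(first_order["a"]))
print("after 'b','a': ", dict(second_order[("b", "a")]))
print("after 'c','a': ", dict(second_order[("c", "a")]))
```

Looking only one syllable back, "a" is followed by either "b" or "c". Looking two syllables back removes the ambiguity, even though each hidden state depends only on its immediate predecessor.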
A Rapid, Extensive, and Transient Transcriptional Response to Estrogen Signaling in Breast Cancer Cells

PubMed Central

Hah, Nasun; Danko, Charles G.; Core, Leighton; Waterfall, Joshua J.; Siepel, Adam; Lis, John T.; Kraus, W. Lee

2011-01-01

We report the immediate effects of estrogen signaling on the transcriptome of breast cancer cells using Global Run-On and sequencing (GRO-seq). The data were analyzed using a new bioinformatic approach that allowed us to identify transcripts directly from the GRO-seq data. We found that estrogen signaling directly regulates a strikingly large fraction of the transcriptome in a rapid, robust, and unexpectedly transient manner. In addition to protein coding genes, estrogen regulates the distribution and activity of all three RNA polymerases, and virtually every class of non-coding RNA that has been described to date. We also identified a large number of previously undetected estrogen-regulated intergenic transcripts, many of which are found proximal to estrogen receptor binding sites. Collectively, our results provide the most comprehensive measurement of the primary and immediate estrogen effects to date and a resource for understanding rapid signal-dependent transcription in other systems. PMID:21549415
Microscale simulations of shock interaction with large assembly of particles for developing point-particle models

NASA Astrophysics Data System (ADS)

Thakur, Siddharth; Neal, Chris; Mehta, Yash; Sridharan, Prasanth; Jackson, Thomas; Balachandar, S.

2017-01-01

Microscale simulations are being conducted for developing point-particle and other related models that are needed for the mesoscale and macroscale simulations of explosive dispersal of particles. These particle models are required to compute (a) instantaneous aerodynamic force on the particle and (b) instantaneous net heat transfer between the particle and the surroundings. A strategy for a sequence of microscale simulations has been devised that allows systematic development of the hybrid surrogate models that are applicable at conditions representative of the explosive dispersal application. The ongoing microscale simulations seek to examine particle force dependence on: (a) Mach number, (b) Reynolds number, and (c) volume fraction (different particle arrangements such as cubic, face-centered cubic (FCC), body-centered cubic (BCC) and random). Future plans include investigation of sequences of fully-resolved microscale simulations consisting of an array of particles subjected to more realistic time-dependent flows that progressively better approximate the actual problem of explosive dispersal. Additionally, the effects of particle shape, size, and number in the simulation are being investigated, as is the dependence of transient particle deformation on various parameters including: (a) particle material, (b) medium material, (c) multiple particles, (d) incoming shock pressure and speed, (e) medium to particle impedance ratio, (f) particle shape and orientation to shock, etc.
Development of self-compressing BLSOM for comprehensive analysis of big sequence data.

PubMed

Kikuchi, Akihito; Ikemura, Toshimichi; Abe, Takashi

2015-01-01

With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely dependent on oligonucleotide composition and applied to genome and metagenomic studies. BLSOM is suitable for high-performance parallel-computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method's suitability for efficient knowledge discovery from big sequence data.
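The input to a BLSOM is a vector of oligonucleotide frequencies for each genomic fragment. The following is a minimal sketch of how such a composition vector might be computed. The function name, the choice of k, and the decision not to merge reverse-complement pairs are assumptions for illustration and may not match the authors' preprocessing.

```python
from itertools import product


def kmer_composition(seq, k=4):
    """Return normalized k-mer frequencies in fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


# Dinucleotide composition of a short hypothetical fragment.
vec = kmer_composition("ACGTACGTTTGACCA", k=2)
print(len(vec), "dimensions; frequencies sum to", round(sum(vec), 6))
```

Each fragment becomes a point in a 4^k-dimensional space, and the map is trained on these points without using any alignment or phylogenetic labels.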
Sleep and memory consolidation: motor performance and proactive interference effects in sequence learning.

PubMed

Borragán, Guillermo; Urbain, Charline; Schmitz, Rémy; Mary, Alison; Peigneux, Philippe

2015-04-01

That post-training sleep supports the consolidation of sequential motor skills remains debated. Performance improvement and sensitivity to proactive interference are both putative measures of long-term memory consolidation. We tested sleep-dependent memory consolidation for visuo-motor sequence learning using a proactive interference paradigm. Thirty-three young adults were trained on sequence A on Day 1, then had Regular Sleep (RS) or were Sleep Deprived (SD) on the night after learning. After two recovery nights, they were tested on the same sequence A, then had to learn a novel, potentially competing sequence B. We hypothesized that proactive interference effects on sequence B due to the prior learning of sequence A would be higher in the RS condition, considering that proactive interference is an indirect marker of the robustness of sequence A, which should be better consolidated over post-training sleep. Results highlighted sleep-dependent improvement for sequence A, with faster RTs overnight for RS participants only. Moreover, the beneficial impact of sleep was specific to the consolidation of motor but not sequential skills. Proactive interference effects on learning a new material at Day 4 were similar between RS and SD participants. These results suggest that post-training sleep contributes to optimizing motor but not sequential components of performance in visuo-motor sequence learning. Copyright © 2015 Elsevier Inc. All rights reserved.
Characterization of Aftershock Sequences from Large Strike-Slip Earthquakes Along Geometrically Complex Faults

NASA Astrophysics Data System (ADS)

Sexton, E.; Thomas, A.; Delbridge, B. G.

2017-12-01

Large earthquakes often exhibit complex slip distributions and occur along non-planar fault geometries, resulting in variable stress changes throughout the region of the fault hosting aftershocks. To better discern the role of geometric discontinuities on aftershock sequences, we compare areas of enhanced and reduced Coulomb failure stress and mean stress for systematic differences in the time dependence and productivity of these aftershock sequences. In strike-slip faults, releasing structures, including stepovers and bends, experience an increase in both Coulomb failure stress and mean stress during an earthquake, promoting fluid diffusion into the region and further failure. Conversely, Coulomb failure stress and mean stress decrease in restraining bends and stepovers in strike-slip faults, and fluids diffuse away from these areas, discouraging failure. We examine spatial differences in seismicity patterns along structurally complex strike-slip faults which have hosted large earthquakes, such as the 1992 Mw 7.3 Landers, the 2010 Mw 7.2 El-Mayor Cucapah, the 2014 Mw 6.0 South Napa, and the 2016 Mw 7.0 Kumamoto events. We characterize the behavior of these aftershock sequences with the Epidemic Type Aftershock-Sequence Model (ETAS). In this statistical model, the total occurrence rate of aftershocks induced by an earthquake is λ(t) = λ_0 + \\sum_{i:t_i
Cold shock protein YB-1 is involved in hypoxia-dependent gene transcription

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rauen, Thomas; Frye, Bjoern C.; Pneumology, University Medical Center, University of Freiburg, Freiburg

Hypoxia-dependent gene regulation is largely orchestrated by hypoxia-inducible factors (HIFs), which associate with defined nucleotide sequences of hypoxia-responsive elements (HREs). Comparison of the regulatory HRE within the 3′ enhancer of the human erythropoietin (EPO) gene with known binding motifs for cold shock protein Y-box (YB) protein-1 yielded strong similarities within the Y-box element and 3′ adjacent sequences. DNA binding assays confirmed YB-1 binding to both single- and double-stranded HRE templates. Under hypoxia, we observed nuclear shuttling of YB-1, and co-immunoprecipitation assays demonstrated that YB-1 and HIF-1α physically interact with each other. Cellular YB-1 depletion using siRNA significantly induced hypoxia-dependent EPO production at both promoter and mRNA level. Vice versa, overexpressed YB-1 significantly reduced EPO-HRE-dependent gene transcription, whereas this effect was minor under normoxia. HIF-1α overexpression induced hypoxia-dependent gene transcription through the same element and accordingly, co-expression with YB-1 reduced HIF-1α-mediated EPO induction under hypoxic conditions. Taken together, we identified YB-1 as a novel binding factor for HREs that participates in fine-tuning of the hypoxia transcriptome. Highlights: • Hypoxia drives nuclear translocation of cold shock protein YB-1. • YB-1 physically interacts with hypoxia-inducible factor (HIF)-1α. • YB-1 binds to the hypoxia-responsive element (HRE) within the erythropoietin (EPO) 3′ enhancer. • YB-1 trans-regulates transcription of hypoxia-dependent genes such as EPO and VEGF.
Parrondo Games with Two-Dimensional Spatial Dependence

NASA Astrophysics Data System (ADS)

Ethier, S. N.; Lee, Jiyeon

Parrondo games with one-dimensional (1D) spatial dependence were introduced by Toral and extended to the two-dimensional (2D) setting by Mihailović and Rajković. MN players are arranged in an M × N array. There are three games, the fair, spatially independent game A, the spatially dependent game B, and game C, which is a random mixture or non-random pattern of games A and B. Of interest is μB (or μC), the mean profit per turn at equilibrium to the set of MN players playing game B (or game C). Game A is fair, so if μB ≤ 0 and μC > 0, then we say the Parrondo effect is present. We obtain a strong law of large numbers (SLLN) and a central limit theorem (CLT) for the sequence of profits of the set of MN players playing game B (or game C). The mean and variance parameters are computable for small arrays and can be simulated otherwise. The SLLN justifies the use of simulation to estimate the mean. The CLT permits evaluation of the standard error of a simulated estimate. We investigate the presence of the Parrondo effect for both small arrays and large ones. One of the findings of Mihailović and Rajković was that “capital evolution depends to a large degree on the lattice size.” We provide evidence that this conclusion is partly incorrect. A paradoxical feature of the 2D game B that does not appear in the 1D setting is that, for fixed M and N, the mean function μB is not necessarily a monotone function of its parameters.
Study designs for identification of rare disease variants in complex diseases: the utility of family-based designs.

PubMed

Ionita-Laza, Iuliana; Ottman, Ruth

2011-11-01

The recent progress in sequencing technologies makes possible large-scale medical sequencing efforts to assess the importance of rare variants in complex diseases. The results of such efforts depend heavily on the use of efficient study designs and analytical methods. We introduce here a unified framework for association testing of rare variants in family-based designs or designs based on unselected affected individuals. This framework allows us to quantify the enrichment in rare disease variants in families containing multiple affected individuals and to investigate the optimal design of studies aiming to identify rare disease variants in complex traits. We show that for many complex diseases with small values for the overall sibling recurrence risk ratio, such as Alzheimer's disease and most cancers, sequencing affected individuals with a positive family history of the disease can be extremely advantageous for identifying rare disease variants. In contrast, for complex diseases with large values of the sibling recurrence risk ratio, sequencing unselected affected individuals may be preferable.
Evolutionary distances in the twilight zone--a rational kernel approach.

PubMed

Schwarz, Roland F; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian

2010-12-31

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
The interstellar reddening law in the ultraviolet deduced from filter photometry obtained by the OAO-2 satellite

NASA Technical Reports Server (NTRS)

Laget, M.

1972-01-01

Filter photometry has been obtained of 16 BO stars at ten effective wavelengths in the range 4250-1430 A. The wavelength dependence of the interstellar reddening law, deduced from a least squares fit of the observed values to the reddening line at each band, is found in satisfactory agreement with that derived by Bless and Savage (1972). Toward the shorter wavelengths the increase of the computed probable error of the slope of the mean reddening line suggests that large fluctuations in the law may occur from star to star. Similar computations, separating main-sequence stars and supergiants, indicate that the large fluctuations of the law appear to be well related to the luminosity of the stars; the supergiants show systematically less extinction, this deficiency becoming large toward the far UV. The small number in the sample however, does not allow a general conclusion to be drawn.
XS: a FASTQ read simulator.

PubMed

Pratas, Diogo; Pinho, Armando J; Rodrigues, João M O S

2014-01-16

The emerging next-generation sequencing (NGS) is bringing, besides the natural huge amounts of data, an avalanche of new specialized tools (for analysis, compression, alignment, among others) and large public and private network infrastructures. Therefore, a direct necessity of specific simulation tools for testing and benchmarking is rising, such as a flexible and portable FASTQ read simulator, without the need of a reference sequence, yet correctly prepared for producing approximately the same characteristics as real data. We present XS, a skilled FASTQ read simulation tool, flexible, portable (does not need a reference sequence) and tunable in terms of sequence complexity. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing of large-scale projects, and testing FASTQ compression algorithms. Moreover, XS offers the possibility of simulating the three main FASTQ components individually (headers, DNA sequences and quality-scores). XS provides an efficient and convenient method for fast simulation of FASTQ files, such as those from Ion Torrent (currently uncovered by other simulators), Roche-454, Illumina and ABI-SOLiD sequencing machines. This tool is publicly available at http://bioinformatics.ua.pt/software/xs/.
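
As a rough illustration of what reference-free FASTQ simulation involves, the sketch below generates the three FASTQ components named in the abstract (header, DNA sequence, quality scores) independently of one another. It is not the XS implementation: the function name `simulate_fastq`, the uniform base sampling, and the Phred+33 quality range are all assumptions made for this example.

```python
import random


def simulate_fastq(n_reads, read_len, seed=0, prefix="SIM"):
    """Return a list of 4-line FASTQ records (hypothetical helper, not part of XS)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    records = []
    for i in range(n_reads):
        # Component 1: header line, here just a running read index.
        header = f"@{prefix}.{i + 1}"
        # Component 2: DNA sequence drawn uniformly from A, C, G, T (an assumption).
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        # Component 3: Phred+33 qualities in the range 2..40, i.e. ASCII '#'..'I'.
        qual = "".join(chr(33 + rng.randint(2, 40)) for _ in range(read_len))
        records.append("\n".join([header, seq, "+", qual]))
    return records


if __name__ == "__main__":
    print("\n".join(simulate_fastq(2, 20)))
```

A real simulator would replace the uniform sampling with models fitted to each sequencing platform; the point here is only that no reference sequence is needed to emit syntactically valid records.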

Transfer of movement sequences: bigger is better.

PubMed

Dean, Noah J; Kovacs, Attila J; Shea, Charles H

2008-02-01

Experiment 1 was conducted to determine if proportional transfer from "small to large" scale movements is as effective as transferring from "large to small." We hypothesize that the learning of larger scale movement will require the participant to learn to manage the generation, storage, and dissipation of forces better than when practicing smaller scale movements. Thus, we predict an advantage for transfer of larger scale movements to smaller scale movements relative to transfer from smaller to larger scale movements. Experiment 2 was conducted to determine if adding a load to a smaller scale movement would enhance later transfer to a larger scale movement sequence. It was hypothesized that the added load would require the participants to consider the dynamics of the movement to a greater extent than without the load. The results replicated earlier findings of effective transfer from large to small movements, but consistent with our hypothesis, transfer was less effective from small to large (Experiment 1). However, when a load was added during acquisition, transfer from small to large was enhanced even though the load was removed during the transfer test. These results are consistent with the notion that the transfer asymmetry noted in Experiment 1 was due to factors related to movement dynamics that were enhanced during practice of the larger scale movement sequence, but not during the practice of the smaller scale movement sequence. The finding that the movement structure is unaffected by transfer direction but the movement dynamics are influenced by transfer direction is consistent with hierarchical models of sequence production.
An Evolutionary/Biochemical Connection Between Promoter- and Primer-Dependent Polymerases Revealed by Selective Evolution of Ligands by Exponential Enrichment (SELEX).

PubMed

Fenstermacher, Katherine J; Achuthan, Vasudevan; Schneider, Thomas D; DeStefano, Jeffrey J

2018-01-16

DNA polymerases (DNAPs) recognize 3' recessed termini on duplex DNA and carry out nucleotide catalysis. Unlike promoter-specific RNA polymerases (RNAPs), no sequence specificity is required for binding or initiation of catalysis. Despite this, previous results indicate that viral reverse transcriptases bind much more tightly to DNA primers that mimic the polypurine tract. In the current report, primer sequences that bind with high affinity to Taq and Klenow polymerases were identified using a modified Selective Evolution of Ligands by Exponential Enrichment (SELEX) approach. Two Taq-specific primers that bound ∼10 (Taq1) and over 100 (Taq2) times more stably than controls to Taq were identified. Taq1 contained 8 nucleotides (5'-CACTAAAG-3') that matched the phage T3 RNAP "core" promoter. Both primers dramatically outcompeted primers with similar binding thermodynamics in PCR reactions. Similarly, exonuclease minus Klenow polymerase also selected a high affinity primer that contained a related core promoter sequence from phage T7 RNAP (5'-ACTATAG-3'). For both Taq and Klenow, even small modifications to the sequence resulted in large losses in binding affinity, suggesting that binding was highly sequence-specific. The results are discussed in the context of possible effects on multi-primer (multiplex) PCR assays, molecular information theory, and the evolution of RNAPs and DNAPs. Importance: This work further demonstrates that primer-dependent DNA polymerases can have strong sequence biases leading to dramatically tighter binding to specific sequences. These may be related to biological function, or be a consequence of the structural architecture of the enzyme. New sequence specificities for Taq and Klenow polymerases were uncovered, and among them were sequences that contained the core promoter elements from T3 and T7 phage RNA polymerase promoters. This suggests the intriguing possibility that phage RNA polymerases exploited intrinsic binding affinities of ancestral DNA polymerases to develop their promoters. Conversely, DNA polymerases could have evolved from related RNA polymerases and retained the intrinsic binding preference despite there being no clear function for such a preference in DNA biology. Copyright © 2018 American Society for Microbiology.
New Sequences with Low Correlation and Large Family Size

NASA Astrophysics Data System (ADS)

Zeng, Fanxin

In direct-sequence code-division multiple-access (DS-CDMA) communication systems and direct-sequence ultra wideband (DS-UWB) radios, sequences with low correlation and large family size are important for reducing multiple access interference (MAI) and accepting more active users, respectively. In this paper, a new collection of families of sequences of length $p^n-1$, which includes three constructions, is proposed. The maximum number of cyclically distinct families without GMW sequences in each construction is $\frac{\varphi(p^n-1)}{n}\cdot\frac{\varphi(p^m-1)}{m}$, where $p$ is a prime number, $n$ is an even number, and $n=2m$, and these sequences can be binary or polyphase depending upon choice of the parameter $p$. In Construction I, there are $p^n$ distinct sequences within each family and the new sequences have at most $d+2$ nontrivial periodic correlation values $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, dp^m-1\}$. In Construction II, the new sequences have large family size $p^{2n}$ and possibly take the nontrivial correlation values in $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (3d-4)p^m-1\}$. In Construction III, the new sequences possess the largest family size $p^{(d-1)n}$ and have at most $2d$ correlation levels $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (2d-2)p^m-1\}$. Three constructions are near-optimal with respect to the Welch bound because the values of their Welch-Ratios are moderate, $WR \approx d$, $WR \approx 3d-4$ and $WR \approx 2d-2$, respectively. Each family in Constructions I, II and III contains a GMW sequence. In addition, Helleseth sequences and Niho sequences are special cases in Constructions I and III, and their restriction conditions to the integers $m$ and $n$, $p^m \not\equiv 2 \pmod{3}$ and $n \equiv 0 \pmod{4}$, respectively, are removed in our sequences. Our sequences in Construction III include the sequences with Niho type decimation $3\cdot 2^m-2$, too. Finally, some open questions are pointed out and an example that illustrates the performance of these sequences is given.
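
The family-count expression in this abstract, the product of φ(p^n − 1)/n and φ(p^m − 1)/m with n = 2m, is easy to evaluate for small parameters. The sketch below does so with a trial-division Euler totient; the function names `euler_phi` and `max_distinct_families` are hypothetical and chosen only for this illustration.

```python
def euler_phi(x):
    """Euler's totient of x, computed by trial-division factorisation."""
    result, k = x, 2
    while k * k <= x:
        if x % k == 0:
            while x % k == 0:
                x //= k
            result -= result // k  # multiply result by (1 - 1/k)
        k += 1
    if x > 1:  # one prime factor larger than sqrt(original x) remains
        result -= result // x
    return result


def max_distinct_families(p, n):
    """Evaluate phi(p^n - 1)/n * phi(p^m - 1)/m for a prime p and even n = 2m."""
    if n % 2:
        raise ValueError("n must be even (n = 2m)")
    m = n // 2
    return (euler_phi(p**n - 1) // n) * (euler_phi(p**m - 1) // m)


if __name__ == "__main__":
    for p, n in [(2, 4), (2, 6), (3, 2)]:
        print(p, n, max_distinct_families(p, n))
```

Under these assumptions, p = 2 and n = 4 give φ(15)/4 · φ(3)/2 = 2 · 1 = 2 families.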
STAT1:DNA sequence-dependent binding modulation by phosphorylation, protein:protein interactions and small-molecule inhibition

PubMed Central

Bonham, Andrew J.; Wenta, Nikola; Osslund, Leah M.; Prussin, Aaron J.; Vinkemeier, Uwe; Reich, Norbert O.

2013-01-01

The DNA-binding specificity and affinity of the dimeric human transcription factor (TF) STAT1 were assessed by total internal reflectance fluorescence protein-binding microarrays (TIRF-PBM) to evaluate the effects of protein phosphorylation, higher-order polymerization and small-molecule inhibition. Active, phosphorylated STAT1 showed binding preferences consistent with prior characterization, whereas unphosphorylated STAT1 showed a weak-binding preference for one-half of the GAS consensus site, consistent with recent models of STAT1 structure and function in response to phosphorylation. This altered-binding preference was further tested by use of the inhibitor LLL3, which we show to disrupt STAT1 binding in a sequence-dependent fashion. To determine if this sequence-dependence is specific to STAT1 and not a general feature of human TF biology, the TF Myc/Max was analysed and tested with the inhibitor Mycro3. Myc/Max inhibition by Mycro3 is sequence independent, suggesting that the sequence-dependent inhibition of STAT1 may be specific to this system and a useful target for future inhibitor design. PMID:23180800
A systematic molecular dynamics study of nearest-neighbor effects on base pair and base pair step conformations and fluctuations in B-DNA

PubMed Central

Lavery, Richard; Zakrzewska, Krystyna; Beveridge, David; Bishop, Thomas C.; Case, David A.; Cheatham, Thomas; Dixit, Surjit; Jayaram, B.; Lankas, Filip; Laughton, Charles; Maddocks, John H.; Michon, Alexis; Osman, Roman; Orozco, Modesto; Perez, Alberto; Singh, Tanya; Spackova, Nada; Sponer, Jiri

2010-01-01

It is well recognized that base sequence exerts a significant influence on the properties of DNA and plays a significant role in protein–DNA interactions vital for cellular processes. Understanding and predicting base sequence effects requires an extensive structural and dynamic dataset which is currently unavailable from experiment. A consortium of laboratories was consequently formed to obtain this information using molecular simulations. This article describes results providing information not only on all 10 unique base pair steps, but also on all possible nearest-neighbor effects on these steps. These results are derived from simulations of 50–100 ns on 39 different DNA oligomers in explicit solvent and using a physiological salt concentration. We demonstrate that the simulations are converged in terms of helical and backbone parameters. The results show that nearest-neighbor effects on base pair steps are very significant, implying that dinucleotide models are insufficient for predicting sequence-dependent behavior. Flanking base sequences can notably lead to base pair step parameters in dynamic equilibrium between two conformational sub-states. Although this study only provides limited data on next-nearest-neighbor effects, we suggest that such effects should be analyzed before attempting to predict the sequence-dependent behavior of DNA. PMID:19850719
Evaluation of point mutations in dystrophin gene in Iranian Duchenne and Becker muscular dystrophy patients: introducing three novel variants.

PubMed

Haghshenas, Maryam; Akbari, Mohammad Taghi; Karizi, Shohreh Zare; Deilamani, Faravareh Khordadpoor; Nafissi, Shahriar; Salehi, Zivar

2016-06-01

Duchenne and Becker muscular dystrophies (DMD and BMD) are X-linked neuromuscular diseases characterized by progressive muscular weakness and degeneration of skeletal muscles. Approximately two-thirds of the patients have large deletions or duplications in the dystrophin gene and the remaining one-third have point mutations. This study was performed to evaluate point mutations in Iranian DMD/BMD male patients. A total of 29 DNA samples from patients who did not show any large deletion/duplication mutations following multiplex polymerase chain reaction (PCR) and multiplex ligation-dependent probe amplification (MLPA) screening were sequenced for detection of point mutations in exons 50-79. Also exon 44 was sequenced in one sample in which a false positive deletion was detected by MLPA method. Cycle sequencing revealed four nonsense, one frameshift and two splice site mutations as well as two missense variants.
eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing

PubMed Central

2014-01-01

Background: RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results: We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program’s functionality. Conclusions: eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312
eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

PubMed

Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

2014-03-05

RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.
Overcoming Sequence Misalignments with Weighted Structural Superposition

PubMed Central

Khazanov, Nickolay A.; Damm-Ganamet, Kelly L.; Quang, Daniel X.; Carlson, Heather A.

2012-01-01

An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD’s robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, SSM, CE, and DaliLite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence-based and structural-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs. PMID:22733542
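
To make the idea of a Gaussian-weighted RMSD concrete, the sketch below compares a standard RMSD with a weighted one in which each paired atom is down-weighted according to its own squared deviation. The weight form exp(-d²/c), the scaling constant `c`, and the function names are assumptions for this illustration and are not taken from the HwRMSD software. The sketch also assumes the two coordinate sets are already paired and superposed; it does not perform the fitting itself.

```python
import math


def squared_deviations(coords_a, coords_b):
    """Per-atom squared distances between two paired lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    return [sum((x - y) ** 2 for x, y in zip(a, b))
            for a, b in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Standard (unweighted) RMSD."""
    sq = squared_deviations(coords_a, coords_b)
    return math.sqrt(sum(sq) / len(sq))


def gaussian_weighted_rmsd(coords_a, coords_b, c=5.0):
    """RMSD with assumed Gaussian weights w_i = exp(-d_i^2 / c)."""
    sq = squared_deviations(coords_a, coords_b)
    weights = [math.exp(-d2 / c) for d2 in sq]
    return math.sqrt(sum(w * d2 for w, d2 in zip(weights, sq)) / sum(weights))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 10.0)]  # one displaced atom
    print(rmsd(a, b), gaussian_weighted_rmsd(a, b))
```

With one strongly displaced atom, the weighted value stays close to zero while the standard RMSD is dominated by the outlier, which is the behaviour that lets a weighted overlay focus on the well-conserved core.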
Sibutramine-induced anorexia: potent, dose-dependent and behaviourally-selective profile in male rats.

PubMed

Tallett, A J; Blundell, J E; Rodgers, R J

2009-03-17

The serotonin and noradrenaline reuptake inhibitor sibutramine has been licensed as an anti-obesity treatment for more than a decade. However, while inhibitory effects on food intake and weight gain are well documented, surprisingly little published detail exists regarding its influence on feeding and related behaviours. The present study was therefore designed to assess the effects of acute sibutramine treatment on food intake, the behavioural satiety sequence (BSS) and post-treatment weight gain. Subjects were 10 non-deprived adult male Lister hooded rats, tested with 0.5-3.0 mg/kg sibutramine hydrochloride during 1-h DVD-recorded test sessions with palatable mash. Our results show that sibutramine dose-dependently reduced food intake, an effect significant at all doses tested. Ethological analysis revealed very few behavioural effects, except for a dose-dependent reduction in time spent feeding and an increase in the frequency of resting. Behavioural specificity was further supported by time-bin analysis which confirmed both the structural integrity and dose-dependent acceleration of the BSS. Single dosing with sibutramine (3.0 mg/kg) also suppressed daily weight gain over the 24-72 h period post-dosing. Current data support the conclusion that the acute anorectic and weight loss efficacy of sibutramine in adult male rats is not secondary to behavioural disruption but, instead, is due largely to an acceleration in behavioural satiety.
Effect of Sequence Blockiness on the Morphologies of Surface-grafted Elastin-like Polypeptides

NASA Astrophysics Data System (ADS)

Albert, Julie; Sintavanon, Kornkanok; Mays, Robin; MacEwan, Sarah; Chilkoti, Ashutosh; Genzer, Jan

2014-03-01

The inter- and intra- molecular interactions among monomeric units of copolymers and polypeptides depend strongly on monomer sequence distribution and dictate the phase behavior of these species both in solution and on surfaces. To study the relationship between sequence and phase behavior, we have designed a series of elastin-like polypeptides (ELPs) with controlled monomer sequences that mimic copolymers with various co-monomer sequence distributions and attached them covalently to silicon substrates from buffer solutions at temperatures below and above the bulk ELPs' lower critical solution temperatures (LCSTs). The dependence of ELP grafting density on solution temperature was examined by ellipsometry and the resultant surface morphologies were examined in air and under water with atomic force microscopy. Depositions performed above the LCST resulted in higher grafting densities and greater surface roughness of ELPs relative to depositions carried out below the LCST. In addition, we are using gradient substrates to examine the effect of ELP grafting density on temperature responsiveness.
Effects of stacking sequence on impact damage resistance and residual strength for quasi-isotropic laminates

NASA Technical Reports Server (NTRS)

Dost, Ernest F.; Ilcewicz, Larry B.; Avery, William B.; Coxon, Brian R.

1991-01-01

Residual strength of an impacted composite laminate is dependent on details of the damage state. Stacking sequence was varied to judge its effect on damage caused by low-velocity impact. This was done for quasi-isotropic layups of a toughened composite material. Experimental observations on changes in the impact damage state and postimpact compressive performance were presented for seven different laminate stacking sequences. The applicability and limitations of analysis compared to experimental results were also discussed. Postimpact compressive behavior was found to be a strong function of the laminate stacking sequence. This relationship was found to depend on thickness, stacking sequence, size, and location of sublaminates that comprise the impact damage state. The postimpact strength for specimens with a relatively symmetric distribution of damage through the laminate thickness was accurately predicted by models that accounted for sublaminate stability and in-plane stress redistribution. An asymmetric distribution of damage in some laminate stacking sequences tended to alter specimen stability. Geometrically nonlinear finite element analysis was used to predict this behavior.
Rotation sequence to report humerothoracic kinematics during 3D motion involving large horizontal component: application to the tennis forehand drive.

PubMed

Creveaux, Thomas; Sevrez, Violaine; Dumas, Raphaël; Chèze, Laurence; Rogowski, Isabelle

2018-03-01

The aim of this study was to examine the respective aptitudes of three rotation sequences (Y_t X_f' Y_h'', Z_t X_f' Y_h'', and X_t Z_f' Y_h'') to effectively describe the orientation of the humerus relative to the thorax during a movement involving a large horizontal abduction/adduction component: the tennis forehand drive. An optoelectronic system was used to record the movements of eight elite male players, each performing ten forehand drives. The occurrences of gimbal lock, phase angle discontinuity and incoherency in the time course of the three angles defining humerothoracic rotation were examined for each rotation sequence. Our results demonstrated that no single sequence effectively describes humerothoracic motion without discontinuities throughout the forehand motion. The humerothoracic joint angles can nevertheless be described without singularities when considering the backswing/forward-swing and the follow-through phases separately. Our findings stress that the sequence choice may have implications for the report and interpretation of 3D joint kinematics during large shoulder range of motion. Consequently, the use of Euler/Cardan angles to represent 3D orientation of the humerothoracic joint in sport tasks requires the evaluation of the rotation sequence regarding singularity occurrence before analysing the kinematic data, especially when the task involves a large shoulder range of motion in the horizontal plane.
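
The singularity issue raised in this abstract can be illustrated with a generic Y-X-Y Euler decomposition. The sketch below assumes the convention R = Ry(a)·Rx(b)·Ry(c) with right-handed rotation matrices; the helper names and the tolerance are hypothetical, and nothing here reproduces the authors' processing pipeline. For a symmetric sequence such as Y-X-Y, the first and third angles become indistinguishable when the middle angle b is close to 0 or 180 degrees, which is the gimbal-lock case flagged by the function.

```python
import math


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def yxy_angles(R, tol=1e-6):
    """Decompose R = Ry(a) Rx(b) Ry(c); return (a, b, c, singular)."""
    b = math.acos(max(-1.0, min(1.0, R[1][1])))  # R[1][1] = cos(b)
    singular = abs(math.sin(b)) < tol
    if singular:
        # Only a + c (b near 0) or a - c (b near pi) is determined:
        # report the combined rotation as a and set c to zero.
        sign = 1.0 if R[1][1] > 0 else -1.0
        a, c = math.atan2(sign * R[0][2], R[0][0]), 0.0
    else:
        a = math.atan2(R[0][1], R[2][1])    # sin(a)sin(b), cos(a)sin(b)
        c = math.atan2(R[1][0], -R[1][2])   # sin(b)sin(c), sin(b)cos(c)
    return a, b, c, singular


if __name__ == "__main__":
    R = matmul(matmul(rot_y(0.3), rot_x(0.7)), rot_y(-0.4))
    print(yxy_angles(R))
    print(yxy_angles(rot_y(0.5)))  # middle angle is zero: singular
```

Checking the `singular` flag frame by frame is one simple way to screen a recorded movement for the discontinuities the abstract describes before interpreting the angle time courses.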
Input dependent cell assembly dynamics in a model of the striatal medium spiny neuron network.

PubMed

Ponzi, Adam; Wickens, Jeff

2012-01-01

The striatal medium spiny neuron (MSN) network is sparsely connected with fairly weak GABAergic collaterals receiving an excitatory glutamatergic cortical projection. Peri-stimulus time histograms (PSTH) of MSN population response investigated in various experimental studies display strong firing rate modulations distributed throughout behavioral task epochs. In previous work we have shown by numerical simulation that sparse random networks of inhibitory spiking neurons with characteristics appropriate for UP state MSNs form cell assemblies which fire together coherently in sequences on long behaviorally relevant timescales when the network receives a fixed pattern of constant input excitation. Here we first extend that model to the case where cortical excitation is composed of many independent noisy Poisson processes and demonstrate that cell assembly dynamics is still observed when the input is sufficiently weak. However, if cortical excitation strength is increased, more regularly firing and completely quiescent cells are found, which depend on the cortical stimulation. Subsequently we further extend previous work to consider what happens when the excitatory input varies as it would when the animal is engaged in behavior. We investigate how sudden switches in excitation interact with network-generated patterned activity. We show that sequences of cell assembly activations can be locked to the excitatory input sequence and outline the range of parameters where this behavior is shown. Model cell population PSTH display both stimulus and temporal specificity, with large population firing rate modulations locked to elapsed time from task events. Thus the random network can generate a large diversity of temporally evolving stimulus-dependent responses even though the input is fixed between switches. We suggest the MSN network is well suited to the generation of such slow coherent task-dependent responses, which could be utilized by the animal in behavior.
Input Dependent Cell Assembly Dynamics in a Model of the Striatal Medium Spiny Neuron Network

PubMed Central

Ponzi, Adam; Wickens, Jeff

2012-01-01

The striatal medium spiny neuron (MSN) network is sparsely connected with fairly weak GABAergic collaterals receiving an excitatory glutamatergic cortical projection. Peri-stimulus time histograms (PSTH) of MSN population response investigated in various experimental studies display strong firing rate modulations distributed throughout behavioral task epochs. In previous work we have shown by numerical simulation that sparse random networks of inhibitory spiking neurons with characteristics appropriate for UP state MSNs form cell assemblies which fire together coherently in sequences on long behaviorally relevant timescales when the network receives a fixed pattern of constant input excitation. Here we first extend that model to the case where cortical excitation is composed of many independent noisy Poisson processes and demonstrate that cell assembly dynamics is still observed when the input is sufficiently weak. However, if cortical excitation strength is increased, more regularly firing and completely quiescent cells are found, which depend on the cortical stimulation. Subsequently we further extend previous work to consider what happens when the excitatory input varies as it would when the animal is engaged in behavior. We investigate how sudden switches in excitation interact with network-generated patterned activity. We show that sequences of cell assembly activations can be locked to the excitatory input sequence and outline the range of parameters where this behavior is shown. Model cell population PSTH display both stimulus and temporal specificity, with large population firing rate modulations locked to elapsed time from task events. Thus the random network can generate a large diversity of temporally evolving stimulus-dependent responses even though the input is fixed between switches. We suggest the MSN network is well suited to the generation of such slow coherent task-dependent responses, which could be utilized by the animal in behavior. PMID:22438838
Disruption of Boundary Encoding During Sensorimotor Sequence Learning: An MEG Study.

PubMed

Michail, Georgios; Nikulin, Vadim V; Curio, Gabriel; Maess, Burkhard; Herrojo Ruiz, María

2018-01-01

Music performance relies on the ability to learn and execute actions and their associated sounds. The process of learning these auditory-motor contingencies depends on the proper encoding of the serial order of the actions and sounds. Among the different serial positions of a behavioral sequence, the first and last (boundary) elements are particularly relevant. Animal and patient studies have demonstrated a specific neural representation for boundary elements in prefrontal cortical regions and in the basal ganglia, highlighting the relevance of their proper encoding. The neural mechanisms underlying the encoding of sequence boundaries in the general human population remain, however, largely unknown. In this study, we examined how alterations of auditory feedback, introduced at different ordinal positions (boundary or within-sequence element), affect the neural and behavioral responses during sensorimotor sequence learning. Analysing the neuromagnetic signals from 20 participants while they performed short piano sequences under the occasional effect of altered feedback (AF), we found that at around 150-200 ms post-keystroke, the neural activities in the dorsolateral prefrontal cortex (DLPFC) and supplementary motor area (SMA) were dissociated for boundary and within-sequence elements. Furthermore, the behavioral data demonstrated that feedback alterations on boundaries led to greater performance costs, such as more errors in the subsequent keystrokes. These findings jointly support the idea that the proper encoding of boundaries is critical in acquiring sensorimotor sequences. They also provide evidence for the involvement of a distinct neural circuitry in humans including prefrontal and higher-order motor areas during the encoding of the different classes of serial order.
Ages of intermediate-age Magellanic Cloud star clusters

NASA Technical Reports Server (NTRS)

Flower, P. J.

1984-01-01

Ages of intermediate-age Large Magellanic Cloud star clusters have been estimated without locating the faint, unevolved portion of cluster main sequences. Six clusters with established color-magnitude diagrams were selected for study: SL 868, NGC 1783, NGC 1868, NGC 2121, NGC 2209, and NGC 2231. Since red giant photometry is more accurate than the necessarily fainter main-sequence photometry, the distributions of red giants on the cluster color-magnitude diagrams were compared to a grid of 33 stellar evolutionary tracks, evolved from the main sequence through core-helium exhaustion, spanning the expected mass and metallicity range for Magellanic Cloud cluster red giants. The time-dependent behavior of the luminosity of the model red giants was used to estimate cluster ages from the observed cluster red giant luminosities. Except for the possibility of SL 868 being an old globular cluster, all clusters studied were found to have ages less than 10^9 yr. It is concluded that there is currently no substantial evidence for a major cluster population of large, populous clusters greater than 10^9 yr old in the Large Magellanic Cloud.
Domain-specific learning of grammatical structure in musical and phonological sequences.

PubMed

Bly, Benjamin Martin; Carrión, Ricardo E; Rasch, Björn

2009-01-01

Artificial grammar learning depends on acquisition of abstract structural representations rather than domain-specific representational constraints, or so many studies tell us. Using an artificial grammar task, we compared learning performance in two stimulus domains in which respondents have differing tacit prior knowledge. We found that despite grammatically identical sequence structures, learning was better for harmonically related chord sequences than for letter name sequences or harmonically unrelated chord sequences. We also found transfer effects within the musical and letter name tasks, but not across the domains. We conclude that knowledge acquired in implicit learning depends not only on abstract features of structured stimuli, but that the learning of regularities is in some respects domain-specific and strongly linked to particular features of the stimulus domain.
RNA interference inhibits herpes simplex virus type 1 isolated from saliva samples and mucocutaneous lesions.

PubMed

Silva, Amanda Perse da; Lopes, Juliana Freitas; Paula, Vanessa Salete de

2014-01-01

The aim of this study was to evaluate the use of RNA interference to inhibit herpes simplex virus type-1 replication in vitro. For herpes simplex virus type-1 gene silencing, three different small interfering RNAs (siRNAs; sequences si-UL 39-1, si-UL 39-2, and si-UL 39-3) were used to target the herpes simplex virus type-1 UL39 gene, which encodes the large subunit of ribonucleotide reductase, an essential enzyme for DNA synthesis. Herpes simplex virus type-1 was isolated from saliva samples and mucocutaneous lesions from infected patients. All mucocutaneous lesion samples were positive for herpes simplex virus type-1 by real-time PCR and by virus isolation; all herpes simplex virus type-1 saliva samples were positive by real-time PCR and 50% were positive by virus isolation. The levels of herpes simplex virus type-1 DNA remaining after siRNA treatment were assessed by real-time PCR, whose results demonstrated that the effect of siRNAs on gene expression depends on siRNA concentration. The three siRNA sequences used were able to inhibit viral replication, as assessed by real-time PCR and plaque assays; among them, the sequence si-UL 39-1 was the most effective. This sequence inhibited 99% of herpes simplex virus type-1 replication. The results demonstrate that silencing herpes simplex virus type-1 UL39 expression by siRNAs effectively inhibits herpes simplex virus type-1 replication, suggesting that an siRNA-based antiviral strategy may be a potential therapeutic alternative. Copyright © 2014. Published by Elsevier Editora Ltda.
Implications of Secondary Aftershocks for Failure Processes

NASA Astrophysics Data System (ADS)

Gross, S. J.

2001-12-01

When a seismic sequence with more than one mainshock or an unusually large aftershock occurs, there is a compound aftershock sequence. The secondary aftershocks need not have exactly the same decay as the primary sequence, with the differences having implications for the failure process. When the stress step from the secondary mainshock is positive but not large enough to cause immediate failure of all the remaining primary aftershocks, failure processes which involve accelerating slip will produce secondary aftershocks that decay more rapidly than primary aftershocks. This is because the primary aftershocks are an accelerated version of the background seismicity, and secondary aftershocks are an accelerated version of the primary aftershocks. Real stress perturbations may be negative, and heterogeneities in mainshock stress fields mean that the real world situation is quite complicated. I will first describe and verify my picture of secondary aftershock decay with reference to a simple numerical model of slipping faults which obeys rate and state dependent friction and lacks stress heterogeneity. With such a model, it is possible to generate secondary aftershock sequences with perturbed decay patterns, quantify those patterns, and develop an analysis technique capable of correcting for the effect in real data. The secondary aftershocks are defined in terms of frequency linearized time s(T), which is equal to the number of primary aftershocks expected by a time T, $s(T) \equiv \int_{0}^{T} n(t)\,dt$, where the start time t=0 is the time of the primary aftershock, and the primary aftershock decay function n(t) is extrapolated forward to the times of the secondary aftershocks. In the absence of secondary sequences the function $s(T)$ re-scales the time so that approximately one event occurs per new time unit; the aftershock sequence is gone.
If this rescaling is applied in the presence of a secondary sequence, the secondary sequence is shaped like a primary aftershock sequence, and can be fit by the same modeling techniques applied to simple sequences. The later part of the presentation will concern the decay of Hector Mine aftershocks as influenced by the Landers aftershocks. Although attempts to predict the abundance of Hector aftershocks based on stress overlap analysis are not very successful, the analysis does do a good job fitting the decay of secondary sequences.
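
The frequency-linearized time described in this abstract can be sketched numerically. The sketch below assumes a modified Omori decay, n(t) = K/(t+c)^p, for the primary sequence; the abstract does not state which decay function was used, so that form, the parameter names and the function names are illustrative assumptions only.

```python
import math

def omori_rate(t, K, c, p):
    """Assumed primary aftershock rate n(t) = K / (t + c)**p (modified Omori law)."""
    return K / (t + c) ** p

def linearized_time(T, K, c, p):
    """s(T): integral of the assumed n(t) from 0 to T, evaluated in closed form."""
    if math.isclose(p, 1.0):
        return K * math.log((T + c) / c)
    return K * ((T + c) ** (1.0 - p) - c ** (1.0 - p)) / (1.0 - p)

def rescale(event_times, K, c, p):
    """Map event times (measured from t=0) onto the frequency-linearized axis."""
    return [linearized_time(t, K, c, p) for t in event_times]
```

On the rescaled axis a pure primary sequence should look roughly uniform, so a secondary sequence would show up as a new burst that can be fitted in the same way as a simple sequence.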

A rapid NGS strategy for comprehensive molecular diagnosis of Birt-Hogg-Dubé syndrome in patients with primary spontaneous pneumothorax.

PubMed

Zhang, Xinxin; Ma, Dehua; Zou, Wei; Ding, Yibing; Zhu, Chengchu; Min, Haiyan; Zhang, Bin; Wang, Wei; Chen, Baofu; Ye, Minhua; Cai, Minghui; Pan, Yanqing; Cao, Lei; Wan, Yueming; Jin, Yu; Gao, Qian; Yi, Long

2016-05-27

Primary spontaneous pneumothorax (PSP) or pulmonary cysts is one of the manifestations of Birt-Hogg-Dubé syndrome (BHDS), which is caused by heterozygous mutations in the FLCN gene. Most of the mutations are SNVs and small indels, and approximately 10% of the mutations are large intragenic deletions and duplications. These molecular findings are generally obtained by disparate methods, including Sanger sequencing and Multiplex Ligation-dependent Probe Amplification (MLPA), in the clinical laboratory. In addition, as a genetically heterogeneous disorder, PSP may be caused by mutations in multiple genes, including the FBN1, COL3A1, CBS, SERPINA1 and TSC1/TSC2 genes. For differential diagnosis, these genes should also be screened, which makes the diagnostic procedure more time-consuming and labor-intensive. Forty PSP patients were divided into 2 groups. Nineteen patients with different pathogenic mutations of FLCN previously identified by conventional Sanger sequencing and MLPA were included in the test group; 21 random PSP patients without any genetic screening were included in the blinded sample group. Seven PSP genes, including FLCN, FBN1, COL3A1, CBS, SERPINA1 and TSC1/TSC2, were targeted and enriched with the Haloplex system, sequenced on a MiSeq platform and analyzed in the 40 patients to evaluate the performance of the targeted-NGS method. We demonstrated that the full spectrum of genes associated with pneumothorax, including FLCN gene mutations, can be identified simultaneously in multiplexed sequence data. Notably, by our in-house copy number analysis of the sequence data, we could not only detect intragenic deletions, but also determine approximate deletion junctions simultaneously.
NGS-based Haloplex target enrichment technology proved to be a rapid and cost-effective screening strategy for the comprehensive molecular diagnosis of BHDS in PSP patients, as it can replace Sanger sequencing and MLPA by simultaneously detecting exonic and intronic SNVs, small indels and large intragenic deletions, and by determining deletion junctions in PSP-related genes.
Evolutionary advantage via common action of recombination and neutrality

NASA Astrophysics Data System (ADS)

Saakian, David B.; Hu, Chin-Kun

2013-11-01

We investigate evolution models with recombination and neutrality. We consider the Crow-Kimura (parallel) mutation-selection model with the neutral fitness landscape, in which there is a central peak with high fitness A, and some of the 1-point mutants have the same high fitness A, while the fitness of other sequences is 0. We find that the effect of recombination and neutrality depends on the concrete version of both neutrality and recombination. We consider three versions of neutrality: (a) all the nearest neighbor sequences of the peak sequence have the same high fitness A; (b) all the l-point mutations in a piece of genome of length l≥1 are neutral; (c) the neutral sequences are randomly distributed among the nearest neighbors of the peak sequences. We also consider three versions of recombination: (I) the simple horizontal gene transfer (HGT) of one nucleotide; (II) the exchange of a piece of genome of length l, HGT-l; (III) two-point crossover recombination (2CR). For the case of (a), the 2CR gives a rather strong contribution to the mean fitness, much stronger than that of HGT for a large genome length L. For the random distribution of neutral sequences there is a critical degree of neutrality νc; for ν<νc with (νc-ν) not large, the 2CR suppresses the mean fitness while HGT increases it; for ν much larger than νc, the 2CR and HGT-l increase the mean fitness more than HGT does. We also consider the recombination in the case of smooth fitness landscapes. The recombination gives some advantage in the evolutionary dynamics, where recombination distinguishes clearly the mean-field-like evolutionary factors from the fluctuation-like ones. By contrast, mutations affect the mean-field-like and fluctuation-like factors similarly. Consequently, recombination can accelerate the non-mean-field (fluctuation) type dynamics without considerably affecting the mean-field-like factors.
Semantic orchestration of image processing services for environmental analysis

NASA Astrophysics Data System (ADS)

Ranisavljević, Élisabeth; Devin, Florent; Laffly, Dominique; Le Nir, Yannick

2013-09-01

In order to analyze environmental dynamics, a major process is the classification of the different phenomena of the site (e.g. ice and snow for a glacier). When using in situ pictures, this classification requires data pre-processing. Not all the pictures need the same sequence of processes; it depends on the disturbances. Until now, these sequences have been built manually, which restricts the processing of large amounts of data. In this paper, we present how to realize a semantic orchestration to automate the sequencing for the analysis. It combines two advantages: solving the problem of the amount of processing, and diversifying the possibilities in the data processing. We define a BPEL description to express the sequences. This BPEL uses some web services to run the data processing. Each web service is semantically annotated using an ontology of image processing. The dynamic modification of the BPEL is done using SPARQL queries on these annotated web services. The results obtained by a prototype implementing this method validate the construction of the different workflows that can be applied to a large number of pictures.
HMM-ModE: implementation, benchmarking and validation with HMMER3

PubMed Central

2014-01-01

Background HMM-ModE is a computational method that generates family specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10 fold cross validation and modifies the emission probabilities of profiles to reduce common fold based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing for the large scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts both at the level of parsing the HMM profiles and modifying emission probabilities to upgrade HMM-ModE using HMMER3, which takes advantage of its probabilistic inference with high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation. Results The implementation of this method, which now works with HMMER3, is benchmarked against the earlier version of HMMER, to show that the effect of local-local alignments is marked only in the case of profiles containing a large number of discontinuous match states. The method is tested on a gold standard set of families and we have reported a significant reduction in the number of false positive hits over the default HMM profiles. When implemented on GPCR sequences, the results showed an improvement in the accuracy of classification compared with other methods used to classify the family at different levels of their classification hierarchy. Conclusions The present findings show that the new version of HMM-ModE is a highly specific method used to differentiate between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences.
The use of modified profile HMMs of GPCR sequences provides a simple yet highly specific method for classification of the family, being able to predict the sub-family specific sequences with high accuracy even though sequences share common physicochemical characteristics between sub-families. PMID:25073805
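
The threshold optimization mentioned in this abstract can be illustrated with a small sketch. It assumes that each training sequence already has a profile score, that sequences scoring at or above the threshold are called family members, and that the Matthews correlation coefficient (MCC) is the quantity being maximized; the abstract does not name the criterion, and the function names here are hypothetical. Only a single fold is shown, whereas the method described uses 10-fold cross validation.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_threshold(pos_scores, neg_scores):
    """Scan every observed score as a candidate cut-off and keep the best MCC."""
    best_score, best_thr = -2.0, None
    for thr in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= thr for s in pos_scores)   # family members accepted
        fn = len(pos_scores) - tp                # family members rejected
        fp = sum(s >= thr for s in neg_scores)   # negative sequences accepted
        tn = len(neg_scores) - fp                # negative sequences rejected
        score = mcc(tp, fp, tn, fn)
        if score > best_score:
            best_score, best_thr = score, thr
    return best_thr, best_score
```

In a cross-validated setting, one plausible design would be to compute the cut-off on each training fold and check it on the held-out fold before fixing a final value.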
Effects of methylation-sensitive enzymes on the enrichment of genic SNPs and the degree of genome complexity reduction in a two-enzyme genotyping-by-sequencing (GBS) approach: a case study in oil palm (Elaeis guineensis).

PubMed

Pootakham, Wirulda; Sonthirod, Chutima; Naktang, Chaiwat; Jomchai, Nukoon; Sangsrakru, Duangjai; Tangphatsornruang, Sithichoke

2016-01-01

Advances in next generation sequencing have facilitated a large-scale single nucleotide polymorphism (SNP) discovery in many crop species. Genotyping-by-sequencing (GBS) approach couples next generation sequencing with genome complexity reduction techniques to simultaneously identify and genotype SNPs. Choice of enzymes used in GBS library preparation depends on several factors including the number of markers required, the desired level of multiplexing, and whether the enrichment of genic SNP is preferred. We evaluated various combinations of methylation-sensitive ( Aat II, Pst I, Msp I) and methylation-insensitive ( Sph I, Mse I) enzymes for their effectiveness in genome complexity reduction and enrichment of genic SNPs. We discovered that the use of two methylation-sensitive enzymes effectively reduced genome complexity and did not require a size selection step. On the contrary, the genome coverage of libraries constructed with methylation-insensitive enzymes was quite high, and the additional size selection step may be required to increase the overall read depth. We also demonstrated the effectiveness of methylation-sensitive enzymes in enriching for SNPs located in genic regions. When two methylation-insensitive enzymes were used, only 16% of SNPs identified were located in genes and 18% in the vicinity (± 5 kb) of the genic regions, while most SNPs resided in the intergenic regions. In contrast, a remarkable degree of enrichment was observed when two methylation-sensitive enzymes were employed. Almost two thirds of the SNPs were located either inside (32-36%) or in the vicinity (28-31%) of the genic regions. These results provide useful information to help researchers choose appropriate GBS enzymes in oil palm and other crop species.
A multislice gradient echo pulse sequence for CEST imaging.

PubMed

Dixon, W Thomas; Hancu, Ileana; Ratnakar, S James; Sherry, A Dean; Lenkinski, Robert E; Alsop, David C

2010-01-01

Chemical exchange-dependent saturation transfer and paramagnetic chemical exchange-dependent saturation transfer are agent-mediated contrast mechanisms that depend on saturating spins at the resonant frequency of the exchangeable protons on the agent, thereby indirectly saturating the bulk water. In general, longer saturating pulses produce stronger chemical and paramagnetic exchange-dependent saturation transfer effects, with returns diminishing for pulses longer than T1. This could make imaging slow, so one approach to chemical exchange-dependent saturation transfer imaging has been to follow a long, frequency-selective saturation period by a fast imaging method. A new approach is to insert a short frequency-selective saturation pulse before each spatially selective observation pulse in a standard, two-dimensional, gradient-echo pulse sequence. Being much less than T1 apart, the saturation pulses have a cumulative effect. Interleaved, multislice imaging is straightforward. Observation pulses directed at one slice did not produce observable, unintended chemical exchange-dependent saturation transfer effects in another slice. Pulse repetition time and signal-to-noise ratio increase in the normal way as more slices are imaged simultaneously. Copyright (c) 2009 Wiley-Liss, Inc.
Mapping DNA polymerase errors by single-molecule sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, David F.; Lu, Jenny; Chang, Seungwoo

Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.
Mapping DNA polymerase errors by single-molecule sequencing

DOE PAGES

Lee, David F.; Lu, Jenny; Chang, Seungwoo; ...

2016-05-16

Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.
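
The barcoding idea in this abstract (compare several reads of the same tagged product so that sequencing errors can be voted out) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes reads arrive as (barcode, sequence) pairs that are already aligned and of equal length within a barcode group, and the function names and the `min_reads` cut-off are hypothetical.

```python
from collections import Counter, defaultdict

def consensus_by_barcode(reads, min_reads=3):
    """Build a per-barcode consensus by majority vote at each position."""
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    consensus = {}
    for barcode, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to separate sequencing errors from real ones
        if any(len(s) != len(seqs[0]) for s in seqs):
            continue  # this simple sketch skips groups with length mismatches
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus

def polymerase_errors(consensus_seq, template):
    """List (position, template_base, observed_base) where the consensus differs."""
    return [
        (i, t, c)
        for i, (t, c) in enumerate(zip(template, consensus_seq))
        if t != c
    ]
```

A mismatch seen in only one read of a barcode group is outvoted and treated as a sequencing error, while a mismatch that survives into the consensus is attributed to the polymerase.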
A "turn-on" fluorescent copper biosensor based on DNA cleavage-dependent graphene-quenched DNAzyme.

PubMed

Liu, Meng; Zhao, Huimin; Chen, Shuo; Yu, Hongtao; Zhang, Yaobin; Quan, Xie

2011-06-15

A novel and promising "turn-on" fluorescent Cu(2+) biosensor is designed based on graphene-DNAzyme catalytic beacon. Due to the essential surface and quenching properties of two-dimensional graphene, it can function as both "scaffold" and "quencher" of the Cu(2+)-dependent DNAzyme, facilitating the formation of self-assembled graphene-quenched DNAzyme complex. However, Cu(2+)-induced catalytic reaction disturbs the graphene-DNAzyme conformation, which will produce internal DNA cleavage-dependent effect. In this case, the quenched fluorescence in graphene-DNAzyme is quickly recovered to a large extent in 15 min. Compared with common DNAzyme-based sensors, the presented graphene-based catalytic beacon greatly improves the signal-to-background ratio, hence increasing the sensitivity (LOD=0.365 nM). Furthermore, the controllable DNA cleavage reaction provides an original and alternative internal method to regulate the interaction between graphene and DNA relative to the previous external sequence-specific hybridization-dependent regulation, which will open new opportunities for nucleic studies and sensing applications in the future. Copyright © 2011 Elsevier B.V. All rights reserved.
Control of neuronal excitability by Group I metabotropic glutamate receptors.

PubMed

Correa, Ana Maria Bernal; Guimarães, Jennifer Diniz Soares; Dos Santos E Alhadas, Everton; Kushmerick, Christopher

2017-10-01

Metabotropic glutamate (mGlu) receptors couple through G proteins to regulate a large number of cell functions. Eight mGlu receptor isoforms have been cloned and classified into three Groups based on sequence, signal transduction mechanisms and pharmacology. This review will focus on Group I mGlu receptors, comprising the isoforms mGlu1 and mGlu5. Activation of these receptors initiates both G protein-dependent and -independent signal transduction pathways. The G protein-dependent pathway involves mainly Gαq, which can activate PLCβ, leading initially to the formation of IP3 and diacylglycerol. IP3 can release Ca2+ from cellular stores, resulting in activation of Ca2+-dependent ion channels. Intracellular Ca2+, together with diacylglycerol, activates PKC, which has many protein targets, including ion channels. Thus, activation of the G protein-dependent pathway affects cellular excitability through several different effectors. In parallel, G protein-independent pathways lead to activation of non-selective cationic currents and metabotropic synaptic currents and potentials. Here, we provide a survey of the membrane transport proteins responsible for these electrical effects of Group I metabotropic glutamate receptors.
A New Method for Setting Calculation Sequence of Directional Relay Protection in Multi-Loop Networks

NASA Astrophysics Data System (ADS)

Haijun, Xiong; Qi, Zhang

2016-08-01

Workload of relay protection setting calculation in multi-loop networks may be reduced effectively by optimizing setting calculation sequences. A new method for determining the setting calculation sequence of directional distance relay protection in multi-loop networks, based on the minimum broken nodes cost vector (MBNCV), was proposed to solve the problems experienced with current methods. Existing methods based on the minimum breakpoint set (MBPS) lead to more broken edges when untying the loops in the dependency relationships of relays, possibly leading to more iterative calculation workload in setting calculations. A model-driven approach based on behavior trees (BT) was presented to improve adaptability to similar problems. After extending the BT model by adding real-time system characteristics, a timed BT was derived and the dependency relationship in multi-loop networks was then modeled. The model was translated into communicating sequential processes (CSP) models and an optimized setting calculation sequence in multi-loop networks was finally calculated by tools. A 5-node multi-loop network was used as an example to demonstrate the effectiveness of the modeling and calculation method. Several examples were then calculated, with results indicating that the method effectively reduces the number of forced broken edges for protection setting calculation in multi-loop networks.
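
The underlying problem (loops in the relay dependency graph must be broken before a setting order exists) can be sketched generically. The code below is not the MBNCV method or the BT/CSP tool chain from the abstract; it is a brute-force search, under assumed names, for the smallest set of relays whose settings must be given initial values so that the remaining dependencies become acyclic. It relies on `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations

def calc_order(deps, broken):
    """Order the unbroken relays so each follows the relays it depends on.

    deps maps a relay to the set of relays its setting depends on.
    Returns None if a dependency loop remains after removing `broken`.
    """
    reduced = {
        relay: {d for d in needed if d not in broken}
        for relay, needed in deps.items()
        if relay not in broken
    }
    try:
        return list(TopologicalSorter(reduced).static_order())
    except CycleError:
        return None

def smallest_break_set(deps):
    """Try break sets of increasing size; exponential, for tiny examples only."""
    relays = sorted(set(deps) | {d for needed in deps.values() for d in needed})
    for k in range(len(relays) + 1):
        for broken in combinations(relays, k):
            order = calc_order(deps, set(broken))
            if order is not None:
                return set(broken), order
    return None
```

For a hypothetical loop R1 → R2 → R3 → R1 with a fourth relay depending on R1, breaking a single relay is enough, and the remaining relays can then be set in one pass.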
"FLIPSY"—A New Solvent-Suppression Sequence for Nonexchanging Solutes Offering Improved Integral Accuracy Relative to 1D NOESY

NASA Astrophysics Data System (ADS)

Neuhaus, David; Ismail, Ismail M.; Chung, Chun-Wa

A new method of solvent suppression is described, based on presaturation in combination with volume selection; the name "FLIPSY" is proposed for this sequence. A low-flip-angle pulse is used for excitation, immediately followed by two 180° pulses, each of which is independently phase cycled through Exorcycle. The phase-cycled inversion pulses achieve volume selection in a way similar to the widely used 1D NOESY sequence, thereby largely eliminating any residual "hump" signal from the solvent. The two 180° pulses combine to produce a net 360° rotation for z magnetization and either a 180° or a 360° rotation for transverse magnetization, depending on the step in the phase cycle. This allows the overall flip angle of the sequence to be controlled by adjusting the length of the initial excitation pulse. It is demonstrated that this property allows one to choose freely a suitable compromise between signal strength and integral accuracy when using FLIPSY, just as when using single-pulse excitation. Such a choice cannot be made when using 1D NOESY, since the effective flip angle in that experiment is always 90°. The application of FLIPSY to recording LC-NMR spectra is demonstrated.
Dissociable effects of practice variability on learning motor and timing skills.

PubMed

Caramiaux, Baptiste; Bevilacqua, Frédéric; Wanderley, Marcelo M; Palmer, Caroline

2018-01-01

Motor skill acquisition inherently depends on the way one practices the motor task. The amount of motor task variability during practice has been shown to foster transfer of the learned skill to other similar motor tasks. In addition, variability in a learning schedule, in which a task and its variations are interweaved during practice, has been shown to help the transfer of learning in motor skill acquisition. However, there is little evidence on how motor task variations and variability schedules during practice act on the acquisition of complex motor skills such as music performance, in which a performer learns both the right movements (motor skill) and the right time to perform them (timing skill). This study investigated the impact of rate (tempo) variability and the schedule of tempo change during practice on timing and motor skill acquisition. Complete novices, with no musical training, practiced a simple musical sequence on a piano keyboard at different rates. Each novice was assigned to one of four learning conditions designed to manipulate the amount of tempo variability across trials (large or small tempo set) and the schedule of tempo change (randomized or non-randomized order) during practice. At test, the novices performed the same musical sequence at a familiar tempo and at novel tempi (testing tempo transfer), as well as two novel (but related) sequences at a familiar tempo (testing spatial transfer). We found that practice conditions had little effect on learning and transfer performance of timing skill. Interestingly, practice conditions influenced motor skill learning (reduction of movement variability): lower temporal variability during practice facilitated transfer to new tempi and new sequences; non-randomized learning schedule improved transfer to new tempi and new sequences. Tempo (rate) and the sequence difficulty (spatial manipulation) affected performance variability in both timing and movement. 
These findings suggest that there is a dissociable effect of practice variability on learning complex skills that involve both motor and timing constraints.
A novel hybrid genetic algorithm to solve the make-to-order sequence-dependent flow-shop scheduling problem

NASA Astrophysics Data System (ADS)

Mirabi, Mohammad; Fatemi Ghomi, S. M. T.; Jolai, F.

2014-04-01

The flow-shop scheduling problem (FSP) deals with the scheduling of a set of n jobs that visit a set of m machines in the same order. As the FSP is NP-hard, no efficient algorithm is known that is guaranteed to reach the optimal solution of the problem. To minimize the holding, delay and setup costs of large permutation flow-shop scheduling problems with sequence-dependent setup times on each machine, this paper develops a novel hybrid genetic algorithm (HGA) with three genetic operators. The proposed HGA applies a modified approach to generate a pool of initial solutions, and also uses an improved heuristic called the iterated swap procedure to improve the initial solutions. We consider the make-to-order production approach, in which some sequences between jobs are treated as tabu based on a maximum allowable setup cost. In addition, the results are compared to some recently developed heuristics, and computational experiments show that the proposed HGA performs very competitively with respect to accuracy and efficiency of solution.
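
To make the problem structure concrete, the sketch below evaluates a job permutation in a flow shop with sequence-dependent setup times and then searches all permutations of a tiny instance. It is a stand-in, not the HGA from the abstract: it minimizes makespan rather than holding, delay and setup costs, assumes that setups may start before the job arrives at the machine and that the first job needs no setup, and uses a setup-time limit as an illustrative tabu rule. All names and data layouts are assumptions.

```python
from itertools import permutations

def completion_times(perm, proc, setup):
    """proc[j][m]: processing time of job j on machine m.
    setup[m][i][j]: setup time on machine m when job j directly follows job i."""
    n_machines = len(proc[0])
    ready = [0] * n_machines  # time at which each machine finished its last job
    finish = {}
    prev = None
    for job in perm:
        t = 0  # completion time of this job on the previous machine
        for m in range(n_machines):
            s = setup[m][prev][job] if prev is not None else 0
            t = max(t, ready[m] + s) + proc[job][m]
            ready[m] = t
        finish[job] = t
        prev = job
    return finish

def best_permutation(proc, setup, max_setup=float("inf")):
    """Exhaustive search; job pairs whose setup exceeds max_setup are tabu."""
    best = None
    for perm in permutations(range(len(proc))):
        if any(
            setup[m][a][b] > max_setup
            for a, b in zip(perm, perm[1:])
            for m in range(len(setup))
        ):
            continue
        makespan = max(completion_times(perm, proc, setup).values())
        if best is None or makespan < best[0]:
            best = (makespan, perm)
    return best
```

Exhaustive search grows factorially with the number of jobs, which is why metaheuristics such as genetic algorithms are used for instances of realistic size.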
msgbsR: An R package for analysing methylation-sensitive restriction enzyme sequencing data.

PubMed

Mayne, Benjamin T; Leemaqz, Shalem Y; Buckberry, Sam; Rodriguez Lopez, Carlos M; Roberts, Claire T; Bianco-Miotto, Tina; Breen, James

2018-02-01

Genotyping-by-sequencing (GBS) or restriction-site associated DNA marker sequencing (RAD-seq) is a practical and cost-effective method for analysing large genomes from high diversity species. This method of sequencing, coupled with methylation-sensitive enzymes (often referred to as methylation-sensitive restriction enzyme sequencing or MRE-seq), is an effective tool to study DNA methylation in parts of the genome that are inaccessible in other sequencing techniques or are not annotated in microarray technologies. Current software tools do not fulfil all methylation-sensitive restriction sequencing assays for determining differences in DNA methylation between samples. To fill this computational need, we present msgbsR, an R package that contains tools for the analysis of methylation-sensitive restriction enzyme sequencing experiments. msgbsR can be used to identify and quantify read counts at methylated sites directly from alignment files (BAM files) and enables verification of restriction enzyme cut sites with the correct recognition sequence of the individual enzyme. In addition, msgbsR assesses DNA methylation based on read coverage, similar to RNA sequencing experiments, rather than methylation proportion and is a useful tool in analysing differential methylation on large populations. The package is fully documented and available freely online as a Bioconductor package ( https://bioconductor.org/packages/release/bioc/html/msgbsR.html ).
SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.

2002-01-01

Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.
Age Spreads and the Temperature Dependence of Age Estimates in Upper Sco

NASA Astrophysics Data System (ADS)

Fang, Qiliang; Herczeg, Gregory J.; Rizzuto, Aaron

2017-06-01

Past estimates for the age of the Upper Sco Association are typically 11–13 Myr for intermediate-mass stars and 4–5 Myr for low-mass stars. In this study, we simulate populations of young stars to investigate whether this apparent dependence of estimated age on spectral type may be explained by the star formation history of the association. Solar and intermediate mass stars begin their pre-main sequence evolution on the Hayashi track, with fully convective interiors and cool photospheres. Intermediate-mass stars quickly heat up and transition onto the radiative Henyey track. As a consequence, for clusters in which star formation occurs on a timescale similar to that of the transition from a convective to a radiative interior, discrepancies in ages will arise when ages are calculated as a function of temperature instead of mass. Simple simulations of a cluster with constant star formation over several Myr may explain about half of the difference in inferred ages versus photospheric temperature; speculative constructions that consist of a constant star formation followed by a large supernova-driven burst could fully explain the differences, including those between F and G stars where evolutionary tracks may be more accurate. The age spreads of low-mass stars predicted from these prescriptions for star formation are consistent with the observed luminosity spread of Upper Sco. The conclusion that a lengthy star formation history will yield a temperature dependence in ages is expected from the basic physics of pre-main sequence evolution, and is qualitatively robust to the large uncertainties in pre-main sequence evolutionary models.
Who's for dinner? High-throughput sequencing reveals bat dietary differentiation in a biodiversity hotspot where prey taxonomy is largely undescribed.

PubMed

Burgar, Joanna M; Murray, Daithi C; Craig, Michael D; Haile, James; Houston, Jayne; Stokes, Vicki; Bunce, Michael

2014-08-01

Effective management and conservation of biodiversity requires understanding of predator-prey relationships to ensure the continued existence of both predator and prey populations. Gathering dietary data from predatory species, such as insectivorous bats, often presents logistical challenges, further exacerbated in biodiversity hot spots because prey items are highly speciose, yet their taxonomy is largely undescribed. We used high-throughput sequencing (HTS) and bioinformatic analyses to phylogenetically group DNA sequences into molecular operational taxonomic units (MOTUs) to examine predator-prey dynamics of three sympatric insectivorous bat species in the biodiversity hotspot of south-western Australia. We could only assign between 4% and 20% of MOTUs to known genera or species, depending on the method used, underscoring the importance of examining dietary diversity irrespective of taxonomic knowledge in areas lacking a comprehensive genetic reference database. MOTU analysis confirmed that resource partitioning occurred, with dietary divergence positively related to the ecomorphological divergence of the three bat species. We predicted that bat species' diets would converge during times of high energetic requirements, that is, the maternity season for females and the mating season for males. There was an interactive effect of season on female, but not male, bat species' diets, although small sample sizes may have limited our findings. Contrary to our predictions, females of two ecomorphologically similar species showed dietary convergence during the mating season rather than the maternity season. HTS-based approaches can help elucidate complex predator-prey relationships in highly speciose regions, which should facilitate the conservation of biodiversity in genetically uncharacterized areas, such as biodiversity hotspots. © 2013 John Wiley & Sons Ltd.
TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data.

PubMed

Fimereli, Danai; Detours, Vincent; Konopka, Tomasz

2013-04-01

High-throughput sequencing is becoming a popular research tool but carries with it considerable costs in terms of computation time, data storage and bandwidth. Meanwhile, some research applications focusing on individual genes or pathways do not necessitate processing of a full sequencing dataset. Thus, it is desirable to partition a large dataset into smaller, manageable, but relevant pieces. We present a toolkit for partitioning raw sequencing data that includes a method for extracting reads that are likely to map onto pre-defined regions of interest. We show the method can be used to extract information about genes of interest from DNA or RNA sequencing samples in a fraction of the time and disk space required to process and store a full dataset. We report speedup factors between 2.6 and 96, depending on settings and samples used. The software is available at http://www.sourceforge.net/projects/triagetools/.
A rapid and cost-effective method for sequencing pooled cDNA clones by using a combination of transposon insertion and Gateway technology.

PubMed

Morozumi, Takeya; Toki, Daisuke; Eguchi-Ogawa, Tomoko; Uenishi, Hirohide

2011-09-01

Large-scale cDNA-sequencing projects require an efficient strategy for mass sequencing. Here we describe a method for sequencing pooled cDNA clones using a combination of transposon insertion and Gateway technology. Our method reduces the number of shotgun clones that are unsuitable for reconstruction of cDNA sequences, and has the advantage of reducing the total costs of the sequencing project.

Phosphorylation-dependent mineral-type specificity for apatite-binding peptide sequences.

PubMed

Addison, William N; Miller, Sharon J; Ramaswamy, Janani; Mansouri, Ahmad; Kohn, David H; McKee, Marc D

2010-12-01

Apatite-binding peptides discovered by phage display provide an alternative design method for creating functional biomaterials for bone and tooth tissue repair. A limitation of this approach is the absence of display peptide phosphorylation--a post-translational modification important to mineral-binding proteins. To refine the material specificity of a recently identified apatite-binding peptide, and to determine critical design parameters (net charge, charge distribution, amino acid sequence and composition) controlling peptide affinity for mineral, we investigated the effects of phosphorylation and sequence scrambling on peptide adsorption to four different apatites (bone-like mineral, and three types of apatite containing initially 0, 5.6 and 10.5% carbonate). Phosphorylation of the VTKHLNQISQSY peptide (VTK peptide) led to a 10-fold increase in peptide adsorption (compared to nonphosphorylated peptide) to bone-like mineral, and a 2-fold increase in adsorption to the carbonated apatite, but there was no effect of phosphorylation on peptide affinity to pure hydroxyapatite (without carbonate). Sequence scrambling of the nonphosphorylated VTK peptide enhanced its specificity for the bone-like mineral, but scrambled phosphorylated VTK peptide (pVTK) did not significantly alter mineral-binding suggesting that despite the importance of sequence order and/or charge distribution to mineral-binding, the enhanced binding after phosphorylation exceeds any further enhancement by altered sequence order. Osteoblast culture mineralization was dose-dependently inhibited by pVTK and to a significantly lesser extent by scrambled pVTK, while the nonphosphorylated and scrambled forms had no effect, indicating that inhibition of osteoblast mineralization is dependent on both peptide sequence and charge. Computational modeling of peptide-mineral interactions indicated a favorable change in binding energy upon phosphorylation that was unaffected by scrambling. 
In conclusion, phosphorylation of serine residues increases peptide specificity for bone-like mineral, whose adsorption is determined primarily by sequence composition and net charge as opposed to sequence order. However, sequence order in addition to net charge modulates the mineralization of osteoblast cultures. The ability of such peptides to inhibit mineralization has potential utility in the management of pathologic calcification. Copyright © 2010 Elsevier Ltd. All rights reserved.
Memory effect in M ≥ 7 earthquakes of Taiwan

NASA Astrophysics Data System (ADS)

Wang, Jeen-Hwa

2014-07-01

The M ≥ 7 earthquakes that occurred in the Taiwan region during 1906-2006 are taken to study the possibility of memory effect existing in the sequence of those large earthquakes. Those events are all mainshocks. The fluctuation analysis technique is applied to analyze two sequences in terms of earthquake magnitude and inter-event time represented in the natural time domain. For both magnitude and inter-event time, the calculations are made for three data sets, i.e., the original order data, the reverse-order data, and that of the mean values. Calculated results show that the exponents of scaling law of fluctuation versus window length are less than 0.5 for the sequences of both magnitude and inter-event time data. In addition, the phase portraits of two sequent magnitudes and two sequent inter-event times are also applied to explore if large (or small) earthquakes are followed by large (or small) events. Results lead to a negative answer. Together with all types of information in study, we make a conclusion that the earthquake sequence in study is short-term correlated and thus the short-term memory effect would be operative.
Cross-correlation patterns in social opinion formation with sequential data

NASA Astrophysics Data System (ADS)

Chakrabarti, Anindya S.

2016-11-01

Recent research on large-scale internet data suggests the existence of patterns in the collective behavior of billions of people even though each of them may pursue their own activities. In this paper, we interpret online rating activity as a process of forming social opinion about individual items, where people sequentially choose a rating based on the current information set comprising all previous ratings and their own preferences. We construct an opinion index from the sequence of ratings and we show that (1) movie-specific opinion converges much slower than an independent and identically distributed (i.i.d.) sequence of ratings, (2) the rating sequence for individual movies shows less variation compared to an i.i.d. sequence of ratings, (3) the probability density function of the asymptotic opinions has more spread than that defined over opinion arising from an i.i.d. sequence of ratings, (4) opinion sequences across movies are correlated with significantly higher and lower correlation compared to opinion constructed from an i.i.d. sequence of ratings, creating a bimodal cross-correlation structure. By decomposing the temporal correlation structures from panel data of movie ratings, we show that the social effects are very prominent whereas group effects cannot be differentiated from those of surrogate data and individual effects are quite small. The former explains a large part of extreme positive or negative correlations between sequences of opinions. In general, this method can be applied to any rating data to extract social or group-specific effects in correlation structures. We conclude that in this particular case, social effects are important in the opinion formation process.
Improvement of the trace metal composition of medium for nitrite-dependent anaerobic methane oxidation bacteria: Iron (II) and copper (II) make a difference.

PubMed

He, Zhanfei; Geng, Sha; Pan, Yawei; Cai, Chaoyang; Wang, Jiaqi; Wang, Liqiao; Liu, Shuai; Zheng, Ping; Xu, Xinhua; Hu, Baolan

2015-11-15

Nitrite-dependent anaerobic methane oxidation (n-damo) is a potential bioprocess for treating nitrogen-containing wastewater. This process uses methane, an inexpensive and nontoxic end-product of anaerobic digestion, as an external electron donor. However, the low turnover rate and slow growth rate of n-damo functional bacteria limit the practical application of this process. In the present study, the short- and long-term effects of variations in trace metal concentrations on n-damo bacteria were investigated, and the concentrations of trace metal elements of medium were improved. The results were subsequently verified by a group of long-term inoculations (90 days) and were applied in a sequencing batch reactor (SBR) (84 days). The results indicated that iron (Fe(II)) and copper (Cu(II)) (20 and 10 μmol L⁻¹, respectively) significantly stimulated the activity and the growth of n-damo bacteria, whereas other trace metal elements, including zinc (Zn), molybdenum (Mo), cobalt (Co), manganese (Mn), and nickel (Ni), had no significant effect on n-damo bacteria in the tested concentration ranges. Interestingly, fluorescence in situ hybridization (FISH) showed that a large number of dense, large aggregates (10-50 μm) of n-damo bacteria were formed by cell adhesion in the SBR after using the improved medium, and to our knowledge this is the first discovery of large aggregates of n-damo bacteria. Copyright © 2015 Elsevier Ltd. All rights reserved.
Theoretical Insights into the Biophysics of Protein Bi-stability and Evolutionary Switches

PubMed Central

Krobath, Heinrich; Chan, Hue Sun

2016-01-01

Deciphering the effects of nonsynonymous mutations on protein structure is central to many areas of biomedical research and is of fundamental importance to the study of molecular evolution. Much of the investigation of protein evolution has focused on mutations that leave a protein’s folded structure essentially unchanged. However, to evolve novel folds of proteins, mutations that lead to large conformational modifications have to be involved. Unraveling the basic biophysics of such mutations is a challenge to theory, especially when only one or two amino acid substitutions cause a large-scale conformational switch. Among the few such mutational switches identified experimentally, the one between the GA all-α and GB α+β folds is extensively characterized; but all-atom simulations using fully transferrable potentials have not been able to account for this striking switching behavior. Here we introduce an explicit-chain model that combines structure-based native biases for multiple alternative structures with a general physical atomic force field, and apply this construct to twelve mutants spanning the sequence variation between GA and GB. In agreement with experiment, we observe conformational switching from GA to GB upon a single L45Y substitution in the GA98 mutant. In line with the latent evolutionary potential concept, our model shows a gradual sequence-dependent change in fold preference in the mutants before this switch. Our analysis also indicates that a sharp GA/GB switch may arise from the orientation dependence of aromatic π-interactions. These findings provide physical insights toward rationalizing, predicting and designing evolutionary conformational switches. PMID:27253392
Helper-Dependent Adenoviral Vectors.

PubMed

Rosewell, Amanda; Vetrini, Francesco; Ng, Philip

2011-10-29

Helper-dependent adenoviral vectors are devoid of all viral coding sequences, possess a large cloning capacity, and can efficiently transduce a wide variety of cell types from various species independent of the cell cycle to mediate long-term transgene expression without chronic toxicity. These non-integrating vectors hold tremendous potential for a variety of gene transfer and gene therapy applications. Here, we review the production technologies, applications, obstacles to clinical translation and their potential resolutions, and the future challenges and unanswered questions regarding this promising gene transfer technology.
Helper-Dependent Adenoviral Vectors

PubMed Central

Rosewell, Amanda; Vetrini, Francesco; Ng, Philip

2012-01-01

Helper-dependent adenoviral vectors are devoid of all viral coding sequences, possess a large cloning capacity, and can efficiently transduce a wide variety of cell types from various species independent of the cell cycle to mediate long-term transgene expression without chronic toxicity. These non-integrating vectors hold tremendous potential for a variety of gene transfer and gene therapy applications. Here, we review the production technologies, applications, obstacles to clinical translation and their potential resolutions, and the future challenges and unanswered questions regarding this promising gene transfer technology. PMID:24533227
Consecutive analysis of mutation spectrum in the dystrophin gene of 507 Korean boys with Duchenne/Becker muscular dystrophy in a single center.

PubMed

Cho, Anna; Seong, Moon-Woo; Lim, Byung Chan; Lee, Hwa Jeen; Byeon, Jung Hye; Kim, Seung Soo; Kim, Soo Yeon; Choi, Sun Ah; Wong, Ai-Lynn; Lee, Jeongho; Kim, Jon Soo; Ryu, Hye Won; Lee, Jin Sook; Kim, Hunmin; Hwang, Hee; Choi, Ji Eun; Kim, Ki Joong; Hwang, Young Seung; Hong, Ki Ho; Park, Seungman; Cho, Sung Im; Lee, Seung Jun; Park, Hyunwoong; Seo, Soo Hyun; Park, Sung Sup; Chae, Jong Hee

2017-05-01

Duchenne and Becker muscular dystrophies (DMD and BMD) are allelic X-linked recessive muscle diseases caused by mutations in the large and complex dystrophin gene. We analyzed the dystrophin gene in 507 Korean DMD/BMD patients by multiple ligation-dependent probe amplification and direct sequencing. Overall, 117 different deletions, 48 duplications, and 90 pathogenic sequence variations, including 30 novel variations, were identified. Deletions and duplications accounted for 65.4% and 13.3% of Korean dystrophinopathy, respectively, suggesting that the incidence of large rearrangements in dystrophin is similar among different ethnic groups. We also detected sequence variations in >100 probands. The small variations were dispersed across the whole gene, and 12.3% were nonsense mutations. Precise genetic characterization in patients with DMD/BMD is timely and important for implementing nationwide registration systems and future molecular therapeutic trials in Korea and globally. Muscle Nerve 55: 727-734, 2017. © 2016 Wiley Periodicals, Inc.
Temporal Order in Periodically Driven Spins in Star-Shaped Clusters

NASA Astrophysics Data System (ADS)

Pal, Soham; Nishad, Naveen; Mahesh, T. S.; Sreejith, G. J.

2018-05-01

We experimentally study the response of star-shaped clusters of initially unentangled N = 4, 10, and 37 nuclear spin-1/2 moments to an inexact π-pulse sequence and show that an Ising coupling between the center and the satellite spins results in robust period-2 magnetization oscillations. The period is stable against bath effects, but the amplitude decays with a timescale that depends on the inexactness of the pulse. Simulations reveal a semiclassical picture in which the rigidity of the period is due to a randomizing effect of the Larmor precession under the magnetization of surrounding spins. The timescales with stable periodicity increase with net initial magnetization, even in the presence of perturbations, indicating a robust temporal ordered phase for large systems with finite magnetization per spin.
Polyfluorophore Labels on DNA: Dramatic Sequence Dependence of Quenching

PubMed Central

Teo, Yin Nah; Wilson, James N.

2010-01-01

We describe studies carried out in the DNA context to test how a common fluorescence quencher, dabcyl, interacts with oligodeoxynucleoside fluorophores (ODFs), a system of stacked, electronically interacting fluorophores built on a DNA scaffold. We tested twenty different tetrameric ODF sequences containing varied combinations and orderings of pyrene (Y), benzopyrene (B), perylene (E), dimethylaminostilbene (D), and spacer (S) monomers conjugated to the 3′ end of a DNA oligomer. Hybridization of this probe sequence to a dabcyl-labeled complementary strand resulted in strong quenching of fluorescence in 85% of the twenty ODF sequences. The high efficiency of quenching was also established by their large Stern–Volmer constants (KSV) of between 2.1 × 10⁴ and 4.3 × 10⁵ M⁻¹, measured with a free dabcyl quencher. Interestingly, quenching of ODFs displayed strong sequence dependence. This was particularly evident in anagrams of ODF sequences; for example, the sequence BYDS had a KSV that was approximately two orders of magnitude greater than that of BSDY, which has the same dye composition. Other anagrams, for example EDSY and ESYD, also displayed different responses upon quenching by dabcyl. Analysis of spectra showed that apparent excimer and exciplex emission bands were quenched with much greater efficiency compared to monomer emission bands by at least an order of magnitude. This suggests an important role played by delocalized excited states of the π stack of fluorophores in the amplified quenching of fluorescence. PMID:19780115
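To give a feel for the scale of the Stern–Volmer constants quoted above, the following is a minimal sketch that assumes the standard linear Stern–Volmer relation, F0/F = 1 + KSV·[Q]. The function names and the 1 µM quencher concentration are illustrative assumptions, not values or methods taken from the study.

```python
def stern_volmer_ratio(k_sv, quencher_molar):
    """Return F0/F for a Stern-Volmer constant k_sv (in M^-1) and a
    quencher concentration (in M), assuming the linear relation
    F0/F = 1 + K_SV * [Q]."""
    return 1.0 + k_sv * quencher_molar


def fraction_quenched(k_sv, quencher_molar):
    """Fraction of the unquenched fluorescence that is lost, 1 - F/F0."""
    return 1.0 - 1.0 / stern_volmer_ratio(k_sv, quencher_molar)


if __name__ == "__main__":
    quencher = 1e-6  # hypothetical 1 uM free quencher, chosen for illustration
    # The two constants are the reported lower and upper ends of the K_SV range.
    for k_sv in (2.1e4, 4.3e5):
        ratio = stern_volmer_ratio(k_sv, quencher)
        lost = fraction_quenched(k_sv, quencher)
        print(f"K_SV = {k_sv:.1e} M^-1: F0/F = {ratio:.3f}, quenched = {lost:.1%}")
```

Under this assumption, the same quencher concentration removes a far larger share of the signal at the upper end of the reported range than at the lower end, which is one way to read the sequence dependence described in the abstract.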
The Effect of Field-Dependence-Independence and Instructional Sequence on the Achievement of High School Biology Students.

ERIC Educational Resources Information Center

Douglass, Claudia B.

The primary purpose of the reported study was to identify a possible interaction between the cognitive style of the students and the instructional sequence of the materials and their combined effect on achievement. The subjects were 627 biology students from six midwestern high schools. The students were ranked and classified as field-dependent…
Mean convergence theorems and weak laws of large numbers for weighted sums of random variables under a condition of weighted integrability

NASA Astrophysics Data System (ADS)

Ordóñez Cabrera, Manuel; Volodin, Andrei I.

2005-05-01

From the classical notion of uniform integrability of a sequence of random variables, a new concept of integrability (called h-integrability) is introduced for an array of random variables, concerning an array of constants. We prove that this concept is weaker than other previous related notions of integrability, such as Cesàro uniform integrability [Chandra, Sankhya Ser. A 51 (1989) 309-317], uniform integrability concerning the weights [Ordóñez Cabrera, Collect. Math. 45 (1994) 121-132] and Cesàro α-integrability [Chandra and Goswami, J. Theoret. Probab. 16 (2003) 655-669]. Under this condition of integrability and appropriate conditions on the array of weights, mean convergence theorems and weak laws of large numbers for weighted sums of an array of random variables are obtained when the random variables are subject to some special kinds of dependence: (a) rowwise pairwise negative dependence, (b) rowwise pairwise non-positive correlation, (c) when the sequence of random variables in every row is φ-mixing. Finally, we consider the general weak law of large numbers in the sense of Gut [Statist. Probab. Lett. 14 (1992) 49-52] under this new condition of integrability for a Banach space setting.
Sequence and Structure Dependent DNA-DNA Interactions

NASA Astrophysics Data System (ADS)

Kopchick, Benjamin; Qiu, Xiangyun

Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention and “random” DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causalities in terms of the differences in hydration shell and DNA surface structures.
Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently

PubMed Central

Currin, Andrew; Swainston, Neil; Day, Philip J.

2015-01-01

The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the ‘search space’ of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the ‘best’ amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes.
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust. PMID:25503938
Predicting stability of DNA duplexes in solutions containing magnesium and monovalent cations.

PubMed

Owczarzy, Richard; Moreira, Bernardo G; You, Yong; Behlke, Mark A; Walder, Joseph A

2008-05-13

Accurate predictions of DNA stability in physiological and enzyme buffers are important for the design of many biological and biochemical assays. We therefore investigated the effects of magnesium, potassium, sodium, Tris ions, and deoxynucleoside triphosphates on melting profiles of duplex DNA oligomers and collected large melting data sets. An empirical correction function was developed that predicts melting temperatures, transition enthalpies, entropies, and free energies in buffers containing magnesium and monovalent cations. The new correction function significantly improves the accuracy of predictions and accounts for ion concentration, G-C base pair content, and length of the oligonucleotides. The competitive effects of potassium and magnesium ions were characterized. If the concentration ratio of [Mg²⁺]^0.5/[Mon⁺] is less than 0.22 M^(−1/2), monovalent ions (K⁺, Na⁺) are dominant. Effects of magnesium ions dominate and determine duplex stability at higher ratios. Typical reaction conditions for PCR and DNA sequencing (1.5-5 mM magnesium and 20-100 mM monovalent cations) fall within this range. Conditions were identified where monovalent and divalent cations compete and their stability effects are more complex. When duplexes denature, some of the Mg²⁺ ions associated with the DNA are released. The number of released magnesium ions per phosphate charge is sequence dependent and decreases surprisingly with increasing oligonucleotide length.
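The ratio test quoted in the abstract is easy to check for a given buffer. The sketch below computes the square root of the magnesium concentration divided by the monovalent cation concentration (both in mol/L) and compares it with the 0.22 M^(−1/2) threshold. The function names are hypothetical, and the two-way classification is a simplification: the abstract also notes an intermediate regime where the ions compete, and the full empirical correction function is not reproduced here.

```python
import math

# Threshold from the abstract, in M^(-1/2).
RATIO_THRESHOLD = 0.22


def mg_to_monovalent_ratio(mg_molar, mon_molar):
    """Return sqrt([Mg2+]) / [Mon+], with both concentrations in mol/L."""
    return math.sqrt(mg_molar) / mon_molar


def dominant_cation(mg_molar, mon_molar, threshold=RATIO_THRESHOLD):
    """Crude two-way classification based only on the ratio threshold.

    This ignores the intermediate competitive regime mentioned in the
    abstract and is meant only as an illustration.
    """
    ratio = mg_to_monovalent_ratio(mg_molar, mon_molar)
    return "monovalent" if ratio < threshold else "magnesium"


if __name__ == "__main__":
    # Corners of the PCR/sequencing range quoted in the abstract:
    # 1.5-5 mM magnesium and 20-100 mM monovalent cations.
    for mg in (1.5e-3, 5e-3):
        for mon in (20e-3, 100e-3):
            ratio = mg_to_monovalent_ratio(mg, mon)
            print(f"Mg = {mg * 1e3:.1f} mM, Mon = {mon * 1e3:.0f} mM: "
                  f"ratio = {ratio:.2f} -> {dominant_cation(mg, mon)}")
```

Every corner of the quoted PCR and sequencing range gives a ratio above 0.22, which agrees with the abstract's statement that those conditions fall in the magnesium-dominated range.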
A Bayesian Framework for Human Body Pose Tracking from Depth Image Sequences

PubMed Central

Zhu, Youding; Fujimura, Kikuo

2010-01-01

This paper addresses the problem of accurate and robust tracking of 3D human body pose from depth image sequences. Recovering the large number of degrees of freedom in human body movements from a depth image sequence is challenging due to the need to resolve the depth ambiguity caused by self-occlusions and the difficulty to recover from tracking failure. Human body poses could be estimated through model fitting using dense correspondences between depth data and an articulated human model (local optimization method). Although it usually achieves a high accuracy due to dense correspondences, it may fail to recover from tracking failure. Alternately, human pose may be reconstructed by detecting and tracking human body anatomical landmarks (key-points) based on low-level depth image analysis. While this method (key-point based method) is robust and recovers from tracking failure, its pose estimation accuracy depends solely on image-based localization accuracy of key-points. To address these limitations, we present a flexible Bayesian framework for integrating pose estimation results obtained by methods based on key-points and local optimization. Experimental results are shown and performance comparison is presented to demonstrate the effectiveness of the proposed approach. PMID:22399933
Protocol matters: which methylome are you actually studying?

PubMed Central

Robinson, Mark D; Statham, Aaron L; Speed, Terence P; Clark, Susan J

2011-01-01

The field of epigenetics is now capitalizing on the vast number of emerging technologies, largely based on second-generation sequencing, which interrogate DNA methylation status and histone modifications genome-wide. However, getting an exhaustive and unbiased view of a methylome at a reasonable cost is proving to be a significant challenge. In this article, we take a closer look at the impact of the DNA sequence and bias effects introduced to datasets by genome-wide DNA methylation technologies and, where possible, explore the bioinformatics tools that deconvolve them. There remains much to be learned about the performance of genome-wide technologies, the data we mine from these assays and how it reflects the actual biology. While there are several methods to interrogate the DNA methylation status genome-wide, our opinion is that no single technique suitably covers the minimum criteria of high coverage and high resolution at a reasonable cost. In fact, the fraction of the methylome that is studied currently depends entirely on the inherent biases of the protocol employed. There is promise for this to change, as the third generation of sequencing technologies is expected to again ‘revolutionize’ the way that we study genomes and epigenomes. PMID:21566704
Q-learning residual analysis: application to the effectiveness of sequences of antipsychotic medications for patients with schizophrenia.

PubMed

Ertefaie, Ashkan; Shortreed, Susan; Chakraborty, Bibhas

2016-06-15

Q-learning is a regression-based approach that uses longitudinal data to construct dynamic treatment regimes, which are sequences of decision rules that use patient information to inform future treatment decisions. An optimal dynamic treatment regime is composed of a sequence of decision rules that indicate how to optimally individualize treatment using the patients' baseline and time-varying characteristics to optimize the final outcome. Constructing optimal dynamic regimes using Q-learning depends heavily on the assumption that regression models at each decision point are correctly specified; yet model checking in the context of Q-learning has been largely overlooked in the current literature. In this article, we show that residual plots obtained from standard Q-learning models may fail to adequately check the quality of the model fit. We present a modified Q-learning procedure that accommodates residual analyses using standard tools. We present simulation studies showing the advantage of the proposed modification over standard Q-learning. We illustrate this new Q-learning approach using data collected from a sequential multiple assignment randomized trial of patients with schizophrenia. Copyright © 2016 John Wiley & Sons, Ltd.
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

PubMed Central

2009-01-01

Background: Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results: We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion: We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

PubMed

Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

2009-08-06

Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. 
The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.

A Novel Partial Sequence Alignment Tool for Finding Large Deletions

PubMed Central

Aruk, Taner; Ustek, Duran; Kursun, Olcay

2012-01-01

Finding large deletions in genome sequences has become increasingly useful in bioinformatics, such as in clinical research and diagnosis. Although there are a number of publicly available next generation sequencing mapping and sequence alignment programs, these software packages do not correctly align fragments containing deletions larger than one kb. We present a fast alignment software package, BinaryPartialAlign, that can be used by wet lab scientists to find long structural variations in their experiments. For BinaryPartialAlign, we make use of the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps that we called partial alignment. BinaryPartialAlign implementation is compared with other straightforward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (runtime and accuracy) of the proposed method. PMID:22566777
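
For orientation, the Smith-Waterman recurrence that the abstract above builds on can be sketched as follows. This is a minimal illustration of standard local alignment scoring with a linear gap penalty, not the BinaryPartialAlign implementation; the function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for the example, and the binary-search step for large gaps is not reproduced.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Hypothetical helper for illustration; scoring values are assumed.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: scores never drop below zero
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Under a per-base gap penalty like this one, spanning a long deletion costs more than the matches on its far side can recover, so the best local alignment tends to cover only one flank of the deletion. That is consistent with the limitation the abstract describes for deletions larger than one kb.
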
Defining the transcriptome assembly and its use for genome dynamics and transcriptome profiling studies in pigeonpea (Cajanus cajan L.).

PubMed

Dubey, Anuja; Farmer, Andrew; Schlueter, Jessica; Cannon, Steven B; Abernathy, Brian; Tuteja, Reetu; Woodward, Jimmy; Shah, Trushar; Mulasmanovic, Benjamin; Kudapa, Himabindu; Raju, Nikku L; Gothalwal, Ragini; Pande, Suresh; Xiao, Yongli; Town, Chris D; Singh, Nagendra K; May, Gregory D; Jackson, Scott; Varshney, Rajeev K

2011-06-01

This study reports generation of large-scale genomic resources for pigeonpea, a so-called 'orphan crop species' of the semi-arid tropic regions. FLX/454 sequencing carried out on a normalized cDNA pool prepared from 31 tissues produced 494 353 short transcript reads (STRs). Cluster analysis of these STRs, together with 10 817 Sanger ESTs, resulted in a pigeonpea transcriptome assembly (CcTA) comprising 127 754 tentative unique sequences (TUSs). Functional analysis of these TUSs highlights several active pathways and processes in the sampled tissues. Comparison of the CcTA with the soybean genome showed similarity to 10 857 and 16 367 soybean gene models (depending on alignment methods). Additionally, Illumina 1G sequencing was performed on Fusarium wilt (FW)- and sterility mosaic disease (SMD)-challenged root tissues of 10 resistant and susceptible genotypes. More than 160 million sequence tags were used to identify FW- and SMD-responsive genes. Sequence analysis of CcTA and the Illumina tags identified a large new set of markers for use in genetics and breeding, including 8137 simple sequence repeats, 12 141 single-nucleotide polymorphisms and 5845 intron-spanning regions. Genomic resources developed in this study should be useful for basic and applied research, not only for pigeonpea improvement but also for other related, agronomically important legumes.
Nanothermometer Based on Resonant Tunneling Diodes: From Cryogenic to Room Temperatures.

PubMed

Pfenning, Andreas; Hartmann, Fabian; Rebello Sousa Dias, Mariama; Castelano, Leonardo Kleber; Süßmeier, Christoph; Langer, Fabian; Höfling, Sven; Kamp, Martin; Marques, Gilmar Eugenio; Worschech, Lukas; Lopez-Richard, Victor

2015-06-23

Sensor miniaturization together with broadening temperature sensing range are fundamental challenges in nanothermometry. By exploiting a large temperature-dependent screening effect observed in a resonant tunneling diode in sequence with a GaInNAs/GaAs quantum well, we present a low dimensional, wide range, and highly sensitive nanothermometer. This sensor shows a large threshold voltage shift of the bistable switching of more than 4.5 V for a temperature rise from 4.5 to 295 K, with a linear voltage-temperature response of 19.2 mV K⁻¹, and a temperature uncertainty in the millikelvin (mK) range. Monitoring the electroluminescence emission spectrum additionally provides an optical read-out of the thermometer. The combination of electrical and optical read-outs, together with the sensor architecture, makes the device excel as a thermometer capable of noninvasive temperature sensing with high local resolution and sensitivity.
Interactions between genetic variation and cellular environment in skeletal muscle gene expression.

PubMed

Taylor, D Leland; Knowles, David A; Scott, Laura J; Ramirez, Andrea H; Casale, Francesco Paolo; Wolford, Brooke N; Guan, Li; Varshney, Arushi; Albanus, Ricardo D'Oliveira; Parker, Stephen C J; Narisu, Narisu; Chines, Peter S; Erdos, Michael R; Welch, Ryan P; Kinnunen, Leena; Saramies, Jouko; Sundvall, Jouko; Lakka, Timo A; Laakso, Markku; Tuomilehto, Jaakko; Koistinen, Heikki A; Stegle, Oliver; Boehnke, Michael; Birney, Ewan; Collins, Francis S

2018-01-01

From whole organisms to individual cells, responses to environmental conditions are influenced by genetic makeup, where the effect of genetic variation on a trait depends on the environmental context. RNA-sequencing quantifies gene expression as a molecular trait, and is capable of capturing both genetic and environmental effects. In this study, we explore opportunities of using allele-specific expression (ASE) to discover cis-acting genotype-environment interactions (GxE), that is, genetic effects on gene expression that depend on an environmental condition. Treating 17 common, clinical traits as approximations of the cellular environment of 267 skeletal muscle biopsies, we identify 10 candidate environmental response expression quantitative trait loci (reQTLs) across 6 traits (12 unique gene-environment trait pairs; 10% FDR per trait) including sex, systolic blood pressure, and low-density lipoprotein cholesterol. Although using ASE is in principle a promising approach to detect GxE effects, replication of such signals can be challenging as validation requires harmonization of environmental traits across cohorts and a sufficient sampling of heterozygotes for a transcribed SNP. Comprehensive discovery and replication will require large human transcriptome datasets, or the integration of multiple transcribed SNPs, coupled with standardized clinical phenotyping.
SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.

PubMed

Yu, Qiang; Wei, Dingbang; Huo, Hongwei

2018-06-18

Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
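
The qPMS definition in the abstract above can be made concrete with a brute-force check. The sketch below only tests whether one candidate l-length string satisfies the quorum condition (it occurs, with up to d mismatches, in at least qt of the t input sequences); it is not SamSelect or any of the qPMS algorithms the abstract refers to, and all function names and the toy sequences are assumptions made for illustration.

```python
def hamming(x, y):
    """Number of mismatching positions between two equal-length strings."""
    return sum(c1 != c2 for c1, c2 in zip(x, y))


def occurs(motif, seq, d):
    """True if some l-mer of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def is_quorum_motif(motif, seqs, q, d):
    """True if motif occurs (up to d mismatches) in at least q*t sequences."""
    hits = sum(occurs(motif, s, d) for s in seqs)
    return hits >= q * len(seqs)


if __name__ == "__main__":
    # Toy input (assumed): t = 4 sequences of length n = 8
    seqs = ["TTACGTAA", "GGACGAGG", "CCACGTCC", "TTTTTTTT"]
    print(is_quorum_motif("ACGT", seqs, q=0.75, d=1))
    print(is_quorum_motif("ACGT", seqs, q=1.0, d=1))
```

Each check scans every l-mer of every sequence, so the work grows with t, and a smaller q admits more candidate strings. That is in line with the abstract's observation that a large t or a small q lengthens computation, and it is the motivation for running the search on a smaller sample set D'.
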
Age of onset and temporal sequencing of lifetime DSM-IV alcohol use disorders relative to comorbid mood and anxiety disorders.

PubMed

Falk, Daniel E; Yi, Hsiao-Ye; Hilton, Michael E

2008-04-01

Understanding the temporal sequencing of alcohol use disorders (AUDs) and comorbid mood and anxiety disorders may help to disentangle the etiological underpinnings of comorbidity. Methodological limitations of previous studies, however, may have led to inconsistent or inconclusive findings. This study describes the temporal sequencing of the onset of AUDs relative to the onset of specific comorbid mood and anxiety disorders using a large, nationally representative survey. AUD onset tended to follow the onset of 2 of the 9 mood and anxiety disorders (specific and social phobia). The onset of alcohol abuse tended to precede the onset of 5 of the 9 mood and anxiety disorders (GAD, panic, panic with agoraphobia, major depression, and dysthymia), whereas the onset of alcohol dependence tended to precede the onset of only 2 of the 9 mood and anxiety disorders (GAD and panic). Lag times between primary and subsequent disorders generally ranged from 7 to 16 years. Comorbid individuals whose alcohol dependence came after panic with agoraphobia, hypomania, and GAD had increased risk of persistent alcohol dependence. Alcohol abuse, but not dependence, precedes many mood and anxiety disorders. If the primary disorder does in fact play a causative or contributing role in the development of the subsequent disorder, this role can best be described as "temporally distal." However, in assessing the risk for persistent alcohol dependence, clinicians should consider not only the type of comorbid mood/anxiety disorder, but also the temporal ordering of these disorders.
Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis.

PubMed

Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B

2013-01-01

Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

PubMed Central

Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.

2013-01-01

Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
The right inferior frontal gyrus processes nested non-local dependencies in music.

PubMed

Cheung, Vincent K M; Meyer, Lars; Friederici, Angela D; Koelsch, Stefan

2018-02-28

Complex auditory sequences known as music have often been described as hierarchically structured. This permits the existence of non-local dependencies, which relate elements of a sequence beyond their temporal sequential order. Previous studies in music have reported differential activity in the inferior frontal gyrus (IFG) when comparing regular and irregular chord-transitions based on theories in Western tonal harmony. However, it is unclear if the observed activity reflects the interpretation of hierarchical structure as the effects are confounded by local irregularity. Using functional magnetic resonance imaging (fMRI), we found that violations to non-local dependencies in nested sequences of three-tone musical motifs in musicians elicited increased activity in the right IFG. This is in contrast to similar studies in language which typically report the left IFG in processing grammatical syntax. Effects of increasing auditory working memory demands are moreover reflected by distributed activity in frontal and parietal regions. Our study therefore demonstrates the role of the right IFG in processing non-local dependencies in music, and suggests that hierarchical processing in different cognitive domains relies on similar mechanisms that are subserved by domain-selective neuronal subpopulations.
Vibronic dephasing model for coherent-to-incoherent crossover in DNA

NASA Astrophysics Data System (ADS)

Karasch, Patrick; Ryndyk, Dmitry A.; Frauenheim, Thomas

2018-05-01

In this paper, we investigate the interplay between coherent and incoherent charge transport in cytosine-guanine (GC-) rich DNA molecules. Our objective is to introduce a physically grounded approach to dephasing in large molecules and to understand the length-dependent charge transport characteristics, and especially the crossover from the coherent tunneling to incoherent hopping regime at different temperatures. Therefore, we apply the vibronic dephasing model and compare the results to the Büttiker probe model which is commonly used to describe decoherence effects in charge transport. Using the full ladder model and simplified one-dimensional model of DNA, we consider molecular junctions with alternating and stacked GC sequences and compare our results to recent experimental measurements.
Design of DNA pooling to allow incorporation of covariates in rare variants analysis.

PubMed

Guan, Weihua; Li, Chun

2014-01-01

Rapid advances in next-generation sequencing technologies facilitate genetic association studies of an increasingly wide array of rare variants. To capture the rare or less common variants, a large number of individuals will be needed. However, the cost of a large scale study using whole genome or exome sequencing is still high. DNA pooling can serve as a cost-effective approach, but with a potential limitation that the identity of individual genomes would be lost and therefore individual characteristics and environmental factors could not be adjusted for in association analysis, which may result in power loss and a biased estimate of genetic effect. For case-control studies, we propose a design strategy for pool creation and an analysis strategy that allows covariate adjustment, using a multiple imputation technique. Simulations show that our approach can obtain reasonable estimates of genotypic effect with only slight loss of power compared to the much more expensive approach of sequencing individual genomes. Our design and analysis strategies enable more powerful and cost-effective sequencing studies of complex diseases, while allowing incorporation of covariate adjustment.
Pitfalls in setting up genetic studies on preeclampsia.

PubMed

Laivuori, Hannele

2013-04-01

This presentation will consider approaches to discover susceptibility genes for a complex genetic disorder such as preeclampsia. The clinical disease presumably results from the additive effects of multiple sequence variants from the mother and the foetus together with environmental factors. Disease heterogeneity and underpowered study designs are likely to be behind non-reproducible results in candidate gene association studies. To avoid spurious findings, sample size and characteristics of the study populations as well as replication studies in an independent study population should be an essential part of a study design. In family-based linkage studies, the relationship between genotype and phenotype may be modified by a variety of factors. The large number of families needed to discover genetic variants with modest effect sizes is difficult to attain. Moreover, the identification of underlying mutations has proven difficult. When pooling data or performing meta-analyses from different populations, disease and locus heterogeneity may become a major issue. The first genome-wide association studies (GWAS) have identified risk loci for preeclampsia. Adequately powered replication studies are critical in order to replicate the initial GWAS findings. This approach requires rigorous multiple testing correction. The expected effect sizes of individual sequence variants on preeclampsia are small, but this approach is likely to decipher new clues to the pathogenesis. Rare variants, gene-gene and gene-environmental interactions as well as noncoding genetic variations and epigenetics are expected to explain the missing heritability. Next-generation sequencing technologies will make large amounts of data on genomes and transcriptomes available. Complexity of the data poses a challenge. Different depths of coverage might be chosen depending on the design of the study, and validation of the results by different methods is mandatory.
In order to minimize disease heterogeneity in genetic studies of preeclampsia, identification of subtypes and intermediate phenotypes would be highly desirable. Copyright © 2013. Published by Elsevier B.V.
Recollection-Dependent Memory for Event Duration in Large-Scale Spatial Navigation

ERIC Educational Resources Information Center

Brunec, Iva K.; Ozubko, Jason D.; Barense, Morgan D.; Moscovitch, Morris

2017-01-01

Time and space represent two key aspects of episodic memories, forming the spatiotemporal context of events in a sequence. Little is known, however, about how temporal information, such as the duration and the order of particular events, are encoded into memory, and if it matters whether the memory representation is based on recollection or…
Functional interactions of HIV-infection and methamphetamine dependence during motor programming.

PubMed

Archibald, Sarah L; Jacobson, Mark W; Fennema-Notestine, Christine; Ogasawara, Miki; Woods, Steven P; Letendre, Scott; Grant, Igor; Jernigan, Terry L

2012-04-30

Methamphetamine (METH) dependence is frequently comorbid with HIV infection and both have been linked to alterations of brain structure and function. In a previous study, we showed that the brain volume loss characteristic of HIV infection contrasts with METH-related volume increases in striatum and parietal cortex, suggesting distinct neurobiological responses to HIV and METH (Jernigan et al., 2005). Functional magnetic resonance imaging (fMRI) has the potential to reveal functional interactions between the effects of HIV and METH. In the present study, 50 participants were studied in four groups: an HIV+ group, a recently METH-dependent group, a dually affected group, and a group of unaffected community comparison subjects. An fMRI paradigm consisting of motor sequencing tasks of varying levels of complexity was administered to examine blood oxygenation level dependent (BOLD) changes. Within all groups, activity increased significantly with increasing task complexity in large clusters within sensorimotor and parietal cortex, basal ganglia, cerebellum, and cingulate. The task complexity effect was regressed on HIV status, METH status, and the HIV×METH interaction term in a simultaneous multiple regression. HIV was associated with less complexity-related activation in striatum, whereas METH was associated with less complexity-related activation in parietal regions. Significant interaction effects were observed in both cortical and subcortical regions; and, contrary to expectations, the complexity-related activation was less aberrant in dually affected than in single risk participants, in spite of comparable levels of neurocognitive impairment among the clinical groups. Thus, HIV and METH dependence, perhaps through their effects on dopaminergic systems, may have opposing functional effects on neural circuits involved in motor programming. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
There is Diversity in Disorder-"In all Chaos there is a Cosmos, in all Disorder a Secret Order".

PubMed

Nielsen, Jakob T; Mulder, Frans A A

2016-01-01

The protein universe consists of a continuum of structures ranging from full order to complete disorder. As the structured part of the proteome has been intensively studied, stably folded proteins are increasingly well documented and understood. However, proteins that are fully, or in large part, disordered are much less well characterized. Here we collected NMR chemical shifts in a small database for 117 protein sequences that are known to contain disorder. We demonstrate that NMR chemical shift data can be brought to bear as an exquisite judge of protein disorder at the residue level, and help in validation. With the help of secondary chemical shift analysis we demonstrate that the proteins in the database span the full spectrum of disorder, but still, largely segregate into two classes; disordered with small segments of order scattered along the sequence, and structured with small segments of disorder inserted between the different structured regions. A detailed analysis reveals that the distribution of order/disorder along the sequence shows a complex and asymmetric distribution, that is highly protein-dependent. Access to ratified training data further suggests an avenue to improving prediction of disorder from sequence.
Effect of sequence-dependent rigidity on plectoneme localization in dsDNA

NASA Astrophysics Data System (ADS)

Medalion, Shlomi; Rabin, Yitzhak

2016-04-01

We use Monte-Carlo simulations to study the effect of variable rigidity on plectoneme formation and localization in supercoiled double-stranded DNA. We show that the presence of soft sequences increases the number of plectoneme branches and that the edges of the branches tend to be localized at these sequences. We propose an experimental approach to test our results in vitro, and discuss the possible role played by plectoneme localization in the search process of transcription factors for their targets (promoter regions) on the bacterial genome.
Fluorescent DNA-templated silver nanoclusters

NASA Astrophysics Data System (ADS)

Lin, Ruoqian

Because of the ultra-small size and biocompatibility of silver nanoclusters, they have attracted much research interest for their applications in biolabeling. Among the many ways of synthesizing silver nanoclusters, the DNA-templated method is particularly attractive: the high tunability of DNA sequences provides another degree of freedom for controlling the chemical and photophysical properties. However, systematic studies of how DNA sequences and concentrations control the photophysical properties are still lacking. The aim of this thesis is to investigate the mechanisms by which silver clusters bind to single-stranded DNAs. We report the synthesis and characterization of DNA-templated silver nanoclusters and provide a systematic interrogation of the effects of DNA concentrations and sequences, including lengths and secondary structures. We performed a series of syntheses utilizing five different sequences to explore the optimal synthesis condition. By characterizing samples with UV-vis and fluorescence spectroscopy, we identified the most suitable reactant ratio and synthesis conditions. Two of the sequences were chosen for further concentration dependence studies and sequence dependence studies. We found that cytosine-rich sequences are more likely to produce silver nanoclusters with stronger fluorescence signals; however, sequences with hairpin secondary structures are more capable of stabilizing silver nanoclusters. In addition, the fluorescence peak emission intensities and wavelengths of the DNA-templated silver clusters have sequence-dependent fingerprints, which could potentially be applied to sequence sensing in the future. However, the current conclusions are not yet firmly established; it remains difficult to formulate general rules for DNA strand design and silver nanocluster production. Further investigation of more sequences could resolve these questions in the future.
A Comparison of the Effects of Temporary Hippocampal Lesions on Single and Dual Context Versions of the Olfactory Sequence Memory Task

PubMed Central

Sill, Orriana C.; Smith, David M.

2012-01-01

In recent years, many animal models of memory have focused on one or more of the various components of episodic memory. For example, the odor sequence memory task requires subjects to remember individual items and events (the odors) and the temporal aspects of the experience (the sequence of odor presentation). The well-known spatial context coding function of the hippocampus, as exemplified by place cell firing, may reflect the ‘where’ component of episodic memory. In the present study, we added a contextual component to the odor sequence memory task by training rats to choose the earlier odor in one context and the later odor in another context and we compared the effects of temporary hippocampal lesions on performance of the original single context task and the new dual context task. Temporary lesions significantly impaired the single context task, although performance remained significantly above chance levels. In contrast, performance dropped all the way to chance when temporary lesions were used in the dual context task. These results demonstrate that rats can learn a dual context version of the odor sequence learning task which requires the use of contextual information along with the requirement to remember the ‘what’ and ‘when’ components of the odor sequence. Moreover, the additional requirement of context-dependent expression of the ‘what-when’ memory made the task fully dependent on the hippocampus. PMID:22687149
Life-assessment technique for nuclear power plant cables

NASA Astrophysics Data System (ADS)

Bartoníček, B.; Hnát, V.; Plaček, V.

1998-06-01

The condition of polymer-based cable material can be best characterized by measuring elongation at break of its insulating materials. However, it is often not possible to take sufficiently large samples for measurement with the tensile testing machine. The problem has been conveniently solved by utilizing the differential scanning calorimetry technique. From the tested cable, several microsamples are taken and the oxidation induction time (OIT) is determined. For each cable whose lifetime is to be assessed, the correlation of OIT with elongation at break and the correlation of elongation at break with the cable service time have to be established. A reliable assessment of the cable lifetime depends on the accuracy of these correlations. Consequently, the synergistic effects known at this time, namely dose rate effects and effects resulting from the sequence in which radiation and elevated temperature are applied, must be taken into account.
Cancer in the crosshairs: targeting cancer metabolism with hyperpolarized carbon-13 MRI technology.

PubMed

von Morze, Cornelius; Merritt, Matthew E

2018-06-05

Magnetic resonance (MR)-based hyperpolarized (HP) ¹³C metabolic imaging is under active pursuit as a new clinical diagnostic method for cancer detection, grading, and monitoring of therapeutic response. Following the tremendous success of metabolic imaging by positron emission tomography, which already plays major roles in clinical oncology, the added value of HP ¹³C MRI is emerging. Aberrant glycolysis and central carbon metabolism are a hallmark of many forms of cancer. The chemical transformations associated with these pathways produce metabolites ranging in general from three to six carbons, and are dependent on the redox state and energy charge of the tissue. The significant changes in chemistry associated with flux through these pathways imply that HP imaging can take advantage of the underlying chemical shift information encoded into an MR experiment to produce images of the injected substrate as well as its metabolites. However, imaging of HP metabolites poses unique constraints on pulse sequence design related to detection of X-nuclei, decay of the HP magnetization due to T₁, and the consumption of HP signal by the inspection pulses. Advancements in the field continue to depend critically on customization of MRI systems and pulse sequences for optimized detection of HP ¹³C signals, focused largely on extracting the maximum amount of information during the short lifetime of the HP magnetization. From a clinical perspective, the success of HP ¹³C MRI of cancer will largely depend upon the utility of HP pyruvate for the detection of lactate pools associated with the Warburg effect, though several other agents are also under investigation, with novel agents continually being formulated. In this review, the salient aspects of HP ¹³C imaging will be highlighted, with an emphasis on both technological challenges and the biochemical aspects of HP experimental design. Copyright © 2018 John Wiley & Sons, Ltd.

Deep Recurrent Neural Networks for Human Activity Recognition

PubMed Central

Murad, Abdulmajid

2017-01-01

Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs. PMID:29113103
Deep Recurrent Neural Networks for Human Activity Recognition.

PubMed

Murad, Abdulmajid; Pyun, Jae-Young

2017-11-06

Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs.
Clustering evolving proteins into homologous families.

PubMed

Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A

2013-04-08

Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. 
For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
Simple chained guide trees give high-quality protein multiple sequence alignments

PubMed Central

Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.

2014-01-01

Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random. PMID:25002495
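
The chained guide tree described in this abstract can be illustrated with a minimal sketch. It assumes that "chained" means a fully imbalanced (caterpillar) topology in which each sequence is added to the growing alignment one at a time; the function name and the Newick output format are illustrative choices, not taken from the paper or from any alignment package.

```python
def chained_guide_tree(names):
    """Return a Newick string for a fully chained (caterpillar) guide tree.

    Hypothetical helper: the first two sequences are joined, then every
    further sequence is attached to the existing tree in the order given.
    """
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    tree = f"({names[0]},{names[1]})"
    for name in names[2:]:
        # Each new sequence becomes the sister of everything aligned so far.
        tree = f"({tree},{name})"
    return tree + ";"


print(chained_guide_tree(["s1", "s2", "s3", "s4"]))  # (((s1,s2),s3),s4);
```

Because the tree is built in a single pass over the input order, no pairwise distances are needed, which is consistent with the abstract's remark that such trees are the fastest and simplest to construct.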
preAssemble: a tool for automatic sequencer trace data processing.

PubMed

Adzhubei, Alexei A; Laerdahl, Jon K; Vlasova, Anna V

2006-01-17

Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer proprietary software or publicly available programs. Depending on the size of a sequencing project the number of trace files can vary from just a few to thousands of files. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing. The preAssemble pre-assembly sequence processing pipeline has been developed for small to large scale automatic processing of DNA sequencer chromatogram (trace) data. The Staden Package Pregap4 module and base-calling program Phred are utilized in the pipeline, which produces detailed and self-explanatory output that can be displayed with a web browser. preAssemble can be used successfully with very little previous experience; however, options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and LINUX operating systems. It is available for downloading and will run as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site where preAssemble jobs can be run on the project server. preAssemble is a tool for performing quality assessment of sequences generated by automatic sequencing equipment. preAssemble is flexible since both interactive jobs on the preAssemble server and the stand-alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job; on the other hand, options for parameter tuning are provided. Consequently preAssemble can be used as efficiently for just several trace files as for large scale sequence processing.
Loss and persistence of implicit memory for sound: evidence from auditory stream segregation context effects.

PubMed

Snyder, Joel S; Weintraub, David M

2013-07-01

An important question is the extent to which declines in memory over time are due to passive loss or active interference from other stimuli. The purpose of the present study was to determine the extent to which implicit memory effects in the perceptual organization of sound sequences are subject to loss and interference. Toward this aim, we took advantage of two recently discovered context effects in the perceptual judgments of sound patterns, one that depends on stimulus features of previous sounds and one that depends on the previous perceptual organization of these sounds. The experiments measured how listeners' perceptual organization of a tone sequence (test) was influenced by the frequency separation, or the perceptual organization, of the two preceding sequences (context1 and context2). The results demonstrated clear evidence for loss of context effects over time but little evidence for interference. However, they also revealed that context effects can be surprisingly persistent. The robust effects of loss, followed by persistence, were similar for the two types of context effects. We discuss whether the same auditory memories might contain information about basic stimulus features of sounds (i.e., frequency separation), as well as the perceptual organization of these sounds.
The autism sequencing consortium: large-scale, high-throughput sequencing in autism spectrum disorders.

PubMed

Buxbaum, Joseph D; Daly, Mark J; Devlin, Bernie; Lehner, Thomas; Roeder, Kathryn; State, Matthew W

2012-12-20

Research during the past decade has seen significant progress in the understanding of the genetic architecture of autism spectrum disorders (ASDs), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time, this research has highlighted ongoing challenges. Here we address the enormous impact of high-throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multisite collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly. Copyright © 2012 Elsevier Inc. All rights reserved.
A hybrid model based on neural networks for biomedical relation extraction.

PubMed

Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Zhang, Shaowu; Sun, Yuanyuan; Yang, Liang

2018-05-01

Biomedical relation extraction can automatically extract high-quality biomedical relations from biomedical texts, which is a vital step for the mining of biomedical knowledge hidden in the literature. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are two major neural network models for biomedical relation extraction. Neural network-based methods for biomedical relation extraction typically focus on the sentence sequence and employ RNNs or CNNs to learn the latent features from sentence sequences separately. However, RNNs and CNNs have their own advantages for biomedical relation extraction. Combining RNNs and CNNs may improve biomedical relation extraction. In this paper, we present a hybrid model for the extraction of biomedical relations that combines RNNs and CNNs. First, the shortest dependency path (SDP) is generated based on the dependency graph of the candidate sentence. To make full use of the SDP, we divide the SDP into a dependency word sequence and a relation sequence. Then, RNNs and CNNs are employed to automatically learn the features from the sentence sequence and the dependency sequences, respectively. Finally, the output features of the RNNs and CNNs are combined to detect and extract biomedical relations. We evaluate our hybrid model using five public (protein-protein interaction) PPI corpora and a (drug-drug interaction) DDI corpus. The experimental results suggest that the advantages of RNNs and CNNs in biomedical relation extraction are complementary. Combining RNNs and CNNs can effectively boost biomedical relation extraction performance. Copyright © 2018 Elsevier Inc. All rights reserved.
Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space.

PubMed

Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

2008-07-01

UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows significant improvement over current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
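
For orientation, the sketch below shows plain in-memory UPGMA, the baseline whose memory requirement motivates this work. It is not the MC-UPGMA algorithm and not the authors' C++ code; the function name, the dictionary-based distance table, and the toy distances are assumptions made for illustration only.

```python
from itertools import combinations


def upgma(labels, dist):
    """Plain UPGMA; returns the tree topology as a nested Newick-like string.

    dist maps frozenset({a, b}) to the dissimilarity between labels a and b.
    Every pairwise dissimilarity is held in memory at once, which is the
    limitation the abstract describes.
    """
    clusters = {name: 1 for name in labels}  # cluster id -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        for c in clusters:
            # Average linkage: size-weighted mean of the two old distances.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy example with made-up distances.
toy = {
    frozenset(("A", "B")): 2, frozenset(("C", "D")): 4,
    frozenset(("A", "C")): 10, frozenset(("A", "D")): 10,
    frozenset(("B", "C")): 10, frozenset(("B", "D")): 10,
}
print(upgma(["A", "B", "C", "D"], toy))  # ((A,B),(C,D))
```

The distance table grows with the square of the number of sequences, which is why this direct form does not scale to the full protein space.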
'Big data', Hadoop and cloud computing in genomics.

PubMed

O'Driscoll, Aisling; Daugelaite, Jurate; Sleator, Roy D

2013-10-01

Since the completion of the Human Genome project at the turn of the Century, there has been an unprecedented proliferation of genomic sequence data. A consequence of this is that the medical discoveries of the future will largely depend on our ability to process and analyse large genomic data sets, which continue to expand as the cost of sequencing decreases. Herein, we provide an overview of cloud computing and big data technologies, and discuss how such expertise can be used to deal with biology's big data sets. In particular, big data technologies such as the Apache Hadoop project, which provides distributed and parallelised data processing and analysis of petabyte (PB) scale data sets will be discussed, together with an overview of the current usage of Hadoop within the bioinformatics community. Copyright © 2013 Elsevier Inc. All rights reserved.
Quantification of the methylation status of the PWS/AS imprinted region: comparison of two approaches based on bisulfite sequencing and methylation-sensitive MLPA.

PubMed

Dikow, Nicola; Nygren, Anders Oh; Schouten, Jan P; Hartmann, Carolin; Krämer, Nikola; Janssen, Bart; Zschocke, Johannes

2007-06-01

Standard methods used for genomic methylation analysis allow the detection of complete absence of either methylated or non-methylated alleles but are usually unable to detect changes in the proportion of methylated and unmethylated alleles. We compare two methods for quantitative methylation analysis, using the chromosome 15q11-q13 imprinted region as model. Absence of the non-methylated paternal allele in this region leads to Prader-Willi syndrome (PWS) whilst absence of the methylated maternal allele results in Angelman syndrome (AS). A proportion of AS is caused by mosaic imprinting defects which may be missed with standard methods and require quantitative analysis for their detection. Sequence-based quantitative methylation analysis (SeQMA) involves quantitative comparison of peaks generated through sequencing reactions after bisulfite treatment. It is simple, cost-effective and can be easily established for a large number of genes. However, our results support previous suggestions that methods based on bisulfite treatment may be problematic for exact quantification of methylation status. Methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA) avoids bisulfite treatment. It detects changes in both CpG methylation as well as copy number of up to 40 chromosomal sequences in one simple reaction. Once established in a laboratory setting, the method is more accurate, reliable and less time consuming.
A Comprehensive Strategy for Accurate Mutation Detection of the Highly Homologous PMS2.

PubMed

Li, Jianli; Dai, Hongzheng; Feng, Yanming; Tang, Jia; Chen, Stella; Tian, Xia; Gorman, Elizabeth; Schmitt, Eric S; Hansen, Terah A A; Wang, Jing; Plon, Sharon E; Zhang, Victor Wei; Wong, Lee-Jun C

2015-09-01

Germline mutations in the DNA mismatch repair gene PMS2 underlie the cancer susceptibility syndrome, Lynch syndrome. However, accurate molecular testing of PMS2 is complicated by a large number of highly homologous sequences. To establish a comprehensive approach for mutation detection of PMS2, we have designed a strategy combining targeted capture next-generation sequencing (NGS), multiplex ligation-dependent probe amplification, and long-range PCR followed by NGS to simultaneously detect point mutations and copy number changes of PMS2. Exonic deletions (E2 to E9, E5 to E9, E8, E10, E14, and E1 to E15), duplications (E11 to E12), and a nonsense mutation, p.S22*, were identified. Traditional multiplex ligation-dependent probe amplification and Sanger sequencing approaches cannot differentiate the origin of the exonic deletions in the 3' region when PMS2 and PMS2CL share identical sequences as a result of gene conversion. Our approach allows unambiguous identification of mutations in the active gene with a straightforward long-range-PCR/NGS method. Breakpoint analysis of multiple samples revealed that recurrent exon 14 deletions are mediated by homologous Alu sequences. Our comprehensive approach provides a reliable tool for accurate molecular analysis of genes containing multiple copies of highly homologous sequences and should improve PMS2 molecular analysis for patients with Lynch syndrome. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Effect of Sequencing Strength and Endurance Training in Young Male Soccer Players.

PubMed

Makhlouf, Issam; Castagna, Carlo; Manzi, Vincenzo; Laurencelle, Louis; Behm, David G; Chaouachi, Anis

2016-03-01

This study examined the effects of strength and endurance training sequence (strength before or after endurance) on relevant fitness variables in youth soccer players. Fifty-seven young elite-level male field soccer players (13.7 ± 0.5 years; 164 ± 8.3 cm; 53.5 ± 8.6 kg; body fat; 15.6 ± 3.9%) were randomly assigned to a control (n = 14, CG) and 3 experimental training groups (twice a week for 12 weeks) strength before (SE, n = 15), after (ES, n = 14) or on alternate days (ASE, n = 14) with endurance training. A significant (p = 0.001) intervention main effect was detected. There were only trivial training sequence differences (ES vs. SE) for all variables (p > 0.05). The CG showed large squat 1 repetition maximum (1RM) and medium sprint, change of direction ability, and jump improvements. ASE demonstrated a trivial difference in endurance performance with ES and SE (p > 0.05). Large to medium greater improvements for SE and ES were reported compared with ASE for sprinting over 10 and 30 m (p < 0.02). The SE squat 1RM was higher than in ASE (moderate, p < 0.02). Postintervention differences between ES and SE with CG fitness variables were small to medium (p ≤ 0.05) except for a large SE advantage with the Yo-Yo intermittent recovery test (p < 0.001, large). This study showed no effect of intrasession training sequence on soccer fitness-relevant variables. However, combining strength and endurance within a single training session provided superior results vs. training on alternate days. Concurrent training may be considered as an effective and safe training method for the development of the prospective soccer player.
Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder.

PubMed

Steinberg, Karyn Meltz; Ramachandran, Dhanya; Patel, Viren C; Shetty, Amol C; Cutler, David J; Zwick, Michael E

2012-09-28

Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% of ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes, by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. We found seven rare variants at evolutionarily conserved sites in our study population. Functional analyses of the three 3' UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects.
Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder

PubMed Central

2012-01-01

Background: Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% of ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. Methods: We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes, by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. Results: We found seven rare variants at evolutionarily conserved sites in our study population. Functional analyses of the three 3' UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. Conclusions: These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects. PMID:23020841
Radiation effects on MOS devices - dosimetry, annealing, irradiation sequence, and sources

NASA Technical Reports Server (NTRS)

Stassinopoulos, E. G.; Brucker, G. J.; Van Gunten, O.; Knudson, A. R.; Jordan, T. M.

1983-01-01

This paper reports on some investigations of dosimetry, annealing, irradiation sequences, and radioactive sources involved in the determination of radiation effects on MOS devices. Results show that agreement between the experimental and theoretical surface-to-average doses supports the use of thermo-luminescent dosimeters (manganese activated calcium fluoride) in specifying the surface dose delivered to thin gate insulators of MOS devices. Annealing measurements indicate the existence of at least two energy levels, or activation energies, for recovery of soft oxide MOS devices after irradiation by electrons, protons, and gammas. Damage sensitivities of MOS devices were found to be independent of combinations and sequences of radiation type or energies. Comparison of various gamma sources indicated a small dependence of damage sensitivity on the Cobalt facility, but a more significant dependence in the case of the Cesium source. These results were attributed to differences in the spectral content of the several sources.
Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes

PubMed Central

Liu, Kuan-Liang; Porras-Alfaro, Andrea; Eichorst, Stephanie A.

2012-01-01

Taxonomic and phylogenetic fingerprinting based on sequence analysis of gene fragments from the large-subunit rRNA (LSU) gene or the internal transcribed spacer (ITS) region is becoming an integral part of fungal classification. The lack of an accurate and robust classification tool trained by a validated sequence database for taxonomic placement of fungal LSU genes is a severe limitation in taxonomic analysis of fungal isolates or large data sets obtained from environmental surveys. Using a hand-curated set of 8,506 fungal LSU gene fragments, we determined the performance characteristics of a naïve Bayesian classifier across multiple taxonomic levels and compared the classifier performance to that of a sequence similarity-based (BLASTN) approach. The naïve Bayesian classifier was computationally more rapid (>460-fold with our system) than the BLASTN approach, and it provided equal or superior classification accuracy. Classifier accuracies were compared using sequence fragments of 100 bp and 400 bp and two different PCR primer anchor points to mimic sequence read lengths commonly obtained using current high-throughput sequencing technologies. Accuracy was higher with 400-bp sequence reads than with 100-bp reads. It was also significantly affected by sequence location across the 1,400-bp test region. The highest accuracy was obtained across either the D1 or D2 variable region. The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys. The training set and tool are publicly available through the Ribosomal Database Project (http://rdp.cme.msu.edu/classifier/classifier.jsp). PMID:22194300
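
The general idea of a naïve Bayesian sequence classifier can be sketched as follows. This is an illustrative toy, not the Ribosomal Database Project implementation: the word length `k=3`, the add-one smoothing, the uniform prior over taxa, and all function names are assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def words(seq, k=3):
    """Split a sequence into overlapping words (k-mers) of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(examples, k=3):
    """Count words per taxon from an iterable of (sequence, taxon) pairs."""
    counts = defaultdict(Counter)
    for seq, taxon in examples:
        counts[taxon].update(words(seq, k))
    return counts


def classify(model, seq, k=3):
    """Return the taxon with the highest log-likelihood for seq."""
    vocab = 4 ** k  # number of possible DNA words of length k
    best, best_score = None, -math.inf
    for taxon, counter in model.items():
        total = sum(counter.values())
        # Add-one smoothing so unseen words do not zero out the likelihood.
        score = sum(math.log((counter[w] + 1) / (total + vocab))
                    for w in words(seq, k))
        if score > best_score:
            best, best_score = taxon, score
    return best


# Made-up training data with hypothetical taxon labels.
model = train([("AAAAAAAAAA", "taxA"), ("CCCCCCCCCC", "taxC")])
print(classify(model, "AAAAA"))  # taxA
```

Classification needs only one pass over the query's words per taxon, with no alignment step, which is one plausible reason such classifiers run much faster than similarity searches.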
Phylogenetically Structured Differences in rRNA Gene Sequence Variation among Species of Arbuscular Mycorrhizal Fungi and Their Implications for Sequence Clustering

PubMed Central

Ekanayake, Saliya; Ruan, Yang; Schütte, Ursel M. E.; Kaonongbua, Wittaya; Fox, Geoffrey; Ye, Yuzhen; Bever, James D.

2016-01-01

ABSTRACT Arbuscular mycorrhizal (AM) fungi form mutualisms with plant roots that increase plant growth and shape plant communities. Each AM fungal cell contains a large amount of genetic diversity, but it is unclear if this diversity varies across evolutionary lineages. We found that sequence variation in the nuclear large-subunit (LSU) rRNA gene from 29 isolates representing 21 AM fungal species generally assorted into genus- and species-level clades, with the exception of species of the genera Claroideoglomus and Entrophospora. However, there were significant differences in the levels of sequence variation across the phylogeny and between genera, indicating that it is an evolutionarily constrained trait in AM fungi. These consistent patterns of sequence variation across both phylogenetic and taxonomic groups pose challenges to interpreting operational taxonomic units (OTUs) as approximations of species-level groups of AM fungi. We demonstrate that the OTUs produced by five sequence clustering methods using 97% or equivalent sequence similarity thresholds failed to match the expected species of AM fungi, although OTUs from AbundantOTU, CD-HIT-OTU, and CROP corresponded better to species than did OTUs from mothur or UPARSE. This lack of OTU-to-species correspondence resulted both from sequences of one species being split into multiple OTUs and from sequences of multiple species being lumped into the same OTU. The OTU richness therefore will not reliably correspond to the AM fungal species richness in environmental samples. Conservatively, this error can overestimate species richness by 4-fold or underestimate richness by one-half, and the direction of this error will depend on the genera represented in the sample. IMPORTANCE Arbuscular mycorrhizal (AM) fungi form important mutualisms with the roots of most plant species. Individual AM fungi are genetically diverse, but it is unclear whether the level of this diversity differs among evolutionary lineages. 
We found that the amount of sequence variation in an rRNA gene that is commonly used to identify AM fungal species varied significantly between evolutionary groups that correspond to different genera, with the exception of two genera that are genetically indistinguishable from each other. When we clustered groups of similar sequences into operational taxonomic units (OTUs) using five different clustering methods, these patterns of sequence variation caused the number of OTUs to either over- or underestimate the actual number of AM fungal species, depending on the genus. Our results indicate that OTU-based inferences about AM fungal species composition from environmental sequences can be improved if they take these taxonomically structured patterns of sequence variation into account. PMID:27260357
Phase-Specific Vocalizations of Male Mice at the Initial Encounter during the Courtship Sequence

PubMed Central

Matsumoto, Yui K.; Okanoya, Kazuo

2016-01-01

Mice produce ultrasonic vocalizations featuring a variety of syllables. Vocalizations are observed during social interactions. In particular, males produce numerous syllables during courtship. Previous studies have shown that vocalizations change according to sexual behavior, suggesting that males vary their vocalizations depending on the phase of the courtship sequence. To examine this process, we recorded large sets of mouse vocalizations during male–female interactions and acoustically categorized these sounds into 12 vocal types. We found that males emitted predominantly short syllables during the first minute of interaction, more long syllables in the later phases, and mainly harmonic sounds during mounting. These context- and time-dependent changes in vocalization indicate that vocal communication during courtship in mice consists of at least three stages and imply that each vocalization type has a specific role in a phase of the courtship sequence. Our findings suggest that recording for a sufficiently long time and taking the phase of courtship into consideration could provide more insights into the role of vocalization in mouse courtship behavior in future study. PMID:26841117
An artificial intelligence approach fit for tRNA gene studies in the era of big sequence data.

PubMed

Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi

2017-09-12

Unsupervised data mining capable of extracting a wide range of knowledge from big data without prior knowledge or particular models is a timely application in the era of big sequence data accumulation in genome research. By handling oligonucleotide compositions as high-dimensional data, we have previously modified the conventional self-organizing map (SOM) for genome informatics and established BLSOM, which can analyze more than ten million sequences simultaneously. Here, we develop BLSOM specialized for tRNA genes (tDNAs) that can cluster (self-organize) more than one million microbial tDNAs according to their cognate amino acid solely depending on tetra- and pentanucleotide compositions. This unsupervised clustering can reveal combinatorial oligonucleotide motifs that are responsible for the amino acid-dependent clustering, as well as other functionally and structurally important consensus motifs, which have been evolutionarily conserved. BLSOM is also useful for identifying tDNAs as phylogenetic markers for special phylotypes. When we constructed BLSOM with 'species-unknown' tDNAs from metagenomic sequences plus 'species-known' microbial tDNAs, a large portion of metagenomic tDNAs self-organized with species-known tDNAs, yielding information on microbial communities in environmental samples. BLSOM can also enhance accuracy in the tDNA database obtained from big sequence data. This unsupervised data mining should become important for studying numerous functionally unclear RNAs obtained from a wide range of organisms.
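The oligonucleotide-composition input described in this abstract can be illustrated with a short sketch. It only shows how a sequence might be turned into a fixed-length tetranucleotide frequency vector of the kind a self-organizing map could take as input; the function name `kmer_composition` and the choice to skip windows containing non-ACGT characters are assumptions for illustration, not details taken from BLSOM.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=4):
    """Return normalized k-mer frequencies over ACGT in a fixed (lexicographic) order.

    Hypothetical helper: windows containing characters other than A, C, G, T
    are simply left out of the total (an assumption, not BLSOM's documented rule).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    vec = kmer_composition("ACGTACGT", k=4)
    print(len(vec))  # 4**4 = 256 dimensions
```

Each sequence becomes one point in a 256-dimensional space (1,024 dimensions for pentanucleotides), which is what makes clustering by composition alone possible.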

Sequence-dependent effects in drug-DNA interaction: the crystal structure of Hoechst 33258 bound to the d(CGCAAATTTGCG)2 duplex.

PubMed Central

Spink, N; Brown, D G; Skelly, J V; Neidle, S

1994-01-01

The bis-benzimidazole drug Hoechst 33258 has been co-crystallized with the dodecanucleotide sequence d(CGCAAATTTGCG)2. The structure has been solved by molecular replacement and refined to an R factor of 18.5% for 2125 reflections collected on a Xentronics area detector. The drug is bound in the minor groove, at the five base-pair site 5'-ATTTG and is in a unique orientation. This is displaced by one base pair in the 5' direction compared to previously-determined structures of this drug with the sequence d(CGCGAATTCGCG)2. Reasons for this difference in behaviour are discussed in terms of several sequence-dependent structural features of the DNA, with particular reference to differences in propeller twist and minor-groove width. PMID:7515488
Magnetotransport of High Mobility Holes in Monolayer and Bilayer WSe2

NASA Astrophysics Data System (ADS)

Tutuc, Emanuel

Transition metal dichalcogenides have attracted significant interest because of their two-dimensional crystal structure, large band-gap, and strong spin-orbit interaction which leads to spin-valley locking. Recent advances in sample fabrication have allowed the experimental study of low temperature magneto-transport of high mobility holes in WSe2. We review here the main results of these studies which reveal clear quantum Hall states in mono- and bilayer WSe2. The data allows the extraction of an effective hole mass of m* = 0.45me (me is the bare electron mass) in both mono and bilayer WSe2. A systematic study of the carrier distribution in bilayer WSe2 determined from a Fourier analysis of the Shubnikov-de Haas oscillations indicates that the two layers are weakly coupled. The individual layer density dependence on gate bias shows negative compressibility, a signature of strong electron-electron interaction in these materials associated with the large effective mass. We discuss the interplay between cyclotron and Zeeman splitting using the dependence of the quantum Hall state sequence on carrier density, and the angle between the magnetic field and the WSe2 plane. Work done in collaboration with B. Fallahazad, H. C. P. Movva, K. Kim, S. K. Banerjee, T. Taniguchi, and K. Watanabe. This work supported by the Nanoelectronics Research Initiative SWAN center, Intel Corp., and National Science Foundation.
ReSeqTools: an integrated toolkit for large-scale next-generation sequencing based resequencing analysis.

PubMed

He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z

2013-12-04

Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space and facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and abundant sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.
Complete genome sequence of a Watermelon silver mottle virus isolate from China.

PubMed

Rao, Xueqin; Wu, Zhuyan; Li, Yuan

2013-06-01

The complete genome of a Watermelon silver mottle virus (WSMoV) (genus Tospovirus, family Bunyaviridae) isolate (WSMoV-GZ) from Guangdong province, China was sequenced. The genome of WSMoV-GZ comprised small (S), medium (M), and large (L) RNA segments of 3,603, 4,909, and 8,914 nt, respectively, and had a genomic organization characteristic of members of the genus Tospovirus. The amino acid sequences of the nucleocapsid (N) protein, S RNA-encoded nonstructural (NSs) protein, M RNA-encoded nonstructural (NSm) protein, Gn/Gc glycoprotein precursor, and RNA-dependent RNA polymerase (RdRp) protein showed 94.3-97.5% identity with those of other WSMoV isolates. Phylogenetic analysis showed that the N protein of WSMoV-GZ clustered together with those of the WSMoV isolates. The full sequence of WSMoV-GZ provides a reference genome for comparison with other tospoviruses.
Gene discovery in Boophilus microplus, the cattle tick: the transcriptomes of ovaries, salivary glands, and hemocytes.

PubMed

Santos, Isabel K F de Miranda; Valenzuela, Jesus G; Ribeiro, José Marcos C; de Castro, Marilia; Costa, Juliana Nardelli; Costa, Ana Maria; da Silva, Edson Ramiro; Neto, Olavo Bilac Rego; Rocha, Clarisse; Daffre, Sirlei; Ferreira, Beatriz R; da Silva, João Santana; Szabó, Matias Pablo; Bechara, Gervasio Henrique

2004-10-01

The quest for new control strategies for ticks can profit from high throughput genomics. In order to identify genes that are involved in oogenesis and development, in defense, and in hematophagy, the transcriptomes of ovaries, hemocytes, and salivary glands from rapidly ingurgitating females, and of salivary glands from males of Boophilus microplus were PCR amplified, and the expressed sequence tags (EST) of random clones were mass sequenced. So far, more than 1,344 EST have been generated for these tissues, with approximately 30% novelty, depending on the tissue studied. To date approximately 760 nucleotide sequences from B. microplus are deposited in the NCBI database. Mass sequencing of partial cDNAs of parasite genes can build up this scant database and rapidly generate a large quantity of useful information about potential targets for immunobiological or chemical control.
MUSCLE: multiple sequence alignment with high accuracy and high throughput.

PubMed

Edgar, Robert C

2004-01-01

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
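To make "fast distance estimation using kmer counting" concrete, here is a minimal sketch of one alignment-free distance: one minus the fraction of k-mers two sequences share. The function names and the normalization by the shorter sequence's k-mer count are illustrative assumptions; the abstract does not give MUSCLE's exact formula, and this is not presented as its implementation.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Return 1 - (shared k-mers / k-mers in the shorter sequence).

    Hypothetical measure for illustration: 0.0 for identical sequences,
    1.0 when no k-mer is shared.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must be at least k long")
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom


if __name__ == "__main__":
    print(kmer_distance("ACDEFGHIK", "ACDEFGHLK"))
```

Because no alignment is computed, a measure like this runs in time roughly linear in sequence length, which is why k-mer counting suits the first pass over thousands of sequences.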
X-ray Reflected Spectra from Accretion Disk Models. I. Constant Density Atmospheres

NASA Technical Reports Server (NTRS)

Garcia, Javier; Kallman, Timothy R.

2009-01-01

We present new models for illuminated accretion disks, their structure and reprocessed emission. We consider the effects of incident X-rays on the surface of an accretion disk by solving simultaneously the equations of radiative transfer, energy balance and ionization equilibrium over a large range of column densities. We assume plane-parallel geometry and azimuthal symmetry, such that each calculation corresponds to a ring at a given distance from the central object. Our models include recent and complete atomic data for K-shell of the iron and oxygen isonuclear sequences. We examine the effect on the spectrum of fluorescent Kα line emission and absorption in the emitted spectrum. We also explore the dependence of the spectrum on the strength of the incident X-rays and other input parameters, and discuss the importance of Comptonization on the emitted spectrum.
Use of low-coverage, large-insert, short-read data for rapid and accurate generation of enhanced-quality draft Pseudomonas genome sequences.

PubMed

O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S

2011-01-01

Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
Gene Cloning and Characterization of the Very Large NAD-Dependent l-Glutamate Dehydrogenase from the Psychrophile Janthinobacterium lividum, Isolated from Cold Soil

PubMed Central

Kawakami, Ryushi; Sakuraba, Haruhiko; Ohshima, Toshihisa

2007-01-01

NAD-dependent l-glutamate dehydrogenase (NAD-GDH) activity was detected in cell extract from the psychrophile Janthinobacterium lividum UTB1302, which was isolated from cold soil and purified to homogeneity. The native enzyme (1,065 kDa, determined by gel filtration) is a homohexamer composed of 170-kDa subunits (determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis). Consistent with these findings, gene cloning and sequencing enabled deduction of the amino acid sequence of the subunit, which proved to be comprised of 1,575 amino acids with a combined molecular mass of 169,360 Da. The enzyme from this psychrophile thus appears to belong to the GDH family characterized by very large subunits, like those expressed by Streptomyces clavuligerus and Pseudomonas aeruginosa (about 180 kDa). The entire amino acid sequence of the J. lividum enzyme showed about 40% identity with the sequences from S. clavuligerus and P. aeruginosa enzymes, but the central domains showed higher homology (about 65%). Within the central domain, the residues related to substrate and NAD binding were highly conserved, suggesting that this is the enzyme's catalytic domain. In the presence of NAD, but not in the presence of NADP, this GDH exclusively catalyzed the oxidative deamination of l-glutamate. The stereospecificity of the hydride transfer to NAD was pro-S, which is the same as that of the other known GDHs. Surprisingly, NAD-GDH activity was markedly enhanced by the addition of various amino acids, such as l-aspartate (1,735%) and l-arginine (936%), which strongly suggests that the N- and/or C-terminal domains play regulatory roles and are involved in the activation of the enzyme by these amino acids. PMID:17526698
Gene cloning and characterization of the very large NAD-dependent l-glutamate dehydrogenase from the psychrophile Janthinobacterium lividum, isolated from cold soil.

PubMed

Kawakami, Ryushi; Sakuraba, Haruhiko; Ohshima, Toshihisa

2007-08-01

NAD-dependent l-glutamate dehydrogenase (NAD-GDH) activity was detected in cell extract from the psychrophile Janthinobacterium lividum UTB1302, which was isolated from cold soil and purified to homogeneity. The native enzyme (1,065 kDa, determined by gel filtration) is a homohexamer composed of 170-kDa subunits (determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis). Consistent with these findings, gene cloning and sequencing enabled deduction of the amino acid sequence of the subunit, which proved to be comprised of 1,575 amino acids with a combined molecular mass of 169,360 Da. The enzyme from this psychrophile thus appears to belong to the GDH family characterized by very large subunits, like those expressed by Streptomyces clavuligerus and Pseudomonas aeruginosa (about 180 kDa). The entire amino acid sequence of the J. lividum enzyme showed about 40% identity with the sequences from S. clavuligerus and P. aeruginosa enzymes, but the central domains showed higher homology (about 65%). Within the central domain, the residues related to substrate and NAD binding were highly conserved, suggesting that this is the enzyme's catalytic domain. In the presence of NAD, but not in the presence of NADP, this GDH exclusively catalyzed the oxidative deamination of l-glutamate. The stereospecificity of the hydride transfer to NAD was pro-S, which is the same as that of the other known GDHs. Surprisingly, NAD-GDH activity was markedly enhanced by the addition of various amino acids, such as l-aspartate (1,735%) and l-arginine (936%), which strongly suggests that the N- and/or C-terminal domains play regulatory roles and are involved in the activation of the enzyme by these amino acids.
Extracting DNA words based on the sequence features: non-uniform distribution and integrity.

PubMed

Li, Zhi; Cao, Hongyan; Cui, Yuehua; Zhang, Yanbo

2016-01-25

A DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms such as the motif discovery algorithms depend on the quality of background information about sequences, it is necessary to develop an ab initio algorithm for extracting the "words" based only on the DNA sequences. We considered that non-uniform distribution and integrity were two important features of a word, based on which we developed an ab initio algorithm to extract "DNA words" that have potential functional meaning. A Kolmogorov-Smirnov test was used to test whether DNA sequences were consistent with a uniform distribution, and the integrity was judged by the sequence and position alignment. Two random base sequences were adopted as negative controls, and an English book was used as a positive control to verify our algorithm. We applied our algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the methods. The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio. Our method provides a fast way for large scale screening of important DNA elements and offers potential insights into the understanding of a genome.
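The uniformity check mentioned in this abstract can be sketched as a one-sample Kolmogorov-Smirnov statistic. The sketch assumes the quantity being tested is the set of start positions of a candidate word along a sequence of known length; the abstract does not spell out how the test is applied, so the function name and inputs are hypothetical.

```python
def ks_statistic_uniform(positions, length):
    """Kolmogorov-Smirnov distance between the empirical distribution of
    `positions` and a uniform distribution on [0, length].

    Hypothetical helper: small values mean occurrences are spread evenly,
    large values mean they are clustered (non-uniform).
    """
    xs = sorted(p / length for p in positions)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one position")
    d = 0.0
    for i, x in enumerate(xs):
        # Compare the uniform CDF (x) with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


if __name__ == "__main__":
    print(ks_statistic_uniform([10, 30, 50, 70, 90], 100))  # evenly spread
    print(ks_statistic_uniform([1, 2, 3, 4, 5], 100))       # clustered at the start
```

Turning the statistic into a p-value requires the Kolmogorov distribution (for example via a statistics library); the sketch stops at the distance itself.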
Transcriptional insulation of the human keratin 18 gene in transgenic mice.

PubMed Central

Neznanov, N; Thorey, I S; Ceceña, G; Oshima, R G

1993-01-01

Expression of the 10-kb human keratin 18 (K18) gene in transgenic mice results in efficient and appropriate tissue-specific expression in a variety of internal epithelial organs, including liver, lung, intestine, kidney, and the ependymal epithelium of brain, but not in spleen, heart, or skeletal muscle. Expression at the RNA level is directly proportional to the number of integrated K18 transgenes. These results indicate that the K18 gene is able to insulate itself both from the commonly observed cis-acting effects of the sites of integration and from the potential complications of duplicated copies of the gene arranged in head-to-tail fashion. To begin to identify the K18 gene sequences responsible for this property of transcriptional insulation, additional transgenic mouse lines containing deletions of either the 5' or 3' distal end of the K18 gene have been characterized. Deletion of 1.5 kb of the distal 5' flanking sequence has no effect upon either the tissue specificity or the copy number-dependent behavior of the transgene. In contrast, deletion of the 3.5-kb 3' flanking sequence of the gene results in the loss of the copy number-dependent behavior of the gene in liver and intestine. However, expression in kidney, lung, and brain remains efficient and copy number dependent in these transgenic mice. Furthermore, herpes simplex virus thymidine kinase gene expression is copy number dependent in transgenic mice when the gene is located between the distal 5'- and 3'-flanking sequences of the K18 gene. Each adult transgenic male expressed the thymidine kinase gene in testes and brain and proportionally to the number of integrated transgenes. We conclude that the characteristic of copy number-dependent expression of the K18 gene is tissue specific because the sequence requirements for transcriptional insulation in adult liver and intestine are different from those for lung and kidney. 
In addition, the behavior of the transgenic thymidine kinase gene in testes and brain suggests that the property of transcriptional insulation of the K18 gene may be conferred by the distal flanking sequences of the K18 gene and, additionally, may function for other genes. PMID:7681143
Identification of differentially methylated sites with weak methylation effect

USDA-ARS's Scientific Manuscript database

DNA methylation is an epigenetic alteration crucial for regulating stress responses. Identifying large-scale DNA methylation at single nucleotide resolution is made possible by whole genome bisulfite sequencing. An essential task following the generation of bisulfite sequencing data is to detect dif...
Long-term Recurrent Convolutional Networks for Visual Recognition and Description

DTIC Science & Technology

2014-11-17

...models which are also recurrent, or “temporally deep”, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large...limitation of simple RNN models which strictly integrate state information over time is known as the “vanishing gradient” effect: the ability to
An effective temperature calibration for main-sequence B- to F-type stars using VJHK_{s} colors

NASA Astrophysics Data System (ADS)

Paunzen, Ernst; Netopil, Martin; Herdin, Andreas

2017-01-01

The effective temperature is an important parameter needed for numerous astrophysical studies, for example to place stars in the Hertzsprung-Russell diagram. Although the availability of large spectroscopic surveys has increased significantly in the last decade, photometric data are still much more common. Homogeneous photometric (all-sky) surveys provide the basis to derive the effective temperature with reasonable accuracy also for objects that are not covered by spectroscopic surveys, or that are too faint for current spectroscopic instrumentation. We use data from the Two Micron All Sky Survey (2MASS) and broadband visual photometric measurements to derive effective temperature calibrations for the intrinsic colors (V-J), (V-H), (V-K_{s}), and (J-K_{s}), valid for B2 to F9 stars. The effective temperature calibrations are tied to the Strömgren-Crawford uvbyβ photometric system and do not depend on metallicity or rotational velocity.
Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

PubMed

Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

2013-06-27

Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. 
Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html.
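Improvement (2) above, splitting large sequence files for better load balance, can be illustrated with a small sketch. It assumes unwrapped FASTQ input with exactly four lines per read; the function name and chunking policy are hypothetical and are not taken from Rainbow's source code.

```python
from itertools import islice


def split_fastq_records(lines, reads_per_chunk):
    """Yield lists of FASTQ lines, each holding at most `reads_per_chunk` reads.

    Assumes the standard unwrapped layout of four lines per read
    (header, sequence, '+', qualities).
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be positive")
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4:
            raise ValueError("truncated FASTQ record")
        yield chunk


if __name__ == "__main__":
    reads = []
    for i in range(3):
        reads += [f"@read{i}", "ACGT", "+", "IIII"]
    for n, chunk in enumerate(split_fastq_records(reads, reads_per_chunk=2)):
        print(n, len(chunk) // 4, "reads")
```

Equal-sized chunks let each worker instance receive a comparable amount of work, which is the point of splitting before distributing files across compute nodes.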
Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing

PubMed Central

2013-01-01

Background Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Results Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Conclusions Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. 
For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html. PMID:23802613
Sequence-dependent DNA flexibility mediates DNase I cleavage.

PubMed

Heddi, Brahim; Abi-Ghanem, Josephine; Lavigne, Marc; Hartmann, Brigitte

2010-01-08

Understanding the preference of nonspecific proteins for certain DNA structural features requires an accurate description of the properties of free DNA, especially regarding their possible predisposition to adopt a conformation that favors the formation of a complex. Exploiting previous exhaustive NMR studies performed on free DNA oligomers, we investigated the molecular basis of DNase I sensitivity under conditions where DNase I binding limits the probability of cleavage. We showed that cleavage intensity was correlated with adjacent 3' phosphate linkage flexibility, monitored by (31)P chemical shifts. Examining NMR-refined DNA structures highlighted that sequence-dependent flexible phosphates were associated with large minor groove variations that may promote the affinity of DNase I, according to relevant DNA-protein complexes. In sum, this work demonstrates that specificity in DNA-DNase I interaction is mediated by DNA flexibility, which influences the induced-fit transitions required to form productive complexes.
DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

PubMed

Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

2018-02-15

A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Proximity to AGCT sequences dictates MMR-independent versus MMR-dependent mechanisms for AID-induced mutation via UNG2

PubMed Central

Thientosapol, Eddy Sanchai; Sharbeen, George; Lau, K.K. Edwin; Bosnjak, Daniel; Durack, Timothy; Stevanovski, Igor; Weninger, Wolfgang

2017-01-01

AID deaminates C to U in either strand of Ig genes, exclusively producing C:G/G:C to T:A/A:T transition mutations if U is left unrepaired. Error-prone processing by UNG2 or mismatch repair diversifies mutation, predominantly at C:G or A:T base pairs, respectively. Here, we show that transversions at C:G base pairs occur by two distinct processing pathways that are dictated by sequence context. Within and near AGCT mutation hotspots, transversion mutation at C:G was driven by UNG2 without requirement for mismatch repair. Deaminations in AGCT were refractive both to processing by UNG2 and to high-fidelity base excision repair (BER) downstream of UNG2, regardless of mismatch repair activity. We propose that AGCT sequences resist faithful BER because they bind BER-inhibitory protein(s) and/or because hemi-deaminated AGCT motifs innately form a BER-resistant DNA structure. Distal to AGCT sequences, transversions at G were largely co-dependent on UNG2 and mismatch repair. We propose that AGCT-distal transversions are produced when apyrimidinic sites are exposed in mismatch excision patches, because completion of mismatch repair would require bypass of these sites. PMID:28039326

Revisiting sample size: are big trials the answer?

PubMed

Lurati Buse, Giovanna A L; Botto, Fernando; Devereaux, P J

2012-07-18

The superiority of the evidence generated in randomized controlled trials over observational data is conditional not only on randomization. Randomized controlled trials require proper design and implementation to provide a reliable effect estimate. Adequate random sequence generation, allocation implementation, analyses based on the intention-to-treat principle, and sufficient power are crucial to the quality of a randomized controlled trial. Power, or the probability that the trial detects a difference when a real difference between treatments exists, strongly depends on sample size. The quality of orthopaedic randomized controlled trials is frequently threatened by a limited sample size. This paper reviews basic concepts and pitfalls in sample-size estimation and focuses on the importance of large trials in the generation of valid evidence.
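The dependence of power on sample size can be made concrete with the standard normal-approximation formula for comparing two means, n per group = 2 (z_{1-α/2} + z_{power})² σ² / δ². The sketch below is a textbook illustration, not a method taken from the paper; the function name and default values are assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a mean difference
    `delta` with common standard deviation `sigma` (two-sided test,
    normal approximation, equal allocation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    print(n_per_group(delta=0.5, sigma=1.0))   # moderate effect
    print(n_per_group(delta=0.25, sigma=1.0))  # half the effect size
```

Because the required n scales with 1/δ², halving the detectable effect roughly quadruples the sample size, which is why trials aimed at modest treatment effects need to be large.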
A parallel strategy for predicting the secondary structure of polycistronic microRNAs.

PubMed

Han, Dianwei; Tang, Guiliang; Zhang, Jun

2013-01-01

The biogenesis of a functional microRNA is largely dependent on the secondary structure of the microRNA precursor (pre-miRNA). Recently, it has been shown that microRNAs are present in the genome in the form of polycistronic transcriptional units in plants and animals. It will be important to design efficient computational methods to predict such structures for microRNA discovery and its applications in gene silencing. In this paper, we propose a parallel algorithm based on the master-slave architecture to predict the secondary structure from an input sequence. We conducted some experiments to verify the effectiveness of our parallel algorithm. The experimental results show that our algorithm is able to produce the optimal secondary structure of polycistronic microRNAs.
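The abstract does not specify the folding method or how work is divided, so the following is only a rough sketch of the general idea: a coordinating task hands candidate segments of a transcript to a pool of workers, and each worker scores its segment. The scoring here is the classic Nussinov base-pair maximization, swapped in as a simple stand-in for a real energy model; the segment boundaries, function names, and minimum loop length are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Nussinov dynamic programming: maximum number of nested base pairs.

    min_loop is the minimum number of unpaired bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: position j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def fold_segments(transcript, boundaries, workers=4):
    """Coordinator: split the transcript and dispatch each segment to a worker.

    boundaries is a list of (start, end) index pairs (hypothetical input).
    Threads keep the sketch simple; a process pool would be needed for real
    CPU parallelism in CPython.
    """
    segments = [transcript[s:e] for s, e in boundaries]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(max_pairs, segments))


if __name__ == "__main__":
    toy = "GGGAAAUCC" + "GAAAC"  # two toy hairpins joined end to end
    print(fold_segments(toy, [(0, 9), (9, 14)]))
```

Counting base pairs is far cruder than the thermodynamic models used in practice; the point is only the division of labor between the coordinator and the workers.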
Ecology has contrasting effects on genetic variation within species versus rates of molecular evolution across species in water beetles.

PubMed

Fujisawa, Tomochika; Vogler, Alfried P; Barraclough, Timothy G

2015-01-22

Comparative analysis is a potentially powerful approach to study the effects of ecological traits on genetic variation and rate of evolution across species. However, the lack of suitable datasets means that comparative studies of correlates of genetic traits across an entire clade have been rare. Here, we use a large DNA-barcode dataset (5062 sequences) of water beetles to test the effects of species ecology and geographical distribution on genetic variation within species and rates of molecular evolution across species. We investigated species traits predicted to influence their genetic characteristics, such as surrogate measures of species population size, latitudinal distribution and habitat types, taking phylogeny into account. Genetic variation of cytochrome oxidase I in water beetles was positively correlated with occupancy (numbers of sites of species presence) and negatively with latitude, whereas substitution rates across species depended mainly on habitat types, and running water specialists had the highest rate. These results are consistent with theoretical predictions from nearly-neutral theories of evolution, and suggest that the comparative analysis using large databases can give insights into correlates of genetic variation and molecular evolution.
An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

PubMed

Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min

2015-06-01

The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.
Unlinking the methylome pattern from nucleotide sequence, revealed by large-scale in vivo genome engineering and methylome editing in medaka fish

PubMed Central

Nakamura, Ryohei; Uno, Ayako; Kumagai, Masahiko; Fukushima, Hiroto S.; Morishita, Shinichi; Takeda, Hiroyuki

2017-01-01

The heavily methylated vertebrate genomes are punctuated by stretches of poorly methylated DNA sequences that usually mark gene regulatory regions. It is known that the methylation state of these regions confers transcriptional control over their associated genes. Given its governance over the transcriptome, cellular functions and identity, the genome-wide DNA methylation pattern is tightly regulated and evidently predefined. However, how the methylation pattern is determined in vivo remains enigmatic. Based on in silico and in vitro evidence, recent studies proposed that the regional hypomethylated state is primarily determined by local DNA sequence, e.g., high CpG density and presence of specific transcription factor binding sites. Nonetheless, the dependency of DNA methylation on nucleotide sequence has not been carefully validated in vertebrates in vivo. Herein, with the use of medaka (Oryzias latipes) as a model, the sequence dependency of DNA methylation was intensively tested in vivo. Our statistical modeling confirmed the strong statistical association between nucleotide sequence pattern and methylation state in the medaka genome. However, by manipulating the methylation state of a number of genomic sequences and reintegrating them into medaka embryos, we demonstrated that artificially conferred DNA methylation states were predominantly and robustly maintained in vivo, regardless of their sequences and endogenous states. This feature was also observed in the medaka transgene that had passed across generations. Thus, despite the observed statistical association, nucleotide sequence was unable to autonomously determine its own methylation state in medaka in vivo. Our results apparently argue against the notion that nucleotide sequence governs DNA methylation, and instead suggest the involvement of other epigenetic factors in defining and maintaining the DNA methylation landscape. Further investigation in other vertebrate models in vivo will be needed for the generalization of our observations made in medaka. PMID:29267279
A multivariate prediction model for Rho-dependent termination of transcription.

PubMed

Nadiras, Cédric; Eveno, Eric; Schwartz, Annie; Figueroa-Bossi, Nara; Boudvillain, Marc

2018-06-21

Bacterial transcription termination proceeds via two main mechanisms triggered either by simple, well-conserved (intrinsic) nucleic acid motifs or by the motor protein Rho. Although bacterial genomes can harbor hundreds of termination signals of either type, only intrinsic terminators are reliably predicted. Computational tools to detect the more complex and diversiform Rho-dependent terminators are lacking. To tackle this issue, we devised a prediction method based on Orthogonal Projections to Latent Structures Discriminant Analysis [OPLS-DA] of a large set of in vitro termination data. Using previously uncharacterized genomic sequences for biochemical evaluation and OPLS-DA, we identified new Rho-dependent signals and quantitative sequence descriptors with significant predictive value. Most relevant descriptors specify features of transcript C>G skewness, secondary structure, and richness in regularly-spaced 5'CC/UC dinucleotides that are consistent with known principles for Rho-RNA interaction. Descriptors collectively warrant OPLS-DA predictions of Rho-dependent termination with a ∼85% success rate. Scanning of the Escherichia coli genome with the OPLS-DA model identifies significantly more termination-competent regions than anticipated from transcriptomics and predicts that regions intrinsically refractory to Rho are primarily located in open reading frames. Altogether, this work delineates features important for Rho activity and describes the first method able to predict Rho-dependent terminators in bacterial genomes.
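One of the descriptors named above, the excess of C over G in the transcript, can be approximated with a simple skew score. The abstract does not give the exact definition used in the model, so the normalized form (C - G) / (C + G) and the sliding-window parameters below are assumptions for illustration only.

```python
def cg_skew(seq):
    """Normalized C-over-G skew, (C - G) / (C + G); 0.0 if neither base occurs."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    return 0.0 if c + g == 0 else (c - g) / (c + g)


def sliding_skew(seq, window=50, step=10):
    """Skew of each window along the sequence (window and step are illustrative)."""
    return [
        cg_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical C-rich, G-poor stretch followed by a G-rich stretch.
    rna = "CCUCCACCUCCAUCCACCUC" + "GGAGGUGGAGGAUGGAGGUG"
    print(sliding_skew(rna, window=20, step=20))
```

Positive values mark windows where C outnumbers G, which is the direction the abstract associates with Rho-RNA interaction; a real predictor would combine this with the structural and dinucleotide descriptors it also mentions.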
The distribution of rotational velocities for low-mass stars in the Pleiades

NASA Technical Reports Server (NTRS)

Stauffer, John R.; Hartmann, Lee W.

1987-01-01

The available spectral type and color data for late-type Pleiades members have been reanalyzed, and new reddening estimates are obtained. New photometry for a small number of stars and a compilation of H-alpha equivalent widths for Pleiades dwarfs are presented. These data are used to examine the location of the rapid rotators in color-magnitude diagrams and the correlation between chromospheric activity and rotation. It is shown that the wide range of angular momenta exhibited by Pleiades K and M dwarfs is not necessarily produced by a combination of main-sequence spin-downs and a large age spread; it can also result from a plausible spread in initial angular momenta, coupled with initial main-sequence spin-down rates that are only weakly dependent on rotation. The new reddening estimates confirm Breger's (1985) finding of large extinctions confined to a small region in the southern portion of the Merope nebula.
JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow.

PubMed

Latorre, Mariano; Silva, Herman; Saba, Juan; Guziolowski, Carito; Vizoso, Paula; Martinez, Veronica; Maldonado, Jonathan; Morales, Andrea; Caroca, Rodrigo; Cambiazo, Veronica; Campos-Vargas, Reinaldo; Gonzalez, Mauricio; Orellana, Ariel; Retamales, Julio; Meisel, Lee A

2006-11-23

Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". 
However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from http://genoma.unab.cl/juice_system/ or http://www.genomavegetal.cl/juice_system/.
JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow

PubMed Central

Latorre, Mariano; Silva, Herman; Saba, Juan; Guziolowski, Carito; Vizoso, Paula; Martinez, Veronica; Maldonado, Jonathan; Morales, Andrea; Caroca, Rodrigo; Cambiazo, Veronica; Campos-Vargas, Reinaldo; Gonzalez, Mauricio; Orellana, Ariel; Retamales, Julio; Meisel, Lee A

2006-01-01

Background Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. Results In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. Conclusion JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". 
However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from http://genoma.unab.cl/juice_system/ or http://www.genomavegetal.cl/juice_system/. PMID:17123449
DNA breathing dynamics distinguish binding from nonbinding consensus sites for transcription factor YY1 in cells.

PubMed

Alexandrov, Boian S; Fukuyo, Yayoi; Lange, Martin; Horikoshi, Nobuo; Gelev, Vladimir; Rasmussen, Kim Ø; Bishop, Alan R; Usheva, Anny

2012-11-01

The genome-wide mapping of the major gene expression regulators, the transcription factors (TFs) and their DNA binding sites, is of great importance for describing cellular behavior and phenotypic diversity. Presently, the methods for prediction of genomic TF binding produce a large number of false positives, most likely due to insufficient description of the physicochemical mechanisms of protein-DNA binding. Growing evidence suggests that, in the cell, the double-stranded DNA (dsDNA) is subject to local transient strand separations (breathing) that contribute to genomic functions. By using site-specific chromatin immunoprecipitations, gel shifts, BIOBASE data, and our model that accurately describes the melting behavior and breathing dynamics of dsDNA, we report a specific DNA breathing profile found at YY1 binding sites in cells. We find that the genomic flanking sequence variations and SNPs may exert long-range effects on DNA dynamics and predetermine YY1 binding. The ubiquitous TF YY1 has a fundamental role in essential biological processes by activating, initiating or repressing transcription depending upon the sequence context it binds. We anticipate that consensus binding sequences together with the related DNA dynamics profile may significantly improve the accuracy of predicting genomic TF binding sites and TF binding-related functional SNPs.
Russell body inducing threshold depends on the variable domain sequences of individual human IgG clones and the cellular protein homeostasis.

PubMed

Stoops, Janelle; Byrd, Samantha; Hasegawa, Haruki

2012-10-01

Russell bodies are intracellular aggregates of immunoglobulins. Although the mechanism of Russell body biogenesis has been extensively studied by using truncated mutant heavy chains, the importance of the variable domain sequences in this process and in immunoglobulin biosynthesis remains largely unknown. Using a panel of structurally and functionally normal human immunoglobulin Gs, we show that individual immunoglobulin G clones possess distinctive Russell body inducing propensities that can surface differently under normal and abnormal cellular conditions. Russell body inducing predisposition unique to each immunoglobulin G clone was corroborated by the intrinsic physicochemical properties encoded in the heavy chain variable domain/light chain variable domain sequence combinations that define each immunoglobulin G clone. While the sequence based intrinsic factors predispose certain immunoglobulin G clones to be more prone to induce Russell bodies, extrinsic factors such as stressful cell culture conditions also play roles in unmasking Russell body propensity from immunoglobulin G clones that are normally refractory to developing Russell bodies. By taking advantage of heterologous expression systems, we dissected the roles of individual subunit chains in Russell body formation and examined the effect of non-cognate subunit chain pair co-expression on Russell body forming propensity. The results suggest that the properties embedded in the variable domain of individual light chain clones and their compatibility with the partnering heavy chain variable domain sequences underscore the efficiency of immunoglobulin G biosynthesis, the threshold for Russell body induction, and the level of immunoglobulin G secretion. We propose that an interplay between the unique properties encoded in variable domain sequences and the state of protein homeostasis determines whether an immunoglobulin G expressing cell will develop the Russell body phenotype in a dynamic cellular setting. 
Copyright © 2012 Elsevier B.V. All rights reserved.
Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

PubMed

Shortt, Jonathan A; Card, Daren C; Schield, Drew R; Liu, Yang; Zhong, Bo; Castoe, Todd A; Carlton, Elizabeth J; Pollock, David D

2017-01-01

In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for applying modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species and other parasitic helminths.
CaMELS: In silico prediction of calmodulin binding proteins and their binding sites.

PubMed

Abbasi, Wajid Arshad; Asif, Amina; Andleeb, Saiqa; Minhas, Fayyaz Ul Amir Afsar

2017-09-01

Due to Ca2+-dependent binding and the sequence diversity of Calmodulin (CaM) binding proteins, identifying CaM interactions and binding sites in the wet-lab is tedious and costly. Therefore, computational methods for this purpose are crucial to the design of such wet-lab experiments. We present an algorithm suite called CaMELS (CalModulin intEraction Learning System) for predicting proteins that interact with CaM as well as their binding sites using sequence information alone. CaMELS offers state of the art accuracy for both CaM interaction and binding site prediction and can aid biologists in studying CaM binding proteins. For CaM interaction prediction, CaMELS uses protein sequence features coupled with a large-margin classifier. CaMELS models the binding site prediction problem using multiple instance machine learning with a custom optimization algorithm which allows more effective learning over imprecisely annotated CaM-binding sites during training. CaMELS has been extensively benchmarked using a variety of data sets, mutagenic studies, proteome-wide Gene Ontology enrichment analyses and protein structures. Our experiments indicate that CaMELS outperforms simple motif-based search and other existing methods for interaction and binding site prediction. We have also found that the whole sequence of a protein, rather than just its binding site, is important for predicting its interaction with CaM. Using the machine learning model in CaMELS, we have identified important features of protein sequences for CaM interaction prediction as well as characteristic amino acid sub-sequences and their relative position for identifying CaM binding sites. Python code for training and evaluating CaMELS together with a webserver implementation is available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#camels. © 2017 Wiley Periodicals, Inc.
Nucleotide synthetase ribozymes may have emerged first in the RNA world

PubMed Central

Ma, Wentao; Yu, Chunwu; Zhang, Wentao; Hu, Jiming

2007-01-01

Though the “RNA world” hypothesis has gained a central role in ideas concerning the origin of life, the scenario concerning its emergence remains uncertain. It has been speculated that the first scene may have been the emergence of a template-dependent RNA synthetase ribozyme, which catalyzed its own replication: thus, “RNA replicase.” However, the speculation remains uncertain, primarily because of the large sequence length requirement of such a replicase and the lack of a convincing mechanism to ensure its self-favoring features. Instead, we propose a nucleotide synthetase ribozyme as an alternative candidate, especially considering recent experimental evidence suggesting the possibility of effective nonenzymatic template-directed synthesis of RNA. A computer simulation was conducted to support our proposal. The conditions for the emergence of the nucleotide synthetase ribozyme are discussed, based on dynamic analysis on a computer. We suggest the template-dependent RNA synthetase ribozyme emerged later, perhaps after the emergence of protocells. PMID:17878321
The force-dependent mechanism of DnaK-mediated mechanical folding

PubMed Central

Perales-Calvo, Judit; Giganti, David; Stirnemann, Guillaume; Garcia-Manyes, Sergi

2018-01-01

It is well established that chaperones modulate the protein folding free-energy landscape. However, the molecular determinants underlying chaperone-mediated mechanical folding remain largely elusive, primarily because the force-extended unfolded conformation fundamentally differs from that characterized in biochemistry experiments. We use single-molecule force-clamp spectroscopy, combined with molecular dynamics simulations, to study the effect that the Hsp70 system has on the mechanical folding of three mechanically stiff model proteins. Our results demonstrate that, when working independently, DnaJ (Hsp40) and DnaK (Hsp70) work as holdases, blocking refolding by binding to distinct substrate conformations. Whereas DnaK binds to molten globule–like forms, DnaJ recognizes a cryptic sequence in the extended state in an unanticipated force-dependent manner. By contrast, the synergetic coupling of the Hsp70 system exhibits a marked foldase behavior. Our results offer unprecedented molecular and kinetic insights into the mechanisms by which mechanical force finely regulates chaperone binding, directly affecting protein elasticity. PMID:29487911
Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space

PubMed Central

Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

2008-01-01

Motivation: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. Results: We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows significant improvement on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. Availability: A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request. Contact: lonshy@cs.huji.ac.il PMID:18586742
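For orientation, the baseline that this abstract sets out to improve can be sketched in a few lines. The code below is plain textbook UPGMA, not the memory-constrained MC-UPGMA of the paper: it keeps every pairwise dissimilarity in a dictionary, which is exactly the memory requirement the authors describe as prohibitive. The function name and the toy distances are assumptions.

```python
from itertools import combinations


def upgma(names, dist):
    """Naive UPGMA (average linkage) holding all dissimilarities in memory.

    names: list of leaf labels.
    dist: dict mapping frozenset({a, b}) -> dissimilarity for every leaf pair.
    Returns the merges in order as (cluster_a, cluster_b, dissimilarity).
    """
    sizes = {name: 1 for name in names}  # active cluster -> number of leaves
    d = dict(dist)                       # working copy, grows with new clusters
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))]
        na, nb = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Size-weighted average gives the distance from the merged cluster
        # to every remaining cluster.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
        merges.append((a, b, height))
    return merges


if __name__ == "__main__":
    pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
             ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 8}
    toy = {frozenset(k): v for k, v in pairs.items()}
    for step in upgma(["A", "B", "C", "D"], toy):
        print(step)
```

With n sequences the dictionary holds on the order of n squared entries, so the approach stops being practical well before the scale of the full protein space discussed in the abstract.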
A model of metastable dynamics during ongoing and evoked cortical activity

NASA Astrophysics Data System (ADS)

La Camera, Giancarlo

The dynamics of simultaneously recorded spike trains in alert animals often evolve through temporal sequences of metastable states. Little is known about the network mechanisms responsible for the genesis of such sequences, or their potential role in neural coding. In the gustatory cortex of alert rats, state sequences can be observed also in the absence of overt sensory stimulation, and thus form the basis of the so-called 'ongoing activity'. This activity is characterized by a partial degree of coordination among neurons, sharp transitions among states, and multi-stability of single neurons' firing rates. A recurrent spiking network model with clustered topology can account for both the spontaneous generation of state sequences and the (network-generated) multi-stability. In the model, each network state results from the activation of specific neural clusters with potentiated intra-cluster connections. A mean field solution of the model shows a large number of stable states, each characterized by a subset of simultaneously active clusters. The firing rate in each cluster during ongoing activity depends on the number of active clusters, so that the same neuron can have different firing rates depending on the state of the network. Because of dense intra-cluster connectivity and recurrent inhibition, in finite networks the stable states lose stability due to finite size effects. Simulations of the dynamics show that the model ensemble activity continuously hops among the different states, reproducing the ongoing dynamics observed in the data. Moreover, when probed with external stimuli, the model correctly predicts the quenching of single neuron multi-stability into bi-stability, the reduction of dimensionality of the population activity, the reduction of trial-to-trial variability, and a potential role for metastable states in the anticipation of expected events. Altogether, these results provide a unified mechanistic model of ongoing and evoked cortical dynamics. NSF IIS-1161852, NIDCD K25-DC013557, NIDCD R01-DC010389.
Pulseq: A rapid and hardware-independent pulse sequence prototyping framework.

PubMed

Layton, Kelvin J; Kroboth, Stefan; Jia, Feng; Littin, Sebastian; Yu, Huijun; Leupold, Jochen; Nielsen, Jon-Fredrik; Stöcker, Tony; Zaitsev, Maxim

2017-04-01

Implementing new magnetic resonance experiments, or sequences, often involves extensive programming on vendor-specific platforms, which can be time consuming and costly. This situation is exacerbated when research sequences need to be implemented on several platforms simultaneously, for example, at different field strengths. This work presents an alternative programming environment that is hardware-independent, open-source, and promotes rapid sequence prototyping. A novel file format is described to efficiently store the hardware events and timing information required for an MR pulse sequence. Platform-dependent interpreter modules convert the file to appropriate instructions to run the sequence on MR hardware. Sequences can be designed in high-level languages, such as MATLAB, or with a graphical interface. Spin physics simulation tools are incorporated into the framework, allowing for comparison between real and virtual experiments. Minimal effort is required to implement relatively advanced sequences using the tools provided. Sequences are executed on three different MR platforms, demonstrating the flexibility of the approach. A high-level, flexible and hardware-independent approach to sequence programming is ideal for the rapid development of new sequences. The framework is currently not suitable for large patient studies or routine scanning although this would be possible with deeper integration into existing workflows. Magn Reson Med 77:1544-1552, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Micronuclear DNA of Oxytricha nova contains sequences with autonomously replicating activity in Saccharomyces cerevisiae.

PubMed Central

Colombo, M M; Swanton, M T; Donini, P; Prescott, D M

1984-01-01

Oxytricha nova is a hypotrichous ciliate with micronuclei and macronuclei. Micronuclei, which contain large, chromosomal-sized DNA, are genetically inert but undergo meiosis and exchange during cell mating. Macronuclei, which contain only small, gene-sized DNA molecules, provide all of the nuclear RNA needed to run the cell. After cell mating the macronucleus is derived from a micronucleus, a derivation that includes excision of the genes from chromosomes and elimination of the remaining DNA. The eliminated DNA includes all of the repetitious sequences and approximately 95% of the unique sequences. We cloned large restriction fragments from the micronucleus that confer replication ability on a replication-deficient plasmid in Saccharomyces cerevisiae. Sequences that confer replication ability are called autonomously replicating sequences. The frequency and effectiveness of autonomously replicating sequences in micronuclear DNA are similar to those reported for DNAs of other organisms introduced into yeast cells. Of the 12 micronuclear fragments with autonomously replicating sequence activity, 9 also showed homology to macronuclear DNA, indicating that they contain a macronuclear gene sequence. We conclude from this that autonomously replicating sequence activity is nonrandomly distributed throughout micronuclear DNA and is preferentially associated with those regions of micronuclear DNA that contain genes. PMID:6092934
Downregulation of viral RNA translation by hepatitis C virus non-structural protein NS5A requires the poly(U/UC) sequence in the 3' UTR.

PubMed

Hoffman, Brett; Li, Zhubing; Liu, Qiang

2015-08-01

Hepatitis C virus (HCV) non-structural protein 5A (NS5A) is essential for viral replication; however, its effect on HCV RNA translation remains controversial partially due to the use of reporters lacking the 3' UTR, where NS5A binds to the poly(U/UC) sequence. We investigated the role of NS5A in HCV translation using a monocistronic RNA containing a Renilla luciferase gene flanked by the HCV UTRs. We found that NS5A downregulated viral RNA translation in a dose-dependent manner. This downregulation required both the 5' and 3' UTRs of HCV because substitution of either sequence with the 5' and 3' UTRs of enterovirus 71 or a cap structure at the 5' end eliminated the effects of NS5A on translation. Translation of the HCV genomic RNA was also downregulated by NS5A. The inhibition of HCV translation by NS5A required the poly(U/UC) sequence in the 3' UTR as NS5A did not affect translation when it was deleted. In addition, we showed that, whilst the amphipathic α-helix of NS5A has no effect on viral translation, the three domains of NS5A can inhibit translation independently, also dependent on the presence of the poly(U/UC) sequence in the 3' UTR. These results suggested that NS5A downregulated HCV RNA translation through a mechanism involving the poly(U/UC) sequence in the 3' UTR.

Efficient secretion of small proteins in mammalian cells relies on Sec62-dependent posttranslational translocation

PubMed Central

Lakkaraju, Asvin K. K.; Thankappan, Ratheeshkumar; Mary, Camille; Garrison, Jennifer L.; Taunton, Jack; Strub, Katharina

2012-01-01

Mammalian cells secrete a large number of small proteins, but their mode of translocation into the endoplasmic reticulum is not fully understood. Cotranslational translocation was expected to be inefficient due to the small time window for signal sequence recognition by the signal recognition particle (SRP). Impairing the SRP pathway and reducing cellular levels of the translocon component Sec62 by RNA interference, we found an alternate, Sec62-dependent translocation path in mammalian cells required for the efficient translocation of small proteins with N-terminal signal sequences. The Sec62-dependent translocation occurs posttranslationally via the Sec61 translocon and requires ATP. We classified preproteins into three groups: 1) those that comprise ≤100 amino acids are strongly dependent on Sec62 for efficient translocation; 2) those in the size range of 120–160 amino acids use the SRP pathway, albeit inefficiently, and therefore rely on Sec62 for efficient translocation; and 3) those larger than 160 amino acids depend on the SRP pathway to preserve a transient translocation competence independent of Sec62. Thus, unlike in yeast, the Sec62-dependent translocation pathway in mammalian cells serves mainly as a fail-safe mechanism to ensure efficient secretion of small proteins and provides cells with an opportunity to regulate secretion of small proteins independent of the SRP pathway. PMID:22648169
The role of RT carry-over for congruence sequence effects in masked priming.

PubMed

Huber-Huber, Christoph; Ansorge, Ulrich

2017-05-01

The present study disentangles 2 sources of the congruence sequence effect with masked primes: congruence and response time of the previous trial (reaction time [RT] carry-over). Using arrows as primes and targets and a metacontrast masking procedure, we found congruence as well as congruence sequence effects. In addition, congruence sequence effects decreased when RT carry-over was accounted for in a mixed model analysis, suggesting that RT carry-over contributes to congruence sequence effects in masked priming. Crucially, effects of previous trial congruence were not cancelled out completely, indicating that RT carry-over and previous trial congruence are 2 sources feeding into the congruence sequence effect. A secondary task requiring response speed judgments demonstrated general awareness of response speed (Experiment 1), but removing this secondary task (Experiment 2) showed that RT carry-over effects were also present in single-task conditions. During (dual-task) prime-awareness test parts of both experiments, however, RT carry-over failed to modulate congruence effects, suggesting that some task sets of the participants can prevent the effect. The basic RT carry-over effects are consistent with the conflict adaptation account, with the adaptation to the statistics of the environment (ASE) model, and possibly with the temporal learning explanation. Additionally considering the task-dependence of RT carry-over, the results are most compatible with the conflict adaptation account. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Flooding greatly affects the diversity of arbuscular mycorrhizal fungi communities in the roots of wetland plants.

PubMed

Wang, Yutao; Huang, Yelin; Qiu, Qiu; Xin, Guorong; Yang, Zhongyi; Shi, Suhua

2011-01-01

The communities of arbuscular mycorrhizal fungi (AMF) colonizing the roots of three mangrove species were characterized along a tidal gradient in a mangrove swamp. A fragment, designated SSU-ITS-LSU, including part of the small subunit (SSU), the entire internal transcribed spacer (ITS) and part of the large subunit (LSU) of rDNA from samples of AMF-colonized roots was amplified, cloned and sequenced using AMF-specific primers. Similar levels of AMF diversity to those observed in terrestrial ecosystems were detected in the roots, indicating that the communities of AMF in wetland ecosystems are not necessarily low in diversity. In total, 761 Glomeromycota sequences were obtained, which grouped, according to phylogenetic analysis using the SSU-ITS-LSU fragment, into 23 phylotypes, 22 of which belonged to Glomeraceae and one to Acaulosporaceae. The results indicate that flooding plays an important role in AMF diversity, and its effects appear to depend on the degree (duration) of flooding. Both host species and tide level affected community structure of AMF, indicating the presence of habitat and host species preferences.
Flooding Greatly Affects the Diversity of Arbuscular Mycorrhizal Fungi Communities in the Roots of Wetland Plants

PubMed Central

Wang, Yutao; Huang, Yelin; Qiu, Qiu; Xin, Guorong; Yang, Zhongyi; Shi, Suhua

2011-01-01

The communities of arbuscular mycorrhizal fungi (AMF) colonizing the roots of three mangrove species were characterized along a tidal gradient in a mangrove swamp. A fragment, designated SSU-ITS-LSU, including part of the small subunit (SSU), the entire internal transcribed spacer (ITS) and part of the large subunit (LSU) of rDNA from samples of AMF-colonized roots was amplified, cloned and sequenced using AMF-specific primers. Similar levels of AMF diversity to those observed in terrestrial ecosystems were detected in the roots, indicating that the communities of AMF in wetland ecosystems are not necessarily low in diversity. In total, 761 Glomeromycota sequences were obtained, which grouped, according to phylogenetic analysis using the SSU-ITS-LSU fragment, into 23 phylotypes, 22 of which belonged to Glomeraceae and one to Acaulosporaceae. The results indicate that flooding plays an important role in AMF diversity, and its effects appear to depend on the degree (duration) of flooding. Both host species and tide level affected community structure of AMF, indicating the presence of habitat and host species preferences. PMID:21931734
Expansion of inverted repeat does not decrease substitution rates in Pelargonium plastid genomes.

PubMed

Weng, Mao-Lun; Ruhlman, Tracey A; Jansen, Robert K

2017-04-01

For species with minor inverted repeat (IR) boundary changes in the plastid genome (plastome), nucleotide substitution rates were previously shown to be lower in the IR than the single copy regions (SC). However, the impact of large-scale IR expansion/contraction on plastid nucleotide substitution rates among closely related species remains unclear. We included plastomes from 22 Pelargonium species, including eight newly sequenced genomes, and used both pairwise and model-based comparisons to investigate the impact of the IR on sequence evolution in plastids. Ten types of plastome organization with different inversions or IR boundary changes were identified in Pelargonium. Inclusion in the IR was not sufficient to explain the variation of nucleotide substitution rates. Instead, the rate heterogeneity in Pelargonium plastomes was a mixture of locus-specific, lineage-specific and IR-dependent effects. Our study of Pelargonium plastomes that vary in IR length and gene content demonstrates that the evolutionary consequences of retaining these repeats are more complicated than previously suggested. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Development and use of molecular markers: past and present.

PubMed

Grover, Atul; Sharma, P C

2016-01-01

Molecular markers, due to their stability, cost-effectiveness and ease of use, provide an immensely popular tool for a variety of applications including genome mapping, gene tagging, genetic diversity analysis, phylogenetic analysis and forensic investigations. In the last three decades, a number of molecular marker techniques have been developed and exploited worldwide in different systems. However, only a handful of these techniques, namely RFLPs, RAPDs, AFLPs, ISSRs, SSRs and SNPs, have received global acceptance. A recent revolution in DNA sequencing techniques has taken the discovery and application of molecular markers to high-throughput and ultrahigh-throughput levels. Although the choice of marker will obviously depend on the targeted use, microsatellites, SNPs and genotyping by sequencing (GBS) largely fulfill most of the user requirements. Further, in combination with other high-throughput techniques, modern transcriptomic and functional markers will in time lead the way to high-density genetic map construction, identification of QTLs, and breeding and conservation strategies. This review presents an overview of different marker technologies and their variants with a comparative account of their characteristic features and applications.
Self-expressive Dictionary Learning for Dynamic 3D Reconstruction.

PubMed

Zheng, Enliang; Ji, Dinghuang; Dunn, Enrique; Frahm, Jan-Michael

2017-08-22

We target the problem of sparse 3D reconstruction of dynamic objects observed by multiple unsynchronized video cameras with unknown temporal overlap. To this end, we develop a framework to recover the unknown structure without sequencing information across video sequences. Our proposed compressed sensing framework poses the estimation of 3D structure as the problem of dictionary learning, where the dictionary is defined as an aggregation of the temporally varying 3D structures. Given the smooth motion of dynamic objects, we observe any element in the dictionary can be well approximated by a sparse linear combination of other elements in the same dictionary (i.e. self-expression). Our formulation optimizes a biconvex cost function that leverages a compressed sensing formulation and enforces both structural dependency coherence across video streams, as well as motion smoothness across estimates from common video sources. We further analyze the reconstructability of our approach under different capture scenarios, and its comparison and relation to existing methods. Experimental results on large amounts of synthetic data as well as real imagery demonstrate the effectiveness of our approach.
Field-induced phase transitions and enhanced double negative electrocaloric effects in (Pb,La)(Zr,Sn,Ti)O3 antiferroelectric single crystal

NASA Astrophysics Data System (ADS)

Zhuo, Fangping; Li, Qiang; Qiao, Huimin; Yan, Qingfeng; Zhang, Yiling; Xi, Xiaoqing; Chu, Xiangcheng; Long, Xifa; Cao, Wenwu

2018-03-01

Field-induced phase transitions and electrocaloric effect have been studied in (Pb,La)(Zr,Sn,Ti)O3 (PLZST) antiferroelectric single crystal. Temperature dependent dielectric, Raman spectra, as well as in situ domain evolution demonstrated that the order of phase transitions during heating is in the sequence of orthorhombic antiferroelectric → tetragonal antiferroelectric → cubic paraelectric. Enhanced negative electrocaloric effect value of -3.6 °C and electrocaloric strength of 0.3 K mm/kV at 125 °C have been achieved. Double negative effects (-0.7 °C at 45 °C and -3.6 °C at 125 °C) and a relatively large positive effect (1 °C) near Curie temperature (190 °C) have been found in the PLZST single crystal. Moreover, microscopic dipoles and a phenomenological Landau-type model were employed to understand these unusual electrocaloric effects. Enhanced negative effect and the coexistence of both negative and positive effects in one material are promising for us to develop practical solid-state cooling devices with high efficiency.
Facile Site-Directed Mutagenesis of Large Constructs Using Gibson Isothermal DNA Assembly.

PubMed

Yonemoto, Isaac T; Weyman, Philip D

2017-01-01

Site-directed mutagenesis is a commonly used molecular biology technique to manipulate biological sequences, and is especially useful for studying sequence determinants of enzyme function or designing proteins with improved activity. We describe a strategy using Gibson Isothermal DNA Assembly to perform site-directed mutagenesis on large (>~20 kbp) constructs that are outside the effective range of standard techniques such as QuikChange II (Agilent Technologies), but more reliable than traditional cloning using restriction enzymes and ligation.
Bacteria abundance and diversity of different life stages of Plutella xylostella (Lepidoptera: Plutellidae), revealed by bacteria culture-dependent and PCR-DGGE methods.

PubMed

Lin, Xiao-Li; Pan, Qin-Jian; Tian, Hong-Gang; Douglas, Angela E; Liu, Tong-Xian

2015-03-01

Microbial abundance and diversity of different life stages (fourth instar larvae, pupae and adults) of the diamondback moth, Plutella xylostella L., collected from the field and reared in the laboratory, were investigated using a bacteria culture-dependent method and PCR-DGGE analysis based on the sequence of the V3 region of the bacterial 16S rRNA gene. A large quantity of bacteria was found in all life stages of P. xylostella. The field population had a higher quantity of bacteria than the laboratory population, and the larval gut had a higher quantity than pupae and adults. Culturable bacteria differed among the life stages of P. xylostella. Twenty-five different bacterial strains were identified in total; of these, 20 were present in the larval gut, 8 in pupae and 14 in adults. Firmicutes bacteria, Bacillus sp., were the most dominant species in every life stage. Fifteen distinct bands were obtained from the DGGE electrophoresis gel. BLAST searches of the sequences against the GenBank database showed that these bacteria belonged to six different genera. Phylogenetic analysis showed that the sequences of the bacteria belonged to the Actinobacteria, Proteobacteria and Firmicutes. Serratia sp. in Proteobacteria was the most abundant species in the larval gut. In pupae, unculturable bacteria were the most dominant species, and unculturable bacteria and Serratia sp. were the most dominant species in adults. Our study suggested that a combination of molecular and traditional culturing methods can be effectively used to analyze and to determine the diversity of gut microflora. These known bacteria may play important roles in the development of P. xylostella. © 2013 Institute of Zoology, Chinese Academy of Sciences.
Sequence-Specific Model for Peptide Retention Time Prediction in Strong Cation Exchange Chromatography.

PubMed

Gussakovsky, Daniel; Neustaeter, Haley; Spicer, Victor; Krokhin, Oleg V

2017-11-07

The development of a peptide retention prediction model for strong cation exchange (SCX) separation on a Polysulfoethyl A column is reported. Off-line 2D LC-MS/MS analysis (SCX-RPLC) of S. cerevisiae whole cell lysate was used to generate a retention dataset of ∼30 000 peptides, sufficient for identifying the major sequence-specific features of peptide retention mechanisms in SCX. In contrast to RPLC/hydrophilic interaction liquid chromatography (HILIC) separation modes, where retention is driven by hydrophobic/hydrophilic contributions of all individual residues, SCX interactions depend mainly on peptide charge (number of basic residues at acidic pH) and size. An additive model (incorporating the contributions of all 20 residues into the peptide retention) combined with a peptide length correction yields a prediction accuracy of R² = 0.976, significantly higher than the additive models for either HILIC or RPLC. Position-dependent effects on peptide retention for different residues were driven by the spatial orientation of tryptic peptides upon interaction with the negatively charged surface functional groups. The positively charged N-termini serve as a primary point of interaction. For example, basic residues (Arg, His, Lys) increase peptide retention when located closer to the N-terminus. We also found that hydrophobic interactions, which could lead to a mixed-mode separation mechanism, are largely suppressed at 20-30% of acetonitrile in the eluent. The accuracy of the final Sequence-Specific Retention Calculator (SSRCalc) SCX model (R² ≈ 0.99) exceeds all previously reported predictors for peptide LC separations. This also provides a solid platform for method development in 2D LC-MS protocols in proteomics and peptide retention prediction filtering of false positive identifications.
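The abstract above names peptide charge, taken as the number of basic residues at acidic pH, as the main driver of SCX retention. A minimal Python sketch of that count follows. It is not the SSRCalc model: the function name is hypothetical, and the sketch assumes an unmodified N-terminus carrying one positive charge and neutral acidic side chains and C-terminus at low pH.

```python
# Basic residues named in the abstract: Lys (K), Arg (R), His (H).
BASIC_RESIDUES = frozenset("KRH")


def nominal_charge_acidic(peptide: str) -> int:
    """Estimate the nominal positive charge of a peptide at acidic pH.

    Assumption (not from the source): one charge for the free N-terminal
    amine plus one per basic residue; acidic groups are treated as neutral.
    """
    seq = peptide.strip().upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    return 1 + sum(1 for residue in seq if residue in BASIC_RESIDUES)
```

Under these assumptions a typical tryptic peptide ending in Lys or Arg with no internal basic residue has a nominal charge of 2, and each additional His, Lys or Arg raises it by one.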
When fruits lose to animals: Disorganized search of semantic memory in Parkinson's disease.

PubMed

Tagini, Sofia; Seyed-Allaei, Shima; Scarpina, Federica; Toraldo, Alessio; Mauro, Alessandro; Cherubini, Paolo; Reverberi, Carlo

2018-04-16

The semantic fluency task is widely used in both clinical and research settings to assess both the integrity of the semantic store and the effectiveness of the search through it. Our aim was to investigate whether nondemented Parkinson's disease (PD) patients show an impairment in the strategic exploration of the semantic store and whether the tested semantic category has an impact on multiple measures of performance. We compared 74 nondemented PD patients with 254 healthy subjects in a semantic fluency test using relatively small (fruits) and large (animals) semantic categories. Number of words produced, number of explored semantic subcategories, and degree of order in the produced sequences were computed as dependent variables. PD patients produced fewer words than healthy subjects did, regardless of the category. Number of subcategories was also lower in PD patients than in healthy subjects, without a significant difference between categories. Critically, PD patients' sequences were less semantically organized than were those of controls, but this effect appeared in only the smaller category (fruits), thus pointing to a lack of strategy in exploring the semantic store. Our results show that the semantic fluency deficit in PD patients has a strategic component, even though that may not be the only cause of the impaired performance. Furthermore, our evidence suggests that the semantic category used in the test influences performance, hence providing an explanation for the failure by previous studies, which often used large categories such as animals, to detect strategy deficits in PD. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Measures of phylogenetic differentiation provide robust and complementary insights into microbial communities.

PubMed

Parks, Donovan H; Beiko, Robert G

2013-01-01

High-throughput sequencing techniques have made large-scale spatial and temporal surveys of microbial communities routine. Gaining insight into microbial diversity requires methods for effectively analyzing and visualizing these extensive data sets. Phylogenetic β-diversity measures address this challenge by allowing the relationship between large numbers of environmental samples to be explored using standard multivariate analysis techniques. Despite the success and widespread use of phylogenetic β-diversity measures, an extensive comparative analysis of these measures has not been performed. Here, we compare 39 measures of phylogenetic β diversity in order to establish the relative similarity of these measures along with key properties and performance characteristics. While many measures are highly correlated, those commonly used within microbial ecology were found to be distinct from those popular within classical ecology, and from the recently recommended Gower and Canberra measures. Many of the measures are surprisingly robust to different rootings of the gene tree, the choice of similarity threshold used to define operational taxonomic units, and the presence of outlying basal lineages. Measures differ considerably in their sensitivity to rare organisms, and the effectiveness of measures can vary substantially under alternative models of differentiation. Consequently, the depth of sequencing required to reveal underlying patterns of relationships between environmental samples depends on the selected measure. Our results demonstrate that using complementary measures of phylogenetic β diversity can further our understanding of how communities are phylogenetically differentiated. Open-source software implementing the phylogenetic β-diversity measures evaluated in this manuscript is available at http://kiwi.cs.dal.ca/Software/ExpressBetaDiversity.
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

PubMed

Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

2004-09-09

Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) Pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially reduced. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
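Strategy (a) above rests on the fact that each pair-wise comparison is independent, so the work can be farmed out without changing the result. The Python sketch below illustrates only that idea and is not DIALIGN's code: `pair_score` is a stand-in (longest common subsequence length) for a real pair-wise alignment, both function names are hypothetical, and a thread pool is used purely to keep the example self-contained. CPU-bound alignment would normally be spread over processes or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pair_score(a: str, b: str) -> int:
    """Stand-in for a pair-wise alignment: longest common subsequence length."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def all_pair_scores(seqs, workers=1):
    """Score every unordered pair of sequences using `workers` threads."""
    pairs = list(combinations(range(len(seqs)), 2))

    def job(pair):
        i, j = pair
        return pair_score(seqs[i], seqs[j])

    # Executor.map returns results in input order, so the output does not
    # depend on how the pairs were distributed among the workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(job, pairs))
    return dict(zip(pairs, scores))
```

Calling `all_pair_scores(seqs, workers=1)` and `all_pair_scores(seqs, workers=4)` on the same input gives identical dictionaries, which mirrors the abstract's point that distribution has no effect on the output.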
A noniterative improvement of Guyan reduction

NASA Technical Reports Server (NTRS)

Ganesan, N.

1993-01-01

In determining the natural modes and frequencies of a linear elastic structure, Guyan reduction is often used to reduce the size of the mass and stiffness matrices and the solution of the reduced system is obtained first. The reduced system modes are then expanded to the size of the original system by using a static transformation linking the retained degrees of freedom to the omitted degrees of freedom. In the present paper, the transformation matrix of Guyan reduction is modified to include additional terms from a series accounting for the inertial effects. However, the inertial terms are dependent on the unknown frequencies. A practical approximation is employed to compute the inertial terms without any iteration. This new transformation is implemented in NASTRAN using a DMAP sequence alter. Numerical examples using a cantilever beam illustrate the necessary condition for allowing a large number of additional terms in the proposed series correction of Guyan reduction. A practical example of a large model of the Plasma Motor Generator module to be flown on a Delta launch vehicle is also presented.
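For orientation, the static transformation described above and the first inertial correction can be written out in the standard textbook form. This is a sketch, not the paper's own notation: the subscripts r (retained) and o (omitted) and the symbols T_G, K_G and M_G are labels chosen here, and neither the paper's exact series nor its noniterative approximation for the unknown frequency is reproduced.

```latex
% Free-vibration eigenproblem partitioned into retained (r) and omitted (o) DOFs
\left(
\begin{bmatrix} K_{rr} & K_{ro} \\ K_{or} & K_{oo} \end{bmatrix}
- \omega^{2}
\begin{bmatrix} M_{rr} & M_{ro} \\ M_{or} & M_{oo} \end{bmatrix}
\right)
\begin{bmatrix} u_r \\ u_o \end{bmatrix} = 0

% Guyan (static) transformation: inertia terms in the second block row are dropped
u_o \approx -K_{oo}^{-1} K_{or}\, u_r ,
\qquad
T_G = \begin{bmatrix} I \\ -K_{oo}^{-1} K_{or} \end{bmatrix},
\qquad
K_G = T_G^{\mathsf T} K\, T_G ,
\quad
M_G = T_G^{\mathsf T} M\, T_G

% Exact relation from the second block row, and its expansion in powers of \omega^2
u_o = -\left(K_{oo} - \omega^{2} M_{oo}\right)^{-1}
       \left(K_{or} - \omega^{2} M_{or}\right) u_r
    = \left[ -K_{oo}^{-1} K_{or}
      + \omega^{2} K_{oo}^{-1}\left(M_{or} - M_{oo} K_{oo}^{-1} K_{or}\right)
      + \mathcal{O}(\omega^{4}) \right] u_r
```

The ω²-dependent terms are the inertial corrections the abstract refers to. They cannot be evaluated directly because ω is the quantity being solved for, which is why an approximation is needed to avoid iteration.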
siRNA and innate immunity.

PubMed

Robbins, Marjorie; Judge, Adam; MacLachlan, Ian

2009-06-01

Canonical small interfering RNA (siRNA) duplexes are potent activators of the mammalian innate immune system. The induction of innate immunity by siRNA is dependent on siRNA structure and sequence, method of delivery, and cell type. Synthetic siRNA in delivery vehicles that facilitate cellular uptake can induce high levels of inflammatory cytokines and interferons after systemic administration in mammals and in primary human blood cell cultures. This activation is predominantly mediated by immune cells, normally via a Toll-like receptor (TLR) pathway. The siRNA sequence dependency of these pathways varies with the type and location of the TLR involved. Alternatively nonimmune cell activation may also occur, typically resulting from siRNA interaction with cytoplasmic RNA sensors such as RIG1. As immune activation by siRNA-based drugs represents an undesirable side effect due to the considerable toxicities associated with excessive cytokine release in humans, understanding and abrogating this activity will be a critical component in the development of safe and effective therapeutics. This review describes the intracellular mechanisms of innate immune activation by siRNA, the design of appropriate sequences and chemical modification approaches, and suitable experimental methods for studying their effects, with a view toward reducing siRNA-mediated off-target effects.
Native Contact Density and Nonnative Hydrophobic Effects in the Folding of Bacterial Immunity Proteins

PubMed Central

Chen, Tao; Chan, Hue Sun

2015-01-01

The bacterial colicin-immunity proteins Im7 and Im9 fold by different mechanisms. Experimentally, at pH 7.0 and 10°C, Im7 folds in a three-state manner via an intermediate but Im9 folding is two-state-like. Accordingly, Im7 exhibits a chevron rollover, whereas the chevron arm for Im9 folding is linear. Here we address the biophysical basis of their different behaviors by using native-centric models with and without additional transferrable, sequence-dependent energies. The Im7 chevron rollover is not captured by either a pure native-centric model or a model augmented by nonnative hydrophobic interactions with a uniform strength irrespective of residue type. By contrast, a more realistic nonnative interaction scheme that accounts for the difference in hydrophobicity among residues leads simultaneously to a chevron rollover for Im7 and an essentially linear folding chevron arm for Im9. Hydrophobic residues identified by published experiments to be involved in nonnative interactions during Im7 folding are found to participate in the strongest nonnative contacts in this model. Thus our observations support the experimental perspective that the Im7 folding intermediate is largely underpinned by nonnative interactions involving large hydrophobics. Our simulation suggests further that nonnative effects in Im7 are facilitated by a lower local native contact density relative to that of Im9. In a one-dimensional diffusion picture of Im7 folding with a coordinate- and stability-dependent diffusion coefficient, a significant chevron rollover is consistent with a diffusion coefficient that depends strongly on native stability at the conformational position of the folding intermediate. PMID:26016652
Chromosome-level assembly of Arabidopsis thaliana Ler reveals the extent of translocation and inversion polymorphisms.

PubMed

Zapata, Luis; Ding, Jia; Willing, Eva-Maria; Hartwig, Benjamin; Bezdan, Daniela; Jiao, Wen-Biao; Patel, Vipul; Velikkakam James, Geo; Koornneef, Maarten; Ossowski, Stephan; Schneeberger, Korbinian

2016-07-12

Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Despite the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives the first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing or other reference-dependent methods.
Oral peptide specific egg antibody to intestinal sodium-dependent phosphate co-transporter-2b is effective at altering phosphate transport in vitro and in vivo.

PubMed

Bobeck, Elizabeth A; Hellestad, Erica M; Sand, Jordan M; Piccione, Michelle L; Bishop, Jeff W; Helvig, Christian; Petkovich, Martin; Cook, Mark E

2015-06-01

Hyperimmunized hens are an effective means of generating large quantities of antigen specific egg antibodies that have use as oral supplements. In this study, we attempted to create a peptide specific antibody that produced outcomes similar to those of the human pharmaceutical, sevelamer HCl, used in the treatment of hyperphosphatemia (a sequela of chronic renal disease). Egg antibodies were generated against 8 different human intestinal sodium-dependent phosphate cotransporter 2b (NaPi2b) peptides, and hNaPi2b peptide egg antibodies were screened for their ability to inhibit phosphate transport in human intestinal Caco-2 cell line. Antibody produced against human peptide sequence TSPSLCWT (anti-h16) was specific for its peptide sequence, and significantly reduced phosphate transport in human Caco-2 cells to 25.3±11.5% of control nonspecific antibody, when compared to nicotinamide, a known inhibitor of phosphate transport (P≤0.05). Antibody was then produced against the mouse-specific peptide h16 counterpart (mouse sequence TSPSYCWT, anti-m16) for further analysis in a murine model. When anti-m16 was fed to mice (1% of diet as dried egg yolk powder), egg yolk immunoglobulin (IgY) was detected using immunohistochemical staining in mouse ileum, and egg anti-m16 IgY colocalized with a commercial goat anti-NaPi2b antibody. The effectiveness of anti-m16 egg antibody in reducing serum phosphate, when compared to sevelamer HCl, was determined in a mouse feeding study. Serum phosphate was reduced 18% (P<0.02) in mice fed anti-m16 (1% as dried egg yolk powder) and 30% (P<0.0001) in mice fed sevelamer HCl (1% of diet) when compared to mice fed nonspecific egg immunoglobulin. The methods described and the findings reported show that oral egg antibodies are useful and easy to prepare reagents for the study and possible treatment of select diseases. © 2015 Poultry Science Association Inc.
Integrating multiple genomic data to predict disease-causing nonsynonymous single nucleotide variants in exome sequencing studies.

PubMed

Wu, Jiaxin; Li, Yanda; Jiang, Rui

2014-03-01

Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, due to such facts as the large number of sequenced variants, the presence of non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, prevalent applications of exome sequencing have been appealing for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance styles. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.

High-Resolution Melting Analysis for Rapid Detection of Sequence Type 131 Escherichia coli.

PubMed

Harrison, Lucas B; Hanson, Nancy D

2017-06-01

Escherichia coli isolates belonging to the sequence type 131 (ST131) clonal complex have been associated with the global distribution of fluoroquinolone and β-lactam resistance. Whole-genome sequencing and multilocus sequence typing identify sequence type but are expensive when evaluating large numbers of samples. This study was designed to develop a cost-effective screening tool using high-resolution melting (HRM) analysis to differentiate ST131 from non-ST131 E. coli in large sample populations in the absence of sequence analysis. The method was optimized using DNA from 12 E. coli isolates. Singleplex PCR was performed using 10 ng of DNA, Type-it HRM buffer, and multilocus sequence typing primers and was followed by multiplex PCR. The amplicon sizes ranged from 630 to 737 bp. Melt temperature peaks were determined by performing HRM analysis at 0.1°C resolution from 50 to 95°C on a Rotor-Gene Q 5-plex HRM system. Derivative melt curves were compared between sequence types and analyzed by principal component analysis. A blinded study of 191 E. coli isolates of ST131 and unknown sequence types validated this methodology. This methodology returned 99.2% specificity (124 true negatives and 1 false positive) and 100% sensitivity (66 true positives and 0 false negatives). This HRM methodology distinguishes ST131 from non-ST131 E. coli without sequence analysis. The analysis can be accomplished in about 3 h in any laboratory with an HRM-capable instrument and principal component analysis software. Therefore, this assay is a fast and cost-effective alternative to sequencing-based ST131 identification. Copyright © 2017 Harrison and Hanson.
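
The reported accuracy figures can be re-derived from the counts given in the abstract. The following is a minimal sketch of that arithmetic, not part of the published assay; the function and variable names are illustrative assumptions.

```python
# Counts taken from the abstract of the blinded study (191 isolates in total).
tp, fn = 66, 0    # ST131 isolates called ST131 / missed
tn, fp = 124, 1   # non-ST131 isolates called non-ST131 / miscalled as ST131


def sensitivity(true_pos, false_neg):
    """Fraction of true ST131 isolates that the assay detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-ST131 isolates that the assay correctly rejects."""
    return true_neg / (true_neg + false_pos)


total = tp + fn + tn + fp
print(f"isolates    = {total}")                       # 191
print(f"sensitivity = {sensitivity(tp, fn):.1%}")     # 100.0%
print(f"specificity = {specificity(tn, fp):.1%}")     # 99.2%
```

The four counts sum to the 191 isolates of the blinded study, and 124/125 gives the stated 99.2% specificity.
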
Effects on in vitro and in vivo angiogenesis induced by small peptides carrying adhesion sequences.

PubMed

Conconi, Maria Teresa; Ghezzo, Francesca; Dettin, Monica; Urbani, Luca; Grandi, Claudio; Guidolin, Diego; Nico, Beatrice; Di Bello, Carlo; Ribatti, Domenico; Parnigotto, Pier Paolo

2010-07-01

It is well known that tumor growth is strictly dependent on neo-vessel formation inside the tumor mass and that cell adhesion is required to allow EC proliferation and migration inside the tumor. In this work, we have evaluated the in vitro and in vivo effects on angiogenesis of some peptides, originally designed to promote cell adhesion on biomaterials, containing RGD motif mediating cell adhesion via integrin receptors [RGD, GRGDSPK, and (GRGDSP)(4)K] or the heparin-binding sequence of human vitronectin that interacts with HSPGs [HVP(351-359)]. Cell adhesion, proliferation, migration, and capillary-like tube formation in Matrigel were determined on HUVECs, whereas the effects on in vivo angiogenesis were evaluated using the CAM assay. (GRGDSP)(4)K linear sequence inhibited cell adhesion, decreased cell proliferation, migration and morphogenesis in Matrigel, and induced anti-angiogenic responses on CAM at higher degree than that determined after incubation with RGD or GRGDSPK. Moreover, it counteracted both in vitro and in vivo the pro-angiogenic effects induced by the Fibroblast growth factor (FGF-2). On the other hand, HVP was not able to affect cell adhesion and appeared less effective than (GRGDSP)(4)K. Our data indicate that the activity of RGD-containing peptides is related to their adhesive properties, and their effects are modulated by the number of cell adhesion motifs and the aminoacidic residues next to these sequences. The anti-angiogenic properties of (GRGDSP)(4)K seem to depend on its interaction with integrins, whereas the effects of HVP may be partially due to an impairment of HSPGs/FGF-2.
Analysis of delay reducing and fuel saving sequencing and spacing algorithms for arrival traffic

NASA Technical Reports Server (NTRS)

Neuman, Frank; Erzberger, Heinz

1991-01-01

The air traffic control subsystem that performs sequencing and spacing is discussed. The function of the sequencing and spacing algorithms is to automatically plan the most efficient landing order and to assign optimally spaced landing times to all arrivals. Several algorithms are described and their statistical performance is examined. Sequencing brings order to an arrival sequence for aircraft. First-come-first-served sequencing (FCFS) establishes a fair order, based on estimated times of arrival, and determines proper separations. Because of the randomness of the arriving traffic, gaps will remain in the sequence of aircraft. Delays are reduced by time-advancing the leading aircraft of each group while still preserving the FCFS order. Tightly spaced groups of aircraft remain with a mix of heavy and large aircraft. Spacing requirements differ for different types of aircraft trailing each other. Traffic is reordered slightly to take advantage of this spacing criterion, thus shortening the groups and reducing average delays. For heavy traffic, delays for different traffic samples vary widely, even when the same set of statistical parameters is used to produce each sample. This report supersedes NASA TM-102795 on the same subject. It includes a new method of time-advance as well as an efficient method of sequencing and spacing for two dependent runways.
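
The first-come-first-served step described above can be illustrated with a short sketch. This is not the report's algorithm: the separation values, aircraft labels, and function name are hypothetical, and the time-advance and reordering refinements are not implemented.

```python
# Hypothetical minimum time separations in seconds, keyed by (leader, follower).
# Real separation standards are not given in the abstract.
SEPARATION = {
    ("heavy", "heavy"): 90,
    ("heavy", "large"): 120,
    ("large", "heavy"): 60,
    ("large", "large"): 60,
}


def fcfs_schedule(arrivals):
    """Assign landing times in order of estimated time of arrival (ETA).

    arrivals: list of (callsign, eta_seconds, aircraft_type).
    Returns a list of (callsign, scheduled_time, delay) in landing order.
    """
    schedule = []
    prev_time, prev_type = None, None
    for callsign, eta, kind in sorted(arrivals, key=lambda a: a[1]):
        if prev_time is None:
            landing = eta  # the first aircraft lands at its ETA
        else:
            # Land no earlier than the ETA and no closer than the required gap.
            landing = max(eta, prev_time + SEPARATION[(prev_type, kind)])
        schedule.append((callsign, landing, landing - eta))
        prev_time, prev_type = landing, kind
    return schedule


arrivals = [
    ("A1", 0, "heavy"),
    ("A2", 30, "large"),
    ("A3", 100, "heavy"),
    ("A4", 400, "large"),
]
schedule = fcfs_schedule(arrivals)
for row in schedule:
    print(row)
```

In this toy sample A2 and A3 are delayed by the aircraft ahead of them, while A4 lands at its ETA and leaves a gap in the sequence, which is the kind of gap the abstract says remains under random arrivals.
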
The physical size of transcription factors is key to transcriptional regulation in chromatin domains

NASA Astrophysics Data System (ADS)

Maeshima, Kazuhiro; Kaizu, Kazunari; Tamura, Sachiko; Nozaki, Tadasu; Kokubo, Tetsuro; Takahashi, Koichi

2015-02-01

Genetic information, which is stored in the long strand of genomic DNA as chromatin, must be scanned and read out by various transcription factors. First, gene-specific transcription factors, which are relatively small (~50 kDa), scan the genome and bind regulatory elements. Such factors then recruit general transcription factors, Mediators, RNA polymerases, nucleosome remodellers, and histone modifiers, most of which are large protein complexes of 1-3 MDa in size. Here, we propose a new model for the functional significance of the size of transcription factors (or complexes) for gene regulation of chromatin domains. Recent findings suggest that chromatin consists of irregularly folded nucleosome fibres (10 nm fibres) and forms numerous condensed domains (e.g., topologically associating domains). Although the flexibility and dynamics of chromatin allow repositioning of genes within the condensed domains, the size exclusion effect of the domain may limit accessibility of DNA sequences by transcription factors. We used Monte Carlo computer simulations to determine the physical size limit of transcription factors that can enter condensed chromatin domains. Small gene-specific transcription factors can penetrate into the chromatin domains and search their target sequences, whereas large transcription complexes cannot enter the domain. Due to this property, once a large complex binds its target site via gene-specific factors it can act as a ‘buoy’ to keep the target region on the surface of the condensed domain and maintain transcriptional competency. This size-dependent specialization of target-scanning and surface-tethering functions could provide novel insight into the mechanisms of various DNA transactions, such as DNA replication and repair/recombination.
Indexcov: fast coverage quality control for whole-genome sequencing.

PubMed

Pedersen, Brent S; Collins, Ryan L; Talkowski, Michael E; Quinlan, Aaron R

2017-11-01

The BAM and CRAM formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy of sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license. © The Authors 2017. Published by Oxford University Press.
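
The idea of comparing consecutive index entries can be sketched as follows. This is not indexcov's code. It assumes the BAM linear index layout from the SAM/BAM specification (one virtual file offset per 16,384-bp window, with the compressed file offset in the upper 48 bits) and uses compressed bytes per window as a stand-in for alignment count; the function name and toy offsets are hypothetical.

```python
from statistics import median

TILE = 16384  # window size, in base pairs, of the BAM linear index


def relative_depth(voffsets):
    """Estimate relative coverage per window from consecutive linear-index entries.

    voffsets: virtual file offsets, one per consecutive 16,384-bp window.
    Returns each window's data size divided by the median window size.
    """
    # Drop the 16-bit within-block offset to get the compressed file offset.
    coffsets = [v >> 16 for v in voffsets]
    # Bytes of compressed alignment data that fall between consecutive windows.
    sizes = [b - a for a, b in zip(coffsets, coffsets[1:])]
    scale = median(sizes)  # assumes a non-zero median; real data needs a guard
    return [s / scale for s in sizes]


# Toy offsets: the fourth window holds twice as much data as the others.
toy_voffsets = [c << 16 for c in (0, 1000, 2000, 3000, 5000, 6000)]
depth = relative_depth(toy_voffsets)
print(depth)  # [1.0, 1.0, 1.0, 2.0, 1.0]
```

A window whose value sits near 2.0 or 0.5 across a long run would hint at the large-scale gains or losses mentioned in the abstract, although a real tool needs more care than this sketch shows.
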
Evidence of protein-free homology recognition in magnetic bead force-extension experiments

NASA Astrophysics Data System (ADS)

O'Lee, D. J.; Danilowicz, C.; Rochester, C.; Kornyshev, A. A.; Prentiss, M.

2016-07-01

Earlier theoretical studies have proposed that the homology-dependent pairing of large tracts of dsDNA may be due to physical interactions between homologous regions. Such interactions could contribute to the sequence-dependent pairing of chromosome regions that may occur in the presence or the absence of double-strand breaks. Several experiments have indicated the recognition of homologous sequences in pure electrolytic solutions without proteins. Here, we report single-molecule force experiments with a designed 60 kb long dsDNA construct; one end attached to a solid surface and the other end to a magnetic bead. The 60 kb constructs contain two 10 kb long homologous tracts oriented head to head, so that their sequences match if the two tracts fold on each other. The distance between the bead and the surface is measured as a function of the force applied to the bead. At low forces, the construct molecules extend substantially less than normal, control dsDNA, indicating the existence of preferential interaction between the homologous regions. Increasing the force causes continuous rather than abrupt unfolding of the paired homologous regions. Simple semi-phenomenological models of the unfolding mechanics are proposed, and their predictions are compared with the data.
Influence of Flow Sequencing Attributed to Climate Change and Climate Variability on the Assessment of Water-dependent Ecosystem Outcomes

NASA Astrophysics Data System (ADS)

Wang, J.; Nathan, R.; Horne, A.

2017-12-01

Traditional approaches to characterize water-dependent ecosystem outcomes in response to flow have been based on time-averaged hydrological indicators; however, there is increasing recognition of the need to characterize ecological processes that are highly dependent on the sequencing of flow conditions (i.e. floods and droughts). This study considers the representation of flow regimes when considering assessment of ecological outcomes, and in particular, the need to account for sequencing and variability of flow. We conducted two case studies - one in the largely unregulated Ovens River catchment and one in the highly regulated Murray River catchment (both located in south-eastern Australia) - to explore the importance of flow sequencing to the condition of a typical long-lived ecological asset in Australia, the River Red Gum forests. In the first, the Ovens River case study, the implications of representing climate change using different downscaling methods (annual scaling, monthly scaling, quantile mapping, and weather generator method) on the sequencing of flows and resulting ecological outcomes were considered. In the second, the Murray River catchment, sequencing within a historic drought period was considered by systematically making modest adjustments on an annual basis to the hydrological records. In both cases, the condition of River Red Gum forests was assessed using an ecological model that incorporates transitions between ecological conditions in response to sequences of required flow components. The results of both studies show the importance of considering how hydrological alterations are represented when assessing ecological outcomes. The Ovens case study showed that there is significant variation in the predicted ecological outcomes when different downscaling techniques are applied. Similarly, the analysis in the Murray case study showed that the drought as it historically occurred provided one of the best possible outcomes for River Red Gum forests when compared to other re-arrangements of flow within the same drought. These results have implications for the way we represent climate change impacts and drought risk assessments where ecological outcomes are a key management objective.
Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

PubMed Central

Glunčić, Matko; Paar, Vladimir

2013-01-01

The main feature of global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. The efficacy is due to the use of complete set of a K-string ensemble which enables a new method of direct mapping of symbolic DNA sequence into frequency domain, with straightforward identification of repeats as peaks in GRM diagram. In this way, we obtain very fast, efficient and highly automatized repeat finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use, in order to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of Period 3 pattern in exons, implementation of ‘magnifying glass’ effect, identification of complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. GRM algorithm is convenient for use, in particular, in cases of large repeat units, of highly mutated and/or complex repeats, and of global repeat maps for large genomic sequences (chromosomes and genomes). PMID:22977183
The Global Statistical Response of the Outer Radiation Belt During Geomagnetic Storms

NASA Astrophysics Data System (ADS)

Murphy, K. R.; Watt, C. E. J.; Mann, I. R.; Jonathan Rae, I.; Sibeck, D. G.; Boyd, A. J.; Forsyth, C. F.; Turner, D. L.; Claudepierre, S. G.; Baker, D. N.; Spence, H. E.; Reeves, G. D.; Blake, J. B.; Fennell, J.

2018-05-01

Using the total radiation belt electron content calculated from Van Allen Probe phase space density, the time-dependent and global response of the outer radiation belt during storms is statistically studied. Using phase space density reduces the impacts of adiabatic changes in the main phase, allowing a separation of adiabatic and nonadiabatic effects and revealing a clear modality and repeatable sequence of events in storm time radiation belt electron dynamics. This sequence exhibits an important first adiabatic invariant (μ)-dependent behavior in the seed (150 MeV/G), relativistic (1,000 MeV/G), and ultrarelativistic (4,000 MeV/G) populations. The outer radiation belt statistically shows an initial phase dominated by loss followed by a second phase of rapid acceleration, while the seed population shows little loss and immediate enhancement. The time sequence of the transition to the acceleration is also strongly μ dependent and occurs at low μ first, appearing to be repeatable from storm to storm.
Experimental investigation of measurement-induced disturbance and time symmetry in quantum physics

NASA Astrophysics Data System (ADS)

Curic, D.; Richardson, M. C.; Thekkadath, G. S.; Flórez, J.; Giner, L.; Lundeen, J. S.

2018-04-01

Unlike regular time evolution governed by the Schrödinger equation, standard quantum measurement appears to violate time-reversal symmetry. Measurement creates random disturbances (e.g., collapse) that prevent back-tracing the quantum state of the system. The effect of these disturbances is explicit in the results of subsequent measurements. In this way, the joint result of sequences of measurements depends on the order in time in which those measurements are performed. One might expect that if the disturbance could be eliminated this time-ordering dependence would vanish. Following a recent theoretical proposal [Bednorz, Franke, and Belzig, New J. Phys. 15, 023043 (2013), 10.1088/1367-2630/15/2/023043], we experimentally investigate this dependence for a kind of measurement that creates an arbitrarily small disturbance: weak measurement. We perform various sequences of a set of polarization weak measurements on photons. We experimentally demonstrate that, although the weak measurements are minimally disturbing, their time ordering affects the outcome of the measurement sequence for quantum systems.
Memory for tonal pitches: a music-length effect hypothesis.

PubMed

Akiva-Kabiri, Lilach; Vecchi, Tomaso; Granot, Roni; Basso, Demis; Schön, Daniele

2009-07-01

One of the most studied effects of verbal working memory (WM) is the influence of the length of the words that compose the list to be remembered. This work aims to investigate the nature of musical WM by replicating the word length effect in the musical domain. Length and rate of presentation were manipulated in a recognition task of tone sequences. Results showed significant effects for both factors (length and presentation rate) as well as their interaction, suggesting the existence of different strategies (e.g., chunking and rehearsal) for the immediate memory of musical information, depending upon the length of the sequences.
Microbial community analysis using MEGAN.

PubMed

Huson, Daniel H; Weber, Nico

2013-01-01

Metagenomics, the study of microbes in the environment using DNA sequencing, depends upon dedicated software tools for processing and analyzing very large sequencing datasets. One such tool is MEGAN (MEtaGenome ANalyzer), which can be used to interactively analyze and compare metagenomic and metatranscriptomic data, both taxonomically and functionally. To perform a taxonomic analysis, the program places the reads onto the NCBI taxonomy, while functional analysis is performed by mapping reads to the SEED, COG, and KEGG classifications. Samples can be compared taxonomically and functionally, using a wide range of different charting and visualization techniques. PCoA analysis and clustering methods allow high-level comparison of large numbers of samples. Different attributes of the samples can be captured and used within analysis. The program supports various input formats for loading data and can export analysis results in different text-based and graphical formats. The program is designed to work with very large samples containing many millions of reads. It is written in Java and installers for the three major computer operating systems are available from http://www-ab.informatik.uni-tuebingen.de. © 2013 Elsevier Inc. All rights reserved.
Implications of hydrologic variability on the succession of plants in Great Lakes wetlands

USGS Publications Warehouse

Wilcox, Douglas A.

2004-01-01

Primary succession of plant communities directed toward a climax is not a typical occurrence in wetlands because these ecological systems are inherently dependent on hydrology, and temporal hydrologic variability often causes reversals or setbacks in succession. Wetlands of the Great Lakes provide good examples for demonstrating the implications of hydrology in driving successional processes and for illustrating potential misinterpretations of apparent successional sequences. Most Great Lakes coastal wetlands follow cyclic patterns in which emergent communities are reduced in area or eliminated by high lake levels and then regenerated from the seed bank during low lake levels. Thus, succession never proceeds for long. Wetlands also develop in ridge and swale terrains in many large embayments of the Great Lakes. These formations contain sequences of wetlands of similar origin but different age that can be several thousand years old, with older wetlands always further from the lake. Analyses of plant communities across a sequence of wetlands at the south end of Lake Michigan showed an apparent successional pattern from submersed to floating to emergent plants as water depth decreased with wetland age. However, paleoecological analyses showed that the observed vegetation changes were driven largely by disturbances associated with increased human settlement in the area. Climate-induced hydrologic changes were also shown to have greater effects on plant-community change than autogenic processes. Other terms, such as zonation, maturation, fluctuations, continuum concept, functional guilds, centrifugal organization, pulse stability, and hump-back models provide additional means of describing organization and changes in vegetation; some of them overlap with succession in describing vegetation processes in Great Lakes wetlands, but each must be used in the proper context with regard to short- and long-term hydrologic variability.
The VMC Survey. XXVII. Young Stellar Structures in the LMC’s Bar Star-forming Complex

NASA Astrophysics Data System (ADS)

Sun, Ning-Chen; de Grijs, Richard; Subramanian, Smitha; Bekki, Kenji; Bell, Cameron P. M.; Cioni, Maria-Rosa L.; Ivanov, Valentin D.; Marconi, Marcella; Oliveira, Joana M.; Piatti, Andrés E.; Ripepi, Vincenzo; Rubele, Stefano; Tatton, Ben L.; van Loon, Jacco Th.

2017-11-01

Star formation is a hierarchical process, forming young stellar structures of star clusters, associations, and complexes over a wide range of scales. The star-forming complex in the bar region of the Large Magellanic Cloud is investigated with upper main-sequence stars observed by the VISTA Survey of the Magellanic Clouds. The upper main-sequence stars exhibit highly nonuniform distributions. Young stellar structures inside the complex are identified from the stellar density map as density enhancements of different significance levels. We find that these structures are hierarchically organized such that larger, lower-density structures contain one or several smaller, higher-density ones. They follow power-law size and mass distributions, as well as a lognormal surface density distribution. All these results support a scenario of hierarchical star formation regulated by turbulence. The temporal evolution of young stellar structures is explored by using subsamples of upper main-sequence stars with different magnitude and age ranges. While the youngest subsample, with a median age of log(τ/yr) = 7.2, contains the most substructure, progressively older ones are less and less substructured. The oldest subsample, with a median age of log(τ/yr) = 8.0, is almost indistinguishable from a uniform distribution on spatial scales of 30-300 pc, suggesting that the young stellar structures are completely dispersed on a timescale of ~100 Myr. These results are consistent with the characteristics of the 30 Doradus complex and the entire Large Magellanic Cloud, suggesting no significant environmental effects. We further point out that the fractal dimension may be method dependent for stellar samples with significant age spreads.
Impact of divalent metal ions on regulation of adenylyl cyclase isoforms by forskolin analogs.

PubMed

Erdorf, Miriam; Mou, Tung-Chung; Seifert, Roland

2011-12-01

Mammalian membranous adenylyl cyclases (mACs) play an important role in transmembrane signalling events in almost every cell and represent an interesting drug target. Forskolin (FS) is an invaluable research tool, activating AC isoforms 1-8. However, there is a paucity of AC isoform-selective FS analogs. Therefore, we examined the effects of FS and six FS derivatives on recombinant ACs 1, 2 and 5, representing members of different mAC families. Correlations of the pharmacological properties of the different AC isoforms revealed pronounced differences between ACs 1, 2 and 5. Additionally, potencies and efficacies of FS derivatives changed for any given AC isoform, depending on the metal ion, Mg(2+) or Mn(2+). The most striking effects of Mg(2+) and Mn(2+) on the diterpene profile were observed for AC2 where the large inhibitory effect of BODIPY-FS in the presence of Mg(2+) was considerably reduced in the presence of Mn(2+). Sequence alignment and docking experiments confirmed an exceptional position of AC2 compared to ACs 1 and 5 with respect to the structural environment of the catalytic core and cation-dependent diterpene effects. In conclusion, mAC isoforms 1, 2 and 5 exhibit a distinct pharmacological diterpene profile, depending on the divalent cation present. mAC crystal structures and modelling/docking studies provided an explanation for the pharmacological differences between the AC isoforms. Our study constitutes an important step towards the development of isoform-specific diterpenes exhibiting stimulatory or inhibitory effects. Copyright © 2011 Elsevier Inc. All rights reserved.
Protein family clustering for structural genomics.

PubMed

Yan, Yongpan; Moult, John

2005-10-28

A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Bench-marking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.

PubMed

Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan

2018-01-01

Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.
Quantum-Sequencing: Fast electronic single DNA molecule sequencing

NASA Astrophysics Data System (ADS)

Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

2014-03-01

A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of a unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Fang Qiliang; Herczeg, Gregory J.; Rizzuto, Aaron

Past estimates for the age of the Upper Sco Association are typically 11–13 Myr for intermediate-mass stars and 4–5 Myr for low-mass stars. In this study, we simulate populations of young stars to investigate whether this apparent dependence of estimated age on spectral type may be explained by the star formation history of the association. Solar and intermediate mass stars begin their pre-main sequence evolution on the Hayashi track, with fully convective interiors and cool photospheres. Intermediate-mass stars quickly heat up and transition onto the radiative Henyey track. As a consequence, for clusters in which star formation occurs on a timescale similar to that of the transition from a convective to a radiative interior, discrepancies in ages will arise when ages are calculated as a function of temperature instead of mass. Simple simulations of a cluster with constant star formation over several Myr may explain about half of the difference in inferred ages versus photospheric temperature; speculative constructions that consist of a constant star formation followed by a large supernova-driven burst could fully explain the differences, including those between F and G stars where evolutionary tracks may be more accurate. The age spreads of low-mass stars predicted from these prescriptions for star formation are consistent with the observed luminosity spread of Upper Sco. The conclusion that a lengthy star formation history will yield a temperature dependence in ages is expected from the basic physics of pre-main sequence evolution, and is qualitatively robust to the large uncertainties in pre-main sequence evolutionary models.
Biodiversity of fungi in hot desert sands.

PubMed

Murgia, Manuela; Fiamma, Maura; Barac, Aleksandra; Deligios, Massimo; Mazzarello, Vittorio; Paglietti, Bianca; Cappuccinelli, Pietro; Al-Qahtani, Ahmed; Squartini, Andrea; Rubino, Salvatore; Al-Ahdal, Mohammed N

2018-03-05

The fungal community of six sand samples from Saudi Arabia and Jordan deserts was characterized by culture-independent analysis via next generation sequencing of the 18S rRNA genes and by culture-dependent methods followed by sequencing of the internal transcribed spacer (ITS) region. 18S sequencing identified from 163 to 507 OTUs per sample, with a percentage of fungi ranging from 3.5% to 82.7%. The identified fungal phyla were Ascomycota, Basal fungi, and Basidiomycota and the most abundant detected classes were Dothideomycetes, Pezizomycetes, and Sordariomycetes. A total of 11 colonies of filamentous fungi were isolated and cultured from six samples, and the ITS sequencing pointed toward five different species of the class Sordariomycetes, belonging to genera Fusarium (F. redolens, F. solani, F. equiseti), Chaetomium (C. madrasense), and Albifimbria (A. terrestris). The results of this study show an unexpectedly large fungal biodiversity in the Middle East desert sand and its possible role and implications for human health. © 2018 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

Transient effects in π-pulse sequences in MAS solid-state NMR

NASA Astrophysics Data System (ADS)

Hellwagner, Johannes; Wili, Nino; Ibáñez, Luis Fábregas; Wittmann, Johannes J.; Meier, Beat H.; Ernst, Matthias

2018-02-01

Dipolar recoupling techniques that use isolated rotor-synchronized π pulses are commonly used in solid-state NMR spectroscopy to gain insight into the structure of biological molecules. These sequences excel through their simplicity, stability towards radio-frequency (rf) inhomogeneity, and low rf requirements. For a theoretical understanding of such sequences, we present a Floquet treatment based on an interaction-frame transformation including the chemical-shift offset dependence. This approach is applied to the homonuclear dipolar-recoupling sequence Radio-Frequency Driven Recoupling (RFDR) and the heteronuclear recoupling sequence Rotational Echo Double Resonance (REDOR). Based on the Floquet approach, we show the influence of effective fields caused by pulse transients and discuss the advantages of pulse-transient compensation. We demonstrate experimentally that the transfer efficiency for homonuclear recoupling can be doubled in some cases in model compounds as well as in simple peptides if pulse-transient compensation is applied to the π pulses. Additionally, we discuss the influence of various phase cycles on the recoupling efficiency in order to reduce the magnitude of effective fields. Based on the findings from RFDR, we are able to explain why the REDOR sequence does not suffer in the recoupling efficiency despite the presence of effective fields.
Mitogenic effect contributes to increased virulence of Streptococcus suis sequence type 7 to cause streptococcal toxic shock-like syndrome.

PubMed

Zheng, H; Ye, C; Segura, M; Gottschalk, M; Xu, J

2008-09-01

Streptococcus suis serotype 2 sequence type 7 strains emerged in 1996 and caused a streptococcal toxic shock-like syndrome in 1998 and 2005 in China. Evidence indicated that the virulence of S. suis sequence type 7 had increased, but the mechanism was unknown. The sequence type 7 strain SC84, isolated from a patient with streptococcal toxic shock-like syndrome during the Sichuan outbreak, and the sequence type 1 strain 31533, a typical highly pathogenic strain isolated from a diseased pig, were used in comparative studies. In this study we show the mechanisms underlying cytokine production differed between the two types of strains. The S. suis sequence type 7 strain SC84 possesses a stronger capacity to stimulate T cells, naive T cells and peripheral blood mononuclear cell proliferation than does S. suis sequence type 1 strain 31533. The T cell response to both strains was dependent upon the presence of antigen-presenting cells. Histo-incompatible antigen-presenting cells were sufficient to provide the accessory signals to naive T cell stimulated by the two strains, indicating that both sequence type 7 and 1 strains possess mitogens; however, the mitogenic effect was different. Therefore, we propose that the difference in the mitogenic effect of sequence type 7 strain SC84 compared with the sequence type 1 strain 31533 of S. suis may be associated with the clinical, epidemiological and microbiological difference, where the ST 7 strains have a larger mitogenic effect.
Mitogenic effect contributes to increased virulence of Streptococcus suis sequence type 7 to cause streptococcal toxic shock-like syndrome

PubMed Central

Zheng, H; Ye, C; Segura, M; Gottschalk, M; Xu, J

2008-01-01

Streptococcus suis serotype 2 sequence type 7 strains emerged in 1996 and caused a streptococcal toxic shock-like syndrome in 1998 and 2005 in China. Evidence indicated that the virulence of S. suis sequence type 7 had increased, but the mechanism was unknown. The sequence type 7 strain SC84, isolated from a patient with streptococcal toxic shock-like syndrome during the Sichuan outbreak, and the sequence type 1 strain 31533, a typical highly pathogenic strain isolated from a diseased pig, were used in comparative studies. In this study we show the mechanisms underlying cytokine production differed between the two types of strains. The S. suis sequence type 7 strain SC84 possesses a stronger capacity to stimulate T cells, naive T cells and peripheral blood mononuclear cell proliferation than does S. suis sequence type 1 strain 31533. The T cell response to both strains was dependent upon the presence of antigen-presenting cells. Histo-incompatible antigen-presenting cells were sufficient to provide the accessory signals to naive T cell stimulated by the two strains, indicating that both sequence type 7 and 1 strains possess mitogens; however, the mitogenic effect was different. Therefore, we propose that the difference in the mitogenic effect of sequence type 7 strain SC84 compared with the sequence type 1 strain 31533 of S. suis may be associated with the clinical, epidemiological and microbiological difference, where the ST 7 strains have a larger mitogenic effect. PMID:18803762
Inducing protein aggregation by extensional flow

PubMed Central

Dobson, John; Kumar, Amit; Willis, Leon F.; Tuma, Roman; Higazi, Daniel R.; Turner, Richard; Lowe, David C.; Ashcroft, Alison E.; Radford, Sheena E.; Kapur, Nikil

2017-01-01

Relative to other extrinsic factors, the effects of hydrodynamic flow fields on protein stability and conformation remain poorly understood. Flow-induced protein remodeling and/or aggregation is observed both in Nature and during the large-scale industrial manufacture of proteins. Despite its ubiquity, the relationships between the type and magnitude of hydrodynamic flow, a protein’s structure and stability, and the resultant aggregation propensity are unclear. Here, we assess the effects of a defined and quantified flow field dominated by extensional flow on the aggregation of BSA, β2-microglobulin (β2m), granulocyte colony stimulating factor (G-CSF), and three monoclonal antibodies (mAbs). We show that the device induces protein aggregation after exposure to an extensional flow field for 0.36–1.8 ms, at concentrations as low as 0.5 mg mL−1. In addition, we reveal that the extent of aggregation depends on the applied strain rate and the concentration, structural scaffold, and sequence of the protein. Finally we demonstrate the in situ labeling of a buried cysteine residue in BSA during extensional stress. Together, these data indicate that an extensional flow readily unfolds thermodynamically and kinetically stable proteins, exposing previously sequestered sequences whose aggregation propensity determines the probability and extent of aggregation. PMID:28416674
On the Accuracy of Atmospheric Parameter Determination in BAFGK Stars

NASA Astrophysics Data System (ADS)

Ryabchikova, T.; Piskunov, N.; Shulyak, D.

2015-04-01

During the past few years, many papers determining the atmospheric parameters in FGK stars appeared in the literature where the accuracy of effective temperatures is given as 20-40 K. For main sequence stars within the 5 000-13 000 K temperature range, we have performed a comparative analysis of the parameters derived from the spectra by using the SME (Spectroscopy Made Easy) package and those found in the literature. Our sample includes standard stars Sirius, Procyon, δ Eri, and the Sun. Combining different spectral regions in the fitting procedure, we investigated the effect that different atomic species have on the derived atmospheric parameters. The temperature difference may exceed 100 K depending on the spectral regions used in the SME procedure. It is shown that the atmospheric parameters derived with the SME procedure that includes the wings of hydrogen lines in the fit agree better with the results derived by other methods and tools across a large part of the main sequence. For three stars (π Cet, 21 Peg, and Procyon), the atmospheric parameters were also derived by fitting a calculated energy distribution to the observed one. We found a substantial difference in the parameters inferred from different sets and combinations of spectrophotometric observations. An intercomparison of our results and literature data shows that the average accuracy of effective temperature determination for cool stars and for the early B-stars is 70-85 K and 170-200 K, respectively.
Content Is King: Databases Preserve the Collective Information of Science.

PubMed

Yates, John R

2018-04-01

Databases store sequence information experimentally gathered to create resources that further science. In the last 20 years databases have become critical components of fields like proteomics where they provide the basis for large-scale and high-throughput proteomic informatics. Amos Bairoch, winner of the Association of Biomolecular Resource Facilities Frederick Sanger Award, has created some of the important databases proteomic research depends upon for accurate interpretation of data.
Biophysics of protein-DNA interactions and chromosome organization

PubMed Central

Marko, John F.

2014-01-01

The function of DNA in cells depends on its interactions with protein molecules, which recognize and act on base sequence patterns along the double helix. These notes aim to introduce basic polymer physics of DNA molecules, biophysics of protein-DNA interactions and their study in single-DNA experiments, and some aspects of large-scale chromosome structure. Mechanisms for control of chromosome topology will also be discussed. PMID:25419039
Unraveling a molecular determinant for clathrin-independent internalization of the M2 muscarinic acetylcholine receptor

PubMed Central

Wan, Min; Zhang, Wenhua; Tian, Yangli; Xu, Chanjuan; Xu, Tao; Liu, Jianfeng; Zhang, Rongying

2015-01-01

Endocytosis and postendocytic sorting of G-protein-coupled receptors (GPCRs) is important for the regulation of both their cell surface density and signaling profile. Unlike the mechanisms of clathrin-dependent endocytosis (CDE), the mechanisms underlying the control of GPCR signaling by clathrin-independent endocytosis (CIE) remain largely unknown. Among the muscarinic acetylcholine receptors (mAChRs), the M4 mAChR undergoes CDE and recycling, whereas the M2 mAChR is internalized through CIE and targeted to lysosomes. Here we investigated the endocytosis and postendocytic trafficking of M2 mAChR based on a comparative analysis of the third cytoplasmic domain in M2 and M4 mAChRs. For the first time, we identified that the sequence 374KKKPPPS380 serves as a sorting signal for the clathrin-independent internalization of M2 mAChR. Switching 374KKKPPPS380 to the i3 loop of the M4 mAChR shifted the receptor into lysosomes through the CIE pathway, and therefore away from CDE and recycling. We also found another previously unidentified sequence that guides CDE of the M2 mAChR, 361VARKIVKMTKQPA373, which is normally masked in the presence of the downstream sequence 374KKKPPPS380. Taken together, our data indicate that endocytosis and postendocytic sorting of GPCRs that undergo CIE could be sequence-dependent. PMID:26094760
[Cytotoxicity of chimera peptides incorporating sequences of cyclin kinases inhibitors].

PubMed

Kharchenko, V P; Kulinich, V G; Lunin, V G; Filiasova, E I; Shishkin, A M; Sergeenko, O V; Riazanova, E M; Voronina, O L; Bozhenko, V K

2007-01-01

The study is concerned with proapoptotic properties of chimera peptides which incorporate sequences of inhibitors of cyclin kinases p16INK4a and p21CIP/WAF1 as well as internalized sequences (Antp and tat). Sequences of the p16 type appeared to be more cytotoxic than the p21 one. Cytotoxic effect proved dependent on orientation with respect to the C or N terminal point of a polypeptide chain rather than on chimera sequence extent. Although p16 endogenous synthesis did not influence chimera peptide levels, apoptosis did not take place in certain cellular lines. Due to the rather unsophisticated nature of such synthesis, it might be used in designing individually-tailored chemotherapeutic drugs.
HLA mismatches and hematopoietic cell transplantation: structural simulations assess the impact of changes in peptide binding specificity on transplant outcome

PubMed Central

Yanover, Chen; Petersdorf, Effie W.; Malkki, Mari; Gooley, Ted; Spellman, Stephen; Velardi, Andrea; Bardy, Peter; Madrigal, Alejandro; Bignon, Jean-Denis; Bradley, Philip

2013-01-01

The success of hematopoietic cell transplantation from an unrelated donor depends in part on the degree of Human Histocompatibility Leukocyte Antigen (HLA) matching between donor and patient. We present a structure-based analysis of HLA mismatching, focusing on individual amino acid mismatches and their effect on peptide binding specificity. Using molecular modeling simulations of HLA-peptide interactions, we find evidence that amino acid mismatches predicted to perturb peptide binding specificity are associated with higher risk of mortality in a large and diverse dataset of patient-donor pairs assembled by the International Histocompatibility Working Group in Hematopoietic Cell Transplantation consortium. This analysis may represent a first step toward sequence-based prediction of relative risk for HLA allele mismatches. PMID:24482668
Dynamically corrected gates for singlet-triplet spin qubits with control-dependent errors

NASA Astrophysics Data System (ADS)

Jacobson, N. Tobias; Witzel, Wayne M.; Nielsen, Erik; Carroll, Malcolm S.

2013-03-01

Magnetic field inhomogeneity due to random polarization of quasi-static local magnetic impurities is a major source of environmentally induced error for singlet-triplet double quantum dot (DQD) spin qubits. Moreover, for singlet-triplet qubits this error may depend on the applied controls. This effect is significant when a static magnetic field gradient is applied to enable full qubit control. Through a configuration interaction analysis, we observe that the dependence of the field inhomogeneity-induced error on the DQD bias voltage can vary systematically as a function of the controls for certain experimentally relevant operating regimes. To account for this effect, we have developed a straightforward prescription for adapting dynamically corrected gate sequences that assume control-independent errors into sequences that compensate for systematic control-dependent errors. We show that accounting for such errors may lead to a substantial increase in gate fidelities. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. DOE's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Double-blind cross-over investigation of the effectiveness and safety of two doses of indoprofen compared with an ASA preparation and placebo in patients suffering from osteoarthritis.

PubMed

Valtonen, E; Bergamini, N; Groppi, W; Mandelli, V

1981-01-01

Eighty patients suffering from osteoarthritis of the large joints were admitted to the study and randomly allocated to a 4-treatment sequence, according to a multiple replication of a 4 x 4 Latin square design, with proper balancing of treatments, of periods and of the residual effects of drugs. Each treatment (indoprofen 300 or 600 mg/day, ASA 1500 + diazepam 6 mg/day, and matching placebo) was administered for 7 days. Examinations were carried out on admission, after a 3-4 day wash-out period, and then repeated at the end of each treatment period. Treatment with active drugs was significantly better than placebo in relieving overall pain, and in patient's and investigator's opinion on effectiveness. Treatment with indoprofen, at both dosages, was preferred more frequently than others. The incidence of adverse events during each period did not seem to depend either on the treatment being given during that period or on the previous one.
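The allocation described in this abstract can be illustrated with a short sketch. The abstract does not say which 4 x 4 Latin square was used; the Williams construction below is an assumption, chosen because it balances treatments, periods, and first-order residual (carry-over) effects at the same time. The function names are hypothetical, and the figure of 20 patients per sequence simply follows from dividing 80 patients over 4 sequences.

```python
from collections import Counter
from itertools import permutations

# Treatment arms as named in the abstract.
TREATMENTS = ["indoprofen 300", "indoprofen 600", "ASA + diazepam", "placebo"]


def williams_square(n):
    """Return an n x n Williams Latin square (n even) as lists of treatment indices."""
    if n % 2:
        raise ValueError("this simple construction needs an even number of treatments")
    # First row interleaves low and high indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every entry of the first row by a constant (mod n).
    return [[(t + shift) % n for t in first] for shift in range(n)]


def carryover_counts(square):
    """Count how often each treatment is immediately followed by each other one."""
    return Counter(
        (row[i], row[i + 1]) for row in square for i in range(len(row) - 1)
    )


square = williams_square(4)
counts = carryover_counts(square)

# Balanced for residual effects: every ordered pair of distinct treatments
# occurs exactly once as (previous period, current period).
balanced = all(counts[pair] == 1 for pair in permutations(range(4), 2))

# 80 patients over 4 sequences gives 20 replications of the square.
patients_per_sequence = 80 // len(square)

for row in square:
    print(" -> ".join(TREATMENTS[t] for t in row))
```

Each printed line is one treatment sequence (rows are sequences, columns are the four 7-day periods). Because every treatment follows every other treatment equally often, a residual effect of the previous drug can be separated from the direct effect of the current one.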
DArT Markers Effectively Target Gene Space in the Rye Genome

PubMed Central

Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

2016-01-01

Large genome size and complexity hamper considerably the genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. Obtained data demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information, that will complement the results of the studies, where DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes. PMID:27833625
DArT Markers Effectively Target Gene Space in the Rye Genome.

PubMed

Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

2016-01-01

Large genome size and complexity hamper considerably the genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. Obtained data demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information, that will complement the results of the studies, where DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes.
Association mining of dependency between time series

NASA Astrophysics Data System (ADS)

Hafez, Alaaeldin

2001-03-01

Time series analysis is considered a crucial component of strategic control over a broad variety of disciplines in business, science and engineering. Time series data is a sequence of observations collected over intervals of time. Each time series describes a phenomenon as a function of time. Analysis of time series data includes discovering trends (or patterns) in a time series sequence. In the last few years, data mining has emerged and been recognized as a new technology for data analysis. Data mining is the process of discovering potentially valuable patterns, associations, trends, sequences and dependencies in data. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. In this paper, we adapt and innovate data mining techniques to analyze time series data. By using data mining techniques, maximal frequent patterns are discovered and used in predicting future sequences or trends, where trends describe the behavior of a sequence. In order to include different types of time series (e.g. irregular and non-systematic), we consider past frequent patterns of the same time sequences (local patterns) and of other dependent time sequences (global patterns). We use the word 'dependent' instead of the word 'similar' for emphasis on real life time series where two time series sequences could be completely different (in values, shapes, etc.), but they still react to the same conditions in a dependent way. In this paper, we propose the Dependence Mining Technique that could be used in predicting time series sequences. The proposed technique consists of three phases: (a) for all time series sequences, generate their trend sequences, (b) discover maximal frequent trend patterns and generate pattern vectors (to keep information of frequent trend patterns), and (c) use trend pattern vectors to predict future time series sequences.
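The three phases named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: the U/D/S trend alphabet, the use of contiguous substrings as patterns, the support threshold, and all function names are hypothetical, and only local patterns (from the same series) are handled, not the global patterns from dependent series.

```python
from collections import Counter


def trend_sequence(values, tolerance=0.0):
    """Phase (a): map consecutive differences to 'U' (up), 'D' (down), 'S' (steady)."""
    symbols = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > tolerance:
            symbols.append("U")
        elif delta < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def frequent_patterns(trends, min_support, max_len):
    """Count contiguous trend patterns and keep those seen at least min_support times."""
    counts = Counter(
        trends[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(trends) - n + 1)
    )
    return {p: c for p, c in counts.items() if c >= min_support}


def maximal_patterns(frequent):
    """Phase (b): keep frequent patterns not contained in a longer frequent pattern."""
    return {
        p: c for p, c in frequent.items()
        if not any(p != q and p in q for q in frequent)
    }


def predict_next(trends, frequent):
    """Phase (c), simplified: extend the longest frequent pattern matching the recent trend."""
    for k in range(len(trends), 0, -1):
        suffix = trends[-k:]
        # Frequent patterns one symbol longer than the suffix that begin with it.
        candidates = {
            p[-1]: c for p, c in frequent.items()
            if len(p) == k + 1 and p.startswith(suffix)
        }
        if candidates:
            return max(candidates, key=candidates.get)
    return None


series = [1, 2, 3, 2, 3, 4, 3, 4, 5, 4]
trends = trend_sequence(series)
frequent = frequent_patterns(trends, min_support=2, max_len=4)
maximal = maximal_patterns(frequent)
next_trend = predict_next(trends, frequent)
print(trends, sorted(maximal), next_trend)
```

For this toy series the trend string is a repeating up-up-down motif, so the predicted next trend is an upward move. Working on trend symbols rather than raw values is what lets two series with very different values still share patterns.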
Taxonomic structure and stability of the bacterial community in belgian sourdough ecosystems as assessed by culture and population fingerprinting.

PubMed

Scheirlinck, Ilse; Van der Meulen, Roel; Van Schoor, Ann; Vancanneyt, Marc; De Vuyst, Luc; Vandamme, Peter; Huys, Geert

2008-04-01

A total of 39 traditional sourdoughs were sampled at 11 bakeries located throughout Belgium which were visited twice with a 1-year interval. The taxonomic structure and stability of the bacterial communities occurring in these traditional sourdoughs were assessed using both culture-dependent and culture-independent methods. A total of 1,194 potential lactic acid bacterium (LAB) isolates were tentatively grouped and identified by repetitive element sequence-based PCR, followed by sequence-based identification using 16S rRNA and pheS genes from a selection of genotypically unique LAB isolates. In parallel, all samples were analyzed by denaturing gradient gel electrophoresis (DGGE) of V3-16S rRNA gene amplicons. In addition, extensive metabolite target analysis of more than 100 different compounds was performed. Both culturing and DGGE analysis showed that the species Lactobacillus sanfranciscensis, Lactobacillus paralimentarius, Lactobacillus plantarum, and Lactobacillus pontis dominated the LAB population of Belgian type I sourdoughs. In addition, DGGE band sequence analysis demonstrated the presence of Acetobacter sp. and a member of the Erwinia/Enterobacter/Pantoea group in some samples. Overall, the culture-dependent and culture-independent approaches each exhibited intrinsic limitations in assessing bacterial LAB diversity in Belgian sourdoughs. Irrespective of the LAB biodiversity, a large majority of the sugar and amino acid metabolites were detected in all sourdough samples. Principal component-based analysis of biodiversity and metabolic data revealed only little variation among the two samples of the sourdoughs produced at the same bakery. The rare cases of instability observed could generally be linked with variations in technological parameters or differences in detection capacity between culture-dependent and culture-independent approaches. 
Within a sampling interval of 1 year, this study reinforces previous observations that the bakery environment rather than the type or batch of flour largely determines the development of a stable LAB population in sourdoughs.
Taxonomic Structure and Stability of the Bacterial Community in Belgian Sourdough Ecosystems as Assessed by Culture and Population Fingerprinting

PubMed Central

Scheirlinck, Ilse; Van der Meulen, Roel; Van Schoor, Ann; Vancanneyt, Marc; De Vuyst, Luc; Vandamme, Peter; Huys, Geert

2008-01-01

A total of 39 traditional sourdoughs were sampled at 11 bakeries located throughout Belgium which were visited twice with a 1-year interval. The taxonomic structure and stability of the bacterial communities occurring in these traditional sourdoughs were assessed using both culture-dependent and culture-independent methods. A total of 1,194 potential lactic acid bacterium (LAB) isolates were tentatively grouped and identified by repetitive element sequence-based PCR, followed by sequence-based identification using 16S rRNA and pheS genes from a selection of genotypically unique LAB isolates. In parallel, all samples were analyzed by denaturing gradient gel electrophoresis (DGGE) of V3-16S rRNA gene amplicons. In addition, extensive metabolite target analysis of more than 100 different compounds was performed. Both culturing and DGGE analysis showed that the species Lactobacillus sanfranciscensis, Lactobacillus paralimentarius, Lactobacillus plantarum, and Lactobacillus pontis dominated the LAB population of Belgian type I sourdoughs. In addition, DGGE band sequence analysis demonstrated the presence of Acetobacter sp. and a member of the Erwinia/Enterobacter/Pantoea group in some samples. Overall, the culture-dependent and culture-independent approaches each exhibited intrinsic limitations in assessing bacterial LAB diversity in Belgian sourdoughs. Irrespective of the LAB biodiversity, a large majority of the sugar and amino acid metabolites were detected in all sourdough samples. Principal component-based analysis of biodiversity and metabolic data revealed only little variation among the two samples of the sourdoughs produced at the same bakery. The rare cases of instability observed could generally be linked with variations in technological parameters or differences in detection capacity between culture-dependent and culture-independent approaches. 
Within a sampling interval of 1 year, this study reinforces previous observations that the bakery environment rather than the type or batch of flour largely determines the development of a stable LAB population in sourdoughs. PMID:18310426
CD4-dependent characteristics of coreceptor use and HIV type 1 V3 sequence in a large population of therapy-naive individuals.

PubMed

Low, Andrew J; Marchant, David; Brumme, Chanson J; Brumme, Zabrina L; Dong, Winnie; Sing, Tobias; Hogg, Robert S; Montaner, Julio S G; Gill, Vikram; Cheung, Peter K; Harrigan, P Richard

2008-02-01

We investigated the associations between coreceptor use, V3 loop sequence, and CD4 count in a cross-sectional analysis of a large cohort of chronically HIV-infected, treatment-naive patients. HIV coreceptor usage was determined in the last pretherapy plasma sample for 977 individuals initiating HAART in British Columbia, Canada using the Monogram Trofile Tropism assay. Relative light unit (RLU) readouts from the Trofile assay, as well as HIV V3 loop sequence data, were examined as a function of baseline CD4 cell count for 953 (97%) samples with both phenotype and genotype data available. Median CCR5 RLUs were high for both R5 and X4-capable samples, while CXCR4 RLUs were orders of magnitude lower for X4 samples (p < 0.001). CCR5 RLUs in R5 samples (N = 799) increased with decreasing CD4 count (p < 0.001), but did not vary with plasma viral load (pVL) (p = 0.74). In X4 samples (N = 178), CCR5 RLUs decreased with decreasing CD4 count (p = 0.046) and decreasing pVL (p = 0.097), while CXCR4 RLUs increased with decreasing pVL (p = 0.0008) but did not vary with CD4 (p = 0.96). RLUs varied with the presence of substitutions at V3 loop positions 11, 25, and 6-8. The prevalence and impact of substitutions at codons 25 and 6-8 were CD4 dependent as was the presence of amino acid mixtures in the V3; substitutions at position 11 were CD4 independent. Assay RLU measures predictably vary with both immunological and virological parameters. The ability to predict X4 virus using genotypic determinants at positions 25 and 6-8 of the V3 loop is CD4 dependent, while position 11 appears to be CD4 independent.
Daytime Sleep Enhances Consolidation of the Spatial but Not Motoric Representation of Motor Sequence Memory

PubMed Central

Albouy, Geneviève; Fogel, Stuart; Pottiez, Hugo; Nguyen, Vo An; Ray, Laura; Lungu, Ovidiu; Carrier, Julie; Robertson, Edwin; Doyon, Julien

2013-01-01

Motor sequence learning is known to rely on more than a single process. As the skill develops with practice, two different representations of the sequence are formed: a goal representation built under spatial allocentric coordinates and a movement representation mediated through egocentric motor coordinates. This study aimed to explore the influence of daytime sleep (nap) on consolidation of these two representations. Through the manipulation of an explicit finger sequence learning task and a transfer protocol, we show that both allocentric (spatial) and egocentric (motor) representations of the sequence can be isolated after initial training. Our results also demonstrate that nap favors the emergence of offline gains in performance for the allocentric, but not the egocentric representation, even after accounting for fatigue effects. Furthermore, sleep-dependent gains in performance observed for the allocentric representation are correlated with spindle density during non-rapid eye movement (NREM) sleep of the post-training nap. In contrast, performance on the egocentric representation is only maintained, but not improved, regardless of the sleep/wake condition. These results suggest that motor sequence memory acquisition and consolidation involve distinct mechanisms that rely on sleep (and specifically, spindle) or simple passage of time, depending respectively on whether the sequence is performed under allocentric or egocentric coordinates. PMID:23300993
FOUNTAIN: A JAVA open-source package to assist large sequencing projects

PubMed Central

Buerstedde, Jean-Marie; Prill, Florian

2001-01-01

Background Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. FOUNTAIN is open source and largely platform and database independent, and we hope it will be improved and extended in a community effort. PMID:11591214

A computer aided thermodynamic approach for predicting the formation of Z-DNA in naturally occurring sequences

NASA Technical Reports Server (NTRS)

Ho, P. S.; Ellison, M. J.; Quigley, G. J.; Rich, A.

1986-01-01

The ease with which a particular DNA segment adopts the left-handed Z-conformation depends largely on the sequence and on the degree of negative supercoiling to which it is subjected. We describe a computer program (Z-hunt) that is designed to search long sequences of naturally occurring DNA and retrieve those nucleotide combinations of up to 24 bp in length which show a strong propensity for Z-DNA formation. Incorporated into Z-hunt is a statistical mechanical model based on empirically determined energetic parameters for the B to Z transition accumulated to date. The Z-forming potential of a sequence is assessed by ranking its behavior as a function of negative superhelicity relative to the behavior of similar sized randomly generated nucleotide sequences assembled from over 80,000 combinations. The program makes it possible to compare directly the Z-forming potential of sequences with different base compositions and different sequence lengths. Using Z-hunt, we have analyzed the DNA sequences of the bacteriophage phi X174, plasmid pBR322, the animal virus SV40 and the replicative form of the eukaryotic adenovirus-2. The results are compared with those previously obtained by others from experiments designed to locate Z-DNA forming regions in these sequences using probes which show specificity for the left-handed DNA conformation.
Individual sequences in large sets of gene sequences may be distinguished efficiently by combinations of shared sub-sequences

PubMed Central

Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J

2005-01-01

Background Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134
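
The key-like use of shared sub-sequences described in the abstract above can be illustrated with a small sketch. This is not the authors' method, which the abstract does not spell out. It is a simple greedy heuristic, with hypothetical function names, that repeatedly picks the k-mer whose presence/absence separates the most still-indistinguishable pairs of sequences. It does not guarantee a minimal set.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k sub-sequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_distinguishing_kmers(seqs, k):
    """Pick k-mers whose presence/absence patterns tell the sequences apart.

    seqs maps a name to a sequence string. This is an illustrative greedy
    heuristic (an assumption, not the published algorithm).
    """
    present = {name: kmers(s, k) for name, s in seqs.items()}
    candidates = sorted(set().union(*present.values()))
    # Every pair of sequences starts out indistinguishable.
    unresolved = set(combinations(sorted(seqs), 2))
    chosen = []

    def gain(kmer, pairs):
        # Number of pairs in which exactly one member contains the k-mer.
        return sum((kmer in present[a]) != (kmer in present[b]) for a, b in pairs)

    while unresolved and candidates:
        best = max(candidates, key=lambda km: gain(km, unresolved))
        if gain(best, unresolved) == 0:
            break  # remaining pairs cannot be separated at this k
        chosen.append(best)
        unresolved = {
            (a, b) for a, b in unresolved
            if (best in present[a]) == (best in present[b])
        }
    return chosen


toy = {"a": "AAAA", "b": "AAAT", "c": "TTTT", "d": "TTTA"}
probes = greedy_distinguishing_kmers(toy, 2)
signatures = {name: tuple(p in kmers(s, 2) for p in probes) for name, s in toy.items()}
```

Each entry of `signatures` plays the role of a path through a dichotomous key: a sequence is identified by which of the chosen sub-sequences it contains.
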
Long-range barcode labeling-sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Feng; Zhang, Tao; Singh, Kanwar K.

Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.
Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.

PubMed

Xu, Weijia; Ozer, Stuart; Gutell, Robin R

2009-01-01

With an increasingly large amount of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large scale alignment and less effective with the sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.
Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA

PubMed Central

Xu, Weijia; Ozer, Stuart; Gutell, Robin R.

2010-01-01

With an increasingly large amount of sequences properly aligned, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large scale alignment and less effective with the sequences from diversified phylogenetic classifications. We propose a new approach that utilizes coevolutional rates among pairs of nucleotide positions using phylogenetic and evolutionary relationships of the organisms of aligned sequences. With a novel data schema to manage relevant information within a relational database, our method, implemented with a Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times bigger and 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534
De novo assembly of human genomes with massively parallel short read sequencing.

PubMed

Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

2010-02-01

Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
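
The N50 figures quoted in the abstract above follow the standard assembly statistic: the length L such that contigs (or scaffolds) of length at least L contain at least half of the assembled bases. A minimal sketch of that definition follows. The function name and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # made-up lengths, arbitrary units
toy_n50 = n50(toy_contigs)
```
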
Dfam: a database of repetitive DNA based on profile hidden Markov models.

PubMed

Wheeler, Travis J; Clements, Jody; Eddy, Sean R; Hubley, Robert; Jones, Thomas A; Jurka, Jerzy; Smit, Arian F A; Finn, Robert D

2013-01-01

We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
Mapping stellar content to dark matter haloes - III. Environmental dependence and conformity of galaxy colours

NASA Astrophysics Data System (ADS)

Zu, Ying; Mandelbaum, Rachel

2018-05-01

Recent studies suggest that the quenching properties of galaxies are correlated over several megaparsecs. The large-scale 'galactic conformity' phenomenon around central galaxies has been regarded as a potential signature of 'galaxy assembly bias' or 'pre-heating', both of which interpret conformity as a result of direct environmental effects acting on galaxy formation. Building on the iHOD halo quenching framework developed in Zu and Mandelbaum, we discover that our fiducial halo mass quenching model, without any galaxy assembly bias, can successfully explain the overall environmental dependence and the conformity of galaxy colours in Sloan Digital Sky Survey, as measured by the mark correlation functions of galaxy colours and the red galaxy fractions around isolated primaries, respectively. Our fiducial iHOD halo quenching mock also correctly predicts the differences in the spatial clustering and galaxy-galaxy lensing signals between the more versus less red galaxy subsamples, split by the red-sequence ridge line at fixed stellar mass. Meanwhile, models that tie galaxy colours fully or partially to halo assembly bias have difficulties in matching all these observables simultaneously. Therefore, we demonstrate that the observed environmental dependence of galaxy colours can be naturally explained by the combination of (1) halo quenching and (2) the variation of halo mass function with environment - an indirect environmental effect mediated by two separate physical processes.
Quantitative PCR high-resolution melting (qPCR-HRM) curve analysis, a new approach to simultaneously screen point mutations and large rearrangements: application to MLH1 germline mutations in Lynch syndrome.

PubMed

Rouleau, Etienne; Lefol, Cédrick; Bourdon, Violaine; Coulet, Florence; Noguchi, Tetsuro; Soubrier, Florent; Bièche, Ivan; Olschwang, Sylviane; Sobol, Hagay; Lidereau, Rosette

2009-06-01

Several techniques have been developed to screen mismatch repair (MMR) genes for deleterious mutations. Until now, two different techniques were required to screen for both point mutations and large rearrangements. For the first time, we propose a new approach, called "quantitative PCR (qPCR) high-resolution melting (HRM) curve analysis (qPCR-HRM)," which combines qPCR and HRM to obtain a rapid and cost-effective method suitable for testing a large series of samples. We designed PCR amplicons to scan the MLH1 gene using qPCR HRM. Seventy-six patients were fully scanned in replicate, including 14 wild-type patients and 62 patients with known mutations (57 point mutations and five rearrangements). To validate the detected mutations, we used sequencing and/or hybridization on a dedicated MLH1 array-comparative genomic hybridization (array-CGH). All point mutations and rearrangements detected by denaturing high-performance liquid chromatography (dHPLC)+multiplex ligation-dependent probe amplification (MLPA) were successfully detected by qPCR HRM. Three large rearrangements were characterized with the dedicated MLH1 array-CGH. One variant was detected with qPCR HRM in a wild-type patient and was located within the reverse primer. One variant was not detected with qPCR HRM or with dHPLC due to its proximity to a T-stretch. With qPCR HRM, prescreening for point mutations and large rearrangements is performed in one tube and in one step with a single machine, without the need for any automated sequencer in the prescreening process. In replicate, its reagent cost, sensitivity, and specificity are comparable to those of dHPLC+MLPA techniques. However, qPCR HRM outperformed the other techniques in terms of its rapidity and amount of data provided.
A Generative Angular Model of Protein Structure Evolution

PubMed Central

Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun

2017-01-01

Abstract Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724
Mitochondrial genome of the moon jelly Aurelia aurita (Cnidaria, Scyphozoa): A linear DNA molecule encoding a putative DNA-dependent DNA polymerase.

PubMed

Shao, Zhiyong; Graf, Shannon; Chaga, Oleg Y; Lavrov, Dennis V

2006-10-15

The 16,937-nucleotide sequence of the linear mitochondrial DNA (mtDNA) molecule of the moon jelly Aurelia aurita (Cnidaria, Scyphozoa) - the first mtDNA sequence from the class Scyphozoa and the first sequence of a linear mtDNA from Metazoa - has been determined. This sequence contains genes for 13 energy pathway proteins, small and large subunit rRNAs, and methionine and tryptophan tRNAs. In addition, two open reading frames of 324 and 969 base pairs in length have been found. The deduced amino-acid sequence of one of them, ORF969, displays extensive sequence similarity with the polymerase [but not the exonuclease] domain of family B DNA polymerases, and this ORF has been tentatively identified as dnab. This is the first report of dnab in animal mtDNA. The genes in A. aurita mtDNA are arranged in two clusters with opposite transcriptional polarities; transcription proceeding toward the ends of the molecule. The determined sequences at the ends of the molecule are nearly identical but inverted and lack any obvious potential secondary structures or telomere-like repeat elements. The acquisition of mitochondrial genomic data for the second class of Cnidaria allows us to reconstruct characteristic features of mitochondrial evolution in this animal phylum.
Human Y chromosome copy number variation in the next generation sequencing era and beyond.

PubMed

Massaia, Andrea; Xue, Yali

2017-05-01

The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.
Temporal sequence learning in winner-take-all networks of spiking neurons demonstrated in a brain-based device.

PubMed

McKinstry, Jeffrey L; Edelman, Gerald M

2013-01-01

Animal behavior often involves a temporally ordered sequence of actions learned from experience. Here we describe simulations of interconnected networks of spiking neurons that learn to generate patterns of activity in correct temporal order. The simulation consists of large-scale networks of thousands of excitatory and inhibitory neurons that exhibit short-term synaptic plasticity and spike-timing dependent synaptic plasticity. The neural architecture within each area is arranged to evoke winner-take-all (WTA) patterns of neural activity that persist for tens of milliseconds. In order to generate and switch between consecutive firing patterns in correct temporal order, a reentrant exchange of signals between these areas was necessary. To demonstrate the capacity of this arrangement, we used the simulation to train a brain-based device responding to visual input by autonomously generating temporal sequences of motor actions.
Characterization of Zea mays endosperm C-24 sterol methyltransferase: one of two types of sterol methyltransferase in higher plants.

PubMed

Grebenok, R J; Galbraith, D W; Penna, D D

1997-08-01

We report the characterization of a higher-plant C-24 sterol methyltransferase by yeast complementation. A Zea mays endosperm expressed sequence tag (EST) was identified which, upon complete sequencing, showed 46% identity to the yeast C-24 methyltransferase gene (ERG6) and 75% and 37% amino acid identity to recently isolated higher-plant sterol methyltransferases from soybean and Arabidopsis, respectively. When placed under GALA regulation, the Z. mays cDNA functionally complemented the erg6 mutation, restoring ergosterol production and conferring resistance to cycloheximide. Complementation was both plasmid-dependent and galactose-inducible. The Z. mays cDNA clone contains an open reading frame encoding a 40 kDa protein containing motifs common to a large number of S-adenosyl-L-methionine methyltransferases (SMTs). Sequence comparisons and functional studies of the maize, soybean and Arabidopsis cDNAs indicate that two types of C-24 SMTs exist in higher plants.
Microbial diversity of culturable heterotrophs in the rhizosphere of salt marsh grass, Porteresia coarctata (Tateoka) in a mangrove ecosystem.

PubMed

Bharathkumar, Srinivasan; Paul, Diby; Nair, Sudha

2008-02-01

A study was conducted to understand the complexity of bacterial diversity in the rhizosphere of Porteresia coarctata using a culture-dependent method. A large number of bacteria were isolated on nutrient agar medium supplemented with 1% NaCl and the dominant ones were further analyzed with the PCR-RFLP method. The sequence analyses of the dominant strains revealed that most of the sequences belonged to members of gamma proteobacteria, firmicutes, bacteroidetes and uncultured bacteria. The phylogenetic analysis of 16S rRNA gene sequences revealed close relationships to a wide range of clones or bacterial species of various divisions. These results afford an understanding of the role of rhizobacteria in alleviating salt stress in Porteresia coarctata and are expected to contribute towards the long-term goal of improving plant-microbe interactions for salinity-affected fields. (c) 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Approaches for in silico finishing of microbial genome sequences

PubMed Central

Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

2017-01-01

Abstract The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing. PMID:28898352
Approaches for in silico finishing of microbial genome sequences.

PubMed

Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.
In vitro selection of high temperature Zn(2+)-dependent DNAzymes.

PubMed

Nelson, Kevin E; Bruesehoff, Peter J; Lu, Yi

2005-08-01

In vitro selection of Zn(2+)-dependent RNA-cleaving DNAzymes with activity at 90 degrees C has yielded a diverse pool of selected sequences. The RNA cleavage efficiency was found in all cases to be specific for Zn(2+) over Pb(2+), Ca(2+), Cd(2+), Co(2+), Hg(2+), and Mg(2+). The Zn(2+)-dependent activity assay of the most active sequence showed that the DNAzyme possesses an apparent Zn(2+)-binding dissociation constant of 234 μM and that its activity increases with increasing temperatures from 50-90 degrees C. A fit of the Arrhenius plot data gave E(a) = 15.3 kcal mol(-1). Surprisingly, the selected Zn(2+)-dependent DNAzymes showed only a modest (approximately 3-fold) activity enhancement over the background rate of cleavage of random sequences containing a single embedded ribonucleotide within an otherwise DNA oligonucleotide. The result is attributable to the ability of DNA to sustain cleavage activity at high temperature with minimal secondary structure when Zn(2+) is present. Since this effect is highly specific for Zn(2+), this metal ion may play a special role in molecular evolution of nucleic acids at high temperature.
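
To give a feel for what the activation energy in the abstract above implies, the Arrhenius relation k = A exp(-E(a)/RT) can be used to estimate the ratio of rates at two temperatures. The sketch below assumes ideal Arrhenius behaviour with a temperature-independent prefactor across the whole 50-90 degrees C range, which the abstract does not state. The function name is hypothetical. Under that assumption the reported E(a) corresponds to a somewhat more than tenfold rate increase between the two ends of the range.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def arrhenius_rate_ratio(ea_kcal_per_mol, t_low_c, t_high_c):
    """Return k(t_high) / k(t_low) for a given activation energy.

    Assumes k = A * exp(-Ea / (R * T)) with the same prefactor A at both
    temperatures (an idealisation, not a result from the study).
    """
    t_low = t_low_c + 273.15   # convert degrees C to kelvin
    t_high = t_high_c + 273.15
    return math.exp(ea_kcal_per_mol / R_KCAL * (1.0 / t_low - 1.0 / t_high))


# Reported activation energy and the two ends of the assayed temperature range.
ratio_50_to_90 = arrhenius_rate_ratio(15.3, 50.0, 90.0)
```
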
The Pan-STARRS1 medium-deep survey: The role of galaxy group environment in the star formation rate versus stellar mass relation and quiescent fraction out to z ∼ 0.8

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Lihwai; Chen, Chin-Wei; Coupon, Jean

2014-02-10

Using a large optically selected sample of field and group galaxies drawn from the Pan-STARRS1 Medium-Deep Survey (PS1/MDS), we present a detailed analysis of the specific star formation rate (SSFR)-stellar mass (M_*) relation, as well as the quiescent fraction versus M_* relation in different environments. While both the SSFR and the quiescent fraction depend strongly on stellar mass, the environment also plays an important role. Using this large galaxy sample, we confirm that the fraction of quiescent galaxies is strongly dependent on environment at a fixed stellar mass, but that the amplitude and the slope of the star-forming sequence is similar between the field and groups: in other words, the SSFR-density relation at a fixed stellar mass is primarily driven by the change in the star-forming and quiescent fractions between different environments rather than a global suppression in the star formation rate for the star-forming population. However, when we restrict our sample to the cluster-scale environments (M > 10^14 M_☉), we find a global reduction in the SSFR of the star-forming sequence of 17% at 4σ confidence as opposed to its field counterpart. After removing the stellar mass dependence of the quiescent fraction seen in field galaxies, the excess in the quiescent fraction due to the environment quenching in groups and clusters is found to increase with stellar mass, although deeper and larger data from the full PS1/MDS will be required to draw firm conclusions. We argue that these results are in favor of galaxy mergers to be the primary environment quenching mechanism operating in galaxy groups whereas strangulation is able to reproduce the observed trend in the environment quenching efficiency and stellar mass relation seen in clusters. Our results also suggest that the relative importance between mass quenching and environment quenching depends on stellar mass: the mass quenching plays a dominant role in producing quiescent galaxies for more massive galaxies, while less massive galaxies are quenched mostly through the environmental effect, with the transition mass around 1-2 × 10^10 M_☉ in the group/cluster environment.
Resolution of model Holliday junctions by yeast endonuclease: effect of DNA structure and sequence.

PubMed Central

Parsons, C A; Murchie, A I; Lilley, D M; West, S C

1989-01-01

The resolution of Holliday junctions in DNA involves specific cleavage at or close to the site of the junction. A nuclease from Saccharomyces cerevisiae cleaves model Holliday junctions in vitro by the introduction of nicks in regions of duplex DNA adjacent to the crossover point. In previous studies [Parsons and West (1988) Cell, 52, 621-629] it was shown that cleavage occurred within homologous arm sequences with precise symmetry across the junction. In contrast, junctions with heterologous arm sequences were cleaved asymmetrically. In this work, we have studied the effect of sequence changes and base modification upon the site of cleavage. It is shown that the specificity of cleavage is unchanged providing that perfect homology is maintained between opposing arm sequences. However, in the absence of homology, cleavage depends upon sequence context and is affected by minor changes such as base modification. These data support the proposed mechanism for cleavage of a Holliday junction, which requires homologous alignment of arm sequences in an enzyme-DNA complex as a prerequisite for symmetrical cleavage by the yeast endonuclease. PMID:2653810

DNA Looping Facilitates Targeting of a Chromatin Remodeling Enzyme

PubMed Central

Yadon, Adam N; Singh, Badri Nath; Hampsey, Michael; Tsukiyama, Toshio

2013-01-01

Summary ATP-dependent chromatin remodeling enzymes are highly abundant and play pivotal roles regulating DNA-dependent processes. The mechanisms by which they are targeted to specific loci have not been well understood on a genome-wide scale. Here we present evidence that a major targeting mechanism for the Isw2 chromatin remodeling enzyme to specific genomic loci is through sequence-specific transcription factor (TF)-dependent recruitment. Unexpectedly, Isw2 is recruited in a TF-dependent fashion to a large number of loci without TF binding sites. Using the 3C assay, we show that Isw2 can be targeted by Ume6- and TFIIB-dependent DNA looping. These results identify DNA looping as a previously unknown mechanism for the recruitment of a chromatin remodeling enzyme and define a novel function for DNA looping. We also present evidence suggesting that Ume6-dependent DNA looping is involved in chromatin remodeling and transcriptional repression, revealing a mechanism by which the three-dimensional folding of chromatin affects DNA-dependent processes. PMID:23478442
Listen up! Processing of intensity change differs for vocal and nonvocal sounds.

PubMed

Schirmer, Annett; Simpson, Elizabeth; Escoffier, Nicolas

2007-10-24

Changes in the intensity of both vocal and nonvocal sounds can be emotionally relevant. However, as only vocal sounds directly reflect communicative intent, intensity change of vocal but not nonvocal sounds is socially relevant. Here we investigated whether a change in sound intensity is processed differently depending on its social relevance. To this end, participants listened passively to a sequence of vocal or nonvocal sounds that contained rare deviants which differed from standards in sound intensity. Concurrently recorded event-related potentials (ERPs) revealed a mismatch negativity (MMN) and P300 effect for intensity change. Direction of intensity change was of little importance for vocal stimulus sequences, which recruited enhanced sensory and attentional resources for both loud and soft deviants. In contrast, intensity change in nonvocal sequences recruited more sensory and attentional resources for loud as compared to soft deviants. This was reflected in markedly larger MMN/P300 amplitudes and shorter P300 latencies for the loud as compared to soft nonvocal deviants. Furthermore, while the processing pattern observed for nonvocal sounds was largely comparable between men and women, sex differences for vocal sounds suggest that women were more sensitive to their social relevance. These findings extend previous evidence of sex differences in vocal processing and add to reports of voice specific processing mechanisms by demonstrating that simple acoustic change recruits more processing resources if it is socially relevant.
High-throughput genotyping of hop (Humulus lupulus L.) utilising diversity arrays technology (DArT).

PubMed

Howard, E L; Whittock, S P; Jakše, J; Carling, J; Matthews, P D; Probasco, G; Henning, J A; Darby, P; Cerenak, A; Javornik, B; Kilian, A; Koutoulis, A

2011-05-01

Implementation of molecular methods in hop (Humulus lupulus L.) breeding is dependent on the availability of sizeable numbers of polymorphic markers and a comprehensive understanding of genetic variation. However, use of molecular marker technology is limited due to expense, time inefficiency, laborious methodology and dependence on DNA sequence information. Diversity arrays technology (DArT) is a high-throughput cost-effective method for the discovery of large numbers of quality polymorphic markers without reliance on DNA sequence information. This study is the first to utilise DArT for hop genotyping, identifying 730 polymorphic markers from 92 hop accessions. The marker quality was high and similar to the quality of DArT markers previously generated for other species; although percentage polymorphism and polymorphism information content (PIC) were lower than in previous studies deploying other marker systems in hop. Genetic relationships in hop illustrated by DArT in this study coincide with knowledge generated using alternate methods. Several statistical analyses separated the hop accessions into genetically differentiated North American and European groupings, with hybrids between the two groups clearly distinguishable. Levels of genetic diversity were similar in the North American and European groups, but higher in the hybrid group. The markers produced from this time and cost-efficient genotyping tool will be a valuable resource for numerous applications in hop breeding and genetics studies, such as mapping, marker-assisted selection, genetic identity testing, guidance in the maintenance of genetic diversity and the directed breeding of superior cultivars.
Memory effect in M ≥ 6 earthquakes of South-North Seismic Belt, Mainland China

NASA Astrophysics Data System (ADS)

Wang, Jeen-Hwa

2013-07-01

The M ≥ 6 earthquakes that occurred in the South-North Seismic Belt, Mainland China, during 1901-2008 are used to study the possible existence of a memory effect in large earthquakes. The fluctuation analysis technique is applied to the sequences of earthquake magnitude and inter-event time represented in the natural time domain. Calculated results show that the exponents of the scaling law of fluctuation versus window length are less than 0.5 for the sequences of earthquake magnitude and inter-event time. The migration of the earthquakes under study is examined to discuss the possible correlation between events. The phase portraits of two successive magnitudes and two successive inter-event times are also used to explore whether large (or small) earthquakes are followed by large (or small) events. Taking all of this information together, we conclude that the earthquakes under study are short-term correlated and thus a short-term memory effect would be operative.
Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

PubMed Central

Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang; Estrada, Karol; Rosello-Diez, Alberto; Leo, Paul J; Dahia, Chitra L; Park-Min, Kyung Hyun; Tobias, Jonathan H; Kooperberg, Charles; Kleinman, Aaron; Styrkarsdottir, Unnur; Liu, Ching-Ti; Uggla, Charlotta; Evans, Daniel S; Nielson, Carrie M; Walter, Klaudia; Pettersson-Kymmer, Ulrika; McCarthy, Shane; Eriksson, Joel; Kwan, Tony; Jhamai, Mila; Trajanoska, Katerina; Memari, Yasin; Min, Josine; Huang, Jie; Danecek, Petr; Wilmot, Beth; Li, Rui; Chou, Wen-Chi; Mokry, Lauren E; Moayyeri, Alireza; Claussnitzer, Melina; Cheng, Chia-Ho; Cheung, Warren; Medina-Gómez, Carolina; Ge, Bing; Chen, Shu-Huang; Choi, Kwangbom; Oei, Ling; Fraser, James; Kraaij, Robert; Hibbs, Matthew A; Gregson, Celia L; Paquette, Denis; Hofman, Albert; Wibom, Carl; Tranah, Gregory J; Marshall, Mhairi; Gardiner, Brooke B; Cremin, Katie; Auer, Paul; Hsu, Li; Ring, Sue; Tung, Joyce Y; Thorleifsson, Gudmar; Enneman, Anke W; van Schoor, Natasja M; de Groot, Lisette C.P.G.M.; van der Velde, Nathalie; Melin, Beatrice; Kemp, John P; Christiansen, Claus; Sayers, Adrian; Zhou, Yanhua; Calderari, Sophie; van Rooij, Jeroen; Carlson, Chris; Peters, Ulrike; Berlivet, Soizik; Dostie, Josée; Uitterlinden, Andre G; Williams, Stephen R.; Farber, Charles; Grinberg, Daniel; LaCroix, Andrea Z; Haessler, Jeff; Chasman, Daniel I; Giulianini, Franco; Rose, Lynda M; Ridker, Paul M; Eisman, John A; Nguyen, Tuan V; Center, Jacqueline R; Nogues, Xavier; Garcia-Giralt, Natalia; Launer, Lenore L; Gudnason, Vilmunder; Mellström, Dan; Vandenput, Liesbeth; Karlsson, Magnus K; Ljunggren, Östen; Svensson, Olle; Hallmans, Göran; Rousseau, François; Giroux, Sylvie; Bussière, Johanne; Arp, Pascal P; Koromani, Fjorda; Prince, Richard L; Lewis, Joshua R; Langdahl, Bente L; Hermann, A Pernille; Jensen, Jens-Erik B; Kaptoge, Stephen; Khaw, Kay-Tee; Reeve, Jonathan; Formosa, Melissa M; Xuereb-Anastasi, Angela; Åkesson, Kristina; McGuigan, Fiona E; Garg, Gaurav; Olmos, Jose M; Zarrabeitia, 
Maria T; Riancho, Jose A; Ralston, Stuart H; Alonso, Nerea; Jiang, Xi; Goltzman, David; Pastinen, Tomi; Grundberg, Elin; Gauguier, Dominique; Orwoll, Eric S; Karasik, David; Davey-Smith, George; Smith, Albert V; Siggeirsdottir, Kristin; Harris, Tamara B; Zillikens, M Carola; van Meurs, Joyce BJ; Thorsteinsdottir, Unnur; Maurano, Matthew T; Timpson, Nicholas J; Soranzo, Nicole; Durbin, Richard; Wilson, Scott G; Ntzani, Evangelia E; Brown, Matthew A; Stefansson, Kari; Hinds, David A; Spector, Tim; Cupples, L Adrienne; Ohlsson, Claes; Greenwood, Celia MT; Jackson, Rebecca D; Rowe, David W; Loomis, Cynthia A; Evans, David M; Ackert-Bicknell, Cheryl L; Joyner, Alexandra L; Duncan, Emma L; Kiel, Douglas P; Rivadeneira, Fernando; Richards, J Brent

2016-01-01

SUMMARY The extent to which low-frequency (minor allele frequency [MAF] between 1–5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is largely unknown. Bone mineral density (BMD) is highly heritable, is a major predictor of osteoporotic fractures and has been previously associated with common genetic variants [1–8], and rare, population-specific, coding variants [9]. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000Genomes reference panel (n = 26,534), and de-novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size 4-fold larger than the mean of previously reported common variants for lumbar spine BMD [8] (rs11692564[T], MAF = 1.7%, replication effect size = +0.20 standard deviations [SD], P_meta = 2×10^−14), which was also associated with a decreased risk of fracture (OR = 0.85; P = 2×10^−11; n_cases = 98,742 and n_controls = 409,511). Using an En1^Cre/flox mouse model, we observed that conditional loss of En1 results in low bone mass, likely as a consequence of high bone turn-over. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817[T], MAF = 1.1%, replication effect size = +0.39 SD, P_meta = 1×10^−11). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. 
These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population. PMID:26367794
A novel on-line spatial-temporal k-anonymity method for location privacy protection from sequence rules-based inference attacks.

PubMed

Zhang, Haitao; Wu, Chenxue; Chen, Zewei; Liu, Zhao; Zhu, Yunhong

2017-01-01

Analyzing large-scale spatial-temporal k-anonymity datasets recorded in location-based service (LBS) application servers can benefit some LBS applications. However, such analyses can allow adversaries to make inference attacks that cannot be handled by spatial-temporal k-anonymity methods or other methods for protecting sensitive knowledge. In response to this challenge, first we defined a destination location prediction attack model based on privacy-sensitive sequence rules mined from large scale anonymity datasets. Then we proposed a novel on-line spatial-temporal k-anonymity method that can resist such inference attacks. Our anti-attack technique generates new anonymity datasets with awareness of privacy-sensitive sequence rules. The new datasets extend the original sequence database of anonymity datasets to hide the privacy-sensitive rules progressively. The process includes two phases: off-line analysis and on-line application. In the off-line phase, sequence rules are mined from an original sequence database of anonymity datasets, and privacy-sensitive sequence rules are developed by correlating privacy-sensitive spatial regions with spatial grid cells among the sequence rules. In the on-line phase, new anonymity datasets are generated upon LBS requests by adopting specific generalization and avoidance principles to hide the privacy-sensitive sequence rules progressively from the extended sequence anonymity datasets database. We conducted extensive experiments to test the performance of the proposed method, and to explore the influence of the parameter K value. The results demonstrated that our proposed approach is faster and more effective for hiding privacy-sensitive sequence rules in terms of hiding sensitive rules ratios to eliminate inference attacks. 
Our method also had fewer side effects in terms of generating new sensitive rules ratios than the traditional spatial-temporal k-anonymity method, and had basically the same side effects in terms of non-sensitive rules variation ratios with the traditional spatial-temporal k-anonymity method. Furthermore, we also found the performance variation tendency from the parameter K value, which can help achieve the goal of hiding the maximum number of original sensitive rules while generating a minimum of new sensitive rules and affecting a minimum number of non-sensitive rules.
A novel on-line spatial-temporal k-anonymity method for location privacy protection from sequence rules-based inference attacks

PubMed Central

Wu, Chenxue; Liu, Zhao; Zhu, Yunhong

2017-01-01

Analyzing large-scale spatial-temporal k-anonymity datasets recorded in location-based service (LBS) application servers can benefit some LBS applications. However, such analyses can allow adversaries to make inference attacks that cannot be handled by spatial-temporal k-anonymity methods or other methods for protecting sensitive knowledge. In response to this challenge, first we defined a destination location prediction attack model based on privacy-sensitive sequence rules mined from large scale anonymity datasets. Then we proposed a novel on-line spatial-temporal k-anonymity method that can resist such inference attacks. Our anti-attack technique generates new anonymity datasets with awareness of privacy-sensitive sequence rules. The new datasets extend the original sequence database of anonymity datasets to hide the privacy-sensitive rules progressively. The process includes two phases: off-line analysis and on-line application. In the off-line phase, sequence rules are mined from an original sequence database of anonymity datasets, and privacy-sensitive sequence rules are developed by correlating privacy-sensitive spatial regions with spatial grid cells among the sequence rules. In the on-line phase, new anonymity datasets are generated upon LBS requests by adopting specific generalization and avoidance principles to hide the privacy-sensitive sequence rules progressively from the extended sequence anonymity datasets database. We conducted extensive experiments to test the performance of the proposed method, and to explore the influence of the parameter K value. The results demonstrated that our proposed approach is faster and more effective for hiding privacy-sensitive sequence rules in terms of hiding sensitive rules ratios to eliminate inference attacks. 
Our method also had fewer side effects in terms of generating new sensitive rules ratios than the traditional spatial-temporal k-anonymity method, and had basically the same side effects in terms of non-sensitive rules variation ratios with the traditional spatial-temporal k-anonymity method. Furthermore, we also found the performance variation tendency from the parameter K value, which can help achieve the goal of hiding the maximum number of original sensitive rules while generating a minimum of new sensitive rules and affecting a minimum number of non-sensitive rules. PMID:28767687
Sequential Dependencies in Categorical Judgments of Radiographic Images

ERIC Educational Resources Information Center

Beckstead, Jason W.; Boutis, Kathy; Pecaric, Martin; Pusic, Martin V.

2017-01-01

Sequential context effects, the psychological interactions occurring between the events of successive trials when a sequence of similar stimuli are judged, have interested psychologists for decades. It has been well established that individuals exhibit sequential context effects in psychophysical experiments involving unidimensional stimuli.…
Nonspatial Sequence Coding in CA1 Neurons

PubMed Central

Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

2016-01-01

The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. 
Here we addressed this issue by recording neural activity in hippocampal region CA1 while rats performed a nonspatial sequence memory task. We found that hippocampal neurons code for the temporal context of items (whether odors were presented in the correct or incorrect sequential position) and that this activity is linked with memory performance. The discovery of this novel form of temporal coding in hippocampal neurons advances our fundamental understanding of the neurobiology of episodic memory and will serve as a foundation for our cross-species, multitechnique approach aimed at elucidating the neural mechanisms underlying memory impairments in aging and dementia. PMID:26843637
Mouse Vk gene classification by nucleic acid sequence similarity.

PubMed

Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

1989-01-01

Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.

PubMed

Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth

2016-06-01

Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.
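The exponential decay in result (i) can be illustrated with a deliberately simplified toy model, which is not the information-theoretic analysis of the paper. The sketch assumes that each base independently suffers an indel with a fixed probability and that a single read is reconstructed exactly only if no indel occurs; the function names and the 1% error rate are hypothetical choices made for illustration.

```python
import math


def p_error_free_read(length: int, indel_rate: float) -> float:
    """Probability that a read of `length` bases contains no indel.

    Assumes independent per-base indel errors (an illustrative assumption,
    not the error model analysed in the paper).
    """
    return (1.0 - indel_rate) ** length


def decay_rate(indel_rate: float) -> float:
    """Decay constant c such that P(length) = exp(-c * length)."""
    return -math.log(1.0 - indel_rate)


if __name__ == "__main__":
    # Hypothetical 1% per-base indel rate; the probability shrinks
    # exponentially as the read gets longer.
    for n in (100, 1_000, 10_000):
        print(n, p_error_free_read(n, indel_rate=0.01))
```

Under these assumptions the success probability for a single read falls by a constant factor for every added base, which is why the abstract turns to replicated extrusion for long sequences.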
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

PubMed

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The rapid growth of next-generation sequencing data has led to a shortage of efficient ultra-large biological sequence alignment approaches that can cope with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient tool, HAlign-II, to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets, with files larger than 1 GB, showed that HAlign-II could save time and space and outperformed current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.
Research progress of plant population genomics based on high-throughput sequencing.

PubMed

Wang, Yun-sheng

2016-08-01

Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical system of population genetics and improves our understanding of microevolution through the identification of site-specific and genome-wide effects using genome-wide genotyping of polymorphic sites. With the appearance and improvement of next-generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly, and large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effects, demographic history and molecular mechanisms of complex traits of the relevant plant populations at a genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as the existing problems of plant population genomics in order to provide references for related studies.
Computational studies of sequence-specific driving forces in peptide self-assembly

NASA Astrophysics Data System (ADS)

Jeon, Joohyun

Peptides are biopolymers made from various sequences of twenty different types of amino acids, connected by peptide bonds. There are practically an infinite number of possible sequences and tremendous possible combinations of peptide-peptide interactions. Recently, an increasing number of studies have shown a stark variety of peptide self-assembled nanomaterials whose detailed structures depend on their sequences and environmental factors; these have end uses in medical and bio-electronic applications, for example. To understand the underlying physics of complex peptide self-assembly processes and to delineate sequence specific effects, in this study, I use various simulation tools spanning all-atom molecular dynamics to simple lattice models and quantify the balance of interactions in the peptide self-assembly processes. In contrast to the existing view that peptides' aggregation propensities are proportional to the net sequence hydrophobicity and inversely proportional to the net charge, I show the more nuanced effects of electrostatic interactions, including the cooperative effects between hydrophobic and electrostatic interactions. Notably, I suggest rather unexpected, yet important roles of entropies in the small scale oligomerization processes. Overall, this study broadens our understanding of the role of thermodynamic driving forces in peptide self-assembly.
The optimal design of stepped wedge trials with equal allocation to sequences and a comparison to other trial designs.

PubMed

Thompson, Jennifer A; Fielding, Katherine; Hargreaves, James; Copas, Andrew

2017-12-01

Background/Aims We sought to optimise the design of stepped wedge trials with an equal allocation of clusters to sequences and explored sample size comparisons with alternative trial designs. Methods We developed a new expression for the design effect for a stepped wedge trial, assuming that observations are equally correlated within clusters and an equal number of observations in each period between sequences switching to the intervention. We minimised the design effect with respect to (1) the fraction of observations before the first and after the final sequence switches (the periods with all clusters in the control or intervention condition, respectively) and (2) the number of sequences. We compared the design effect of this optimised stepped wedge trial to the design effects of a parallel cluster-randomised trial, a cluster-randomised trial with baseline observations, and a hybrid trial design (a mixture of cluster-randomised trial and stepped wedge trial) with the same total cluster size for all designs. Results We found that a stepped wedge trial with an equal allocation to sequences is optimised by obtaining all observations after the first sequence switches and before the final sequence switches to the intervention; this means that the first sequence remains in the control condition and the last sequence remains in the intervention condition for the duration of the trial. With this design, the optimal number of sequences is [Formula: see text], where [Formula: see text] is the cluster-mean correlation, [Formula: see text] is the intracluster correlation coefficient, and m is the total cluster size. The optimal number of sequences is small when the intracluster correlation coefficient and cluster size are small and large when the intracluster correlation coefficient or cluster size is large. A cluster-randomised trial remains more efficient than the optimised stepped wedge trial when the intracluster correlation coefficient or cluster size is small. 
A cluster-randomised trial with baseline observations always requires a larger sample size than the optimised stepped wedge trial. The hybrid design can always give an equally or more efficient design, but will be at most 5% more efficient. We provide a strategy for selecting a design if the optimal number of sequences is unfeasible. For a non-optimal number of sequences, the sample size may be reduced by allowing a proportion of observations before the first or after the final sequence has switched. Conclusion The standard stepped wedge trial is inefficient. To reduce sample sizes when a hybrid design is unfeasible, stepped wedge trial designs should have no observations before the first sequence switches or after the final sequence switches.
Optical mapping and its potential for large-scale sequencing projects.

PubMed

Aston, C; Mishra, B; Schwartz, D C

1999-07-01

Physical mapping has been rediscovered as an important component of large-scale sequencing projects. Restriction maps provide landmark sequences at defined intervals, and high-resolution restriction maps can be assembled from ensembles of single molecules by optical means. Such optical maps can be constructed from both large-insert clones and genomic DNA, and are used as a scaffold for accurately aligning sequence contigs generated by shotgun sequencing.
TaqMan Real-Time PCR Assays To Assess Arbuscular Mycorrhizal Responses to Field Manipulation of Grassland Biodiversity: Effects of Soil Characteristics, Plant Species Richness, and Functional Traits

PubMed Central

König, Stephan; Wubet, Tesfaye; Dormann, Carsten F.; Hempel, Stefan; Renker, Carsten; Buscot, François

2010-01-01

Large-scale (temporal and/or spatial) molecular investigations of the diversity and distribution of arbuscular mycorrhizal fungi (AMF) require considerable sampling efforts and high-throughput analysis. To facilitate such efforts, we have developed a TaqMan real-time PCR assay to detect and identify AMF in environmental samples. First, we screened the diversity in clone libraries, generated by nested PCR, of the nuclear ribosomal DNA internal transcribed spacer (ITS) of AMF in environmental samples. We then generated probes and forward primers based on the detected sequences, enabling AMF sequence type-specific detection in TaqMan multiplex real-time PCR assays. In comparisons to conventional clone library screening and Sanger sequencing, the TaqMan assay approach provided similar accuracy but higher sensitivity with cost and time savings. The TaqMan assays were applied to analyze the AMF community composition within plots of a large-scale plant biodiversity manipulation experiment, the Jena Experiment, primarily designed to investigate the interactive effects of plant biodiversity on element cycling and trophic interactions. The results show that environmental variables hierarchically shape AMF communities and that the sequence type spectrum is strongly affected by previous land use and disturbance, which appears to favor disturbance-tolerant members of the genus Glomus. The AMF species richness of disturbance-associated communities can be largely explained by richness of plant species and plant functional groups, while plant productivity and soil parameters appear to have only weak effects on the AMF community. PMID:20418424
Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs.

PubMed

Allevato, Michael; Bolotin, Eugene; Grossman, Mark; Mane-Padros, Daniel; Sladek, Frances M; Martinez, Ernest

2017-01-01

The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX) bind Enhancer box (E-box) DNA elements (CANNTG) and have the greatest affinity for the canonical MYC E-box (CME) CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to bind DNA in a "non-specific" fashion in vitro and in vivo. Here, in order to identify potential additional non-canonical binding sites for MYC/MAX, we employed high throughput in vitro protein-binding microarrays, along with electrophoretic mobility-shift assays and bioinformatic analyses of MYC-bound genomic loci in vivo. We identified all hexameric motifs preferentially bound by MYC/MAX in vitro, which include the low-affinity non-E-box sequence AACGTT, and found that the vast majority (87%) of MYC-bound genomic sites in a human B cell line contain at least one of the top 21 motifs bound by MYC:MAX in vitro. We further show that high MYC/MAX concentrations are needed for specific binding to the low-affinity sequence AACGTT in vitro and that elevated MYC levels in vivo more markedly increase the occupancy of AACGTT sites relative to CME sites, especially at distal intergenic and intragenic loci. Hence, MYC binds diverse DNA motifs with a broad range of affinities in a sequence-specific and dose-dependent manner, suggesting that MYC overexpression has more selective effects on the tumor transcriptome than previously thought.
A STELLAR-MASS-DEPENDENT DROP IN PLANET OCCURRENCE RATES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mulders, Gijs D.; Pascucci, Ilaria; Apai, Dániel

2015-01-10

The Kepler spacecraft has discovered a large number of planets with up to one-year periods and down to terrestrial sizes. While the majority of the target stars are main-sequence dwarfs of spectral type F, G, and K, Kepler covers stars with effective temperatures as low as 2500 K, which corresponds to M stars. These cooler stars allow characterization of small planets near the habitable zone, yet it is not clear if this population is representative of that around FGK stars. In this paper, we calculate the occurrence of planets around stars of different spectral types as a function of planet radius and distance from the star and show that they are significantly different from each other. We further identify two trends. First, the occurrence of Earth- to Neptune-sized planets (1-4 R⊕) is successively higher toward later spectral types at all orbital periods probed by Kepler; planets around M stars occur twice as frequently as around G stars, and thrice as frequently as around F stars. Second, a drop in planet occurrence is evident at all spectral types inward of a ∼10 day orbital period, with a plateau further out. By assigning to each spectral type a median stellar mass, we show that the distance from the star where this drop occurs is stellar mass dependent, and scales with semi-major axis as the cube root of stellar mass. By comparing different mechanisms of planet formation, trapping, and destruction, we find that this scaling best matches the location of the pre-main-sequence co-rotation radius, indicating efficient trapping of migrating planets or planetary building blocks close to the star. These results demonstrate the stellar-mass dependence of the planet population, both in terms of occurrence rate and of orbital distribution. The prominent stellar-mass dependence of the inner boundary of the planet population shows that the formation or migration of planets is sensitive to the stellar parameters.
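The cube-root scaling with stellar mass is the same dependence that Kepler's third law gives for the orbital distance at a fixed period, which is one way to see why a co-rotation radius would scale this way. The sketch below only evaluates that law in solar units; the function name and the stellar masses are illustrative assumptions and are not the median masses used in the paper.

```python
def semi_major_axis_au(stellar_mass_msun: float, period_days: float) -> float:
    """Kepler's third law in solar units: a^3 = M * P^2.

    a is in AU, M in solar masses and P in years; the planet's mass is
    neglected relative to the star's.
    """
    period_years = period_days / 365.25
    return (stellar_mass_msun * period_years ** 2) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Illustrative stellar masses only. At a fixed 10-day period the
    # orbital distance grows as the cube root of the stellar mass.
    for mass in (0.5, 1.0, 1.3):
        print(mass, round(semi_major_axis_au(mass, period_days=10.0), 4))
```

For a one-solar-mass star and a 10-day period this gives roughly 0.09 AU, and multiplying the stellar mass by eight doubles the distance.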
Effect of ionic strength and cationic DNA affinity binders on the DNA sequence selective alkylation of guanine N7-positions by nitrogen mustards

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hartley, J.A.; Forrow, S.M.; Souhami, R.L.

Large variations in alkylation intensities exist among guanines in a DNA sequence following treatment with chemotherapeutic alkylating agents such as nitrogen mustards, and the substituent attached to the reactive group can impose a distinct sequence preference for reaction. In order to understand further the structural and electrostatic factors which determine the sequence selectivity of alkylation reactions, the effect of increased ionic strength, the intercalator ethidium bromide, AT-specific minor groove binders distamycin A and netropsin, and the polyamine spermine on guanine N7-alkylation by L-phenylalanine mustard (L-Pam), uracil mustard (UM), and quinacrine mustard (QM) was investigated with a modification of the guanine-specific chemical cleavage technique for DNA sequencing. The result differed with both the nitrogen mustard and the cationic agent used. The effect, which resulted in both enhancement and suppression of alkylation sites, was most striking in the case of netropsin and distamycin A, which differed from each other. DNA footprinting indicated that selective binding to AT sequences in the minor groove of DNA can have long-range effects on the alkylation pattern of DNA in the major groove.

Order of arrival affects competition in two reef fishes.

PubMed

Geange, Shane W; Stier, Adrian C

2009-10-01

Many communities experience repeated periods of colonization due to seasonally regenerating habitats or pulsed arrival of young-of-year. When an individual's persistence in a community depends upon the strength of competitive interactions, changes in the timing of arrival relative to the arrival of a competitor can modify competitive strength and, ultimately, establishment in the community. We investigated whether the strength of intracohort competitive interactions between recent settlers of the reef fishes Thalassoma hardwicke and T. quinquevittatum is dependent on the sequence and temporal separation of their arrival into communities. To achieve this, we manipulated the sequence and timing of arrival of each species onto experimental patch reefs by simulating settlement pulses and monitoring survival and aggressive interactions. Both species survived best in the absence of competitors, but when competitors were present, they did best when they arrived at the same time. Survival declined as each species entered the community progressively later than its competitor and as aggression by its competitor increased. Intraspecific effects of resident T. hardwicke were similar to interspecific effects. This study shows that the strength of competition depends not only on the identity of competitors, but also on the sequence and timing of their interactions, suggesting that when examining interaction strengths, it is important to identify temporal variability in the direction and magnitude of their effects. Furthermore, our findings provide empirical evidence for the importance of competitive lotteries in the maintenance of species diversity in demographically open marine systems.
DNA isolation protocol effects on nuclear DNA analysis by microarrays, droplet digital PCR, and whole genome sequencing, and on mitochondrial DNA copy number estimation.

PubMed

Nacheva, Elizabeth; Mokretar, Katya; Soenmez, Aynur; Pittman, Alan M; Grace, Colin; Valli, Roberto; Ejaz, Ayesha; Vattathil, Selina; Maserati, Emanuela; Houlden, Henry; Taanman, Jan-Willem; Schapira, Anthony H; Proukakis, Christos

2017-01-01

Potential bias introduced during DNA isolation is inadequately explored, although it could have significant impact on downstream analysis. To investigate this in human brain, we isolated DNA from cerebellum and frontal cortex using spin columns under different conditions, and salting-out. We first analysed DNA using array CGH, which revealed a striking wave pattern suggesting primarily GC-rich cerebellar losses, even against matched frontal cortex DNA, with a similar pattern on a SNP array. The aCGH changes varied with the isolation protocol. Droplet digital PCR of two genes also showed protocol-dependent losses. Whole genome sequencing showed GC-dependent variation in coverage with spin column isolation from cerebellum. We also extracted and sequenced DNA from substantia nigra using salting-out and phenol / chloroform. The mtDNA copy number, assessed by reads mapping to the mitochondrial genome, was higher in substantia nigra when using phenol / chloroform. We thus provide evidence for significant method-dependent bias in DNA isolation from human brain, as reported in rat tissues. This may contribute to array "waves", and could affect copy number determination, particularly if mosaicism is being sought, and sequencing coverage. Variations in isolation protocol may also affect apparent mtDNA abundance.
DNA isolation protocol effects on nuclear DNA analysis by microarrays, droplet digital PCR, and whole genome sequencing, and on mitochondrial DNA copy number estimation

PubMed Central

Nacheva, Elizabeth; Mokretar, Katya; Soenmez, Aynur; Pittman, Alan M.; Grace, Colin; Valli, Roberto; Ejaz, Ayesha; Vattathil, Selina; Maserati, Emanuela; Houlden, Henry; Taanman, Jan-Willem; Schapira, Anthony H.

2017-01-01

Potential bias introduced during DNA isolation is inadequately explored, although it could have significant impact on downstream analysis. To investigate this in human brain, we isolated DNA from cerebellum and frontal cortex using spin columns under different conditions, and salting-out. We first analysed DNA using array CGH, which revealed a striking wave pattern suggesting primarily GC-rich cerebellar losses, even against matched frontal cortex DNA, with a similar pattern on a SNP array. The aCGH changes varied with the isolation protocol. Droplet digital PCR of two genes also showed protocol-dependent losses. Whole genome sequencing showed GC-dependent variation in coverage with spin column isolation from cerebellum. We also extracted and sequenced DNA from substantia nigra using salting-out and phenol / chloroform. The mtDNA copy number, assessed by reads mapping to the mitochondrial genome, was higher in substantia nigra when using phenol / chloroform. We thus provide evidence for significant method-dependent bias in DNA isolation from human brain, as reported in rat tissues. This may contribute to array “waves”, and could affect copy number determination, particularly if mosaicism is being sought, and sequencing coverage. Variations in isolation protocol may also affect apparent mtDNA abundance. PMID:28683077
Sequence dependency of canonical base pair opening in the DNA double helix

PubMed Central

Villa, Alessandra

2017-01-01

The flipping-out of a DNA base from the double helical structure is a key step of many cellular processes, such as DNA replication, modification and repair. Base pair opening is the first step of base flipping and the exact mechanism is still not well understood. We investigate sequence effects on base pair opening using extensive classical molecular dynamics simulations targeting the opening of 11 different canonical base pairs in two DNA sequences. Two popular biomolecular force fields are applied. To enhance sampling and calculate free energies, we bias the simulation along a simple distance coordinate using a newly developed adaptive sampling algorithm. The simulation is guided back and forth along the coordinate, allowing for multiple opening pathways. We compare the calculated free energies with those from an NMR study and check assumptions of the model used for interpreting the NMR data. Our results further show that the neighboring sequence is an important factor for the opening free energy, but also indicate that other sequence effects may play a role. All base pairs are observed to have a propensity for opening toward the major groove. The preferred opening base is cytosine for GC base pairs, while for AT there is sequence dependent competition between the two bases. For AT opening, we identify two non-canonical base pair interactions contributing to a local minimum in the free energy profile. For both AT and CG we observe long-lived interactions with water and with sodium ions at specific sites on the open base pair. PMID:28369121
A Single Transcriptome of a Green Toad (Bufo viridis) Yields Candidate Genes for Sex Determination and -Differentiation and Non-Anonymous Population Genetic Markers

PubMed Central

Gerchen, Jörn F.; Reichert, Samuel J.; Röhr, Johannes T.; Dieterich, Christoph; Kloas, Werner

2016-01-01

Large genome size, including immense repetitive and non-coding fractions, still presents challenges for capacity, bioinformatics and thus affordability of whole genome sequencing in most amphibians. Here, we test the performance of a single transcriptome to understand whether it can provide a cost-efficient resource for species with large unknown genomes. Using RNA from six different tissues from a single Palearctic green toad (Bufo viridis) specimen and HiSeq 2000, we obtained 22.5 million reads and publish >100,000 unigene sequences. To evaluate efficacy and quality, we first use this data to identify green toad specific candidate genes, known from other vertebrates for their role in sex determination and differentiation. Of a list of 37 genes, the transcriptome yielded 32 (87%), many of which provide the first such data for this non-model anuran species. However, for many of these genes, only fragments could be retrieved. In order to allow also applications to population genetics, we further used the transcriptome for the targeted development of 21 non-anonymous microsatellites and tested them in genetic families and backcrosses. Eleven markers were specifically developed to be located on the B. viridis sex chromosomes; for eight markers we can indeed demonstrate sex-specific transmission in genetic families. Depending on phylogenetic distance, several markers, which are sex-linked in green toads, show high cross-amplification success across the anuran phylogeny, involving nine systematic anuran families. Our data support the view that single transcriptome sequencing (based on multiple tissues) provides a reliable genomic resource and cost-efficient method for non-model amphibian species with large genome size and, despite limitations, should be considered as long as genome sequencing remains unaffordable for most species. PMID:27232626
DOE Office of Scientific and Technical Information (OSTI.GOV)

Imamura, Yasuhiro, E-mail: yimamura@po.mdu.ac.jp; Wang, Pao-Li; Masuno, Kazuya

Histatins are salivary proteins with antimicrobial activities. We previously reported that histatin 3 binds to heat shock cognate protein 70 (HSC70), which is constitutively expressed, stimulates DNA synthesis, and promotes human gingival fibroblast (HGF) survival. However, the underlying mechanisms of histatin 3 remain largely unknown. Here, we found that the KRHH sequence of histatin 3 at amino acid positions 5–8 was essential for enhancing the binding of p27Kip1 (a cyclin-dependent kinase inhibitor) to HSC70, which occurred in a dose-dependent manner; histatin 3 enhanced the binding between p27Kip1 and HSC70 during the G1/S transition of HGFs, as opposed to histatin 3-M(5–8) (histatin 3 in which KRHH is replaced by EEDD). Histatin 3, but not histatin 3-M(5–8), stimulated DNA synthesis and promoted HGF survival. Histatin 3 dose-dependently enhanced both p27Kip1 and HSC70 ubiquitination, whereas histatin 3-M(5–8) did not. These findings provide further evidence that histatin 3 may be involved in the regulation of cell proliferation, particularly during G1/S transition, via the ubiquitin–proteasome system of p27Kip1 and HSC70. Highlights: • KRHH amino acid sequence was required in histatin 3 to bind HSC70. • Histatin 3 enhanced HSC70 binding to p27Kip1 during the G1/S transition in HGFs. • KRHH sequence stimulated DNA synthesis and promoted cell survival. • Histatin 3 dose-dependently enhanced both p27Kip1 and HSC70 ubiquitination. • Histatin 3 stimulates cell proliferation via the ubiquitin–proteasome system.
Identification of 47 novel mutations in patients with Alport syndrome and thin basement membrane nephropathy.

PubMed

Weber, Stefanie; Strasser, Katja; Rath, Sabine; Kittke, Achim; Beicht, Sonja; Alberer, Martin; Lange-Sperandio, Bärbel; Hoyer, Peter F; Benz, Marcus R; Ponsel, Sabine; Weber, Lutz T; Klein, Hanns-Georg; Hoefele, Julia

2016-06-01

Alport syndrome (ATS) is a progressive hereditary nephropathy characterized by hematuria and proteinuria. It can be associated with extrarenal manifestations. In contrast, thin basement membrane nephropathy (TBMN) is characterized by microscopic hematuria, is largely asymptomatic, and is rarely associated with proteinuria and end-stage renal disease. Mutations have been identified in the COL4A5 gene in ATS and in the COL4A3 and COL4A4 genes in ATS and TBMN. To date, more than 1000 different mutations in COL4A5, COL4A3, and COL4A4 are known. In this study mutational analysis by exon sequencing and multiplex ligation-dependent probe amplification was performed in a large European cohort of families with ATS and TBMN. Molecular diagnostic testing of 216 individuals led to the detection of 47 novel mutations, thereby expanding the spectrum of known mutations causing ATS and TBMN by up to 10 and 6%, respectively, depending on the database. Remarkably, a high number of ATS patients with only single mutations in COL4A3 and COL4A4 were identified. Additionally, three ATS patients presented with synonymous sequence variants that possibly affect correct mRNA splicing, as suggested by in silico analysis. The results of this study clearly broaden the genotypic spectrum of known mutations for ATS and TBMN, which will in turn now facilitate future studies into genotype-phenotype correlations. Further studies should also examine the significance of single heterozygous mutations in COL4A3 and COL4A4 and of synonymous sequence variants associated with ATS.
Regulation of polycystin-1 ciliary trafficking by motifs at its C-terminus and polycystin-2 but not by cleavage at the GPS site

PubMed Central

Su, Xuefeng; Wu, Maoqing; Yao, Gang; El-Jouni, Wassim; Luo, Chong; Tabari, Azadeh; Zhou, Jing

2015-01-01

Failure to localize membrane proteins to the primary cilium causes a group of diseases collectively named ciliopathies. Polycystin-1 (PC1, also known as PKD1) is a large ciliary membrane protein defective in autosomal dominant polycystic kidney disease (ADPKD). Here, we developed a large set of PC1 expression constructs and identified multiple sequences, including a coiled-coil motif in the C-terminal tail of PC1, regulating full-length PC1 trafficking to the primary cilium. Ciliary trafficking of wild-type and mutant PC1 depends on the dose of polycystin-2 (PC2, also known as PKD2), and the formation of a PC1–PC2 complex. Modulation of the ciliary trafficking module mediated by the VxP ciliary-targeting sequence and Arf4 and Asap1 does not affect the ciliary localization of full-length PC1. PC1 also promotes PC2 ciliary trafficking. PC2 mutations truncating its C-terminal tail but not those changing the VxP sequence to AxA or impairing the pore of the channel, leading to a dead channel, affect PC1 ciliary trafficking. Cleavage at the GPCR proteolytic site (GPS) of PC1 is not required for PC1 trafficking to cilia. We propose a mutually dependent model for the ciliary trafficking of PC1 and PC2, and that PC1 ciliary trafficking is regulated by multiple cis-acting elements. As all pathogenic PC1 mutations tested here are defective in ciliary trafficking, ciliary trafficking might serve as a functional read-out for ADPKD. PMID:26430213
Regulation of polycystin-1 ciliary trafficking by motifs at its C-terminus and polycystin-2 but not by cleavage at the GPS site.

PubMed

Su, Xuefeng; Wu, Maoqing; Yao, Gang; El-Jouni, Wassim; Luo, Chong; Tabari, Azadeh; Zhou, Jing

2015-11-15

Failure to localize membrane proteins to the primary cilium causes a group of diseases collectively named ciliopathies. Polycystin-1 (PC1, also known as PKD1) is a large ciliary membrane protein defective in autosomal dominant polycystic kidney disease (ADPKD). Here, we developed a large set of PC1 expression constructs and identified multiple sequences, including a coiled-coil motif in the C-terminal tail of PC1, regulating full-length PC1 trafficking to the primary cilium. Ciliary trafficking of wild-type and mutant PC1 depends on the dose of polycystin-2 (PC2, also known as PKD2), and the formation of a PC1-PC2 complex. Modulation of the ciliary trafficking module mediated by the VxP ciliary-targeting sequence and Arf4 and Asap1 does not affect the ciliary localization of full-length PC1. PC1 also promotes PC2 ciliary trafficking. PC2 mutations truncating its C-terminal tail but not those changing the VxP sequence to AxA or impairing the pore of the channel, leading to a dead channel, affect PC1 ciliary trafficking. Cleavage at the GPCR proteolytic site (GPS) of PC1 is not required for PC1 trafficking to cilia. We propose a mutually dependent model for the ciliary trafficking of PC1 and PC2, and that PC1 ciliary trafficking is regulated by multiple cis-acting elements. As all pathogenic PC1 mutations tested here are defective in ciliary trafficking, ciliary trafficking might serve as a functional read-out for ADPKD. © 2015. Published by The Company of Biologists Ltd.
Are quantitative trait-dependent sampling designs cost-effective for analysis of rare and common variants?

PubMed

Yilmaz, Yildiz E; Bull, Shelley B

2011-11-29

Use of trait-dependent sampling designs in whole-genome association studies of sequence data can reduce total sequencing costs with modest losses of statistical efficiency. In a quantitative trait (QT) analysis of data from the Genetic Analysis Workshop 17 mini-exome for unrelated individuals in the Asian subpopulation, we investigate alternative designs that sequence only 50% of the entire cohort. In addition to a simple random sampling design, we consider extreme-phenotype designs that are of increasing interest in genetic association analysis of QTs, especially in studies concerned with the detection of rare genetic variants. We also evaluate a novel sampling design in which all individuals have a nonzero probability of being selected into the sample but in which individuals with extreme phenotypes have a proportionately larger probability. We take differential sampling of individuals with informative trait values into account by inverse probability weighting using standard survey methods which thus generalizes to the source population. In replicate 1 data, we applied the designs in association analysis of Q1 with both rare and common variants in the FLT1 gene, based on knowledge of the generating model. Using all 200 replicate data sets, we similarly analyzed Q1 and Q4 (which is known to be free of association with FLT1) to evaluate relative efficiency, type I error, and power. Simulation study results suggest that the QT-dependent selection designs generally yield greater than 50% relative efficiency compared to using the entire cohort, implying cost-effectiveness of 50% sample selection and worthwhile reduction of sequencing costs.
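
The inverse probability weighting mentioned in the abstract above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the quartile cut-offs, and the selection probabilities (0.8 for extreme phenotypes, 0.2 otherwise, chosen so that roughly half the cohort is sampled) are hypothetical and are not the design parameters of the study, and the simulated trait values are not Genetic Analysis Workshop 17 data.

```python
import random


def selection_probability(trait, lower_cut, upper_cut, p_extreme=0.8, p_middle=0.2):
    """Hypothetical design: extreme phenotypes are more likely to be sequenced,
    but every individual keeps a nonzero chance of selection."""
    is_extreme = trait <= lower_cut or trait >= upper_cut
    return p_extreme if is_extreme else p_middle


def ipw_mean(values, probabilities):
    """Weighted mean with weights 1/p, which corrects for unequal sampling."""
    weights = [1.0 / p for p in probabilities]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
cohort = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # simulated trait values
ranked = sorted(cohort)
lower_cut = ranked[len(ranked) // 4]       # approximate lower quartile
upper_cut = ranked[3 * len(ranked) // 4]   # approximate upper quartile

sampled_traits, sampled_probs = [], []
for trait in cohort:
    p = selection_probability(trait, lower_cut, upper_cut)
    if rng.random() < p:  # individual is selected for sequencing
        sampled_traits.append(trait)
        sampled_probs.append(p)

print("fraction sampled:", len(sampled_traits) / len(cohort))
print("cohort mean:     ", sum(cohort) / len(cohort))
print("weighted mean:   ", ipw_mean(sampled_traits, sampled_probs))
```

The same weights would enter a weighted regression of the trait on genotype; the mean is used here only because it keeps the example short.
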
SU-G-IeP1-08: MR Geometric Distortion Dependency On Imaging Sequence, Acquisition Orientation and Receiver Bandwidth of a Dedicated 1.5T MR-Simulator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Law, M; Yuan, J; Wong, O

Purpose: To investigate the 3D geometric distortion of four potential MR sequences for radiotherapy applications, and its dependency on sequence-type, acquisition-orientation and receiver-bandwidth from a dedicated 1.5T 700mm-wide bore MR-simulator (Magnetom-Aera, Siemens Healthcare, Erlangen, Germany), using a large customized geometric accuracy phantom. Methods: This work studied 3D gradient-echo (VIBE) and spin-echo (SPACE) sequences for anatomical imaging; a specific ultra-short-TE sequence (PETRA) potentially for bone imaging and MR-based dosimetry; and a motion-insensitive sequence (BLADE) for dynamic applications like 4D-MRI. Integrated geometric-correction was employed, three orthogonal acquisition-orientations and up to three receiver-bandwidths were used, yielding 27 acquisitions for testing (Table 1a). A customized geometric accuracy phantom (polyurethane, MR/CT invisible, W×L×H: 55×55×32.5 cm³) was constructed and filled with 3892 spherical markers (6mm diameter, MR/CT visible) arranged on a 25mm-interval 3D isotropic-grid (Fig. 1). The marker positions in MR images were quantitatively calculated and compared against those in the CT-reference using customized MATLAB scripts. Results: The average distortion within various diameter-of-spherical-volumes (DSVs) and the usable DSVs under various distortion limits were measured (Tables 1b-c). It was observed that distortions fluctuated when sequence-type, acquisition-orientation or receiver-bandwidth changed (e.g. within 300mm-DSV, the lowest/highest average distortions of VIBE were 0.40mm/0.59mm, a 47.5% difference). According to AAPM-TG66 (<1mm distortion, left-most column of Table 1c), PETRA (Largest-DSV: 253.9mm) has potential for brain treatment, while BLADE (Largest-DSV: 207.2mm) may need improvement for thoracic/abdominal applications. The results of VIBE (Largest-DSV: 294.3mm, the best among tested acquisitions) and SPACE (Largest-DSV: 267.7mm) suggest their potential for head and neck applications. These Largest-DSVs were attained on different acquisition-orientations and receiver-bandwidths. Conclusion: Geometric distortion was shown to be dependent on sequence-type, acquisition-orientation and receiver-bandwidth. In the experiment, no configuration in any one of these factors could consistently reduce distortion while the others were varying. The distortion analysis result is a valuable guideline for sequence selection and optimization for MR-aided radiotherapy applications.
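
The kind of marker comparison described in the abstract above can be sketched as follows. This is an illustrative sketch, not the authors' MATLAB analysis: the function names are hypothetical, the three marker coordinates are made up, and distortion is assumed here to mean the Euclidean distance between a marker's MR position and its CT reference position, with the DSV taken as a sphere centred on the isocenter.

```python
import math


def distortion_mm(mr_position, ct_position):
    """Euclidean displacement between a marker in MR and in the CT reference."""
    return math.dist(mr_position, ct_position)


def mean_distortion_within_dsv(mr_positions, ct_positions, dsv_mm,
                               isocenter=(0.0, 0.0, 0.0)):
    """Average distortion over markers whose CT position lies inside the DSV."""
    radius = dsv_mm / 2.0
    distortions = [
        distortion_mm(mr, ct)
        for mr, ct in zip(mr_positions, ct_positions)
        if math.dist(ct, isocenter) <= radius
    ]
    if not distortions:
        raise ValueError("no markers inside the requested DSV")
    return sum(distortions) / len(distortions)


def percent_difference(lowest, highest):
    """Relative increase of the highest value over the lowest, in percent."""
    return 100.0 * (highest - lowest) / lowest


# Made-up marker coordinates in mm (NOT data from the study).
ct_markers = [(0.0, 0.0, 0.0), (25.0, 0.0, 0.0), (200.0, 0.0, 0.0)]
mr_markers = [(0.0, 0.0, 0.3), (25.4, 0.0, 0.3), (203.0, 0.0, 0.0)]

print(mean_distortion_within_dsv(mr_markers, ct_markers, dsv_mm=300.0))
# The 0.40 mm and 0.59 mm averages quoted in the abstract:
print(percent_difference(0.40, 0.59))
```

With this definition of percent difference, the 0.40 mm and 0.59 mm averages reproduce the 47.5% figure given in the abstract.
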
Using local chromatin structure to improve CRISPR/Cas9 efficiency in zebrafish.

PubMed

Chen, Yunru; Zeng, Shiyang; Hu, Ruikun; Wang, Xiangxiu; Huang, Weilai; Liu, Jiangfang; Wang, Luying; Liu, Guifen; Cao, Ying; Zhang, Yong

2017-01-01

Although CRISPR/Cas9 has been successfully applied in zebrafish, considerable variations in efficiency have been observed for different gRNAs. The workload and cost of zebrafish mutant screening are largely dependent on the mutation rate of injected embryos; therefore, selecting more effective gRNAs is especially important for zebrafish mutant construction. Besides the sequence features, local chromatin structures may have effects on CRISPR/Cas9 efficiency, which remain largely unexplored. In the only related study in zebrafish, nucleosome organization was not found to have an effect on CRISPR/Cas9 efficiency, which is inconsistent with recent studies in vitro and in mammalian cell lines. To understand the effects of local chromatin structure on CRISPR/Cas9 efficiency in zebrafish, we first determined that CRISPR/Cas9 introduced genome editing mainly before the dome stage. Based on this observation, we reanalyzed our published nucleosome organization profiles and generated chromatin accessibility profiles in the 256-cell and dome stages using ATAC-seq technology. Our study demonstrated that chromatin accessibility showed positive correlation with CRISPR/Cas9 efficiency, but we did not observe a clear correlation between nucleosome organization and CRISPR/Cas9 efficiency. We constructed an online database for zebrafish gRNA selection based on local chromatin structure features that could prove beneficial to zebrafish homozygous mutant construction via CRISPR/Cas9.
Statistical context shapes stimulus-specific adaptation in human auditory cortex

PubMed Central

Henry, Molly J.; Fromboluti, Elisa Kim; McAuley, J. Devin

2015-01-01

Stimulus-specific adaptation is the phenomenon whereby neural response magnitude decreases with repeated stimulation. Inconsistencies between recent nonhuman animal recordings and computational modeling suggest dynamic influences on stimulus-specific adaptation. The present human electroencephalography (EEG) study investigates the potential role of statistical context in dynamically modulating stimulus-specific adaptation by examining the auditory cortex-generated N1 and P2 components. As in previous studies of stimulus-specific adaptation, listeners were presented with oddball sequences in which the presentation of a repeated tone was infrequently interrupted by rare spectral changes taking on three different magnitudes. Critically, the statistical context varied with respect to the probability of small versus large spectral changes within oddball sequences (half of the time a small change was most probable; in the other half a large change was most probable). We observed larger N1 and P2 amplitudes (i.e., release from adaptation) for all spectral changes in the small-change compared with the large-change statistical context. The increase in response magnitude also held for responses to tones presented with high probability, indicating that statistical adaptation can overrule stimulus probability per se in its influence on neural responses. Computational modeling showed that the degree of coadaptation in auditory cortex changed depending on the statistical context, which in turn affected stimulus-specific adaptation. Thus the present data demonstrate that stimulus-specific adaptation in human auditory cortex critically depends on statistical context. Finally, the present results challenge the implicit assumption of stationarity of neural response magnitudes that governs the practice of isolating established deviant-detection responses such as the mismatch negativity. PMID:25652920
Adaptive oscillator networks with conserved overall coupling: Sequential firing and near-synchronized states

NASA Astrophysics Data System (ADS)

Picallo, Clara B.; Riecke, Hermann

2011-03-01

Motivated by recent observations in neuronal systems we investigate all-to-all networks of nonidentical oscillators with adaptive coupling. The adaptation models spike-timing-dependent plasticity in which the sum of the weights of all incoming links is conserved. We find multiple phase-locked states that fall into two classes: near-synchronized states and splay states. Among the near-synchronized states are states that oscillate with a frequency that depends only very weakly on the coupling strength and is essentially given by the frequency of one of the oscillators, which is, however, neither the fastest nor the slowest oscillator. In sufficiently large networks the adaptive coupling is found to develop effective network topologies dominated by one or two loops. This results in a multitude of stable splay states, which differ in their firing sequences. With increasing coupling strength their frequency increases linearly and the oscillators become less synchronized. The essential features of the two classes of states are captured analytically in perturbation analyses of the extended Kuramoto model used in the simulations.
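
To make the ingredients named in the abstract above concrete, here is a minimal sketch of a weighted Kuramoto update together with a renormalization that keeps the sum of each oscillator's incoming weights fixed. It is an assumption-laden illustration: the function names, the Euler time stepping, and the example numbers are hypothetical, and the authors' actual plasticity rule (which is not given in the abstract) is not reproduced; only the conservation constraint is shown.

```python
import math


def kuramoto_step(phases, frequencies, weights, dt):
    """One explicit Euler step of
    dtheta_i/dt = omega_i + sum_j K_ij * sin(theta_j - theta_i)."""
    n = len(phases)
    updated = []
    for i in range(n):
        coupling = sum(
            weights[i][j] * math.sin(phases[j] - phases[i])
            for j in range(n) if j != i
        )
        updated.append(phases[i] + dt * (frequencies[i] + coupling))
    return updated


def renormalize_incoming(weights, total_incoming):
    """Rescale each row so the incoming weights of oscillator i sum to
    total_incoming. Assumes every row has a positive off-diagonal sum."""
    normalized = []
    for i, row in enumerate(weights):
        row_sum = sum(w for j, w in enumerate(row) if j != i)
        normalized.append([
            0.0 if j == i else total_incoming * w / row_sum
            for j, w in enumerate(row)
        ])
    return normalized


def order_parameter(phases):
    """Kuramoto order parameter: near 1 when synchronized, 0 for an even splay."""
    n = len(phases)
    real = sum(math.cos(p) for p in phases) / n
    imag = sum(math.sin(p) for p in phases) / n
    return math.hypot(real, imag)


# Hypothetical example: three nonidentical oscillators, all-to-all coupling.
frequencies = [0.9, 1.0, 1.1]
weights = renormalize_incoming([[0, 1, 3], [2, 0, 2], [1, 1, 0]], total_incoming=1.0)
phases = [0.0, 1.0, 2.0]
for _ in range(1000):
    phases = kuramoto_step(phases, frequencies, weights, dt=0.01)
print("order parameter after integration:", order_parameter(phases))
```

In an adaptive version, the weights would be changed by a plasticity rule after each step and then passed through `renormalize_incoming` again, so that the total incoming coupling stays conserved.
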
Quantification of the effects of eustasy, subsidence, and sediment supply on Miocene sequences, mid-Atlantic margin of the United States

USGS Publications Warehouse

Browning, J.V.; Miller, K.G.; McLaughlin, P.P.; Kominz, M.A.; Sugarman, P.J.; Monteverde, D.; Feigenson, M.D.; Hernandez, J.C.

2006-01-01

We use backstripping to quantify the roles of variations in global sea level (eustasy), subsidence, and sediment supply on the development of the Miocene stratigraphic record of the mid-Atlantic continental margin of the United States (New Jersey, Delaware, and Maryland). Eustasy is a primary influence on sequence patterns, determining the global template of sequences (i.e., times when sequences can be preserved) and explaining similarities in Miocene sequence architecture on margins throughout the world. Sequences can be correlated throughout the mid-Atlantic region with Sr-isotopic chronology (±0.6 m.y. to ±1.2 m.y.). Eight Miocene sequences correlate regionally and can be correlated to global δ18O increases, indicating glacioeustatic control. This margin is dominated by passive subsidence with little evidence for active tectonic overprints, except possibly in Maryland during the early Miocene. However, early Miocene sequences in New Jersey and Delaware display a patchwork distribution that is attributable to minor (tens of meters) intervals of excess subsidence. Backstripping quantifies that excess subsidence began in Delaware at ca. 21 Ma and continued until 12 Ma, with maximum rates from ca. 21-16 Ma. We attribute this enhanced subsidence to local flexural response to the progradation of thick sequences offshore and adjacent to this area. Removing this excess subsidence in Delaware yields a record that is remarkably similar to New Jersey eustatic estimates. We conclude that sea-level rise and fall is a first-order control on accommodation providing similar timing on all margins to the sequence record. Tectonic changes due to movement of the crust can overprint the record, resulting in large gaps in the stratigraphic record. Smaller differences in sequences can be attributed to local flexural loading effects, particularly in regions experiencing large-scale progradation. © 2006 Geological Society of America.
Unusual respiratory capacity and nitrogen metabolism in a Parcubacterium (OD1) of the Candidate Phyla Radiation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Castelle, Cindy J.; Brown, Christopher T.; Thomas, Brian C.

The Candidate Phyla Radiation (CPR) is a large group of bacteria, the scale of which approaches that of all other bacteria. CPR organisms are inferred to depend on other community members for many basic cellular building blocks and all appear to be obligate anaerobes. To date, there has been no evidence for any significant respiratory capacity in an organism from this radiation. Here we report a curated draft genome for 'Candidatus Parcunitrobacter nitroensis', a member of the Parcubacteria (OD1) superphylum of the CPR. The genome encodes versatile energy pathways, including fermentative and respiratory capacities, nitrogen and fatty acid metabolism, as well as the first complete electron transport chain described for a member of the CPR. The sequences of all of these enzymes are highly divergent from sequences found in other organisms, suggesting that these capacities were not recently acquired from non-CPR organisms. Although the wide respiration-based repertoire points to a different lifestyle compared to other CPR bacteria, we predict similar obligate dependence on other organisms or the microbial community. The results substantially expand the known metabolic potential of CPR bacteria, although sequence comparisons indicate that these capacities are very rare in members of this radiation.
Unusual respiratory capacity and nitrogen metabolism in a Parcubacterium (OD1) of the Candidate Phyla Radiation

DOE PAGES

Castelle, Cindy J.; Brown, Christopher T.; Thomas, Brian C.; ...

2017-01-09

The Candidate Phyla Radiation (CPR) is a large group of bacteria, the scale of which approaches that of all other bacteria. CPR organisms are inferred to depend on other community members for many basic cellular building blocks and all appear to be obligate anaerobes. To date, there has been no evidence for any significant respiratory capacity in an organism from this radiation. Here we report a curated draft genome for 'Candidatus Parcunitrobacter nitroensis', a member of the Parcubacteria (OD1) superphylum of the CPR. The genome encodes versatile energy pathways, including fermentative and respiratory capacities, nitrogen and fatty acid metabolism, as well as the first complete electron transport chain described for a member of the CPR. The sequences of all of these enzymes are highly divergent from sequences found in other organisms, suggesting that these capacities were not recently acquired from non-CPR organisms. Although the wide respiration-based repertoire points to a different lifestyle compared to other CPR bacteria, we predict similar obligate dependence on other organisms or the microbial community. The results substantially expand the known metabolic potential of CPR bacteria, although sequence comparisons indicate that these capacities are very rare in members of this radiation.
Evidence of protein-free homology recognition in magnetic bead force–extension experiments

PubMed Central

(O’) Lee, D. J.; Danilowicz, C.; Rochester, C.; Prentiss, M.

2016-01-01

Earlier theoretical studies have proposed that the homology-dependent pairing of large tracts of dsDNA may be due to physical interactions between homologous regions. Such interactions could contribute to the sequence-dependent pairing of chromosome regions that may occur in the presence or the absence of double-strand breaks. Several experiments have indicated the recognition of homologous sequences in pure electrolytic solutions without proteins. Here, we report single-molecule force experiments with a designed 60 kb long dsDNA construct, with one end attached to a solid surface and the other end to a magnetic bead. The 60 kb constructs contain two 10 kb long homologous tracts oriented head to head, so that their sequences match if the two tracts fold on each other. The distance between the bead and the surface is measured as a function of the force applied to the bead. At low forces, the construct molecules extend substantially less than normal control dsDNA, indicating the existence of a preferential interaction between the homologous regions. Increasing the force causes continuous, rather than abrupt, unfolding of the paired homologous regions. Simple semi-phenomenological models of the unfolding mechanics are proposed, and their predictions are compared with the data. PMID:27493568
The Conservation of Structure and Mechanism of Catalytic Action in a Family of Thiamin Pyrophosphate (TPP)-dependent Enzymes

NASA Technical Reports Server (NTRS)

Dominiak, P.; Ciszak, Ewa

2004-01-01

Thiamin pyrophosphate (TPP)-dependent enzymes are a divergent family of TPP and metal ion binding proteins that perform a wide range of functions with the common decarboxylation steps of a -(O=)C-C(OH)- fragment of alpha-ketoacids and alpha-hydroxyaldehydes. To determine how structure and catalytic action are conserved in the context of large sequence differences existing within this family of enzymes, we have carried out an analysis of TPP-dependent enzymes of known structures. The common structure of TPP-dependent enzymes is formed at the interface of four alpha/beta domains from at least two subunits, which provide for two metal and TPP-binding sites. Residues around these catalytic sites are conserved for functional purposes, while those further away from TPP are conserved for structural reasons. Together they provide a network of contacts required for flip-flop catalytic action within TPP-dependent enzymes. Thus our analysis defines a TPP-action motif that is proposed for annotating TPP-dependent enzymes for advancing functional proteomics.
Is sequence awareness mandatory for perceptual sequence learning: An assessment using a pure perceptual sequence learning design.

PubMed

Deroost, Natacha; Coomans, Daphné

2018-02-01

We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the target's colour that changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation-procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report to have actively applied their sequence knowledge during the experiment, participants were subsequently regrouped in a sequence strategy group (n = 14, of which 4 participants from the implicit instruction condition and 10 participants from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants from the implicit instruction condition and 5 participants from the explicit instruction condition). Only participants of the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.

A comprehensive evaluation of assembly scaffolding tools

PubMed Central

2014-01-01

Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics. Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data. Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity. PMID:24581555
Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search

NASA Technical Reports Server (NTRS)

Wheeler, Ward C.

2003-01-01

A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendent states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may be different from that of other alignment procedures. As with all alignment methods, results are dependent on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable since the homologies derived from the implied alignment depend on its natal cladogram and any variance, between DO and IA + Search, due to heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.
Integration of Host Strain Bioengineering and Bioprocess Development Using Ultra-Scale Down Studies to Select the Optimum Combination: An Antibody Fragment Primary Recovery Case Study

PubMed Central

Aucamp, Jean P; Davies, Richard; Hallet, Damien; Weiss, Amanda; Titchener-Hooker, Nigel J

2014-01-01

An ultra scale-down primary recovery sequence was established for a platform E. coli Fab production process. It was used to evaluate the process robustness of various bioengineered strains. Centrifugal discharge in the initial dewatering stage was determined to be the major cause of cell breakage. The ability of cells to resist breakage was dependent on a combination of factors including host strain, vector, and fermentation strategy. Periplasmic extraction studies were conducted in shake flasks and it was demonstrated that key performance parameters such as Fab titre and nucleic acid concentrations were mimicked. The shake flask system also captured particle aggregation effects seen in a large scale stirred vessel, reproducing the fine particle size distribution that impacts the final centrifugal clarification stage. Scale-down primary recovery process sequences can be used to screen a larger number of engineered strains. This can lead to closer integration with and better feedback between strain development, fermentation development, and primary recovery studies. Biotechnol. Bioeng. 2014;111: 1971–1981. © 2014 Wiley Periodicals, Inc. PMID:24838387
Recollection-dependent memory for event duration in large-scale spatial navigation

PubMed Central

Barense, Morgan D.

2017-01-01

Time and space represent two key aspects of episodic memories, forming the spatiotemporal context of events in a sequence. Little is known, however, about how temporal information, such as the duration and the order of particular events, is encoded into memory, and whether it matters if the memory representation is based on recollection or familiarity. To investigate this issue, we used a real world virtual reality navigation paradigm where periods of navigation were interspersed with pauses of different durations. Crucially, participants were able to reliably distinguish the durations of events that were subjectively “reexperienced” (i.e., recollected), but not of those that were familiar. This effect was not found in temporal order (ordinal) judgments. We also show that the active experience of the passage of time (holding down a key while waiting) moderately enhanced duration memory accuracy. Memory for event duration, therefore, appears to rely on the hippocampally supported ability to recollect or reexperience an event, enabling the reinstatement of both its duration and its spatial context, to distinguish it from other events in a sequence. In contrast, ordinal memory appears to rely on familiarity and recollection to a similar extent. PMID:28202714
Parental origin of sequence variants associated with complex diseases.

PubMed

Kong, Augustine; Steinthorsdottir, Valgerdur; Masson, Gisli; Thorleifsson, Gudmar; Sulem, Patrick; Besenbacher, Soren; Jonasdottir, Aslaug; Sigurdsson, Asgeir; Kristinsson, Kari Th; Jonasdottir, Adalbjorg; Frigge, Michael L; Gylfason, Arnaldur; Olason, Pall I; Gudjonsson, Sigurjon A; Sverrisson, Sverrir; Stacey, Simon N; Sigurgeirsson, Bardur; Benediktsdottir, Kristrun R; Sigurdsson, Helgi; Jonsson, Thorvaldur; Benediktsson, Rafn; Olafsson, Jon H; Johannsson, Oskar Th; Hreidarsson, Astradur B; Sigurdsson, Gunnar; Ferguson-Smith, Anne C; Gudbjartsson, Daniel F; Thorsteinsdottir, Unnur; Stefansson, Kari

2009-12-17

Effects of susceptibility variants may depend on the parent from which they are inherited. Although many associations between sequence variants and human traits have been discovered through genome-wide associations, the impact of parental origin has largely been ignored. Here we show that for 38,167 Icelanders genotyped using single nucleotide polymorphism (SNP) chips, the parental origin of most alleles can be determined. For this we used a combination of genealogy and long-range phasing. We then focused on SNPs that associate with diseases and are within 500 kilobases of known imprinted genes. Seven independent SNP associations were examined. Five (one with breast cancer, one with basal-cell carcinoma and three with type 2 diabetes) have parental-origin-specific associations. These variants are located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes. Furthermore, we observed a novel association between the SNP rs2334499 at 11p15 and type 2 diabetes. Here the allele that confers risk when paternally inherited is protective when maternally transmitted. We identified a differentially methylated CTCF-binding site at 11p15 and demonstrated correlation of rs2334499 with decreased methylation of that site.
RNAmutants: a web server to explore the mutational landscape of RNA secondary structures

PubMed Central

Waldispühl, Jerome; Devadas, Srinivas; Berger, Bonnie; Clote, Peter

2009-01-01

The history and mechanism of molecular evolution in DNA have been greatly elucidated by contributions from genetics, probability theory and bioinformatics—indeed, mathematical developments such as Kimura's neutral theory, Kingman's coalescent theory and efficient software such as BLAST, ClustalW, Phylip, etc., provide the foundation for modern population genetics. In contrast to DNA, the function of most noncoding RNA depends on tertiary structure, experimentally known to be largely determined by secondary structure, for which dynamic programming can efficiently compute the minimum free energy secondary structure. For this reason, understanding the effect of pointwise mutations in RNA secondary structure could reveal fundamental properties of structural RNA molecules and improve our understanding of molecular evolution of RNA. The web server RNAmutants provides several efficient tools to compute the ensemble of low-energy secondary structures for all k-mutants of a given RNA sequence, where k is bounded by a user-specified upper bound. As we have previously shown, these tools can be used to predict putative deleterious mutations and to analyze regulatory sequences from the hepatitis C and human immunodeficiency genomes. Web server is available at http://bioinformatics.bc.edu/clotelab/RNAmutants/, and downloadable binaries at http://rnamutants.csail.mit.edu/. PMID:19531740
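To give a sense of the search space behind "all k-mutants of a given RNA sequence", the following is a minimal counting sketch. It is not the RNAmutants algorithm; the function names `count_k_mutants` and `count_up_to_k` are hypothetical and introduced only for illustration. It assumes a four-letter alphabet, so each mutated position has 3 alternative bases.

```python
from math import comb


def count_k_mutants(length: int, k: int) -> int:
    """Number of sequences at Hamming distance exactly k from a sequence
    of the given length (hypothetical helper, 4-letter alphabet assumed)."""
    # Choose which k positions mutate, then one of 3 other bases at each.
    return comb(length, k) * 3 ** k


def count_up_to_k(length: int, k_max: int) -> int:
    """Number of sequences with at most k_max point mutations."""
    return sum(count_k_mutants(length, k) for k in range(k_max + 1))


if __name__ == "__main__":
    # Growth of the mutant space for an illustrative 100-nt sequence.
    for k in range(1, 6):
        print(k, count_k_mutants(100, k))
```

The count grows combinatorially with k, which is why a user-specified upper bound on k, as described in the abstract, keeps the computation tractable.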
Spectral shaping of spreading sequences as a mean to address the trade-off between narrowband and multi-access interferences in UWB systems

NASA Astrophysics Data System (ADS)

Mangia, Mauro; Pareschi, Fabio; Rovatti, Riccardo; Setti, Gianluca

This paper presents a way to cope with the need of simultaneously rejecting narrowband interference and multi-access interference in a UWB system based on direct-sequence CDMA. With this aim in mind, we rely on a closed-form expression of the system bit error probability in presence of both effects. By means of such a formula, we evaluate the effect of spectrum shaping techniques applied to the spreading sequences. The availability of a certain number of degrees of freedom in deciding the spectral profile allows us to cope with different configurations depending on the relative interfering power but also on the relative position of the signal center frequency and the narrowband interferer.
Action history influences subsequent movement via two distinct processes

PubMed Central

Poh, Eugene; de Rugy, Aymar

2017-01-01

The characteristics of goal-directed actions tend to resemble those of previously executed actions, but it is unclear whether such effects depend strictly on action history, or also reflect context-dependent processes related to predictive motor planning. Here we manipulated the time available to initiate movements after a target was specified, and studied the effects of predictable movement sequences, to systematically dissociate effects of the most recently executed movement from the movement required next. We found that directional biases due to recent movement history strongly depend upon movement preparation time, suggesting an important contribution from predictive planning. However predictive biases co-exist with an independent source of bias that depends only on recent movement history. The results indicate that past experience influences movement execution through a combination of temporally-stable processes that are strictly use-dependent, and dynamically-evolving and context-dependent processes that reflect prediction of future actions. PMID:29058670
Effect of Noise on DNA Sequencing via Transverse Electronic Transport

PubMed Central

Krems, Matt; Zwolak, Michael; Pershin, Yuriy V.; Di Ventra, Massimiliano

2009-01-01

Previous theoretical studies have shown that measuring the transverse current across DNA strands while they translocate through a nanopore or channel may provide a statistically distinguishable signature of the DNA bases, and may thus allow for rapid DNA sequencing. However, fluctuations of the environment, such as ionic and DNA motion, introduce important scattering processes that may affect the viability of this approach to sequencing. To understand this issue, we have analyzed a simple model that captures the role of this complex environment in electronic dephasing and its ability to remove charge carriers from current-carrying states. We find that these effects do not strongly influence the current distributions due to the off-resonant nature of tunneling through the nucleotides, a result we expect to be a common feature of transport in molecular junctions. In particular, only large scattering strengths, as compared to the energetic gap between the molecular states and the Fermi level, significantly alter the form of the current distributions. Since this gap itself is quite large, the current distributions remain protected from this type of noise, further supporting the possibility of using transverse electronic transport measurements for DNA sequencing. PMID:19804730
SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses.

PubMed

Vetrovský, Tomáš; Baldrian, Petr; Morais, Daniel; Berger, Bonnie

2018-02-14

Modern molecular methods have increased our ability to describe microbial communities. Along with the advances brought by new sequencing technologies, we now require intensive computational resources to make sense of the large numbers of sequences continuously produced. The software developed by the scientific community to address this demand, although very useful, requires experience of the command-line environment and extensive training, and has a steep learning curve, limiting its use. We created SEED 2, a graphical user interface for handling high-throughput amplicon-sequencing data under Windows operating systems. SEED 2 is the only sequence visualizer that empowers users with tools to handle amplicon-sequencing data of microbial community markers. It is suitable for any marker gene sequences obtained through Illumina, IonTorrent or Sanger sequencing. SEED 2 allows the user to process raw sequencing data, identify specific taxa, produce OTU tables, create sequence alignments and construct phylogenetic trees. Standard dual-core laptops with 8 GB of RAM can handle ca. 8 million Illumina PE 300 bp sequences, ca. 4 GB of data. SEED 2 was implemented in Object Pascal and uses internal functions and external software for amplicon data processing. SEED 2 is freeware, available at http://www.biomed.cas.cz/mbu/lbwrf/seed/ as a self-contained file, including all the dependencies, and does not require installation. Supplementary data contain a comprehensive list of supported functions. Contact: daniel.morais@biomed.cas.cz. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.
Effects of Main-Sequence Mass Loss on Stellar and Galactic Chemical Evolution.

NASA Astrophysics Data System (ADS)

Guzik, Joyce Ann

1988-06-01

L. A. Willson, G. H. Bowen and C. Struck-Marcell have proposed that 1 to 3 solar mass stars may experience evolutionarily significant mass loss during the early part of their main-sequence phase. The suggested mass-loss mechanism is pulsation, facilitated by rapid rotation. Initial mass-loss rates may be as large as several times 10^{-9} M⊙/yr, diminishing over several times 10^{8} years. We attempted to test this hypothesis by comparing some theoretical implications with observations. Three areas are addressed: solar models, cluster HR diagrams, and galactic chemical evolution. Mass-losing solar models were evolved that match the Sun's luminosity and radius at its present age. The most extreme viable models have initial mass 2.0 M⊙, and mass-loss rates decreasing exponentially over 2-3 × 10^{8} years. Compared to a constant-mass model, these models require a reduced initial ^4He abundance, have deeper envelope convection zones and higher ^8B neutrino fluxes. Early processing of present surface layers at higher interior temperatures increases the surface ^3He abundance, destroys Li, Be and B, and decreases the surface C/N ratio following first dredge-up. Evolution calculations incorporating main-sequence mass loss were completed for a grid of models with initial masses 1.25 to 2.0 M⊙ and mass-loss timescales 0.2 to 2.0 Gyr. Cluster HR diagrams synthesized with these models confirm the potential for the hypothesis to explain observed spreads or bifurcations in the upper main sequence, blue stragglers, anomalous giants, and poor fits of main-sequence turnoffs by standard isochrones. Simple closed galactic chemical evolution models were used to test the effects of main-sequence mass loss on the F and G dwarf distribution. Stars between 3.0 M⊙ and a metallicity-dependent lower mass are assumed to lose mass. The models produce a 30 to 60% increase in the stars to stars-plus-remnants ratio, with fewer early-F dwarfs and many more late-F dwarfs remaining on the main sequence to the present. The ratio of stars to stellar remnants and the white dwarf age distribution may prove valuable in distinguishing between explanations for the observed bimodal present-day stellar mass function.
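The figures quoted for the most extreme solar model can be checked for internal consistency with a short calculation. The sketch below assumes (this is not stated in the abstract) that the mass-loss rate follows a pure exponential, rate(t) = rate0 · exp(-t/τ), with τ read as the e-folding time of 2-3 × 10^8 years, a final mass of 1 M⊙, and a present solar age of about 4.6 × 10^9 years. The function name `initial_mass_loss_rate` is hypothetical.

```python
import math


def initial_mass_loss_rate(m_initial: float, m_final: float,
                           tau_yr: float, age_yr: float) -> float:
    """Initial rate (Msun/yr) for an assumed exponential decay law.

    Integrating rate0 * exp(-t / tau) from 0 to age gives the total mass
    lost: rate0 * tau * (1 - exp(-age / tau)). Solve that for rate0.
    """
    fraction_elapsed = 1.0 - math.exp(-age_yr / tau_yr)
    return (m_initial - m_final) / (tau_yr * fraction_elapsed)


if __name__ == "__main__":
    SOLAR_AGE_YR = 4.6e9  # assumed present age of the Sun
    for tau in (2e8, 3e8):
        rate = initial_mass_loss_rate(2.0, 1.0, tau, SOLAR_AGE_YR)
        print(f"tau = {tau:.0e} yr: initial rate = {rate:.2e} Msun/yr")
```

Under these assumptions the initial rate comes out at roughly 3 to 5 × 10^{-9} M⊙/yr, in line with the "several times 10^{-9} M⊙/yr" quoted in the abstract.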
Experimental Influences in the Accurate Measurement of Cartilage Thickness in MRI.

PubMed

Wang, Nian; Badar, Farid; Xia, Yang

2018-01-01

Objective: To study the experimental influences on the measurement of cartilage thickness by magnetic resonance imaging (MRI). Design: The complete thicknesses of healthy and trypsin-degraded cartilage were measured by high-resolution MRI under different conditions, using 2 intensity-based imaging sequences (ultra-short echo [UTE] and multislice-multiecho [MSME]) and 3 quantitative relaxation imaging sequences (T1, T2, and T1ρ). Other variables included different orientations in the magnet, 2 soaking solutions (saline and phosphate buffered saline [PBS]), and external loading. Results: With cartilage soaked in saline, UTE and T1 methods yielded complete and consistent measurement of cartilage thickness, while the thickness measurements by T2, T1ρ, and MSME methods were orientation dependent. The effect of external loading on cartilage thickness is also sequence and orientation dependent. All variations in cartilage thickness in MRI could be eliminated by soaking in 100 mM PBS or by imaging with the UTE sequence. Conclusions: The appearance of articular cartilage and the measurement accuracy of cartilage thickness in MRI can be influenced by a number of experimental factors in ex vivo MRI, from the use of various pulse sequences and soaking solutions to the health of the tissue. T2-based imaging sequences, both proton-intensity and quantitative relaxation sequences, similarly produced the largest variations. With adequate resolution, the accurate measurement of whole cartilage tissue in clinical MRI could be utilized to detect differences between healthy and osteoarthritic cartilage after compression.
Coevolutionary modeling of protein sequences: Predicting structure, function, and mutational landscapes

NASA Astrophysics Data System (ADS)

Weigt, Martin

Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods that unveil the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNAs show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach, called Direct-Coupling Analysis, to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References: [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt, "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211. [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt, "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction", Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932. [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, "Direct-coupling analysis of residue co-evolution captures native contacts across many protein families", Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011).
Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia

PubMed Central

Shortt, Jonathan A.; Card, Daren C.; Schield, Drew R.; Liu, Yang; Zhong, Bo; Castoe, Todd A.

2017-01-01

Background: In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. Methodology/Principal Findings: We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. Conclusions/Significance: This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50 to $200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for applying modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species and other parasitic helminths. PMID:28107347
StructRNAfinder: an automated pipeline and web server for RNA families prediction.

PubMed

Arias-Carrasco, Raúl; Vásquez-Morán, Yessenia; Nakaya, Helder I; Maracaja-Coutinho, Vinicius

2018-02-17

The function of many noncoding RNAs (ncRNAs) depends upon their secondary structures. Over the last decades, several methodologies have been developed to predict such structures or to use them to functionally annotate RNAs into RNA families. However, to fully perform this analysis, researchers must use multiple tools, which requires the constant parsing and processing of several intermediate files. This makes the large-scale prediction and annotation of RNAs a daunting task even for researchers with good computational or bioinformatics skills. We present an automated pipeline named StructRNAfinder that predicts and annotates RNA families in transcript or genome sequences. This single tool not only displays the sequence/structural consensus alignments for each RNA family, according to the Rfam database, but also provides a taxonomic overview for each assigned functional RNA. Moreover, we implemented a user-friendly web service that allows researchers to upload their own nucleotide sequences in order to perform the whole analysis. Finally, we provided a stand-alone version of StructRNAfinder to be used in large-scale projects. The tool was developed under the GNU General Public License (GPLv3) and is freely available at http://structrnafinder.integrativebioinformatics.me. The main advantage of StructRNAfinder lies in its large-scale processing and integration of the data obtained from each tool and database employed along the workflow; the resulting files are generated and displayed in user-friendly reports, useful for downstream analyses and data exploration.
Analyzing ion distributions around DNA: sequence-dependence of potassium ion distributions from microsecond molecular dynamics

PubMed Central

Pasi, Marco; Maddocks, John H.; Lavery, Richard

2015-01-01

Microsecond molecular dynamics simulations of B-DNA oligomers carried out in an aqueous environment with a physiological salt concentration enable us to perform a detailed analysis of how potassium ions interact with the double helix. The oligomers studied contain all 136 distinct tetranucleotides and we are thus able to make a comprehensive analysis of base sequence effects. Using a recently developed curvilinear helicoidal coordinate method we are able to analyze the details of ion populations and densities within the major and minor grooves and in the space surrounding DNA. The results show higher ion populations than have typically been observed in earlier studies and sequence effects that go beyond the nature of individual base pairs or base pair steps. We also show that, in some special cases, ion distributions converge very slowly and, on a microsecond timescale, do not reflect the symmetry of the corresponding base sequence. PMID:25662221
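The figure of 136 distinct tetranucleotides follows from strand symmetry: in double-stranded DNA a sequence and its reverse complement describe the same duplex fragment. The sketch below verifies the count by enumeration; the helper names `reverse_complement` and `distinct_duplex_kmers` are hypothetical and not taken from the paper.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def distinct_duplex_kmers(k: int) -> set:
    """All k-mers, counting a sequence and its reverse complement once."""
    representatives = set()
    for bases in product("ACGT", repeat=k):
        seq = "".join(bases)
        # Keep the alphabetically smaller of the two equivalent strands.
        representatives.add(min(seq, reverse_complement(seq)))
    return representatives


if __name__ == "__main__":
    tetra = distinct_duplex_kmers(4)
    palindromes = [s for s in tetra if s == reverse_complement(s)]
    print(len(tetra), len(palindromes))  # 136 16
```

Of the 4^4 = 256 tetranucleotides, 16 are their own reverse complement, so the number of distinct duplex tetranucleotides is (256 - 16) / 2 + 16 = 136. The same function gives the 10 distinct dinucleotide steps for k = 2.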
Rogue taxa phenomenon: a biological companion to simulation analysis

PubMed Central

Westover, Kristi M.; Rusinko, Joseph P.; Hoin, Jon; Neal, Matthew

2013-01-01

To provide a baseline biological comparison to simulation study predictions about the frequency of rogue taxa effects, we evaluated the frequency of a rogue taxa effect using viral data sets which differed in diversity. Using a quartet-tree framework, we measured the frequency of a rogue taxa effect in three data sets of increasing genetic variability (within viral serotype, between viral serotype, and between viral family) to test whether the rogue taxa was correlated with the mean sequence diversity of the respective data sets. We found a slight increase in the percentage of rogues as nucleotide diversity increased. Even though the number of rogues increased with diversity, the distribution of the types of rogues (friendly, crazy, or evil) did not depend on the diversity and in the case of the order-level data set the net rogue effect was slightly positive. This study, assessing frequency of the rogue taxa effect using biological data, indicated that simulation studies may over-predict the prevalence of the rogue taxa effect. Further investigations are necessary to understand which types of data sets are susceptible to a negative rogue effect and thus merit the removal of taxa from large phylogenetic reconstructions. PMID:23707704
Rogue taxa phenomenon: a biological companion to simulation analysis.

PubMed

Westover, Kristi M; Rusinko, Joseph P; Hoin, Jon; Neal, Matthew

2013-10-01

To provide a baseline biological comparison to simulation study predictions about the frequency of rogue taxa effects, we evaluated the frequency of a rogue taxa effect using viral data sets which differed in diversity. Using a quartet-tree framework, we measured the frequency of a rogue taxa effect in three data sets of increasing genetic variability (within viral serotype, between viral serotype, and between viral family) to test whether the rogue taxa was correlated with the mean sequence diversity of the respective data sets. We found a slight increase in the percentage of rogues as nucleotide diversity increased. Even though the number of rogues increased with diversity, the distribution of the types of rogues (friendly, crazy, or evil) did not depend on the diversity and in the case of the order-level data set the net rogue effect was slightly positive. This study, assessing frequency of the rogue taxa effect using biological data, indicated that simulation studies may over-predict the prevalence of the rogue taxa effect. Further investigations are necessary to understand which types of data sets are susceptible to a negative rogue effect and thus merit the removal of taxa from large phylogenetic reconstructions. Copyright © 2013 Elsevier Inc. All rights reserved.
GeneiASE: Detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information

PubMed Central

Edsgärd, Daniel; Iglesias, Maria Jesus; Reilly, Sarah-Jayne; Hamsten, Anders; Tornvall, Per; Odeberg, Jacob; Emanuelsson, Olof

2016-01-01

Allele-specific expression (ASE) is the imbalance in transcription between maternal and paternal alleles at a locus and can be probed in single individuals using massively parallel DNA sequencing technology. Assessing ASE within a single sample provides a static picture of the ASE, but the magnitude of ASE for a given transcript may vary between different biological conditions in an individual. Such condition-dependent ASE could indicate a genetic variation with a functional role in the phenotypic difference. We investigated ASE through RNA-sequencing of primary white blood cells from eight human individuals before and after the controlled induction of an inflammatory response, and detected condition-dependent and static ASE at 211 and 13021 variants, respectively. We developed a method, GeneiASE, to detect genes exhibiting static or condition-dependent ASE in single individuals. GeneiASE performed consistently over a range of read depths and ASE effect sizes, and did not require phasing of variants to estimate haplotypes. We observed condition-dependent ASE related to the inflammatory response in 19 genes, and static ASE in 1389 genes. Allele-specific expression was confirmed by validation of variants through real-time quantitative RT-PCR, with RNA-seq and RT-PCR ASE effect-size correlations r = 0.67 and r = 0.94 for static and condition-dependent ASE, respectively. PMID:26887787
A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium

PubMed Central

2014-01-01

We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed, for these and qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. PMID:25150838

Processing multiple non-adjacent dependencies: evidence from sequence learning

PubMed Central

de Vries, Meinou H.; Petersson, Karl Magnus; Geukes, Sebastian; Zwitserlood, Pienie; Christiansen, Morten H.

2012-01-01

Processing non-adjacent dependencies is considered to be one of the hallmarks of human language. Assuming that sequence-learning tasks provide a useful way to tap natural-language-processing mechanisms, we cross-modally combined serial reaction time and artificial-grammar learning paradigms to investigate the processing of multiple nested (A1A2A3B3B2B1) and crossed dependencies (A1A2A3B1B2B3), containing either three or two dependencies. Both reaction times and prediction errors highlighted problems with processing the middle dependency in nested structures (A1A2A3B3_B1), reminiscent of the ‘missing-verb effect’ observed in English and French, but not with crossed structures (A1A2A3B1_B3). Prior linguistic experience did not play a major role: native speakers of German and Dutch—which permit nested and crossed dependencies, respectively—showed a similar pattern of results for sequences with three dependencies. As for sequences with two dependencies, reaction times and prediction errors were similar for both nested and crossed dependencies. The results suggest that constraints on the processing of multiple non-adjacent dependencies are determined by the specific ordering of the non-adjacent dependencies (i.e. nested or crossed), as well as the number of non-adjacent dependencies to be resolved (i.e. two or three). Furthermore, these constraints may not be specific to language but instead derive from limitations on structured sequence learning. PMID:22688641
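The difference between the nested and crossed orderings described above can be made concrete with a small sketch. The helper names are hypothetical, and the code only reproduces the abstract sequence patterns, not the experimental stimuli used in the study.

```python
def nested(n):
    """Nested dependencies: A1..An followed by Bn..B1 (e.g. A1A2A3B3B2B1)."""
    a_items = [f"A{i}" for i in range(1, n + 1)]
    b_items = [f"B{i}" for i in range(n, 0, -1)]
    return a_items + b_items


def crossed(n):
    """Crossed dependencies: A1..An followed by B1..Bn (e.g. A1A2A3B1B2B3)."""
    a_items = [f"A{i}" for i in range(1, n + 1)]
    b_items = [f"B{i}" for i in range(1, n + 1)]
    return a_items + b_items


def dependency_distances(sequence):
    """Map each index i to the number of positions separating Ai from Bi."""
    n = len(sequence) // 2
    return {
        i: sequence.index(f"B{i}") - sequence.index(f"A{i}")
        for i in range(1, n + 1)
    }


print("".join(nested(3)))                # A1A2A3B3B2B1
print("".join(crossed(3)))               # A1A2A3B1B2B3
print(dependency_distances(nested(3)))   # {1: 5, 2: 3, 3: 1}
print(dependency_distances(crossed(3)))  # {1: 3, 2: 3, 3: 3}
```

In the nested ordering the distance between paired elements varies with position, whereas in the crossed ordering every pair is separated by the same distance.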
Effects of RNA integrity on transcript quantification by total RNA sequencing of clinically collected human placental samples.

PubMed

Reiman, Mario; Laan, Maris; Rull, Kristiina; Sõber, Siim

2017-08-01

RNA degradation is a ubiquitous process that occurs in living and dead cells, as well as during handling and storage of extracted RNA. Reduced RNA quality caused by degradation is an established source of uncertainty for all RNA-based gene expression quantification techniques. RNA sequencing is an increasingly preferred method for transcriptome analyses, and dependence of its results on input RNA integrity is of significant practical importance. This study aimed to characterize the effects of varying input RNA integrity [estimated as RNA integrity number (RIN)] on transcript level estimates and delineate the characteristic differences between transcripts that differ in degradation rate. The study used ribodepleted total RNA sequencing data from a real-life clinically collected set (n = 32) of human solid tissue (placenta) samples. RIN-dependent alterations in gene expression profiles were quantified by using DESeq2 software. Our results indicate that small differences in RNA integrity affect gene expression quantification by introducing a moderate and pervasive bias in expression level estimates that significantly affected 8.1% of studied genes. The rapidly degrading transcript pool was enriched in pseudogenes, short noncoding RNAs, and transcripts with extended 3' untranslated regions. Typical slowly degrading transcripts (median length, 2389 nt) represented protein coding genes with 4-10 exons and high guanine-cytosine content.-Reiman, M., Laan, M., Rull, K., Sõber, S. Effects of RNA integrity on transcript quantification by total RNA sequencing of clinically collected human placental samples. © FASEB.
Drinking from the Fire Hose: Why the Flight Management System Can Be Hard to Train and Difficult to Use

NASA Technical Reports Server (NTRS)

Sherry, Lance; Feary, Michael; Polson, Peter; Fennell, Karl

2003-01-01

The Flight Management Computer (FMC) and its interface, the Multi-function Control and Display Unit (MCDU), have been identified by researchers and airlines as difficult to train and use. Specifically, airline pilots have described the "drinking from the fire-hose" effect during training. Previous research has identified memorized action sequences as a major factor in a user's ability to learn and operate complex devices. This paper discusses the use of a method to examine the quantity of memorized action sequences required to perform a sample of 102 tasks, using features of the Boeing 777 Flight Management Computer Interface. The analysis identified a large number of memorized action sequences that must be learned during training and then recalled during line operations. Seventy-five percent of the tasks examined require recall of at least one memorized action sequence. Forty-five percent of the tasks require recall of a memorized action sequence and occur infrequently. The large number of memorized action sequences may provide an explanation for the difficulties in training and usage of the automation. Based on these findings, implications for training and the design of new user-interfaces are discussed.
Facile Recovery of Individual High-Molecular-Weight, Low-Copy-Number Natural Plasmids for Genomic Sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, L.E.; Detter, C.; Barrie, K.

2006-06-01

Sequencing of the large (>50 kb), low-copy-number (<5 per cell) plasmids that mediate horizontal gene transfer has been hindered by the difficulty and expense of isolating DNA from individual plasmids of this class. We report here that a kit method previously devised for purification of bacterial artificial chromosomes (BACs) can be adapted for effective preparation of individual plasmids up to 220 kb from wild gram-negative and gram-positive bacteria. Individual plasmid DNA recovered from less than 10 ml of Escherichia coli, Staphylococcus, and Corynebacterium cultures was of sufficient quantity and quality for construction of high-coverage libraries, as shown by sequencing five native plasmids ranging in size from 30 kb to 94 kb. We also report recommendations for vector screening to optimize plasmid sequence assembly, preliminary annotation of novel plasmid genomes, and insights on mobile genetic element biology derived from these sequences. Adaptation of this BAC method for large plasmid isolation removes one major technical hurdle to expanding our knowledge of the natural plasmid gene pool.
Nanowire-nanopore transistor sensor for DNA detection during translocation

NASA Astrophysics Data System (ADS)

Xie, Ping; Xiong, Qihua; Fang, Ying; Qing, Quan; Lieber, Charles

2011-03-01

Nanopore sequencing, a promising low-cost, high-throughput sequencing technique, was proposed more than a decade ago. Due to the incompatibility between the small ionic current signal and fast translocation speed, and the technical difficulties of large-scale integration of nanopores for direct ionic current sequencing, alternative methods that rely on integrated DNA sensors have been proposed, such as using capacitive coupling or tunnelling current. However, none of them has been experimentally demonstrated yet. Here we show that, for the first time, an amplified sensor signal has been experimentally recorded from a nanowire-nanopore field effect transistor sensor during DNA translocation. Independent multi-channel recording was also demonstrated for the first time. Our results suggest that the signal arises from a highly localized potential change caused by DNA translocation in a non-balanced buffer condition. Given that this method may produce a larger signal for smaller nanopores, we hope our experiment can be a starting point for a new generation of nanopore sequencing devices with larger signal, higher bandwidth and large-scale multiplexing capability, and finally realize the ultimate goal of low-cost, high-throughput sequencing.
Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

PubMed

Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

2017-01-01

Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.
Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

PubMed Central

Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

2017-01-01

Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096
The Challenges of Implementing Next Generation Sequencing Across a Large Healthcare System, and the Molecular Epidemiology and Antibiotic Susceptibilities of Carbapenemase-Producing Bacteria in the Healthcare System of the U.S. Department of Defense

PubMed Central

Lesho, Emil; Clifford, Robert; Onmus-Leone, Fatma; Appalla, Lakshmi; Snesrud, Erik; Kwak, Yoon; Ong, Ana; Maybank, Rosslyn; Waterman, Paige; Rohrbeck, Patricia; Julius, Michael; Roth, Amanda; Martinez, Joshua; Nielsen, Lindsey; Steele, Eric; McGann, Patrick; Hinkle, Mary

2016-01-01

Objective: We sought to: 1) provide an overview of the genomic epidemiology of an extensive collection of carbapenemase-producing bacteria (CPB) collected in the U.S. Department of Defense health system; 2) increase awareness of the public availability of the sequences, isolates, and customized antimicrobial resistance database of that system; and 3) illustrate challenges and offer mitigations for implementing next generation sequencing (NGS) across large health systems. Design: Prospective surveillance and system-wide implementation of NGS. Setting: 288-hospital healthcare network. Methods: All phenotypically carbapenem resistant bacteria underwent CarbaNP® testing and PCR, followed by NGS. Commercial (Newbler and Geneious), on-line (ResFinder), and open-source software (Btrim, FLASh, Bowtie2, and Samtools) were used for assembly, SNP detection and clustering. Laboratory capacity, throughput, and response time were assessed. Results: From 2009 through 2015, 27,000 multidrug-resistant Gram-negative isolates were submitted. 225 contained carbapenemase-encoding genes (most commonly blaKPC, blaNDM, and blaOXA23). These were found in 15 species from 146 inpatients in 19 facilities. Genetically related CPB were found in more than one hospital. Other clusters or outbreaks were not clonal and involved genetically related plasmids, while some involved several unrelated plasmids. Relatedness depended on the clustering algorithm used. Transmission patterns of plasmids and other mobile genetic elements could not be determined without ultra-long read, single-molecule real-time sequencing. 80% of carbapenem-resistant phenotypes retained susceptibility to aminoglycosides, and 70% retained susceptibility to fluoroquinolones. However, among the CPB-confirmed genotypes, fewer than 25% retained susceptibility to aminoglycosides or fluoroquinolones.
Conclusion: Although NGS is increasingly acclaimed to revolutionize clinical practice, resource-constrained environments, large or geographically dispersed healthcare networks, and military or government-funded public health laboratories are likely to encounter constraints and challenges as they implement NGS across their health systems. These include lack of standardized definitions and quality control metrics, limitations of short-read sequencing, insufficient bandwidth, and the current limited availability of very expensive and scarcely available sequencing platforms. Possible solutions and mitigations are also proposed. PMID:27196272
The Challenges of Implementing Next Generation Sequencing Across a Large Healthcare System, and the Molecular Epidemiology and Antibiotic Susceptibilities of Carbapenemase-Producing Bacteria in the Healthcare System of the U.S. Department of Defense.

PubMed

Lesho, Emil; Clifford, Robert; Onmus-Leone, Fatma; Appalla, Lakshmi; Snesrud, Erik; Kwak, Yoon; Ong, Ana; Maybank, Rosslyn; Waterman, Paige; Rohrbeck, Patricia; Julius, Michael; Roth, Amanda; Martinez, Joshua; Nielsen, Lindsey; Steele, Eric; McGann, Patrick; Hinkle, Mary

2016-01-01

We sought to: 1) provide an overview of the genomic epidemiology of an extensive collection of carbapenemase-producing bacteria (CPB) collected in the U.S. Department of Defense health system; 2) increase awareness of the public availability of the sequences, isolates, and customized antimicrobial resistance database of that system; and 3) illustrate challenges and offer mitigations for implementing next generation sequencing (NGS) across large health systems. Prospective surveillance and system-wide implementation of NGS. 288-hospital healthcare network. All phenotypically carbapenem resistant bacteria underwent CarbaNP® testing and PCR, followed by NGS. Commercial (Newbler and Geneious), on-line (ResFinder), and open-source software (Btrim, FLASh, Bowtie2, and Samtools) were used for assembly, SNP detection and clustering. Laboratory capacity, throughput, and response time were assessed. From 2009 through 2015, 27,000 multidrug-resistant Gram-negative isolates were submitted. 225 contained carbapenemase-encoding genes (most commonly blaKPC, blaNDM, and blaOXA23). These were found in 15 species from 146 inpatients in 19 facilities. Genetically related CPB were found in more than one hospital. Other clusters or outbreaks were not clonal and involved genetically related plasmids, while some involved several unrelated plasmids. Relatedness depended on the clustering algorithm used. Transmission patterns of plasmids and other mobile genetic elements could not be determined without ultra-long read, single-molecule real-time sequencing. 80% of carbapenem-resistant phenotypes retained susceptibility to aminoglycosides, and 70% retained susceptibility to fluoroquinolones. However, among the CPB-confirmed genotypes, fewer than 25% retained susceptibility to aminoglycosides or fluoroquinolones.
Although NGS is increasingly acclaimed to revolutionize clinical practice, resource-constrained environments, large or geographically dispersed healthcare networks, and military or government-funded public health laboratories are likely to encounter constraints and challenges as they implement NGS across their health systems. These include lack of standardized definitions and quality control metrics, limitations of short-read sequencing, insufficient bandwidth, and the current limited availability of very expensive and scarcely available sequencing platforms. Possible solutions and mitigations are also proposed.
Disbond Detection in Bonded Aluminum Joints Using Lamb Wave Amplitude and Time-of-Flight

NASA Technical Reports Server (NTRS)

Sun, Keun J.; Johnston, Patrick H.

1992-01-01

In recent years, there has been a need to develop efficient nondestructive integrity assessment techniques for large-area laminate structures, such as the detection of disbonds, cracks, and corrosion in the fuselage of an aircraft. Together with improving tomography and computer technologies, progress has been made in many fields of NDE towards faster inspection. Ultrasonically, the Lamb wave is considered a candidate for large-area inspections based on its capability of propagating a relatively long distance in thin plates and its media-thickness-dependent propagation properties. Moreover, the occurrence of disbonds, corrosion, and even cracks often results in a reduction of the effective thickness of a laminate. The idea is to assess the condition of a structure by sensing the response of propagating Lamb waves to these flaws over a long path length. A series of tests in the sequence of disbond, corrosion, and crack have been done on various types of specimens to investigate the feasibility of this approach. This paper will present some of the test results for disbond detection on aluminum lap splice joints.
Nulling Data Reduction and On-Sky Performance of the Large Binocular Telescope Interferometer

NASA Technical Reports Server (NTRS)

Defrere, D.; Hinz, P. M.; Mennesson, B.; Hoffman, W. F.; Millan-Gabet, R.; Skemer, A. J.; Bailey, V.; Danchi, W. C.; Downy, E. C.; Durney, O.;

2016-01-01

The Large Binocular Telescope Interferometer (LBTI) is a versatile instrument designed for high angular resolution and high-contrast infrared imaging (1.5-13 micrometers). In this paper, we focus on the mid-infrared (8-13 micrometers) nulling mode and present its theory of operation, data reduction, and on-sky performance as of the end of the commissioning phase in 2015 March. With an interferometric baseline of 14.4 m, the LBTI nuller is specifically tuned to resolve the habitable zone of nearby main-sequence stars, where warm exozodiacal dust emission peaks. Measuring the exozodi luminosity function of nearby main-sequence stars is a key milestone to prepare for future exo-Earth direct imaging instruments. Thanks to recent progress in wavefront control and phase stabilization, as well as in data reduction techniques, the LBTI demonstrated in 2015 February a calibrated null accuracy of 0.05% over a 3 hr long observing sequence on the bright nearby A3V star Beta Leo. This is equivalent to an exozodiacal disk density of 15-30 zodi for a Sun-like star located at 10 pc, depending on the adopted disk model. This result sets a new record for high-contrast mid-infrared interferometric imaging and opens a new window on the study of planetary systems.

Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens.

PubMed

Wood, Henry M; Belvedere, Ornella; Conway, Caroline; Daly, Catherine; Chalkley, Rebecca; Bickerdike, Melissa; McKinley, Claire; Egan, Phil; Ross, Lisa; Hayward, Bruce; Morgan, Joanne; Davidson, Leslie; MacLennan, Ken; Ong, Thian K; Papagiannopoulos, Kostas; Cook, Ian; Adams, David J; Taylor, Graham R; Rabbitts, Pamela

2010-08-01

The use of next-generation sequencing technologies to produce genomic copy number data has recently been described. Most approaches, however, rely on optimal starting DNA, and are therefore unsuitable for the analysis of formalin-fixed paraffin-embedded (FFPE) samples, which largely precludes the analysis of many tumour series. We have sought to challenge the limits of this technique with regards to quality and quantity of starting material and the depth of sequencing required. We confirm that the technique can be used to interrogate DNA from cell lines, fresh frozen material and FFPE samples to assess copy number variation. We show that as little as 5 ng of DNA is needed to generate a copy number karyogram, and follow this up with data from a series of FFPE biopsies and surgical samples. We have used various levels of sample multiplexing to demonstrate the adjustable resolution of the methodology, depending on the number of samples and available resources. We also demonstrate reproducibility by use of replicate samples and comparison with microarray-based comparative genomic hybridization (aCGH) and digital PCR. This technique can be valuable in both the analysis of routine diagnostic samples and in examining large repositories of fixed archival material.
GDC 2: Compression of large collections of genomes

PubMed Central

Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

2015-01-01

The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One of the significant side effects of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compression of large collections of complete genomic sequences. We propose an algorithm that is able to compress the collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by the other existing compressors. Moreover, our algorithm is very fast, as it processes the data at a speed of 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows storing the complete genomic collections at low cost, e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, which can be compared to about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279
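The compression figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which the abstract does not state, and uses the rounded sizes given there; the names are illustrative.

```python
# Rounded sizes taken from the abstract; decimal units are an assumption.
UNCOMPRESSED_BYTES = 6.7e12   # about 6.7 TB of uncompressed FASTA files
COMPRESSED_BYTES = 700e6      # about 700 MB after compression
GENOME_COUNT = 1092           # human diploid genomes in the collection


def compression_ratio(uncompressed, compressed):
    """Return how many times smaller the compressed data is."""
    return uncompressed / compressed


ratio = compression_ratio(UNCOMPRESSED_BYTES, COMPRESSED_BYTES)
per_genome_bytes = COMPRESSED_BYTES / GENOME_COUNT

print(round(ratio))             # 9571
print(round(per_genome_bytes))  # 641026
```

With these rounded inputs the ratio comes out near 9,571, consistent with the reported "about 9,500 times", and the compressed collection averages roughly 0.64 MB per genome.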
Preparation of highly multiplexed small RNA sequencing libraries.

PubMed

Persson, Helena; Søkilde, Rolf; Pirona, Anna Chiara; Rovira, Carlos

2017-08-01

MicroRNAs (miRNAs) are ~22-nucleotide-long small non-coding RNAs that regulate the expression of protein-coding genes by base pairing to partially complementary target sites, preferentially located in the 3´ untranslated region (UTR) of target mRNAs. The expression and function of miRNAs have been extensively studied in human disease, as well as the possibility of using these molecules as biomarkers for prognostication and treatment guidance. To identify and validate miRNAs as biomarkers, their expression must be screened in large collections of patient samples. Here, we develop a scalable protocol for the rapid and economical preparation of a large number of small RNA sequencing libraries using dual indexing for multiplexing. Combined with the use of off-the-shelf reagents, more samples can be sequenced simultaneously on large-scale sequencing platforms at a considerably lower cost per sample. Sample preparation is simplified by pooling libraries prior to gel purification, which allows for the selection of a narrow size range while minimizing sample variation. A comparison with publicly available data from benchmarking of miRNA analysis platforms showed that this method captures absolute and differential expression as effectively as commercially available alternatives.
GDC 2: Compression of large collections of genomes.

PubMed

Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

2015-06-25

The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One of the significant side effects of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compression of large collections of complete genomic sequences. We propose an algorithm that is able to compress the collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by the other existing compressors. Moreover, our algorithm is very fast, as it processes the data at a speed of 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows storing the complete genomic collections at low cost, e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, which can be compared to about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about.
Striatal and Hippocampal Involvement in Motor Sequence Chunking Depends on the Learning Strategy

PubMed Central

Lungu, Ovidiu; Monchi, Oury; Albouy, Geneviève; Jubault, Thomas; Ballarin, Emanuelle; Burnod, Yves; Doyon, Julien

2014-01-01

Motor sequences can be learned using an incremental approach by starting with a few elements and then adding more as training evolves (e.g., learning a piano piece); conversely, one can use a global approach and practice the whole sequence in every training session (e.g., shifting gears in an automobile). Yet, the neural correlates associated with such learning strategies in motor sequence learning remain largely unexplored to date. Here we used functional magnetic resonance imaging to measure the cerebral activity of individuals executing the same 8-element sequence after they completed a 4-day training regimen (2 sessions each day) following either a global or incremental strategy. A network comprised of striatal and fronto-parietal regions was engaged significantly regardless of the learning strategy, whereas the global training regimen led to additional cerebellar and temporal lobe recruitment. Analysis of chunking/grouping of sequence elements revealed a common prefrontal network in both conditions during the chunk initiation phase, whereas execution of chunk cores led to higher mediotemporal activity (involving the hippocampus) after global than incremental training. The novelty of our results relates to the recruitment of mediotemporal regions conditional on the learning strategy. Thus, the present findings may have clinical implications, suggesting that the ability of patients with lesions to the medial temporal lobe to learn and consolidate new motor sequences may benefit from using an incremental strategy. PMID:25148078
Score distributions of gapped multiple sequence alignments down to the low-probability tail

NASA Astrophysics Data System (ADS)

Fieth, Pascal; Hartmann, Alexander K.

2016-08-01

Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in the case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
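To make the role of the Gumbel distribution concrete, the sketch below evaluates the tail probability P(S >= s) = 1 - exp(-exp(-lam * (s - s0))) of a Gumbel-distributed score. The parameters `lam` and `s0` are placeholders chosen for illustration, not values from the study, and the function name is hypothetical.

```python
import math

def gumbel_tail(score: float, lam: float, s0: float) -> float:
    """Return P(S >= score) for a Gumbel law with rate lam and location s0.

    expm1 keeps precision in the far tail, where the probability is tiny
    and a naive 1 - exp(-x) would round to zero.
    """
    return -math.expm1(-math.exp(-lam * (score - s0)))

# Illustrative parameters only (assumed, not taken from the abstract).
lam, s0 = 1.0, 0.0
for s in (0, 10, 50, 200):
    print(s, gumbel_tail(s, lam, s0))
```

With these placeholder parameters the tail decays roughly as exp(-lam * s), so a probability near 10^-160 would need on the order of 10^160 random samples to observe by simple sampling. That is why the abstract turns to rare-event algorithms for the high-scoring region.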
Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.

PubMed

Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin

2017-11-15

An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. Contact: xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

PubMed

You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng

2018-06-06

As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.

Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples

PubMed Central

Wang, Jingwen; Skoog, Tiina; Einarsdottir, Elisabet; Kaartokallio, Tea; Laivuori, Hannele; Grauers, Anna; Gerdhem, Paul; Hytönen, Marjo; Lohi, Hannes; Kere, Juha; Jiao, Hong

2016-01-01

High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Some major questions concerning the pooling sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants in Illumina array have identical MAFs and 26% have one allele difference between sequencing and individual genotyping data. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies. PMID:27633116
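As a rough illustration of the quantities compared in this abstract, the sketch below estimates a minor allele frequency (MAF) from pooled read counts and expresses the gap to an individually genotyped MAF as a number of allele copies. The function names and all numbers are hypothetical; the study's actual estimates come from variant callers such as GATK and Freebayes, not from this simple ratio.

```python
def pooled_maf(alt_reads: int, total_reads: int) -> float:
    """Estimate the MAF at one site as the read fraction of the rarer allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    freq = alt_reads / total_reads
    return min(freq, 1.0 - freq)

def allele_count_difference(maf_pool: float, maf_genotyped: float,
                            n_individuals: int) -> int:
    """Express the gap between two MAF estimates in allele copies.

    A pool of n diploid individuals carries 2 * n chromosome copies.
    """
    return round(abs(maf_pool - maf_genotyped) * 2 * n_individuals)

# Hypothetical site: 12 alternate reads out of 400 in a pool of 50 people,
# versus a MAF of 0.04 from individual genotyping of the same people.
maf = pooled_maf(12, 400)
print(maf, allele_count_difference(maf, 0.04, 50))
```

In this made-up example the two estimates differ by a single allele copy, the kind of discrepancy the abstract reports for 26% of the low-frequency variants in the WGS study.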
Efficient use of unlabeled data for protein sequence classification: a comparative study.

PubMed

Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

2009-04-29

Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags (the unlabeled data). In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. The unlabeled sequences used under the semi-supervised setting resemble unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy, but once cut and polished, they improve the accuracy of the classifiers considerably.
Click chemistry-mediated cyclic cleavage of metal ion-dependent DNAzymes for amplified and colorimetric detection of human serum copper (II).

PubMed

Li, Daxiu; Xie, Jiaqing; Zhou, Wenjiao; Jiang, Bingying; Yuan, Ruo; Xiang, Yun

2017-11-01

The determination of the level of Cu²⁺ plays important roles in disease diagnosis and environmental monitoring. By coupling Cu⁺-catalyzed click chemistry and metal ion-dependent DNAzyme cyclic amplification, we have developed a convenient and sensitive colorimetric sensing method for the detection of Cu²⁺ in human serums. The target Cu²⁺ can be reduced by ascorbate to form Cu⁺, which catalyzes the azide-alkyne cycloaddition between the azide- and alkyne-modified DNAs to form Mg²⁺-dependent DNAzymes. Subsequently, the Mg²⁺ ions catalyze the cleavage of the hairpin DNA substrate sequences of the DNAzymes and trigger cyclic generation of a large number of free G-quadruplex sequences, which bind hemin to form the G-quadruplex/hemin artificial peroxidase to cause significant color transition of the sensing solution for sensitive colorimetric detection of Cu²⁺. This method shows a dynamic range of 5 to 500 nM and a detection limit of 2 nM for Cu²⁺ detection. Besides, the level of Cu²⁺ in human serums can also be determined by using this sensing approach. With the advantages of simplicity and high sensitivity, such sensing method thus holds great potential for on-site determination of Cu²⁺ in different samples. Graphical abstract: Sensitive colorimetric detection of copper (II) by coupling click chemistry with metal ion-dependent DNAzymes.
Quality of experience enhancement of high efficiency video coding video streaming in wireless packet networks using multiple description coding

NASA Astrophysics Data System (ADS)

Boumehrez, Farouk; Brai, Radhia; Doghmane, Noureddine; Mansouri, Khaled

2018-01-01

Recently, video streaming has attracted much attention and interest due to its capability to process and transmit large data. We propose a quality of experience (QoE) model relying on a high efficiency video coding (HEVC) encoder adaptation scheme, in turn based on multiple description coding (MDC) for video streaming. The main contributions of the paper are (1) a performance evaluation of the new and emerging video coding standard HEVC/H.265, based on varying the quantization parameter (QP) values for different video contents to deduce their influence on the sequence to be transmitted; (2) an investigation of QoE support for multimedia applications in wireless networks, in which we inspect the impact of packet loss on the QoE of transmitted video sequences; and (3) an HEVC encoder parameter adaptation scheme based on MDC, modeled with the encoder parameter and an objective QoE model. A comparative study revealed that the proposed MDC approach is effective for improving the transmission, with a peak signal-to-noise ratio (PSNR) gain of about 2 to 3 dB. Results show that a good choice of QP value can compensate for transmission channel effects and improve received video quality, although HEVC/H.265 is also sensitive to packet loss. The obtained results show the efficiency of our proposed method in terms of PSNR and mean opinion score.
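The PSNR figure quoted in this abstract follows the standard definition PSNR = 10 * log10(MAX^2 / MSE). The sketch below computes it for two flat lists of 8-bit sample values; it is a generic illustration of the metric, not the evaluation code of the paper, and the function name and sample values are assumptions.

```python
import math

def psnr(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)

# Toy 4-sample "frames" (made-up values) differing by one level in two samples.
print(psnr([100, 100, 100, 100], [101, 99, 100, 100]))
```

Because the scale is logarithmic, a gain of 3 dB corresponds to roughly halving the mean squared error, which gives a sense of what the reported 2 to 3 dB improvement means.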
Methodology Development for Passive Component Reliability Modeling in a Multi-Physics Simulation Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aldemir, Tunc; Denning, Richard; Catalyurek, Umit

Reduction in safety margin can be expected as passive structures and components undergo degradation with time. Limitations in the traditional probabilistic risk assessment (PRA) methodology constrain its value as an effective tool to address the impact of aging effects on risk and for quantifying the impact of aging management strategies in maintaining safety margins. A methodology has been developed to address multiple aging mechanisms involving large numbers of components (with possibly statistically dependent failures) within the PRA framework in a computationally feasible manner when the sequencing of events is conditioned on the physical conditions predicted in a simulation environment, such as the New Generation System Code (NGSC) concept. Both epistemic and aleatory uncertainties can be accounted for within the same phenomenological framework and maintenance can be accounted for in a coherent fashion. The framework accommodates the prospective impacts of various intervention strategies such as testing, maintenance, and refurbishment. The methodology is illustrated with several examples.
A phase transition in energy-filtered RNA secondary structures.

PubMed

Han, Hillary S W; Reidys, Christian M

2012-10-01

In this article we study the effect of energy parameters on minimum free energy (mfe) RNA secondary structures. Employing a simplified combinatorial energy model that is only dependent on the diagram representation and is not sequence-specific, we prove the following dichotomy result. Mfe structures derived via the Turner energy parameters contain only finitely many complex irreducible substructures, and just minor parameter changes produce a class of mfe structures that contain a large number of small irreducibles. We localize the exact point at which the distribution of irreducibles experiences this phase transition from a discrete limit to a central limit distribution and, subsequently, put our result into the context of quantifying the effect of sparsification of the folding of these respective mfe structures. We show that the sparsification of realistic mfe structures leads to a constant time and space reduction, and that the sparsification of the folding of structures with modified parameters leads to a linear time and space reduction. We, furthermore, identify the limit distribution at the phase transition as a Rayleigh distribution.
Identification of Differentially Methylated Sites with Weak Methylation Effects

PubMed Central

Tran, Hong; Zhu, Hongxiao; Wu, Xiaowei; Kim, Gunjune; Clarke, Christopher R.; Larose, Hailey; Haak, David C.; Westwood, James H.; Zhang, Liqing

2018-01-01

Deoxyribonucleic acid (DNA) methylation is an epigenetic alteration crucial for regulating stress responses. Identifying large-scale DNA methylation at single nucleotide resolution is made possible by whole genome bisulfite sequencing. An essential task following the generation of bisulfite sequencing data is to detect differentially methylated cytosines (DMCs) among treatments. Most statistical methods for DMC detection do not consider the dependency of methylation patterns across the genome, thus possibly inflating type I error. Furthermore, small sample sizes and weak methylation effects among different phenotype categories make it difficult for these statistical methods to accurately detect DMCs. To address these issues, the wavelet-based functional mixed model (WFMM) was introduced to detect DMCs. To further examine the performance of WFMM in detecting weak differential methylation events, we used both simulated and empirical data and compared WFMM performance to a popular DMC detection tool, methylKit. Analyses of simulated data that replicated the effects of the herbicide glyphosate on DNA methylation in Arabidopsis thaliana show that WFMM results in higher sensitivity and specificity in detecting DMCs compared to methylKit, especially when the methylation differences among phenotype groups are small. Moreover, the performance of WFMM is robust with respect to small sample sizes, making it particularly attractive considering the current high costs of bisulfite sequencing. Analysis of empirical Arabidopsis thaliana data under varying glyphosate dosages, and the analysis of monozygotic (MZ) twins who have different pain sensitivities (both datasets have weak methylation effects of <1%), show that WFMM can identify more relevant DMCs related to the phenotype of interest than methylKit. Differentially methylated regions (DMRs) are genomic regions with different DNA methylation status across biological samples. DMRs and DMCs are essentially the same concepts, with the only difference being how methylation information across the genome is summarized. If methylation levels are determined by grouping neighboring cytosine sites, then they are DMRs; if methylation levels are calculated based on single cytosines, they are DMCs. PMID:29419727
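The distinction between DMCs and DMRs drawn at the end of this abstract comes down to how read counts are summarized. The sketch below contrasts the two views on made-up bisulfite counts. Pooling reads across neighboring sites is only one possible way to group them and is an assumption here, as are the function names.

```python
def site_levels(counts):
    """Per-cytosine methylation levels (the DMC view).

    counts is a list of (methylated_reads, total_reads) pairs, one per site.
    """
    return [meth / total for meth, total in counts]

def region_level(counts):
    """One methylation level for a group of neighboring sites (the DMR view).

    Reads are pooled across the sites before taking the ratio.
    """
    methylated = sum(meth for meth, _ in counts)
    total = sum(total for _, total in counts)
    return methylated / total

# Three neighboring cytosines with hypothetical read counts.
counts = [(8, 10), (3, 10), (9, 20)]
print(site_levels(counts))   # one value per cytosine
print(region_level(counts))  # a single value for the region
```

The same counts therefore yield either three site-level values or one region-level value, and a differential test is then run on whichever summary is chosen.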
Intrusion Detection in Control Systems using Sequence Characteristics

NASA Astrophysics Data System (ADS)

Kiuchi, Mai; Onoda, Takashi

Intrusion detection is considered effective in control systems. Sequences of the control application behavior observed in the communication, such as the order in which control devices are operated, are important in control systems. However, most intrusion detection systems do not effectively reflect application-layer sequences in their detection rules. In our previous work, we considered utilizing sequences for intrusion detection in control systems, and demonstrated the usefulness of sequences for intrusion detection. However, manually writing the detection rules for a large system can be difficult, so machine learning methods are a practical alternative. Also, in the case of control systems, there have been very few observed cyber attacks, so we have very little knowledge of the attack data that should be used to train the intrusion detection system. In this paper, we use an approach based on a CRF (Conditional Random Field) that takes the sequence of the system into account; it can therefore reflect the characteristics of control system sequences in the intrusion detection system, and it does not need knowledge of attack data to construct the detection rules.
Integrated sequencing of exome and mRNA of large-sized single cells.

PubMed

Wang, Lily Yan; Guo, Jiajie; Cao, Wei; Zhang, Meng; He, Jiankui; Li, Zhoufang

2018-01-10

With current approaches to single-cell DNA-RNA integrated sequencing, SNPs are difficult to call because a large amount of DNA and RNA is lost during DNA-RNA separation. Here, we performed simultaneous single-cell exome and transcriptome sequencing on individual mouse oocytes. Using microinjection, we kept the nuclei intact to avoid DNA loss, while retaining the cytoplasm inside the cell membrane, to maximize the amount of DNA and RNA captured from the single cell. We then conducted exome-sequencing on the isolated nuclei and mRNA-sequencing on the enucleated cytoplasm. For single oocytes, exome-seq can cover up to 92% of the exome region with an average sequencing depth of 10+, while mRNA-sequencing reveals more than 10,000 expressed genes in enucleated cytoplasm, with similar performance for intact oocytes. This approach provides unprecedented opportunities to study DNA-RNA regulation, such as RNA editing at the single-nucleotide level in oocytes. In the future, this method can also be applied to other large cells, including neurons, large dendritic cells and large tumour cells, for integrated exome and transcriptome sequencing.
cgDNAweb: a web interface to the cgDNA sequence-dependent coarse-grain model of double-stranded DNA.

PubMed

De Bruin, Lennart; Maddocks, John H

2018-06-14

The sequence-dependent statistical mechanical properties of fragments of double-stranded DNA are believed to be pertinent to its biological function at length scales from a few base pairs (or bp) to a few hundreds of bp, e.g. indirect read-out protein binding sites, nucleosome positioning sequences, phased A-tracts, etc. In turn, the equilibrium statistical mechanics behaviour of DNA depends upon its ground state configuration, or minimum free energy shape, as well as on its fluctuations as governed by its stiffness (in an appropriate sense). We here present cgDNAweb, which provides browser-based interactive visualization of the sequence-dependent ground states of double-stranded DNA molecules, as predicted by the underlying cgDNA coarse-grain rigid-base model of fragments with arbitrary sequence. The cgDNAweb interface is specifically designed to facilitate comparison between ground state shapes of different sequences. The server is freely available at cgDNAweb.epfl.ch with no login requirement.
The impact of reward and punishment on skill learning depends on task demands

PubMed Central

Steel, Adam; Silson, Edward H.; Stagg, Charlotte J.; Baker, Chris I.

2016-01-01

Reward and punishment motivate behavior, but it is unclear exactly how they impact skill performance and whether the effect varies across skills. The present study investigated the effect of reward and punishment in both a sequencing skill and a motor skill context. Participants trained on either a sequencing skill (serial reaction time task) or a motor skill (force-tracking task). Skill knowledge was tested immediately after training, and again 1 hour, 24–48 hours, and 30 days after training. We found a dissociation of the effects of reward and punishment on the tasks, primarily reflecting the impact of punishment. While punishment improved serial reaction time task performance, it impaired force-tracking task performance. In contrast to prior literature, neither reward nor punishment benefitted memory retention, arguing against the common assumption that reward ubiquitously benefits skill retention. Collectively, these results suggest that punishment impacts skilled behavior more than reward in a complex, task dependent fashion. PMID:27786302
The impact of reward and punishment on skill learning depends on task demands.

PubMed

Steel, Adam; Silson, Edward H; Stagg, Charlotte J; Baker, Chris I

2016-10-27

Reward and punishment motivate behavior, but it is unclear exactly how they impact skill performance and whether the effect varies across skills. The present study investigated the effect of reward and punishment in both a sequencing skill and a motor skill context. Participants trained on either a sequencing skill (serial reaction time task) or a motor skill (force-tracking task). Skill knowledge was tested immediately after training, and again 1 hour, 24-48 hours, and 30 days after training. We found a dissociation of the effects of reward and punishment on the tasks, primarily reflecting the impact of punishment. While punishment improved serial reaction time task performance, it impaired force-tracking task performance. In contrast to prior literature, neither reward nor punishment benefitted memory retention, arguing against the common assumption that reward ubiquitously benefits skill retention. Collectively, these results suggest that punishment impacts skilled behavior more than reward in a complex, task dependent fashion.
An efficient and accurate approach to MTE-MART for time-resolved tomographic PIV

NASA Astrophysics Data System (ADS)

Lynch, K. P.; Scarano, F.

2015-03-01

The motion-tracking-enhanced MART (MTE-MART; Novara et al. in Meas Sci Technol 21:035401, 2010) has demonstrated the potential to increase the accuracy of tomographic PIV by the combined use of a short sequence of non-simultaneous recordings. A clear bottleneck of the MTE-MART technique has been its computational cost. For large datasets comprising time-resolved sequences, MTE-MART becomes unaffordable and has barely been applied even for the analysis of densely seeded tomographic PIV datasets. A novel implementation is proposed for tomographic PIV image sequences, which strongly reduces the computational burden of MTE-MART, possibly below that of regular MART. The method is a sequential algorithm that produces a time-marching estimation of the object intensity field based on an enhanced guess, which is built upon the object reconstructed at the previous time instant. As the method becomes effective after a number of snapshots (typically 5-10), the sequential MTE-MART (SMTE) is most suited for time-resolved sequences. The computational cost reduction due to SMTE simply stems from the fewer MART iterations required for each time instant. Moreover, the method yields superior reconstruction quality and higher velocity field measurement precision when compared with both MART and MTE-MART. The working principle is assessed in terms of computational effort, reconstruction quality and velocity field accuracy with both synthetic time-resolved tomographic images of a turbulent boundary layer and two experimental databases documented in the literature. The first is the time-resolved data of flow past an airfoil trailing edge used in the study of Novara and Scarano (Exp Fluids 52:1027-1041, 2012); the second is a swirling jet in a water flow. In both cases, the effective elimination of ghost particles is demonstrated in number and intensity within a short temporal transient of 5-10 frames, depending on the seeding density. The increased value of the velocity space-time correlation coefficient demonstrates the increased velocity field accuracy of SMTE compared with MART.
SIBIS: a Bayesian model for inconsistent protein sequence estimation.

PubMed

Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

2014-09-01

The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved.
Sequencing consolidates molecular markers with plant breeding practice.

PubMed

Yang, Huaan; Li, Chengdao; Lam, Hon-Ming; Clements, Jonathan; Yan, Guijun; Zhao, Shancen

2015-05-01

Contemporary sequencing technologies have produced plenty of molecular markers, but few of them have been successfully applied in breeding; we therefore review how sequencing can facilitate marker-assisted selection in plant breeding. The growing global population and shrinking arable land area require efficient plant breeding. Novel strategies assisted by certain markers have proven effective for genetic gains. Fortunately, cutting-edge sequencing technologies bring us a deluge of genomes and genetic variations, revealing the potential of marker development. However, a large gap still exists between the potential of molecular markers and actual plant breeding practices. In this review, we discuss marker-assisted breeding from a historical perspective, describe the road from crop sequencing to breeding, and highlight how sequencing facilitates the application of markers in breeding practice.
Infrared reflectivity investigation of the phase transition sequence in Pr0.5Ca0.5MnO3

NASA Astrophysics Data System (ADS)

Ribeiro, J. L.; Vieira, L. G.; Gomes, I. T.; Araújo, J. P.; Tavares, P.; Almeida, B. G.

2016-06-01

This work reports an infrared reflectivity study of the phase transition sequence observed in Pr0.5Ca0.5MnO3. The need to measure over an extended spectral range in order to properly take into account the effects of the high frequency polaronic absorption is circumvented by adopting a simple approximate method, based on the asymmetry present in the Kramers Kronig inversion of the phonon spectrum. The temperature dependence of the phonon optical conductivity is then investigated by monitoring the behavior of three relevant spectral moments of the optical conductivity. This combined methodology allows us to disclose subtle effects of the orbital, charge and magnetic orders on the lattice dynamics of the compound. The characteristic transition temperatures inferred from the spectroscopic measurements are compared and correlated with those obtained from the temperature dependence of the induced magnetization and electrical resistivity.
Imaging different components of a tectonic tremor sequence in southwestern Japan using an automatic statistical detection and location method

NASA Astrophysics Data System (ADS)

Poiata, Natalia; Vilotte, Jean-Pierre; Bernard, Pascal; Satriano, Claudio; Obara, Kazushige

2018-06-01

In this study, we demonstrate the capability of an automatic network-based detection and location method to extract and analyse different components of tectonic tremor activity by analysing a 9-day energetic tectonic tremor sequence occurring at the downdip extension of the subducting slab in southwestern Japan. The applied method exploits the coherency of multiscale, frequency-selective characteristics of non-stationary signals recorded across the seismic network. Use of different characteristic functions, in the signal processing step of the method, allows the extraction and location of the sources of short-duration impulsive signal transients associated with low-frequency earthquakes and of longer-duration energy transients during the tectonic tremor sequence. Frequency-dependent characteristic functions, based on higher-order statistics' properties of the seismic signals, are used for the detection and location of low-frequency earthquakes. This allows extracting a more complete (~6.5 times more events) and time-resolved catalogue of low-frequency earthquakes than the routine catalogue provided by the Japan Meteorological Agency. As such, this catalogue allows resolving the space-time evolution of the low-frequency earthquakes activity in great detail, unravelling spatial and temporal clustering, modulation in response to tide, and different scales of space-time migration patterns. In the second part of the study, the detection and source location of longer-duration signal energy transients within the tectonic tremor sequence is performed using characteristic functions built from smoothed frequency-dependent energy envelopes. This leads to a catalogue of longer-duration energy sources during the tectonic tremor sequence, characterized by their durations and 3-D spatial likelihood maps of the energy-release source regions. 
The summary 3-D likelihood map for the 9-day tectonic tremor sequence, built from this catalogue, exhibits an along-strike spatial segmentation of the long-duration energy-release regions, matching the large-scale clustering features evidenced from the low-frequency earthquake's activity analysis. Further examination of the two catalogues showed that the extracted short-duration low-frequency earthquakes activity coincides in space, within about 10-15 km distance, with the longer-duration energy sources during the tectonic tremor sequence. This observation provides a potential constraint on the size of the longer-duration energy-radiating source region in relation with the clustering of low-frequency earthquakes activity during the analysed tectonic tremor sequence. We show that advanced statistical network-based methods offer new capabilities for automatic high-resolution detection, location and monitoring of different scale-components of tectonic tremor activity, enriching existing slow earthquakes catalogues. Systematic application of such methods to large continuous data sets will allow imaging the slow transient seismic energy-release activity at higher resolution, and therefore, provide new insights into the underlying multiscale mechanisms of slow earthquakes generation.
Imaging different components of a tectonic tremor sequence in southwestern Japan using an automatic statistical detection and location method

NASA Astrophysics Data System (ADS)

Poiata, Natalia; Vilotte, Jean-Pierre; Bernard, Pascal; Satriano, Claudio; Obara, Kazushige

2018-02-01

In this study, we demonstrate the capability of an automatic network-based detection and location method to extract and analyse different components of tectonic tremor activity by analysing a 9-day energetic tectonic tremor sequence occurring at the down-dip extension of the subducting slab in southwestern Japan. The applied method exploits the coherency of multi-scale, frequency-selective characteristics of non-stationary signals recorded across the seismic network. Use of different characteristic functions, in the signal processing step of the method, allows the extraction and location of the sources of short-duration impulsive signal transients associated with low-frequency earthquakes and of longer-duration energy transients during the tectonic tremor sequence. Frequency-dependent characteristic functions, based on higher-order statistics' properties of the seismic signals, are used for the detection and location of low-frequency earthquakes. This allows extracting a more complete (~6.5 times more events) and time-resolved catalogue of low-frequency earthquakes than the routine catalogue provided by the Japan Meteorological Agency. As such, this catalogue allows resolving the space-time evolution of the low-frequency earthquakes activity in great detail, unravelling spatial and temporal clustering, modulation in response to tide, and different scales of space-time migration patterns. In the second part of the study, the detection and source location of longer-duration signal energy transients within the tectonic tremor sequence is performed using characteristic functions built from smoothed frequency-dependent energy envelopes. This leads to a catalogue of longer-duration energy sources during the tectonic tremor sequence, characterized by their durations and 3-D spatial likelihood maps of the energy-release source regions. 
The summary 3-D likelihood map for the 9-day tectonic tremor sequence, built from this catalogue, exhibits an along-strike spatial segmentation of the long-duration energy-release regions, matching the large-scale clustering features evidenced from the low-frequency earthquake's activity analysis. Further examination of the two catalogues showed that the extracted short-duration low-frequency earthquakes activity coincides in space, within about 10-15 km distance, with the longer-duration energy sources during the tectonic tremor sequence. This observation provides a potential constraint on the size of the longer-duration energy-radiating source region in relation with the clustering of low-frequency earthquakes activity during the analysed tectonic tremor sequence. We show that advanced statistical network-based methods offer new capabilities for automatic high-resolution detection, location and monitoring of different scale-components of tectonic tremor activity, enriching existing slow earthquakes catalogues. Systematic application of such methods to large continuous data sets will allow imaging the slow transient seismic energy-release activity at higher resolution, and therefore, provide new insights into the underlying multi-scale mechanisms of slow earthquakes generation.
Treetrimmer: a method for phylogenetic dataset size reduction.

PubMed

Maruyama, Shinichiro; Eveleigh, Robert J M; Archibald, John M

2013-04-12

With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, use of rigorous tree-building methods on such large datasets is prohibitive and manual 'pruning' of sequence alignments is time consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to objectively carry out such pruning procedures. Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis. TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.
Comparison of pulse sequences for R1-based electron paramagnetic resonance oxygen imaging.

PubMed

Epel, Boris; Halpern, Howard J

2015-05-01

Electron paramagnetic resonance (EPR) spin-lattice relaxation (SLR) oxygen imaging has proven to be an indispensable tool for assessing oxygen partial pressure in live animals. EPR oxygen images show remarkable oxygen accuracy when combined with high precision and spatial resolution. Developing more effective means for obtaining SLR rates is of great practical, biological and medical importance. In this work we compared different pulse EPR imaging protocols and pulse sequences to establish advantages and areas of applicability for each method. Tests were performed using phantoms containing spin probes with oxygen concentrations relevant to in vivo oxymetry. We have found that for small animal size objects the inversion recovery sequence combined with the filtered backprojection reconstruction method delivers the best accuracy and precision. For large animals, in which large radio frequency energy deposition might be critical, free induction decay and three pulse stimulated echo sequences might find better practical usage. Copyright © 2015 Elsevier Inc. All rights reserved.

Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.

PubMed

Haghverdi, Laleh; Lun, Aaron T L; Morgan, Michael D; Marioni, John C

2018-06-01

Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
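
The mutual-nearest-neighbor idea in the abstract above can be sketched in a few lines. This is only an illustration of MNN pair detection, not the authors' implementation: the function names, the Euclidean distance, the brute-force search, and the toy coordinates are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(query, reference, k):
    """Indices of the k cells in `reference` closest to `query` (brute force)."""
    order = sorted(range(len(reference)),
                   key=lambda j: euclidean(query, reference[j]))
    return set(order[:k])

def mutual_nearest_neighbors(batch1, batch2, k=1):
    """Return (i, j) pairs where cell i of batch1 and cell j of batch2
    each appear in the other's k-nearest-neighbor set."""
    nn_1_to_2 = [k_nearest(cell, batch2, k) for cell in batch1]
    nn_2_to_1 = [k_nearest(cell, batch1, k) for cell in batch2]
    return [(i, j)
            for i in range(len(batch1))
            for j in sorted(nn_1_to_2[i])
            if i in nn_2_to_1[j]]

# Hypothetical toy data: batch2 is a shifted copy of two batch1 cells;
# the third batch1 cell belongs to a population absent from batch2.
batch1 = [(0.0, 0.0), (10.0, 10.0), (50.0, 50.0)]
batch2 = [(0.5, 0.5), (10.5, 10.5)]
pairs = mutual_nearest_neighbors(batch1, batch2, k=1)
```

In this toy case the cell unique to the first batch finds a nearest neighbor in the second batch, but the relation is not mutual, so it forms no pair. That matches the abstract's requirement that only a subset of the population be shared between batches.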
Factors That Affect Large Subunit Ribosomal DNA Amplicon Sequencing Studies of Fungal Communities: Classification Method, Primer Choice, and Error

PubMed Central

Porter, Teresita M.; Golding, G. Brian

2012-01-01

Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, to an increasing extent, in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and, 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and, NBC runs significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys. PMID:22558215
Stimulation and inhibition of bacterial growth by caffeine dependent on chloramphenicol and a phenolic uncoupler--a ternary toxicity study using microfluid segment technique.

PubMed

Cao, Jialan; Kürsten, Dana; Schneider, Steffen; Köhler, J Michael

2012-10-01

A droplet-based microfluidic technique for the fast generation of three dimensional concentration spaces within nanoliter segments was introduced. The technique was applied for the evaluation of the effect of two selected antibiotic substances on the toxicity and activation of bacterial growth by caffeine. Therefore a three-dimensional concentration space was completely addressed by generating large sequences with about 1150 well separated microdroplets containing 216 different combinations of concentrations. To evaluate the toxicity of the ternary mixtures a time-resolved miniaturized optical double endpoint detection unit using a microflow-through fluorimeter and a two channel microflow-through photometer was used for the simultaneous analysis of changes on the endogenous cellular fluorescence signal and on the cell density of E. coli cultivated inside 500 nL microfluid segments. Both endpoints supplied similar results for the dose related cellular response. Strong non-linear combination effects, concentration dependent stimulation and the formation of activity summits on bolographic maps were determined. The results reflect a complex response of growing bacterial cultures in dependence on the combined effectors. A strong caffeine induced enhancement of bacterial growth was found at sublethal chloramphenicol and sublethal 2,4-dinitrophenol concentrations. The reliability of the method was proved by a high redundancy of fluidic experiments. The results indicate the importance of multi-parameter investigations for toxicological studies and prove the potential of the microsegmented flow technique for such requirements.
Genetic abnormalities in myelodysplasia and secondary acute myeloid leukemia: impact on outcome of stem cell transplantation

PubMed Central

Yoshizato, Tetsuichi; Nannya, Yasuhito; Atsuta, Yoshiko; Shiozawa, Yusuke; Iijima-Yamashita, Yuka; Yoshida, Kenichi; Shiraishi, Yuichi; Suzuki, Hiromichi; Nagata, Yasunobu; Sato, Yusuke; Kakiuchi, Nobuyuki; Matsuo, Keitaro; Onizuka, Makoto; Kataoka, Keisuke; Chiba, Kenichi; Tanaka, Hiroko; Ueno, Hiroo; Nakagawa, Masahiro M.; Przychodzen, Bartlomiej; Haferlach, Claudia; Kern, Wolfgang; Aoki, Kosuke; Itonaga, Hidehiro; Kanda, Yoshinobu; Sekeres, Mikkael A.; Maciejewski, Jaroslaw P.; Haferlach, Torsten; Miyazaki, Yasushi; Horibe, Keizo; Sanada, Masashi; Miyano, Satoru; Makishima, Hideki

2017-01-01

Genetic alterations, including mutations and copy-number alterations, are central to the pathogenesis of myelodysplastic syndromes and related diseases (myelodysplasia), but their roles in allogeneic stem cell transplantation have not fully been studied in a large cohort of patients. We enrolled 797 patients who had been diagnosed with myelodysplasia at initial presentation and received transplantation via the Japan Marrow Donor Program. Targeted-capture sequencing was performed to identify mutations in 69 genes, together with copy-number alterations, whose effects on transplantation outcomes were investigated. We identified 1776 mutations and 927 abnormal copy segments among 617 patients (77.4%). In multivariate modeling using Cox proportional-hazards regression, genetic factors explained 30% of the total hazards for overall survival; clinical characteristics accounted for 70% of risk. TP53 and RAS-pathway mutations, together with complex karyotype (CK) as detected by conventional cytogenetics and/or sequencing-based analysis, negatively affected posttransplant survival independently of clinical factors. Regardless of disease subtype, TP53-mutated patients with CK were characterized by unique genetic features and associated with an extremely poor survival with frequent early relapse, whereas outcomes were substantially better in TP53-mutated patients without CK. By contrast, the effects of RAS-pathway mutations depended on disease subtype and were confined to myelodysplastic/myeloproliferative neoplasms (MDS/MPNs). Our results suggest that TP53 and RAS-pathway mutations predicted a dismal prognosis, when associated with CK and MDS/MPNs, respectively. However, for patients with mutated TP53 or CK alone, long-term survival could be obtained with transplantation. Clinical sequencing provides vital information for accurate prognostication in transplantation. PMID:28223278
Genetic abnormalities in myelodysplasia and secondary acute myeloid leukemia: impact on outcome of stem cell transplantation.

PubMed

Yoshizato, Tetsuichi; Nannya, Yasuhito; Atsuta, Yoshiko; Shiozawa, Yusuke; Iijima-Yamashita, Yuka; Yoshida, Kenichi; Shiraishi, Yuichi; Suzuki, Hiromichi; Nagata, Yasunobu; Sato, Yusuke; Kakiuchi, Nobuyuki; Matsuo, Keitaro; Onizuka, Makoto; Kataoka, Keisuke; Chiba, Kenichi; Tanaka, Hiroko; Ueno, Hiroo; Nakagawa, Masahiro M; Przychodzen, Bartlomiej; Haferlach, Claudia; Kern, Wolfgang; Aoki, Kosuke; Itonaga, Hidehiro; Kanda, Yoshinobu; Sekeres, Mikkael A; Maciejewski, Jaroslaw P; Haferlach, Torsten; Miyazaki, Yasushi; Horibe, Keizo; Sanada, Masashi; Miyano, Satoru; Makishima, Hideki; Ogawa, Seishi

2017-04-27

Genetic alterations, including mutations and copy-number alterations, are central to the pathogenesis of myelodysplastic syndromes and related diseases (myelodysplasia), but their roles in allogeneic stem cell transplantation have not fully been studied in a large cohort of patients. We enrolled 797 patients who had been diagnosed with myelodysplasia at initial presentation and received transplantation via the Japan Marrow Donor Program. Targeted-capture sequencing was performed to identify mutations in 69 genes, together with copy-number alterations, whose effects on transplantation outcomes were investigated. We identified 1776 mutations and 927 abnormal copy segments among 617 patients (77.4%). In multivariate modeling using Cox proportional-hazards regression, genetic factors explained 30% of the total hazards for overall survival; clinical characteristics accounted for 70% of risk. TP53 and RAS-pathway mutations, together with complex karyotype (CK) as detected by conventional cytogenetics and/or sequencing-based analysis, negatively affected posttransplant survival independently of clinical factors. Regardless of disease subtype, TP53 -mutated patients with CK were characterized by unique genetic features and associated with an extremely poor survival with frequent early relapse, whereas outcomes were substantially better in TP53 -mutated patients without CK. By contrast, the effects of RAS-pathway mutations depended on disease subtype and were confined to myelodysplastic/myeloproliferative neoplasms (MDS/MPNs). Our results suggest that TP53 and RAS-pathway mutations predicted a dismal prognosis, when associated with CK and MDS/MPNs, respectively. However, for patients with mutated TP53 or CK alone, long-term survival could be obtained with transplantation. Clinical sequencing provides vital information for accurate prognostication in transplantation. © 2017 by The American Society of Hematology.
Effects of 16S rDNA sampling on estimates of the number of endosymbiont lineages in sucking lice

PubMed Central

Burleigh, J. Gordon; Light, Jessica E.; Reed, David L.

2016-01-01

Phylogenetic trees can reveal the origins of endosymbiotic lineages of bacteria and detect patterns of co-evolution with their hosts. Although taxon sampling can greatly affect phylogenetic and co-evolutionary inference, most hypotheses of endosymbiont relationships are based on few available bacterial sequences. Here we examined how different sampling strategies of Gammaproteobacteria sequences affect estimates of the number of endosymbiont lineages in parasitic sucking lice (Insecta: Phthiraptera: Anoplura). We estimated the number of louse endosymbiont lineages using both newly obtained and previously sequenced 16S rDNA bacterial sequences and more than 42,000 16S rDNA sequences from other Gammaproteobacteria. We also performed parametric and nonparametric bootstrapping experiments to examine the effects of phylogenetic error and uncertainty on these estimates. Sampling of 16S rDNA sequences affects the estimates of endosymbiont diversity in sucking lice until we reach a threshold of genetic diversity, the size of which depends on the sampling strategy. Sampling by maximizing the diversity of 16S rDNA sequences is more efficient than randomly sampling available 16S rDNA sequences. Although simulation results validate estimates of multiple endosymbiont lineages in sucking lice, the bootstrap results suggest that the precise number of endosymbiont origins is still uncertain. PMID:27547523
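
The contrast between diversity-maximizing and random sampling can be illustrated with a small sketch. The abstract does not say how diversity was maximized, so the greedy max-min selection, the p-distance on pre-aligned sequences, and all names below are assumptions chosen for illustration only.

```python
import random

def p_distance(a, b):
    """Fraction of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def sample_max_diversity(seqs, n):
    """Greedy max-min selection: start from the first sequence, then repeatedly
    add the sequence whose closest already-chosen sequence is farthest away."""
    if n <= 0 or not seqs:
        return []
    chosen = [0]
    while len(chosen) < min(n, len(seqs)):
        remaining = [i for i in range(len(seqs)) if i not in chosen]
        best = max(remaining,
                   key=lambda i: min(p_distance(seqs[i], seqs[j]) for j in chosen))
        chosen.append(best)
    return chosen

def sample_random(seqs, n, seed=0):
    """Baseline: draw n sequence indices uniformly at random without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(seqs)), min(n, len(seqs))))

# Hypothetical toy alignment with three clearly separated groups.
toy = ["AAAA", "AAAT", "CCCC", "CCCG", "GGGG"]
picked = sample_max_diversity(toy, 3)
```

On the toy alignment the greedy routine picks one representative from each group, whereas a random draw of the same size may take two near-identical sequences and miss a group entirely.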
Functional plasticity of the N/OFQ-NOP receptor system determines analgesic properties of NOP receptor agonists

PubMed Central

Schröder, W; Lambert, D G; Ko, M C; Koch, T

2014-01-01

Despite high sequence similarity between NOP (nociceptin/orphanin FQ opioid peptide) and opioid receptors, marked differences in endogenous ligand selectivity, signal transduction, phosphorylation, desensitization, internalization and trafficking have been identified; underscoring the evolutionary difference between NOP and opioid receptors. Activation of NOP receptors affects nociceptive transmission in a site-specific manner, with antinociceptive effects prevailing after peripheral and spinal activation, and pronociceptive effects after supraspinal activation in rodents. The net effect of systemically administered NOP receptor agonists on nociception is proposed to depend on the relative contribution of peripheral, spinal and supraspinal activation, and this may depend on experimental conditions. Functional expression and regulation of NOP receptors at peripheral and central sites of the nociceptive pathway exhibits a high degree of plasticity under conditions of neuropathic and inflammatory pain. In rodents, systemically administered NOP receptor agonists exerted antihypersensitive effects in models of neuropathic and inflammatory pain. However, they were largely ineffective in acute pain while concomitantly evoking severe motor side effects. In contrast, systemic administration of NOP receptor agonists to non-human primates (NHPs) exerted potent and efficacious antinociception in the absence of motor and sedative side effects. The reason for this species difference with respect to antinociceptive efficacy and tolerability is not clear. Moreover, co-activation of NOP and μ-opioid peptide (MOP) receptors synergistically produced antinociception in NHPs. Hence, both selective NOP receptor as well as NOP/MOP receptor agonists may hold potential for clinical use as analgesics effective in conditions of acute and chronic pain. PMID:24762001
The siRNA Non-seed Region and Its Target Sequences Are Auxiliary Determinants of Off-Target Effects.

PubMed

Kamola, Piotr J; Nakano, Yuko; Takahashi, Tomoko; Wilson, Paul A; Ui-Tei, Kumiko

2015-12-01

RNA interference (RNAi) is a powerful tool for post-transcriptional gene silencing. However, the siRNA guide strand may bind unintended off-target transcripts via partial sequence complementarity by a mechanism closely mirroring micro RNA (miRNA) silencing. To better understand these off-target effects, we investigated the correlation between sequence features within various subsections of siRNA guide strands, and its corresponding target sequences, with off-target activities. Our results confirm previous reports that strength of base-pairing in the siRNA seed region is the primary factor determining the efficiency of off-target silencing. However, the degree of downregulation of off-target transcripts with shared seed sequence is not necessarily similar, suggesting that there are additional auxiliary factors that influence the silencing potential. Here, we demonstrate that both the melting temperature (Tm) in a subsection of siRNA non-seed region, and the GC contents of its corresponding target sequences, are negatively correlated with the efficiency of off-target effect. Analysis of experimentally validated miRNA targets demonstrated a similar trend, indicating a putative conserved mechanistic feature of seed region-dependent targeting mechanism. These observations may prove useful as parameters for off-target prediction algorithms and improve siRNA 'specificity' design rules.
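
As a rough illustration of the sequence features discussed above, the sketch below computes GC content for the seed and non-seed parts of a guide strand. The abstract does not give the boundaries of the non-seed subsection it analysed; the split used here (seed at guide positions 2 to 8, everything downstream treated as non-seed) is the commonly cited seed definition and is an assumption, as are the function names and the example sequence.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

def split_guide(guide):
    """Split a guide strand into (seed, non_seed).

    Assumption: the seed is taken as positions 2-8 (1-based) and the
    non-seed region as everything from position 9 onward.
    """
    return guide[1:8], guide[8:]

# Hypothetical 21-nt guide strand, used only to exercise the functions.
guide = "UGCGCGCGAUAUAUAUAUAUU"
seed, non_seed = split_guide(guide)
seed_gc = gc_content(seed)
non_seed_gc = gc_content(non_seed)
```

A per-region GC value of this kind is the sort of quantity that could be fed to an off-target prediction score. A melting temperature would need a nearest-neighbor thermodynamic model, which is outside the scope of this sketch.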
High throughput sequencing analysis of RNA libraries reveals the influences of initial library and PCR methods on SELEX efficiency.

PubMed

Takahashi, Mayumi; Wu, Xiwei; Ho, Michelle; Chomchan, Pritsana; Rossi, John J; Burnett, John C; Zhou, Jiehua

2016-09-22

The systematic evolution of ligands by exponential enrichment (SELEX) technique is a powerful and effective aptamer-selection procedure. However, modifications to the process can dramatically improve selection efficiency and aptamer performance. For example, droplet digital PCR (ddPCR) has been recently incorporated into SELEX selection protocols to putatively reduce the propagation of byproducts and avoid selection bias that results from differences in PCR efficiency of sequences within the random library. However, a detailed, parallel comparison of the efficacy of conventional solution PCR versus the ddPCR modification in the RNA aptamer-selection process is needed to understand effects on overall SELEX performance. In the present study, we took advantage of powerful high throughput sequencing technology and bioinformatics analysis coupled with SELEX (HT-SELEX) to thoroughly investigate the effects of initial library and PCR methods in RNA aptamer identification. Our analysis revealed that distinct "biased sequences" and nucleotide composition existed in the initial, unselected libraries purchased from two different manufacturers and that the fate of the "biased sequences" was target-dependent during selection. Our comparison of solution PCR- and ddPCR-driven HT-SELEX demonstrated that PCR method affected not only the nucleotide composition of the enriched sequences, but also the overall SELEX efficiency and aptamer efficacy.
Statistical processing of large image sequences.

PubMed

Khellah, F; Fieguth, P; Murray, M J; Allen, M

2005-01-01

The dynamic estimation of large-scale stochastic image sequences, as frequently encountered in remote sensing, is important in a variety of scientific applications. However, the size of such images makes conventional dynamic estimation methods, for example, the Kalman and related filters, impractical. In this paper, we present an approach that emulates the Kalman filter, but with considerably reduced computational and storage requirements. Our approach is illustrated in the context of a 512 x 512 image sequence of ocean surface temperature. The static estimation step, the primary contribution here, uses a mixture of stationary models to accurately mimic the effect of a nonstationary prior, simplifying both computational complexity and modeling. Our approach provides an efficient, stable, positive-definite model which is consistent with the given correlation structure. Thus, the methods of this paper may find application in modeling and single-frame estimation.
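
To make the scaling problem concrete, the sketch below shows one predict/update cycle of a conventional scalar Kalman filter, followed by the size of the full covariance matrix that the same recursion would need for a 512 x 512 image. This is the textbook filter that the abstract describes as impractical at image scale, not the authors' reduced-cost approximation. The parameter names (q for process-noise variance, r for measurement-noise variance, a for the state transition factor) are assumptions of the example.

```python
def kalman_step(x, p, z, q, r, a=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    z    : new measurement of the state
    q, r : process-noise and measurement-noise variances
    a    : state transition factor (1.0 gives a random-walk model)
    """
    # Predict: propagate the estimate and its uncertainty forward in time.
    x_pred = a * x
    p_pred = a * p * a + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new

# Storage needed if every pixel of a 512 x 512 frame were a state variable:
n_pixels = 512 * 512                  # 262,144 state variables
covariance_entries = n_pixels ** 2    # entries in the full error covariance
```

With one state variable per pixel, the error covariance alone has 262,144 squared, roughly 6.9e10, entries, which is why a direct matrix implementation of the update above is not feasible for sequences of this size.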
Metabolic network prediction through pairwise rational kernels.

PubMed

Roche-Lima, Abiel; Domaratzki, Michael; Fristensky, Brian

2014-09-26

Metabolic networks are represented by the set of metabolic pathways. Metabolic pathways are a series of biochemical reactions, in which the product (output) from one reaction serves as the substrate (input) to another reaction. Many pathways remain incompletely characterized. One of the major challenges of computational biology is to obtain better models of metabolic pathways. Existing models are dependent on the annotation of the genes. This propagates error accumulation when the pathways are predicted by incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pair of entities. Some of these classification methods, e.g., Pairwise Support Vector Machines (SVMs), use pairwise kernels. Pairwise kernels describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been effectively used in problems that handle large amount of sequence information such as protein essentiality, natural language processing and machine translations. We create a new family of pairwise kernels using weighted finite-state transducers (called Pairwise Rational Kernel (PRK)) to predict metabolic pathways from a variety of biological data. PRKs take advantage of the simpler representations and faster algorithms of transducers. Because raw sequence data can be used, the predictor model avoids the errors introduced by incorrect gene annotations. We then developed several experiments with PRKs and Pairwise SVM to validate our methods using the metabolic network of Saccharomyces cerevisiae. As a result, when PRKs are used, our method executes faster in comparison with other pairwise kernels. 
Also, when we use PRKs combined with other simple kernels that include evolutionary information, the accuracy values have been improved, while maintaining lower construction and execution times. The power of using kernels is that almost any sort of data can be represented using kernels. Therefore, completely disparate types of data can be combined to add power to kernel-based machine learning methods. When we compared our proposal using PRKs with other similar kernels, the execution times were decreased, with no compromise of accuracy. We also proved that by combining PRKs with other kernels that include evolutionary information, the accuracy can also be improved. As our proposal can use any type of sequence data, genes do not need to be properly annotated, avoiding accumulation errors because of incorrect previous annotations.
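
A pairwise kernel compares two pairs of entities by combining a base kernel on the individual entities. The sketch below uses a widely used symmetric product form together with a simple k-mer count kernel on raw sequences. Both choices are assumptions for illustration: the abstract does not state which pairwise construction the PRK family uses, and the k-mer kernel merely stands in for the weighted-transducer kernels of the paper.

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Count every overlapping substring of length k in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s, t, k=3):
    """Base kernel: inner product of the k-mer count vectors of s and t."""
    counts_s, counts_t = kmer_counts(s, k), kmer_counts(t, k)
    return sum(counts_s[kmer] * counts_t[kmer] for kmer in counts_s)

def pairwise_kernel(pair1, pair2, base=spectrum_kernel):
    """Symmetric pairwise kernel on unordered pairs:
    K((a, b), (c, d)) = k(a, c) * k(b, d) + k(a, d) * k(b, c)."""
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)

# Hypothetical toy sequences; swapping the order inside a pair must not
# change the kernel value.
same_order = pairwise_kernel(("AAAA", "CCCC"), ("AAAA", "CCCC"))
swapped = pairwise_kernel(("AAAA", "CCCC"), ("CCCC", "AAAA"))
```

Because the base kernel operates directly on raw sequences, no gene annotation is needed to compute it, which is the property the abstract highlights.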
Characterizing the galaxy populations within different environments in the RCS2319 supercluster

NASA Astrophysics Data System (ADS)

Delahaye, Anna; Webb, Tracy

We present the results of a multi-wavelength photometric study of the high redshift supercluster RCS2319+00. RCS2319+00 is a high-redshift (z ~ 0.9) supercluster comprising three spectroscopically confrmed cluster cores discovered in the Red Sequence Cluster Survey (RCS) (Gladders & Yee 2005). Core proximities and merger rates estimate coalescence into a 1015 M ⊙ cluster by z ~ 0.5 (Gilbank et al. 2008). Spectroscopic studies of the system have revealed over 300 supercluster members located in the cores and several infalling groups (Faloon et al. 2013). RCS2319 presents a diverse range of dynamical systems and densities making it an ideal laboratory in which to study the effects of environment on galaxy properties. Imaging in optical and near infrared (griz' from MegaCam, JK s from WIRCam, both at CFHT), as well as 3.6 μm and 4.5μm from IRAC have enabled the assembly of a large photometric catalogue. Coupled with an extensive spectroscopic survey (Faloon et al. 2013) providing nearly 2400 redshifts across the field, photometric redshifts were determined using the template fitting code EAZY (Brammer et al. 2008). Nearly 80 000 photometric redshifts were measured providing a sample of nearly 3000 cluster members. To investigate effects of global environment, analysis was done utilizing a friend-of-friends group finding algorithm identifying several large and small infalling groups along with the three cluster cores. The cores are found to be dominated by massive, red galaxies and the field galaxies are populated by low mass, blue galaxies, as is the case in the local universe. Interestingly, the large groups exhibit intermediate properties between field and core populations, suggesting possible pre-processing as they are being accreted into the core halos. Relative fifth-nearest neighbour overdensity, log(1+δ5), is used as a proxy for local environment to investigate environmental dependence on galaxy colour. 
While there is an overall dependence of colour on local density, when controlled for stellar mass the dependence largely disappears. Indeed, galaxy mass is the dominant factor in determining colour, with local density a secondary effect only noticeable in lower mass galaxies at the 3σ level for both colour and red fraction. RCS2319+00 presents a rare opportunity to probe many different densities and environments all located within the same object. We are able to investigate how galaxy evolution is affected by the environment, from field galaxies to infalling groups to dense cluster cores, as well as the different density regions within each environment.
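The local-environment proxy named in this abstract, log(1+δ_5), can be illustrated with a minimal sketch. The abstract does not say how the reference density is chosen, so the code below assumes the median surface density of the sample as the reference; the function names and the flat 2D coordinates are likewise illustrative assumptions, not the authors' pipeline.

```python
import math

def nth_neighbour_density(points, n=5):
    """Surface density Sigma_n = n / (pi * d_n^2) for each 2D point,
    where d_n is the projected distance to the n-th nearest neighbour."""
    densities = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        )
        d_n = dists[n - 1]  # requires at least n + 1 distinct positions
        densities.append(n / (math.pi * d_n ** 2))
    return densities

def log_overdensity(points, n=5):
    """log10(1 + delta_n), with 1 + delta_n = Sigma_n / median(Sigma_n).
    Using the median as the reference density is an assumption."""
    sigma = nth_neighbour_density(points, n)
    ordered = sorted(sigma)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = 0.5 * (ordered[mid - 1] + ordered[mid])
    return [math.log10(s / median) for s in sigma]
```

Galaxies in a compact group come out with positive values and isolated galaxies with negative values, which is the sense in which the quantity serves as a density proxy.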
Understanding the structural and dynamic consequences of DNA epigenetic modifications: Computational insights into cytosine methylation and hydroxymethylation

PubMed Central

Carvalho, Alexandra T P; Gouveia, Leonor; Kanna, Charan Raju; Wärmländer, Sebastian K T S; Platts, Jamie A; Kamerlin, Shina Caroline Lynn

2014-01-01

We report a series of molecular dynamics (MD) simulations of up to a microsecond combined simulation time designed to probe epigenetically modified DNA sequences. More specifically, by monitoring the effects of methylation and hydroxymethylation of cytosine in different DNA sequences, we show, for the first time, that DNA epigenetic modifications change the molecule's dynamical landscape, increasing the propensity of DNA toward different values of twist and/or roll/tilt angles (in relation to the unmodified DNA) at the modification sites. Moreover, both the extent and position of different modifications have significant effects on the amount of structural variation observed. We propose that these conformational differences, which are dependent on the sequence environment, can provide specificity for protein binding. PMID:25625845
Slice profile effects in 2D slice-selective MRI of hyperpolarized nuclei.

PubMed

Deppe, Martin H; Teh, Kevin; Parra-Robles, Juan; Lee, Kuan J; Wild, Jim M

2010-02-01

This work explores slice profile effects in 2D slice-selective gradient-echo MRI of hyperpolarized nuclei. Two different sequences were investigated: a Spoiled Gradient Echo sequence with variable flip angle (SPGR-VFA) and a balanced Steady-State Free Precession (SSFP) sequence. It is shown that in SPGR-VFA the distribution of flip angles across the slice present in any realistically shaped radiofrequency (RF) pulse leads to large excess signal from the slice edges in later RF views, which results in an undesired non-constant total transverse magnetization, potentially exceeding the initial value by almost 300% for the last RF pulse. A method to reduce this unwanted effect is demonstrated, based on dynamic scaling of the slice selection gradient. SSFP sequences with small to moderate flip angles (<40 degrees) are also shown to preserve the slice profile better than the most commonly used SPGR sequence with constant flip angle (SPGR-CFA). For higher flip angles, the slice profile in SSFP evolves in a manner similar to SPGR-CFA, with depletion of polarization in the center of the slice. Copyright 2009 Elsevier Inc. All rights reserved.
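The excess edge signal described in this abstract can be reproduced qualitatively with a short sketch. It assumes the commonly used ideal variable-flip-angle schedule θ_n = arctan(1/√(N − n)), neglects T1 relaxation, and models the slice profile crudely as a single scale factor on the nominal flip angle; none of these choices is taken from the paper itself.

```python
import math

def vfa_flip_angles(n_views):
    """Ideal variable flip angles in radians: theta_n = atan(1 / sqrt(N - n)).
    atan2 handles the last view, where the angle is exactly 90 degrees."""
    return [math.atan2(1.0, math.sqrt(n_views - n)) for n in range(1, n_views + 1)]

def transverse_signal(flip_angles, scale=1.0):
    """Signal per RF view for a spin whose local flip angle is scale * nominal.
    Non-renewable (hyperpolarized) magnetization, T1 decay neglected."""
    mz = 1.0
    signal = []
    for theta in flip_angles:
        local = scale * theta
        signal.append(mz * math.sin(local))  # transverse signal in this view
        mz *= math.cos(local)                # longitudinal magnetization left over
    return signal

angles = vfa_flip_angles(16)
centre = transverse_signal(angles, scale=1.0)  # slice centre: nominal flip angle
edge = transverse_signal(angles, scale=0.5)    # slice edge: reduced flip angle
```

At the slice centre every view yields the same signal, 1/√N. At the edge, where the flip angle is smaller than nominal, polarization is consumed more slowly and the signal grows towards the last views.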
Characterizing genomic alterations in cancer by complementary functional associations | Office of Cancer Genomics

Cancer.gov

Systematic efforts to sequence the cancer genome have identified large numbers of mutations and copy number alterations in human cancers. However, elucidating the functional consequences of these variants, and their interactions to drive or maintain oncogenic states, remains a challenge in cancer research. We developed REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene dependency of oncogenic pathways or sensitivity to a drug treatment.
Directed targeting of chromatin to the nuclear lamina is mediated by chromatin state and A-type lamins.

PubMed

Harr, Jennifer C; Luperchio, Teresa Romeo; Wong, Xianrong; Cohen, Erez; Wheelan, Sarah J; Reddy, Karen L

2015-01-05

Nuclear organization has been implicated in regulating gene activity. Recently, large developmentally regulated regions of the genome dynamically associated with the nuclear lamina have been identified. However, little is known about how these lamina-associated domains (LADs) are directed to the nuclear lamina. We use our tagged chromosomal insertion site system to identify small sequences from borders of fibroblast-specific variable LADs that are sufficient to target these ectopic sites to the nuclear periphery. We identify YY1 (Ying-Yang1) binding sites as enriched in relocating sequences. Knockdown of YY1 or lamin A/C, but not lamin A, led to a loss of lamina association. In addition, targeted recruitment of YY1 proteins facilitated ectopic LAD formation dependent on histone H3 lysine 27 trimethylation and histone H3 lysine 9 di- and trimethylation. Our results also reveal that endogenous loci appear to be dependent on lamin A/C, YY1, H3K27me3, and H3K9me2/3 for maintenance of lamina-proximal positioning. © 2015 Harr et al.
Variation of b and p values from aftershocks sequences along the Mexican subduction zone and their relation to plate characteristics

NASA Astrophysics Data System (ADS)

Ávila-Barrientos, L.; Zúñiga, F. R.; Rodríguez-Pérez, Q.; Guzmán-Speziale, M.

2015-11-01

Aftershock sequences along the Mexican subduction margin (between coordinates 110°W and 91°W) were analyzed by means of the p value from the Omori-Utsu relation and the b value from the Gutenberg-Richter relation. We focused on recent medium to large (Mw > 5.6) events considered capable of generating aftershock sequences suitable for analysis. The main goal was to find a possible correlation between aftershock parameters and plate characteristics, such as displacement rate, age and segmentation. The subduction regime of Mexico is one of the most active regions of the world, with a high frequency of occurrence of medium to large events, and plate characteristics change along the subduction margin. Previous studies have observed differences in seismic source characteristics at the subduction regime, which may indicate a difference in rheology and possible segmentation. The results of the analysis of the aftershock sequences indicate a slight tendency for p values to decrease from west to east with increasing plate age, although statistical significance is undermined by the small number of aftershocks in the sequences, a feature distinctive of the region as compared to other world subduction regimes. The b values show an opposite, increasing trend towards the east, even though the statistical significance is not enough to validate such a trend. A linear regression between both parameters provides additional support for the inverse relation. Moreover, we calculated the seismic coupling coefficient, showing a direct relation with the p and b values. While we cannot conclusively confirm the hypothesis that aftershock generation depends on certain tectonic characteristics (age, thickness, temperature), our results do not reject it, thus encouraging further study into this question.
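For readers unfamiliar with the two parameters, the sketch below shows one standard way to obtain them. It uses Aki's maximum-likelihood estimator for the Gutenberg-Richter b value (with the usual half-bin correction) and the Omori-Utsu rate n(t) = K/(t + c)^p. The abstract does not state which estimators the authors used, so this is an illustration under that assumption, and the function names are hypothetical.

```python
import math

def b_value_aki(magnitudes, m_c, bin_width=0.0):
    """Maximum-likelihood b value: b = log10(e) / (mean(M) - (Mc - dM/2)).
    Only events at or above the completeness magnitude m_c are used."""
    sample = [m for m in magnitudes if m >= m_c]
    mean_m = sum(sample) / len(sample)
    return math.log10(math.e) / (mean_m - (m_c - bin_width / 2.0))

def omori_utsu_rate(t, k, c, p):
    """Aftershock rate at time t after the mainshock: n(t) = K / (t + c)^p."""
    return k / (t + c) ** p
```

A larger p means the aftershock rate decays faster with time; a larger b means a higher proportion of small events relative to large ones. With few aftershocks per sequence, as the abstract notes, both estimates carry large uncertainties.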
Exogenous L-Arginine Attenuates the Effects of Angiotensin II on Renal Hemodynamics and the Pressure Natriuresis-Diuresis Relationship

PubMed Central

Das, Satarupa; Mattson, David L.

2014-01-01

SUMMARY Administration of exogenous L-Arginine (L-Arg) attenuates Angiotensin II (AngII)-mediated hypertension and kidney disease in rats. The present study assessed renal hemodynamics and pressure-diuresis-natriuresis in anesthetized rats infused with vehicle, AngII (20 ng/kg/min, iv) or AngII + L-Arg (300 µg/kg/min, iv). Increasing renal perfusion pressure (RPP) from approximately 100 to 140 mmHg resulted in a 9–10 fold increase in urine flow and sodium excretion rate in control animals. In comparison, AngII infusion significantly reduced renal blood flow (RBF) and glomerular filtration rate (GFR) by 40–42% and blunted the pressure-dependent increase in urine flow and sodium excretion rate by 54–58% at elevated RPP. Supplementation of L-Arg reversed the vasoconstrictor effects of AngII and restored pressure-dependent diuresis to levels not significantly different from control rats. Experiments in isolated aortic rings were performed to assess L-Arg effects on the vasculature. Dose-dependent contraction to AngII (10^−10 M to 10^−7 M) was observed with a maximal force equal to 27±3% of the response to 10^−5 M phenylephrine. Contraction to 10^−7 M AngII was blunted by 75±3% with 10^−4 M L-Arg. The influence of L-Arg to blunt AngII-mediated contraction was eliminated by endothelial denudation or incubation with nitric oxide synthase inhibitors. Moreover, the addition of 10^−3 M cationic or neutral amino acids, which compete with L-Arg for cellular uptake, blocked the effect of L-Arg. Anionic amino acids did not influence the effects of L-Arg on AngII-mediated contraction. These studies indicate that L-Arg blunts AngII-mediated vascular contraction by an endothelial- and NOS-dependent mechanism involving cellular uptake of L-Arg. PMID:24472006
Enzymatic Synthesis of Self-assembled Dicer Substrate RNA Nanostructures for Programmable Gene Silencing.

PubMed

Jang, Bora; Kim, Boyoung; Kim, Hyunsook; Kwon, Hyokyoung; Kim, Minjeong; Seo, Yunmi; Colas, Marion; Jeong, Hansaem; Jeong, Eun Hye; Lee, Kyuri; Lee, Hyukjin

2018-06-08

Enzymatic synthesis of RNA nanostructures is achieved by isothermal rolling circle transcription (RCT). Each arm of RNA nanostructures provides a functional role of Dicer substrate RNA inducing sequence specific RNA interference (RNAi). Three different RNAi sequences (GFP, RFP, and BFP) are incorporated within the three-arm junction RNA nanostructures (Y-RNA). The template and helper DNA strands are designed for the large-scale in vitro synthesis of RNA strands to prepare self-assembled Y-RNA. Interestingly, Dicer processing of Y-RNA is highly influenced by its physical structure and different gene silencing activity is achieved depending on its arm length and overhang. In addition, enzymatic synthesis allows the preparation of various Y-RNA structures using a single DNA template offering on demand regulation of multiple target genes.
Structures of Bacterial Biosynthetic Arginine Decarboxylases

DOE Office of Scientific and Technical Information (OSTI.GOV)

F Forouhar; S Lew; J Seetharaman

2011-12-31

Biosynthetic arginine decarboxylase (ADC; also known as SpeA) plays an important role in the biosynthesis of polyamines from arginine in bacteria and plants. SpeA is a pyridoxal-5'-phosphate (PLP)-dependent enzyme and shares weak sequence homology with several other PLP-dependent decarboxylases. Here, the crystal structure of PLP-bound SpeA from Campylobacter jejuni is reported at 3.0 Å resolution and that of Escherichia coli SpeA in complex with a sulfate ion is reported at 3.1 Å resolution. The structure of the SpeA monomer contains two large domains, an N-terminal TIM-barrel domain followed by a β-sandwich domain, as well as two smaller helical domains. The TIM-barrel and β-sandwich domains share structural homology with several other PLP-dependent decarboxylases, even though the sequence conservation among these enzymes is less than 25%. A similar tetramer is observed for both C. jejuni and E. coli SpeA, composed of two dimers of tightly associated monomers. The active site of SpeA is located at the interface of this dimer and is formed by residues from the TIM-barrel domain of one monomer and a highly conserved loop in the β-sandwich domain of the other monomer. The PLP cofactor is recognized by hydrogen-bonding, π-stacking and van der Waals interactions.

Coordination sequences and information spreading in small-world networks

NASA Astrophysics Data System (ADS)

Herrero, Carlos P.

2002-10-01

We study the spread of information in small-world networks generated from different d-dimensional regular lattices, with d=1, 2, and 3. With this purpose, we analyze by numerical simulations the behavior of the coordination sequence, i.e., the average number of sites C(n) that can be reached from a given node of the network in n steps along its bonds. For sufficiently large networks, we find an asymptotic behavior C(n) ~ ρ^n, with a constant ρ that depends on the network dimension d and on the rewiring probability p (which measures the disorder strength of a given network). A simple model of information spreading in these networks is studied, assuming that only a fraction q of the network sites are active. The number of active nodes reached in n steps has an asymptotic form λ^n, λ being a constant that depends on p and q, as well as on the dimension d of the underlying lattice. The information spreading presents two different regimes depending on the value of λ: for λ>1 the information propagates along the whole system, and for λ<1 the spreading is damped and the information remains confined in a limited region of the network. We discuss the connection of these results with site percolation in small-world networks.
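A coordination sequence of the kind described above can be computed with a breadth-first search. The sketch below is a minimal illustration for d=1 only: it builds a ring lattice, rewires each bond with probability p, and counts the sites first reached at exactly n steps from a source. The construction details (k_half neighbours per side, how the rewired end is chosen) are assumptions made for the example and may differ from the paper's procedure.

```python
import random
from collections import deque

def small_world_ring(n_sites, k_half, p, rng):
    """Ring lattice with k_half neighbours on each side; the far end of each
    bond is moved to a random site with probability p."""
    adj = {i: set() for i in range(n_sites)}
    for i in range(n_sites):
        for step in range(1, k_half + 1):
            j = (i + step) % n_sites
            if rng.random() < p:
                # avoid self-loops and duplicate bonds when rewiring
                candidates = [m for m in range(n_sites) if m != i and m not in adj[i]]
                j = rng.choice(candidates)
            adj[i].add(j)
            adj[j].add(i)
    return adj

def coordination_sequence(adj, source, max_steps):
    """Number of sites at graph distance exactly n from source, n = 0..max_steps."""
    dist = {source: 0}
    queue = deque([source])
    shells = [0] * (max_steps + 1)
    shells[0] = 1
    while queue:
        u = queue.popleft()
        if dist[u] == max_steps:
            continue  # do not expand beyond the requested radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                shells[dist[v]] += 1
                queue.append(v)
    return shells
```

Averaging the result over many source nodes and network realizations gives an estimate of C(n). For p=0 the count stays constant along the ring, while for p>0 the shortcuts make it grow rapidly until finite-size saturation sets in.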
Enabling large-scale next-generation sequence assembly with Blacklight

PubMed Central

Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.

2014-01-01

Summary A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported previously at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974
Clustalnet: the joining of Clustal and CORBA.

PubMed

Campagne, F

2000-07-01

Performing sequence alignment operations from a different program than the original sequence alignment code, and/or through a network connection, is often required. Interactive alignment editors and large-scale biological data analysis are common examples where such flexibility is important. Interoperability between the alignment engine and the client should be obtained regardless of the architectures and programming languages of the server and client. Clustalnet, a Clustal alignment CORBA server, is described, which was developed on the basis of Clustalw. This server brings the robustness of the algorithms and implementations of Clustal to a new level of reuse. A Clustalnet server object can be accessed from a program, transparently through the network. We present interfaces to perform the alignment operations and to control these operations via immutable contexts. The interfaces that select the contexts do not depend on the nature of the operation to be performed, making the design modular. The IDL interfaces presented here are not specific to Clustal and can be implemented on top of different sequence alignment algorithm implementations.
Harmonic Analysis of Sedimentary Cyclic Sequences in Kansas, Midcontinent, USA

USGS Publications Warehouse

Merriam, D.F.; Robinson, J.E.

1997-01-01

Several stratigraphic sequences in the Upper Carboniferous (Pennsylvanian) in Kansas (Midcontinent, USA) were analyzed quantitatively for periodic repetitions. The sequences were coded by lithologic type into strings of datasets. The strings then were analyzed by an adaptation of a one-dimensional Fourier transform analysis and examined for evidence of periodicity. The method was tested using different states in coding to determine the robustness of the method and data. The most persistent response is in multiples of 8-10 ft (2.5-3.0 m) and probably is dependent on the depositional thickness of the original lithologic units. Other cyclicities occurred in multiples of the basic frequency of 8-10 with persistent ones at 22 and 30 feet (6.5-9.0 m) and large ones at 80 and 160 feet (25-50 m). These levels of thickness relate well to the basic cyclothem and megacyclothem as measured on outcrop. We propose that this approach is a suitable one for analyzing cyclic events in the stratigraphic record.
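The general idea of coding a lithologic column as a numeric string and looking for periodicity in its spectrum can be sketched as follows. The abstract describes "an adaptation" of a one-dimensional Fourier transform without giving details, so the code is a plain discrete Fourier transform; the 1 ft sampling interval and the numeric lithology codes are hypothetical.

```python
import cmath
import math

def power_spectrum(values):
    """Return (period_in_samples, power) pairs for harmonics 1..n/2
    of the mean-removed series, using a direct DFT."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        spectrum.append((n / k, abs(coeff) ** 2 / n))
    return spectrum

# Hypothetical column sampled every 1 ft: 4 ft of shale (0), then 4 ft of limestone (1)
codes = [0, 0, 0, 0, 1, 1, 1, 1] * 10
dominant_period, _ = max(power_spectrum(codes), key=lambda pair: pair[1])
```

For this synthetic column the strongest peak falls at a period of 8 samples, matching the 8 ft repeat built into the data. Changing the numeric codes assigned to each lithology alters the peak heights, which is why testing several coding schemes, as the authors did, is a sensible robustness check.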
Retroviral DNA Integration Directed by HIV Integration Protein in Vitro

NASA Astrophysics Data System (ADS)

Bushman, Frederic D.; Fujiwara, Tamio; Craigie, Robert

1990-09-01

Efficient retroviral growth requires integration of a DNA copy of the viral RNA genome into a chromosome of the host. As a first step in analyzing the mechanism of integration of human immunodeficiency virus (HIV) DNA, a cell-free system was established that models the integration reaction. The in vitro system depends on the HIV integration (IN) protein, which was partially purified from insect cells engineered to express IN protein in large quantities. Integration was detected in a biological assay that scores the insertion of a linear DNA containing HIV terminal sequences into a λ DNA target. Some integration products generated in this assay contained five-base pair duplications of the target DNA at the recombination junctions, a characteristic of HIV integration in vivo; the remaining products contained aberrant junctional sequences that may have been produced in a variation of the normal reaction. These results indicate that HIV IN protein is the only viral protein required to insert model HIV DNA sequences into a target DNA in vitro.
Investigation of FANCA gene in Fanconi anaemia patients in Iran.

PubMed

Moghadam, Ali Akbar Saffar; Mahjoubi, Frouzandeh; Reisi, Nahid; Vosough, Parvaneh

2016-02-01

Fanconi anaemia (FA) is a syndrome with a predisposition to bone marrow failure, congenital anomalies and malignancies. It is characterized by cellular hypersensitivity to cross-linking agents such as mitomycin C (MMC). In the present study, a new approach was selected to investigate the FANCA (Fanconi anaemia complementation group A) gene in patients clinically diagnosed with cellular hypersensitivity to the DNA cross-linking agent MMC. Chromosomal breakage analysis was performed to prove the diagnosis of Fanconi anaemia in 318 families. Of these, 70 families had a positive result. Forty families agreed to molecular genetic testing. In total, there were 27 patients with unknown complementation groups. Genomic DNA was extracted and total RNA was isolated from fresh whole blood of the patients. The first-strand cDNA was synthesized and the cDNA of each patient was then tested with 21 pairs of overlapping primers. High resolution melting curve analysis was used to screen FANCA, and LinReg software version 1.7 was utilized for analysis of expression. In total, six sequence alterations were identified, which included two stop codons, two frameshift mutations, one large deletion and one amino acid exchange. FANCA expression was downregulated in patients who had sequence alterations. The results of the present study show that high resolution melting (HRM) curve analysis may be useful in the detection of sequence alterations. It is simpler and more cost-effective than the multiplex ligation-dependent probe amplification (MLPA) procedure.
Postretrieval new learning does not reliably induce human memory updating via reconsolidation.

PubMed

Hardwicke, Tom E; Taqi, Mahdi; Shanks, David R

2016-05-10

Reconsolidation theory proposes that retrieval can destabilize an existing memory trace, opening a time-dependent window during which that trace is amenable to modification. Support for the theory is largely drawn from nonhuman animal studies that use invasive pharmacological or electroconvulsive interventions to disrupt a putative postretrieval restabilization ("reconsolidation") process. In human reconsolidation studies, however, it is often claimed that postretrieval new learning can be used as a means of "updating" or "rewriting" existing memory traces. This proposal warrants close scrutiny because the ability to modify information stored in the memory system has profound theoretical, clinical, and ethical implications. The present study aimed to replicate and extend a prominent 3-day motor-sequence learning study [Walker MP, Brakefield T, Hobson JA, Stickgold R (2003) Nature 425(6958):616-620] that is widely cited as a convincing demonstration of human reconsolidation. However, in four direct replication attempts (n = 64), we did not observe the critical impairment effect that has previously been taken to indicate disruption of an existing motor memory trace. In three additional conceptual replications (n = 48), we explored the broader validity of reconsolidation-updating theory by using a declarative recall task and sequences similar to phone numbers or computer passwords. Rather than inducing vulnerability to interference, memory retrieval appeared to aid the preservation of existing sequence knowledge relative to a no-retrieval control group. These findings suggest that memory retrieval followed by new learning does not reliably induce human memory updating via reconsolidation.
The Cutaneous Rabbit Revisited

ERIC Educational Resources Information Center

Flach, Rudiger; Haggard, Patrick

2006-01-01

In the cutaneous rabbit effect (CRE), a tactile event (so-called attractee tap) is mislocalized toward an adjacent attractor tap. The effect depends on the time interval between the taps. The authors delivered sequences of taps to the forearm and asked participants to report the location of one of the taps. The authors replicated the original CRE…
Regression and Data Mining Methods for Analyses of Multiple Rare Variants in the Genetic Analysis Workshop 17 Mini-Exome Data

PubMed Central

Bailey-Wilson, Joan E.; Brennan, Jennifer S.; Bull, Shelley B; Culverhouse, Robert; Kim, Yoonhee; Jiang, Yuan; Jung, Jeesun; Li, Qing; Lamina, Claudia; Liu, Ying; Mägi, Reedik; Niu, Yue S.; Simpson, Claire L.; Wang, Libo; Yilmaz, Yildiz E.; Zhang, Heping; Zhang, Zhaogong

2012-01-01

Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors. PMID:22128066
Extending earthquakes' reach through cascading.

PubMed

Marsan, David; Lengliné, Olivier

2008-02-22

Earthquakes, whatever their size, can trigger other earthquakes. Mainshocks cause aftershocks to occur, which in turn activate their own local aftershock sequences, resulting in a cascade of triggering that extends the reach of the initial mainshock. A long-lasting difficulty is to determine which earthquakes are connected, either directly or indirectly. Here we show that this causal structure can be found probabilistically, with no a priori model nor parameterization. Large regional earthquakes are found to have a short direct influence in comparison to the overall aftershock sequence duration. Relative to these large mainshocks, small earthquakes collectively have a greater effect on triggering. Hence, cascade triggering is a key component in earthquake interactions.
Helper-dependent adenoviral vectors for liver-directed gene therapy

PubMed Central

Brunetti-Pierri, Nicola; Ng, Philip

2011-01-01

Helper-dependent adenoviral (HDAd) vectors devoid of all viral-coding sequences are promising non-integrating vectors for liver-directed gene therapy because they have a large cloning capacity, can efficiently transduce a wide variety of cell types from various species independent of the cell cycle and can result in long-term transgene expression without chronic toxicity. The main obstacle preventing clinical applications of HDAd for liver-directed gene therapy is the host innate inflammatory response against the vector capsid proteins that occurs shortly after intravascular vector administration resulting in acute toxicity, the severity of which is dependent on vector dose. Intense efforts have been focused on elucidating the factors involved in this acute response and various strategies have been investigated to improve the therapeutic index of HDAd vectors. These strategies have yielded encouraging results with the potential for clinical translation. PMID:21470977
A large deletion in the succinate dehydrogenase B gene (SDHB) in a Japanese patient with abdominal paraganglioma and concomitant metastasis.

PubMed

Kodama, Hitomi; Iihara, Masatoshi; Nissato, Sumiko; Isobe, Kazumasa; Kawakami, Yasushi; Okamoto, Takahiro; Takekoshi, Kazuhiro

2010-01-01

Recently, mutations in nuclear genes encoding two mitochondrial complex II subunit proteins, succinate dehydrogenase D (SDHD) and SDHB, have been found to be associated with the development of familial pheochromocytomas and paragangliomas (hereditary pheochromocytoma/paraganglioma syndrome: HPPS). Growing evidence suggests that mutation of SDHB is strongly associated with abdominal paraganglioma and subsequent distant metastasis (malignant paraganglioma). In the present study, we used multiplex ligation-dependent probe amplification (MLPA) analysis to identify a large heterozygous SDHB gene deletion encompassing sequences corresponding to the promoter region, in addition to exon 1 and exon 2, in a malignant paraganglioma patient in whom previously characterized SDHB mutations were undetectable. This is the first Japanese case report of malignant paraganglioma with a large SDHB deletion. Our present findings strongly support the notion that large deletions in the SDHB gene should be considered in patients lacking characterized SDHB mutations.
Augmented brain function by coordinated reset stimulation with slowly varying sequences.

PubMed

Zeitler, Magteld; Tass, Peter A

2015-01-01

Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.
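The three sequence schemes contrasted in this abstract (fixed, rapidly varying, slowly varying) differ only in how often the stimulation site order is redrawn. The sketch below generates such orderings for illustration. It uses a fixed number of repeats before switching, which simplifies the "occasional random switching" the abstract describes, and it does not model the neuronal network or the stimuli themselves; all names are hypothetical.

```python
import random

def fixed_sequences(n_sites, n_cycles, rng):
    """One random site order, reused in every stimulation cycle."""
    seq = rng.sample(range(n_sites), n_sites)
    return [list(seq) for _ in range(n_cycles)]

def rapidly_varying_sequences(n_sites, n_cycles, rng):
    """RVS: a new random site order is drawn for every cycle."""
    return [rng.sample(range(n_sites), n_sites) for _ in range(n_cycles)]

def slowly_varying_sequences(n_sites, n_cycles, repeats, rng):
    """SVS: each site order is repeated `repeats` times before a new one is drawn."""
    cycles = []
    while len(cycles) < n_cycles:
        seq = rng.sample(range(n_sites), n_sites)
        cycles.extend(list(seq) for _ in range(repeats))
    return cycles[:n_cycles]
```

In every scheme each cycle is a permutation of the stimulation sites, so each site is stimulated exactly once per cycle, as the abstract requires. With `repeats=1` the SVS generator reduces to RVS, and with `repeats` at least as large as `n_cycles` it reduces to a fixed sequence.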
Augmented brain function by coordinated reset stimulation with slowly varying sequences

PubMed Central

Zeitler, Magteld; Tass, Peter A.

2015-01-01

Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS. PMID:25873867
De Novo Centromere Formation and Centromeric Sequence Expansion in Wheat and its Wide Hybrids.

PubMed

Guo, Xiang; Su, Handong; Shi, Qinghua; Fu, Shulan; Wang, Jing; Zhang, Xiangqi; Hu, Zanmin; Han, Fangpu

2016-04-01

Centromeres typically contain tandem repeat sequences, but centromere function does not necessarily depend on these sequences. We identified functional centromeres with significant quantitative changes in the centromeric retrotransposons of wheat (CRW) contents in wheat aneuploids (Triticum aestivum) and the offspring of wheat wide hybrids. The CRW signals were strongly reduced or essentially lost in some wheat ditelosomic lines and in the addition lines from the wide hybrids. The total loss of the CRW sequences but the presence of CENH3 in these lines suggests that the centromeres were formed de novo. In wheat and its wide hybrids, which carry large complex genomes or no sequenced genome, we performed CENH3-ChIP-dot-blot methods alone or in combination with CENH3-ChIP-seq and identified the ectopic genomic sequences present at the new centromeres. In addition, the transcription of the identified DNA sequences was remarkably increased at the new centromere, suggesting that the transcription of the corresponding sequences may be associated with de novo centromere formation. Stable alien chromosomes with two and three regions containing CRW sequences induced by centromere breakage were observed in the wheat-Th. elongatum hybrid derivatives, but only one was a functional centromere. In wheat-rye (Secale cereale) hybrids, the rye centromere-specific sequences spread along the chromosome arms and may have caused centromere expansion. Frequent and significant quantitative alterations in the centromere sequence via chromosomal rearrangement have been systematically described in wheat wide hybridizations, which may affect the retention or loss of the alien chromosomes in the hybrids. Thus, the centromere behavior in wide crosses likely has an important impact on the generation of biodiversity, which ultimately has implications for speciation.
De Novo Centromere Formation and Centromeric Sequence Expansion in Wheat and its Wide Hybrids

PubMed Central

Fu, Shulan; Wang, Jing; Zhang, Xiangqi; Hu, Zanmin; Han, Fangpu

2016-01-01

Centromeres typically contain tandem repeat sequences, but centromere function does not necessarily depend on these sequences. We identified functional centromeres with significant quantitative changes in the centromeric retrotransposons of wheat (CRW) contents in wheat aneuploids (Triticum aestivum) and the offspring of wheat wide hybrids. The CRW signals were strongly reduced or essentially lost in some wheat ditelosomic lines and in the addition lines from the wide hybrids. The total loss of the CRW sequences but the presence of CENH3 in these lines suggests that the centromeres were formed de novo. In wheat and its wide hybrids, which carry large complex genomes or no sequenced genome, we performed CENH3-ChIP-dot-blot methods alone or in combination with CENH3-ChIP-seq and identified the ectopic genomic sequences present at the new centromeres. In addition, the transcription of the identified DNA sequences was remarkably increased at the new centromere, suggesting that the transcription of the corresponding sequences may be associated with de novo centromere formation. Stable alien chromosomes with two and three regions containing CRW sequences induced by centromere breakage were observed in the wheat-Th. elongatum hybrid derivatives, but only one was a functional centromere. In wheat-rye (Secale cereale) hybrids, the rye centromere-specific sequences spread along the chromosome arms and may have caused centromere expansion. Frequent and significant quantitative alterations in the centromere sequence via chromosomal rearrangement have been systematically described in wheat wide hybridizations, which may affect the retention or loss of the alien chromosomes in the hybrids. Thus, the centromere behavior in wide crosses likely has an important impact on the generation of biodiversity, which ultimately has implications for speciation. PMID:27110907
Evidence for x-dependent proton color fluctuations in pA collisions at the CERN Large Hadron Collider

DOE PAGES

Alvioli, M.; Cole, B. A.; Frankfurt, L.; ...

2016-01-21

The centrality dependence of forward jet production in pA collisions at the Large Hadron Collider (LHC) has been found to grossly violate the Glauber model prediction in a way that depends on the x in the proton. In this paper, we argue that this modification pattern provides the first experimental evidence for x-dependent proton color fluctuation effects. On average, parton configurations in the projectile proton containing a parton with large x interact with a nuclear target with a significantly smaller than average cross section and have smaller than average size. We implement the effects of fluctuations of the interaction strength and, using the ATLAS analysis of how hadron production at backward rapidities depends on the number of wounded nucleons, make quantitative predictions for the centrality dependence of the jet production rate as a function of the x-dependent interaction strength σ(x). We find that σ(x) ~ 0.6(σ) gives a good description of the data at x = 0.6. Finally, these findings support an explanation of the European Muon Collaboration effect as arising from the suppression of small-size nucleon configurations in the nucleus.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Enciso, Marta, E-mail: m.enciso@latrobe.edu.au; Schütte, Christof, E-mail: schuette@zib.de; Zuse Institute Berlin, Berlin

We employ a recently developed coarse-grained model for peptides and proteins where the effect of pH is automatically included. We explore the effect of pH in the aggregation process of the amyloidogenic peptide KTVIIE and two related sequences, using three different pH environments. Simulations using large systems (24 peptide chains per box) allow us to describe the formation of realistic peptide aggregates. We evaluate the thermodynamic and kinetic implications of changes in sequence and pH upon peptide aggregation, and we discuss how a minimalistic coarse-grained model can account for these details.
Algorithms for optimizing cross-overs in DNA shuffling.

PubMed

He, Lu; Friedman, Alan M; Bailey-Kellogg, Chris

2012-03-21

DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library. This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. Our CODNS (cross-over optimization for DNA shuffling) approach employs polynomial-time dynamic programming algorithms to select codons for the parental amino acids, allowing for zero or a fixed number of conservative substitutions. We first present efficient algorithms to optimize the local sequence identity or the nearest-neighbor approximation of the change in free energy upon annealing, objectives that were previously optimized by computationally-expensive integer programming methods. We then present efficient algorithms for more powerful objectives that seek to localize and enhance the frequency of recombination by producing "runs" of common nucleotides either overall or according to the sequence diversity of the resulting chimeras. We demonstrate the effectiveness of CODNS in choosing codons and allocating substitutions to promote recombination between parents targeted in earlier studies: two GAR transformylases (41% amino acid sequence identity), two very distantly related DNA polymerases, Pol X and β (15%), and beta-lactamases of varying identity (26-47%). 
Our methods provide the protein engineer with a new approach to DNA shuffling that supports substantially more diverse parents, is more deterministic, and generates more predictable and more diverse chimeric libraries.
GENETICS IN ENDOCRINOLOGY: Genetic counseling for congenital hypogonadotropic hypogonadism and Kallmann syndrome: new challenges in the era of oligogenism and next-generation sequencing.

PubMed

Maione, Luigi; Dwyer, Andrew A; Francou, Bruno; Guiochon-Mantel, Anne; Binart, Nadine; Bouligand, Jérôme; Young, Jacques

2018-03-01

Congenital hypogonadotropic hypogonadism (CHH) and Kallmann syndrome (KS) are rare, related diseases that prevent normal pubertal development and cause infertility in affected men and women. However, the infertility carries a good prognosis as increasing numbers of patients with CHH/KS are now able to have children through medically assisted procreation. These are genetic diseases that can be transmitted to patients' offspring. Importantly, patients and their families should be informed of this risk and given genetic counseling. CHH and KS are phenotypically and genetically heterogeneous diseases in which the risk of transmission largely depends on the gene(s) responsible. Inheritance may be classically Mendelian or more complex; oligogenic modes of transmission have also been described. The prevalence of oligogenicity has risen dramatically since the advent of massively parallel next-generation sequencing (NGS) in which tens, hundreds or thousands of genes are sequenced at the same time. NGS is medically and economically more efficient and more rapid than traditional Sanger sequencing and is increasingly being used in medical practice. Thus, it seems plausible that oligogenic forms of CHH/KS will be increasingly identified, making genetic counseling even more complex. In this context, the main challenge will be to differentiate true oligogenism from situations when several rare variants that do not have a clear phenotypic effect are identified by chance. This review aims to summarize the genetics of CHH/KS and to discuss the challenges of oligogenic transmission and also its role in incomplete penetrance and variable expressivity in a perspective of genetic counseling. © 2018 European Society of Endocrinology.

Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies

PubMed Central

2010-01-01

Background All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. Results The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. Conclusions This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. 
The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general. PMID:20144194
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies.

PubMed

David, Maria Pamela C; Concepcion, Gisela P; Padlan, Eduardo A

2010-02-08

All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. 
The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.
Improved Model for Predicting the Free Energy Contribution of Dinucleotide Bulges to RNA Duplex Stability.

PubMed

Tomcho, Jeremy C; Tillman, Magdalena R; Znosko, Brent M

2015-09-01

Predicting the secondary structure of RNA is an intermediate in predicting RNA three-dimensional structure. Commonly, determining RNA secondary structure from sequence uses free energy minimization and nearest neighbor parameters. Current algorithms utilize a sequence-independent model to predict free energy contributions of dinucleotide bulges. To determine if a sequence-dependent model would be more accurate, short RNA duplexes containing dinucleotide bulges with different sequences and nearest neighbor combinations were optically melted to derive thermodynamic parameters. These data suggested energy contributions of dinucleotide bulges were sequence-dependent, and a sequence-dependent model was derived. This model assigns free energy penalties based on the identity of nucleotides in the bulge (3.06 kcal/mol for two purines, 2.93 kcal/mol for two pyrimidines, 2.71 kcal/mol for 5'-purine-pyrimidine-3', and 2.41 kcal/mol for 5'-pyrimidine-purine-3'). The predictive model also includes a 0.45 kcal/mol penalty for an A-U pair adjacent to the bulge and a -0.28 kcal/mol bonus for a G-U pair adjacent to the bulge. The new sequence-dependent model results in predicted values within, on average, 0.17 kcal/mol of experimental values, a significant improvement over the sequence-independent model. This model and new experimental values can be incorporated into algorithms that predict RNA stability and secondary structure from sequence.
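The parameters quoted in this abstract can be collected into a small lookup, sketched below. This is only an illustration of the stated values, not the authors' implementation: the function and constant names are hypothetical, and the assumption that the adjacent-pair terms are simply added once per listed closing pair is ours, since the abstract does not spell out how they combine.

```python
# Free energy penalties (kcal/mol) for dinucleotide bulges, as quoted in the abstract.
# R = purine (A, G), Y = pyrimidine (C, U); keys are read 5' -> 3'.
BULGE_PENALTY = {
    ("R", "R"): 3.06,
    ("Y", "Y"): 2.93,
    ("R", "Y"): 2.71,
    ("Y", "R"): 2.41,
}
AU_ADJACENT_PENALTY = 0.45   # A-U pair adjacent to the bulge
GU_ADJACENT_BONUS = -0.28    # G-U pair adjacent to the bulge


def base_class(nucleotide):
    """Return 'R' for a purine and 'Y' for a pyrimidine."""
    nucleotide = nucleotide.upper()
    if nucleotide in ("A", "G"):
        return "R"
    if nucleotide in ("C", "U"):
        return "Y"
    raise ValueError(f"not an RNA nucleotide: {nucleotide!r}")


def adjacent_pair_term(pair):
    """Energy term for one base pair next to the bulge, e.g. 'AU' or 'UG'."""
    bases = frozenset(pair.upper())
    if bases == frozenset("AU"):
        return AU_ADJACENT_PENALTY
    if bases == frozenset("GU"):
        return GU_ADJACENT_BONUS
    return 0.0


def dinucleotide_bulge_energy(bulge, adjacent_pairs=()):
    """Bulge penalty plus adjacent-pair terms (additivity is an assumption)."""
    if len(bulge) != 2:
        raise ValueError("a dinucleotide bulge has exactly two nucleotides")
    key = (base_class(bulge[0]), base_class(bulge[1]))
    return BULGE_PENALTY[key] + sum(adjacent_pair_term(p) for p in adjacent_pairs)
```

For example, `dinucleotide_bulge_energy("AC", ["AU"])` adds the 5'-purine-pyrimidine-3' penalty to the A-U term under the additivity assumption above.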
Improving the realism of white matter numerical phantoms: a step towards a better understanding of the influence of structural disorders in diffusion MRI

NASA Astrophysics Data System (ADS)

Ginsburger, Kévin; Poupon, Fabrice; Beaujoin, Justine; Estournet, Delphine; Matuschke, Felix; Mangin, Jean-François; Axer, Markus; Poupon, Cyril

2018-02-01

White matter is composed of irregularly packed axons leading to a structural disorder in the extra-axonal space. Diffusion MRI experiments using oscillating gradient spin echo sequences have shown that the diffusivity transverse to axons in this extra-axonal space is dependent on the frequency of the employed sequence. In this study, we observe the same frequency-dependence using 3D simulations of the diffusion process in disordered media. We design a novel white matter numerical phantom generation algorithm which constructs biomimicking geometric configurations with few design parameters, and enables to control the level of disorder of the generated phantoms. The influence of various geometrical parameters present in white matter, such as global angular dispersion, tortuosity, presence of Ranvier nodes, beading, on the extra-cellular perpendicular diffusivity frequency dependence was investigated by simulating the diffusion process in numerical phantoms of increasing complexity and fitting the resulting simulated diffusion MR signal attenuation with an adequate analytical model designed for trapezoidal OGSE sequences. This work suggests that angular dispersion and especially beading have non-negligible effects on this extracellular diffusion metrics that may be measured using standard OGSE DW-MRI clinical protocols.
Culture dependent and independent analysis of bacterial communities associated with commercial salad leaf vegetables.

PubMed

Jackson, Colin R; Randolph, Kevin C; Osborn, Shelly L; Tyler, Heather L

2013-12-01

Plants harbor a diverse bacterial community, both as epiphytes on the plant surface and as endophytes within plant tissue. While some plant-associated bacteria act as plant pathogens or promote plant growth, others may be human pathogens. The aim of the current study was to determine the bacterial community composition of organic and conventionally grown leafy salad vegetables at the point of consumption using both culture-dependent and culture-independent methods. Total culturable bacteria on salad vegetables ranged from 8.0 × 10(3) to 5.5 × 10(8) CFU g(-1). The number of culturable endophytic bacteria from surface sterilized plants was significantly lower, ranging from 2.2 × 10(3) to 5.8 × 10(5) CFU g(-1). Cultured isolates belonged to six major bacterial phyla, and included representatives of Pseudomonas, Pantoea, Chryseobacterium, and Flavobacterium. Eleven different phyla and subphyla were identified by culture-independent pyrosequencing, with Gammaproteobacteria, Betaproteobacteria, and Bacteroidetes being the most dominant lineages. Other bacterial lineages identified (e.g. Firmicutes, Alphaproteobacteria, Acidobacteria, and Actinobacteria) typically represented less than 1% of sequences obtained. At the genus level, sequences classified as Pseudomonas were identified in all samples and this was often the most prevalent genus. Ralstonia sequences made up a greater portion of the community in surface sterilized than non-surface sterilized samples, indicating that it was largely endophytic, while Acinetobacter sequences appeared to be primarily associated with the leaf surface. Analysis of molecular variance indicated there were no significant differences in bacterial community composition between organic versus conventionally grown, or surface-sterilized versus non-sterilized leaf vegetables. 
While culture-independent pyrosequencing identified significantly more bacterial taxa, the dominant taxa from pyrosequence data were also detected by traditional culture-dependent methods. The use of pyrosequencing allowed for the identification of low abundance bacteria in leaf salad vegetables not detected by culture-dependent methods. The presence of a range of bacterial populations as endophytes presents an interesting phenomenon as these microorganisms cannot be removed by washing and are thus ingested during salad consumption.
Culture dependent and independent analysis of bacterial communities associated with commercial salad leaf vegetables

PubMed Central

2013-01-01

Background Plants harbor a diverse bacterial community, both as epiphytes on the plant surface and as endophytes within plant tissue. While some plant-associated bacteria act as plant pathogens or promote plant growth, others may be human pathogens. The aim of the current study was to determine the bacterial community composition of organic and conventionally grown leafy salad vegetables at the point of consumption using both culture-dependent and culture-independent methods. Results Total culturable bacteria on salad vegetables ranged from 8.0 × 10(3) to 5.5 × 10(8) CFU g(-1). The number of culturable endophytic bacteria from surface sterilized plants was significantly lower, ranging from 2.2 × 10(3) to 5.8 × 10(5) CFU g(-1). Cultured isolates belonged to six major bacterial phyla, and included representatives of Pseudomonas, Pantoea, Chryseobacterium, and Flavobacterium. Eleven different phyla and subphyla were identified by culture-independent pyrosequencing, with Gammaproteobacteria, Betaproteobacteria, and Bacteroidetes being the most dominant lineages. Other bacterial lineages identified (e.g. Firmicutes, Alphaproteobacteria, Acidobacteria, and Actinobacteria) typically represented less than 1% of sequences obtained. At the genus level, sequences classified as Pseudomonas were identified in all samples and this was often the most prevalent genus. Ralstonia sequences made up a greater portion of the community in surface sterilized than non-surface sterilized samples, indicating that it was largely endophytic, while Acinetobacter sequences appeared to be primarily associated with the leaf surface. Analysis of molecular variance indicated there were no significant differences in bacterial community composition between organic versus conventionally grown, or surface-sterilized versus non-sterilized leaf vegetables. 
While culture-independent pyrosequencing identified significantly more bacterial taxa, the dominant taxa from pyrosequence data were also detected by traditional culture-dependent methods. Conclusions The use of pyrosequencing allowed for the identification of low abundance bacteria in leaf salad vegetables not detected by culture-dependent methods. The presence of a range of bacterial populations as endophytes presents an interesting phenomenon as these microorganisms cannot be removed by washing and are thus ingested during salad consumption. PMID:24289725
Methylene blue binding to DNA with alternating AT base sequence: minor groove binding is favored over intercalation.

PubMed

Rohs, Remo; Sklenar, Heinz

2004-04-01

The results presented in this paper on methylene blue (MB) binding to DNA with AT alternating base sequence complement the data obtained in two former modeling studies of MB binding to GC alternating DNA. In the light of the large amount of experimental data for both systems, this theoretical study is focused on a detailed energetic analysis and comparison in order to understand their different behavior. Since experimental high-resolution structures of the complexes are not available, the analysis is based on energy minimized structural models of the complexes in different binding modes. For both sequences, four different intercalation structures and two models for MB binding in the minor and major groove have been proposed. Solvent electrostatic effects were included in the energetic analysis by using electrostatic continuum theory, and the dependence of MB binding on salt concentration was investigated by solving the non-linear Poisson-Boltzmann equation. We find that the relative stability of the different complexes is similar for the two sequences, in agreement with the interpretation of spectroscopic data. Subtle differences, however, are seen in energy decompositions and can be attributed to the change from symmetric 5'-YpR-3' intercalation to minor groove binding with increasing salt concentration, which is experimentally observed for the AT sequence at lower salt concentration than for the GC sequence. According to our results, this difference is due to the significantly lower non-electrostatic energy for the minor groove complex with AT alternating DNA, whereas the slightly lower binding energy to this sequence is caused by a higher deformation energy of DNA. The energetic data are in agreement with the conclusions derived from different spectroscopic studies and can also be structurally interpreted on the basis of the modeled complexes. 
The simple static modeling technique and the neglect of entropy terms and of non-electrostatic solute-solvent interactions, which are assumed to be nearly constant for the compared complexes of MB with DNA, seem to be justified by the results.
Time Separation Between Events in a Sequence: a Regional Property?

NASA Astrophysics Data System (ADS)

Muirwood, R.; Fitzenz, D. D.

2013-12-01

Earthquake sequences are loosely defined as events occurring too closely in time and space to appear unrelated. Depending on the declustering method, several, all, or no event(s) after the first large event might be recognized as independent mainshocks. It can therefore be argued that a probabilistic seismic hazard assessment (PSHA, traditionally dealing with mainshocks only) might already include the ground shaking effects of such sequences. Alternatively all but the largest event could be classified as an 'aftershock' and removed from the earthquake catalog. While in PSHA the question is only whether to keep or remove the events from the catalog, for Risk Management purposes, the community response to the earthquakes, as well as insurance risk transfer mechanisms, can be profoundly affected by the actual timing of events in such a sequence. In particular the repetition of damaging earthquakes over a period of weeks to months can lead to businesses closing and families evacuating from the region (as happened in Christchurch, New Zealand in 2011). Buildings that are damaged in the first earthquake may go on to be damaged again, even while they are being repaired. Insurance also functions around a set of critical timeframes - including the definition of a single 'event loss' for reinsurance recoveries within the 192 hour 'hours clause', the 6-18 month pace at which insurance claims are settled, and the annual renewal of insurance and reinsurance contracts. We show how temporal aspects of earthquake sequences need to be taken into account within models for Risk Management, and what time separation between events are most sensitive, both in terms of the modeled disruptions to lifelines and business activity as well as in the losses to different parties (such as insureds, insurers and reinsurers). 
We also explore the time separation between all events and between loss causing events for a collection of sequences from across the world and we point to the need to understand the rate controlling processes that determine such sequences per tectonic region and fluid/heat flow provinces.
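To make the role of the 192 hour 'hours clause' concrete, the sketch below groups a list of event times into candidate 'event loss' windows. It is a simplified illustration rather than a description of any actual contract: the greedy rule (each window opens at the first event not yet covered) and the function name are assumptions, and real clauses may let a party choose the window start.

```python
from datetime import datetime, timedelta


def group_by_hours_clause(event_times, window_hours=192):
    """Greedily group event times into windows of at most `window_hours`.

    Each window starts at the earliest event not yet assigned; every later
    event within `window_hours` of that start joins the same group.
    """
    window = timedelta(hours=window_hours)
    groups = []
    for t in sorted(event_times):
        if groups and t - groups[-1][0] <= window:
            groups[-1].append(t)   # still inside the current window
        else:
            groups.append([t])     # open a new window at this event
    return groups


# Hypothetical timestamps, chosen only to illustrate the grouping.
events = [
    datetime(2000, 1, 1),
    datetime(2000, 1, 5),
    datetime(2000, 1, 20),
    datetime(2000, 1, 21),
]
groups = group_by_hours_clause(events)
```

With these made-up dates, the first two events fall in one window and the last two in another, so a sequence spread over weeks would count as more than one 'event loss' under this simple rule.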
The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

PubMed

Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

2012-03-15

Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. Especially database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.
Statistical context shapes stimulus-specific adaptation in human auditory cortex.

PubMed

Herrmann, Björn; Henry, Molly J; Fromboluti, Elisa Kim; McAuley, J Devin; Obleser, Jonas

2015-04-01

Stimulus-specific adaptation is the phenomenon whereby neural response magnitude decreases with repeated stimulation. Inconsistencies between recent nonhuman animal recordings and computational modeling suggest dynamic influences on stimulus-specific adaptation. The present human electroencephalography (EEG) study investigates the potential role of statistical context in dynamically modulating stimulus-specific adaptation by examining the auditory cortex-generated N1 and P2 components. As in previous studies of stimulus-specific adaptation, listeners were presented with oddball sequences in which the presentation of a repeated tone was infrequently interrupted by rare spectral changes taking on three different magnitudes. Critically, the statistical context varied with respect to the probability of small versus large spectral changes within oddball sequences (half of the time a small change was most probable; in the other half a large change was most probable). We observed larger N1 and P2 amplitudes (i.e., release from adaptation) for all spectral changes in the small-change compared with the large-change statistical context. The increase in response magnitude also held for responses to tones presented with high probability, indicating that statistical adaptation can overrule stimulus probability per se in its influence on neural responses. Computational modeling showed that the degree of coadaptation in auditory cortex changed depending on the statistical context, which in turn affected stimulus-specific adaptation. Thus the present data demonstrate that stimulus-specific adaptation in human auditory cortex critically depends on statistical context. Finally, the present results challenge the implicit assumption of stationarity of neural response magnitudes that governs the practice of isolating established deviant-detection responses such as the mismatch negativity. Copyright © 2015 the American Physiological Society.
Modeling Interdependent and Periodic Real-World Action Sequences

PubMed Central

Kurashima, Takeshi; Althoff, Tim; Leskovec, Jure

2018-01-01

Mobile health applications, including those that track activities such as exercise, sleep, and diet, are becoming widely used. Accurately predicting human actions in the real world is essential for targeted recommendations that could improve our health and for personalization of these applications. However, making such predictions is extremely difficult due to the complexities of human behavior, which consists of a large number of potential actions that vary over time, depend on each other, and are periodic. Previous work has not jointly modeled these dynamics and has largely focused on item consumption patterns instead of broader types of behaviors such as eating, commuting or exercising. In this work, we develop a novel statistical model, called TIPAS, for Time-varying, Interdependent, and Periodic Action Sequences. Our approach is based on personalized, multivariate temporal point processes that model time-varying action propensities through a mixture of Gaussian intensities. Our model captures short-term and long-term periodic interdependencies between actions through Hawkes process-based self-excitations. We evaluate our approach on two activity logging datasets comprising 12 million real-world actions (e.g., eating, sleep, and exercise) taken by 20 thousand users over 17 months. We demonstrate that our approach allows us to make successful predictions of future user actions and their timing. Specifically, TIPAS improves predictions of actions, and their timing, over existing methods across multiple datasets by up to 156%, and up to 37%, respectively. Performance improvements are particularly large for relatively rare and periodic actions such as walking and biking, improving over baselines by up to 256%. This demonstrates that explicit modeling of dependencies and periodicities in real-world behavior enables successful predictions of future actions, with implications for modeling human behavior, app personalization, and targeting of health interventions. 
PMID:29780977
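The abstract above describes action propensities built from a mixture of Gaussian intensities plus Hawkes-style self-excitation. A minimal sketch of that idea follows. It is not the TIPAS model itself: it is univariate, it assumes an exponential excitation kernel, and the names `gaussian_base_intensity`, `hawkes_intensity`, `alpha` and `beta` are illustrative assumptions that do not come from the paper.

```python
import math

def gaussian_base_intensity(t, components):
    """Time-varying base rate: a weighted sum of Gaussian bumps.

    components is a list of (weight, mean, std) tuples (hypothetical parameters).
    """
    return sum(w * math.exp(-0.5 * ((t - mu) / sigma) ** 2)
               for w, mu, sigma in components)

def hawkes_intensity(t, history, components, alpha, beta):
    """Intensity at time t: base rate plus self-excitation from past events.

    Each past event at time ti < t adds alpha * exp(-beta * (t - ti)),
    an exponential kernel assumed here purely for illustration.
    """
    excitation = sum(alpha * math.exp(-beta * (t - ti))
                     for ti in history if ti < t)
    return gaussian_base_intensity(t, components) + excitation
```

With a single bump centred at t = 8, an event one time unit earlier raises the intensity above the base rate, and that extra contribution decays as t moves away from the event.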
ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

PubMed

Roca, Alberto I

2014-01-01

The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
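The matrix of residue frequencies per aligned position that the abstract describes can be sketched in a few lines. This is only an illustration of the data structure, not the JProfileGrid implementation; the function name `profile_grid` and the choice to count the gap character `-` as an ordinary symbol are assumptions.

```python
from collections import Counter

def profile_grid(alignment):
    """Return one dict per alignment column mapping residue -> frequency.

    alignment is a list of equal-length aligned sequences (strings).
    Gap characters are counted like any other symbol (an assumption).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    total = len(alignment)
    grid = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        grid.append({residue: n / total for residue, n in counts.items()})
    return grid
```

Colour-coding each cell by its frequency would then give the kind of grid view the abstract contrasts with Sequence Logos.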
Fault trees and sequence dependencies

NASA Technical Reports Server (NTRS)

Dugan, Joanne Bechta; Boyd, Mark A.; Bavuso, Salvatore J.

1990-01-01

One of the frequently cited shortcomings of fault-tree models, their inability to model so-called sequence dependencies, is discussed. Several sources of such sequence dependencies are discussed, and new fault-tree gates to capture this behavior are defined. These complex behaviors can be included in present fault-tree models because they utilize a Markov solution. The utility of the new gates is demonstrated by presenting several models of the fault-tolerant parallel processor, which include both hot and cold spares.
Limits of transforming competence of SV40 nuclear and cytoplasmic large T mutants with altered Rb binding sequences.

PubMed

Tedesco, D; Fischer-Fantuzzi, L; Vesco, C

1993-03-01

Multiple amino acid substitutions were introduced into the SV40 large T region that harbors the retinoblastoma protein (Rb) binding site and the nuclear transport signal, changing either one or both of these determinants. Mutant activities were examined in a set of assays allowing different levels of transforming potential to be distinguished; phenotypic changes in established and pre-crisis rat embryo fibroblasts (REFs) were detected under isogenic cell conditions, and comparisons made with other established rodent cells. The limit of the transforming ability of mutants with important substitutions in the Rb binding site fell between two transformation levels of the same established rat cells. Such cells could be induced to form dense foci but not agar colonies (their parental pre-crisis REFs, as expected, were untransformed either way). Nonetheless, agar colony induction was possible in other cell lines, such as mouse NIH3T3 and (for one of the mutants) rat F2408. All these mutants efficiently immortalized pre-crisis REFs. The transforming ability of cytoplasmic mutants appeared to depend on the integrity of the Rb-binding sequence to approximately the same extent as that of the wild-type large T, although evidence of in vivo Rb-cytoplasmic large T complexes was not found. The presence or absence of small t was critical when the transforming task of mutants was near the limit of their abilities.
Cerebellar activation during motor sequence learning is associated with subsequent transfer to new sequences.

PubMed

Shimizu, Renee E; Wu, Allan D; Knowlton, Barbara J

2016-12-01

Effective learning results not only in improved performance on a practiced task, but also in the ability to transfer the acquired knowledge to novel, similar tasks. Using a modified serial reaction time (RT) task, the authors examined the ability to transfer to novel sequences after practicing sequences in a repetitive order versus a nonrepeating interleaved order. Interleaved practice resulted in better performance on new sequences than repetitive practice. In a second study, participants practiced interleaved sequences in a functional MRI (fMRI) scanner and received a transfer test of novel sequences. Transfer ability was positively correlated with cerebellar blood oxygen level dependent activity during practice, indicating that greater cerebellar engagement during training resulted in better subsequent transfer performance. Interleaved practice may thus result in a more generalized representation that is robust to interference, and the degree of activation in the cerebellum may be a reflection of the instantiation and engagement of internal models. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
The Effects of Travel Path and Source Structure on the Character of Regional Distance Seismograms from Nuclear Explosions

DTIC Science & Technology

1991-12-27

and had a ML of 6.4. The earthquake sequence was very energetic, having a foreshock with a ML of 5.9 and three large aftershocks measuring 5.8, 5.6...regional data-A review, Bull. Seism. Soc. Am. 72, S89-S129. Smith, K. D., and K. F. Priestley (1988). The foreshock sequence of the 1986 Chalfant
Opposing effects of dopamine antagonism in a motor sequence task—tiapride increases cortical excitability and impairs motor learning

PubMed Central

Lissek, Silke; Vallana, Guido S.; Schlaffke, Lara; Lenz, Melanie; Dinse, Hubert R.; Tegenthoff, Martin

2014-01-01

The dopaminergic system is involved in learning and participates in the modulation of cortical excitability (CE). CE has been suggested as a marker of learning and use-dependent plasticity. However, results from separate studies on either motor CE or motor learning challenge this notion, suggesting opposing effects of dopaminergic modulation upon these parameters: while agonists decrease and antagonists increase CE, motor learning is enhanced by agonists and disturbed by antagonists. To examine whether this discrepancy persists when complex motor learning and motor CE are measured in the same experimental setup, we investigated the effects of dopaminergic (DA) antagonism upon both parameters and upon task-associated brain activation. Our results demonstrate that DA-antagonism has opposing effects upon motor CE and motor sequence learning. Tiapride did not alter baseline CE, but increased CE post training of a complex motor sequence while simultaneously impairing motor learning. Moreover, tiapride reduced activation in several brain regions associated with motor sequence performance, i.e., dorsolateral PFC (dlPFC), supplementary motor area (SMA), Broca's area, cingulate and caudate body. Blood-oxygenation-level-dependent (BOLD) intensity in anterior cingulate and caudate body, but not CE, correlated with performance across groups. In summary, our results do not support a concept of CE as a general marker of motor learning, since they demonstrate that a straightforward relation of increased CE and higher learning success does not apply to all instances of motor learning. At least for complex motor tasks that recruit a network of brain regions outside motor cortex, CE in primary motor cortex is probably no central determinant for learning success. PMID:24994972
A VLT/UVES spectroscopy study of O2 stars in the LMC

NASA Astrophysics Data System (ADS)

Doran, Emile I.; Crowther, Paul A.

2011-01-01

We have analysed VLT/UVES spectra of six O2 stars within the Large Magellanic Cloud using the non-LTE atmospheric code CMFGEN. A range of physical properties was determined by employing a temperature calibration based upon N IV - N V diagnostics. Wind properties were also obtained from the Hα line, while CNO surface abundances were supplied through various diagnostics. Our results reveal effective temperatures in excess of T_eff ~ 50 kK in all cases. We also addressed their evolutionary status and favour a mass dependent division. For lower masses ≤100 M⊙, an O2 star follows the classical sequence, evolving from dwarf on to giant, through to supergiant. At higher masses, the dwarf phase may be circumvented and instead O2 stars begin their lives as giants or supergiants, evolving to the H-rich WN stage within ~1.5 Myr.
Designing nucleosomal force sensors

NASA Astrophysics Data System (ADS)

Tompitak, M.; de Bruin, L.; Eslami-Mossallam, B.; Schiessel, H.

2017-05-01

About three quarters of our DNA is wrapped into nucleosomes: DNA spools with a protein core. It is well known that the affinity of a given DNA stretch to be incorporated into a nucleosome depends on the geometry and elasticity of the basepair sequence involved, causing the positioning of nucleosomes. Here we show that DNA elasticity can have a much deeper effect on nucleosomes than just their positioning: it affects their "identities". Employing a recently developed computational algorithm, the mutation Monte Carlo method, we design nucleosomes with surprising physical characteristics. Unlike any other nucleosomes studied so far, these nucleosomes are short-lived when put under mechanical tension whereas other physical properties are largely unaffected. This suggests that the nucleosome, the most abundant DNA-protein complex in our cells, might more properly be considered a class of complexes with a wide array of physical properties, and raises the possibility that evolution has shaped various nucleosome species according to their genomic context.
A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins.

PubMed

Sawle, Lucas; Ghosh, Kingshuk

2015-08-28

A general formalism to compute configurational properties of proteins and other heteropolymers with an arbitrary sequence of charges and non-uniform excluded volume interaction is presented. A variational approach is utilized to predict average distance between any two monomers in the chain. The presented analytical model, for the first time, explicitly incorporates the role of sequence charge distribution to determine relative sizes between two sequences that vary not only in total charge composition but also in charge decoration (even when charge composition is fixed). Furthermore, the formalism is general enough to allow variation in excluded volume interactions between two monomers. Model predictions are benchmarked against the all-atom Monte Carlo studies of Das and Pappu [Proc. Natl. Acad. Sci. U. S. A. 110, 13392 (2013)] for 30 different synthetic sequences of polyampholytes. These sequences possess an equal number of glutamic acid (E) and lysine (K) residues but differ in the patterning within the sequence. Without any fit parameter, the model captures the strong sequence dependence of the simulated values of the radius of gyration with a correlation coefficient of R² = 0.9. The model is then applied to real proteins to compare the unfolded state dimensions of 540 orthologous pairs of thermophilic and mesophilic proteins. The excluded volume parameters are assumed similar under denatured conditions, and only electrostatic effects encoded in the sequence are accounted for. With these assumptions, thermophilic proteins are found, with high statistical significance, to have a more compact disordered ensemble compared to their mesophilic counterparts. The method presented here, due to its analytical nature, is capable of making such high throughput analysis of multiple proteins and will have broad applications in proteomic studies as well as in other heteropolymeric systems.

Impact of target mRNA structure on siRNA silencing efficiency: A large-scale study.

PubMed

Gredell, Joseph A; Berger, Angela K; Walton, S Patrick

2008-07-01

The selection of active siRNAs is generally based on identifying siRNAs with certain sequence and structural properties. However, the efficiency of RNA interference has also been shown to depend on the structure of the target mRNA, primarily through studies using exogenous transcripts with well-defined secondary structures in the vicinity of the target sequence. While these studies provide a means for examining the impact of target sequence and structure independently, the predicted secondary structures for these transcripts are often not reflective of structures that form in full-length, native mRNAs where interactions can occur between relatively remote segments of the mRNAs. Here, using a combination of experimental results and analysis of a large dataset, we demonstrate that the accessibility of certain local target structures on the mRNA is an important determinant in the gene silencing ability of siRNAs. siRNAs targeting the enhanced green fluorescent protein were chosen using a minimal siRNA selection algorithm followed by classification based on the predicted minimum free energy structures of the target transcripts. Transfection into HeLa and HepG2 cells revealed that siRNAs targeting regions of the mRNA predicted to have unpaired 5'- and 3'-ends resulted in greater gene silencing than regions predicted to have other types of secondary structure. These results were confirmed by analysis of gene silencing data from previously published siRNAs, which showed that mRNA target regions unpaired at either the 5'-end or 3'-end were silenced, on average, approximately 10% more strongly than target regions unpaired in the center or primarily paired throughout. We found this effect to be independent of the structure of the siRNA guide strand. Taken together, these results suggest minimal requirements for nucleation of hybridization between the siRNA guide strand and mRNA and that both mRNA and guide strand structure should be considered when choosing candidate siRNAs. 
(c) 2008 Wiley Periodicals, Inc.
Molecular biology and genetic diversity of Rift Valley fever virus

PubMed Central

Ikegami, Tetsuro

2013-01-01

Rift Valley fever virus (RVFV), a member of the family Bunyaviridae, genus Phlebovirus, is the causative agent of Rift Valley fever (RVF), a mosquito-borne disease of ruminant animals and humans. The generation of a large sequence database has facilitated studies of the evolution and spread of the virus. Bayesian analyses indicate that currently circulating strains of RVFV are descended from an ancestral species that emerged from a natural reservoir in Africa when large-scale cattle and sheep farming were introduced during the 19th century. Viruses descended from multiple lineages persist in that region, through infection of reservoir animals and vertical transmission in mosquitoes, emerging in years of heavy rainfall to cause epizootics and epidemics. On a number of occasions, viruses from these lineages have been transported outside the enzootic region through the movement of infected animals or mosquitoes, triggering outbreaks in countries such as Egypt, Saudi Arabia, Mauritania and Madagascar, where RVF had not previously been seen. Such viruses could potentially become established in their new environments through infection of wild and domestic ruminants and other animals and vertical transmission in local mosquito species. Despite their extensive geographic dispersion, all strains of RVFV remain closely related at the nucleotide and amino acid level. The high degree of conservation of genes encoding the virion surface glycoproteins suggests that a single vaccine should protect against all currently circulating RVFV strains. Similarly, preservation of the sequence of the RNA-dependent RNA polymerase across viral lineages implies that antiviral drugs targeting the enzyme should be effective against all strains. Researchers should be encouraged to collect additional RVFV isolates and perform whole-genome sequencing and phylogenetic analysis, so as to enhance our understanding of the continuing evolution of this important virus. 
This review forms part of a series of invited papers in Antiviral Research on the genetic diversity of emerging viruses. PMID:22710362
Molecular biology and genetic diversity of Rift Valley fever virus.

PubMed

Ikegami, Tetsuro

2012-09-01

Rift Valley fever virus (RVFV), a member of the family Bunyaviridae, genus Phlebovirus, is the causative agent of Rift Valley fever (RVF), a mosquito-borne disease of ruminant animals and humans. The generation of a large sequence database has facilitated studies of the evolution and spread of the virus. Bayesian analyses indicate that currently circulating strains of RVFV are descended from an ancestral species that emerged from a natural reservoir in Africa when large-scale cattle and sheep farming were introduced during the 19th century. Viruses descended from multiple lineages persist in that region, through infection of reservoir animals and vertical transmission in mosquitoes, emerging in years of heavy rainfall to cause epizootics and epidemics. On a number of occasions, viruses from these lineages have been transported outside the enzootic region through the movement of infected animals or mosquitoes, triggering outbreaks in countries such as Egypt, Saudi Arabia, Mauritania and Madagascar, where RVF had not previously been seen. Such viruses could potentially become established in their new environments through infection of wild and domestic ruminants and other animals and vertical transmission in local mosquito species. Despite their extensive geographic dispersion, all strains of RVFV remain closely related at the nucleotide and amino acid level. The high degree of conservation of genes encoding the virion surface glycoproteins suggests that a single vaccine should protect against all currently circulating RVFV strains. Similarly, preservation of the sequence of the RNA-dependent RNA polymerase across viral lineages implies that antiviral drugs targeting the enzyme should be effective against all strains. Researchers should be encouraged to collect additional RVFV isolates and perform whole-genome sequencing and phylogenetic analysis, so as to enhance our understanding of the continuing evolution of this important virus. 
This review forms part of a series of invited papers in Antiviral Research on the genetic diversity of emerging viruses. Copyright © 2012 Elsevier B.V. All rights reserved.
Impact of target mRNA structure on siRNA silencing efficiency: a large-scale study

PubMed Central

Gredell, Joseph A.; Berger, Angela K.; Walton, S. Patrick

2009-01-01

The selection of active siRNAs is generally based on identifying siRNAs with certain sequence and structural properties. However, the efficiency of RNA interference has also been shown to depend on the structure of the target mRNA, primarily through studies using exogenous transcripts with well-defined secondary structures in the vicinity of the target sequence. While these studies provide a means for examining the impact of target sequence and structure independently, the predicted secondary structures for these transcripts are often not reflective of structures that form in full-length, native mRNAs where interactions can occur between relatively remote segments of the mRNAs. Here, using a combination of experimental results and analysis of a large dataset, we demonstrate that the accessibility of certain local target structures on the mRNA is an important determinant in the gene silencing ability of siRNAs. siRNAs targeting the enhanced green fluorescent protein were chosen using a minimal siRNA selection algorithm followed by classification based on the predicted minimum free energy structures of the target transcripts. Transfection into HeLa and HepG2 cells revealed that siRNAs targeting regions of the mRNA predicted to have unpaired 5’- and 3’-ends resulted in greater gene silencing than regions predicted to have other types of secondary structure. These results were confirmed by analysis of gene silencing data from previously published siRNAs, which showed that mRNA target regions unpaired at either the 5’-end or 3’-end were silenced, on average, ~10% more strongly than target regions unpaired in the center or primarily paired throughout. We found this effect to be independent of the structure of the siRNA guide strand. Taken together, these results suggest minimal requirements for nucleation of hybridization between the siRNA guide strand and mRNA and that both mRNA and guide strand structure should be considered when choosing candidate siRNAs. PMID:18306428
The Human EST Ontology Explorer: a tissue-oriented visualization system for ontologies distribution in human EST collections.

PubMed

Merelli, Ivan; Caprera, Andrea; Stella, Alessandra; Del Corvo, Marcello; Milanesi, Luciano; Lazzari, Barbara

2009-10-15

The NCBI dbEST currently contains more than eight million human Expressed Sequence Tags (ESTs). This wide collection represents an important source of information for gene expression studies, provided it can be inspected according to biologically relevant criteria. EST data can be browsed using different dedicated web resources, which allow investigation of library-specific gene expression levels and comparisons among libraries, highlighting significant differences in gene expression. Nonetheless, no tool is available to examine distributions of quantitative EST collections in Gene Ontology (GO) categories, nor to retrieve information concerning library-dependent EST involvement in metabolic pathways. In this work we present the Human EST Ontology Explorer (HEOE) http://www.itb.cnr.it/ptp/human_est_explorer, a web facility for comparison of expression levels among libraries from several healthy and diseased tissues. The HEOE provides library-dependent statistics on the distribution of sequences in the GO Directed Acyclic Graph (DAG) that can be browsed at each GO hierarchical level. The tool is based on large-scale BLAST annotation of EST sequences. Due to the huge number of input sequences, this BLAST analysis was performed with the aid of grid computing technology, which is particularly suitable for data-parallel tasks. Relying on the achieved annotation, library-specific distributions of ESTs in the GO Graph were inferred. A pathway-based search interface was also implemented, for a quick evaluation of the representation of libraries in metabolic pathways. EST processing steps were integrated in a semi-automatic procedure that relies on Perl scripts and stores results in a MySQL database. A PHP-based web interface offers the possibility to simultaneously visualize, retrieve and compare data from the different libraries. Statistically significant differences in GO categories among user selected libraries can also be computed.
The HEOE provides an alternative and complementary way to inspect EST expression levels with respect to approaches currently offered by other resources. Furthermore, BLAST computation on the whole human EST dataset was a suitable test of grid scalability in the context of large-scale bioinformatics analysis. The HEOE currently comprises sequence analysis from 70 non-normalized libraries, representing a comprehensive overview on healthy and unhealthy tissues. As the analysis procedure can be easily applied to other libraries, the number of represented tissues is intended to increase.
African swine fever virus encodes two genes which share significant homology with the two largest subunits of DNA-dependent RNA polymerases.

PubMed Central

Yáñez, R J; Boursnell, M; Nogal, M L; Yuste, L; Viñuela, E

1993-01-01

A random sequencing strategy applied to two large SalI restriction fragments (SB and SD) of the African swine fever virus (ASFV) genome revealed that they might encode proteins similar to the two largest RNA polymerase subunits of eukaryotes, poxviruses and Escherichia coli. After further mapping by dot-blot hybridization, two large open reading frames (ORFs) were completely sequenced. The first ORF (NP1450L) encodes a protein of 1450 amino acids with extensive similarity to the largest subunit of RNA polymerases. The second one (EP1242L) codes for a protein of 1242 amino acids similar to the second largest RNA polymerase subunit. Proteins NP1450L and EP1242L are more similar to the corresponding subunits of eukaryotic RNA polymerase II than to those of vaccinia virus, the prototype poxvirus, which shares many functional characteristics with ASFV. ORFs NP1450L and EP1242L are mainly expressed late in ASFV infection, after the onset of DNA replication. PMID:8506138
Analysis of large 16S rRNA Illumina data sets: Impact of singleton read filtering on microbial community description.

PubMed

Auer, Lucas; Mariadassou, Mahendra; O'Donohue, Michael; Klopp, Christophe; Hernandez-Raquet, Guillermina

2017-11-01

Next-generation sequencing technologies give access to large sets of data, which are extremely useful in the study of microbial diversity based on the 16S rRNA gene. However, the production of such large data sets is not only marred by technical biases and sequencing noise but also increases computation time and disc space use. To improve the accuracy of OTU predictions and overcome computation, storage and noise issues, recent studies and tools suggested removing all single reads and low abundant OTUs, considering them as noise. Although the effect of applying an OTU abundance threshold on α- and β-diversity has been well documented, the consequences of removing single reads have been poorly studied. Here, we test the effect of singleton read filtering (SRF) on microbial community composition using in silico simulated data sets as well as sequencing data from synthetic and real communities displaying different levels of diversity and abundance profiles. Scalability to large data sets is also assessed using a complete MiSeq run. We show that SRF drastically reduces the chimera content and computational time, enabling the analysis of a complete MiSeq run in just a few minutes. Moreover, SRF accurately determines the actual community diversity: the differences in α- and β-community diversity obtained with SRF and standard procedures are much smaller than the intrinsic variability of technical and biological replicates. © 2017 John Wiley & Sons Ltd.
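The core of singleton read filtering as the abstract describes it, dropping reads whose sequence is seen only once, can be sketched as below. This is a simplified illustration under the assumption that a singleton is a read with no exact duplicate in the data set; the function name `filter_singletons` is hypothetical and the authors' pipeline is not reproduced here.

```python
from collections import Counter

def filter_singletons(reads):
    """Keep only reads whose exact sequence occurs more than once.

    reads is a list of sequence strings; input order is preserved.
    """
    counts = Counter(reads)
    return [read for read in reads if counts[read] > 1]
```

Because the counting pass and the filtering pass are both linear in the number of reads, a step like this adds little cost ahead of clustering.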
XPAT: a toolkit to conduct cross-platform association studies with heterogeneous sequencing datasets.

PubMed

Yu, Yao; Hu, Hao; Bohlender, Ryan J; Hu, Fulan; Chen, Jiun-Sheng; Holt, Carson; Fowler, Jerry; Guthery, Stephen L; Scheet, Paul; Hildebrandt, Michelle A T; Yandell, Mark; Huff, Chad D

2018-04-06

High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduce strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, which provides a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools to support cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases, including 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females) using sequencing data from multiple sources. XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.
The effect of presentation rate on implicit sequence learning in aging.

PubMed

Foster, Chris M; Giovanello, Kelly S

2017-02-01

Implicit sequence learning is thought to be preserved in aging when the to-be-learned associations are first-order; however, when associations are second-order, older adults (OAs) tend to experience deficits as compared to young adults (YAs). Two experiments were conducted using a first-order (Experiment 1) and second-order (Experiment 2) serial-reaction time task. Stimuli were presented at a constant rate of either 800 milliseconds (fast) or 1200 milliseconds (slow). Results indicate that both age groups learned first-order dependencies equally in both conditions. OAs and YAs also learned second-order dependencies, but the learning of lag-2 information was significantly impacted by the rate of presentation for both groups. OAs showed significant lag-2 learning in the slow condition while YAs showed significant lag-2 learning in the fast condition. The sensitivity of implicit sequence learning to the rate of presentation supports the idea that OAs' and YAs' different processing speeds impact the ability to build complex associations across time and intervening events.
Efficient use of unlabeled data for protein sequence classification: a comparative study

PubMed Central

Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

2009-01-01

Background: Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags, the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Results: Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. Conclusion: The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably. PMID:19426450
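For readers unfamiliar with string kernels, the simplest member of the family, the k-mer spectrum kernel, is sketched below. It is shown only to illustrate what a string kernel computes; the abstract does not say which kernels the authors used, and the names `kmer_counts` and `spectrum_kernel` and the default `k=3` are assumptions.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b."""
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # Counter returns 0 for k-mers absent from b, so unshared k-mers add nothing.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())
```

Two sequences that share many k-mers score highly, and a kernel value like this can be passed to any kernel-based classifier such as a support vector machine.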
De Novo Protein Structure Prediction

NASA Astrophysics Data System (ADS)

Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.
SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

PubMed Central

Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

2001-01-01

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
An efficient approach to BAC based assembly of complex genomes.

PubMed

Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

2016-01-01

There has been exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data, which produces inferior assemblies compared with traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, with variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC-cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

PubMed

Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

2017-01-01

This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
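As an illustration of one of the properties evaluated for the gap sequences, GC content can be computed as the fraction of G and C bases in a sequence. The following is a minimal sketch, not the authors' pipeline; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for this example.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases among unambiguous A/C/G/T bases.

    Assumption for this sketch: ambiguous symbols (e.g. N) are ignored,
    and an input with no unambiguous bases yields 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical gap sequence, used only to show the call.
    print(f"{gc_content('ATGCGC'):.3f}")
```

Comparing this value for each gap against the genome-wide average is one simple way to check whether gaps fall in regions of unusual base composition.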
Volume interpolated 3D-spoiled gradient echo sequence is better than dynamic contrast spin echo sequence for MRI detection of corticotropin secreting pituitary microadenomas.

PubMed

Kasaliwal, Rajeev; Sankhe, Shilpa S; Lila, Anurag R; Budyal, Sweta R; Jagtap, Varsha S; Sarathi, Vijaya; Kakade, Harshal; Bandgar, Tushar; Menon, Padmavathy S; Shah, Nalini S

2013-06-01

Various techniques have been attempted to increase the yield of magnetic resonance imaging (MRI) for localization of pituitary microadenomas in corticotropin (ACTH)-dependent Cushing's syndrome (CS). This study compared the performance of dynamic contrast spin echo (DC-SE) and volume interpolated 3D-spoiled gradient echo (VI-SGE) MR sequences in the diagnostic evaluation of ACTH-dependent CS. Data were analysed retrospectively from a series of ACTH-dependent CS patients treated over a 2-year period at a tertiary care referral centre (2009-2011). Thirty-six patients (24 female and 12 male) were diagnosed with ACTH-dependent CS during the study period. All patients underwent MRI by both sequences during a single examination. Cases with negative or equivocal pituitary MR imaging underwent corticotropin-releasing hormone (CRH) stimulated bilateral inferior petrosal sinus sampling (BIPSS) to confirm the pituitary origin of the ACTH excess state. Thirty patients were finally diagnosed with Cushing's disease (CD) [based on histopathological proof of adenoma and/or remission (partial/complete) of hypercortisolism postsurgery]. Six patients were diagnosed with histopathologically proven ectopic CS. Of the 30 patients with CD, 24 had microadenomas and 6 had macroadenomas. The DC-SE MRI sequence identified microadenomas in 16 of 24 patients, whereas the postcontrast VI-SGE sequence identified microadenomas in 21 of 24 patients. All six patients with ectopic CS had negative pituitary MR imaging by both techniques (specificity: 100%). The VI-SGE MR sequence was better for localization of pituitary microadenomas, particularly when the DC-SE MR sequence was negative or equivocal, and should be used in addition to the DC-SE MR sequence for the evaluation of ACTH-dependent CS. © 2012 John Wiley & Sons Ltd.
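The detection counts reported in this abstract can be turned into per-sequence sensitivity and specificity with simple arithmetic. The sketch below uses only the counts stated above (16 of 24 and 21 of 24 microadenomas detected; 6 of 6 ectopic CS cases with negative imaging); the function and variable names are illustrative assumptions, not part of the study.

```python
def proportion(hits: int, total: int) -> float:
    """Return hits / total as a float; total must be positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


# Counts taken from the abstract: microadenomas identified out of 24.
dc_se_sensitivity = proportion(16, 24)
vi_sge_sensitivity = proportion(21, 24)

# All six ectopic CS patients had negative pituitary imaging.
specificity = proportion(6, 6)

print(f"DC-SE sensitivity:  {dc_se_sensitivity:.1%}")   # 66.7%
print(f"VI-SGE sensitivity: {vi_sge_sensitivity:.1%}")  # 87.5%
print(f"Specificity:        {specificity:.1%}")         # 100.0%
```

These figures apply to the microadenoma subgroup only; the abstract does not report detection counts for the six macroadenomas.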
Transcription blockage by homopurine DNA sequences: role of sequence composition and single-strand breaks

PubMed Central

Belotserkovskii, Boris P.; Neil, Alexander J.; Saleh, Syed Shayon; Shin, Jane Hae Soo; Mirkin, Sergei M.; Hanawalt, Philip C.

2013-01-01

The ability of DNA to adopt non-canonical structures can affect transcription and has broad implications for genome functioning. We have recently reported that guanine-rich (G-rich) homopurine-homopyrimidine sequences cause significant blockage of transcription in vitro in a strictly orientation-dependent manner: when the G-rich strand serves as the non-template strand [Belotserkovskii et al. (2010) Mechanisms and implications of transcription blockage by guanine-rich DNA sequences. Proc. Natl Acad. Sci. USA, 107, 12816–12821]. We have now systematically studied the effect of the sequence composition and single-strand breaks on this blockage. Although substitution of guanine by any other base reduced the blockage, cytosine and thymine substitutions reduced the blockage more significantly than adenine substitutions, affirming the importance of both G-richness and the homopurine-homopyrimidine character of the sequence for this effect. A single-strand break in the non-template strand adjacent to the G-rich stretch dramatically increased the blockage. Breaks in the non-template strand result in much weaker blockage signals extending downstream from the break even in the absence of the G-rich stretch. Our combined data support the notion that transcription blockage at homopurine-homopyrimidine sequences is caused by R-loop formation. PMID:23275544
Rapidly rotating neutron stars in general relativity: Realistic equations of state

NASA Technical Reports Server (NTRS)

Cook, Gregory B.; Shapiro, Stuart L.; Teukolsky, Saul A.

1994-01-01

We construct equilibrium sequences of rotating neutron stars in general relativity. We compare results for 14 nuclear matter equations of state. We determine a number of important physical parameters for such stars, including the maximum mass and maximum spin rate. The stability of the configurations to quasi-radial perturbations is assessed. We employ a numerical scheme particularly well suited to handle rapid rotation and large departures from spherical symmetry. We provide an extensive tabulation of models for future reference. Two classes of evolutionary sequences of fixed baryon rest mass and entropy are explored: normal sequences, which behave very much like Newtonian sequences, and supramassive sequences, which exist for neutron stars solely because of general relativistic effects. Adiabatic dissipation of energy and angular momentum causes a star to evolve in quasi-stationary fashion along an evolutionary sequence. Supramassive sequences have masses exceeding the maximum mass of a nonrotating neutron star. A supramassive star evolves toward eventual catastrophic collapse to a black hole. Prior to collapse, the star actually spins up as it loses angular momentum, an effect that may provide an observable precursor to gravitational collapse to a black hole.
Sequence and Temperature Dependence of the End-to-End Collision Dynamics of Single-Stranded DNA

PubMed Central

Uzawa, Takanori; Isoshima, Takashi; Ito, Yoshihiro; Ishimori, Koichiro; Makarov, Dmitrii E.; Plaxco, Kevin W.

2013-01-01

Intramolecular collision dynamics play an essential role in biomolecular folding and function and, increasingly, in the performance of biomimetic technologies. To date, however, quantitative studies of the dynamics of single-stranded nucleic acids have been limited. Thus motivated, here we investigate the sequence composition, chain-length, viscosity, and temperature dependencies of the end-to-end collision dynamics of single-stranded DNAs. We find that both the absolute collision rate and the temperature dependencies of these dynamics are base-composition dependent, suggesting that base stacking interactions are a significant contributor. For example, whereas the end-to-end collision dynamics of poly-thymine exhibit simple, linear Arrhenius behavior, the behavior of longer poly-adenine constructs is more complicated. Specifically, 20- and 25-adenine constructs exhibit biphasic temperature dependencies, with their temperature dependencies becoming effectively indistinguishable from that of poly-thymine above 335 K for 20-adenines and 328 K for 25-adenines. The differing Arrhenius behaviors of poly-thymine and poly-adenine and the chain-length dependence of the temperature at which poly-adenine crosses over to behave like poly-thymine can be explained by a barrier friction mechanism in which, at low temperatures, the energy barrier for the local rearrangement of poly-adenine becomes the dominant contributor to its end-to-end collision dynamics. PMID:23746521
A simple method for semi-random DNA amplicon fragmentation using the methylation-dependent restriction enzyme MspJI.

PubMed

Shinozuka, Hiroshi; Cogan, Noel O I; Shinozuka, Maiko; Marshall, Alexis; Kay, Pippa; Lin, Yi-Han; Spangenberg, German C; Forster, John W

2015-04-11

Fragmentation at random nucleotide locations is an essential process for preparation of DNA libraries to be used on massively parallel short-read DNA sequencing platforms. Although instruments for physical shearing, such as the Covaris S2 focused-ultrasonicator system, and products for enzymatic shearing, such as the Nextera technology and NEBNext dsDNA Fragmentase kit, are commercially available, a simple and inexpensive method is desirable for high-throughput sequencing library preparation. MspJI is a recently characterised restriction enzyme which recognises the sequence motif CNNR (where R = G or A) when the first base is modified to 5-methylcytosine or 5-hydroxymethylcytosine. A semi-random enzymatic DNA amplicon fragmentation method was developed based on the unique cleavage properties of MspJI. In this method, random incorporation of 5-methyl-2'-deoxycytidine-5'-triphosphate is achieved through DNA amplification with DNA polymerase, followed by DNA digestion with MspJI. Due to the recognition sequence of the enzyme, DNA amplicons are fragmented in a relatively sequence-independent manner. The size range of the resulting fragments could be controlled by optimising the 5-methyl-2'-deoxycytidine-5'-triphosphate concentration in the reaction mixture. A library suitable for sequencing using the Illumina MiSeq platform was prepared and processed using the proposed method. Alignment of generated short reads to a reference sequence demonstrated a relatively high level of random fragmentation. The proposed method may be performed with standard laboratory equipment. Although the uniformity of coverage was slightly inferior to the Covaris physical shearing procedure, due to efficiencies of cost and labour, the method may be more suitable than existing approaches for implementation in large-scale sequencing activities, such as bacterial artificial chromosome (BAC)-based genome sequence assembly, pan-genomic studies and locus-targeted genotyping-by-sequencing.
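To see why a CNNR recognition motif gives relatively sequence-independent fragmentation, it helps to count how often the motif occurs in a sequence. The sketch below is an illustration only and not the authors' software: it lists candidate CNNR positions on one strand of an unmodified sequence string. The name `cnnr_sites` is hypothetical, and the code does not model methylation state or cleavage position. Each site it reports could be recognised by MspJI only if a modified cytosine happened to be incorporated there.

```python
import re

# C, then any two bases, then a purine (R = A or G).
# The lookahead lets overlapping motifs be reported separately.
CNNR = re.compile(r"(?=C[ACGT]{2}[AG])")


def cnnr_sites(seq: str) -> list[int]:
    """Return 0-based start positions of CNNR motifs on the given strand."""
    return [m.start() for m in CNNR.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical amplicon fragment, used only to show the call.
    print(cnnr_sites("CCAGGTTCAAT"))
```

Because the motif constrains only two of its four positions, candidate sites are frequent in most sequences. This is consistent with the abstract's point that fragment size is governed mainly by the concentration of modified nucleotide in the amplification reaction.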
Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing

PubMed Central

Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.

2015-01-01

For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. DNA of sufficient quality was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due to the conditions of preservation, the amount of tissue used for extraction, or both. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis. PMID:26505622
