Lavery, Richard; Zakrzewska, Krystyna; Beveridge, David; Bishop, Thomas C.; Case, David A.; Cheatham, Thomas; Dixit, Surjit; Jayaram, B.; Lankas, Filip; Laughton, Charles; Maddocks, John H.; Michon, Alexis; Osman, Roman; Orozco, Modesto; Perez, Alberto; Singh, Tanya; Spackova, Nada; Sponer, Jiri
2010-01-01
It is well recognized that base sequence exerts a significant influence on the properties of DNA and plays a significant role in protein–DNA interactions vital for cellular processes. Understanding and predicting base sequence effects requires an extensive structural and dynamic dataset which is currently unavailable from experiment. A consortium of laboratories was consequently formed to obtain this information using molecular simulations. This article describes results providing information not only on all 10 unique base pair steps, but also on all possible nearest-neighbor effects on these steps. These results are derived from simulations of 50–100 ns on 39 different DNA oligomers in explicit solvent and using a physiological salt concentration. We demonstrate that the simulations are converged in terms of helical and backbone parameters. The results show that nearest-neighbor effects on base pair steps are very significant, implying that dinucleotide models are insufficient for predicting sequence-dependent behavior. Flanking base sequences can notably lead to base pair step parameters in dynamic equilibrium between two conformational sub-states. Although this study only provides limited data on next-nearest-neighbor effects, we suggest that such effects should be analyzed before attempting to predict the sequence-dependent behavior of DNA. PMID:19850719
Transient effects in π-pulse sequences in MAS solid-state NMR
NASA Astrophysics Data System (ADS)
Hellwagner, Johannes; Wili, Nino; Ibáñez, Luis Fábregas; Wittmann, Johannes J.; Meier, Beat H.; Ernst, Matthias
2018-02-01
Dipolar recoupling techniques that use isolated rotor-synchronized π pulses are commonly used in solid-state NMR spectroscopy to gain insight into the structure of biological molecules. These sequences excel through their simplicity, stability towards radio-frequency (rf) inhomogeneity, and low rf requirements. For a theoretical understanding of such sequences, we present a Floquet treatment based on an interaction-frame transformation including the chemical-shift offset dependence. This approach is applied to the homonuclear dipolar-recoupling sequence Radio-Frequency Driven Recoupling (RFDR) and the heteronuclear recoupling sequence Rotational Echo Double Resonance (REDOR). Based on the Floquet approach, we show the influence of effective fields caused by pulse transients and discuss the advantages of pulse-transient compensation. We demonstrate experimentally that the transfer efficiency for homonuclear recoupling can be doubled in some cases in model compounds as well as in simple peptides if pulse-transient compensation is applied to the π pulses. Additionally, we discuss the influence of various phase cycles on the recoupling efficiency in order to reduce the magnitude of effective fields. Based on the findings from RFDR, we are able to explain why the REDOR sequence does not suffer a loss of recoupling efficiency despite the presence of effective fields.
PMS2 gene mutational analysis: direct cDNA sequencing to circumvent pseudogene interference.
Wimmer, Katharina; Wernstedt, Annekatrin
2014-01-01
The presence of highly homologous pseudocopies can compromise the mutation analysis of a gene of interest. In particular, when using PCR-based strategies, pseudogene co-amplification has to be effectively prevented. This is often achieved by using primers designed to be parental-gene-specific according to the reference sequence and by applying stringent PCR conditions. However, there are cases in which this approach is of limited utility. For example, it has been shown that the PMS2 gene exchanges sequences with one of its pseudogenes, named PMS2CL. This results in functional PMS2 alleles containing pseudogene-derived sequences at their 3' end and in nonfunctional PMS2CL pseudogene alleles that contain gene-derived sequences. Hence, the paralogues cannot be distinguished according to the reference sequence. This shortcoming can be effectively circumvented by direct cDNA sequencing, which is based on the selective amplification of PMS2 transcripts in two overlapping 1.6-kb RT-PCR products. In addition to avoiding pseudogene co-amplification and allele dropout, this method has the advantage of effectively identifying deletions, splice mutations, and de novo retrotransposon insertions that escape detection by most DNA-based mutation analysis protocols.
Hughes, Robert W; Vachon, François; Jones, Dylan M
2005-07-01
A novel attentional capture effect is reported in which visual-verbal serial recall was disrupted if a single deviation in the interstimulus interval occurred within otherwise regularly presented task-irrelevant spoken items. The degree of disruption was the same whether the temporal deviant was embedded in a sequence made up of a repeating item or a sequence of changing items. Moreover, the effect was evident during the presentation of the to-be-remembered sequence but not during rehearsal just prior to recall, suggesting that the encoding of sequences is particularly susceptible. The results suggest that attentional capture is due to a violation of an algorithm rather than an aggregate-based neural model and further undermine an attentional capture-based account of the classical changing-state irrelevant sound effect. ((c) 2005 APA, all rights reserved).
Masking as an effective quality control method for next-generation sequencing data analysis.
Yun, Sajung; Yun, Sijung
2014-12-13
Next-generation sequencing produces base calls with low quality scores that can affect the accuracy of simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low-quality base calls with 'N's (undetermined bases), whereas trimming removes low-quality bases, resulting in shorter reads. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method significantly changed the false-negative rate in SNP calling compared with data analysis without preprocessing. The false-positive and false-negative rates for small insertions and deletions did not differ between masking and trimming. Although trimming is currently more common in the field, we recommend masking as the more effective preprocessing method for next-generation sequencing data analysis, since it reduces the false-positive rate in SNP calling without increasing the false-negative rate. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
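The difference between the two preprocessing strategies can be illustrated in a few lines. The following is a minimal Python sketch, not the authors' Perl script: it assumes Phred+33-encoded FASTQ quality strings and an arbitrary Q20 cutoff, and it shows only simple 3'-end trimming, whereas real trimming tools use a variety of rules.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def mask_read(seq: str, quality: str, min_q: int = 20) -> str:
    """Masking: replace every base whose Phred score is below min_q with 'N'.

    The read keeps its full length.
    """
    scores = phred_scores(quality)
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, scores))


def trim_read(seq: str, quality: str, min_q: int = 20) -> Tuple[str, str]:
    """Trimming (simple 3'-end variant): cut bases from the end of the read
    until the last remaining base has a Phred score of at least min_q."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(mask_read("ACGTACGT", "II#III##"))  # ACNTACNN
print(trim_read("ACGTACGT", "II#III##"))  # ('ACGTAC', 'II#III')
```

In this toy read, end trimming leaves the internal Q2 base (the G at position 3) untouched, while masking flags it and preserves the read length.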
A two-stage stochastic rule-based model to determine pre-assembly buffer content
NASA Astrophysics Data System (ADS)
Gunay, Elif Elcin; Kula, Ufuk
2018-01-01
This study considers the need of automobile manufacturers to make instant decisions when resequencing vehicles before final assembly (FA). We propose a rule-based two-stage stochastic model to determine the number of spare vehicles that should be kept in the pre-assembly buffer to restore a sequence altered by paint defects and upstream department constraints. The first stage of the model decides the spare vehicle quantities, while the second stage recovers the scrambled sequence with respect to predefined rules. The problem is solved with the sample average approximation (SAA) algorithm. We conduct a numerical study comparing the solutions of the heuristic model with optimal ones and provide the following insights: (i) as the mismatch between the paint entrance and scheduled sequences decreases, the rule-based heuristic model recovers the scrambled sequence as well as the optimal resequencing model; (ii) the rule-based model is more sensitive to the mismatch between the paint entrance and scheduled sequences when recovering the scrambled sequence; (iii) as the defect rate increases, the difference in recovery effectiveness between the rule-based heuristic and optimal solutions increases; (iv) as buffer capacity increases, the optimization model outperforms the heuristic model in recovery effectiveness; and (v) as expected, the rule-based model holds more inventory than the optimization model.
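The core idea of sample average approximation is to replace an expectation over random scenarios with an average over sampled ones and then optimize the first-stage decision against that average. The sketch below is a deliberately simplified, newsvendor-style illustration under stated assumptions: a hypothetical binomial defect model, a per-spare holding cost, and a per-vehicle shortage penalty stand in for the paper's rule-based second stage, which is not reproduced here.

```python
import random
from typing import List


def sample_defect_scenarios(n_vehicles: int, defect_rate: float,
                            n_scenarios: int, seed: int = 0) -> List[int]:
    """Draw scenarios: how many of n_vehicles leave paint with a defect."""
    rng = random.Random(seed)
    return [sum(rng.random() < defect_rate for _ in range(n_vehicles))
            for _ in range(n_scenarios)]


def saa_spare_vehicles(scenarios: List[int], max_spares: int,
                       holding_cost: float, shortage_cost: float) -> int:
    """SAA: pick the buffer level s (first-stage decision) that minimises the
    holding cost plus the average second-stage shortage cost over the samples."""
    def average_cost(s: int) -> float:
        shortage = sum(max(d - s, 0) for d in scenarios) / len(scenarios)
        return holding_cost * s + shortage_cost * shortage
    return min(range(max_spares + 1), key=average_cost)


# Hypothetical parameters, not taken from the study.
scenarios = sample_defect_scenarios(n_vehicles=60, defect_rate=0.05, n_scenarios=500)
print(saa_spare_vehicles(scenarios, max_spares=10, holding_cost=1.0, shortage_cost=8.0))
```

Raising the shortage penalty relative to the holding cost pushes the chosen buffer level up, which mirrors the trade-off between inventory and sequence recovery discussed in the abstract.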
HIV drug resistance surveillance for prioritizing treatment in resource-limited settings
Walensky, Rochelle P.; Weinstein, Milton C.; Yazdanpanah, Yazdan; Losina, Elena; Mercincavage, Lauren M.; Touré, Siaka; Divi, Nomita; Anglaret, Xavier; Goldie, Sue J.; Freedberg, Kenneth A.
2008-01-01
Background Sentinel testing programs for HIV drug resistance in resource-limited settings can inform policy on antiretroviral therapy (ART) and drug sequencing. Objective To examine the value of resistance surveillance in influencing recommendations toward effective and cost-effective sequencing of ART regimens. Methods A state-transition model of HIV infection was adapted to simulate clinical care in Côte d’Ivoire and evaluate the incremental cost-effectiveness of (1) no ART; (2) ART beginning with a non-nucleoside reverse transcriptase inhibitor (NNRTI)-based regimen followed by a boosted protease inhibitor (PI)-based regimen; and (3) ART beginning with a boosted PI-based regimen followed by an NNRTI-based regimen. Results At a 5% prevalence of NNRTI resistance, a strategy that started with a PI-based regimen had a smaller health benefit and higher cost-effectiveness ratio than a strategy that started with an NNRTI-based regimen (cost-effectiveness ratio $910/year of life saved). Results consistently favored initiation with an NNRTI-based regimen, regardless of the population prevalence of NNRTI resistance (up to 76%) and the efficacy of an NNRTI-based regimen in the setting of resistance. The most influential parameters on the cost-effectiveness of sequencing strategies were boosted PI-based regimen costs and the efficacy of this regimen when used as second-line therapy. Conclusions Drug costs and treatment efficacies, but not NNRTI resistance levels, were most influential in determining optimal HIV drug sequencing in Côte d’Ivoire. Results of surveillance for NNRTI resistance should not be used as a major guide to treatment policy in resource-limited settings. PMID:17457091
Chao, Michael C.; Pritchard, Justin R.; Zhang, Yanjia J.; Rubin, Eric J.; Livny, Jonathan; Davis, Brigid M.; Waldor, Matthew K.
2013-01-01
The coupling of high-density transposon mutagenesis to high-throughput DNA sequencing (transposon-insertion sequencing) enables simultaneous and genome-wide assessment of the contributions of individual loci to bacterial growth and survival. We have refined analysis of transposon-insertion sequencing data by normalizing for the effect of DNA replication on sequencing output and using a hidden Markov model (HMM)-based filter to exploit heretofore unappreciated information inherent in all transposon-insertion sequencing data sets. The HMM can smooth variations in read abundance and thereby reduce the effects of read noise, as well as permit fine-scale mapping that is independent of genomic annotation and enable classification of loci into several functional categories (e.g., essential, domain essential or ‘sick’). We generated a high-resolution map of genomic loci (encompassing both intra- and intergenic sequences) that are required or beneficial for in vitro growth of the cholera pathogen, Vibrio cholerae. This work uncovered new metabolic and physiologic requirements for V. cholerae survival, and by combining transposon-insertion sequencing and transcriptomic data sets, we also identified several novel noncoding RNA species that contribute to V. cholerae growth. Our findings suggest that HMM-based approaches will enhance extraction of biological meaning from transposon-insertion sequencing genomic data. PMID:23901011
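To show how an HMM can smooth noisy per-window read counts into contiguous functional calls, here is a minimal Viterbi sketch. It is an illustration under stated assumptions, not the published model: it uses only two hypothetical states ("essential" and "neutral") with Poisson emissions, and the emission means and stay probability are arbitrary. The replication-based normalization described above is not shown.

```python
import math
from typing import Dict, List, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(k | Poisson(lam))."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_states(counts: Sequence[int], means: Dict[str, float],
                   stay_prob: float = 0.95) -> List[str]:
    """Most likely hidden state for each consecutive genomic window,
    given per-window insertion read counts and Poisson emission means."""
    states = list(means)
    switch_prob = (1.0 - stay_prob) / (len(states) - 1)

    def log_trans(a: str, b: str) -> float:
        return math.log(stay_prob if a == b else switch_prob)

    # Uniform initial distribution over states.
    score = {s: -math.log(len(states)) + poisson_logpmf(counts[0], means[s])
             for s in states}
    backpointers = []
    for k in counts[1:]:
        new_score, pointer = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = (score[best_prev] + log_trans(best_prev, s)
                            + poisson_logpmf(k, means[s]))
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)

    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


counts = [20, 18, 25, 0, 1, 0, 0, 22, 19]
print(viterbi_states(counts, {"essential": 0.5, "neutral": 20.0}))
# ['neutral', 'neutral', 'neutral', 'essential', 'essential',
#  'essential', 'essential', 'neutral', 'neutral']
```

The transition penalty is what provides the smoothing: an isolated low or high count is absorbed into its neighbors' state unless the emission evidence outweighs the cost of switching.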
Meher, J K; Meher, P K; Dash, G N; Raval, M K
2012-01-01
The first step in the gene identification problem based on genomic signal processing is to convert character strings into numerical sequences. These numerical sequences are then analysed spectrally or with digital filtering techniques for period-3 peaks, which are present in exons (coding regions) and absent in introns (non-coding regions). In this paper, we show that single-indicator sequences can be generated by encoding schemes based on physico-chemical properties. Two new methods are proposed for generating single-indicator sequences based on hydration energy and dipole moments. The proposed methods produce high peaks at exon locations and effectively suppress false exons (intron regions with greater peaks than exon regions), resulting in a high discriminating factor, sensitivity and specificity.
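The pipeline described above (map bases to numbers, then look for spectral power at frequency N/3 in a sliding window) can be sketched as follows. As a stated substitution, the sketch uses the widely used EIIP (electron-ion interaction pseudopotential) values as the single-indicator mapping instead of the paper's hydration-energy or dipole-moment scales, whose values are not given here. The window length of 120 is a hypothetical choice.

```python
import cmath
from typing import List

# EIIP values: a common single-indicator mapping, used here in place of the
# hydration-energy or dipole-moment scales proposed in the paper.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def period3_power(values: List[float]) -> float:
    """|X[N/3]|^2 of the DFT for a window whose length N is a multiple of 3."""
    n = len(values)
    if n == 0 or n % 3 != 0:
        raise ValueError("window length must be a positive multiple of 3")
    k = n // 3
    x = sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(values))
    return abs(x) ** 2


def period3_profile(seq: str, window: int = 120, step: int = 3) -> List[float]:
    """Slide a window along the sequence and report the period-3 power of each."""
    values = [EIIP[base] for base in seq.upper()]
    return [period3_power(values[i:i + window])
            for i in range(0, len(values) - window + 1, step)]


periodic = [EIIP[b] for b in "GAA" * 40]
constant = [EIIP["A"]] * 120
print(period3_power(periodic) > period3_power(constant))  # True
```

Because the window length is a multiple of 3, a constant (DC) signal contributes nothing at k = N/3, so only genuine three-base periodicity produces a peak.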
Rogan, P K; Schneider, T D
1995-01-01
Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
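The quantities behind an information-theory sequence logo can be computed directly from aligned sites: per-position information content, 2 − H(l) bits, and an individual site score that sums 2 + log2 f(b, l) over positions. The sketch below follows that general formulation under simplifying assumptions: the small-sample correction e(n) is omitted, the optional pseudocount is an addition for handling unseen bases, and the toy sites are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def base_frequencies(sites: List[str], pseudocount: float = 0.0) -> List[Dict[str, float]]:
    """f(b, l): frequency of base b at position l across aligned sites of equal length."""
    total = len(sites) + 4 * pseudocount
    frequencies = []
    for position in range(len(sites[0])):
        counts = Counter(site[position] for site in sites)
        frequencies.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return frequencies


def information_content(frequencies: List[Dict[str, float]]) -> List[float]:
    """Bits per position, 2 - H(l); the small-sample correction e(n) is omitted."""
    return [2.0 + sum(f * math.log2(f) for f in pos.values() if f > 0)
            for pos in frequencies]


def individual_information(site: str, frequencies: List[Dict[str, float]]) -> float:
    """Site score in bits: sum over positions of 2 + log2 f(b_l, l).
    Use pseudocount > 0 if the site may contain bases unseen at a position."""
    return sum(2.0 + math.log2(pos[b]) for b, pos in zip(site, frequencies))


sites = ["AG", "AG", "AC", "AT"]
freqs = base_frequencies(sites)
print(information_content(freqs))           # [2.0, 0.5]
print(individual_information("AG", freqs))  # 3.0
print(individual_information("AC", freqs))  # 2.0
```

Comparing the score of a wild-type site with that of the substituted site gives a quantitative estimate of how much a change is likely to weaken recognition, which is the kind of comparison the abstract uses to argue that a substitution is a polymorphism rather than a mutation.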
Pasi, Marco; Maddocks, John H.; Lavery, Richard
2015-01-01
Microsecond molecular dynamics simulations of B-DNA oligomers carried out in an aqueous environment with a physiological salt concentration enable us to perform a detailed analysis of how potassium ions interact with the double helix. The oligomers studied contain all 136 distinct tetranucleotides and we are thus able to make a comprehensive analysis of base sequence effects. Using a recently developed curvilinear helicoidal coordinate method we are able to analyze the details of ion populations and densities within the major and minor grooves and in the space surrounding DNA. The results show higher ion populations than have typically been observed in earlier studies and sequence effects that go beyond the nature of individual base pairs or base pair steps. We also show that, in some special cases, ion distributions converge very slowly and, on a microsecond timescale, do not reflect the symmetry of the corresponding base sequence. PMID:25662221
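The figure of 136 distinct tetranucleotides, like the 10 unique base pair steps cited in the consortium abstract at the start of this listing, follows from counting sequences up to reverse complementation, since a k-mer and its reverse complement describe the same duplex. For even k this gives (4^k + 4^(k/2))/2, because 4^(k/2) k-mers are their own reverse complement. For odd k it gives 4^k/2. A short enumeration confirms the counts:

```python
from itertools import product
from typing import Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    return kmer.translate(COMPLEMENT)[::-1]


def unique_kmers(k: int) -> Set[str]:
    """Distinct k-mers of double-stranded DNA: keep one canonical form
    (the lexicographically smaller of a k-mer and its reverse complement)."""
    return {min(kmer, reverse_complement(kmer))
            for kmer in ("".join(p) for p in product("ACGT", repeat=k))}


print(len(unique_kmers(2)))  # 10 base pair steps
print(len(unique_kmers(4)))  # 136 tetranucleotides
```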
Sequence dependency of canonical base pair opening in the DNA double helix
Villa, Alessandra
2017-01-01
The flipping-out of a DNA base from the double helical structure is a key step of many cellular processes, such as DNA replication, modification and repair. Base pair opening is the first step of base flipping and the exact mechanism is still not well understood. We investigate sequence effects on base pair opening using extensive classical molecular dynamics simulations targeting the opening of 11 different canonical base pairs in two DNA sequences. Two popular biomolecular force fields are applied. To enhance sampling and calculate free energies, we bias the simulation along a simple distance coordinate using a newly developed adaptive sampling algorithm. The simulation is guided back and forth along the coordinate, allowing for multiple opening pathways. We compare the calculated free energies with those from an NMR study and check assumptions of the model used for interpreting the NMR data. Our results further show that the neighboring sequence is an important factor for the opening free energy, but also indicate that other sequence effects may play a role. All base pairs are observed to have a propensity for opening toward the major groove. The preferred opening base is cytosine for GC base pairs, while for AT there is sequence-dependent competition between the two bases. For AT opening, we identify two non-canonical base pair interactions contributing to a local minimum in the free energy profile. For both AT and GC we observe long-lived interactions with water and with sodium ions at specific sites on the open base pair. PMID:28369121
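The link between sampled configurations and a free energy profile along a coordinate is F(x) = −kT ln P(x). The sketch below applies that relation to a histogram of a distance coordinate. It assumes the samples are unbiased or have already been reweighted; removing the bias introduced by the adaptive sampling algorithm described above is not shown, and the bin width and temperature are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable

GAS_CONSTANT_KJ = 0.008314462618  # kJ mol^-1 K^-1 (Boltzmann constant per mole)


def free_energy_profile(samples: Iterable[float], bin_width: float,
                        temperature: float = 300.0) -> Dict[float, float]:
    """F(x) = -kT ln P(x) from a histogram of an (unbiased) coordinate,
    shifted so the most populated bin is at zero. Keys are bin centres."""
    kt = GAS_CONSTANT_KJ * temperature
    counts = Counter(math.floor(x / bin_width) for x in samples)
    total = sum(counts.values())
    profile = {(i + 0.5) * bin_width: -kt * math.log(c / total)
               for i, c in counts.items()}
    f_min = min(profile.values())
    return {x: f - f_min for x, f in sorted(profile.items())}


# Toy data: the first bin is four times more populated than the second,
# so the second bin lies kT ln 4 (about 3.46 kJ/mol at 300 K) higher.
print(free_energy_profile([0.1] * 4 + [1.1], bin_width=1.0))
```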
Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N.; Romm, Jane M.; Doheny, Kimberly F.; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min
2012-01-01
DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false-positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data are generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies. PMID:23103226
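The intuition behind combining sequencing reads with array genotypes can be shown with a simplified likelihood. At sites where the sample's own array genotype is homozygous reference, alternate-allele reads should come only from sequencing error or from contaminating DNA, which carries the alternate allele with roughly its population frequency. The sketch below makes those simplifying assumptions (homozygous-reference sites only, a fixed error rate, known population allele frequencies, independent reads) and is not the authors' likelihood model. It estimates the contamination fraction by grid search.

```python
import math
from typing import List, Tuple


def estimate_contamination(sites: List[Tuple[int, int, float]],
                           error: float = 0.001,
                           grid_step: float = 0.001) -> float:
    """Grid-search maximum-likelihood estimate of the contamination fraction.

    sites: (ref_reads, alt_reads, population_alt_freq) at positions where the
    sample's array genotype is homozygous reference.
    """
    def log_likelihood(alpha: float) -> float:
        ll = 0.0
        for ref, alt, p in sites:
            # P(read shows alt): a contaminant alt allele read correctly,
            # or any other read turned into alt by sequencing error.
            q = alpha * p * (1 - error) + (1 - alpha * p) * error
            ll += alt * math.log(q) + ref * math.log(1 - q)
        return ll

    grid = [i * grid_step for i in range(int(round(0.5 / grid_step)) + 1)]
    return max(grid, key=log_likelihood)


# Hypothetical data: 200 sites at depth 1000 with population alt frequency 0.5,
# alt counts roughly what 10% contamination would produce.
sites = [(949, 51, 0.5)] * 200
print(estimate_contamination(sites))
```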
NASA Astrophysics Data System (ADS)
Hellwagner, Johannes; Sharma, Kshama; Tan, Kong Ooi; Wittmann, Johannes J.; Meier, Beat H.; Madhu, P. K.; Ernst, Matthias
2017-06-01
Pulse imperfections like pulse transients and radio-frequency field maladjustment or inhomogeneity are the main sources of performance degradation and limited reproducibility in solid-state nuclear magnetic resonance experiments. We quantitatively analyze the influence of such imperfections on the performance of symmetry-based pulse sequences and describe how they can be compensated. Based on a triple-mode Floquet analysis, we develop a theoretical description of symmetry-based dipolar recoupling sequences, in particular, R26₄¹¹, calculating first- and second-order effective Hamiltonians using real pulse shapes. We discuss the various origins of effective fields, namely, pulse transients, deviation from the ideal flip angle, and fictitious fields, and develop strategies to counteract them for the restoration of full transfer efficiency. We compare experimental applications of transient-compensated pulses and an asynchronous implementation of the sequence to a supercycle, SR26, which is known to be efficient in compensating higher-order error terms. We are able to show the superiority of R26 compared to the supercycle, SR26, given the ability to reduce experimental error on the pulse sequence by pulse-transient compensation and a complete theoretical understanding of the sequence.
ERIC Educational Resources Information Center
Ipek, Ismail
2010-01-01
The purpose of this study was to investigate the effects of CBI lesson sequence type and field-dependence cognitive style on learning from Web-based Computer-Based Cooperative Instruction (CBCI), using three dependent measures: achievement, reading comprehension and reading rate. Eighty-seven college undergraduate students were randomly assigned to…
Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi
2018-03-20
Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that are widely used for clinical decision-making and research. Despite the growing availability of tumor molecular profiling by next-generation sequencing, achieving value-based care (improving patient outcomes while reducing overall costs or risks) remains a looming challenge in the era of precision oncology. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing, from targeted panels to whole-exome or whole-genome sequencing, and describe potential strategies needed to attain value-based genomics.
PMID:29644010
Is the phonological similarity effect in working memory due to proactive interference?
Baddeley, Alan D; Hitch, Graham J; Quinlan, Philip T
2018-04-12
Immediate serial recall of verbal material is highly sensitive to impairment attributable to phonological similarity. Although this has traditionally been interpreted as a within-sequence similarity effect, Engle (2007) proposed an interpretation based on proactive interference (PI) from prior sequences, a phenomenon analogous to that found in the Peterson short-term memory (STM) task. We use the method of serial reconstruction to test this in an experiment contrasting two conditions: the standard paradigm, in which successive sequences are drawn from the same set of phonologically similar or dissimilar words, and one in which the vowel sound on which similarity is based is switched from trial to trial, a manipulation analogous to that producing release from PI in the Peterson task. A substantial similarity effect occurs under both conditions, although there is a small advantage from switching across similar sequences. There is, however, no evidence for the suggestion that the similarity effect will be absent from the very first sequence tested. Our results support the within-sequence similarity interpretation rather than a between-list PI interpretation. Reasons for the contrast with the classic Peterson short-term forgetting task are briefly discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Suyama, Yoshihisa; Matsuki, Yu
2015-01-01
Restriction-enzyme (RE)-based next-generation sequencing methods have revolutionized marker-assisted genetic studies; however, the use of REs has limited their widespread adoption, especially for field samples with low-quality DNA and/or small quantities of DNA. Here, we developed a PCR-based procedure to construct reduced representation libraries without RE digestion steps, enabling de novo single-nucleotide polymorphism discovery and genotyping using next-generation sequencing. Using multiplexed inter-simple sequence repeat (ISSR) primers, thousands of genome-wide regions were amplified effectively from a wide variety of genomes, without prior genetic information. We demonstrated: 1) Mendelian gametic segregation of the discovered variants; 2) reproducibility of genotyping by checking its applicability for individual identification; and 3) applicability in a wide variety of species by checking standard population genetic analysis. This approach, called multiplexed ISSR genotyping by sequencing, should be applicable to many marker-assisted genetic studies with a wide range of DNA qualities and quantities. PMID:26593239
Research progress of plant population genomics based on high-throughput sequencing.
Wang, Yun-sheng
2016-08-01
Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical framework of population genetics and improves our understanding of microevolution by identifying site-specific and genome-wide effects through genome-wide genotyping of polymorphic sites. With the emergence and improvement of next-generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly, and large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effects, demographic history and molecular mechanisms of complex traits of relevant plant populations at a genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as existing problems of plant population genomics in order to provide references for related studies.
ParticleCall: A particle filter for base calling in next-generation sequencing systems
2012-01-01
Background Next-generation sequencing systems are capable of rapid and cost-effective DNA sequencing, thus enabling routine sequencing tasks and taking us one step closer to personalized medicine. Accuracy and lengths of their reads, however, are yet to surpass those provided by the conventional Sanger sequencing method. This motivates the search for computationally efficient algorithms capable of reliable and accurate detection of the order of nucleotides in short DNA fragments from the acquired data. Results In this paper, we consider Illumina’s sequencing-by-synthesis platform, which relies on reversible terminator chemistry, and describe the acquired signal by reformulating its mathematical model as a Hidden Markov Model. Relying on this model and sequential Monte Carlo methods, we develop a parameter estimation and base calling scheme called ParticleCall. ParticleCall is tested on a data set obtained by sequencing phiX174 bacteriophage using Illumina’s Genome Analyzer II. The results show that the developed base calling scheme is significantly more computationally efficient than the best-performing unsupervised method currently available, while achieving the same accuracy. Conclusions The proposed ParticleCall provides more accurate calls than Illumina’s base-calling algorithm, Bustard. At the same time, ParticleCall is significantly more computationally efficient than other recent schemes with similar performance, rendering it more feasible for high-throughput sequencing data analysis. Improvement of base calling accuracy will have immediate beneficial effects on the performance of downstream applications such as SNP and genotype calling. ParticleCall is freely available at https://sourceforge.net/projects/particlecall. PMID:22776067
Cost-effectiveness of sequenced treatment of rheumatoid arthritis with targeted immune modulators.
Jansen, Jeroen P; Incerti, Devin; Mutebi, Alex; Peneva, Desi; MacEwan, Joanna P; Stolshek, Bradley; Kaur, Primal; Gharaibeh, Mahdi; Strand, Vibeke
2017-07-01
This study aimed to determine the cost-effectiveness of treatment sequences of biologic disease-modifying anti-rheumatic drugs or Janus kinase/STAT pathway inhibitors (collectively referred to as bDMARDs) vs conventional DMARDs (cDMARDs) from the US societal perspective for treatment of patients with moderately to severely active rheumatoid arthritis (RA) with inadequate responses to cDMARDs. An individual patient simulation model was developed that assesses the impact of treatments on disease based on clinical trial data and real-world evidence. Treatment strategies included sequences starting with etanercept, adalimumab, certolizumab, or abatacept. Each of these treatment strategies was compared with cDMARDs. Incremental cost, incremental quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratios (ICERs) were calculated for each treatment sequence relative to cDMARDs. The cost-effectiveness of each strategy was determined using a US willingness-to-pay (WTP) threshold of $150,000/QALY. For the base-case scenario, bDMARD treatment sequences were associated with greater treatment benefit (i.e., more QALYs), lower lost productivity costs, and greater treatment-related costs than cDMARDs. The expected ICERs for bDMARD sequences ranged from ∼$126,000 to $140,000 per QALY gained, which is below the US-specific WTP threshold. Alternative scenarios examining the effects of homogeneous patients, dose increases, increased costs of hospitalization for severely physically impaired patients, and a lower baseline Health Assessment Questionnaire (HAQ) Disability Index score resulted in similar ICERs. bDMARD treatment sequences are cost-effective from a US societal perspective.
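The decision rule described above divides the incremental cost by the incremental QALYs and compares the result with the willingness-to-pay threshold. A minimal sketch follows. The strategy figures are hypothetical and are not results from the study, and cases where a strategy is not more effective than its comparator (dominance) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    cost: float   # expected total cost per patient (USD)
    qalys: float  # expected quality-adjusted life-years per patient


def icer(strategy: Strategy, comparator: Strategy) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    delta_qalys = strategy.qalys - comparator.qalys
    if delta_qalys <= 0:
        # Strategies that are not more effective need separate (dominance) handling.
        raise ValueError("strategy must yield more QALYs than the comparator")
    return (strategy.cost - comparator.cost) / delta_qalys


def is_cost_effective(strategy: Strategy, comparator: Strategy,
                      wtp: float = 150_000.0) -> bool:
    """Cost-effective if the ICER does not exceed the willingness-to-pay threshold."""
    return icer(strategy, comparator) <= wtp


# Hypothetical figures, not results from the study.
conventional = Strategy("cDMARDs", cost=200_000.0, qalys=5.0)
biologic_sequence = Strategy("bDMARD sequence", cost=330_000.0, qalys=6.0)
print(icer(biologic_sequence, conventional))               # 130000.0
print(is_cost_effective(biologic_sequence, conventional))  # True
```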
Small-target leak detection for a closed vessel via infrared image sequences
NASA Astrophysics Data System (ADS)
Zhao, Ling; Yang, Hongjiu
2017-03-01
This paper focuses on a leak diagnosis and localization method based on infrared image sequences. The method addresses the high probability of false warnings and the negative effect of marginal information in leak detection. An experimental model is established for leak diagnosis and localization on infrared image sequences. Differential background prediction based on a kernel regression method is presented to eliminate the negative effect of marginal information on the test vessel. A pipeline filter based on layered voting is designed to reduce the probability of false warnings for leak points. A synthesized leak diagnosis and localization algorithm is proposed based on infrared image sequences. Experimental results demonstrate the effectiveness and potential of the developed techniques.
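Temporal voting is a common way to suppress one-off false warnings in image-sequence detection: a candidate is confirmed only if it recurs across several recent frames. The sketch below is a generic M-of-N voting filter offered as an illustration; it is not the paper's layered-voting pipeline filter, whose details are not given here. It assumes candidate pixels have already been produced by a background-subtraction step, and the window size and vote threshold are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Iterable, List, Set, Tuple

Pixel = Tuple[int, int]


def persistent_detections(frames: Iterable[Set[Pixel]], window: int = 5,
                          min_votes: int = 3) -> List[Set[Pixel]]:
    """For each frame, confirm only candidate pixels that were flagged in at
    least min_votes of the most recent `window` frames (M-of-N voting)."""
    history: Deque[Set[Pixel]] = deque(maxlen=window)
    confirmed: List[Set[Pixel]] = []
    for candidates in frames:
        history.append(candidates)
        votes = Counter(pixel for frame in history for pixel in frame)
        confirmed.append({pixel for pixel, n in votes.items() if n >= min_votes})
    return confirmed


# A persistent leak at (5, 5) and a one-frame noise blip at (1, 1).
frames = [{(5, 5)}, {(5, 5), (1, 1)}, {(5, 5)}, {(5, 5)}, {(5, 5)}]
print(persistent_detections(frames))  # [set(), set(), {(5, 5)}, {(5, 5)}, {(5, 5)}]
```

The price of this suppression is latency: a genuine leak is only confirmed once it has accumulated `min_votes` frames.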
A New Method for Setting Calculation Sequence of Directional Relay Protection in Multi-Loop Networks
NASA Astrophysics Data System (ADS)
Haijun, Xiong; Qi, Zhang
2016-08-01
The workload of relay protection setting calculation in multi-loop networks may be effectively reduced by optimizing the setting calculation sequence. A new method for determining the setting calculation sequence of directional distance relay protection in multi-loop networks, based on a minimum broken nodes cost vector (MBNCV), was proposed to solve a problem of current methods: existing methods based on the minimum breakpoint set (MBPS) break more edges when untying the loops in the dependency relationships of relays, possibly leading to a heavier iterative calculation workload in setting calculations. A model-driven approach based on behavior trees (BT) was presented to improve adaptability to similar problems. After extending the BT model with real-time system characteristics, a timed BT was derived and the dependency relationships in multi-loop networks were then modeled. The model was translated into Communicating Sequential Processes (CSP) models, and an optimized setting calculation sequence for multi-loop networks was finally calculated using tools. A five-node multi-loop network was used as an example to demonstrate the effectiveness of the modeling and calculation method. Several examples were then calculated, with results indicating that the method effectively reduces the number of forced broken edges for protection setting calculation in multi-loop networks.
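The underlying problem is to untie loops in a relay dependency graph and then order the settings so that each relay is computed after the relays it depends on. The sketch below shows the general idea with a simple greedy approach: it removes DFS back edges to make the graph acyclic, then applies a topological sort. It is neither the MBPS nor the MBNCV method, nor the BT/CSP approach. It does not minimize the number of broken edges (finding a minimum feedback arc set is NP-hard), and the edge-direction convention and example network are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # edge u -> v: relay v's setting depends on relay u


def break_loops(graph: Graph) -> Tuple[Graph, List[Tuple[str, str]]]:
    """Remove DFS back edges so the dependency graph becomes acyclic.
    Returns the acyclic graph and the broken edges (to be handled iteratively)."""
    nodes = set(graph) | {v for succs in graph.values() for v in succs}
    state = {n: "new" for n in nodes}  # new -> active (on DFS stack) -> done
    kept: Graph = {n: [] for n in nodes}
    broken: List[Tuple[str, str]] = []

    def visit(u: str) -> None:
        state[u] = "active"
        for v in graph.get(u, []):
            if state[v] == "active":
                broken.append((u, v))  # back edge: closes a loop
            else:
                kept[u].append(v)
                if state[v] == "new":
                    visit(v)
        state[u] = "done"

    for n in sorted(nodes):
        if state[n] == "new":
            visit(n)
    return kept, broken


def setting_order(dag: Graph) -> List[str]:
    """Kahn's topological sort: each relay is set after all relays it depends on."""
    indegree = {n: 0 for n in dag}
    for succs in dag.values():
        for v in succs:
            indegree[v] += 1
    ready = sorted(n for n, d in indegree.items() if d == 0)
    order: List[str] = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for v in dag[n]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
        ready.sort()
    return order


# Hypothetical 4-relay example: loop A -> B -> C -> A, plus C -> D.
dag, broken = break_loops({"A": ["B"], "B": ["C"], "C": ["A", "D"]})
print(broken)              # [('C', 'A')]
print(setting_order(dag))  # ['A', 'B', 'C', 'D']
```

Each broken edge marks a dependency that must be resolved by iteration, which is why the methods compared in the abstract aim to break as few edges as possible.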
Radioresistance of GGG Sequences to Prompt Strand Break Formation from Direct-Type Radiation Damage
Black, Paul J.; Miller, Adam S.; Hayes, Jeffrey J.
2016-01-01
Purpose: As humans, we are constantly exposed to ionizing radiation from natural, man-made and cosmic sources, which can damage DNA, leading to deleterious effects including cancer incidence. In this work we introduce a method to monitor strand breaks resulting from damage due to the direct effect of ionizing radiation and provide evidence for sequence-dependent effects leading to strand breaks. Materials and methods: To analyze only DNA strand breaks caused by radiation damage due to the direct effect of ionizing radiation, we combined an established technique to generate dehydrated DNA samples with a technique to analyze single strand breaks on short oligonucleotide sequences via denaturing gel electrophoresis. Results: We find that direct damage primarily results in a reduced number of strand breaks in guanine triplet regions (GGG) when compared to isolated guanine (G) bases with identical flanking base context. In addition, we observe strand break behavior possibly indicative of protection of guanine bases when flanked by pyrimidines, and sensitization of guanine to strand break when flanked by adenine (A) bases in both isolated G and GGG cases. Conclusions: These observations provide insight into the strand break behavior in GGG regions damaged via the direct effect of ionizing radiation. In addition, this could be indicative of DNA sequences that are naturally more susceptible to strand break due to the direct effect of ionizing radiation. PMID:27349757
Sequence Effect on the Formation of DNA Minidumbbells.
Liu, Yuan; Lam, Sik Lok
2017-11-16
The DNA minidumbbell (MDB) is a recently identified non-B structure. The reported MDBs contain two TTTA, CCTG, or CTTG type II loops. At present, knowledge and understanding of the sequence criteria for MDB formation are still limited. In this study, we performed a systematic high-resolution nuclear magnetic resonance (NMR) and native gel study to investigate the effect of sequence variations in tandem repeats on the formation of MDBs. Our NMR results reveal the importance of hydrogen bonds, base-base stacking, and hydrophobic interactions from each of the participating residues. We conclude that in the MDBs formed by tandem repeats, C-G loop-closing base pairs are more stabilizing than T-A loop-closing base pairs, and thymine residues in both the second and third loop positions are more stabilizing than cytosine residues. The results from this study enrich our knowledge of the sequence criteria for the formation of MDBs, paving the way for better exploring their potential roles in biological systems and DNA nanotechnology.
Effect of Base Sequence "Defects" on the Electrostatic Potential of Dissolved DNA
NASA Astrophysics Data System (ADS)
Adams, Scott V.; Wagner, Katrina; Kephart, Thomas S.; Edwards, Glenn
1997-11-01
An analytical model of the electrostatic potential surrounding dissolved DNA has been developed. The model consists of an all-atom, mathematically helical structure for DNA, in which the atoms are arranged in infinite lines of discrete point charges on concentric cylindrical surfaces. The surrounding solvent and counterions are treated with the Debye-Hückel approximation (Wagner et al., Biophysical Journal 73, 21-30, 1997). Variation in the electrostatic potential due to structural differences between A, B, and Z conformations and homopolymer base sequence is apparent. The most recent modification to the model exploits the principle of superposition to calculate the potential of DNA with a base sequence containing 'defects.' That is, the base sequence is no longer uniform along the polymer. Differences between the potential of homopolymer DNA and the potential of DNA containing base 'defects' are immediately obvious. These results may aid in understanding the role of electrostatics in base-sequence specificity exhibited by DNA-binding proteins.
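The superposition idea can be checked numerically: in the Debye-Hückel approximation each point charge contributes a screened Coulomb term, so the potential of a sequence with a 'defect' equals the uniform potential plus the potential of the charge difference alone. The sketch below assumes reduced units (the 1/(4πε) prefactor set to 1, distances in nm), a Debye length of about 0.8 nm (roughly 0.15 M monovalent salt at room temperature) and a made-up single helix of equal charges; it is not the all-atom concentric-cylinder geometry of the published model.

```python
import math

def screened_potential(charges, point, debye_length=0.8):
    """Debye-Hückel potential at `point` from point charges [(q, (x, y, z)), ...].

    Reduced units: phi = sum_i q_i * exp(-r_i / debye_length) / r_i.
    """
    phi = 0.0
    for q, pos in charges:
        r = math.dist(point, pos)
        phi += q * math.exp(-r / debye_length) / r
    return phi

def helix_charges(n, radius=1.0, rise=0.34, twist_deg=36.0, q=-1.0):
    """Toy single helix of n identical charges (hypothetical geometry)."""
    return [
        (q, (radius * math.cos(math.radians(twist_deg * i)),
             radius * math.sin(math.radians(twist_deg * i)),
             rise * i))
        for i in range(n)
    ]

uniform = helix_charges(20)
defect = list(uniform)
defect[10] = (-0.5, uniform[10][1])  # one site carries a different charge
probe = (2.0, 0.0, 3.4)

direct = screened_potential(defect, probe)
# Superposition: uniform potential plus the potential of the charge difference alone.
delta = [(defect[10][0] - uniform[10][0], uniform[10][1])]
superposed = screened_potential(uniform, probe) + screened_potential(delta, probe)
print(direct, superposed)
```

Because only the changed sites need to be summed, the potential of many defect variants can be obtained cheaply from one precomputed homopolymer potential.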
Resolution of model Holliday junctions by yeast endonuclease: effect of DNA structure and sequence.
Parsons, C A; Murchie, A I; Lilley, D M; West, S C
1989-01-01
The resolution of Holliday junctions in DNA involves specific cleavage at or close to the site of the junction. A nuclease from Saccharomyces cerevisiae cleaves model Holliday junctions in vitro by the introduction of nicks in regions of duplex DNA adjacent to the crossover point. In previous studies [Parsons and West (1988) Cell, 52, 621-629] it was shown that cleavage occurred within homologous arm sequences with precise symmetry across the junction. In contrast, junctions with heterologous arm sequences were cleaved asymmetrically. In this work, we have studied the effect of sequence changes and base modification upon the site of cleavage. It is shown that the specificity of cleavage is unchanged provided that perfect homology is maintained between opposing arm sequences. However, in the absence of homology, cleavage depends upon sequence context and is affected by minor changes such as base modification. These data support the proposed mechanism for cleavage of a Holliday junction, which requires homologous alignment of arm sequences in an enzyme-DNA complex as a prerequisite for symmetrical cleavage by the yeast endonuclease. PMID:2653810
Validation of Pooled Whole-Genome Re-Sequencing in Arabidopsis lyrata.
Fracassetti, Marco; Griffin, Philippa C; Willi, Yvonne
2015-01-01
Sequencing pooled DNA of multiple individuals from a population instead of sequencing individuals separately has become popular due to its cost-effectiveness and simple wet-lab protocol, although some criticism of this approach remains. Here we validated a protocol for pooled whole-genome re-sequencing (Pool-seq) of Arabidopsis lyrata libraries prepared with low amounts of DNA (1.6 ng per individual). The validation was based on comparing single nucleotide polymorphism (SNP) frequencies obtained by pooling with those obtained by individual-based Genotyping By Sequencing (GBS). Furthermore, we investigated the effect of sample number, sequencing depth per individual and variant caller on population SNP frequency estimates. For Pool-seq data, we compared frequency estimates from two SNP callers, VarScan and Snape; the former employs a frequentist SNP calling approach while the latter uses a Bayesian approach. Results revealed concordance correlation coefficients well above 0.8, confirming that Pool-seq is a valid method for acquiring population-level SNP frequency data. Higher accuracy was achieved by pooling more samples (25 compared to 14) and working with higher sequencing depth (4.1× per individual compared to 1.4× per individual), which increased the concordance correlation coefficient to 0.955. The Bayesian-based SNP caller produced somewhat higher concordance correlation coefficients, particularly at low sequencing depth. We recommend pooling at least 25 individuals combined with sequencing at a depth of 100× to produce satisfactory frequency estimates for common SNPs (minor allele frequency above 0.05).
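Assuming the concordance statistic here is Lin's concordance correlation coefficient (the standard meaning of the term), the comparison between pooled and individual-based SNP frequencies can be computed as below. Unlike Pearson's r, it penalizes systematic offsets between two sets of estimates, which is why it suits method-agreement checks. The frequency values are made up for illustration.

```python
from statistics import fmean

def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between paired estimates.

    rho_c = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), using 1/n moments.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

# Hypothetical SNP frequencies: individual-based (GBS) vs pooled (Pool-seq).
gbs_freq = [0.10, 0.20, 0.30, 0.40]
pool_freq = [0.20, 0.30, 0.40, 0.50]  # perfectly correlated but offset by 0.1
print(concordance_correlation(gbs_freq, gbs_freq))   # identical estimates
print(concordance_correlation(gbs_freq, pool_freq))  # offset lowers concordance
```

The second pair has a Pearson correlation of 1 but a concordance of only about 0.71, because the pooled estimates are shifted.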
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-05-01
Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-01-01
Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
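A rough sketch of a per-query adaptively weighted score. The weighting rule is a hypothetical placeholder, since the abstract does not give the exact form: each measure's weight is its training-set accuracy multiplied by the gap between its two best class scores for the current query, so a measure that cannot separate the candidates for this sequence contributes little. Measure names, scores and accuracies are invented for illustration.

```python
def combined_scores(scores_by_measure, train_accuracy):
    """Adaptively weighted combination of several similarity measures.

    scores_by_measure: {measure: {class_label: similarity in [0, 1]}} for one query.
    train_accuracy:    {measure: accuracy of that measure on the training set}.
    Hypothetical weighting: training accuracy times the gap between the
    measure's two best class scores for this query.
    """
    weights = {}
    for measure, scores in scores_by_measure.items():
        ranked = sorted(scores.values(), reverse=True)
        margin = ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]
        weights[measure] = train_accuracy[measure] * margin
    total = sum(weights.values()) or 1.0
    labels = set().union(*scores_by_measure.values())
    return {
        label: sum(w * scores_by_measure[m].get(label, 0.0) for m, w in weights.items()) / total
        for label in labels
    }

def classify(scores_by_measure, train_accuracy):
    """Assign the class label with the highest combined score."""
    combined = combined_scores(scores_by_measure, train_accuracy)
    return max(combined, key=combined.get)

query = {
    "alignment": {"lineage_A": 0.50, "lineage_B": 0.49},  # nearly a tie
    "kmer": {"lineage_A": 0.20, "lineage_B": 0.90},       # clear preference
}
accuracy = {"alignment": 0.9, "kmer": 0.7}
print(classify(query, accuracy))
```

Even though the alignment measure is more accurate overall, it barely separates the candidates for this query, so the alignment-free measure dominates the combined score; a different query could give the opposite weighting.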
A Model of BGA Thermal Fatigue Life Prediction Considering Load Sequence Effects
Hu, Weiwei; Li, Yaqiu; Sun, Yufeng; Mosleh, Ali
2016-01-01
Accurate testing history data are necessary for all fatigue life prediction approaches, but such data are always insufficient, especially for microelectronic devices. Additionally, the sequence of individual load cycles plays an important role in physical fatigue damage. However, most existing models based on the linear damage accumulation rule ignore such sequence effects. This paper proposes a thermal fatigue life prediction model for ball grid array (BGA) packages that takes load sequence effects into consideration. To improve the availability and accessibility of testing data, a new failure criterion is discussed and verified by simulation and experimentation. The consequences for fatigue under sequential load conditions are shown. PMID:28773980
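Why the linear rule cannot capture sequence effects can be shown with a two-level block loading. Under the Palmgren-Miner rule the life fractions always sum to 1 regardless of order; a well-known nonlinear alternative, the Manson-Halford damage curve approach, is swapped in here to show order dependence. It is not the BGA model proposed in the paper, and the cycles-to-failure values are hypothetical.

```python
def miner_remaining_fraction(r1):
    """Palmgren-Miner rule: level-2 life fraction left after consuming r1 at level 1."""
    return 1.0 - r1

def damage_curve_remaining_fraction(r1, life_1, life_2, exponent=0.4):
    """Manson-Halford damage curve approach for a two-level block sequence.

    r1 = n1 / N1 is the fraction of level-1 life consumed before switching to
    level 2; life_1, life_2 are cycles to failure at each level. The result
    n2 / N2 = 1 - r1 ** ((N1 / N2) ** exponent) depends on the load order.
    """
    return 1.0 - r1 ** ((life_1 / life_2) ** exponent)

N_SEVERE, N_MILD = 1e3, 1e5  # hypothetical cycles to failure for two thermal cycle ranges
miner = 0.5 + miner_remaining_fraction(0.5)
high_low = 0.5 + damage_curve_remaining_fraction(0.5, N_SEVERE, N_MILD)
low_high = 0.5 + damage_curve_remaining_fraction(0.5, N_MILD, N_SEVERE)
print(f"Miner: {miner:.2f}, high-low: {high_low:.2f}, low-high: {low_high:.2f}")
# Miner: 1.00, high-low: 0.60, low-high: 1.49
```

Applying the severe cycles first (high-low) exhausts the part earlier than Miner's rule predicts, while the reverse order extends life, which is the kind of sequence effect a linear accumulation rule ignores.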
Mapping Base Modifications in DNA by Transverse-Current Sequencing
NASA Astrophysics Data System (ADS)
Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.
2018-02-01
Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.
Thomas, W. Kelley; Vida, J. T.; Frisse, Linda M.; Mundo, Manuel; Baldwin, James G.
1997-01-01
To effectively integrate DNA sequence analysis and classical nematode taxonomy, we must be able to obtain DNA sequences from formalin-fixed specimens. Microdissected sections of nematodes were removed from specimens fixed in formalin, using standard protocols and without destroying morphological features. The fixed sections provided sufficient template for multiple polymerase chain reaction-based DNA sequence analyses. PMID:19274156
Sun, Beili; Zhou, Dongrui; Tu, Jing; Lu, Zuhong
2017-01-01
The characteristics of tongue coating are very important symbols for disease diagnosis in traditional Chinese medicine (TCM) theory. As a habitat of the oral microbiota, the tongue dorsum harbors bacteria that have been shown to cause many oral diseases. High-throughput next-generation sequencing (NGS) platforms have been widely applied to the analysis of the bacterial 16S rRNA gene. We developed a methodology based on genus-specific multiprimer amplification and ligation-based sequencing for microbiota analysis. To validate the efficiency of the approach, we thoroughly analyzed six tongue coating samples from lung cancer patients with different TCM types, and more than 600 genera of bacteria were detected by this platform. The results showed that ligation-based parallel sequencing combined with enzyme digestion and multiamplification could expand the effective length of sequencing reads and could be applied in microbiota analysis.
Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database
2017-01-01
Image encryption technology is one of the main means of ensuring the security of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and sensitivity to initial values, combined with the unique spatial conformation of DNA molecules and their unique information storage and processing ability, an efficient image encryption method based on chaos theory and a DNA sequence database is proposed. In this paper, pixel locations are scrambled with a chaotic sequence, and pixel gray values are transformed by establishing a hyperchaotic mapping between quaternary sequences and DNA sequences, combined with logical transformations between DNA sequences. Bases are substituted according to displacement rules using DNA coding over a certain number of iterations, driven by an enhanced quaternary hyperchaotic sequence generated by the Chen chaotic system. Cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only achieves excellent encryption but also effectively resists chosen-plaintext, statistical, and differential attacks. PMID:28392799
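The pixel-level steps can be illustrated with a toy cipher under several simplifying assumptions: a logistic map stands in for the Chen hyperchaotic system, one fixed DNA coding rule (A=00, C=01, G=10, T=11) is used, and the cipher feedback mode, iteration rounds and DNA sequence database are omitted. It shows chaotic position scrambling followed by base-wise DNA XOR with a chaotic keystream; it is not the authors' scheme and is not secure for real use.

```python
BASES = "ACGT"  # assumed coding rule: A=00, C=01, G=10, T=11

def logistic_sequence(x0, n, r=3.99, burn_in=100):
    """Chaotic values in (0, 1) from the logistic map (stand-in for the Chen system)."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    values = []
    for _ in range(n):
        x = r * x * (1 - x)
        values.append(x)
    return values

def to_dna(byte):
    """Encode one 8-bit pixel as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for shift in (6, 4, 2, 0))

def from_dna(strand):
    """Decode four bases back into an 8-bit value."""
    value = 0
    for base in strand:
        value = (value << 2) | BASES.index(base)
    return value

def dna_xor(s1, s2):
    """Base-wise XOR under the coding rule above."""
    return "".join(BASES[BASES.index(a) ^ BASES.index(b)] for a, b in zip(s1, s2))

def _key_material(key, n):
    """Permutation for position scrambling plus a byte keystream for diffusion."""
    chaos = logistic_sequence(key, 2 * n)
    perm = sorted(range(n), key=lambda i: chaos[i])
    keystream = [int(c * 256) % 256 for c in chaos[n:]]
    return perm, keystream

def encrypt(pixels, key=0.3141592):
    perm, keystream = _key_material(key, len(pixels))
    scrambled = [pixels[i] for i in perm]
    return [from_dna(dna_xor(to_dna(p), to_dna(k))) for p, k in zip(scrambled, keystream)]

def decrypt(cipher, key=0.3141592):
    perm, keystream = _key_material(key, len(cipher))
    scrambled = [from_dna(dna_xor(to_dna(c), to_dna(k))) for c, k in zip(cipher, keystream)]
    pixels = [0] * len(cipher)
    for dst, src in enumerate(perm):
        pixels[src] = scrambled[dst]  # undo the position scrambling
    return pixels

image = list(range(16)) * 4  # hypothetical 8x8 grayscale image, flattened
cipher = encrypt(image)
print(cipher[:8], decrypt(cipher) == image)
```

Because DNA XOR is its own inverse under a fixed coding rule, decryption regenerates the same key material and applies the steps in reverse order.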
Wang, Quan; Rothkopf, Constantin A; Triesch, Jochen
2017-08-01
The ability to learn sequential behaviors is a fundamental property of our brains. Yet a long stream of studies including recent experiments investigating motor sequence learning in adult human subjects have produced a number of puzzling and seemingly contradictory results. In particular, when subjects have to learn multiple action sequences, learning is sometimes impaired by proactive and retroactive interference effects. In other situations, however, learning is accelerated as reflected in facilitation and transfer effects. At present it is unclear what the underlying neural mechanisms are that give rise to these diverse findings. Here we show that a recently developed recurrent neural network model readily reproduces this diverse set of findings. The self-organizing recurrent neural network (SORN) model is a network of recurrently connected threshold units that combines a simplified form of spike-timing dependent plasticity (STDP) with homeostatic plasticity mechanisms ensuring network stability, namely intrinsic plasticity (IP) and synaptic normalization (SN). When trained on sequence learning tasks modeled after recent experiments we find that it reproduces the full range of interference, facilitation, and transfer effects. We show how these effects are rooted in the network's changing internal representation of the different sequences across learning and how they depend on an interaction of training schedule and task similarity. Furthermore, since learning in the model is based on fundamental neuronal plasticity mechanisms, the model reveals how these plasticity mechanisms are ultimately responsible for the network's sequence learning abilities. In particular, we find that all three plasticity mechanisms are essential for the network to learn effective internal models of the different training sequences. This ability to form effective internal models is also the basis for the observed interference and facilitation effects.
This suggests that STDP, IP, and SN may be the driving forces behind our ability to learn complex action sequences.
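The three plasticity rules named above can be sketched for a tiny network. This is a simplified SORN-style update with binary threshold units and separate excitatory and inhibitory populations; the network size, learning rates, target firing rate and input coding are hypothetical, and the sketch omits the specific training schedules and task setup used in the study.

```python
import random

class MiniSORN:
    """Tiny SORN-style network of binary threshold units (illustrative sizes).

    x: excitatory states, y: inhibitory states. Plastic parts: excitatory
    recurrent weights w_ee (STDP + synaptic normalization) and excitatory
    thresholds t_e (intrinsic plasticity). w_ee[i][j] is the synapse j -> i.
    """

    def __init__(self, n_e=40, n_i=8, p_ee=0.1, eta_stdp=0.001, eta_ip=0.001,
                 target_rate=0.1, seed=0):
        rng = random.Random(seed)
        self.n_e, self.n_i = n_e, n_i
        self.mask = [[i != j and rng.random() < p_ee for j in range(n_e)] for i in range(n_e)]
        self.w_ee = [self._normalize([rng.random() if self.mask[i][j] else 0.0
                                      for j in range(n_e)]) for i in range(n_e)]
        self.w_ei = [self._normalize([rng.random() for _ in range(n_i)]) for _ in range(n_e)]
        self.w_ie = [self._normalize([rng.random() for _ in range(n_e)]) for _ in range(n_i)]
        self.t_e = [0.5 * rng.random() for _ in range(n_e)]
        self.t_i = [0.5 * rng.random() for _ in range(n_i)]
        self.x = [1.0 if rng.random() < target_rate else 0.0 for _ in range(n_e)]
        self.y = [0.0] * n_i
        self.eta_stdp, self.eta_ip, self.h_ip = eta_stdp, eta_ip, target_rate

    @staticmethod
    def _normalize(row):
        """Synaptic normalization (SN): scale incoming weights to sum to 1."""
        total = sum(row)
        return [w / total for w in row] if total > 0 else row

    def step(self, external_input):
        x_prev, y_prev = self.x, self.y
        x_new = []
        for i in range(self.n_e):
            drive = (sum(w * x for w, x in zip(self.w_ee[i], x_prev))
                     - sum(w * y for w, y in zip(self.w_ei[i], y_prev))
                     + external_input[i] - self.t_e[i])
            x_new.append(1.0 if drive > 0 else 0.0)
        self.y = [1.0 if sum(w * x for w, x in zip(self.w_ie[k], x_prev)) > self.t_i[k] else 0.0
                  for k in range(self.n_i)]
        for i in range(self.n_e):
            for j in range(self.n_e):
                if self.mask[i][j]:
                    # STDP: j active before i strengthens j -> i; i before j weakens it.
                    dw = self.eta_stdp * (x_new[i] * x_prev[j] - x_prev[i] * x_new[j])
                    self.w_ee[i][j] = max(0.0, self.w_ee[i][j] + dw)
            self.w_ee[i] = self._normalize(self.w_ee[i])
            # Intrinsic plasticity (IP): nudge each threshold toward the target rate.
            self.t_e[i] += self.eta_ip * (x_new[i] - self.h_ip)
        self.x = x_new
        return x_new

# Hypothetical task: four input "letters", each driving its own five units, shown cyclically.
letters = [[1.0 if 5 * k <= i < 5 * (k + 1) else 0.0 for i in range(40)] for k in range(4)]
net = MiniSORN()
for t in range(300):
    net.step(letters[t % 4])
```

STDP alone would let a few synapses grow without bound; SN keeps each unit's total input fixed and IP keeps activity near the target rate, which is the stabilizing role the abstract attributes to the two homeostatic mechanisms.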
2000-08-01
Sequence recognition of all four DNA bases is achieved by positioning an N-methylimidazole opposite guanine or N-methylpyrrole opposite... unique sequences of DNA based upon selective binding motifs to all four DNA bases, although relatively little is known about the ability of these agents to
USDA-ARS's Scientific Manuscript database
Genetic diversity is an essential resource for breeders to improve new cultivars with desirable characteristics. Recently genotyping-by-sequencing (GBS), a next generation sequencing (NGS) based technology that can simplify complex genomes, has been used as a high-throughput and cost-effective molec...
Crash sequence based risk matrix for motorcycle crashes.
Wu, Kun-Feng; Sasidharan, Lekshmi; Thor, Craig P; Chen, Sheng-Yin
2018-04-05
Considerable research has been conducted on motorcycle and other powered-two-wheeler (PTW) crashes; however, practitioners have long disagreed about which types of crashes should be targeted first and how to prioritize resources for implementing mitigating actions. Therefore, there is a need to identify the types of motorcycle crashes that pose the greatest safety risk to riders, namely the most frequent and most severe crashes. This pilot study seeks to demonstrate the efficacy of a new approach for prioritizing PTW crash causation sequences as they relate to injury severity, to better inform the application of mitigating countermeasures. To accomplish this, the present study constructed a crash sequence-based risk matrix to identify the most frequent and most severe motorcycle crashes, in an attempt to better connect the causes and countermeasures of PTW crashes. Although the frequency of each crash sequence can be computed from crash data, a crash severity model is needed to compare severity levels among crash sequences while controlling for other factors that also affect crash severity, such as drivers' age and helmet use. Constructing a risk matrix based on crash sequences involves two tasks: formulating the crash sequences, and estimating a mixed-effects (ME) model that adjusts the severity level of each crash sequence for other contributing factors that affect the maximum level of severity in a crash. Three data elements from the National Automotive Sampling System - General Estimating System (NASS-GES) data were used to form a crash sequence: critical event, crash type, and sequence of events. A mixed-effects model was constructed to model the severity levels for each crash sequence while accounting for the effects of these contributing factors on crash severity.
A total of 8039 crashes involving 8208 motorcycles that occurred from 2011 to 2013 were included in this study, weighted to represent 338,655 motorcyclists involved in traffic crashes over those three years (2011-2013) (NHTSA, 2013). The top five most frequent and severe types of crash sequences were identified, accounting for 23 percent of all the motorcycle crashes included in the study: (1) run-off-road crashes to the right, hitting roadside objects; (2) cross-median crashes with rollover; (3) left-turn oncoming crashes, head-on; (4) crossing over (passing through) or turning into the opposite direction at intersections; and (5) side impacts. In addition to crash sequences, several other factors were also identified as affecting crash severity: helmet use, presence of horizontal curves, alcohol consumption, road surface condition, roadway functional class, and nighttime conditions. Copyright © 2018 Elsevier Ltd. All rights reserved.
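The frequency half of the risk matrix, and a crude stand-in for the severity half, can be sketched as follows. Records carry a survey expansion weight (as NASS-GES records do) and an ordinal severity score; severity here is a plain weighted mean, so unlike the study it does not adjust for helmet use, alcohol or other factors with a mixed-effects model. The records, severity scale and cut-offs are hypothetical.

```python
from collections import defaultdict

def risk_matrix(crashes, freq_cut, sev_cut):
    """Place each crash sequence in a 2x2 frequency/severity risk matrix.

    crashes: iterable of (sequence_label, weight, severity), where weight is a
    survey expansion weight and severity an ordinal injury score.
    Returns {sequence: (weighted_frequency, mean_severity, cell)}.
    """
    freq = defaultdict(float)
    weighted_sev = defaultdict(float)
    for seq, weight, severity in crashes:
        freq[seq] += weight
        weighted_sev[seq] += weight * severity
    result = {}
    for seq, f in freq.items():
        mean_sev = weighted_sev[seq] / f
        cell = (("high" if f >= freq_cut else "low") + "-frequency/"
                + ("high" if mean_sev >= sev_cut else "low") + "-severity")
        result[seq] = (f, mean_sev, cell)
    return result

records = [
    ("run-off-road right, roadside object", 40.0, 3),
    ("run-off-road right, roadside object", 35.0, 2),
    ("rear-end", 90.0, 1),
    ("cross-median, rollover", 10.0, 4),
]
matrix = risk_matrix(records, freq_cut=50.0, sev_cut=2.5)
for seq, (f, sev, cell) in matrix.items():
    print(f"{seq}: weighted count {f:.0f}, mean severity {sev:.2f}, {cell}")
```

Sequences landing in the high-frequency/high-severity cell are the first candidates for countermeasures; replacing the weighted mean with model-adjusted severities is what the mixed-effects step contributes.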
You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng
2018-06-06
As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. Previous studies have concluded that sequence homology-based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Besides sequences, alternative information sources such as text may also be useful for AFP. Instead of the BOW (bag of words) representation used in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.
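One simple way to merge text-based and sequence-based predictions is a linear consensus over GO-term scores. The abstract does not specify DeepText2GO's consensus rule, so the equal weighting (alpha = 0.5), the treatment of missing terms as score 0 and the placeholder term names are all assumptions.

```python
def consensus(text_scores, seq_scores, alpha=0.5):
    """Linear consensus of two {GO term: score in [0, 1]} predictions.

    alpha weights the text-based predictor; a term missing from one predictor
    is treated as having score 0 there.
    """
    terms = set(text_scores) | set(seq_scores)
    return {t: alpha * text_scores.get(t, 0.0) + (1 - alpha) * seq_scores.get(t, 0.0)
            for t in terms}

def top_terms(scores, k=2):
    """The k highest-scoring terms, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

text_pred = {"term_a": 0.8}
seq_pred = {"term_a": 0.4, "term_b": 0.6}
merged = consensus(text_pred, seq_pred)
print(top_terms(merged))
```

A term supported by both sources ("term_a") outranks one supported by a single source, even when that single score is higher.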
Thermodynamics-based models of transcriptional regulation with gene sequence.
Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing
2015-12-01
Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or on heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous-time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive binding affinities based on statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model offers more biologically meaningful insight.
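The two ingredients named above, sequence-derived binding from statistical thermodynamics and a differential equation for transcription, can be sketched with a generic textbook formulation: an additive position-specific energy matrix, a two-state bound/unbound occupancy, and dm/dt = alpha * p_bound - gamma * m. The energy values, concentration and rate constants are hypothetical, and the paper's own formulation is not reproduced.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

def site_energy(site, energy_matrix):
    """Additive binding energy (kcal/mol) of a site from a position-specific energy matrix."""
    return sum(energy_matrix[i][base] for i, base in enumerate(site))

def occupancy(energy, tf_conc, temp=310.0):
    """Two-state (bound/unbound) equilibrium occupancy of one binding site.

    tf_conc is a dimensionless concentration relative to the reference state.
    """
    weight = tf_conc * math.exp(-energy / (R_KCAL * temp))
    return weight / (1.0 + weight)

def simulate_mrna(p_bound, alpha=1.0, gamma=0.1, dt=0.1, steps=2000, m0=0.0):
    """Forward-Euler integration of dm/dt = alpha * p_bound - gamma * m."""
    m = m0
    for _ in range(steps):
        m += dt * (alpha * p_bound - gamma * m)
    return m

# Hypothetical 3-position energy matrix favouring A at every position.
matrix = [{"A": -1.0, "C": 0.5, "G": 0.5, "T": 0.5}] * 3
strong = occupancy(site_energy("AAA", matrix), tf_conc=0.01)
weak = occupancy(site_energy("CCC", matrix), tf_conc=0.01)
print(strong, weak, simulate_mrna(strong), simulate_mrna(weak))
```

At steady state the expression level is alpha * p_bound / gamma, so the promoter sequence acts on expression only through its occupancy.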
Zhang, Pin; Liang, Yanmei; Chang, Shengjiang; Fan, Hailun
2013-08-01
Accurate segmentation of renal tissues in abdominal computed tomography (CT) image sequences is an indispensable step for computer-aided diagnosis and pathology detection in clinical applications. In this study, the goal is to develop a radiology tool to extract renal tissues in CT sequences for the management of renal diagnosis and treatments. In this paper, the authors propose a new graph-cuts-based active contours model with an adaptive width of narrow band for kidney extraction in CT image sequences. Based on graph cuts and contextual continuity, the segmentation is carried out slice-by-slice. In the first stage, the middle two adjacent slices in a CT sequence are segmented interactively based on the graph cuts approach. Subsequently, the deformable contour evolves toward the renal boundaries by the proposed model for the kidney extraction of the remaining slices. In this model, the energy function combining boundary with regional information is optimized in the constructed graph, and the adaptive search range is determined by contextual continuity and the object size. In addition, to reduce the complexity of the min-cut computation, the nodes in the graph have only n-links, which reduces the number of edges. A total of 30 CT image sequences with normal and pathological renal tissues are used to evaluate the accuracy and effectiveness of our method. The experimental results reveal that the average Dice similarity coefficient of these image sequences ranges from 92.37% to 95.71%, and the corresponding standard deviation for each dataset ranges from 2.18% to 3.87%. In addition, the average automatic segmentation time for one kidney in each slice is about 0.36 s. By integrating the graph-cuts-based active contours model with contextual continuity, the algorithm takes advantage of energy minimization and the characteristics of image sequences. The proposed method achieves effective results for kidney segmentation in CT sequences.
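The accuracy figures above are Dice similarity coefficients, DSC = 2|A ∩ B| / (|A| + |B|) between the automatic mask A and the reference mask B, summarized per CT sequence as a mean and standard deviation over slices. A minimal sketch with masks stored as sets of pixel coordinates; the masks are made up, and using the sample (rather than population) standard deviation is an assumption.

```python
from statistics import mean, stdev

def dice(seg, ref):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two sets of pixel coordinates."""
    if not seg and not ref:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * len(seg & ref) / (len(seg) + len(ref))

def summarize(slice_pairs):
    """Mean and sample standard deviation of per-slice Dice scores for one CT sequence."""
    scores = [dice(seg, ref) for seg, ref in slice_pairs]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread

auto = {(0, 0), (0, 1), (1, 0), (1, 1)}  # hypothetical automatic kidney mask
manual = {(0, 1), (1, 1), (2, 1)}        # hypothetical reference mask
print(dice(auto, manual))                # 2 * 2 / (4 + 3)
print(summarize([(auto, auto), (auto, manual)]))
```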
Li, Zhongshan; Liu, Zhenwei; Jiang, Yi; Chen, Denghui; Ran, Xia; Sun, Zhong Sheng; Wu, Jinyu
2017-01-01
Exome sequencing has been widely used to identify the genetic variants underlying human genetic disorders for clinical diagnoses, but identifying pathogenic sequence variants among the huge number of benign ones is complicated and challenging. Here, we describe a new Web server named mirVAFC for prioritizing pathogenic sequence variants from clinical exome sequencing (CES) variant data of a single individual or family. mirVAFC is able to comprehensively annotate sequence variants, filter out most irrelevant variants using custom criteria, classify variants into different categories by estimated pathogenicity, and finally prioritize pathogenic variants based on these classifications and on mutation effects. Case studies using different types of datasets for different diseases, from publications and our in-house data, have revealed that mirVAFC can efficiently identify the correct pathogenic candidates, as in the original work in each case. Overall, the mirVAFC Web server is specifically developed for identifying pathogenic sequence variants from family-based CES variants using classification-based prioritization. The mirVAFC Web server is freely accessible at https://www.wzgenomics.cn/mirVAFC/. © 2016 WILEY PERIODICALS, INC.
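The filter, classify and prioritize workflow described above can be sketched as follows. The thresholds (population frequency at most 0.01, read depth at least 10), the field names and the three consequence tiers built from Sequence Ontology consequence terms are hypothetical and are not mirVAFC's actual criteria.

```python
SEVERE = {"stop_gained", "frameshift_variant", "splice_acceptor_variant", "splice_donor_variant"}
MODERATE = {"missense_variant", "inframe_insertion", "inframe_deletion"}

def tier(variant):
    """Coarse pathogenicity class from the predicted consequence (1 = most severe)."""
    if variant["consequence"] in SEVERE:
        return 1
    if variant["consequence"] in MODERATE:
        return 2
    return 3

def prioritize(variants, max_pop_freq=0.01, min_depth=10):
    """Filter out common or poorly supported variants, then rank by tier and rarity."""
    kept = [v for v in variants if v["pop_freq"] <= max_pop_freq and v["depth"] >= min_depth]
    return sorted(kept, key=lambda v: (tier(v), v["pop_freq"]))

calls = [
    {"gene": "G1", "consequence": "missense_variant", "pop_freq": 0.001, "depth": 30},
    {"gene": "G2", "consequence": "stop_gained", "pop_freq": 0.0, "depth": 25},
    {"gene": "G3", "consequence": "synonymous_variant", "pop_freq": 0.0005, "depth": 40},
    {"gene": "G4", "consequence": "frameshift_variant", "pop_freq": 0.2, "depth": 50},
    {"gene": "G5", "consequence": "missense_variant", "pop_freq": 0.0, "depth": 5},
]
print([v["gene"] for v in prioritize(calls)])
```

The common frameshift (G4) and the poorly covered missense call (G5) are filtered out before ranking, so the remaining list is ordered by predicted severity first and rarity second.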
Yang, Cheng-Hong; Wu, Kuo-Chuan; Chuang, Li-Yeh; Chang, Hsueh-Wei
2018-01-01
DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence longer than 1000 base pairs and imposes a computational burden. Although the DNA barcode was originally envisioned as a straightforward species tag, the use of barcode sequences for identification is currently rarely emphasized. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be ideal targets of feature selection for discriminating between species. We hypothesize that SNP-based barcodes may be more effective than full-length DNA barcode sequences for species discrimination. To address this issue, we tested a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences from 38 Brassicaceae plant species. In the decision tree construction, these SNPs were used to set up decision rules that split the sequences into 2 groups level by level. After algorithm processing, 37 nodes and 31 loci were required to discriminate the 38 species. Finally, sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating the 38 Brassicaceae species based on the decision tree-selected SNP pattern using the RSB method. Taken together, this study provides a rationale for using SNPs of the rbcL DNA barcode as a useful and effective sequence tag for the 38 Brassicaceae species.
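As a sketch of the first step only, polymorphic columns of an alignment can be collected and each species' bases at those columns concatenated into a short SNP barcode; the decision-tree selection is not reproduced. The aligned sequences are made up, and skipping columns that contain gaps or ambiguity codes is an assumption standing in for the trimming step.

```python
def snp_columns(alignment):
    """Indices of alignment columns with more than one unambiguous base.

    Columns containing gaps or ambiguity codes are skipped (a stand-in for trimming).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    columns = []
    for i in range(length):
        bases = {s[i] for s in seqs}
        if bases <= set("ACGT") and len(bases) > 1:
            columns.append(i)
    return columns

def snp_barcodes(alignment):
    """Concatenate each species' bases at the SNP columns into a short barcode."""
    cols = snp_columns(alignment)
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

aligned = {"sp1": "ACGTAC", "sp2": "ACGTTC", "sp3": "ATGTAC"}  # hypothetical alignment
print(snp_columns(aligned), snp_barcodes(aligned))
```

Here two SNP columns already give each of the three species a distinct barcode, which is the kind of reduction from a full-length barcode to a few informative loci that the study exploits.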
Manlig, Erika; Wahlberg, Per
2017-01-01
Sodium bisulphite treatment of DNA combined with next generation sequencing (NGS) is a powerful combination for the interrogation of genome-wide DNA methylation profiles. Library preparation for whole genome bisulphite sequencing (WGBS) is challenging due to side effects of the bisulphite treatment, which leads to extensive DNA damage. Recently, a new generation of methods for bisulphite sequencing library preparation has been devised. They are based on initial bisulphite treatment of the DNA, followed by adaptor tagging of single-stranded DNA fragments, and enable WGBS using low quantities of input DNA. In this study, we present a novel approach for quick and cost-effective WGBS library preparation that is based on splinted adaptor tagging (SPLAT) of bisulphite-converted single-stranded DNA. Moreover, we validate SPLAT against three commercially available WGBS library preparation techniques, two of which are based on bisulphite treatment prior to adaptor tagging and one of which is a conventional WGBS method. PMID:27899585
Park, Sun-Kyeong; Park, Seung-Hoo; Lee, Min-Young; Park, Ji-Hyun; Jeong, Jae-Hong; Lee, Eui-Kyung
2016-11-01
In South Korea, the price of biologics has been decreasing owing to patent expiration and the availability of biosimilars. This study evaluated the cost-effectiveness of a treatment strategy initiated with etanercept (ETN) compared with leflunomide (LFN) after a 30% reduction in the medication cost of ETN in patients with active rheumatoid arthritis (RA) with an inadequate response to methotrexate (MTX-IR). A cohort-based Markov model was designed to evaluate the lifetime cost-effectiveness of a treatment sequence initiated with ETN (A) compared with 2 sequences initiated with LFN: an LFN-ETN sequence (B) and an LFN sequence (C). Patients moved through the treatment sequences, which consisted of sequential biologics and palliative therapy, based on American College of Rheumatology (ACR) responses and the probability of discontinuation. A systematic literature review and a network meta-analysis were conducted to estimate ACR responses to ETN and LFN. Utility was estimated with a mapping equation that converts the Health Assessment Questionnaire-Disability Index score to a utility weight. The costs comprised medications, outpatient visits, administration, dispensing, monitoring, palliative therapy, and treatment for adverse events. A subanalysis was conducted to identify the influence of the ETN price reduction compared with the unreduced price, and sensitivity analyses explored the uncertainty of model parameters and assumptions. The ETN sequence (A) was associated with higher costs and a gain in quality-adjusted life years (QALYs) compared with both sequences initiated with LFN (B, C) throughout the lifetime of patients with RA and MTX-IR. The incremental cost-effectiveness ratio (ICER) for strategy A versus B was ₩13,965,825 (US$11,726) per QALY and that for strategy A versus C was ₩9,587,983 (US$8050) per QALY. The results indicated that strategy A was cost-effective based on the commonly cited ICER threshold of ₩20,000,000 (US$16,793) per QALY in South Korea.
The robustness of the base-case analysis was confirmed using sensitivity analyses. When the unreduced medication cost of ETN was applied in a subanalysis, the ICER for strategy A versus B was ₩20,909,572 (US$17,556) per QALY and that for strategy A versus C was ₩22,334,713 (US$18,753) per QALY. This study indicated that a treatment strategy initiated with ETN was more cost-effective in patients with active RA and MTX-IR than 2 sequences initiated with LFN. The results also indicate that the reduced price of ETN affected the cost-effectiveness associated with its earlier use. Copyright © 2016 Elsevier HS Journals, Inc. All rights reserved.
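The ICER figures above follow the standard definition (incremental cost divided by incremental QALYs gained), compared against a willingness-to-pay threshold. The sketch below illustrates only that arithmetic, not the study's Markov model. The function names are hypothetical, and the exchange rate is an assumption inferred from the reported threshold (₩20,000,000 = US$16,793, i.e. about ₩1,191 per US$).

```python
# Minimal ICER sketch; function names are illustrative, not taken from the study.

THRESHOLD_KRW = 20_000_000            # commonly cited threshold per QALY (from the abstract)
KRW_PER_USD = THRESHOLD_KRW / 16_793  # assumed rate implied by the threshold (about 1,191 KRW/USD)


def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return delta_cost / delta_qaly


def is_cost_effective(icer_krw: float, threshold_krw: float = THRESHOLD_KRW) -> bool:
    """A strategy is deemed cost-effective if its ICER does not exceed the threshold."""
    return icer_krw <= threshold_krw


def krw_to_usd(amount_krw: float) -> float:
    """Convert KRW to USD using the assumed threshold-implied rate."""
    return amount_krw / KRW_PER_USD


# ICERs (KRW per QALY) as reported in the abstract.
reported = {
    "A vs B, reduced ETN price": 13_965_825,
    "A vs C, reduced ETN price": 9_587_983,
    "A vs B, unreduced ETN price": 20_909_572,
    "A vs C, unreduced ETN price": 22_334_713,
}

for label, value in reported.items():
    print(f"{label}: about US${krw_to_usd(value):,.0f}/QALY, "
          f"cost-effective: {is_cost_effective(value)}")
```

Under this assumed rate, the A-versus-B ICER of ₩13,965,825 corresponds to roughly US$11,726 per QALY; both reduced-price ICERs fall below the threshold and both unreduced-price ICERs exceed it, consistent with the conclusions above.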
Software for pre-processing Illumina next-generation sequencing short read sequences
2014-01-01
Background: When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter read lengths, higher base-call error rates, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provides flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods: We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157:H7. Results: Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacterial genomic short read sequences, with a focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. 
In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly, as measured by assembly contiguity and correctness. Conclusions: Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves memory efficiency when dealing with large datasets. We recommend combining sequencing-artifact removal with quality-score-based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects. PMID:24955109
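Quality-score-based base trimming and read filtering, one of the steps recommended above, can be illustrated in a few lines. This is a generic sketch, not ngsShoRT's Perl implementation: it assumes Phred+33-encoded FASTQ qualities, trims low-quality bases from the 3' end, and then discards reads that are too short or whose mean quality is too low. The thresholds are illustrative defaults, and artifact (e.g. adapter) removal is not shown.

```python
from typing import List, Optional, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20,
                offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_q."""
    scores = phred_scores(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def preprocess_read(seq: str, quality: str, min_q: int = 20, min_len: int = 25,
                    min_mean_q: float = 25.0,
                    offset: int = 33) -> Optional[Tuple[str, str]]:
    """Trim the 3' end, then keep the read only if it is long enough and of good mean quality."""
    seq, quality = trim_3prime(seq, quality, min_q, offset)
    if not seq or len(seq) < min_len:
        return None  # read filtered out: too short after trimming
    scores = phred_scores(quality, offset)
    if sum(scores) / len(scores) < min_mean_q:
        return None  # read filtered out: low mean quality
    return seq, quality
```

For example, `trim_3prime("ACGTACGT", "IIIII###")` keeps `"ACGTA"`, because under Phred+33 `I` encodes Q40 and `#` encodes Q2.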
Fazakerley, G V; Quignard, E; Teoule, R; Guy, A; Guschlbauer, W
1987-09-15
We report two-dimensional NOE (NOESY) spectra of the sequence d(GCGATCATGG).d(CCATGATCGC), which contains the unmethylated dam site. As expected, the DNA adopts a B-form conformation but appears to be distorted at the TG step of the second strand. This distortion, probably bending, is not seen on the opposite strand. When the first strand is methylated on adenine in the GATC or CATG sequence, the NOESY spectra indicate little or no change in the conformation. However, the single strand-duplex exchange is slowed down to the slow-exchange regime on the proton NMR time scale. We have assigned the exchangeable imino and cytidine amino resonances of the three duplexes. From the imino linewidths as a function of temperature, we observe that the unmethylated and the hemimethylated Gm6ATC duplexes melt normally from the ends. However, this is not so for the hemimethylated Cm6ATG duplex, which, apart from the terminal base pairs, melts cooperatively and at a higher temperature. In spectra recorded in H2O, a second duplex, which we have not been able to identify, is observed for the Gm6ATC sequence. It is, however, unlikely to be a hairpin structure. Ultraviolet-melting curves also indicate the presence of two transitions for this duplex. The effect of methylation on base-pair lifetimes has been studied by comparing the above three duplexes. Little effect is observed upon methylation in the GATC sequence, but a drastic increase in the lifetimes of all base pairs is observed upon methylation in the CATG sequence.
DArT Markers Effectively Target Gene Space in the Rye Genome
Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna
2016-01-01
Large genome size and complexity considerably hamper genomics research in the species concerned. Rye (Secale cereale L.) has one of the largest genomes among cereal crops, and repetitive sequences account for over 90% of its length. Diversity Arrays Technology (DArT) is a high-throughput genotyping method in which preferential sampling of gene-rich regions is achieved through the use of methylation-sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and, following a redundancy analysis, assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total, 515 DArT sequences could be incorporated into publicly available rye genome zippers, providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2GO pipeline, we attributed putative gene functions to 1,101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in chromosome arms 4RL and 6RL. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high-density consensus maps of rye. Furthermore, we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes, we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the identification of candidate genes based on gene ontology and microcolinearity. PMID:27833625
NASA Astrophysics Data System (ADS)
Upton, Brianna; Evans, John; Morrow, Cherilynn; Thoms, Brian
2009-11-01
Previous studies have shown that many students have misconceptions about basic concepts in physics. Moreover, it has been concluded that one of the challenges lies in the teaching methodology. To address this, Georgia State University has begun teaching studio algebra-based physics. Although many institutions have implemented studio physics, most have done so in calculus-based sequences. The effectiveness of the studio approach in an algebra-based introductory physics course needs further investigation. A 3-semester study assessing the effectiveness of studio physics in an algebra-based physics sequence has been performed. This study compares the results of student pre- and post-tests using the Force Concept Inventory. Using the results from this assessment tool, we will discuss the effectiveness of the studio approach to teaching physics at GSU.
Historical Feature Pattern Extraction Based Network Attack Situation Sensing Algorithm
Zeng, Yong; Liu, Dacheng; Lei, Zhou
2014-01-01
The situation sequence contains a series of complicated and multivariate random trends that are very sudden and uncertain, and whose underlying principles are difficult for traditional algorithms to recognize and describe. Addressing these problems requires estimating the parameters of very long situation sequences, which is very difficult; this paper therefore proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, the HFPE algorithm searches the recorded historical situation sequence for indications similar to the current one and weighs the link intensity between each past indication and its subsequent effect. It then calculates the probability that a given effect will reappear according to the current indication and makes a weighted prediction. In addition, HFPE provides an evolution algorithm that derives the prediction deviation in terms of both pattern and accuracy. This algorithm continuously improves the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in the sequence as far as possible, requires no data preprocessing, and can continuously track and adapt to variations in the situation sequence. PMID:24892054
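The core idea of HFPE, finding past indications that resemble the current one and weighting what followed them, resembles analogue (nearest-neighbour) forecasting. The sketch below illustrates only that idea on a one-dimensional series; it is not the authors' algorithm. The similarity measure (inverse Euclidean distance, standing in for "link intensity"), the window length, and the number of neighbours `k` are assumptions, and the evolutionary fine-tuning step is omitted.

```python
import math
from typing import List, Tuple


def predict_next(history: List[float], window: int = 3, k: int = 3) -> float:
    """Predict the next value from the k past windows most similar to the latest one."""
    if window < 1 or k < 1:
        raise ValueError("window and k must be positive")
    if len(history) <= window:
        raise ValueError("history must be longer than the window")
    current = history[-window:]
    candidates: List[Tuple[float, float]] = []
    # Every past window that is followed by an observed value (excludes the current window).
    for start in range(len(history) - window):
        past = history[start:start + window]
        following = history[start + window]
        # Assumed stand-in for "link intensity": closer windows get larger weights.
        similarity = 1.0 / (1.0 + math.dist(past, current))
        candidates.append((similarity, following))
    candidates.sort(key=lambda c: c[0], reverse=True)
    top = candidates[:k]
    total = sum(weight for weight, _ in top)
    # Similarity-weighted average of what followed the best-matching indications.
    return sum(weight * value for weight, value in top) / total
```

On a strictly periodic series such as `[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]` with `window=2` and `k=3`, the three exact past matches of `[1, 2]` were each followed by 3, so the prediction is 3.0.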
DNA/RNA transverse current sequencing: intrinsic structural noise from neighboring bases
Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.
2015-01-01
Nanopore DNA sequencing via transverse current has emerged as a promising candidate for third-generation sequencing technology. It produces long read lengths, which could alleviate problems with assembly errors inherent in current technologies. However, the high error rates of nanopore sequencing have to be addressed. A major source of error is the intrinsic noise in the current arising from carrier dispersion along the chain of the molecule, i.e., from the influence of neighboring bases. In this work, we calculate the transverse current within an effective multi-orbital tight-binding model derived from first-principles calculations of the DNA/RNA molecules, to study the effect of this structural noise on the error rates in DNA/RNA sequencing via transverse current in nanopores. We demonstrate that a statistical technique that uses not only the currents through the nucleotides but also the correlations in the currents can, in principle, reduce the error rate below any desired threshold. PMID:26150827
ERIC Educational Resources Information Center
Druey, Michel D.
2014-01-01
In many task-switch studies, task sequence and response sequence interact: Response repetitions produce benefits when the task repeats but produce costs when the task switches. Four different theoretical frameworks have been proposed to explain these effects: a reconfiguration-based account, association-learning models, an episodic-retrieval…
Zhang, Haitao; Wu, Chenxue; Chen, Zewei; Liu, Zhao; Zhu, Yunhong
2017-01-01
Analyzing large-scale spatial-temporal k-anonymity datasets recorded in location-based service (LBS) application servers can benefit some LBS applications. However, such analyses can allow adversaries to make inference attacks that cannot be handled by spatial-temporal k-anonymity methods or other methods for protecting sensitive knowledge. In response to this challenge, first we defined a destination location prediction attack model based on privacy-sensitive sequence rules mined from large scale anonymity datasets. Then we proposed a novel on-line spatial-temporal k-anonymity method that can resist such inference attacks. Our anti-attack technique generates new anonymity datasets with awareness of privacy-sensitive sequence rules. The new datasets extend the original sequence database of anonymity datasets to hide the privacy-sensitive rules progressively. The process includes two phases: off-line analysis and on-line application. In the off-line phase, sequence rules are mined from an original sequence database of anonymity datasets, and privacy-sensitive sequence rules are developed by correlating privacy-sensitive spatial regions with spatial grid cells among the sequence rules. In the on-line phase, new anonymity datasets are generated upon LBS requests by adopting specific generalization and avoidance principles to hide the privacy-sensitive sequence rules progressively from the extended sequence anonymity datasets database. We conducted extensive experiments to test the performance of the proposed method, and to explore the influence of the parameter K value. The results demonstrated that our proposed approach is faster and more effective for hiding privacy-sensitive sequence rules in terms of hiding sensitive rules ratios to eliminate inference attacks. 
Our method also had fewer side effects than the traditional spatial-temporal k-anonymity method in terms of the ratio of newly generated sensitive rules, and basically the same side effects as the traditional method in terms of the variation ratio of non-sensitive rules. Furthermore, we characterized how performance varies with the parameter K, which can help achieve the goal of hiding the maximum number of original sensitive rules while generating a minimum of new sensitive rules and affecting a minimum number of non-sensitive rules.
Wu, Chenxue; Liu, Zhao; Zhu, Yunhong
2017-01-01
Analyzing large-scale spatial-temporal k-anonymity datasets recorded in location-based service (LBS) application servers can benefit some LBS applications. However, such analyses can allow adversaries to make inference attacks that cannot be handled by spatial-temporal k-anonymity methods or other methods for protecting sensitive knowledge. In response to this challenge, first we defined a destination location prediction attack model based on privacy-sensitive sequence rules mined from large scale anonymity datasets. Then we proposed a novel on-line spatial-temporal k-anonymity method that can resist such inference attacks. Our anti-attack technique generates new anonymity datasets with awareness of privacy-sensitive sequence rules. The new datasets extend the original sequence database of anonymity datasets to hide the privacy-sensitive rules progressively. The process includes two phases: off-line analysis and on-line application. In the off-line phase, sequence rules are mined from an original sequence database of anonymity datasets, and privacy-sensitive sequence rules are developed by correlating privacy-sensitive spatial regions with spatial grid cells among the sequence rules. In the on-line phase, new anonymity datasets are generated upon LBS requests by adopting specific generalization and avoidance principles to hide the privacy-sensitive sequence rules progressively from the extended sequence anonymity datasets database. We conducted extensive experiments to test the performance of the proposed method, and to explore the influence of the parameter K value. The results demonstrated that our proposed approach is faster and more effective for hiding privacy-sensitive sequence rules in terms of hiding sensitive rules ratios to eliminate inference attacks. 
Our method also had fewer side effects than the traditional spatial-temporal k-anonymity method in terms of the ratio of newly generated sensitive rules, and basically the same side effects as the traditional method in terms of the variation ratio of non-sensitive rules. Furthermore, we characterized how performance varies with the parameter K, which can help achieve the goal of hiding the maximum number of original sensitive rules while generating a minimum of new sensitive rules and affecting a minimum number of non-sensitive rules. PMID:28767687
Design of nucleic acid sequences for DNA computing based on a thermodynamic approach
Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma
2005-01-01
We have developed an algorithm for designing multiple nucleic acid sequences that have a uniform melting temperature between each sequence and its complement and that do not hybridize non-specifically with each other, as judged by the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be used in DNA computing and in engineering applications such as microarrays and nanofabrication. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that its filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the Hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated better with ΔGmin (|R| = 0.90) than with the Hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphical user interface (GUI)-based program written in Java. PMID:15701762
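The generate-and-test loop described above can be sketched as follows. For brevity, this sketch uses the traditional Hamming-distance criterion that the authors compare against, not their greedy ΔGmin filter, and it approximates melting temperature with the Wallace rule (Tm of about 2(A+T) + 4(G+C) °C), which is only a rough guide for short oligonucleotides. All function names and thresholds are illustrative assumptions.

```python
import random
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) for short oligos: 2 per A/T, 4 per G/C."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def passes_constraints(cand: str, accepted: List[str], tm_target: int,
                       tm_tol: int, min_dist: int) -> bool:
    """Test step: uniform Tm, and a large Hamming distance to every accepted sequence
    and its reverse complement (a crude proxy for non-specific hybridization)."""
    if abs(wallace_tm(cand) - tm_target) > tm_tol:
        return False
    if hamming(cand, reverse_complement(cand)) < min_dist:
        return False  # near-palindromic: prone to self-hybridization
    for other in accepted:
        if hamming(cand, other) < min_dist:
            return False
        if hamming(cand, reverse_complement(other)) < min_dist:
            return False
    return True


def design_sequences(n: int, length: int, tm_target: int, tm_tol: int = 2,
                     min_dist: int = 6, max_tries: int = 100_000,
                     seed: int = 0) -> List[str]:
    """Generate step: propose random candidates until n of them pass the tests."""
    rng = random.Random(seed)
    accepted: List[str] = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if passes_constraints(cand, accepted, tm_target, tm_tol, min_dist):
            accepted.append(cand)
    return accepted
```

Swapping `passes_constraints` for a thermodynamic check is where a ΔGmin-based criterion, such as the one the abstract describes, would plug in.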
Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory.
Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat
2014-05-23
The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges still limit the widespread adoption of NGS testing in clinical practice. One such difficulty is the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive for most clinical diagnostics laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline that relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides the additional flexibility needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require an EBS (Amazon Elastic Block Store) disk to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in the mutations identified. This pipeline is currently being used in the clinic, and all identified pathogenic variants are confirmed using Sanger sequencing, further validating the software.
Jiang, Haojun; Xie, Yifan; Li, Xuchao; Ge, Huijuan; Deng, Yongqiang; Mu, Haofang; Feng, Xiaoli; Yin, Lu; Du, Zhou; Chen, Fang; He, Nongyue
2016-01-01
Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have already been used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The technologies most frequently used were PCR followed by capillary electrophoresis and SNP typing arrays, respectively. Here, we developed a noninvasive prenatal paternity test (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influencing factors (minor allele frequency (MAF), total number of SNPs, fetal fraction, and effective sequencing depth) and designed three different panels of selected SNPs to verify performance in clinical cases. Combining targeted deep sequencing of the selected SNPs with an informative bioinformatics pipeline, we calculated the combined paternity index (CPI) for 17 cases to determine paternity. The sequencing-based NIPAT results fully agreed with those of invasive prenatal paternity testing using an STR multiplex system. Our study shows that maternal plasma DNA sequencing-based technology is feasible and accurate for determining paternity and may provide an alternative for forensic applications in the future.
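The combined paternity index is conventionally the product of the per-locus paternity indices, and with a neutral prior of 0.5 the probability of paternity is CPI/(CPI + 1). The sketch below shows only that combination step, assuming per-SNP indices have already been estimated; deriving them from plasma read counts, fetal fraction, and allele frequencies is the hard part of NIPAT and is not shown. The function names and example index values are hypothetical.

```python
import math
from typing import Iterable


def combined_paternity_index(locus_pis: Iterable[float]) -> float:
    """CPI = product of per-locus paternity indices (accumulated in log space for stability)."""
    log_cpi = 0.0
    for pi in locus_pis:
        if pi < 0:
            raise ValueError("paternity indices must be non-negative")
        if pi == 0:
            return 0.0  # a zero index at any locus makes the product zero
        log_cpi += math.log(pi)
    return math.exp(log_cpi)


def probability_of_paternity(cpi: float, prior: float = 0.5) -> float:
    """Posterior probability via Bayes' rule; prior=0.5 reduces to CPI / (CPI + 1)."""
    return cpi * prior / (cpi * prior + (1.0 - prior))
```

For instance, hypothetical per-locus indices of 2, 3 and 4 give a CPI of 24 and, with the default prior, a probability of paternity of 24/25 = 0.96.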
The Impact of Different Instructions on Vietnamese EFL Students' Acquisition of Formulaic Sequences
ERIC Educational Resources Information Center
Ha, Do Thi
2017-01-01
The paper explores how two teaching methods, namely Phonology-Based Instruction (PBI) and Translation-Based Instruction (TBI), affect students' acquisition of formulaic sequences. Twenty multiword expressions were taught to 48 Vietnamese EFL students from three intact classes, serving as two treatment groups (PBI and TBI) and one control group.…
Xu, Weijia; Ozer, Stuart; Gutell, Robin R
2009-01-01
With an increasingly large number of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that uses coevolutionary rates among pairs of nucleotide positions, based on the phylogenetic and evolutionary relationships of the organisms whose sequences are aligned. Using a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, on a dataset 40 times larger and with 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.
Xu, Weijia; Ozer, Stuart; Gutell, Robin R.
2010-01-01
With an increasingly large number of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that uses coevolutionary rates among pairs of nucleotide positions, based on the phylogenetic and evolutionary relationships of the organisms whose sequences are aligned. Using a novel data schema to manage relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, on a dataset 40 times larger and with 50% better sensitivity than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534
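Comparative analysis detects base pairs from covariation between alignment columns. The sketch below uses plain mutual information between two columns, a common and much simpler covariation measure than the phylogeny-aware coevolutionary rates the authors propose; it ignores shared ancestry entirely. The function name and toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import List


def mutual_information(alignment: List[str], i: int, j: int, gap: str = "-") -> float:
    """Mutual information (in bits) between columns i and j of an alignment.

    Sequences with a gap in either column are skipped.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != gap and s[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi
```

In the toy alignment `["GAAC", "CAAG", "AAAU", "UAAA"]`, columns 0 and 3 covary as Watson-Crick pairs and give 2 bits, while the invariant column 1 shares no information with column 0.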
Kono, H; Saven, J G
2001-02-23
Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting them. Here, we present and apply a statistically based computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.
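The method yields, for each selected position, a probability for every amino acid rather than a single sequence. As a hedged illustration only: if couplings between positions are ignored and each amino acid is assigned an effective energy at a site, maximizing entropy subject to a fixed mean energy gives Boltzmann-weighted identity probabilities. The energies and function name below are hypothetical, and the authors' atom-based treatment of side-chain conformations and inter-site interactions is not reproduced.

```python
import math
from typing import Dict


def identity_probabilities(energies: Dict[str, float],
                           temperature: float = 1.0) -> Dict[str, float]:
    """Boltzmann-weighted probability of each amino acid at a single site.

    energies: effective energy of each candidate residue (lower is more favourable);
    temperature: controls how sharply the distribution favours low energies.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    e_min = min(energies.values())  # shift energies for numerical stability
    weights = {aa: math.exp(-(e - e_min) / temperature) for aa, e in energies.items()}
    z = sum(weights.values())  # partition function (up to the constant shift)
    return {aa: w / z for aa, w in weights.items()}
```

Equal energies give a uniform distribution; lowering a residue's energy raises its identity probability at the expense of the others.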
Mismatch and G-Stack Modulated Probe Signals on SNP Microarrays
Binder, Hans; Fasold, Mario; Glomb, Torsten
2009-01-01
Background: Single nucleotide polymorphism (SNP) arrays are important tools widely used for genotyping and copy number estimation. This technology utilizes the specific affinity of fragmented DNA for binding to surface-attached oligonucleotide DNA probes. We analyze the variability of the probe signals of Affymetrix GeneChip SNP arrays as a function of the probe sequence to identify relevant sequence motifs which potentially cause systematic biases of genotyping and copy number estimates. Methodology/Principal Findings: The probe design of GeneChip SNP arrays enables us to disentangle different sources of intensity modulations such as the number of mismatches per duplex, matched and mismatched base pairings including nearest and next-nearest neighbors, and their position along the probe sequence. The effect of probe sequence was estimated in terms of triple-motifs with central matches and mismatches which include all 256 combinations of possible base pairings. The probe/target interactions on the chip can be decomposed into nearest neighbor contributions which correlate well with free energy terms of DNA/DNA-interactions in solution. The effect of mismatches is about twice as large as that of canonical pairings. Runs of guanines (G) and the particular type of mismatched pairings formed in cross-allelic probe/target duplexes constitute sources of systematic biases of the probe signals with consequences for genotyping and copy number estimates. The poly-G effect seems to be related to the crowded arrangement of probes, which facilitates complex formation of neighboring probes with at least three adjacent G's in their sequence. Conclusions: The applied method of “triple-averaging” represents a model-free approach to estimate the mean intensity contributions of different sequence motifs which can be applied in calibration algorithms to correct signal values for sequence effects. Rules for appropriate sequence corrections are suggested. PMID:19924253
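In its simplest form, triple-averaging groups probes by the base triplet found at each position and averages their log intensities relative to the overall mean. The sketch below shows only this probe-side averaging; the study's version also distinguishes central matches and mismatches in the probe/target duplex, which is omitted here. The function name and data layout are hypothetical assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def triple_averages(probes: List[Tuple[str, float]]) -> Dict[Tuple[int, str], float]:
    """Mean deviation of log intensity from the overall mean, per (position, triplet).

    probes: (probe sequence, log intensity) pairs; position k is the 0-based index
    of the central base of the triplet seq[k-1:k+2].
    """
    overall = mean(value for _, value in probes)
    groups: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for seq, value in probes:
        for k in range(1, len(seq) - 1):
            groups[(k, seq[k - 1:k + 2])].append(value - overall)
    return {key: mean(devs) for key, devs in groups.items()}
```

With enough probes, a consistently positive average for motifs such as `GGG` would flag a sequence effect of the poly-G kind described above.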
Su, Jiao; Zhang, Haijie; Jiang, Bingying; Zheng, Huzhi; Chai, Yaqin; Yuan, Ruo; Xiang, Yun
2011-11-15
We report an ultrasensitive electrochemical approach for the detection of a uropathogen sequence-specific DNA target. The sensing strategy involves a dual signal amplification process, which combines signal enhancement by the enzymatic target recycling technique with the sensitivity improvement provided by quantum dot (QD) layer-by-layer (LBL) assembled labels. The enzyme-based catalytic target DNA recycling process allows each target DNA sequence to be used multiple times and leads to direct amplification of the analytical signal. Moreover, the LBL assembled QD labels further enhance the sensitivity of the sensing system. The coupling of these two effective signal amplification strategies thus enables detection of the target DNA sequences at low femtomolar levels (5 fM). The proposed strategy also shows excellent discrimination between the target DNA and single-base mismatch sequences. The intrinsic sequence-independent property of exonuclease III, an advantage over other sequence-dependent enzymes, makes our new dual signal amplification system a general sensing platform for monitoring ultralow levels of various types of target DNA sequences. Copyright © 2011 Elsevier B.V. All rights reserved.
Targeted therapy according to next generation sequencing-based panel sequencing.
Saito, Motonobu; Momma, Tomoyuki; Kono, Koji
2018-04-17
Targeted therapy against actionable gene mutations shows a significantly higher response rate, as well as longer survival, compared with conventional chemotherapy, and has become a standard therapy for many cancers. Recent progress in next-generation sequencing (NGS) has made it possible to identify a huge number of genetic aberrations. Based on sequencing results, patients are recommended to undergo targeted therapy or immunotherapy. Where no approved drugs are available for the genetic mutations detected in a patient, registration for clinical trials should be facilitated. For that purpose, NGS-based sequencing panels that can simultaneously target multiple genes in a single investigation have been used in daily clinical practice. To date, various types of sequencing panels have been developed to investigate genetic aberrations, including tumor somatic genome variants (gain-of-function or loss-of-function mutations, high-level copy number alterations, and gene fusions), through comprehensive bioinformatics. Because sequencing panels are efficient and cost-effective, they are quickly being adopted outside the lab, in hospitals and clinics, to identify personalized targeted therapy for individual cancer patients.
Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D
2015-07-07
Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
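The filtering step described above (discard reads that cannot be placed at a unique genomic location, then collapse the remaining reads into integration sites) can be sketched as follows. This is a minimal illustration, not VISA's code: the input layout (a hypothetical mapping from read ID to a list of (chromosome, position, strand, alignment score) hits) and the rule that a read counts as uniquely placed only when its best score is strictly higher than its second-best score are both assumptions.

```python
from collections import Counter

# Hypothetical hit layout: (chrom, position, strand, alignment_score).
Alignment = tuple[str, int, str, float]


def integration_sites(alignments: dict[str, list[Alignment]]) -> Counter:
    """Count reads supporting each unique integration site (illustrative sketch)."""
    sites: Counter = Counter()
    for hits in alignments.values():
        if not hits:
            continue  # read did not map to the genome at all
        ranked = sorted(hits, key=lambda h: h[3], reverse=True)
        # Assumed uniqueness rule: a tie for the best score means the read
        # cannot be localized to a single genomic position, so it is dropped.
        if len(ranked) > 1 and ranked[1][3] == ranked[0][3]:
            continue
        chrom, pos, strand, _score = ranked[0]
        sites[(chrom, pos, strand)] += 1
    return sites


reads = {
    "r1": [("chr1", 100, "+", 60.0)],
    "r2": [("chr1", 100, "+", 58.0), ("chr5", 9, "-", 30.0)],
    "r3": [("chr2", 5, "+", 50.0), ("chr3", 7, "+", 50.0)],  # multi-mapping read
}
print(integration_sites(reads))
```

Here r1 and r2 both support the same site on chr1, while r3 is filtered out because it aligns equally well to two locations.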
Osmylated DNA, a novel concept for sequencing DNA using nanopores
NASA Astrophysics Data System (ADS)
Kanavarioti, Anastassia
2015-03-01
Sanger sequencing has led the advances in molecular biology, but faster and cheaper next-generation technologies are urgently needed. A newer approach exploits nanopores, natural or solid-state, set in an electrical field, and obtains base sequence information from current variations as a ssDNA molecule passes through the pore. One hurdle in this approach is that the four bases are chemically similar to each other, which leads to small differences in current obstruction. ‘Base calling’ becomes even more challenging because most nanopores sense a short sequence rather than individual bases. Sequencing DNA via nanopores might be more manageable if there were only two bases and they were chemically very different from each other; a sequence of 1s and 0s comes to mind. Osmylated DNA comes close to such a sequence of 1s and 0s. Osmylation is the addition of osmium tetroxide bipyridine across the C5-C6 double bond of the pyrimidines. Osmylation adds almost 400% mass to the reactive base and creates a sterically and electronically very different molecule, labeled 1, compared with the unreactive purines, labeled 0. If osmylated DNA were successfully sequenced, the result would be a sequence of osmylated pyrimidines (1) and purines (0), not of the actual nucleobases. To solve this problem we studied the osmylation reaction with short oligos and with M13mp18, a long ssDNA, developed a UV-vis assay to measure the extent of osmylation, and designed two protocols. Protocol A uses mild conditions and yields osmylated thymidines (1), while leaving the other three bases (0) practically intact. Protocol B uses harsher conditions and effectively osmylates both pyrimidines, but not the purines. Applying these two protocols to the complementary strand of the target polynucleotide as well yields a total of four osmylated strands that collectively could define the actual base sequence of the target DNA.
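The logic by which four binary (1/0) readouts could jointly define the base sequence can be illustrated with a short decoding sketch. It assumes, hypothetically, that each osmylated strand has already been read out error-free as a 0/1 string and that the two readouts from the complementary strand have been arranged to align position by position with the target; nanopore read-out, strand orientation, and alignment are not modeled, and the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def readout(strand: str, protocol: str) -> str:
    """Simulate the idealized 1/0 readout of an osmylated strand.

    Protocol A (mild): only thymidines are osmylated (1).
    Protocol B (harsh): both pyrimidines, C and T, are osmylated (1).
    """
    marked = {"A": {"T"}, "B": {"C", "T"}}[protocol]
    return "".join("1" if base in marked else "0" for base in strand)


def decode(a_target: str, b_target: str, a_comp: str, b_comp: str) -> str:
    """Recover target bases from four position-aligned readouts (hypothetical)."""
    bases = []
    for at, bt, ac, bc in zip(a_target, b_target, a_comp, b_comp):
        if at == "1":
            bases.append("T")  # thymidine in the target
        elif bt == "1":
            bases.append("C")  # pyrimidine in the target, but not T
        elif ac == "1":
            bases.append("A")  # complement carries T, so target carries A
        elif bc == "1":
            bases.append("G")  # complement carries C, so target carries G
        else:
            raise ValueError("inconsistent readouts at this position")
    return "".join(bases)


target = "ATGCCTAG"
# Complementary strand written position by position under the target (not reversed).
comp = "".join(COMPLEMENT[b] for b in target)
print(decode(readout(target, "A"), readout(target, "B"),
             readout(comp, "A"), readout(comp, "B")))
```

In this idealized, error-free setting the fourth readout only serves as a consistency check; with noisy readouts that redundancy could help detect errors.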
Yin, Changchuan
2015-04-01
To apply digital signal processing (DSP) methods to DNA sequences, the sequences must first be mapped to numerical sequences. Effective numerical mappings therefore play a key role in the performance of DSP-based methods such as exon prediction. Although numerous mappings of symbolic DNA sequences to numerical series have been proposed, existing methods do not incorporate the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC), in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied to exon and intron prediction using a Short-Time Fourier Transform (STFT) approach. The results show that the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein-coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
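The 3-periodicity SNR that the GCC mapping is optimized for can be illustrated with the baseline 4D binary (Voss) representation. In this sketch the SNR is taken as the total spectral power at frequency index N/3 divided by the mean power over all frequencies; that is one common definition and an assumption here. The GCC mapping itself, the simulated annealing step, and the STFT windowing are not reproduced.

```python
import numpy as np


def period3_snr(seq: str) -> float:
    """3-periodicity SNR of a DNA sequence under the 4D binary (Voss) mapping."""
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # truncate so that N/3 is an integer index
    if n == 0:
        raise ValueError("sequence shorter than 3 bases")
    seq = seq[:n]
    power = np.zeros(n)
    for base in "ACGT":
        # Binary indicator sequence: 1 where the base occurs, 0 elsewhere.
        indicator = np.array([1.0 if s == base else 0.0 for s in seq])
        power += np.abs(np.fft.fft(indicator)) ** 2
    return float(power[n // 3] / power.mean())


print(period3_snr("ATG" * 30))  # strongly 3-periodic
print(period3_snr("A" * 90))    # no period-3 component
```

Under this definition a perfect 90-base "ATG" repeat gives an SNR of 30, while a homopolymer gives essentially zero; real coding regions fall in between, which is why the choice of mapping matters.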
AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.
Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia
2017-03-14
Some applications, especially clinical applications that require highly accurate sequencing data, must contend with problems caused by unavoidable sequencing errors. Several tools have been proposed to profile sequencing quality, but few of them can quantify or correct sequencing errors. This unmet need motivated us to develop AfterQC, a tool that profiles sequencing errors and corrects most of them, in addition to providing highly automated quality control and data filtering. Unlike most tools, AfterQC analyses the overlap of paired reads in paired-end sequencing data. Based on this overlap analysis, AfterQC can detect and cut adapters, and it provides a novel function to correct wrong bases in the overlapping regions. Another new feature is the detection and visualisation of sequencing bubbles, which are commonly found on flowcell lanes and may cause sequencing errors. Besides the usual per-cycle quality and base content plots, AfterQC also provides polyX filtering (a polyX is a long subsequence of the same base X), automatic trimming, and k-mer-based strand bias profiling. For each single FastQ file or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer bubble effects, trims reads at the front and tail, detects sequencing errors and corrects some of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support; it accepts a single FastQ file, a single pair of FastQ files (for paired-end sequencing), or a folder whose FastQ files are all processed automatically. Based on overlap analysis, AfterQC can also estimate the sequencing error rate and profile the error transform distribution. Our error profiling tests show that the error distribution is highly platform dependent.
More than just another quality control (QC) tool, AfterQC performs quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help eliminate sequencing errors in paired-end sequencing data, producing much cleaner output and consequently helping to reduce false-positive variants, especially low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all options automatically and requires no arguments in most cases.
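The overlap-based base correction described above can be sketched for a single read pair: reverse-complement read 2, locate where it overlaps read 1, and resolve each mismatch in the overlap in favour of the base with the higher quality score. This is an illustrative sketch, not AfterQC's implementation; the min_overlap and max_mismatch values, the scan order, and the higher-quality-wins rule are assumptions, and quality scores are not updated.

```python
COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA read."""
    return seq.translate(COMP)[::-1]


def find_overlap(r1: str, r2rc: str, min_overlap: int = 10, max_mismatch: int = 2):
    """Return the first offset of r2rc within r1 with few mismatches, or None."""
    for offset in range(len(r1) - min_overlap + 1):
        length = min(len(r1) - offset, len(r2rc))
        mismatches = sum(a != b for a, b in zip(r1[offset:offset + length], r2rc[:length]))
        if mismatches <= max_mismatch:
            return offset
    return None


def correct_pair(r1: str, q1: list[int], r2: str, q2: list[int], **kw):
    """Correct mismatches in the overlap by keeping the higher-quality base."""
    r2rc, q2rc = revcomp(r2), q2[::-1]
    offset = find_overlap(r1, r2rc, **kw)
    if offset is None:
        return r1, r2  # no reliable overlap: leave reads unchanged
    r1_list, r2rc_list = list(r1), list(r2rc)
    length = min(len(r1) - offset, len(r2rc))
    for i in range(length):
        j = offset + i
        if r1_list[j] != r2rc_list[i]:
            if q2rc[i] > q1[j]:
                r1_list[j] = r2rc_list[i]
            else:
                r2rc_list[i] = r1_list[j]
    return "".join(r1_list), revcomp("".join(r2rc_list))


r1 = "AAACCCGTGTTTACG"  # low-quality sequencing error at index 7 (true base G)
q1 = [30] * 15
q1[7] = 10
r2 = "AGCTACGTAAACCCG"
q2 = [35] * 15
print(correct_pair(r1, q1, r2, q2))
```

In this made-up pair, the reads overlap by 10 bases and the single low-quality mismatch in read 1 is replaced by the high-quality base from read 2.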
Transcriptome-Based Differentiation of Closely-Related Miscanthus Lines
Chouvarine, Philippe; Cooksey, Amanda M.; McCarthy, Fiona M.; ...
2012-01-10
Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations.
Porter, Teresita M.; Golding, G. Brian
2012-01-01
Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, to an increasing extent, also in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and NBC runs significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys. PMID:22558215
NASA Astrophysics Data System (ADS)
Jian, Le; Cao, Wang; Jintao, Yang; Yinge, Wang
2018-04-01
This paper describes the design of a dynamic voltage restorer (DVR) that can simultaneously protect several sensitive loads from voltage sags in a region of an MV distribution network. A novel reference voltage calculation method based on zero-sequence voltage optimisation is proposed for this DVR to optimise its cost-effectiveness in compensating voltage sags with different characteristics in an ungrounded neutral system. Based on a detailed analysis of the characteristics of voltage sags caused by different types of faults, and of the effect of the transformer wiring mode on these characteristics, the optimisation target of the reference voltage calculation is formulated with several constraints. The reference voltages under all types of voltage sags are calculated by optimising the zero-sequence component, which minimises the swell in the phase-to-ground voltages after compensation and improves the symmetry of the DVR output voltages, thereby effectively increasing its compensation capability. The validity and effectiveness of the proposed method are verified by simulation and experimental results.
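Why the zero-sequence component is a free design variable can be seen from the symmetrical-component (Fortescue) decomposition: adding the same phasor to all three phase-to-ground voltages changes only the zero-sequence component and leaves the line-to-line voltages unchanged, so loads fed through windings that block zero-sequence voltage (assumed here) do not see it. The sketch below only illustrates this degree of freedom with made-up per-unit phasors; it is not the paper's optimisation method.

```python
import cmath

A = cmath.exp(2j * cmath.pi / 3)  # Fortescue operator a: unit phasor at +120 degrees


def sequence_components(va: complex, vb: complex, vc: complex):
    """Return (zero, positive, negative) sequence phasors of three phase voltages."""
    v0 = (va + vb + vc) / 3
    v1 = (va + A * vb + A * A * vc) / 3
    v2 = (va + A * A * vb + A * vc) / 3
    return v0, v1, v2


# Balanced positive-sequence set, 1 p.u. (illustrative values).
va, vb, vc = 1.0 + 0j, A * A, A
v0, v1, v2 = sequence_components(va, vb, vc)

# Inject an arbitrary zero-sequence offset into all three phase-to-ground voltages.
z = 0.3 + 0.1j
wa, wb, wc = va + z, vb + z, vc + z
w0, w1, w2 = sequence_components(wa, wb, wc)

print(abs(v0), abs(w0 - z))        # zero-sequence component changes by exactly z
print(abs((wa - wb) - (va - vb)))  # line-to-line voltage is unchanged
```

An optimiser can therefore choose the injected zero-sequence phasor to reduce phase-to-ground swell without altering what zero-sequence-blocking loads see.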
Inferring Short-Range Linkage Information from Sequencing Chromatograms
Beggel, Bastian; Neumann-Fraune, Maria; Kaiser, Rolf; Verheyen, Jens; Lengauer, Thomas
2013-01-01
Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach identified the mixture components with an accuracy of 97.4% on a test set in which the positions to be analyzed are only one base apart, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of 10% or less. Second, the model cannot distinguish between mixtures of two or four clonal variants if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip. PMID:24376502
Zadoff-Chu sequence-based hitless ranging scheme for OFDMA-PON configured 5G fronthaul uplinks
NASA Astrophysics Data System (ADS)
Reza, Ahmed Galib; Rhee, June-Koo Kevin
2017-05-01
A Zadoff-Chu (ZC) sequence-based low-complexity hitless upstream time synchronization scheme is proposed for an orthogonal frequency division multiple access passive optical network configured cloud radio access network fronthaul. The algorithm is based on gradual loading of the ZC sequences, where the phase discontinuity due to the cyclic prefix is alleviated by a frequency-domain phase precoder, eliminating the need for guard bands to mitigate intersymbol interference and inter-carrier interference. Simulation results for uncontrolled-wavelength asynchronous transmissions from four concurrently transmitting optical network units are presented to demonstrate the effectiveness of the proposed scheme.
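Two properties that make Zadoff-Chu sequences attractive for timing synchronization and ranging, constant amplitude and zero cyclic autocorrelation at every nonzero lag (for odd length N and a root u coprime to N), can be checked numerically. The sketch also estimates a cyclic delay from the peak of the circular cross-correlation. The length, root, and delay are arbitrary illustrative choices, and the gradual loading and frequency-domain phase precoder of the proposed scheme are not modeled.

```python
import numpy as np


def zadoff_chu(root: int, length: int) -> np.ndarray:
    """ZC sequence x[n] = exp(-j*pi*u*n*(n+1)/N) for odd N (the form also used in LTE)."""
    if length % 2 == 0 or np.gcd(root, length) != 1:
        raise ValueError("use an odd length and a root coprime to the length")
    n = np.arange(length)
    return np.exp(-1j * np.pi * root * n * (n + 1) / length)


def circular_xcorr(rx: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """c[k] = sum_n rx[n] * conj(ref[(n - k) mod N]), computed via the FFT."""
    return np.fft.ifft(np.fft.fft(rx) * np.conj(np.fft.fft(ref)))


x = zadoff_chu(root=25, length=63)
auto = np.abs(circular_xcorr(x, x))
print(np.allclose(np.abs(x), 1.0), auto[0], auto[1:].max())  # peak N at lag 0, ~0 elsewhere

delay = 17
rx = np.roll(x, delay)  # received copy cyclically delayed by 17 samples
print(int(np.argmax(np.abs(circular_xcorr(rx, x)))))
```

Because every nonzero lag correlates to zero, the correlation peak identifies the delay unambiguously, which is the basic mechanism a ZC-based ranging scheme builds on.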
ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data
2012-01-01
Background: Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results: Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and do a better job of discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion: ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and produces several diagnostic plots showing the effectiveness of the recalibration. PMID:22946927
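The core idea of recalibration by logistic regression, fitting the probability of an observed sequencing error as a function of the reported quality and converting the fitted probability back to the Phred scale Q = -10 log10(p), can be sketched with NumPy alone. This is not ReQON's R implementation: it uses a single covariate (the reported score), a plain Newton-Raphson fit, and simulated data in which the reported scores are assumed to be 5 units too optimistic.

```python
import numpy as np


def fit_logistic(x: np.ndarray, y: np.ndarray, iters: int = 50) -> np.ndarray:
    """Newton-Raphson (IRLS) fit of P(error) = sigmoid(w0 + w1 * x)."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        w += np.linalg.solve(hess, grad)
    return w


def recalibrate(q, w: np.ndarray) -> np.ndarray:
    """Map reported Phred scores to recalibrated Phred scores."""
    z = w[0] + w[1] * (np.asarray(q, dtype=float) / 10.0)
    p = 1.0 / (1.0 + np.exp(-z))
    return -10.0 * np.log10(p)


rng = np.random.default_rng(0)
reported = rng.integers(10, 41, size=200_000)           # simulated reported qualities
true_error_prob = 10.0 ** (-(reported - 5) / 10.0)       # assumed: 5 units overconfident
is_error = (rng.random(reported.size) < true_error_prob).astype(float)

w = fit_logistic(reported / 10.0, is_error)
print(np.round(recalibrate([20, 30, 40], w), 1))
```

The recalibrated scores come out several units below the reported ones, reflecting the simulated overconfidence; a practical tool would add further covariates and work on aligned reads rather than simulated error flags.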
Novel methodologies for spectral classification of exon and intron sequences
NASA Astrophysics Data System (ADS)
Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.
2012-12-01
Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence, and the choice of nucleotide-to-numeric mapping affects how well its biological properties are preserved and reflected when moving from the nucleotide domain to the numerical domain. Digital spectral analysis of nucleotide sequences reveals a period-3 spectral power value that is more prominent in exon sequences than in intron sequences. The success of period-3-based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them with existing codes to determine an appropriate representation, and to introduce novel thresholding methods for more accurate period-3-based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows. Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers attractive performance. A windowed 1-sequence numerical representation (with window lengths of 9, 15, and 24 bases) offers a possible speed gain over the non-windowed 4-sequence Voss representation, and this gain increases with sequence length. A winner threshold value (chosen as the best among two defined threshold values and one other threshold value) offers top precision for classifying an unknown sequence of specified fixed length. An interpolated winner threshold value, applicable to an unknown sequence of arbitrary length, can be estimated from the winner threshold values of fixed-length sequences with comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences that better reveals embedded properties, and has potential applications in improved genome annotation.
Dean, Kimberly M; Grayhack, Elizabeth J
2012-12-01
We have developed a robust and sensitive method, called RNA-ID, to screen for cis-regulatory sequences in RNA using fluorescence-activated cell sorting (FACS) of yeast cells bearing a reporter in which expression of both superfolder green fluorescent protein (GFP) and yeast codon-optimized mCherry red fluorescent protein (RFP) is driven by the bidirectional GAL1,10 promoter. This method recapitulates previously reported progressive inhibition of translation mediated by increasing numbers of CGA codon pairs, and restoration of expression by introduction of a tRNA with an anticodon that base pairs exactly with the CGA codon. This method also reproduces effects of paromomycin and context on stop codon read-through. Five key features of this method contribute to its effectiveness as a selection for regulatory sequences: The system exhibits greater than a 250-fold dynamic range, a quantitative and dose-dependent response to known inhibitory sequences, exquisite resolution that allows nearly complete physical separation of distinct populations, and a reproducible signal between different cells transformed with the identical reporter, all of which are coupled with simple methods involving ligation-independent cloning, to create large libraries. Moreover, we provide evidence that there are sequences within a 9-nt library that cause reduced GFP fluorescence, suggesting that there are novel cis-regulatory sequences to be found even in this short sequence space. This method is widely applicable to the study of both RNA-mediated and codon-mediated effects on expression.
A pulsed magnetic stress applied to Drosophila melanogaster flies
NASA Astrophysics Data System (ADS)
Delle Side, D.; Bozzetti, M. P.; Friscini, A.; Giuffreda, E.; Nassisi, V.; Specchia, V.; Velardi, L.
2014-04-01
We report the development of a system to deliver pulsed magnetic stress to biological samples. The device is based on an RLC circuit that transforms the energy stored in a high-voltage capacitor into a magnetic field inside a coil. We characterized the field and found that charging the capacitor to 24 kV results in a peak field of 0.4 T. To test its effect, we applied this stress to the Drosophila melanogaster model and examined its bio-effects. In the germ cells, we analysed the effects on the control of specific repetitive DNA sequences that are activated after various environmental stresses. Deregulation of these sequences causes genomic instability and chromosome breaks, leading to sterility. The magnetic field treatment had no effect on repetitive sequences in the germ cells of Drosophila. Hence, this field does not produce deleterious effects linked to derepression of repetitive sequences.
Bergman, C M; Kreitman, M
2001-08-01
Comparative genomic approaches to gene and cis-regulatory prediction are based on the principle that differential DNA sequence conservation reflects variation in functional constraint. Using this principle, we analyze noncoding sequence conservation in Drosophila for 40 loci with known or suspected cis-regulatory function encompassing >100 kb of DNA. We estimate the fraction of noncoding DNA conserved in both intergenic and intronic regions and describe the length distribution of ungapped conserved noncoding blocks. On average, 22%-26% of noncoding sequences surveyed are conserved in Drosophila, with median block length approximately 19 bp. We show that point substitution in conserved noncoding blocks exhibits transition bias as well as lineage effects in base composition, and occurs more than an order of magnitude more frequently than insertion/deletion (indel) substitution. Overall, patterns of noncoding DNA structure and evolution differ remarkably little between intergenic and intronic conserved blocks, suggesting that the effects of transcription per se contribute minimally to the constraints operating on these sequences. The results of this study have implications for the development of alignment and prediction algorithms specific to noncoding DNA, as well as for models of cis-regulatory DNA sequence evolution.
Sequencing Cyclic Peptides by Multistage Mass Spectrometry
Mohimani, Hosein; Yang, Yu-Liang; Liu, Wei-Ting; Hsieh, Pei-Wen; Dorrestein, Pieter C.; Pevzner, Pavel A.
2012-01-01
Some of the most effective antibiotics (e.g., Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. While hundreds of biomedically important cyclic peptides have been sequenced, computational techniques for sequencing cyclic peptides are still in their infancy. Previous methods for sequencing peptide antibiotics and other cyclic peptides are based on Nuclear Magnetic Resonance spectroscopy and require large amounts (milligrams) of purified material, which cannot be obtained for most compounds. Recently, the development of mass spectrometry-based methods has provided some hope for accurate sequencing of cyclic peptides using picograms of material. In this paper we develop a method for sequencing cyclic peptides by multistage mass spectrometry and show its advantages over single-stage mass spectrometry. The method is tested on known and new cyclic peptides from Bacillus brevis, Dianthus superbus and Streptomyces griseus, as well as on a new family of cyclic peptides produced by marine bacteria. PMID:21751357
Charge transport through DNA based electronic barriers
NASA Astrophysics Data System (ADS)
Patil, Sunil R.; Chawda, Vivek; Qi, Jianqing; Anantram, M. P.; Sinha, Niraj
2018-05-01
We report charge transport in electronic 'barriers' constructed by sequence engineering in DNA. Considering the ionization potentials of Thymine-Adenine (AT) and Guanine-Cytosine (GC) base pairs, we treat AT base pairs as 'barriers'. We also investigate the effect of DNA conformation (A and B form) on charge transport and, in particular, the effect of barrier width on hole transport. Density functional theory (DFT) calculations are performed on energy-minimized DNA structures to obtain the electronic Hamiltonian, and the quantum transport calculations are performed using the Landauer-Büttiker framework. Our main findings contradict previous studies. We find that a longer A-DNA with more AT base pairs can conduct better than a shorter A-DNA with fewer AT base pairs. We also find that some A-DNA sequences can conduct better than the corresponding B-DNA with the same sequence. Counterion-mediated charge transport and long-range interactions are speculated to be responsible for the counter-intuitive dependence of A-DNA conductance on length and AT content.
Computational-Model-Based Analysis of Context Effects on Harmonic Expectancy
Morimoto, Satoshi; Remijn, Gerard B.; Nakajima, Yoshitaka
2016-01-01
Expectancy for an upcoming musical chord, harmonic expectancy, is supposedly based on automatic activation of tonal knowledge. Since previous studies implicitly relied on interpretations based on Western music theory, the underlying computational processes involved in harmonic expectancy and how it relates to tonality need further clarification. In particular, short chord sequences which cannot lead to unique keys are difficult to interpret in music theory. In this study, we examined effects of preceding chords on harmonic expectancy from a computational perspective, using stochastic modeling. We conducted a behavioral experiment, in which participants listened to short chord sequences and evaluated the subjective relatedness of the last chord to the preceding ones. Based on these judgments, we built stochastic models of the computational process underlying harmonic expectancy. Following this, we compared the explanatory power of the models. Our results imply that, even when listening to short chord sequences, internally constructed and updated tonal assumptions determine the expectancy of the upcoming chord. PMID:27003807
Lossy compression of quality scores in genomic data.
Cánovas, Rodrigo; Moffat, Alistair; Turpin, Andrew
2014-08-01
Next-generation sequencing technologies are revolutionizing medicine. Data from sequencing technologies are typically represented as a string of bases, an associated sequence of per-base quality scores, and other metadata, and in aggregate can require a large amount of space. The quality scores show how accurate the bases are with respect to the sequencing process, that is, how confident the sequencer is of having called them correctly, and they are the largest component of datasets in which they are retained. Previous research has examined how to store sequences of bases effectively; here we add to that knowledge by examining methods for compressing quality scores. The quality values originate in a continuous domain, so if a fidelity criterion is introduced, these values can be represented more flexibly, allowing lossy compression of the quality score data. We present existing compression options for quality score data and then introduce two new lossy techniques. Experiments measuring the trade-off between compression ratio and information loss are reported, including quantifying the effect of lossy representations on a downstream application that carries out single nucleotide polymorphism and insertion/deletion detection. The new methods are demonstrably superior to other techniques when assessed against the spectrum of possible trade-offs between storage required and fidelity of representation. An implementation of the methods described here is available at https://github.com/rcanovas/libCSAM. Contact: rcanovas@student.unimelb.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
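A fidelity criterion of the kind mentioned above can be illustrated with bounded-error quantization: each score is replaced by the centre of a bin of width 2e + 1, so no score moves by more than e, and the empirical entropy (a rough proxy for compressed size) cannot increase. This is only an illustration of the general idea, not either of the two new techniques in the article; the error bound e and the example scores are arbitrary.

```python
import math
from collections import Counter


def quantize(scores: list[int], max_err: int) -> list[int]:
    """Replace each score by its bin centre, guaranteeing |q - q'| <= max_err."""
    width = 2 * max_err + 1
    return [(q // width) * width + max_err for q in scores]


def entropy_bits(symbols) -> float:
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


scores = [2, 10, 11, 30, 31, 32, 38, 40, 40, 37, 35, 33]
lossy = quantize(scores, max_err=2)
print(lossy)
print(round(entropy_bits(scores), 3), round(entropy_bits(lossy), 3))
```

A bin centre can exceed the largest score actually in use (for example 42 for a score of 41 when e = 2); a practical scheme would clip or choose representatives per bin, and would measure the downstream effect on variant calling as the article does.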
Adhikari, Badri; Hou, Jie; Cheng, Jianlin
2018-03-01
In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments from which coevolution-based features are derived; these features are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when the top L/5 predicted long-range contacts are evaluated. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in the alignments is 0.66. © 2017 Wiley Periodicals, Inc.
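The evaluation metric used above, precision of the top L/5 long-range predicted contacts for a protein of length L, can be computed as follows. The sketch assumes the usual CASP convention that long-range pairs have a sequence separation of at least 24 residues; true contacts (conventionally a Cβ-Cβ distance below 8 Å) are passed in as a precomputed set, and the predictions in the example are made up. Dividing by the number of available predictions when fewer than L/5 exist is also an assumption, since conventions vary.

```python
def top_l5_long_range_precision(predictions, true_contacts, length, min_sep=24):
    """Precision of the top L/5 long-range predicted contacts.

    predictions: iterable of (i, j, score); true_contacts: set of (i, j) with i < j.
    """
    long_range = [(min(i, j), max(i, j), s) for i, j, s in predictions
                  if abs(i - j) >= min_sep]
    long_range.sort(key=lambda t: t[2], reverse=True)  # highest confidence first
    k = max(1, length // 5)
    top = long_range[:k]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for i, j, _ in top)
    return hits / len(top)


# Hypothetical protein of length 50, so L/5 = 10 predictions are evaluated.
preds = [(1, 5, 2.0)] + [(1, 30 + k, 1.0 - 0.01 * k) for k in range(12)]
native = {(1, 30 + k) for k in range(7)}
print(top_l5_long_range_precision(preds, native, length=50))  # 0.7
```

The short-range pair (1, 5) is ignored despite its high score, and 7 of the 10 highest-scoring long-range predictions are true contacts.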
Research and Implementation of Tibetan Word Segmentation Based on Syllable Methods
NASA Astrophysics Data System (ADS)
Jiang, Jing; Li, Yachao; Jiang, Tao; Yu, Hongzhi
2018-03-01
Tibetan word segmentation (TWS) is an important problem in Tibetan information processing, and abbreviated word recognition is one of its key and most difficult subproblems. Most existing methods of Tibetan abbreviated word recognition are rule-based approaches that require vocabulary support. In this paper, we propose a method based on a sequence tagging model for abbreviated word recognition, and then implement it in TWS systems with sequence labeling models. The experimental results show that our abbreviated word recognition method is fast and effective and can be easily combined with the segmentation model, which significantly improves the performance of Tibetan word segmentation.
Molecular Structure and Sequence in Complex Coacervates
NASA Astrophysics Data System (ADS)
Sing, Charles; Lytle, Tyler; Madinya, Jason; Radhakrishna, Mithun
Oppositely-charged polyelectrolytes in aqueous solution can undergo associative phase separation, in a process known as complex coacervation. This results in a polyelectrolyte-dense phase (coacervate) and polyelectrolyte-dilute phase (supernatant). There remain challenges in understanding this process, despite a long history in polymer physics. We use Monte Carlo simulation to demonstrate that molecular features (charge spacing, size) play a crucial role in governing the equilibrium in coacervates. We show how these molecular features give rise to strong monomer sequence effects, due to a combination of counterion condensation and correlation effects. We distinguish between structural and sequence-based correlations, which can be designed to tune the phase diagram of coacervation. Sequence effects further inform the physical understanding of coacervation, and provide the basis for new coacervation models that take monomer-level features into account.
The Release 6 reference sequence of the Drosophila melanogaster genome
Hoskins, Roger A.; Carlson, Joseph W.; Wan, Kenneth H.; ...
2015-01-14
Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. In conclusion, further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.
The Release 6 reference sequence of the Drosophila melanogaster genome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoskins, Roger A.; Carlson, Joseph W.; Wan, Kenneth H.
Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. In conclusion, further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.
Compressive sensing method for recognizing cat-eye effect targets.
Li, Li; Li, Hui; Dang, Ersheng; Liu, Bo
2013-10-01
This paper proposes a method for recognizing cat-eye effect targets using compressive sensing (CS), called sample processing before reconstruction based on compressed sensing (SPCS). In this method, linear projections of the original image sequences are used to remove dynamic background distractions and extract cat-eye effect targets. The corresponding imaging mechanism for acquiring active and passive image sequences is also presented. The method uses fewer images to recognize cat-eye effect targets, reduces data storage, and recasts traditional target identification, which operates on the original images, as the processing of measurement vectors. The experimental results show that the SPCS method is feasible and outperforms the shape-frequency dual criteria method.
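The abstract does not give SPCS's processing steps. The sketch below only illustrates the linearity property such a method can exploit, under stated assumptions: a random Gaussian measurement matrix, vectorised images, and an active frame that differs from the passive frame only where a cat-eye target retro-reflects. Subtracting the two measurement vectors equals measuring the difference image, so a background shared by both frames cancels before any reconstruction. All names are hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gaussian_matrix(m: int, n: int, seed: int = 0) -> Matrix:
    """Random Gaussian measurement matrix Phi (m x n), with m < n for compression."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def measure(phi: Matrix, x: Vector) -> Vector:
    """Compressive measurement y = Phi x of a vectorised image x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def target_measurements(phi: Matrix, active: Vector, passive: Vector) -> Vector:
    """Subtract the passive frame's measurements from the active frame's.

    By linearity this equals Phi (active - passive): a background common to
    both frames cancels without reconstructing either image.
    """
    y_active = measure(phi, active)
    y_passive = measure(phi, passive)
    return [a - p for a, p in zip(y_active, y_passive)]


n, m = 64, 16
background = [5.0] * n
active = background.copy()
active[10] += 40.0  # hypothetical retro-reflection from a cat-eye target at pixel 10
phi = gaussian_matrix(m, n)
y = target_measurements(phi, active, background)
# y is approximately 40 times column 10 of phi; only 16 numbers per frame are kept.
```

A dynamic background that differs between the two frames would not cancel this simply, which is presumably where the method's further processing comes in.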
Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning
Xu, Shengzhi; Cheng, Xiang; Li, Zhengyi; Xiong, Li
2016-01-01
In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm that achieves both high data utility and a high degree of privacy. We found that, in differentially private FSM, the amount of required noise is proportional to the number of candidate sequences. If the number of unpromising candidate sequences can be effectively reduced, the utility-privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose a novel differentially private FSM algorithm, referred to as PFS2. The core of our algorithm is to use sample databases to further prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which sequences are potentially frequent. To improve the accuracy of these private estimates, a sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is ε-differentially private. Extensive experiments on real datasets show that PFS2 can privately find frequent sequences with high accuracy. PMID:26973430
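PFS2's privacy-budget split, sample databases, and sequence shrinking are beyond a short example. The sketch below, with hypothetical names, shows only two ingredients under the standard Laplace mechanism: why noise grows with the number of candidates (each record can change every candidate's support by at most 1, so the L1 sensitivity equals the candidate count), and what a relaxed threshold looks like. The `relax` parameter is a hypothetical stand-in for the paper's threshold relaxation.

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Candidate = Tuple[str, ...]


def contains(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if pattern occurs in seq as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)  # each `in` consumes the iterator


def support(db: Iterable[Sequence[str]], pattern: Sequence[str]) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(1 for seq in db if contains(seq, pattern))


def noisy_supports(db: List[Sequence[str]], candidates: List[Candidate],
                   epsilon: float, rng: random.Random) -> Dict[Candidate, float]:
    """Laplace mechanism over the vector of candidate supports.

    Adding or removing one record changes each count by at most 1, so the L1
    sensitivity is len(candidates) and the noise scale grows with it.
    """
    scale = len(candidates) / epsilon
    rate = 1.0 / scale
    # The difference of two i.i.d. Exponential(rate) draws is Laplace(0, scale).
    return {c: support(db, c) + rng.expovariate(rate) - rng.expovariate(rate)
            for c in candidates}


def relaxed_filter(noisy: Dict[Candidate, float], threshold: float,
                   relax: float) -> List[Candidate]:
    """Keep candidates whose noisy support clears a lowered threshold."""
    return [c for c, s in noisy.items() if s >= (1.0 - relax) * threshold]


db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
cands = [("a",), ("a", "c"), ("c", "a")]
noisy = noisy_supports(db, cands, epsilon=1.0, rng=random.Random(7))
print(relaxed_filter(noisy, threshold=2, relax=0.2))
```

Pruning unpromising candidates before this step shrinks `len(candidates)` and therefore the noise scale, which is the tradeoff the abstract describes.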
A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS
Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T.; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J.; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A.; Lempicki, Richard A.; Huang, Da Wei
2013-01-01
PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20 kb), in contrast to the shorter reads produced by first- and second-generation sequencing technologies. Because PacBio RS is a new platform, it is important to assess the sequencing error rate as well as the quality control (QC) parameters associated with its sequence data. In this study, a mixture of 10 previously known, closely related DNA amplicons was sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from this sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, improving to 1.3% with an SVM-based multi-parameter QC method. In addition, de novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are already error-corrected, appropriate QC of CCS reads is still necessary to obtain successful downstream bioinformatics analysis results. PMID:24179701
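The abstract does not define how the per-read error rate was computed. Assuming it is the edit distance between a CCS read and its known reference amplicon divided by the reference length (one common definition; real pipelines usually work from aligner output instead), a median error rate over reads can be sketched as follows. The function names are hypothetical, and the SVM-based QC step is not reproduced.

```python
from statistics import median
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def error_rate(read: str, reference: str) -> float:
    """Errors per reference base (an assumed definition of read error rate)."""
    return edit_distance(read, reference) / len(reference)


def median_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Median error rate over (read, reference) pairs."""
    return median(error_rate(read, ref) for read, ref in pairs)


ref = "ACGTACGTAC"
reads = ["ACGTACGTAC", "ACGTTCGTAC", "ACTTACGAC"]  # 0, 1 and 2 errors
print(median_error_rate((r, ref) for r in reads))
```

Using the median rather than the mean keeps a few badly failed reads from dominating the summary, which matters when comparing results with and without QC.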
A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS.
Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A; Lempicki, Richard A; Huang, Da Wei
2013-07-31
PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20 kb), in contrast to the shorter reads produced by first- and second-generation sequencing technologies. Because PacBio RS is a new platform, it is important to assess the sequencing error rate as well as the quality control (QC) parameters associated with its sequence data. In this study, a mixture of 10 previously known, closely related DNA amplicons was sequenced using the PacBio RS sequencing platform. After aligning Circular Consensus Sequence (CCS) reads derived from this sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC, improving to 1.3% with an SVM-based multi-parameter QC method. In addition, de novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are already error-corrected, appropriate QC of CCS reads is still necessary to obtain successful downstream bioinformatics analysis results.
Initial retrieval sequence and blending strategy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pemwell, D.L.; Grenard, C.E.
1996-09-01
This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.
De Lillo, Carlo; Kirby, Melissa; Poole, Daniel
2016-01-01
Immediate serial spatial recall measures the ability to retain sequences of locations in short-term memory and is considered the spatial equivalent of digit span. It is tested by requiring participants to reproduce sequences of movements performed by an experimenter or displayed on a monitor. Different organizational factors dramatically affect serial spatial recall, but they are often confounded or underspecified. Untangling them is crucial for the characterization of working-memory models and for establishing the contribution of structure and memory capacity to spatial span. We report five experiments assessing the relative roles and independence of factors that have been reported in the literature. Experiment 1 disentangled the effects of spatial clustering and path-length by manipulating the distance of items displayed on a touchscreen monitor. Long-path sequences segregated by spatial clusters were compared with short-path sequences not segregated by clusters. Recall was more accurate for sequences segregated by clusters, independently of path-length. Experiment 2 featured conditions where temporal pauses were introduced between or within cluster boundaries during the presentation of sequences with the same paths. Thus, the temporal structure of the sequences was either consistent or inconsistent with a hierarchical representation based on segmentation by spatial clusters, but the effect of structure could not be confounded with effects of path-characteristics. Pauses at cluster boundaries yielded more accurate recall, as predicted by a hierarchical model. In Experiment 3, the systematic manipulation of sequence structure, path-length, and presence of path-crossings of sequences showed that structure explained most of the variance, followed by the presence/absence of path-crossings, and path-length. 
Experiments 4 and 5 replicated the results of the previous experiments in immersive virtual reality navigation tasks where the viewpoint of the observer changed dynamically during encoding and recall. This suggested that the effects of structure in spatial span are not dependent on perceptual grouping processes induced by the aerial view of the stimulus array typically afforded by spatial recall tasks. These results demonstrate the independence of coding strategies based on structure from effects of path characteristics and perceptual grouping in immediate serial spatial recall. PMID:27891101
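Path-length and the presence of path-crossings, two of the factors manipulated in Experiment 3, are simple geometric properties of a sequence of target locations. A minimal sketch for computing them from hypothetical 2-D screen coordinates follows; treating only proper crossings between non-adjacent segments as "path-crossings" (touching or collinear overlaps are ignored) is an assumption, not the study's stated definition.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def path_length(points: List[Point]) -> float:
    """Total Euclidean length of the path visiting the points in order."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def _orient(a: Point, b: Point, c: Point) -> float:
    """Sign of the cross product (b - a) x (c - a): left turn > 0, right turn < 0."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """Proper crossing of segments ab and cd; touching or collinear cases are ignored."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def has_crossing(points: List[Point]) -> bool:
    """True if any two non-adjacent movement segments of the path cross."""
    segments = list(zip(points, points[1:]))
    for i in range(len(segments)):
        for j in range(i + 2, len(segments)):  # adjacent segments share an endpoint
            if _segments_cross(*segments[i], *segments[j]):
                return True
    return False


zigzag = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (0.0, 2.0)]
print(path_length(zigzag), has_crossing(zigzag))
```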
Indexcov: fast coverage quality control for whole-genome sequencing.
Pedersen, Brent S; Collins, Ryan L; Talkowski, Michael E; Quinlan, Aaron R
2017-11-01
The BAM and CRAM formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy for sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license. © The Authors 2017. Published by Oxford University Press.
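In the BAM format, the linear index stores, for each 16 kb tile of a reference sequence, a BGZF virtual file offset whose upper 48 bits give the compressed block offset. The sketch below, assuming those offsets have already been read from a `.bai` file into a list, turns the byte distance between consecutive tiles into a depth proxy scaled so the median tile is 1.0. The median scaling and the function names are assumptions for illustration, not indexcov's documented behaviour; compressed size is only a rough proxy because compression ratios vary.

```python
from statistics import median
from typing import List


def compressed_offset(virtual_offset: int) -> int:
    """Upper 48 bits of a BGZF virtual offset: start of the compressed block."""
    return virtual_offset >> 16


def relative_depth(linear_index: List[int]) -> List[float]:
    """Per-tile depth proxy from consecutive linear-index entries.

    Bytes of compressed alignment data between tile starts approximate the
    number of records per tile; dividing by the median puts typical tiles at 1.0.
    """
    starts = [compressed_offset(v) for v in linear_index]
    sizes = [b - a for a, b in zip(starts, starts[1:])]
    mid = median(sizes)
    if mid == 0:
        raise ValueError("median tile size is zero; cannot normalise")
    return [s / mid for s in sizes]


# Hypothetical virtual offsets for five consecutive 16 kb tiles.
index = [x << 16 for x in (0, 100, 200, 300, 500)]
print(relative_depth(index))
```

A tile near 2.0 hints at a duplication and one near 0.5 at a single-copy loss, which is how such profiles can flag large anomalies or infer sex from chromosome X.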
An efficient study design to test parent-of-origin effects in family trios.
Yu, Xiaobo; Chen, Gao; Feng, Rui
2017-11-01
Increasing evidence has shown that genes may cause prenatal, neonatal, and pediatric diseases depending on their parental origins. Statistical models that incorporate parent-of-origin effects (POEs) can improve the power of detecting disease-associated genes and help explain the missing heritability of diseases. In many studies, children have been sequenced for genome-wide association testing, but it may be unaffordable to also sequence their parents to evaluate POEs. Motivated by this constraint, we proposed a budget-friendly study design that sequences children and genotypes only their parents using single nucleotide polymorphism arrays. We developed a powerful likelihood-based method that takes into account both sequence reads and linkage disequilibrium to infer the parental origins of children's alleles and estimate their POEs on the outcome. Through extensive simulations, we evaluated the performance of our proposed method and compared it with an existing method that uses only genotypes. Our method showed higher power than the genotype-based method. When either the mean read depth or the paired-end length was reasonably large, our method achieved ideal power. When one parent's genotypes were unavailable or parental genotypes at the testing locus were not typed, both methods lost power compared with the complete-data case, but the power loss of our method was smaller than that of the genotype-based method. We also extended our method to accommodate mixed genotype data and low- and high-coverage sequence data from children and their parents. In the presence of sequencing errors, low-coverage parental sequence data may yield lower power than parental genotype data. © 2017 WILEY PERIODICALS, INC.
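The likelihood method in the abstract, which combines reads and linkage disequilibrium, is too involved for a short example. The Mendelian logic it extends can be sketched, though: at one biallelic site, if only one assignment of the child's two alleles to mother and father is consistent with the parental genotypes, parental origin is determined; if several are (for example, all three individuals heterozygous) or none are (a Mendelian inconsistency), it is not. Representing parental genotypes as allele sets and the function name are assumptions.

```python
from typing import Optional, Set, Tuple


def parental_origin(child: Tuple[str, str], mother: Set[str],
                    father: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (maternal allele, paternal allele) if uniquely determined, else None.

    None covers both ambiguous trios and Mendelian inconsistencies; a
    probabilistic method is needed to go further in either case.
    """
    a, b = child
    options = {(m, p) for m, p in ((a, b), (b, a)) if m in mother and p in father}
    return next(iter(options)) if len(options) == 1 else None


print(parental_origin(("A", "B"), {"A"}, {"A", "B"}))       # informative trio
print(parental_origin(("A", "B"), {"A", "B"}, {"A", "B"}))  # ambiguous trio
```

The ambiguous all-heterozygous case is exactly where extra information, such as read-backed phasing or linkage disequilibrium with nearby sites, becomes valuable.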
Elder, Robert M; Jayaraman, Arthi
2013-10-10
Gene therapy relies on the delivery of DNA into cells, and polycations are one class of vectors enabling efficient DNA delivery. Nuclear localization sequences (NLS), cationic oligopeptides that target molecules for nuclear entry, can be incorporated into polycations to improve their gene delivery efficiency. We use simulations to study the effect of peptide chemistry and sequence on the DNA-binding behavior of NLS-grafted polycations by systematically mutating the residues in the grafts, which are based on the SV40 NLS (peptide sequence PKKKRKV). Replacing arginine (R) with lysine (K) reduces binding strength by eliminating arginine-DNA interactions, but placing R in a less hindered location (e.g., farther from the grafting point to the polycation backbone) has surprisingly little effect on polycation-DNA binding strength. Changing the positions of the hydrophobic proline (P) and valine (V) residues relative to the polycation backbone changes hydrophobic aggregation within the polycation and, consequently, changes the conformational entropy loss that occurs upon polycation-DNA binding. Since conformational entropy loss affects the free energy of binding, the positions of P and V in the grafts affect DNA binding affinity. The insight from this work guides synthesis of polycations with tailored DNA binding affinity and, in turn, efficient DNA delivery.
He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z
2013-12-04
Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computation difficult. Using and analyzing these data effectively remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides a wide range of scalable functions for routine resequencing analysis, organized in modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space, and it enables faster, more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers many practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and numerous sub-functions provide a solid foundation for the special demands of resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.
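ReSeqTools' own commands are not described in the abstract, so nothing below is its interface. As a hypothetical illustration of working on compressed input directly and collecting routine statistics, a gzip-compressed FASTQ file can be streamed with Python's standard `gzip` module without decompressing it to disk. Phred+33 quality encoding and the Q20 cutoff are assumptions.

```python
import gzip
import os
import tempfile
from typing import Dict


def fastq_stats(path: str, phred_offset: int = 33) -> Dict[str, float]:
    """Read count, base count and fraction of bases with quality >= 20."""
    reads = bases = q20 = 0
    with gzip.open(path, "rt") as handle:  # stream text straight from the .gz file
        while True:
            header = handle.readline()
            if not header:
                break
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip("\r\n")
            reads += 1
            bases += len(seq)
            q20 += sum(1 for c in qual if ord(c) - phred_offset >= 20)
    return {"reads": reads, "bases": bases,
            "q20_fraction": q20 / bases if bases else 0.0}


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "demo.fastq.gz")
    with gzip.open(demo, "wt") as out:
        out.write("@r1\nACGT\n+\nIIII\n@r2\nACGN\n+\n!!II\n")
    print(fastq_stats(demo))
```

Reading line by line keeps memory use constant regardless of file size, which is the practical reason to stream compressed files rather than load them.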
Wu, Jiaxin; Li, Yanda; Jiang, Rui
2014-03-01
Exome sequencing has been widely used to detect pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective for analyzing exome sequencing data because of factors such as the large number of sequenced variants, the presence of a non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, the widespread application of exome sequencing calls for an effective computational method for identifying causative nonsynonymous SNVs among a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and is effective for diseases with a variety of inheritance patterns. In applications of our method to real exome sequencing data sets, we show that SPRING can detect causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.
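The abstract does not state which statistical model SPRING uses to turn its eleven scores into one significance value. As a stand-in illustrating one standard way to combine several p-values, here is Fisher's method: X = -2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom under the null, and for even degrees of freedom its upper tail has the closed form exp(-X/2) Σ_{j<k} (X/2)^j / j!. Fisher's method assumes independent p-values, which correlated evidence sources can violate, so this is not SPRING's method.

```python
import math
from typing import Sequence


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p) ~ chi-squared with 2k degrees of freedom; its survival
    function for even degrees of freedom is exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = total = 1.0
    for j in range(1, k):
        term *= half / j  # builds (X/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


print(fisher_combined_p([0.05, 0.05]))
```

Two moderately significant, independent lines of evidence combine into a smaller p-value than either alone, which is the intuition behind integrating many weak scores.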
Next Generation Sequencing Technologies: The Doorway to the Unexplored Genomics of Non-Model Plants
Unamba, Chibuikem I. N.; Nag, Akshay; Sharma, Ram K.
2015-01-01
Non-model plants, i.e., species with one or more characteristics such as a long life cycle, difficulty of growing in the laboratory, or poor fecundity, were previously left out of sequencing projects because of the high running cost of Sanger sequencing. Consequently, information about their genomics and key biological processes is inadequate. However, the recent advent of fast and cost-effective next generation sequencing (NGS) platforms has enabled the discovery of certain characteristic gene structures unique to these species. NGS has also aided in gaining insight into the mechanisms underlying gene expression and secondary metabolism, and has facilitated the development of genomic resources for diversity characterization, evolutionary analysis and marker-assisted breeding, even without prior genomic sequence information. In this review, we explore how different NGS platforms, as well as recent advances in NGS-based high-throughput genotyping technologies, are rewarding efforts on de novo whole-genome/transcriptome sequencing and on the development of genome-wide, sequence-based marker resources, which are less costly than phenotyping, for the improvement of non-model crops. PMID:26734016
Pilotte, Nils; Papaiakovou, Marina; Grant, Jessica R; Bierwert, Lou Ann; Llewellyn, Stacey; McCarthy, James S; Williams, Steven A
2016-03-01
The soil transmitted helminths are a group of parasitic worms responsible for extensive morbidity in many of the world's most economically depressed locations. With growing emphasis on disease mapping and eradication, the availability of accurate and cost-effective diagnostic measures is of paramount importance to global control and elimination efforts. While real-time PCR-based molecular detection assays have shown great promise, to date, these assays have utilized sub-optimal targets. By performing next-generation sequencing-based repeat analyses, we have identified high copy-number, non-coding DNA sequences from a series of soil transmitted pathogens. We have used these repetitive DNA elements as targets in the development of novel, multi-parallel, PCR-based diagnostic assays. Utilizing next-generation sequencing and the Galaxy-based RepeatExplorer web server, we performed repeat DNA analysis on five species of soil transmitted helminths (Necator americanus, Ancylostoma duodenale, Trichuris trichiura, Ascaris lumbricoides, and Strongyloides stercoralis). Employing high copy-number, non-coding repeat DNA sequences as targets, novel real-time PCR assays were designed, and assays were tested against established molecular detection methods. Each assay provided consistent detection of genomic DNA at quantities of 2 fg or less, demonstrated species-specificity, and showed an improved limit of detection over the existing, proven PCR-based assay. The utilization of next-generation sequencing-based repeat DNA analysis methodologies for the identification of molecular diagnostic targets has the ability to improve assay species-specificity and limits of detection. By exploiting such high copy-number repeat sequences, the assays described here will facilitate soil transmitted helminth diagnostic efforts. We recommend similar analyses when designing PCR-based diagnostic tests for the detection of other eukaryotic pathogens.
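RepeatExplorer clusters reads with a similarity graph, which a short example cannot reproduce. As a much simpler, hypothetical proxy for locating high copy-number sequence in next-generation reads, one can count canonical k-mers (each k-mer merged with its reverse complement) and keep those above a count threshold. `k` and `min_count` are hypothetical tuning parameters, not values from the study.

```python
from collections import Counter
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def high_copy_kmers(reads: Iterable[str], k: int, min_count: int) -> Dict[str, int]:
    """Canonical k-mers seen at least min_count times across all reads."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other codes
                counts[canonical(kmer)] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


print(high_copy_kmers(["ACGTACGTAC", "TTTTT"], k=4, min_count=3))
```

Merging strands matters because reads from a repeat arrive in both orientations; without it, each repeat's count would be split in two.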
Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns
2013-01-01
Background It is well known that the search for homologous RNAs is more effective if both sequence and structure information are incorporated into the search. However, current tools for searching with RNA sequence-structure patterns either cannot fully handle mutations occurring on both of these levels or are simply not fast enough for searching large sequence databases, because of the high computational cost of the underlying sequence-structure alignment problem. Results We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns, supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular those of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by a factor of up to 45. Our best new index-based algorithm achieves a speedup by a factor of 560. Conclusions The presented methods achieve considerable speedups compared to the best previous method. 
This, together with the expected sublinear running time of the presented index-based algorithms, allows, for the first time, approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide RaligNAtor, a robust and well-documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator. PMID:23865810
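RaligNAtor's base-pair edit operations and matrix-reuse scheme are beyond a short example. As a simplified, sequence-only analogue, the classic online approximate matching recurrence (Sellers' algorithm) computes semi-global alignments of a pattern against every substring of the target in one column-by-column pass: the pattern must be aligned in full, but the match may start anywhere in the text at no cost. It reports end positions whose unit-cost edit distance is within a threshold `k`. This is the textbook technique, not RaligNAtor's algorithm, and the function name is hypothetical.

```python
from typing import List


def approx_occurrences(pattern: str, text: str, k: int) -> List[int]:
    """End positions j (exclusive) where some text[i:j] is within edit distance k.

    col[i] holds D[i][j]: the cheapest alignment of pattern[:i] to a substring of
    text ending at j. D[0][j] = 0 for every j, so a match may start anywhere.
    """
    m = len(pattern)
    col = list(range(m + 1))  # column j = 0: D[i][0] = i
    ends: List[int] = []
    for j, t in enumerate(text, start=1):
        diag = col[0]  # D[0][j-1]
        col[0] = 0     # free start in the text
        for i in range(1, m + 1):
            cur = min(col[i] + 1,                      # text char unmatched (insertion)
                      col[i - 1] + 1,                  # pattern char unmatched (deletion)
                      diag + (pattern[i - 1] != t))    # match or substitution
            diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_occurrences("ACG", "TTACGTT", k=0))  # [5]
```

Each text character updates one column of length m + 1, so the pass costs O(mn) time and O(m) memory; index-based methods such as the ones in the abstract avoid scanning every position.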
Pre-Attentive Auditory Processing of Lexicality
ERIC Educational Resources Information Center
Jacobsen, Thomas; Horvath, Janos; Schroger, Erich; Lattner, Sonja; Widmann, Andreas; Winkler, Istvan
2004-01-01
The effects of lexicality on auditory change detection based on auditory sensory memory representations were investigated by presenting oddball sequences of repeatedly presented stimuli, while participants ignored the auditory stimuli. In a cross-linguistic study of Hungarian and German participants, stimulus sequences were composed of words that…
NASA Astrophysics Data System (ADS)
Beeman-Cadwallader, Nicole
In 2007 Pioneer High School, a public school in Whittier, California, changed the sequence of its science courses from the traditional Biology-Chemistry-Physics (B-C-P) to Biology-Physics-Chemistry (B-P-C), or "Physics Second." California Standards Tests (CSTs) scores in Physics and Chemistry from 2004-2012 were used to determine whether the Physics Second sequencing had any effect on student achievement in those courses. The data were also used to determine whether the Physics Second sequence had an effect on performance in Physics and Chemistry by gender. Independent t tests and chi-square analysis of the data showed an improvement in student performance in Chemistry but not in Physics. A 2x2 factorial ANOVA revealed that in Physics, male students performed better on the CSTs than their female peers. In Chemistry, male and female students performed equally well. Neither finding was a result of the change to the "Physics Second" sequencing.
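The abstract does not say which independent t test variant was applied. A minimal sketch of the pooled-variance (Student) independent-samples t statistic, applied to hypothetical score lists rather than the study's data, is shown below; Welch's unequal-variance version would differ in the standard error and degrees of freedom.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's independent-samples t statistic with pooled variance.

    Returns (t, degrees of freedom). Assumes both groups share one variance.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Hypothetical CST scale scores for two cohorts (not data from the study).
before = [310.0, 325.0, 340.0, 298.0, 330.0]
after = [335.0, 350.0, 342.0, 360.0, 328.0]
print(pooled_t(before, after))
```

The t value is then compared against a t distribution with `df` degrees of freedom to obtain a p-value.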
Synthesis and triplex forming properties of pyrimidine derivative containing extended functionality.
Gianolio, D A; McLaughlin, L W
1999-08-01
Two pyrimidine nucleosides containing extended hydrogen-bonding functionality have been synthesized. In one, the side chain is based on semicarbazide; in the other, monoacetylated carbohydrazide was employed. DNA sequences could be prepared with both nucleoside analogues using a reverse coupling protocol, provided that the normal capping step was eliminated and the iodine-based oxidizing solution was replaced with one based on 10-camphorsulfonyl oxaziridine. Both derivatives exhibited moderate effects in selectively targeting C-G base pairs embedded within a polypurine target sequence.
Structured prediction models for RNN based sequence labeling in clinical text.
Jagannatha, Abhyuday N; Yu, Hong
2016-11-01
Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling is the extraction of medical entities such as medications, indications, and side effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work, we experimented with various CRF-based structured learning models combined with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methods for structured prediction in order to improve exact phrase detection of various medical entities.
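In an LSTM-CRF, the RNN supplies a per-token score for each label and the CRF adds pairwise (transition) potentials between neighbouring labels; exact decoding of such a linear chain is the Viterbi algorithm. The sketch below assumes the unary and pairwise scores are already computed log-space numbers held in plain dictionaries (a hypothetical layout). The paper's skip-chain approximation, which adds long-range edges, is not covered.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: Sequence[str]) -> Tuple[List[str], float]:
    """Highest-scoring label path for a linear-chain CRF (log-space scores).

    emissions[t][y]: unary score of label y at token t (e.g. an RNN output).
    transitions[(p, y)]: pairwise score for label p followed by label y.
    """
    if not emissions:
        return [], 0.0
    score = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for y in labels:
            best_prev = max(labels, key=lambda p: score[p] + transitions[(p, y)])
            new_score[y] = score[best_prev] + transitions[(best_prev, y)] + emissions[t][y]
            pointer[y] = best_prev
        score = new_score
        backpointers.append(pointer)
    last = max(labels, key=lambda y: score[y])
    path = [last]
    for pointer in reversed(backpointers):  # follow pointers back to token 0
        path.append(pointer[path[-1]])
    path.reverse()
    return path, score[last]


labels = ["O", "B", "I"]
trans = {(p, y): 0.0 for p in labels for y in labels}
trans[("O", "I")] = -10.0  # penalise the invalid BIO transition O -> I
emis = [{"O": 1.0, "B": 0.5, "I": 0.0}, {"O": 0.0, "B": 0.0, "I": 1.0}]
print(viterbi(emis, trans, labels))  # (['B', 'I'], 1.5)
```

Taking the best label per token independently would give O followed by I; the pairwise potential is what steers decoding to the well-formed phrase B, I, which is the kind of exact-phrase improvement structured prediction targets.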
Structured prediction models for RNN based sequence labeling in clinical text
Jagannatha, Abhyuday N; Yu, Hong
2016-01-01
Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling is the extraction of medical entities such as medications, indications, and side effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work, we experimented with various CRF-based structured learning models combined with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methods for structured prediction in order to improve exact phrase detection of various medical entities. PMID:28004040
NASA Astrophysics Data System (ADS)
Du, Mao-Kang; He, Bo; Wang, Yong
2011-01-01
Recently, chaos-based cryptosystems have attracted much attention. Wang and Yu (Commun. Nonlin. Sci. Numer. Simulat. 14 (2009) 574) proposed a block encryption algorithm based on dynamic sequences of multiple chaotic systems. We analyze the potential flaws in the algorithm and then present a chosen-plaintext attack. Some remedial measures are suggested to avoid these flaws effectively. Furthermore, an improved encryption algorithm is proposed that resists the attacks while keeping all the merits of the original cryptosystem.
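The abstract does not describe Wang and Yu's algorithm or the specific flaws found, so the following is a hypothetical toy, not the attacked scheme. It illustrates one well-known reason chosen-plaintext attacks succeed against chaos-based ciphers: if the chaotic keystream depends only on the key, encrypting an all-zero plaintext reveals the keystream, which then decrypts any other ciphertext of the same length. Whether this weakness applies to the scheme in the paper is not established here.

```python
from typing import Tuple

Key = Tuple[float, float]  # (initial condition x0 in (0, 1), logistic parameter r)


def logistic_keystream(x0: float, r: float, n: int, burn_in: int = 100) -> bytes:
    """Bytes from iterating the logistic map x -> r x (1 - x) (toy generator)."""
    x = x0
    for _ in range(burn_in):  # discard transient iterations
        x = r * x * (1 - x)
    out = bytearray()
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)
    return bytes(out)


def toy_encrypt(key: Key, plaintext: bytes) -> bytes:
    """XOR with a keystream that depends on the key only: the weakness shown."""
    stream = logistic_keystream(key[0], key[1], len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def chosen_plaintext_attack(oracle, ciphertext: bytes) -> bytes:
    """Recover a plaintext using one encryption query on an all-zero message."""
    stream = oracle(bytes(len(ciphertext)))  # E(0...0) is the keystream itself
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


secret_key = (0.3141, 3.99)
c = toy_encrypt(secret_key, b"attack at dawn")
print(chosen_plaintext_attack(lambda m: toy_encrypt(secret_key, m), c))
```

A standard countermeasure is to make the keystream depend on a fresh nonce or on the message itself, so that no two encryptions reuse the same stream.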
2012-01-01
Background In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR and DArT have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. Next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting method in generating DNA markers. In this study, we employed the grain legume crop Lupinus angustifolius (lupin) as a test case and examined the utility of an NGS-based method, RAD (restriction-site associated DNA) sequencing, as DNA fingerprinting for rapid, cost-effective development of markers tagging a disease resistance gene for molecular breeding. Results Twenty informative plants from an RxS cross (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing using multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of a 16-lane run on the Solexa HiSeq2000 sequencing platform. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtering the DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on an F8 population of 186 individual plants confirmed that all five markers were linked to the R gene. 
Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. Conclusions: We demonstrated that more than 30 molecular markers linked to a target gene for an agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on the HiSeq2000 by applying NGS-based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high-throughput multiplex formats or into low-cost, simple PCR-based markers desirable for large-scale marker implementation in plant breeding programs. High-density, closely linked molecular markers associated with a target trait help to overcome a major bottleneck in implementing molecular markers on a wide range of germplasm in breeding programs. We conclude that NGS-based RAD sequencing used as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species. PMID:22805587
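The abstract does not list the "marker identification parameters" used to go from 8207 SNPs to 38 linked markers. One plausible filter, shown here purely as a hypothetical sketch, keeps a SNP when one allele is carried by the resistant plants and absent from the susceptible ones, with a small mismatch allowance standing in for recombinants or genotyping errors. The data layout (a dict of per-plant allele calls) and the `max_mismatch` parameter are assumptions, not the study's pipeline.

```python
from typing import Dict, List


def cosegregating_snps(
    genotypes: Dict[str, List[str]],
    resistant: List[bool],
    max_mismatch: int = 0,
) -> List[str]:
    """Return ids of SNPs whose allele calls split the plants by phenotype.

    genotypes maps a SNP id to one allele call per plant, in the same plant
    order as `resistant` (True = disease resistant). A SNP is kept when some
    allele is present in the resistant plants and absent from the susceptible
    ones, allowing at most `max_mismatch` disagreeing plants.
    """
    kept = []
    for snp_id, calls in genotypes.items():
        for allele in set(calls):
            # A plant disagrees if carrying the allele does not match being resistant.
            mismatches = sum((call == allele) != is_r for call, is_r in zip(calls, resistant))
            if mismatches <= max_mismatch:
                kept.append(snp_id)
                break
    return kept
```

For example, with four plants `[True, True, False, False]`, a SNP called `A, A, G, G` is kept at `max_mismatch=0`, while one called `A, G, A, G` is not kept at any allowance below 2.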
Yang, Huaan; Tao, Ye; Zheng, Zequn; Li, Chengdao; Sweetingham, Mark W; Howieson, John G
2012-07-17
In the last 30 years, a number of DNA fingerprinting methods, such as RFLP, RAPD, AFLP, SSR, and DArT, have been used extensively in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. Next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting method in generating DNA markers. In this study, we used the grain legume crop Lupinus angustifolius (lupin) as a test case and examined the utility of an NGS-based method, RAD (restriction-site associated DNA) sequencing, as DNA fingerprinting for rapid, cost-effective development of markers tagging a disease resistance gene for molecular breeding. Twenty informative plants from an RxS (disease resistant x susceptible) cross in lupin were subjected to RAD single-end sequencing using multiplex identifiers. The entire RAD sequencing product was resolved in two of the 16 lanes of a single run on the Solexa HiSeq2000 sequencing platform. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants identified 8207 SNP markers. Filtering the DNA sequencing data with marker identification parameters led to the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on an F8 population of 186 individual plants confirmed that all five markers were linked to the R gene. 
Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. We demonstrated that more than 30 molecular markers linked to a target gene for an agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on the HiSeq2000 by applying NGS-based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high-throughput multiplex formats or into low-cost, simple PCR-based markers desirable for large-scale marker implementation in plant breeding programs. High-density, closely linked molecular markers associated with a target trait help to overcome a major bottleneck in implementing molecular markers on a wide range of germplasm in breeding programs. We conclude that NGS-based RAD sequencing used as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species.
Wu, Howard G.
2013-01-01
The planning of goal-directed movements is highly adaptable; however, the basic mechanisms underlying this adaptability are not well understood. Even the features of movement that drive adaptation are hotly debated, with some studies suggesting remapping of goal locations and others suggesting remapping of the movement vectors leading to goal locations. However, several previous motor learning studies and the multiplicity of the neural coding underlying visually guided reaching movements stand in contrast to this either/or debate on the modes of motor planning and adaptation. Here we hypothesize that, during visuomotor learning, the target location and movement vector of trained movements are separately remapped, and we propose a novel computational model for how motor plans based on these remappings are combined during the control of visually guided reaching in humans. To test this hypothesis, we designed a set of experimental manipulations that effectively dissociated the effects of remapping goal location and movement vector by examining the transfer of visuomotor adaptation to untrained movements and movement sequences throughout the workspace. The results reveal that (1) motor adaptation differentially remaps goal locations and movement vectors, and (2) separate motor plans based on these features are effectively averaged during motor execution. We then show that, without any free parameters, the computational model we developed for combining movement-vector-based and goal-location-based planning predicts nearly 90% of the variance in novel movement sequences, even when multiple attributes are simultaneously adapted, demonstrating for the first time the ability to predict how motor adaptation affects movement sequence planning. PMID:23804099
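The averaging result can be illustrated with a toy calculation. This is a sketch under loud assumptions: the Gaussian generalization functions, their 30-degree width, and the reduction of target location and movement vector to single angular offsets are hypothetical choices for illustration, not the measured or modeled quantities of the study. Only the equal-weight average, with no fitted mixing weight, mirrors the "effectively averaged" finding described above.

```python
import math


def gaussian_generalization(offset_deg: float, width_deg: float = 30.0) -> float:
    """Assumed Gaussian fall-off of adaptation with angular distance from what was trained."""
    return math.exp(-0.5 * (offset_deg / width_deg) ** 2)


def predicted_adaptation(
    trained_rotation: float,
    goal_offset_deg: float,
    vector_offset_deg: float,
    width_deg: float = 30.0,
) -> float:
    """Equal-weight average of a goal-location-based and a movement-vector-based plan.

    goal_offset_deg: hypothetical angular distance of the new target from the trained target.
    vector_offset_deg: hypothetical angular distance of the new movement vector from the trained one.
    """
    goal_plan = trained_rotation * gaussian_generalization(goal_offset_deg, width_deg)
    vector_plan = trained_rotation * gaussian_generalization(vector_offset_deg, width_deg)
    # No free mixing parameter: the two plans are simply averaged.
    return 0.5 * (goal_plan + vector_plan)
```

For a movement that reaches the trained target along a very different vector, this toy model predicts about half of the trained adaptation, since only the goal-based plan transfers.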
Extracting flat-field images from scene-based image sequences using phase correlation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Caron, James N., E-mail: Caron@RSImd.com; Montes, Marcos J.; Obermark, Jerome L.
Flat-field image processing is an essential step in producing high-quality, radiometrically calibrated images. Flat-fielding corrects for variations in the gain of focal plane array electronics and for unequal illumination from the system optics. Typically, a flat-field image is captured by imaging a radiometrically uniform surface. The flat-field image is normalized and removed from the images. There are circumstances, such as remote sensing, where a flat-field image cannot be acquired in this manner. For these cases, we developed a phase-correlation method that allows the extraction of an effective flat-field image from a sequence of displaced scene-based images. The method uses sub-pixel phase-correlation image registration to align the sequence and estimate the static scene. The scene is removed from the sequence, producing a sequence of misaligned flat-field images. An average flat-field image is derived from the realigned flat-field sequence.
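The registration step relies on phase correlation: the normalized cross-power spectrum of two displaced images has an inverse Fourier transform that peaks at their relative displacement. The sketch below assumes NumPy and integer-pixel, circular (wrap-around) shifts; the method described above uses sub-pixel registration, which this simplified version does not attempt. It then follows the listed steps: align, average to estimate the scene, divide the scene out, realign the flat-field estimates, and average them. Because the averaged scene also absorbs part of the flat field, a single pass like this mainly recovers flat-field structure on scales smaller than the displacements between frames.

```python
import numpy as np


def phase_correlation_shift(ref, img):
    """Integer (dy, dx) such that img is approximately np.roll(ref, (dy, dx), axis=(0, 1))."""
    cross = np.fft.fft2(img) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + 1e-12  # keep only the phase difference
    corr = np.fft.ifft2(cross).real  # peaks at the displacement
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, corr.shape))


def extract_flat_field(frames):
    """Estimate a flat field from frames of one scene displaced across a fixed detector."""
    ref = frames[0]
    shifts = [phase_correlation_shift(ref, f) for f in frames]
    # 1. Undo each scene displacement so the scene lines up with the first frame.
    aligned = [np.roll(f, (-dy, -dx), axis=(0, 1)) for f, (dy, dx) in zip(frames, shifts)]
    # 2. Average the aligned frames to estimate the static scene.
    scene = np.mean(aligned, axis=0)
    # 3. Divide the scene out; each result is a flat field that is still misaligned.
    misaligned = [a / scene for a in aligned]
    # 4. Shift each flat back to detector coordinates, average, and normalize to mean 1.
    flats = [np.roll(m, (dy, dx), axis=(0, 1)) for m, (dy, dx) in zip(misaligned, shifts)]
    flat = np.mean(flats, axis=0)
    return flat / flat.mean()
```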
ERIC Educational Resources Information Center
Kahle, Jane Butler
Four audio-tutorial units were developed as part of this study to determine the effectiveness of using advance organizers, based on Ausubel's theories, for meaningful learning experiences. In this study an advance organizer was developed and given to half of the subjects prior to the instructional sequence. A series of micro-learning tasks,…
Ren, Jie; Song, Kai; Deng, Minghua; Reinert, Gesine; Cannon, Charles H; Sun, Fengzhu
2016-04-01
Next-generation sequencing (NGS) technologies generate large amounts of short-read data for many different organisms. Because NGS reads are generally short, it is challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count-based alignment-free sequence comparison is a promising approach, but for this approach the underlying expected word counts are essential. A plausible model for the underlying distribution of word counts is obtained by modeling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of an MC and its transition probability matrix. As NGS data do not provide a single long sequence, inference methods for the Markovian properties of single long sequences cannot be used directly on NGS short-read data. Here we derive a normal approximation for such word counts. We also show that the traditional chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate them using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that clustering results that use an MC of the estimated order give a plausible clustering of the species. Our implementation of the statistics developed here is available as the R package 'NGS.MC' at http://www-rcf.usc.edu/∼fsun/Programs/NGS-MC/NGS-MC.html. Contact: fsun@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
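To make the order-estimation problem concrete, here is a sketch of one generic approach: fit Markov chains of increasing order by maximum likelihood from word counts taken within reads (never across read boundaries), then pick the order with the smallest Bayesian information criterion (BIC). BIC is an assumption chosen for illustration; it is not necessarily one of the estimators proposed in the paper, and it ignores the read-overlap structure that the Lander-Waterman model accounts for.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def markov_log_likelihood(reads, order, skip):
    """Maximized log-likelihood of an order-`order` Markov chain fitted to the reads.

    Words are counted inside each read only. Positions before `skip` are not
    scored, so chains of different orders are compared on the same
    observations (requires skip >= order). Reads are assumed to be ACGT only.
    """
    word_counts = Counter()
    context_counts = Counter()
    for read in reads:
        for i in range(skip, len(read)):
            context = read[i - order:i]
            word_counts[context + read[i]] += 1
            context_counts[context] += 1
    # ML transition probability = count(context + base) / count(context).
    log_lik = sum(n * math.log(n / context_counts[w[:-1]]) for w, n in word_counts.items())
    return log_lik, sum(context_counts.values())


def estimate_order_bic(reads, max_order=3):
    """Return the order in 0..max_order with the smallest BIC."""
    best_order, best_bic = 0, math.inf
    for order in range(max_order + 1):
        log_lik, n_obs = markov_log_likelihood(reads, order, skip=max_order)
        n_params = (len(ALPHABET) - 1) * len(ALPHABET) ** order
        bic = -2.0 * log_lik + n_params * math.log(n_obs)
        if bic < best_bic:
            best_order, best_bic = order, bic
    return best_order
```

For reads cut from a periodic sequence such as `"ACGT" * 20`, each base determines the next one, so the order-1 chain fits perfectly and has far fewer parameters than orders 2 and 3; the BIC therefore selects order 1.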
Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design.
Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw
2018-07-01
Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used protein design software package, allows the effective identification of sequences compatible with a given backbone structure, while molecular dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta Design is started on MD-derived structural ensembles and showed that this combined approach generates 20-30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences, as assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addition, we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for experimental validation. In sum, our results demonstrate that MD-ensemble-based flexible-backbone design can be a viable method for protein design, especially for tasks that require a large pool of diverse sequences. Copyright © 2018 Elsevier Inc. All rights reserved.
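The abstract reports 20-30% more diverse sequences but does not say here how diversity is measured. One common choice, assumed purely for illustration, is one minus the mean pairwise sequence identity over all designed sequences for a backbone; the sketch below computes that quantity and is not the benchmark's own metric.

```python
from itertools import combinations
from typing import Sequence


def mean_pairwise_identity(seqs: Sequence[str]) -> float:
    """Average fraction of identical positions over all pairs of equal-length sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total = 0.0
    for a, b in pairs:
        if len(a) != len(b):
            raise ValueError("sequences designed on one backbone should have equal length")
        total += sum(x == y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


def diversity(seqs: Sequence[str]) -> float:
    """Hypothetical diversity score: 0 for identical sequences, higher for more varied pools."""
    return 1.0 - mean_pairwise_identity(seqs)
```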
Study design requirements for RNA sequencing-based breast cancer diagnostics.
Mer, Arvind Singh; Klevebring, Daniel; Grönberg, Henrik; Rantalainen, Mattias
2016-02-01
Sequencing-based molecular characterization of tumors provides information required for individualized cancer treatment. There are well-defined molecular subtypes of breast cancer that provide improved prognostication compared to routine biomarkers. However, molecular subtyping is not yet implemented in routine breast cancer care. Clinical translation is dependent on subtype prediction models providing high sensitivity and specificity. In this study we evaluate sample size and RNA-sequencing read requirements for breast cancer subtyping to facilitate rational design of translational studies. We applied subsampling to ascertain the effect of training sample size and the number of RNA sequencing reads on classification accuracy of molecular subtype and routine biomarker prediction models (unsupervised and supervised). Subtype classification accuracy improved with increasing sample size up to N = 750 (accuracy = 0.93), although with a modest improvement beyond N = 350 (accuracy = 0.92). Prediction of routine biomarkers achieved accuracy of 0.94 (ER) and 0.92 (Her2) at N = 200. Subtype classification improved with RNA-sequencing library size up to 5 million reads. Development of molecular subtyping models for cancer diagnostics requires well-designed studies. Sample size and the number of RNA sequencing reads directly influence accuracy of molecular subtyping. Results in this study provide key information for rational design of translational studies aiming to bring sequencing-based diagnostics to the clinic.
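Subsampling RNA-seq data to a smaller library size is often done by binomial thinning of a count matrix, which is equivalent to keeping each read independently with a fixed probability. The sketch below shows that idea with NumPy, together with drawing a random training set of a given size; both are assumptions about how such a subsampling analysis could be implemented, not a description of the authors' pipeline.

```python
import numpy as np


def thin_library(counts, target_reads, rng):
    """Binomially thin per-gene read counts to roughly `target_reads` in total."""
    counts = np.asarray(counts, dtype=np.int64)
    total = int(counts.sum())
    if target_reads >= total:
        return counts.copy()
    # Keeping each read with probability p is equivalent to drawing
    # each gene's new count from Binomial(old count, p).
    p = target_reads / total
    return rng.binomial(counts, p)


def subsample_training_set(n_samples, n_train, rng):
    """Indices of a random training set of size n_train, drawn without replacement."""
    return rng.choice(n_samples, size=n_train, replace=False)
```

Repeating both draws several times at each library size and training-set size, then retraining and scoring the classifier, gives accuracy curves of the kind summarized in the abstract.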
Genetic and DNA sequence analysis of the kanamycin resistance transposon Tn903.
Grindley, N D; Joyce, C M
1980-01-01
The kanamycin resistance transposon Tn903 consists of a unique region of about 1000 base pairs bounded by a pair of 1050-base-pair inverted repeat sequences. Each repeat contains two Pvu II endonuclease cleavage sites separated by 520 base pairs. We have constructed derivatives of Tn903 in which this 520-base-pair fragment is deleted from one or both repeats. Those derivatives that lack both 520-base-pair fragments cannot transpose, whereas those that lack just one remain transposition proficient. One such transposable derivative, Tn903ΔI, has been selected for further study. We have determined the sequence of the intact inverted repeat. The 18 base pairs at each end are identical and inverted relative to one another, a structure characteristic of insertion sequences. Additional experiments indicate that a single inverted repeat from Tn903 can, in fact, transpose; we propose that this element be called IS903. To correlate the DNA sequence with genetic activities, we have created mutations by inserting a 10-base-pair DNA fragment at several sites within the intact repeat of Tn903ΔI, and we have examined the effect of such insertions on transposability. The results suggest that IS903 encodes a 307-amino-acid polypeptide (a "transposase") that is absolutely required for transposition of IS903 or Tn903. PMID:6261245
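Ends that are "identical and inverted relative to one another" mean that the first bases of the element equal the reverse complement of its last bases. A short check for such a perfect terminal inverted repeat might look like this; the test sequences are made up for illustration and are not IS903 sequence.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq: str, max_len: int = 50) -> int:
    """Length k of the longest perfect terminal inverted repeat.

    The first k bases must equal the reverse complement of the last k bases.
    If this holds for k it also holds for k - 1, so the scan stops at the
    first failure.
    """
    best = 0
    for k in range(1, min(max_len, len(seq) // 2) + 1):
        if seq[:k] == reverse_complement(seq[-k:]):
            best = k
        else:
            break
    return best
```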
Effect of the reflectional symmetry on the coherent hole transport across DNA hairpins
NASA Astrophysics Data System (ADS)
Zarea, Mehdi; Berlin, Yuri; Ratner, Mark A.
2017-03-01
The coherent hole transfer in three types of DNA hairpins containing strands of adenine (A) and guanine (G) nucleobases has been studied. The investigated hairpins involve Aₙ₊₁GGAₙ, AₙGAGAₙ, or (AG)₂ₙA strands that connect the hole donor and hole acceptor located at opposite ends of the hairpins. The positive charge transfer from the photo-excited donor to the acceptor is shown to be slower for Aₙ₊₁GGAₙ than for the AₙGAGAₙ and (AG)₂ₙA sequences. We have revealed that this is due to the reflectional symmetry of the latter two sequences with respect to the axis passing through the middle base. As has been demonstrated, the symmetry of the sequence structure manifests itself in the reflectional symmetry of the energy eigenstates. In addition, it has been shown that (AG)₂ₙA is the only symmetric sequence with a zero-energy state in the middle of the LUMO tight-binding energy band. Based on our theoretical findings, we predict that the hairpin with this sequence should have the fastest coherent hole transfer rate among the class of base sequences studied.
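The link between sequence symmetry and eigenstate symmetry can be checked numerically with a nearest-neighbour tight-binding chain. If the on-site energies read the same in both directions, the Hamiltonian commutes with the reflection operator, so every eigenstate (these tridiagonal matrices have non-degenerate spectra) is either even or odd under reflection. The on-site energies and the uniform hopping below are hypothetical values chosen for illustration, not the parameters of the study.

```python
import numpy as np


def strand_hamiltonian(seq, onsite, hopping):
    """Nearest-neighbour tight-binding Hamiltonian for a strand (assumed parameters)."""
    n = len(seq)
    h = np.diag([float(onsite[base]) for base in seq])
    for i in range(n - 1):
        h[i, i + 1] = h[i + 1, i] = hopping
    return h


def eigenstates_have_parity(seq, onsite, hopping, tol=1e-8):
    """True if every eigenstate is even or odd under reversing the strand."""
    _, vecs = np.linalg.eigh(strand_hamiltonian(seq, onsite, hopping))
    # Overlap of each eigenvector (column) with its mirror image: +1 even, -1 odd.
    overlaps = np.einsum("ij,ij->j", vecs[::-1, :], vecs)
    return bool(np.all(np.abs(np.abs(overlaps) - 1.0) < tol))


# Hypothetical parameters in arbitrary energy units.
ONSITE = {"A": 0.0, "G": -0.4}
HOPPING = 0.1
```

With n = 2, `"AAGAGAA"` (AₙGAGAₙ) and `"AGAGAGAGA"` ((AG)₂ₙA) give states of definite parity, while `"AAAGGAA"` (Aₙ₊₁GGAₙ) does not.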
Streaming support for data intensive cloud-based sequence analysis.
Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
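The streaming idea of overlapping data transfer with computation, for tasks where reads can be processed independently, can be sketched with a producer-consumer queue. This is a minimal, hypothetical illustration using Python threads. It is not the elastream package, whose interface the abstract does not describe, and the iterable of chunks merely stands in for data arriving over the network.

```python
import queue
import threading


def stream_process(chunks, process, n_workers=2, buffer_size=8):
    """Apply `process` to each chunk of reads as soon as it arrives.

    Valid only when chunks can be processed independently of one another.
    Results are returned in completion order, not arrival order.
    """
    q = queue.Queue(maxsize=buffer_size)
    results = []
    lock = threading.Lock()
    done = object()  # sentinel telling a worker to stop

    def worker():
        while True:
            chunk = q.get()
            if chunk is done:
                break
            result = process(chunk)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for chunk in chunks:  # stands in for the ongoing upload
        q.put(chunk)
    for _ in threads:
        q.put(done)
    for t in threads:
        t.join()
    return results
```

For example, counting G and C bases per chunk while chunks are still being "uploaded" and summing the results afterwards gives the same total as processing after the transfer completes.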
An effective solution to the nonlinear, nonstationary Navier-Stokes equations for two dimensions
NASA Technical Reports Server (NTRS)
Gabrielsen, R. E.
1975-01-01
A sequence of approximate solutions to the nonlinear, nonstationary Navier-Stokes equations on a two-dimensional domain is described, from which explicit error estimates and rates of convergence are obtained. This sequence of approximate solutions is based primarily on the Newton-Kantorovich method.
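For readers unfamiliar with the Newton-Kantorovich method, the generic iteration and the linear problem it produces for the incompressible Navier-Stokes equations can be written as below. This is a standard textbook form with assumed notation (velocity u, pressure p, viscosity ν, forcing f); the paper's function spaces, error estimates, and convergence rates are not reproduced here. Each step keeps the initial and boundary conditions of the original problem.

```latex
% Newton-Kantorovich step for an operator equation F(u) = 0
F'(u_k)\,(u_{k+1} - u_k) = -F(u_k), \qquad k = 0, 1, 2, \dots

% With F(u) = \partial_t u + (u \cdot \nabla) u - \nu \Delta u + \nabla p - f,
% each step solves a linear problem for (u_{k+1}, p_{k+1}):
\partial_t u_{k+1} + (u_k \cdot \nabla) u_{k+1} + (u_{k+1} \cdot \nabla) u_k
  - \nu \Delta u_{k+1} + \nabla p_{k+1} = f + (u_k \cdot \nabla) u_k,
\qquad \nabla \cdot u_{k+1} = 0
```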
ERIC Educational Resources Information Center
Fan, Xinxin; Geelan, David; Gillies, Robyn
2018-01-01
This study investigated the effectiveness of a novel inquiry-based instructional sequence using interactive simulations for supporting students' development of conceptual understanding, inquiry process skills and confidence in learning. The study, conducted in Beijing, involved two teachers and 117 students in four classes. The teachers…
Procedural Memory Consolidation in the Performance of Brief Keyboard Sequences
ERIC Educational Resources Information Center
Duke, Robert A.; Davis, Carla M.
2006-01-01
Using two sequential key press sequences, we tested the extent to which subjects' performance on a digital piano keyboard changed between the end of training and retest on subsequent days. We found consistent, significant improvements attributable to sleep-based consolidation effects, indicating that learning continued after the cessation of…
Identifying and mitigating batch effects in whole genome sequencing data.
Tom, Jennifer A; Reeder, Jens; Forrest, William F; Graham, Robert R; Hunkapiller, Julie; Behrens, Timothy W; Bhangale, Tushar R
2017-07-24
Large sample sets of whole genome sequencing with deep coverage are being generated; however, assembling datasets from different sources inevitably introduces batch effects. These batch effects are not well understood and can be due to changes in the sequencing protocol or in the bioinformatics tools used to process the data. No systematic algorithms or heuristics exist to detect and filter batch effects or to remove associations affected by batch effects in whole genome sequencing data. We describe key quality metrics, provide a freely available software package to compute them, and demonstrate that identification of batch effects is aided by principal components analysis of these metrics. To mitigate batch effects, we developed new site-specific filters that identified and removed variants that were falsely associated with the phenotype due to batch effects. These include filtering based on a haplotype-based genotype correction, a differential genotype quality test, and removal of sites with a missing genotype rate greater than 30% after setting genotypes with quality scores less than 20 to missing. This method removed 96.1% of unconfirmed genome-wide significant SNP associations and 97.6% of unconfirmed genome-wide significant indel associations. We performed analyses to demonstrate that: 1) these filters affected some variants known to be disease associated, as 2 out of 16 confirmed associations in an AMD candidate SNP analysis were filtered, representing a reduction in power of 12.5%; 2) in the absence of batch effects, these filters removed only a small proportion of variants across the genome (type I error rate of 3%); and 3) in an independent dataset, the method removed 90.2% of unconfirmed genome-wide SNP associations and 89.8% of unconfirmed genome-wide indel associations. Researchers currently do not have effective tools to identify and mitigate batch effects in whole genome sequencing data. We developed and validated methods and filters to address this deficiency.
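The missingness filter described above is simple enough to state as code. This sketch assumes that genotype qualities and call status are already loaded into NumPy arrays of shape (sites, samples). It covers only the GQ/missingness step, not the haplotype-based correction or the differential genotype quality test, and it is not the authors' software package.

```python
import numpy as np


def missingness_filter(gq, called, gq_min=20, max_missing=0.30):
    """Boolean mask of sites to remove.

    gq:     (n_sites, n_samples) genotype quality scores.
    called: (n_sites, n_samples) True where a genotype was called.
    Genotypes with GQ < gq_min are first treated as missing; sites whose
    resulting missing-genotype rate exceeds max_missing are flagged.
    """
    effective_called = called & (gq >= gq_min)
    missing_rate = 1.0 - effective_called.mean(axis=1)
    return missing_rate > max_missing
```

For a site where two of four genotypes have GQ below 20, the missing rate becomes 50% and the site is flagged; a site with one such genotype (25%) is kept.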
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2008-12-01
Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step in improving the performance of SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by counting the occurrences of each Top-n-gram. The training vectors are used to train SVM classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams with latent semantic analysis (LSA), an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results than related methods. The method based on Top-n-grams significantly outperforms methods based on many other building blocks, including N-grams, patterns, motifs, and binary profiles. Therefore, the Top-n-gram is a good building block of protein sequences and can be widely used in many computational biology tasks, such as sequence alignment, domain boundary prediction, the design of knowledge-based potentials, and the prediction of protein binding sites.
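As we read the abstract, a Top-n-gram is built from each column of a sequence frequency profile by taking the n most frequent amino acids. The sketch below follows that reading under stated assumptions: the profile is a hypothetical L × 20 NumPy array with columns in the order `ACDEFGHIKLMNPQRSTVWY`, residues within a gram are ordered from most to least frequent, and ties are broken by that alphabet order. It produces the fixed-dimension count vector that would be passed to an SVM (with or without an LSA step); it is not the authors' code.

```python
from collections import Counter
from itertools import permutations

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def top_n_grams(profile, n):
    """One Top-n-gram per profile position: its n most frequent residues, most frequent first."""
    grams = []
    for column in profile:
        order = np.argsort(-column, kind="stable")[:n]
        grams.append("".join(AMINO_ACIDS[i] for i in order))
    return grams


def top_n_gram_features(profile, n):
    """Occurrence counts over all ordered n-tuples of distinct residues (380 entries for n = 2)."""
    vocab = ["".join(p) for p in permutations(AMINO_ACIDS, n)]
    counts = Counter(top_n_grams(profile, n))
    return np.array([counts[g] for g in vocab])
```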
Deroost, Natacha; Coomans, Daphné
2018-02-01
We examined the role of sequence awareness in a pure perceptual sequence learning design. Participants had to react to the colour of a target, which changed according to a perceptual sequence. By varying the mapping of the target's colour onto the response keys, motor responses changed randomly. The effect of sequence awareness on perceptual sequence learning was determined by manipulating the learning instructions (explicit versus implicit) and assessing the amount of sequence awareness after the experiment. In the explicit instruction condition (n = 15), participants were instructed to intentionally search for the colour sequence, whereas in the implicit instruction condition (n = 15), they were left uninformed about the sequenced nature of the task. Sequence awareness after the sequence learning task was tested by means of a questionnaire and the process-dissociation procedure. The results showed that the instruction manipulation had no effect on the amount of perceptual sequence learning. Based on their report of having actively applied their sequence knowledge during the experiment, participants were subsequently regrouped into a sequence strategy group (n = 14, of which 4 participants came from the implicit instruction condition and 10 from the explicit instruction condition) and a no-sequence strategy group (n = 16, of which 11 participants came from the implicit instruction condition and 5 from the explicit instruction condition). Only participants in the sequence strategy group showed reliable perceptual sequence learning and sequence awareness. These results indicate that perceptual sequence learning depends upon the continuous employment of strategic cognitive control processes on sequence knowledge. Sequence awareness is suggested to be a necessary but not sufficient condition for perceptual learning to take place. Copyright © 2018 Elsevier B.V. All rights reserved.
Transcriptome-based differentiation of closely-related Miscanthus lines.
Chouvarine, Philippe; Cooksey, Amanda M; McCarthy, Fiona M; Ray, David A; Baldwin, Brian S; Burgess, Shane C; Peterson, Daniel G
2012-01-01
Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time- and labor-intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species, complete with Gene Ontology (GO) functional annotations. A SNP comparative analysis of rhizome-derived cDNA sequences was successfully used to distinguish three Miscanthus × giganteus cultivars from each other and from other Miscanthus species. Moreover, the resulting phylogenetic tree generated from SNP frequency data parallels the known breeding history of the plants examined. Some of the giant miscanthus plants exhibit considerable sequence divergence. Here we describe an analysis of Miscanthus in which high-throughput exome sequencing was used to differentiate between closely related genotypes despite the current lack of a reference genome sequence. We functionally annotated the exome sequences and provide resources to support Miscanthus systems biology. In addition, we demonstrate the use of commercial high-performance cloud computing for computational GO annotation.
Skeleton-based human action recognition using multiple sequence alignment
NASA Astrophysics Data System (ADS)
Ding, Wenwen; Liu, Kai; Cheng, Fei; Zhang, Jin; Li, YunSong
2015-05-01
Human action recognition and analysis has been an active research topic in computer vision for many years. This paper presents a method to represent human actions based on trajectories consisting of 3D joint positions. The method first decomposes an action into a sequence of meaningful atomic actions (actionlets) and then labels the actionlets with letters of the English alphabet according to the Davies-Bouldin index value. An action can therefore be represented as a sequence of actionlet symbols, which preserves the temporal order in which the actionlets occur. Finally, we use sequence comparison to classify multiple actions with a string-matching algorithm (Needleman-Wunsch). The effectiveness of the proposed method is evaluated on datasets captured by commodity depth cameras. Experiments on three challenging 3D action datasets show promising results.
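Global alignment of actionlet strings with Needleman-Wunsch, followed by nearest-template classification, can be sketched as follows. The scoring values (match +1, mismatch -1, gap -1) and the template dictionary are hypothetical. The actionlet segmentation and Davies-Bouldin labelling are assumed to have already produced the symbol strings.

```python
from typing import Dict


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of two symbol strings (linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[len(b)]


def classify(query: str, templates: Dict[str, str]) -> str:
    """Label of the template whose actionlet string aligns best with the query."""
    return max(templates, key=lambda label: needleman_wunsch(query, templates[label]))
```

For example, `needleman_wunsch("ABC", "AC")` is 1 (two matches and one gap), and `classify("ABCD", {"wave": "ABCD", "kick": "XYZ"})` returns `"wave"`.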
Effect of Varied Computer Based Presentation Sequences on Facilitating Student Achievement.
ERIC Educational Resources Information Center
Noonen, Ann; Dwyer, Francis M.
1994-01-01
Examines the effectiveness of visual illustrations in computer-based education, the effect of order of visual presentation, and whether screen design affects students' use of graphics and text. Results indicate that order of presentation and choice of review did not influence student achievement; however, when given a choice, students selected the…
Ewing's Sarcoma: Development of RNA Interference-Based Therapy for Advanced Disease
Simmons, Olivia; Maples, Phillip B.; Senzer, Neil; Nemunaitis, John
2012-01-01
Ewing's sarcoma tumors are associated with a chromosomal translocation between the EWS gene and an ETS-family transcription factor gene. These unique target sequences provide an opportunity for RNA interference (RNAi)-based therapy. The RNAi mechanism and therapeutically designed products, including siRNA, shRNA, and bi-shRNA, are summarized, and these approaches are compared with one another. Systemic RNAi-based therapy, however, requires protected delivery to the Ewing's sarcoma tumor site for activity. Delivery systems that have been most effective in preclinical and clinical testing are reviewed, followed by a preclinical assessment of various silencing strategies, with demonstration of effectiveness against EWS/FLI-1 target sequences. It is concluded that RNAi-based therapeutics may have testable and achievable activity in the management of Ewing's sarcoma. PMID:22523703
Cross-correlation patterns in social opinion formation with sequential data
NASA Astrophysics Data System (ADS)
Chakrabarti, Anindya S.
2016-11-01
Recent research on large-scale internet data suggests the existence of patterns in the collective behavior of billions of people, even though each of them may pursue their own activities. In this paper, we interpret online rating activity as a process of forming social opinion about individual items, in which people sequentially choose a rating based on the current information set, comprising all previous ratings, and on their own preferences. We construct an opinion index from the sequence of ratings and show that (1) movie-specific opinion converges much more slowly than opinion built from an independent and identically distributed (i.i.d.) sequence of ratings, (2) rating sequences for individual movies show less variation than an i.i.d. sequence of ratings, (3) the probability density function of the asymptotic opinions has more spread than that of opinions arising from i.i.d. sequences of ratings, and (4) opinion sequences across movies show both significantly higher and significantly lower correlations than opinions constructed from i.i.d. sequences of ratings, creating a bimodal cross-correlation structure. By decomposing the temporal correlation structures from panel data of movie ratings, we show that social effects are very prominent, whereas group effects cannot be differentiated from those of surrogate data and individual effects are quite small. The social effects explain a large part of the extreme positive or negative correlations between sequences of opinions. In general, this method can be applied to any rating data to extract social or group-specific effects in correlation structures. We conclude that, in this particular case, social effects are important in the opinion formation process.
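The abstract does not define its opinion index here. As an assumption for illustration, the sketch below uses the running mean of all ratings so far, and builds an i.i.d.-like surrogate by shuffling a movie's ratings, which keeps the rating distribution but destroys any dependence of a rating on earlier ones. Comparing how fast the two index series settle is one way to probe the slower convergence reported above.

```python
import numpy as np


def opinion_index(ratings):
    """Assumed opinion index: running mean of all ratings up to each step."""
    r = np.asarray(ratings, dtype=float)
    return np.cumsum(r) / np.arange(1, len(r) + 1)


def iid_surrogate(ratings, rng):
    """Shuffled ratings: same values, ordering (and thus social dependence) removed."""
    return rng.permutation(np.asarray(ratings))
```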
Chen, Hui; Luthra, Rajyalakshmi; Goswami, Rashmi S; Singh, Rajesh R; Roy-Chowdhuri, Sinchita
2015-08-28
Application of next-generation sequencing (NGS) technology to routine clinical practice has enabled characterization of personalized cancer genomes to identify patients likely to respond to targeted therapy. The proper selection of tumor samples for downstream NGS-based mutational analysis is critical for generating accurate results and guiding therapeutic intervention. However, multiple pre-analytic factors come into play in determining the success of NGS testing. In this review, we discuss pre-analytic requirements for AmpliSeq PCR-based sequencing using the Ion Torrent Personal Genome Machine (PGM) (Life Technologies), an NGS platform that is often used by clinical laboratories for sequencing solid tumors because of its low input DNA requirement from formalin-fixed, paraffin-embedded tissue. The success of NGS mutational analysis is affected not only by the input DNA quantity but also by several other factors, including the specimen type, the DNA quality, and the tumor cellularity. Here, we review tissue requirements for solid tumor NGS-based mutational analysis, including procedure types, tissue types, tumor volume and fraction, decalcification, and treatment effects.
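Tumor cellularity matters because normal cells dilute the variant signal. Under simple assumptions (a clonal mutation, copy numbers given as parameters, diploid defaults), the expected variant allele fraction (VAF) and the minimum cellularity needed to reach an assay's limit of detection can be computed as below. The 5% limit of detection in the usage note is a hypothetical value, not a PGM specification.

```python
def expected_vaf(purity, mutant_copies=1, tumor_ploidy=2, normal_ploidy=2):
    """Expected fraction of reads carrying a clonal mutation at a given tumor purity."""
    tumor_alleles = purity * tumor_ploidy
    normal_alleles = (1.0 - purity) * normal_ploidy
    return purity * mutant_copies / (tumor_alleles + normal_alleles)


def min_purity_for_lod(lod_vaf, mutant_copies=1, tumor_ploidy=2, normal_ploidy=2):
    """Smallest purity whose expected VAF reaches lod_vaf (expected_vaf solved for purity)."""
    return lod_vaf * normal_ploidy / (
        mutant_copies - lod_vaf * tumor_ploidy + lod_vaf * normal_ploidy
    )
```

For a heterozygous mutation in a diploid region, `expected_vaf(0.4)` is 0.2, and `min_purity_for_lod(0.05)` is 0.1: under these assumptions, a 5% detection limit requires at least 10% tumor cells.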
Iterative pass optimization of sequence data
NASA Technical Reports Server (NTRS)
Wheeler, Ward C.
2003-01-01
The problem of determining the minimum-cost hypothetical ancestral sequences for a given cladogram is known to be NP-complete. This "tree alignment" problem has motivated the considerable effort placed in multiple sequence alignment procedures. In 1996, Wheeler proposed a heuristic method, direct optimization, to calculate cladogram costs without the intervention of multiple sequence alignment. This method, though more efficient in time and more effective in cladogram length than many alignment-based procedures, greedily optimizes nodes based on descendant information only. In their proposal of an exact multiple alignment solution, Sankoff et al. in 1976 described a heuristic procedure, the iterative improvement method, to create alignments at internal nodes by solving a series of median problems. The combination of a three-sequence direct optimization with iterative improvement and a branch-length-based cladogram cost procedure provides an algorithm that frequently results in superior (i.e., lower) cladogram costs. This iterative pass optimization is both computation- and memory-intensive, but economies can be made to reduce this burden. An example in arthropod systematics is discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.
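The iterative improvement idea (re-solving a median problem at each internal node until nothing changes) can be illustrated with a deliberately simplified sketch. It assumes pre-aligned, equal-length sequences scored by Hamming distance. Under that cost, the per-site median of a node's three neighbors is their majority state. Wheeler's method instead works on unaligned sequences with indels, where each median is itself a three-sequence alignment, so this is not that procedure:

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def tree_cost(tree, seqs):
    """Sum of Hamming distances over all edges, each undirected edge counted once."""
    return sum(hamming(seqs[u], seqs[v]) for u in tree for v in tree[u] if u < v)

def median_of_three(a, b, c, current):
    """Per-site median under Hamming cost: the majority state.
    If all three neighbors differ, keep the current state when it is one of them."""
    out = []
    for x, y, z, cur in zip(a, b, c, current):
        if x == y or x == z:
            out.append(x)
        elif y == z:
            out.append(y)
        else:
            out.append(cur if cur in (x, y, z) else x)
    return "".join(out)

def iterative_improvement(tree, seqs, internal, max_rounds=100):
    """Visit each internal (degree-3) node and replace it by the median of its neighbors.
    Every change strictly lowers the tree cost, so the loop terminates."""
    seqs = dict(seqs)
    for _ in range(max_rounds):
        changed = False
        for node in internal:
            a, b, c = (seqs[n] for n in tree[node])
            new = median_of_three(a, b, c, seqs[node])
            if new != seqs[node]:
                seqs[node] = new
                changed = True
        if not changed:
            break
    return seqs

# Unrooted four-leaf tree ((A,B),(C,D)) with internal nodes X and Y (toy data).
tree = {"A": ["X"], "B": ["X"], "C": ["Y"], "D": ["Y"],
        "X": ["A", "B", "Y"], "Y": ["C", "D", "X"]}
start = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT", "X": "AAAA", "Y": "AAAA"}
result = iterative_improvement(tree, start, ["X", "Y"])
```

In this toy example the procedure reaches a total cost of 3, which is the parsimony minimum for these four leaves on this topology.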
DNA Sequence-Dependent Ionic Currents in Ultra-Small Solid-State Nanopores
Comer, Jeffrey
2016-01-01
Measurements of ionic currents through nanopores partially blocked by DNA have emerged as a powerful method for characterizing the DNA nucleotide sequence. Although the effect of the nucleotide sequence on the nanopore blockade current has been experimentally demonstrated, prediction and interpretation of such measurements remain a formidable challenge. Using atomic-resolution computational approaches, here we show how the sequence, molecular conformation, and pore geometry affect the blockade ionic current in model solid-state nanopores. We demonstrate that the blockade current from a DNA molecule is determined by the chemical identities and conformations of at least three consecutive nucleotides. We find that the blockade currents produced by the nucleotide triplets vary considerably with their nucleotide sequence despite nearly identical molecular conformations. Encouragingly, we find blockade current differences as large as 25% for single-base substitutions in ultra-small (1.6 nm × 1.1 nm cross section; 2 nm length) solid-state nanopores. Despite the complex dependence of the blockade current on the sequence and conformation of the DNA triplets, we find that, under many conditions, the number of thymine bases is positively correlated with the current, whereas the number of purine bases and the presence of both purines and pyrimidines in the triplet are negatively correlated with the current. Based on these observations, we construct a simple theoretical model that relates the ion current to the base content of the DNA within a solid-state nanopore. Furthermore, we show that compact conformations of DNA in narrow pores provide the greatest signal-to-noise ratio for single-base detection, whereas reduction of the nanopore length increases the ionic current noise. Thus, the sequence dependence of the nanopore blockade current can be theoretically rationalized, although the predictions will likely need to be customized for each nanopore type. PMID:27103233
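The reported trends (more thymines raise the current, while more purines and a purine/pyrimidine mix lower it) suggest a linear model over triplet base-content features. The sketch below shows only that structure. The coefficients are hypothetical placeholders chosen to match the reported signs, not values fitted in the study:

```python
PURINES = set("AG")

def triplet_features(triplet):
    """Base-content features of a DNA triplet: (#thymine, #purine, mixed purine/pyrimidine flag)."""
    t = triplet.upper()
    if len(t) != 3 or set(t) - set("ACGT"):
        raise ValueError("expected a DNA triplet over A, C, G, T")
    n_thymine = t.count("T")
    n_purine = sum(base in PURINES for base in t)
    mixed = int(0 < n_purine < 3)  # both purines and pyrimidines present
    return n_thymine, n_purine, mixed

# Hypothetical coefficients: only their signs follow the reported correlations.
COEFFS = {"intercept": 1.0, "thymine": 0.05, "purine": -0.04, "mixed": -0.03}

def relative_blockade_current(triplet, c=COEFFS):
    """Linear prediction of the blockade current, relative to an arbitrary baseline."""
    n_t, n_pu, mixed = triplet_features(triplet)
    return c["intercept"] + c["thymine"] * n_t + c["purine"] * n_pu + c["mixed"] * mixed
```

In practice, the coefficients would have to be fitted to simulated or measured currents for a specific pore, consistent with the closing remark that predictions need customizing per nanopore type.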
Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.
2011-01-01
We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. 
Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing. PMID:21829563
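Why partial references hurt the tag-based workflows can be seen from how a tag is located. This sketch assumes a classic SAGE-style rule: the tag is the stretch just 3' of the 3'-most anchoring-enzyme site. It uses CATG (the NlaIII site used in classic SAGE) and a hypothetical tag length. The exact SOLiD SAGE and Helicos DGE protocols may differ:

```python
ANCHOR = "CATG"   # assumed anchoring-enzyme site (NlaIII in classic SAGE)
TAG_LEN = 17      # hypothetical tag length

def sage_tag(reference, anchor=ANCHOR, tag_len=TAG_LEN):
    """Return the tag_len bases just 3' of the 3'-most anchor site, or None.

    None means tag-derived reads cannot be assigned to this reference:
    either no anchor site is present or the reference ends before a full tag.
    """
    seq = reference.upper()
    pos = seq.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None
```

A partial reference that is truncated before the true 3'-most site either yields None (a false negative) or picks an upstream site whose tag does not match the reads. Random-position reads, as in mRNA-Seq, do not depend on such a site.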
Proteomic Identification of Monoclonal Antibodies from Serum
2015-01-01
Characterizing the in vivo dynamics of the polyclonal antibody repertoire in serum, such as that which might arise in response to stimulation with an antigen, is difficult due to the presence of many highly similar immunoglobulin proteins, each specified by distinct B lymphocytes. These challenges have precluded the use of conventional mass spectrometry for antibody identification based on peptide mass spectral matches to a genomic reference database. Recently, progress has been made using bottom-up analysis of serum antibodies by nanoflow liquid chromatography/high-resolution tandem mass spectrometry combined with a sample-specific antibody sequence database generated by high-throughput sequencing of individual B cell immunoglobulin variable domains (V genes). Here, we describe how intrinsic features of antibody primary structure, most notably the interspersed segments of variable and conserved amino acid sequences, generate recurring patterns in the corresponding peptide mass spectra of V gene peptides, greatly complicating the assignment of correct sequences to mass spectral data. We show that the standard method of decoy-based error modeling fails to account for the error introduced by these highly similar sequences, leading to a significant underestimation of the false discovery rate. Because of these effects, antibody-derived peptide mass spectra require increased stringency in their interpretation. The use of filters based on the mean precursor ion mass accuracy of peptide-spectrum matches is shown to be particularly effective in distinguishing between “true” and “false” identifications. These findings highlight important caveats associated with the use of standard database search and error-modeling methods with nonstandard data sets and custom sequence databases. PMID:24684310
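A minimal sketch of the two error-control ideas mentioned above follows: a target-decoy FDR estimate (decoy hits divided by target hits), and a filter that keeps only peptides whose peptide-spectrum matches have a small mean precursor mass error in ppm. Grouping by peptide, using the mean absolute error, and the 2 ppm threshold are all assumptions made for illustration:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class PSM:
    peptide: str
    observed_mz: float
    theoretical_mz: float
    is_decoy: bool

def ppm_error(psm):
    """Precursor mass error in parts per million."""
    return (psm.observed_mz - psm.theoretical_mz) / psm.theoretical_mz * 1e6

def mean_abs_ppm_by_peptide(psms):
    groups = {}
    for p in psms:
        groups.setdefault(p.peptide, []).append(abs(ppm_error(p)))
    return {pep: mean(errs) for pep, errs in groups.items()}

def filter_by_mass_accuracy(psms, max_mean_ppm=2.0):
    """Keep PSMs of peptides whose mean absolute precursor error is within the (hypothetical) limit."""
    keep = {pep for pep, err in mean_abs_ppm_by_peptide(psms).items() if err <= max_mean_ppm}
    return [p for p in psms if p.peptide in keep]

def decoy_fdr(psms):
    """Target-decoy FDR estimate: number of decoy hits / number of target hits."""
    targets = sum(not p.is_decoy for p in psms)
    decoys = sum(p.is_decoy for p in psms)
    return decoys / targets if targets else 0.0
```

The abstract's point is that, for antibody V-gene peptides, the decoy-based estimate alone understates the error, which is why the mass-accuracy filter is applied on top of it.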
Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.
Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S
2012-01-01
RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra-high-throughput next-generation sequencing technologies. Using deep sequencing, the expression levels of all transcripts, including novel ones, can be quantified digitally. Although RNA-Seq is extremely promising, the massive amounts of data it generates and the substantial biases and uncertainty in short-read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expression, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Because the proposed model can incorporate the base-specific variation and between-base dependence that affect the read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.
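To see why plain averaging of base-level coverage is biased, consider a much simpler model than POME. Here each base's count is Poisson with a log-normal base-specific random effect, and the bases are independent. The sketch fits that model by the method of moments, using E[y] = exp(mu + s²/2) and Var[y] = E[y] + E[y]²(exp(s²) - 1). It ignores between-base dependence and covariates, so it is an illustration, not the POME estimator:

```python
import numpy as np

def simulate_coverage(mu, sigma, n_bases, rng):
    """Base-level read counts: y_j ~ Poisson(exp(mu + b_j)), b_j ~ N(0, sigma^2), independent."""
    b = rng.normal(0.0, sigma, n_bases)
    return rng.poisson(np.exp(mu + b))

def moment_estimates(y):
    """Method-of-moments fit of the Poisson-lognormal model; returns (mu_hat, sigma_hat)."""
    m = y.mean()
    v = y.var(ddof=1)
    ratio = max((v - m) / m**2, 0.0)   # clamp if the data are not over-dispersed
    s2 = np.log1p(ratio)
    return np.log(m) - s2 / 2, np.sqrt(s2)

rng = np.random.default_rng(0)
y = simulate_coverage(2.0, 0.5, 200_000, rng)
mu_hat, sigma_hat = moment_estimates(y)
naive = np.log(y.mean())   # overshoots mu by roughly sigma^2 / 2
```

The gap between `naive` and `mu_hat` is the base-level heterogeneity term s²/2, which averaging alone cannot separate from the expression level.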
KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella.
Jouraku, Akiya; Yamamoto, Kimiko; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Narukawa, Junko; Miyamoto, Kazuhisa; Kurita, Kanako; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Noda, Hiroaki
2013-07-09
The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests of crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides, such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs, which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons, from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered, and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easily and efficiently searching, browsing, and downloading sequences and annotation data. Five search interfaces are provided: BLAST search, keyword search, BLAST result-based search, GO tree-based search, and a genome browser. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers.
KONAGAbase provides comprehensive DBM transcriptomic and draft genomic sequences with useful annotation information and easy-to-use web interfaces, which help researchers efficiently search for target sequences such as insecticide resistance-related genes. KONAGAbase will be continuously updated, and additional genomic/transcriptomic resources and analysis tools will be provided for further efficient analysis of the mechanism of insecticide resistance and the development of effective insecticides with a novel mode of action for DBM.
msgbsR: An R package for analysing methylation-sensitive restriction enzyme sequencing data.
Mayne, Benjamin T; Leemaqz, Shalem Y; Buckberry, Sam; Rodriguez Lopez, Carlos M; Roberts, Claire T; Bianco-Miotto, Tina; Breen, James
2018-02-01
Genotyping-by-sequencing (GBS) or restriction-site associated DNA marker sequencing (RAD-seq) is a practical and cost-effective method for analysing large genomes from high-diversity species. This method of sequencing, coupled with methylation-sensitive enzymes (often referred to as methylation-sensitive restriction enzyme sequencing or MRE-seq), is an effective tool for studying DNA methylation in parts of the genome that are inaccessible to other sequencing techniques or are not annotated in microarray technologies. Current software tools do not support all methylation-sensitive restriction enzyme sequencing assays for determining differences in DNA methylation between samples. To fill this computational need, we present msgbsR, an R package that contains tools for the analysis of methylation-sensitive restriction enzyme sequencing experiments. msgbsR can be used to identify and quantify read counts at methylated sites directly from alignment files (BAM files) and enables verification of restriction enzyme cut sites with the correct recognition sequence of the individual enzyme. In addition, msgbsR assesses DNA methylation based on read coverage, similar to RNA sequencing experiments, rather than methylation proportion, and is a useful tool for analysing differential methylation in large populations. The package is fully documented and available freely online as a Bioconductor package ( https://bioconductor.org/packages/release/bioc/html/msgbsR.html ).
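The cut-site verification step can be pictured with a small stand-alone sketch. It does not use msgbsR's API, and the function name is hypothetical. It assumes forward-strand reads that begin `cut_offset` bases into the recognition sequence. For HpaII/MspI (C^CGG) that offset is 1. Reads whose implied site does not match the recognition sequence are rejected, and the rest are counted per site:

```python
from collections import Counter

def verify_cut_sites(read_starts, reference, site="CCGG", cut_offset=1):
    """Count reads per verified cut site; return (counts by site start, number rejected).

    Assumes 0-based, forward-strand read starts that lie cut_offset bases
    into the recognition sequence (C^CGG gives cut_offset=1).
    """
    counts = Counter()
    rejected = 0
    for start in read_starts:
        site_start = start - cut_offset
        if site_start >= 0 and reference[site_start:site_start + len(site)] == site:
            counts[site_start] += 1
        else:
            rejected += 1
    return counts, rejected
```

The per-site counts play the role of the coverage-based methylation signal described above. With a methylation-sensitive enzyme, fewer reads at a site suggest the site was protected from cutting.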
Pastor, D; Amaya, W; García-Olcina, R; Sales, S
2007-07-01
We present a simple theoretical model, together with its experimental verification, of the vanishing of the autocorrelation peak caused by wavelength detuning in the coding-decoding process of coherent direct-sequence optical code multiple access systems based on a superstructured fiber Bragg grating. Moreover, we explore how this detuning-induced vanishing effect can be exploited to provide an additional degree of multiplexing and/or optical code tuning.
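A toy version of the effect follows. It assumes, as a simplification, that the encoder/decoder detuning adds a phase that grows linearly by δ per chip. The matched decoder's peak then becomes |Σ_k exp(ikδ)| / N = |sin(Nδ/2) / (N sin(δ/2))|. This peak vanishes when Nδ is a nonzero multiple of 2π, whatever the code. This is an illustrative model, not necessarily the authors' exact formulation:

```python
import cmath
import math

def autocorrelation_peak(code, delta):
    """Normalized decoder peak |sum_k c_k * conj(c_k) * exp(i k delta)| / N.

    code: unit-magnitude chip phasors (e.g. +1/-1 for a bipolar phase code).
    delta: extra phase per chip caused by detuning (rad), an assumed linear model.
    """
    n = len(code)
    total = sum(c * c.conjugate() * cmath.exp(1j * k * delta) for k, c in enumerate(code))
    return abs(total) / n

code = [complex(x) for x in [1, -1, 1, 1, -1, 1, -1]]
peak_tuned = autocorrelation_peak(code, 0.0)
peak_null = autocorrelation_peak(code, 2 * math.pi / len(code))
```

Because the null position depends only on N and δ, deliberately detuning a decoder can switch a code's peak off. This is one way to picture the extra degree of multiplexing or code tuning mentioned above.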
GASP: Gapped Ancestral Sequence Prediction for proteins
Edwards, Richard J; Shields, Denis C
2004-01-01
Background: The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods, but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results: Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to that of the three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions: GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199
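The two-pass pattern described above (tips to root, then root back to tips) can be illustrated on the gap-placement step using plain Fitch parsimony for gap presence/absence in one alignment column. This is a swapped-in textbook technique, not GASP's own gap-assignment or likelihood procedure. The tie-break toward "no gap" is an arbitrary assumption:

```python
def fitch_gaps(tree, root, leaf_gaps):
    """Two-pass Fitch parsimony for one alignment column, states True (gap) / False (residue).

    tree: internal node -> (left child, right child); leaves do not appear as keys.
    leaf_gaps: leaf -> True if that sequence has a gap in this column.
    """
    candidates = {}

    def down(node):                      # pass 1: tips towards the root
        if node not in tree:
            candidates[node] = {leaf_gaps[node]}
            return
        left, right = tree[node]
        down(left)
        down(right)
        shared = candidates[left] & candidates[right]
        candidates[node] = shared if shared else candidates[left] | candidates[right]

    assigned = {}

    def up(node, parent_state):          # pass 2: root back towards the tips
        options = candidates[node]
        # Inherit the parent's state when allowed; otherwise pick deterministically
        # (min prefers False, i.e. "no gap", an arbitrary tie-break).
        state = parent_state if parent_state in options else min(options)
        assigned[node] = state
        for child in tree.get(node, ()):
            up(child, state)

    down(root)
    up(root, None)
    return assigned

tree = {"root": ("n1", "n2"), "n1": ("a", "b"), "n2": ("c", "d")}
states = fitch_gaps(tree, "root", {"a": True, "b": True, "c": False, "d": False})
```

Here the gap is placed on the branch leading to n1, so n1 is predicted gapped and n2 is not. Ancestral residues would then be predicted only at the ungapped nodes.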
NASA Astrophysics Data System (ADS)
Choudhary, Kuldeep; Kumar, Santosh
2017-05-01
The application of the electro-optic effect in a lithium-niobate-based Mach-Zehnder interferometer to design a 3-bit optical pseudorandom binary sequence (PRBS) generator has been proposed; the design is characterized by its simplicity of generation and stability. The proposed device is optoelectronic in nature. The PRBS generator is widely applicable to pattern generation, encryption, and coding in optical networks. The study is carried out by simulating the proposed device with the beam propagation method.
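The logic that a 3-bit PRBS generator realizes can be written as an ordinary linear-feedback shift register. The sketch assumes the primitive polynomial x³ + x² + 1 and an all-ones seed. It models the bit sequence only, not the electro-optic Mach-Zehnder implementation:

```python
def prbs3(n_bits, seed=(1, 1, 1)):
    """Fibonacci LFSR for x^3 + x^2 + 1: output stage 3, feed back stage 3 XOR stage 2."""
    if not any(seed):
        raise ValueError("an all-zero seed locks the LFSR")
    s = list(seed)
    out = []
    for _ in range(n_bits):
        out.append(s[2])
        feedback = s[2] ^ s[1]
        s = [feedback, s[0], s[1]]
    return out

bits = prbs3(14)
```

From the all-ones seed, the register emits the maximal-length period 1, 1, 1, 0, 0, 1, 0. That is 2³ - 1 = 7 bits, with four ones and three zeros, and then it repeats.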
A Concise History of School-Based Smoking Prevention Research: A Pendulum Effect Case Study
ERIC Educational Resources Information Center
Sussman, Steve; Black, David S.; Rohrbach, Louise A.
2010-01-01
School-based cigarette smoking prevention was initiated shortly after the first Surgeon General's Report in 1964. This article highlights a sequence of events by which school-based tobacco use prevention research developed as a science, and illustrates a pendulum effect, with confidence in tobacco use prevention increasing and decreasing at…
USDA-ARS's Scientific Manuscript database
In recent years, next-generation sequencing (NGS)-based bulked segregant analysis (BSA) has become a powerful approach for allele discovery in non-model plant species. However, challenges remain, particularly for outcrossing species with complex genomes. Here, the genetic control of a weeping bran...
The Use of Sequence and Synthesis for Teaching Concepts.
ERIC Educational Resources Information Center
Frey, Linda; Reigeluth, Charles M.
In order to investigate the effects of sequence and synthesis in the teaching of taxonomically-related concepts, a study was conducted in which 27 students from Syracuse University were asked to examine printed instructions dealing with kinds of sailboats and then to respond to a test based on those instructions. The synthesizing structure…
NASA Technical Reports Server (NTRS)
Adeleye, Sanya; Chung, Christopher
2006-01-01
Commercial aircraft undergo a significant number of maintenance and logistical activities during the turnaround operation at the departure gate. By analyzing the sequencing of these activities, more effective turnaround contingency plans may be developed for logistical and maintenance disruptions. Turnaround contingency plans are particularly important because any delay in a hub-based system may cascade into further delays in subsequent connections. The contingency sequencing of the maintenance and logistical turnaround activities was analyzed using a combined network and computer simulation modeling approach. Experimental analysis of both current and alternative policies provides a framework to aid more effective tactical decision making.
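A combined network and simulation analysis can be sketched as a critical-path forward pass over an activity precedence network, repeated with randomly sampled durations. The activities, durations, and triangular ranges below are all hypothetical, not the study's data:

```python
import random

# Hypothetical turnaround network: activity -> (nominal duration in minutes, predecessors)
ACTIVITIES = {
    "deboard":   (10, []),
    "unload":    (15, []),
    "clean":     (12, ["deboard"]),
    "cater":     (15, ["deboard"]),
    "fuel":      (20, []),
    "load":      (18, ["unload"]),
    "board":     (20, ["clean", "cater", "fuel"]),
    "close_out": (5,  ["board", "load"]),
}

def earliest_finish(activities, durations=None):
    """Critical-path forward pass; returns the turnaround makespan."""
    finish = {}

    def ef(name):
        if name not in finish:
            dur, preds = activities[name]
            if durations is not None:
                dur = durations[name]
            finish[name] = max((ef(p) for p in preds), default=0) + dur
        return finish[name]

    return max(ef(a) for a in activities)

def simulate_makespans(activities, n_runs=1000, seed=0):
    """Monte Carlo: sample each duration from triangular(0.8 d, 1.5 d, mode d)."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        sampled = {a: rng.triangular(0.8 * d, 1.5 * d, d) for a, (d, _) in activities.items()}
        runs.append(earliest_finish(activities, sampled))
    return runs

runs = simulate_makespans(ACTIVITIES, n_runs=200, seed=1)
```

With nominal durations, the critical path is deboard, cater, board, close_out (50 minutes). A contingency policy could be evaluated by editing the network, for example by changing a precedence, and comparing the simulated makespan distributions.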
Reproducibility and quantitation of amplicon sequencing-based detection
Zhou, Jizhong; Wu, Liyou; Deng, Ye; Zhi, Xiaoyang; Jiang, Yi-Huei; Tu, Qichao; Xie, Jianping; Van Nostrand, Joy D; He, Zhili; Yang, Yunfeng
2011-01-01
To determine the reproducibility and quantitation of the amplicon sequencing-based detection approach for analyzing microbial community structure, a total of 24 microbial communities from a long-term global change experimental site were examined. Genomic DNA obtained from each community was used to amplify 16S rRNA genes with two or three barcode tags as technical replicates in the presence of a small quantity (0.1% wt/wt) of genomic DNA from Shewanella oneidensis MR-1 as the control. The technical reproducibility of the amplicon sequencing-based detection approach is quite low, with an average operational taxonomic unit (OTU) overlap of 17.2%±2.3% between two technical replicates and 8.2%±2.3% among three technical replicates, which is most likely due to problems associated with random sampling processes. Such variations in technical replicates could have substantial effects on estimating β-diversity but smaller effects on α-diversity. A high variation was also observed in the control across different samples (for example, 66.7-fold for the forward primer), suggesting that the amplicon sequencing-based detection approach is not quantitative. In addition, various strategies were examined to improve the comparability of amplicon sequencing data, such as increasing biological replicates and removing singleton sequences and less-representative OTUs across biological replicates. Finally, as expected, various statistical analyses with preprocessed experimental data revealed clear differences in the composition and structure of microbial communities between warming and non-warming, or between clipping and non-clipping. Taken together, these results suggest that amplicon sequencing-based detection is useful in analyzing microbial community structure even though it is neither reproducible nor quantitative.
However, great caution should be taken in experimental design and data interpretation when the amplicon sequencing-based detection approach is used for quantitative analysis of the β-diversity of microbial communities. PMID:21346791
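The low technical overlap attributed to random sampling can be reproduced qualitatively with a small simulation. Reads are drawn at random from a community dominated by rare OTUs, and the OTU sets of two "replicates" are compared. The overlap definition (shared OTUs divided by all OTUs observed) and the Zipf-like community are assumptions for illustration:

```python
import random

def otu_overlap(*replicates):
    """Shared OTUs / all OTUs observed across replicates (an assumed definition of overlap)."""
    sets = [set(r) for r in replicates]
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)

def sample_otus(abundances, n_reads, rng):
    """OTUs detected when n_reads reads are drawn at random from the community."""
    otus = list(abundances)
    weights = [abundances[o] for o in otus]
    return set(rng.choices(otus, weights=weights, k=n_reads))

# Hypothetical skewed community: many rare OTUs, few abundant ones.
community = {f"otu{i}": 1.0 / (i + 1) for i in range(5000)}
rng = random.Random(42)
rep1 = sample_otus(community, 2000, rng)
rep2 = sample_otus(community, 2000, rng)
pairwise = otu_overlap(rep1, rep2)
```

Because most rare OTUs are seen at most once per replicate, they rarely appear in both. This drives the overlap down even though the two replicates come from an identical community.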
Perceptions of randomness in binary sequences: Normative, heuristic, or both?
Reimers, Stian; Donkin, Chris; Le Pelley, Mike E
2018-03-01
When people consider a series of random binary events, such as tossing an unbiased coin and recording the sequence of heads (H) and tails (T), they tend to erroneously rate sequences with less internal structure or order (such as HTTHT) as more probable than sequences containing more structure or order (such as HHHHH). This is traditionally explained as a local representativeness effect: participants assume that the properties of long sequences of random outcomes, such as an equal proportion of heads and tails and little internal structure, should also apply to short sequences. However, recent theoretical work has noted that the probability of a particular sequence of, say, heads and tails of length n occurring within a larger (>n) sequence of coin flips actually differs by sequence, so P(HHHHH)
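The claim that the probability of appearing within a longer run of flips differs by pattern can be checked exactly by brute-force enumeration for small n:

```python
from itertools import product

def prob_contains(pattern, n):
    """Exact probability that pattern appears at least once in n fair coin flips."""
    hits = sum(pattern in "".join(seq) for seq in product("HT", repeat=n))
    return hits / 2 ** n
```

For n equal to the pattern length, every pattern has probability 1/32. Within 10 flips, `prob_contains("HHHHH", 10)` is 112/1024 (about 0.109), which is lower than the value for HTTHT. The self-overlapping pattern tends to clump its occurrences together, so it appears at least once less often.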
Jin, Sheng Chih; Benitez, Bruno A; Deming, Yuetiva; Cruchaga, Carlos
2016-01-01
Analyses of genome-wide association studies (GWAS) for complex disorders usually identify common variants with a relatively small effect size that explain only a small proportion of phenotypic heritability. Several studies have suggested that a significant fraction of heritability may be explained by low-frequency (minor allele frequency (MAF) of 1-5%) and rare variants that are not contained in commercial GWAS genotyping arrays (Schork et al., Curr Opin Genet Dev 19:212, 2009). Rare variants can also have relatively large effects on risk for developing human diseases or on disease phenotype (Cruchaga et al., PLoS One 7:e31039, 2012). However, NGS studies in a large population (>4,000 samples) are necessary to detect a significant rare-variant association. Several NGS methods, such as custom capture sequencing and amplicon-based sequencing, are designed to screen a small proportion of the genome, but most of these methods are limited in the number of samples that can be multiplexed (i.e., most sequencing kits provide only 96 distinct indexes). Additionally, sequencing library preparation for 4,000 samples remains expensive, so conducting NGS studies with the aforementioned methods is not feasible for most research laboratories. The need for low-cost, large-scale rare-variant detection makes pooled-DNA sequencing an efficient and cost-effective technique for identifying rare variants in target regions by sequencing hundreds to thousands of samples. Our recent work has demonstrated that pooled-DNA sequencing can accurately detect rare variants in targeted regions in multiple DNA samples with high sensitivity and specificity (Jin et al., Alzheimers Res Ther 4:34, 2012).
In these studies we used a well-established pooled-DNA sequencing approach and a computational package, SPLINTER (short indel prediction by large deviation inference and nonlinear true frequency estimation by recursion) (Vallania et al., Genome Res 20:1711, 2010), for accurate identification of rare variants in large DNA pools. Given an average sequencing coverage of 30× per haploid genome, SPLINTER can detect rare variants and short indels up to 4 base pairs (bp) with high sensitivity and specificity (up to 1 haploid allele in a pool as large as 500 individuals). Step-by-step instructions on how to conduct pooled-DNA sequencing experiments and data analyses are described in this chapter.
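The numbers in the preceding paragraph can be made concrete with some back-of-the-envelope arithmetic. A pool of 500 diploid individuals holds 1,000 haploid genomes. At 30× per haploid genome, the total depth is 30,000×. So a single variant allele (frequency 1/1,000) is expected on about 30 reads. The sketch below also gives a Poisson tail probability for seeing at least k such reads. It assumes perfectly even pooling and no sequencing error. It is not SPLINTER, which additionally models the error background:

```python
import math

def expected_alt_reads(n_individuals, coverage_per_haploid, n_alt_alleles=1, ploidy=2):
    """Expected variant-bearing reads in an evenly pooled, evenly covered sample."""
    n_haploid = n_individuals * ploidy
    total_depth = coverage_per_haploid * n_haploid
    return total_depth * n_alt_alleles / n_haploid

def poisson_tail(lam, k):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))

lam = expected_alt_reads(500, 30)   # one variant allele among 1,000 haploid genomes
p_at_least_10 = poisson_tail(lam, 10)
```

In real data, sequencing errors at a 30,000×-deep position can produce a comparable number of mismatching reads. This is why a dedicated error model such as SPLINTER's is needed rather than a fixed read-count threshold.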
The “Naked Coral” Hypothesis Revisited – Evidence for and Against Scleractinian Monophyly
Forêt, Sylvain; Huttley, Gavin; Miller, David J.; Chen, Chaolun Allen
2014-01-01
The relationship between Scleractinia and Corallimorpharia, Orders within Anthozoa distinguished by the presence of an aragonite skeleton in the former, is controversial. Although classically considered distinct groups, some phylogenetic analyses have placed the Corallimorpharia within a larger Scleractinia/Corallimorpharia clade, leading to the suggestion that the Corallimorpharia are “naked corals” that arose via skeleton loss during the Cretaceous from a Scleractinian ancestor. Scleractinian paraphyly is, however, contradicted by a number of recent phylogenetic studies based on mt nucleotide (nt) sequence data. Whereas the “naked coral” hypothesis was based on analysis of the sequences of proteins encoded by a relatively small number of mt genomes, here a much-expanded dataset was used to reinvestigate hexacorallian phylogeny. The initial observation was that, whereas analyses based on nt data support scleractinian monophyly, those based on amino acid (aa) data support the “naked coral” hypothesis, irrespective of the method and with very strong support. To better understand the bases of these contrasting results, the effects of systematic errors were examined. Compared to other hexacorallians, the mt genomes of “Robust” corals have a higher (A+T) content, codon usage is far more constrained, and the proteins that they encode have a markedly higher phenylalanine content, leading us to suggest that mt DNA repair may be impaired in this lineage. Thus the “naked coral” topology could be caused by high levels of saturation in these mitochondrial sequences, long-branch effects or model violations. The equivocal results of these extensive analyses highlight the fundamental problems of basing coral phylogeny on mitochondrial sequence data. PMID:24740380
Becságh, Péter; Szakács, Orsolya
2014-10-01
During the diagnostic workflow for detecting sequence alterations, it is sometimes important to design an algorithm that combines screening and direct tests. Normally, the use of direct tests, mainly sequencing, is limited. There is an increased need for effective screening tests that remain "closed tube" throughout the whole process, thereby decreasing the risk of PCR product contamination. The aim of this study was to design such a closed-tube, detection probe-based screening assay to detect different kinds of sequence alterations in exon 11 of the human c-kit gene. This region harbors various possible deletions and single-nucleotide changes. During assay setup, several probe chemistry formats were screened and tested. After some optimization steps, the TaqMan probe format was selected.
Compression of next-generation sequencing reads aided by highly efficient de novo assembly
Jones, Daniel C.; Ruzzo, Walter L.; Peng, Xinxia
2012-01-01
We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip. PMID:22904078
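Storing reads as positions within assembled contigs can be sketched as follows. Each read that occurs verbatim in a contig becomes a (contig, offset, length) triple, and any other read is kept literally. This toy version handles exact forward-strand matches only and omits Quip's probabilistic assembler and its statistical coding of the resulting fields:

```python
def encode_reads(reads, contigs):
    """Encode each read as ("pos", contig index, offset, length) if found in a contig, else ("raw", read)."""
    encoded = []
    for read in reads:
        for ci, contig in enumerate(contigs):
            off = contig.find(read)
            if off != -1:
                encoded.append(("pos", ci, off, len(read)))
                break
        else:
            encoded.append(("raw", read))
    return encoded

def decode_reads(encoded, contigs):
    """Invert encode_reads losslessly."""
    out = []
    for item in encoded:
        if item[0] == "pos":
            _, ci, off, length = item
            out.append(contigs[ci][off:off + length])
        else:
            out.append(item[1])
    return out

contigs = ["ACGTACGTTT", "GGGCCCAAAT"]
reads = ["GTAC", "CCCA", "TTTT"]
enc = encode_reads(reads, contigs)
```

The compression gain comes from replacing many bases by a few small integers once reads are well covered by the assembly, while decoding stays exact.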
Spink, N; Brown, D G; Skelly, J V; Neidle, S
1994-01-01
The bis-benzimidazole drug Hoechst 33258 has been co-crystallized with the dodecanucleotide sequence d(CGCAAATTTGCG)2. The structure has been solved by molecular replacement and refined to an R factor of 18.5% for 2125 reflections collected on a Xentronics area detector. The drug is bound in the minor groove at the five-base-pair site 5'-ATTTG, in a unique orientation. This site is displaced by one base pair in the 5' direction compared with previously determined structures of this drug with the sequence d(CGCGAATTCGCG)2. Reasons for this difference in behaviour are discussed in terms of several sequence-dependent structural features of the DNA, with particular reference to differences in propeller twist and minor-groove width. PMID:7515488
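The R factor quoted above is the standard crystallographic agreement index, R = Σ | |F_obs| - |F_calc| | / Σ |F_obs|, summed over the reflections used in refinement. A direct transcription is below. It assumes F_calc has already been placed on the same scale as F_obs, since refinement programs normally apply a scale factor first:

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over matched reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need matched, non-empty amplitude lists")
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den
```

An R factor of 18.5% corresponds to `r_factor(...)` returning 0.185 over the 2125 reflections.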
Novel method for high-throughput colony PCR screening in nanoliter-reactors
Walser, Marcel; Pellaux, Rene; Meyer, Andreas; Bechtold, Matthias; Vanderschuren, Herve; Reinhardt, Richard; Magyar, Joseph; Panke, Sven; Held, Martin
2009-01-01
We introduce a technology for the rapid identification and sequencing of conserved DNA elements employing a novel suspension array based on nanoliter (nl)-reactors made from alginate. The reactors have a volume of 35 nl and serve as reaction compartments during monoseptic growth of microbial library clones, colony lysis, thermocycling and screening for sequence motifs via semi-quantitative fluorescence analyses. nl-Reactors were kept in suspension during all high-throughput steps, which allowed the protocol to be performed in a highly space-efficient fashion and at negligible expense for consumables and reagents. As a first application, 11 high-quality microsatellites for polymorphism studies in cassava were isolated and sequenced out of a library of 20 000 clones in 2 days. The technology is widely scalable, and we envision that throughputs for nl-reactor-based screenings can be increased to 100 000 and more samples per day, thereby efficiently complementing protocols based on established deep-sequencing technologies. PMID:19282448
Slice profile effects in 2D slice-selective MRI of hyperpolarized nuclei.
Deppe, Martin H; Teh, Kevin; Parra-Robles, Juan; Lee, Kuan J; Wild, Jim M
2010-02-01
This work explores slice profile effects in 2D slice-selective gradient-echo MRI of hyperpolarized nuclei. Two different sequences were investigated: a Spoiled Gradient Echo sequence with variable flip angle (SPGR-VFA) and a balanced Steady-State Free Precession (SSFP) sequence. It is shown that in SPGR-VFA the distribution of flip angles across the slice present in any realistically shaped radiofrequency (RF) pulse leads to large excess signal from the slice edges in later RF views, which results in an undesired non-constant total transverse magnetization, potentially exceeding the initial value by almost 300% for the last RF pulse. A method to reduce this unwanted effect is demonstrated, based on dynamic scaling of the slice selection gradient. SSFP sequences with small to moderate flip angles (<40°) are also shown to preserve the slice profile better than the most commonly used SPGR sequence with constant flip angle (SPGR-CFA). For higher flip angles, the slice profile in SSFP evolves in a manner similar to SPGR-CFA, with depletion of polarization in the center of the slice. Copyright 2009 Elsevier Inc. All rights reserved.
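The slice-edge effect can be reproduced with a short simulation. The sketch below assumes the commonly used variable-flip-angle schedule theta_n = arctan(1/sqrt(N - n)), n = 1..N; the abstract does not say which schedule the authors used. T1 decay is ignored, and the reduced flip angle at a slice edge is modelled as a single scale factor applied to every pulse. Under these assumptions, the slice centre gives a constant signal. An edge spin, which receives half the nominal angle, is depleted more slowly, so its signal grows in later views.

```python
import math


def vfa_schedule(n_pulses):
    """Flip angles (radians) theta_n = arctan(1/sqrt(N - n)) for n = 1..N; the last is 90 degrees."""
    angles = []
    for n in range(1, n_pulses + 1):
        remaining = n_pulses - n
        angles.append(math.pi / 2 if remaining == 0 else math.atan(1 / math.sqrt(remaining)))
    return angles


def transverse_signals(angles, scale=1.0, m0=1.0):
    """Transverse signal after each pulse when every nominal angle is multiplied by `scale`.

    scale < 1 mimics the smaller flip angle seen at a slice edge. Hyperpolarization is
    not replenished, so M_z is only ever reduced by cos(theta) (T1 decay ignored).
    """
    mz = m0
    signals = []
    for theta in angles:
        actual = scale * theta
        signals.append(mz * math.sin(actual))
        mz *= math.cos(actual)
    return signals


if __name__ == "__main__":
    angles = vfa_schedule(16)
    centre = transverse_signals(angles)           # ideal profile: constant 1/sqrt(16)
    edge = transverse_signals(angles, scale=0.5)  # slice edge: half the nominal angle
    print([round(s, 3) for s in centre])
    print([round(s, 3) for s in edge])
```

Summing such curves over a realistic slice profile gives the rising total transverse magnetization described above.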
Directed evolution of the TALE N-terminal domain for recognition of all 5' bases.
Lamb, Brian M; Mercer, Andrew C; Barbas, Carlos F
2013-11-01
Transcription activator-like effector (TALE) proteins can be designed to bind virtually any DNA sequence. General guidelines for design of TALE DNA-binding domains suggest that the 5'-most base of the DNA sequence bound by the TALE (the N0 base) should be a thymine. We quantified the N0 requirement by analyzing the activities of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R) and TALE nucleases (TALENs) with each DNA base at this position. In the absence of a 5' T, we observed decreases in activity of up to >1000-fold for TALE-TF, up to 100-fold for TALE-R and up to 10-fold for TALENs, compared with target sequences containing a 5' T. To develop TALE architectures that recognize all possible N0 bases, we used structure-guided library design coupled with TALE-R activity selections to evolve novel TALE N-terminal domains to accommodate any N0 base. A G-selective domain and broadly reactive domains were isolated and characterized. The engineered TALE domains selected in the TALE-R format demonstrated modularity and were active in TALE-TF and TALEN architectures. Evolved N-terminal domains provide effective and unconstrained TALE-based targeting of any DNA sequence as TALE binding proteins and designer enzymes.
Using Next Generation Sequencing for Multiplexed Trait-Linked Markers in Wheat
Bernardo, Amy; Wang, Shan; St. Amand, Paul; Bai, Guihua
2015-01-01
With the advent of next generation sequencing (NGS) technologies, single nucleotide polymorphisms (SNPs) have become the major type of marker for genotyping in many crops. However, the availability of SNP markers for important traits of bread wheat (Triticum aestivum L.) that can be effectively used in marker-assisted selection (MAS) is still limited and SNP assays for MAS are usually uniplex. A shift from uniplex to multiplex assays will allow the simultaneous analysis of multiple markers and increase MAS efficiency. We designed 33 locus-specific markers from SNP or indel-based marker sequences that were linked to 20 different quantitative trait loci (QTL) or genes of agronomic importance in wheat and analyzed the amplicon sequences using an Ion Torrent Proton Sequencer and a custom allele detection pipeline to determine the genotypes of 24 selected germplasm accessions. Among the 33 markers, 27 were successfully multiplexed and 23 had 100% SNP call rates. Results from analysis of "kompetitive allele-specific PCR" (KASP) and sequence tagged site (STS) markers developed from the same loci fully verified the genotype calls of 23 markers. The NGS-based multiplexed assay developed in this study is suitable for rapid and high-throughput screening of SNPs and some indel-based markers in wheat. PMID:26625271
Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica
2016-02-18
The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding, VD, and posterior decoding, PD) and the PD/VD protocol described here were successfully applied to 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences annotated as GDSL by Pfam profile search (PfamA) that lack at least one of the essential motifs (171/820). Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii.
Finally, we provide a general motif-HMM scanner which is easily accessible through the graphical user interface (http://compbio.math.hr/). Our results show that scanning with a carefully parameterized motif-HMM is an effective approach for annotation of protein families with low sequence similarity and conserved motifs. The results of this study expand current knowledge and provide new insights into the evolution of the large GDSL-lipase family in land plants.
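The sketch below shows what Viterbi decoding does in motif-HMM scanning. It uses a toy two-state model ("bg" for background, "motif" for a GDSL-like block), and all of its probabilities are invented for the example. It is not the authors' motif-HMM, and it does not cover their PD/VD protocol. The decoder returns the single most probable state path, which labels each residue as background or motif.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ("bg", "motif")
# Hypothetical toy parameters, chosen only to make the example readable
START = {"bg": 0.99, "motif": 0.01}
TRANS = {"bg": {"bg": 0.95, "motif": 0.05}, "motif": {"bg": 0.2, "motif": 0.8}}
EMIT = {
    "bg": {aa: 0.05 for aa in AMINO_ACIDS},            # uniform background
    "motif": {"G": 0.3, "D": 0.3, "S": 0.2, "L": 0.2},  # GDSL-like residues only
}


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable state path for `seq` under an HMM (log-space Viterbi)."""
    # score[s]: best log-probability of any path that ends in state s at the current position
    score = {s: _log(start_p[s]) + _log(emit_p[s].get(seq[0], 0)) for s in states}
    backptrs = []
    for symbol in seq[1:]:
        new_score, ptr = {}, {}
        for s in states:
            prev, best = max(((p, score[p] + _log(trans_p[p][s])) for p in states),
                             key=lambda item: item[1])
            new_score[s] = best + _log(emit_p[s].get(symbol, 0))
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


path, logp = viterbi("AKVGDSLHKE", STATES, START, TRANS, EMIT)
print("".join("M" if s == "motif" else "." for s in path))
```

Posterior decoding would instead use forward-backward probabilities to assign each position its individually most probable state.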
Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu
2016-10-01
Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Sherwood, R.; Mutz, D.; Estlin, T.; Chien, S.; Backes, P.; Norris, J.; Tran, D.; Cooper, B.; Rabideau, G.; Mishkin, A.; Maxwell, S.
2001-07-01
This article discusses a proof-of-concept prototype for ground-based automatic generation of validated rover command sequences from high-level science and engineering activities. This prototype is based on ASPEN, the Automated Scheduling and Planning Environment. This artificial intelligence (AI)-based planning and scheduling system will automatically generate a command sequence that will execute within resource constraints and satisfy flight rules. An automated planning and scheduling system encodes rover design knowledge and uses search and reasoning techniques to automatically generate low-level command sequences while respecting rover operability constraints, science and engineering preferences, environmental predictions, and also adhering to hard temporal constraints. This prototype planning system has been field-tested using the Rocky 7 rover at JPL and will be field-tested on more complex rovers to prove its effectiveness before transferring the technology to flight operations for an upcoming NASA mission. Enabling goal-driven commanding of planetary rovers greatly reduces the requirements for highly skilled rover engineering personnel. This in turn greatly reduces mission operations costs. In addition, goal-driven commanding permits a faster response to changes in rover state (e.g., faults) or science discoveries by removing the time-consuming manual sequence validation process, allowing rapid "what-if" analyses, and thus reducing overall cycle times.
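The toy scheduler below illustrates the core idea of placing activities so that they satisfy both a shared resource limit and hard deadlines. It is a simple earliest-deadline-first, earliest-start heuristic with a single hypothetical power budget. ASPEN's own search and reasoning techniques are not reproduced here, and the activity names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    duration: int   # time steps the activity occupies
    power: float    # hypothetical resource drawn while active (watts)
    deadline: int   # activity must finish by this time step


def greedy_schedule(activities, capacity, horizon):
    """Earliest-start placement under a shared power limit and per-activity deadlines.

    Returns (plan, unscheduled): plan maps activity name -> start step.
    """
    usage = [0.0] * horizon  # power already committed at each time step
    plan, unscheduled = {}, []
    # Earliest-deadline-first ordering; ties keep input order (sorted is stable)
    for act in sorted(activities, key=lambda a: a.deadline):
        latest_start = min(act.deadline, horizon) - act.duration
        for start in range(0, latest_start + 1):
            window = range(start, start + act.duration)
            if all(usage[t] + act.power <= capacity for t in window):
                for t in window:
                    usage[t] += act.power
                plan[act.name] = start
                break
        else:  # no feasible start before the deadline
            unscheduled.append(act.name)
    return plan, unscheduled


acts = [
    Activity("drive", duration=4, power=8, deadline=10),
    Activity("image", duration=2, power=5, deadline=3),
    Activity("heat", duration=3, power=4, deadline=10),
]
print(greedy_schedule(acts, capacity=10, horizon=10))
```

With a 10 W budget, "image" starts at step 0. "drive" is pushed to step 2, and "heat" to step 6. A real planner would also backtrack or repair the plan when a greedy placement fails.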
A statistical method for the detection of variants from next-generation resequencing of DNA pools.
Bansal, Vikas
2010-06-15
Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
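Approach (ii) asks whether the non-reference calls at a site could come from sequencing errors alone. A minimal sketch assumes a single uniform per-base error rate, whereas CRISP itself also uses strand and pool-size information. The number of error-derived non-reference calls is then Binomial(depth, error rate), and a small upper-tail probability argues for a real variant. The helper shows why pooled data is hard: one heterozygous carrier in a pool of 25 diploid individuals contributes an allele fraction of only 1/50 = 2%, which is comparable to typical error rates.

```python
from math import comb


def prob_errors_at_least(k, depth, error_rate):
    """P(X >= k) for X ~ Binomial(depth, error_rate).

    The chance that sequencing errors alone produce k or more non-reference
    calls at a site (assumes one uniform error rate for every base call).
    """
    return sum(comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
               for i in range(k, depth + 1))


def singleton_allele_fraction(pool_size):
    """Expected allele fraction of one heterozygous carrier in a pool of diploid individuals."""
    return 1 / (2 * pool_size)


# 200x coverage at a 1% error rate: is seeing 6 alternate reads surprising?
print(prob_errors_at_least(6, 200, 0.01))
print(singleton_allele_fraction(25))
```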
ERIC Educational Resources Information Center
Flores, Margaret M.; Hinton, Vanessa; Strozier, Shaunita D.
2014-01-01
Based on Common Core Standards (2010), mathematics interventions should emphasize conceptual understanding of numbers and operations as well as fluency. For students at risk for failure, the concrete-representational-abstract (CRA) sequence and the Strategic Instruction Model (SIM) have been shown effective in teaching computation with an emphasis…
ERIC Educational Resources Information Center
Kazakoff, Elizabeth R.; Sullivan, Amanda; Bers, Marina U.
2013-01-01
This paper examines the impact of programming robots on sequencing ability during a 1-week intensive robotics workshop at an early childhood STEM magnet school in the Harlem area of New York City. Children participated in computer programming activities using a developmentally appropriate tangible programming language CHERP, specifically designed…
Automated constraint checking of spacecraft command sequences
NASA Astrophysics Data System (ADS)
Horvath, Joan C.; Alkalaj, Leon J.; Schneider, Karl M.; Spitale, Joseph M.; Le, Dang
1995-01-01
Robotic spacecraft are controlled by onboard sets of commands called "sequences." Determining that sequences will have the desired effect on the spacecraft can be expensive in terms of both labor and computer coding time, with different particular costs for different types of spacecraft. Specification languages and appropriate user interface to the languages can be used to make the most effective use of engineering validation time. This paper describes one specification and verification environment ("SAVE") designed for validating that command sequences have not violated any flight rules. This SAVE system was subsequently adapted for flight use on the TOPEX/Poseidon spacecraft. The relationship of this work to rule-based artificial intelligence and to other specification techniques is discussed, as well as the issues that arise in the transfer of technology from a research prototype to a full flight system.
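The sketch below shows the kind of flight-rule check that such an environment automates. It encodes two hypothetical rules as functions over a time-ordered command list. The first rule says two devices must never be on together. The second sets a minimum spacing between repeated commands. The command mnemonics and rules are invented for illustration and are not SAVE's specification language.

```python
from dataclasses import dataclass


@dataclass
class Command:
    time: int   # seconds from sequence start
    name: str   # hypothetical mnemonic, e.g. "HEATER_ON"


def check_mutual_exclusion(cmds, dev_a, dev_b):
    """Flight rule: dev_a and dev_b must never be on at the same time."""
    state = {dev_a: False, dev_b: False}
    was_violating = False
    violations = []
    for c in sorted(cmds, key=lambda cmd: cmd.time):
        for dev in state:
            if c.name == f"{dev}_ON":
                state[dev] = True
            elif c.name == f"{dev}_OFF":
                state[dev] = False
        violating = all(state.values())
        if violating and not was_violating:  # report each overlap once, where it starts
            violations.append(f"{dev_a} and {dev_b} both on at t={c.time}")
        was_violating = violating
    return violations


def check_min_spacing(cmds, name, min_gap):
    """Flight rule: consecutive `name` commands must be at least min_gap seconds apart."""
    times = sorted(c.time for c in cmds if c.name == name)
    return [f"{name} at t={t2} only {t2 - t1}s after previous (min {min_gap}s)"
            for t1, t2 in zip(times, times[1:]) if t2 - t1 < min_gap]


seq = [Command(0, "HEATER_ON"), Command(5, "XMTR_ON"), Command(20, "HEATER_OFF"),
       Command(30, "THRUST"), Command(35, "THRUST")]
for problem in check_mutual_exclusion(seq, "HEATER", "XMTR") + check_min_spacing(seq, "THRUST", 10):
    print(problem)
```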
Streaming Support for Data Intensive Cloud-Based Sequence Analysis
Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
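The streaming idea works because NGS reads can be processed independently. Each read can therefore be analysed as soon as it has arrived, without waiting for the whole file. The generator below is a minimal sketch in which `chunks` stands in for data arriving over the network. It yields complete 4-line FASTQ records, and a per-read computation (GC fraction) runs on each record immediately. It is not the elastream package's API.

```python
from typing import Iterable, Iterator, Tuple


def fastq_records(chunks: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) as soon as each 4-line FASTQ record is complete."""
    buffer = ""
    lines = []
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the partial tail
        *complete, buffer = buffer.split("\n")
        for line in complete:
            lines.append(line)
            if len(lines) == 4:
                header, seq, _, qual = lines
                yield header, seq, qual
                lines = []
    # Flush a final record if the stream did not end with a newline
    if buffer:
        lines.append(buffer)
    if len(lines) == 4:
        header, seq, _, qual = lines
        yield header, seq, qual


data = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n"
arriving = (data[i:i + 5] for i in range(0, len(data), 5))  # simulated network chunks
for header, seq, _ in fastq_records(arriving):
    print(header, sum(base in "GC" for base in seq) / len(seq))
```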
Bentley, Amy R.; Chen, Guanjie; Shriner, Daniel; Doumatey, Ayo P.; Zhou, Jie; Huang, Hanxia; Mullikin, James C.; Blakesley, Robert W.; Hansen, Nancy F.; Bouffard, Gerard G.; Cherukuri, Praveen F.; Maskeri, Baishali; Young, Alice C.; Adeyemo, Adebowale; Rotimi, Charles N.
2014-01-01
Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a “European” vs. “African” genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same as in AA with only African local ancestry (2–3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA. PMID:24603370
DNA extraction for streamlined metagenomics of diverse environmental samples.
Marotz, Clarisse; Amir, Amnon; Humphrey, Greg; Gaffney, James; Gogul, Grant; Knight, Rob
2017-06-01
A major bottleneck for metagenomic sequencing is rapid and efficient DNA extraction. Here, we compare the extraction efficiencies of three magnetic bead-based platforms (KingFisher, epMotion, and Tecan) to a standardized column-based extraction platform across a variety of sample types, including feces, oral, skin, soil, and water. Replicate sample plates were extracted and prepared for 16S rRNA gene amplicon sequencing in parallel to assess extraction bias and DNA quality. The data demonstrate that any effect of extraction method on sequencing results was small compared with the variability across samples; however, the KingFisher platform produced the largest number of high-quality reads in the shortest amount of time. Based on these results, we have identified an extraction pipeline that dramatically reduces sample processing time without sacrificing bacterial taxonomic or abundance information.
Nöth, Ulrike; Laufs, Helmut; Stoermer, Robert; Deichmann, Ralf
2012-03-01
To describe heating effects to be expected in simultaneous electroencephalography (EEG) and magnetic resonance imaging (MRI) when deviating from the EEG manufacturer's instructions; to test which anatomical MRI sequences have a sufficiently low specific absorption rate (SAR) to be performed with the EEG equipment in place; and to suggest precautions to reduce the risk of heating. Heating was determined in vivo below eight EEG electrodes, using both head and body coil transmission and sequences covering the whole range of SAR values. Head transmit coil: temperature increases were below 2.2°C for low SAR sequences, but reached 4.6°C (one subject, clavicle) for high SAR sequences; the equilibrium temperature T(eq) remained below 39°C. Body transmit coil: temperature increases were higher and more frequent over subjects and electrodes, with values below 2.6°C for low SAR sequences, reaching 6.9°C for high SAR sequences (T8 electrode) with T(eq) exceeding a critical level of 40°C. Anatomical imaging should be based on T1-weighted sequences (FLASH, MPRAGE, MDEFT) with an SAR below values for functional MRI sequences based on gradient echo planar imaging. Anatomical sequences with a high SAR can pose a significant risk, which is reduced by using head coil transmission. Copyright © 2011 Wiley-Liss, Inc.
Sequence-Dependent Elasticity and Electrostatics of Single-Stranded DNA: Signatures of Base-Stacking
McIntosh, Dustin B.; Duggan, Gina; Gouil, Quentin; Saleh, Omar A.
2014-01-01
Base-stacking is a key factor in the energetics that determines nucleic acid structure. We measure the tensile response of single-stranded DNA as a function of sequence and monovalent salt concentration to examine the effects of base-stacking on the mechanical and thermodynamic properties of single-stranded DNA. By comparing the elastic response of highly stacked poly(dA) and that of a polypyrimidine sequence with minimal stacking, we find that base-stacking in poly(dA) significantly enhances the polymer’s rigidity. The unstacking transition of poly(dA) at high force reveals that the intrinsic electrostatic tension on the molecule depends significantly more weakly on salt concentration than mean-field predictions. Further, we provide a model-independent estimate of the free energy difference between stacked poly(dA) and unstacked polypyrimidine, finding it to be ∼−0.25 k_BT/base and nearly constant over three orders of magnitude in salt concentration. PMID:24507606
Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip
2012-01-06
Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
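Here is a minimal single-site sketch of the pedigree idea for the simplest pedigree, a parent-offspring trio. It is a toy model and not the authors' framework, which also separates somatic from germline events and estimates sequencing error. Founders get Hardy-Weinberg priors, and each parent transmits one allele. Each transmitted allele flips with a hypothetical mutation probability `mu`. The posterior probability of a de novo mutation is the share of the total likelihood that comes from paths involving a flip. Genotypes are coded 0/1/2 by the number of alternate alleles, and the `lik_*` inputs are genotype likelihoods P(reads | genotype).

```python
from itertools import product


def founder_prior(p):
    """Hardy-Weinberg prior over genotypes 0/1/2 (number of alt alleles)."""
    q = 1 - p
    return [q * q, 2 * p * q, p * p]


def trio_dnm_posterior(lik_mother, lik_father, lik_child, alt_freq, mu):
    """Posterior probability that the child carries a de novo mutation at this site."""
    prior = founder_prior(alt_freq)
    total = 0.0
    with_mutation = 0.0
    for gm, gf in product(range(3), repeat=2):          # parental genotypes
        w_parents = prior[gm] * lik_mother[gm] * prior[gf] * lik_father[gf]
        if w_parents == 0:
            continue
        for am, af in product((0, 1), repeat=2):         # transmitted alleles (1 = alt)
            p_trans = (gm / 2 if am else 1 - gm / 2) * (gf / 2 if af else 1 - gf / 2)
            if p_trans == 0:
                continue
            for fm, ff in product((0, 1), repeat=2):     # did each transmitted allele mutate?
                p_mut = (mu if fm else 1 - mu) * (mu if ff else 1 - mu)
                child = (am ^ fm) + (af ^ ff)
                w = w_parents * p_trans * p_mut * lik_child[child]
                total += w
                if fm or ff:
                    with_mutation += w
    return with_mutation / total


# Reads strongly support hom-ref parents and a heterozygous child
hom_ref = [1.0, 1e-10, 1e-10]
het = [1e-10, 1.0, 1e-10]
print(trio_dnm_posterior(hom_ref, hom_ref, het, alt_freq=0.01, mu=1e-8))
```

In the example, the competing explanations are a hidden heterozygous parent or a miscalled child genotype. The data make both far less likely than a de novo event, so the posterior is close to 1.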
Gerald, Joe K.; Zhang, Bin; McClure, Leslie A.; Bailey, William C.; Harrington, Kathy F.
2012-01-01
Background Viral upper respiratory infections have been implicated as a major cause of asthma exacerbations among school age children. Regular hand washing is the most effective method to prevent the spread of viral respiratory infections, but effective hand washing practices are difficult to establish in schools. Objectives This randomized controlled trial evaluated whether a standardized regimen of hand washing plus alcohol-based hand sanitizer could reduce asthma exacerbations more than schools’ usual hand hygiene practices. Methods This was a two year, community-based, randomized controlled crossover trial. Schools were randomized to usual care then intervention (Sequence 1) or intervention then usual care (Sequence 2). Intervention schools were provided with alcohol-based hand sanitizer, hand soap, and hand hygiene education. The primary outcome was the proportion of students experiencing an asthma exacerbation each month. Generalized estimating equations were used to model the difference in the marginal rate of exacerbations between sequences while controlling for individual demographic factors and the correlation within each student and between students within each school. Results 527 students with asthma were enrolled among 31 schools. The hand hygiene intervention did not reduce the number of asthma exacerbations as compared to the schools’ usual hand hygiene practices (p=0.132). There was a strong temporal trend as both sequences experienced fewer exacerbations during Year 2 as compared to Year 1 (p<0.001). Conclusions While the intervention was not found to be effective, the results were confounded by the H1N1 influenza pandemic that resulted in substantially increased hand hygiene behaviors and resources in usual care schools. Therefore, these results should be viewed cautiously. PMID:23069487
Precise genotyping and recombination detection of Enterovirus
2015-01-01
Enteroviruses (EV) with different genotypes cause diverse infectious diseases in humans and mammals. A correct EV typing result is crucial for effective medical treatment and disease control; however, the emergence of novel viral strains has impaired the performance of available diagnostic tools. Here, we present a web-based tool, named EVIDENCE (EnteroVirus In DEep conception, http://symbiont.iis.sinica.edu.tw/evidence), for EV genotyping and recombination detection. We introduce the idea of using mixed-ranking scores to evaluate the fitness of prototypes based on relatedness and on the genome regions of interest. Using phylogenetic methods, the most likely genotype is determined based on the closest neighbor among the selected references. To detect possible recombination events, EVIDENCE calculates the sequence distance and phylogenetic relationship among sequences of all sliding windows scanning over the whole genome. Detected recombination events are plotted in an interactive figure for viewing of fine details. In addition, all EV sequences available in GenBank were collected and revised using the latest classification and nomenclature of EV in EVIDENCE. These sequences are built into the database and are retrieved in an indexed catalog, or can be searched for by keywords or by sequence similarity. EVIDENCE is the first web-based tool containing pipelines for genotyping and recombination detection, with updated, built-in, and complete reference sequences to improve sensitivity and specificity. The use of EVIDENCE can accelerate genotype identification, aiding clinical diagnosis and enhancing our understanding of EV evolution. PMID:26678286
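The distance half of the sliding-window scan can be sketched in a few lines. This is a simplification: EVIDENCE also uses phylogenetic relationships within each window, which this code does not. The sketch assumes the query and references are already aligned to the same coordinates. In each window, it reports the reference with the smallest p-distance (the fraction of differing non-gap sites). A change of closest reference along the genome is a crude hint of a recombination breakpoint.

```python
def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, ignoring gap columns ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("inf")  # no comparable sites in this window
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_reference_by_window(query, references, window, step):
    """Return (window start, name of the closest reference) for each sliding window."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        best = min(references,
                   key=lambda name: p_distance(q, references[name][start:start + window]))
        results.append((start, best))
    return results


refs = {"A": "ACGTACGTACGGGGGCCCCC", "B": "TTTTTAAAAATGCATGCATG"}
chimera = refs["A"][:10] + refs["B"][10:]  # a synthetic recombinant
print(closest_reference_by_window(chimera, refs, window=10, step=10))
```

For the synthetic recombinant, the closest reference switches from A to B at position 10.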
NASA Astrophysics Data System (ADS)
Keshet, Aviv; Ketterle, Wolfgang
2013-01-01
Atomic physics experiments often require a complex sequence of precisely timed computer controlled events. This paper describes a distributed graphical user interface-based control system designed with such experiments in mind, which makes use of off-the-shelf output hardware from National Instruments. The software makes use of a client-server separation between a user interface for sequence design and a set of output hardware servers. Output hardware servers are designed to use standard National Instruments output cards, but the client-server nature should allow this to be extended to other output hardware. Output sequences running on multiple servers and output cards can be synchronized using a shared clock. By using a field programmable gate array-generated variable frequency clock, redundant buffers can be dramatically shortened, and a time resolution of 100 ns achieved over effectively arbitrary sequence lengths.
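The buffer saving from a variable-frequency clock can be illustrated with arithmetic. With a fixed 100 ns clock, a 2 s sequence needs 2×10^7 output samples. If the clock can hold each output state for an arbitrary number of ticks, only one buffer entry per state change is needed. The sketch below converts a timed event list into (state, tick count) pairs under that assumption. It does not describe the authors' FPGA implementation.

```python
def events_to_segments(events, end_ns, tick_ns=100):
    """Convert sorted (time_ns, state) events into (state, n_ticks) buffer entries.

    Each state is held until the next event (or end_ns), so the buffer needs one
    entry per state change rather than one per clock tick.
    """
    segments = []
    for (t, state), (t_next, _) in zip(events, events[1:] + [(end_ns, None)]):
        if t % tick_ns or t_next % tick_ns:
            raise ValueError("event times must fall on the clock grid")
        if t_next <= t:
            raise ValueError("events must be strictly increasing and end before end_ns")
        segments.append((state, (t_next - t) // tick_ns))
    return segments


# Two digital lines packed into one integer state; a 2-second sequence
events = [(0, 0b00), (1_000, 0b01), (1_000_000_000, 0b00)]
segments = events_to_segments(events, end_ns=2_000_000_000)
print(segments, "entries:", len(segments), "vs fixed-clock samples:", 2_000_000_000 // 100)
```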
Effects of sequence on DNA wrapping around histones
NASA Astrophysics Data System (ADS)
Ortiz, Vanessa
2011-03-01
A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we include histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. This work was supported by an NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).
An efficient approach to BAC based assembly of complex genomes.
Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David
2016-01-01
There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data, which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, with variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.
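Sequencing indexed pools requires assigning each read back to its pool by its index sequence before assembly. The abstract does not describe that step, so the demultiplexer below is a generic, hypothetical sketch. It assigns a read when exactly one index matches within `max_mismatch` Hamming mismatches. Otherwise the read goes to an "undetermined" bin, a label chosen here for illustration.

```python
def demultiplex(reads, indices, max_mismatch=1):
    """Bin reads by index sequence.

    reads: iterable of (index_read, sequence); indices: {pool_name: index_sequence}.
    A read is assigned only if exactly one index matches within max_mismatch
    Hamming mismatches; ambiguous or unmatched reads go to "undetermined".
    """
    bins = {name: [] for name in indices}
    bins["undetermined"] = []
    for idx, seq in reads:
        hits = [name for name, ref in indices.items()
                if len(ref) == len(idx)
                and sum(x != y for x, y in zip(idx, ref)) <= max_mismatch]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(seq)
    return bins


pools = {"pool1": "ACGTAC", "pool2": "TGCATG"}
reads = [("ACGTAC", "AAA"), ("ACGTAA", "CCC"), ("GGGGGG", "TTT")]
print(demultiplex(reads, pools))
```

Keeping index sequences at least three mismatches apart means a single sequencing error in an index read cannot make it match the wrong pool.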
Hierarchical Traces for Reduced NSM Memory Requirements
NASA Astrophysics Data System (ADS)
Dahl, Torbjørn S.
This paper presents work on using hierarchical long term memory to reduce the memory requirements of nearest sequence memory (NSM) learning, a previously published, instance-based reinforcement learning algorithm. A hierarchical memory representation reduces the memory requirements by allowing traces to share common sub-sequences. We present moderated mechanisms for estimating discounted future rewards and for dealing with hidden state using hierarchical memory. We also present an experimental analysis of how the sub-sequence length affects the memory compression achieved and show that the reduced memory requirements do not affect the speed of learning. Finally, we analyse and discuss the persistence of the sub-sequences independent of specific trace instances.
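The memory saving from letting traces share common sub-sequences can be illustrated with a prefix tree. In the trie, traces that start with the same observation/action steps reuse the same nodes instead of storing them again. This generic illustration is not the paper's hierarchical representation, and the trace symbols are hypothetical.

```python
class TraceTrie:
    """Prefix tree of traces; each new node stores one step not already shared."""

    def __init__(self):
        self.root = {}
        self.nodes = 0

    def add(self, trace):
        node = self.root
        for step in trace:
            if step not in node:
                node[step] = {}
                self.nodes += 1  # only unshared steps cost memory
            node = node[step]


traces = [("o1", "a1", "o2", "a2"),
          ("o1", "a1", "o3", "a1"),
          ("o1", "a1", "o2", "a3")]
trie = TraceTrie()
for t in traces:
    trie.add(t)
flat = sum(len(t) for t in traces)
print(f"flat storage: {flat} steps, shared storage: {trie.nodes} nodes")
```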
BLAST and FASTA similarity searching for multiple sequence alignment.
Pearson, William R
2014-01-01
BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry (homology). The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
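The expectation value comes from Karlin-Altschul statistics, E = K m n e^(-lambda S), where m and n are the query and database lengths. Equivalently, E = m n 2^(-S') with bit score S' = (lambda S - ln K)/ln 2. The sketch below omits the edge-effect length corrections that real programs apply. The lambda and K values are assumed approximate figures for BLOSUM62 with gap costs 11/1; check your program's documentation for the actual parameters. The formula also explains the point about smaller databases: for the same raw score, a 10-fold smaller database gives a 10-fold smaller E.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expectation E = K m n exp(-lambda S) = m n 2^(-S')."""
    return query_len * db_len * 2 ** (-bit_score(raw_score, lam, k))


LAMBDA, K = 0.267, 0.041  # assumed, approximate gapped BLOSUM62 (11/1) parameters

full_db = evalue(100, query_len=300, db_len=1e8, lam=LAMBDA, k=K)
small_db = evalue(100, query_len=300, db_len=1e7, lam=LAMBDA, k=K)
print(f"bits={bit_score(100, LAMBDA, K):.1f}  E(full)={full_db:.2e}  E(10x smaller)={small_db:.2e}")
```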
Recent patents of nanopore DNA sequencing technology: progress and challenges.
Zhou, Jianfeng; Xu, Bingqian
2010-11-01
DNA sequencing techniques have developed rapidly over the last decades, driven primarily by the Human Genome Project. Among the proposed new techniques, nanopore sequencing has been considered a suitable candidate for single-molecule DNA sequencing with ultrahigh speed and very low cost. Several fabrication and modification techniques have been developed to produce robust and well-defined nanopore devices. Many efforts have also been made to apply nanopores to analyze the properties of DNA molecules. Compared with traditional sequencing techniques, nanopores have demonstrated distinct advantages in key practical respects, such as sample preparation, sequencing speed, cost-effectiveness and read length. Although challenges remain, recent research on improving the capabilities of nanopores has brought closer their ultimate goal: sequencing an individual DNA strand at single-nucleotide resolution. This patent review briefly highlights recent developments and technological achievements in DNA analysis and sequencing at the single-molecule level, focusing on nanopore-based methods.
Local alignment of two-base encoded DNA sequence
Homer, Nils; Merriman, Barry; Nelson, Stanley F
2009-01-01
Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
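Why decoding must be combined with alignment can be seen in a toy version of two-base encoding. This sketch assumes the commonly described color-space scheme (A=0, C=1, G=2, T=3, each color being the XOR of two adjacent bases); `encode` and `decode` are illustrative helpers, not the paper's algorithm. A single color measurement error corrupts every downstream base, whereas a true single-base variant changes exactly two adjacent colors.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq):
    """Each color encodes an adjacent base pair: bits(b_i) XOR bits(b_i+1)."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Deterministic decoding: start from the known first base and chain XORs."""
    bits = BASE_TO_BITS[first_base]
    out = [first_base]
    for c in colors:
        bits ^= c
        out.append(BITS_TO_BASE[bits])
    return "".join(out)


seq = "ACGTTGCA"
colors = encode(seq)
assert decode("A", colors) == seq

bad = colors.copy()
bad[2] ^= 1                   # one measurement error in the third color
print(decode("A", bad))       # every base after the error is wrong

snp = "ACGATGCA"              # a true T->A variant at position 4
print(sum(x != y for x, y in zip(encode(seq), encode(snp))))  # two colors change
```

An aligner that only sees decoded bases would report the error as a run of mismatches; working on colors lets it distinguish an isolated color error from the two-color signature of a real variant.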
Park, Doori; Park, Su-Hyun; Ban, Yong Wook; Kim, Youn Shic; Park, Kyoung-Cheul; Kim, Nam-Soo; Kim, Ju-Kon; Choi, Ik-Young
2017-08-15
Genetically modified crops (GM crops) have been developed to improve the agricultural traits of modern crop cultivars. Safety assessments of GM crops are of paramount importance in research at developmental stages and before releasing transgenic plants into the marketplace. Sequencing technology is developing rapidly, with higher output and labor efficiency, and will eventually replace existing methods for the molecular characterization of genetically modified organisms. To detect the transgene insertion locations in three GM rice genomes, Illumina sequencing reads are mapped to, and classified against, the rice genome and the plasmid sequence. Reads mapping to both are then used to characterize the junction sites between plant and transgene sequences by sequence alignment. Herein, we present a next generation sequencing (NGS)-based molecular characterization method, using transgenic rice plants SNU-Bt9-5, SNU-Bt9-30, and SNU-Bt9-109. Specifically, using bioinformatics tools, we detected the precise insertion locations and copy numbers of transfer DNA, genetic rearrangements, and the absence of backbone sequences, which were equivalent to results obtained from Southern blot analyses. NGS methods have been suggested as an effective means of characterizing and detecting transgenic insertion locations in genomes. Our results demonstrate that combining NGS technology with bioinformatics approaches offers a cost- and time-effective method for assessing the safety of transgenic plants.
Effects of 16S rDNA sampling on estimates of the number of endosymbiont lineages in sucking lice
Burleigh, J. Gordon; Light, Jessica E.; Reed, David L.
2016-01-01
Phylogenetic trees can reveal the origins of endosymbiotic lineages of bacteria and detect patterns of co-evolution with their hosts. Although taxon sampling can greatly affect phylogenetic and co-evolutionary inference, most hypotheses of endosymbiont relationships are based on few available bacterial sequences. Here we examined how different sampling strategies of Gammaproteobacteria sequences affect estimates of the number of endosymbiont lineages in parasitic sucking lice (Insecta: Phthiraptera: Anoplura). We estimated the number of louse endosymbiont lineages using both newly obtained and previously sequenced 16S rDNA bacterial sequences and more than 42,000 16S rDNA sequences from other Gammaproteobacteria. We also performed parametric and nonparametric bootstrapping experiments to examine the effects of phylogenetic error and uncertainty on these estimates. Sampling of 16S rDNA sequences affects the estimates of endosymbiont diversity in sucking lice until a threshold of genetic diversity is reached, the size of which depends on the sampling strategy. Sampling by maximizing the diversity of 16S rDNA sequences is more efficient than randomly sampling available 16S rDNA sequences. Although simulation results validate estimates of multiple endosymbiont lineages in sucking lice, the bootstrap results suggest that the precise number of endosymbiont origins is still uncertain. PMID:27547523
Multi person detection and tracking based on hierarchical level-set method
NASA Astrophysics Data System (ADS)
Khraief, Chadia; Benzarti, Faouzi; Amiri, Hamid
2018-04-01
In this paper, we propose an efficient unsupervised method for multi-person tracking based on a hierarchical level-set approach. The proposed method uses both edge and region information to detect objects effectively. Persons are tracked in each frame of the sequence by minimizing an energy functional that combines color, texture and shape information. These features are incorporated into a covariance matrix that serves as a region descriptor. The method is fully automated, without the need to manually specify the initial level-set contour. It combines person detection with background subtraction. The edge-based term is employed to maintain a stable evolution, guide the segmentation towards apparent boundaries and prevent regions from merging. The computational cost of the level set is reduced by using a narrow-band technique. Experiments on challenging video sequences show the effectiveness of the proposed method.
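A covariance region descriptor summarizes a region by the covariance of per-pixel feature vectors. The sketch below is a minimal version assuming NumPy and a hypothetical grayscale feature set (x, y, intensity, absolute gradients); the paper's actual color, texture and shape features are not specified here.

```python
import numpy as np


def region_covariance(image, mask):
    """Covariance descriptor of the pixels selected by a boolean mask.

    Per-pixel features (an assumption for illustration):
    x, y, intensity, |dI/dx|, |dI/dy|.
    """
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows, then columns
    ys, xs = np.nonzero(mask)
    feats = np.stack([
        xs,
        ys,
        image[ys, xs],
        np.abs(gx[ys, xs]),
        np.abs(gy[ys, xs]),
    ]).astype(float)
    return np.cov(feats)  # rows are features, so the result is 5 x 5


rng = np.random.default_rng(0)
image = rng.random((32, 32))               # stand-in for one grayscale frame
mask = np.zeros_like(image, dtype=bool)
mask[8:24, 10:20] = True                   # hypothetical person region
C = region_covariance(image, mask)
print(C.shape)
```

The descriptor has a fixed size regardless of region area; descriptors of this kind are usually compared with a distance suited to positive semi-definite matrices rather than elementwise.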
Design of association studies with pooled or un-pooled next-generation sequencing data.
Kim, Su Yeon; Li, Yingrui; Guo, Yiran; Li, Ruiqiang; Holmkvist, Johan; Hansen, Torben; Pedersen, Oluf; Wang, Jun; Nielsen, Rasmus
2010-07-01
Most common hereditary diseases in humans are complex and multifactorial. Large-scale genome-wide association studies based on SNP genotyping have only identified a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (minor allele frequency, MAF < 5%), which are not included in the common genotyping platforms, may contribute substantially to the genetic variation of these diseases. Next-generation sequencing, which would allow the analysis of rare variants, is now becoming so cheap that it provides a viable alternative to SNP genotyping. In this paper, we present cost-effective protocols for using next-generation sequencing in association mapping studies based on pooled and un-pooled samples, and identify optimal designs with respect to total number of individuals, number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon-capturing. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next-generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that with a fixed cost, sequencing many individuals at a shallower depth with larger pool size achieves higher power than sequencing a small number of individuals at a higher depth with smaller pool size, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next-generation sequencing. (c) 2010 Wiley-Liss, Inc.
The siRNA Non-seed Region and Its Target Sequences Are Auxiliary Determinants of Off-Target Effects.
Kamola, Piotr J; Nakano, Yuko; Takahashi, Tomoko; Wilson, Paul A; Ui-Tei, Kumiko
2015-12-01
RNA interference (RNAi) is a powerful tool for post-transcriptional gene silencing. However, the siRNA guide strand may bind unintended off-target transcripts via partial sequence complementarity by a mechanism closely mirroring microRNA (miRNA) silencing. To better understand these off-target effects, we investigated the correlation of sequence features within various subsections of siRNA guide strands, and of their corresponding target sequences, with off-target activities. Our results confirm previous reports that strength of base-pairing in the siRNA seed region is the primary factor determining the efficiency of off-target silencing. However, the degree of downregulation of off-target transcripts with shared seed sequence is not necessarily similar, suggesting that there are additional auxiliary factors that influence the silencing potential. Here, we demonstrate that both the melting temperature (Tm) in a subsection of the siRNA non-seed region, and the GC content of its corresponding target sequences, are negatively correlated with the efficiency of off-target effects. Analysis of experimentally validated miRNA targets demonstrated a similar trend, indicating a putative conserved feature of the seed region-dependent targeting mechanism. These observations may prove useful as parameters for off-target prediction algorithms and improve siRNA 'specificity' design rules.
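The two auxiliary features named above can be computed for any window of a guide or target sequence. This is a rough sketch: the guide sequence and the non-seed window are hypothetical, and the Wallace rule used here is a crude short-oligo approximation swapped in for illustration, not necessarily the Tm model used in the study.

```python
def gc_content(seq):
    """Fraction of G and C in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace rule: Tm ~ 2*(A+T/U) + 4*(G+C) degrees C; crude, short oligos only."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T") + seq.count("U")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


guide = "UUCGAAGUACUCAGCGUAAGU"   # hypothetical 21-nt siRNA guide strand
seed = guide[1:8]                 # positions 2-8, the commonly cited seed region
non_seed = guide[12:18]           # hypothetical non-seed window (positions 13-18)
print(seed, non_seed, gc_content(non_seed), wallace_tm(non_seed))
```

In a study like this, such per-window features would be tabulated across many siRNAs and correlated with measured off-target repression.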
Zhou, H; Miller, A W; Sosic, Z; Buchholz, B; Barron, A E; Kotler, L; Karger, B L
2000-03-01
This paper presents results on ultralong read DNA sequencing with relatively short separation times using capillary electrophoresis with replaceable polymer matrixes. In previous work, the effectiveness of mixed replaceable solutions of linear polyacrylamide (LPA) was demonstrated, and 1000 bases were routinely obtained in less than 1 h. Substantially longer read lengths have now been achieved by a combination of improved formulation of LPA mixtures, optimization of temperature and electric field, adjustment of the sequencing reaction, and refinement of the base-caller. The average molar masses of LPA used as DNA separation matrixes were measured by gel permeation chromatography and multiangle laser light scattering. Newly formulated matrixes comprising 0.5% (w/w) 270 kDa and 2% (w/w) 10 or 17 MDa LPA raised the optimum column temperature from 60 to 70 degrees C, increasing the selectivity for large DNA fragments, while maintaining high selectivity for small fragments as well. This improved resolution was further enhanced by reducing the electric field strength from 200 to 125 V/cm. In addition, because sequencing accuracy beyond 1000 bases was diminished by the low signal from G-terminated fragments when the standard reaction protocol for a commercial dye primer kit was used, the amount of these fragments was doubled. Augmenting the base-calling expert system with rules specific for low peak resolution also had a significant effect, contributing slightly less than half of the total increase in read length. With full optimization, this read length reached up to 1300 bases (average 1250) with 98.5% accuracy in 2 h for a single-stranded M13 template.
ERIC Educational Resources Information Center
Watt, Sarah Jean
2013-01-01
Research to identify validated instructional approaches to teach math to students with LD and those at-risk for failure in both core and supplemental instructional settings is necessary to assist teachers in closing the achievement gaps that exist across the country. The concrete-to-representational-to-abstract instructional sequence (CRA) has…
A Dynamic Applet for the Exploration of the Concept of the Limit of a Sequence
ERIC Educational Resources Information Center
Cheng, Kell; Leung, Allen
2015-01-01
This paper reports findings of an explorative study that examines the effectiveness of a GeoGebra-based dynamic applet in supporting students' construction of the formal definition of the limit of a sequence or convergence. More specifically, it is about how the use of the applet enables students to make connections between the graphical…
Mining dynamic noteworthy functions in software execution sequences.
Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong
2017-01-01
As the quality of crucial entities directly affects the quality of software, identifying and protecting them is an important prerequisite for effective software development, management, maintenance and testing, and thus contributes to improving software quality and its ability to defend against attacks. Most analyses and evaluations of important entities, such as code-based static structure analysis, are detached from the software's actual execution. In this paper, from the perspective of the software execution process, we propose an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, execution traces composed of a series of function addresses were acquired by decompiling the software and tracking stack changes. These traces were then modeled as execution sequences and simplified to obtain simplified sequences (SFS), from which patterns were extracted with a pattern extraction (PE) algorithm. Next, the evaluation indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in the DNFM algorithm. Finally, the functions were sorted by noteworthiness. The experimental results were compared with those of two traditional complex-network-based node mining methods, PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.
Identification of Mycobacterium spp. of veterinary importance using rpoB gene sequencing
2011-01-01
Background Studies conducted on Mycobacterium spp. isolated from human patients indicate that sequencing of a 711 bp portion of the rpoB gene can be useful in assigning a species identity, particularly for members of the Mycobacterium avium complex (MAC). Given that MAC are important pathogens in livestock, companion animals, and zoo/exotic animals, we were interested in evaluating the use of rpoB sequencing for identification of Mycobacterium isolates of veterinary origin. Results A total of 386 isolates, collected from 2008 to June 2011 from 378 animals (amphibians, reptiles, birds, and mammals), underwent PCR and sequencing of a ~711 bp portion of the rpoB gene; 310 isolates (80%) were identified to the species level based on ≥ 98% similarity with a reference sequence. The remaining 76 isolates (20%) displayed < 98% similarity with reference sequences and were assigned to a clade based on their location in a neighbor-joining tree containing reference sequences. For a subset of 236 isolates that received both 16S rRNA and rpoB sequencing, 167 (70%) received a consistent species/clade assignment with both sequencing methods. For the remaining 69 isolates, species/clade identities differed between the two sequencing methods. Mycobacterium avium subsp. hominissuis was the species most frequently isolated from specimens from pigs, cervids, companion animals, cattle, and exotic/zoo animals. Conclusions rpoB sequencing proved useful in identifying Mycobacterium isolates of veterinary origin to clade, species, or subspecies levels, particularly for assemblages (such as the MAC) where 16S rRNA sequencing alone is not adequate to demarcate these taxa. rpoB sequencing can represent a cost-effective identification tool suitable for routine use in the veterinary diagnostic laboratory. PMID:22118247
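The ≥ 98% species-assignment rule can be sketched as a percent-identity check against reference sequences. This is a minimal illustration assuming the query is already aligned to each reference at equal length (a real pipeline would align first); the reference names and sequences are hypothetical, and below the threshold the sketch only flags the isolate for tree-based clade placement rather than building a neighbor-joining tree.

```python
def percent_identity(a, b):
    """Identity over aligned columns, ignoring columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def assign(query, references, threshold=98.0):
    """Return (level, best reference, identity) following a >= threshold species rule."""
    best_name, best_pid = max(
        ((name, percent_identity(query, ref)) for name, ref in references.items()),
        key=lambda item: item[1],
    )
    level = "species" if best_pid >= threshold else "clade-needed"
    return level, best_name, best_pid


refs = {"ref_species_A": "ACGT" * 25, "ref_species_B": "TGCA" * 25}  # hypothetical 100-bp references
q_close = "ACGT" * 24 + "ACGA"     # 1 mismatch vs ref_species_A -> 99% identity
q_far = "ACGG" * 5 + "ACGT" * 20   # 5 mismatches vs ref_species_A -> 95% identity
print(assign(q_close, refs))
print(assign(q_far, refs))
```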
Directed evolution of the TALE N-terminal domain for recognition of all 5′ bases
Lamb, Brian M.; Mercer, Andrew C.; Barbas, Carlos F.
2013-01-01
Transcription activator-like effector (TALE) proteins can be designed to bind virtually any DNA sequence. General guidelines for design of TALE DNA-binding domains suggest that the 5′-most base of the DNA sequence bound by the TALE (the N0 base) should be a thymine. We quantified the N0 requirement by analysis of the activities of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R) and TALE nucleases (TALENs) with each DNA base at this position. In the absence of a 5′ T, we observed decreases of >1000-fold in TALE-TF activity, up to 100-fold in TALE-R activity and up to 10-fold in TALEN activity compared with target sequences containing a 5′ T. To develop TALE architectures that recognize all possible N0 bases, we used structure-guided library design coupled with TALE-R activity selections to evolve novel TALE N-terminal domains to accommodate any N0 base. A G-selective domain and broadly reactive domains were isolated and characterized. The engineered TALE domains selected in the TALE-R format demonstrated modularity and were active in TALE-TF and TALEN architectures. Evolved N-terminal domains provide effective and unconstrained TALE-based targeting of any DNA sequence as TALE binding proteins and designer enzymes. PMID:23980031
2013-01-01
Background SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and, for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. For each protein variation, the server outputs the probability of its being associated with human disease. Results The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions WS-SNPs&GO is a valuable tool that integrates into a single framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482
Computational tool for the early screening of monoclonal antibodies for their viscosities
Agrawal, Neeraj J; Helk, Bernhard; Kumar, Sandeep; Mody, Neil; Sathish, Hasige A.; Samra, Hardeep S.; Buck, Patrick M; Li, Li; Trout, Bernhardt L
2016-01-01
Highly concentrated antibody solutions often exhibit high viscosities, which present a number of challenges for antibody-drug development, manufacturing and administration. The antibody sequence is a key determinant for high viscosity of highly concentrated solutions; therefore, a sequence- or structure-based tool that can identify highly viscous antibodies from their sequence would be effective in ensuring that only antibodies with low viscosity progress to the development phase. Here, we present a spatial charge map (SCM) tool that can accurately identify highly viscous antibodies from their sequence alone (using homology modeling to determine the 3-dimensional structures). The SCM tool has been extensively validated at 3 different organizations, and has proved successful in correctly identifying highly viscous antibodies. As a quantitative tool, SCM is amenable to high-throughput automated analysis, and can be effectively implemented during the antibody screening or engineering phase for the selection of low-viscosity antibodies. PMID:26399600
Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding.
Nguyen, Dang; Luo, Wei; Venkatesh, Svetha; Phung, Dinh
2018-04-11
Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD (International Classification of Diseases (World Health Organization 2013)) code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.
Regularized rare variant enrichment analysis for case-control exome sequencing data.
Larson, Nicholas B; Schaid, Daniel J
2014-02-01
Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.
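The exon-based sparse group LASSO combines an elementwise L1 penalty with a groupwise L2 penalty, using genes as groups. For disjoint groups, the proximal step of that combined penalty is elementwise soft-thresholding followed by groupwise shrinkage; the sketch below shows only this step (as used inside proximal-gradient fitting, with penalties pre-multiplied by the step size), not the authors' full fitting procedure. The coefficients, groups and penalty values are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise L1 proximal step."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_group_prox(beta, groups, lam1, lam2):
    """Prox of lam1*||b||_1 + lam2*sum_g ||b_g||_2 for disjoint groups.

    Soft-threshold each coefficient, then shrink each group's vector toward
    zero, setting the whole group to zero if its norm falls below lam2.
    """
    out = soft_threshold(beta, lam1)
    for idx in groups:
        idx = np.asarray(idx)
        norm = np.linalg.norm(out[idx])
        scale = max(0.0, 1.0 - lam2 / norm) if norm > 0 else 0.0
        out[idx] = scale * out[idx]
    return out


# Hypothetical exon-level coefficients; each group is one gene's exons
beta = np.array([3.0, -0.5, 0.2, 0.1, 4.0])
genes = [[0, 1], [2, 3], [4]]
res = sparse_group_prox(beta, genes, lam1=0.3, lam2=0.5)
print(res)
```

The second gene is removed entirely (gene-level selection), while in the first gene the weak exon is shrunk more than the strong one, which is how the penalty can point to specific regions of interest within a selected gene.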
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments
NASA Astrophysics Data System (ADS)
Atwal, Gurinder S.; Kinney, Justin B.
2016-03-01
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
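Mutual-information-based inference scores a model by how much its predictions tell us about the measurements, not by how closely they match them numerically. The plug-in estimator below is a minimal sketch on hypothetical discrete data. It shows the invariance behind the intuition for diffeomorphic modes: an invertible transformation of the model output leaves mutual information unchanged, so the data cannot constrain parameter changes that act only in that way.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


scores = [0, 0, 1, 1, 2, 2, 3, 3]   # hypothetical binned model predictions
measured = ["low", "low", "low", "high", "low", "high", "high", "high"]

mi1 = mutual_information(scores, measured)
mi2 = mutual_information([10 * s + 5 for s in scores], measured)  # invertible rescaling
print(mi1, mi2)
```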
Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D
2004-10-01
Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.
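A toy example shows how optimization-based (single best state) reconstruction can bias ancestral nucleotide frequencies when sites are uncertain, while averaging over the posterior does not. The posteriors here are hypothetical, and the sketch illustrates only the general effect, not the conditional pathway methods described above.

```python
import math
from collections import Counter


def map_frequencies(posteriors):
    """Frequencies from calling the single most probable state at each site."""
    calls = Counter(max(p, key=p.get) for p in posteriors)
    n = len(posteriors)
    return {b: calls.get(b, 0) / n for b in "ACGT"}


def expected_frequencies(posteriors):
    """Frequencies averaged over the full posterior distribution at each site."""
    n = len(posteriors)
    return {b: sum(p.get(b, 0.0) for p in posteriors) / n for b in "ACGT"}


# Hypothetical ancestral posteriors: every site is ambiguous between A and G
posteriors = [{"A": 0.6, "G": 0.4}] * 10
print(map_frequencies(posteriors))       # A is called at every site
print(expected_frequencies(posteriors))  # A and G keep their posterior weights
```

Calling the best state at every ambiguous site inflates the favored base to 100%, whereas the posterior average recovers 60% A and 40% G.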
NASA Technical Reports Server (NTRS)
Zorumski, W. E.
1983-01-01
Analytic propeller noise prediction involves a sequence of computations culminating in the application of acoustic equations. The prediction sequence currently used by NASA in its ANOPP (aircraft noise prediction) program is described. The elements of the sequence are called program modules. The first group of modules analyzes the propeller geometry, the aerodynamics, including both potential and boundary layer flow, the propeller performance, and the surface loading distribution. This group of modules is based entirely on aerodynamic strip theory. The next group of modules deals with the actual noise prediction, based on data from the first group. Deterministic predictions of periodic thickness and loading noise are made using Farassat's time-domain methods. Broadband noise is predicted by the semi-empirical Schlinker-Amiet method. Near-field predictions of fuselage surface pressures include the effects of boundary layer refraction and (for a cylinder) scattering. Far-field predictions include atmospheric and ground effects. Experimental data from subsonic and transonic propellers are compared, and NASA's future directions in propeller noise technology development are indicated.
Reducing DNA context dependence in bacterial promoters
Carr, Swati B.; Densmore, Douglas M.
2017-01-01
Variation in the DNA sequence upstream of bacterial promoters is known to affect the expression levels of the products they regulate, sometimes dramatically. While neutral synthetic insulator sequences have been found to buffer promoters from upstream DNA context, there are no established methods for designing effective insulator sequences with predictable effects on expression levels. We address this problem with Degenerate Insulation Screening (DIS), a novel method based on a randomized 36-nucleotide insulator library and a simple, high-throughput, flow-cytometry-based screen that randomly samples from a library of 4^36 potential insulated promoters. The results of this screen can then be compared against a reference uninsulated device to select a set of insulated promoters providing a precise level of expression. We verify this method by insulating the constitutive, inducible, and repressible promoters of a four transcriptional-unit inverter (NOT-gate) circuit, finding both that order dependence is largely eliminated by insulation and that circuit performance is also significantly improved, with a 5.8-fold mean improvement in on/off ratio. PMID:28422998
Gowrisankar, Sivakumar; Lerner-Ellis, Jordan P; Cox, Stephanie; White, Emily T; Manion, Megan; LeVan, Kevin; Liu, Jonathan; Farwell, Lisa M; Iartchouk, Oleg; Rehm, Heidi L; Funke, Birgit H
2010-11-01
Medical sequencing for diseases with locus and allelic heterogeneities has been limited by the high cost and low throughput of traditional sequencing technologies. "Second-generation" sequencing (SGS) technologies allow the parallel processing of a large number of genes and, therefore, offer great promise for medical sequencing; however, their use in clinical laboratories is still in its infancy. Our laboratory offers clinical resequencing for dilated cardiomyopathy (DCM) using an array-based platform that interrogates 19 of more than 30 genes known to cause DCM. We explored both the feasibility and cost effectiveness of using PCR amplification followed by SGS technology for sequencing these 19 genes in a set of five samples enriched for known sequence alterations (109 unique substitutions and 27 insertions and deletions). While the analytical sensitivity for substitutions was comparable to that of the DCM array (98%), SGS technology performed better than the DCM array for insertions and deletions (90.6% versus 58%). Overall, SGS performed substantially better than did the current array-based testing platform; however, the operational cost and projected turnaround time do not meet our current standards. Therefore, efficient capture methods and/or sample pooling strategies that shorten the turnaround time and decrease reagent and labor costs are needed before implementing this platform into routine clinical applications.
Klein, Donald A.; Flores, Romeo M.; Venot, Christophe; Gabbert, Kendra; Schmidt, Raleigh; Stricker, Gary D.; Pruden, Amy; Mandernack, Kevin
2008-01-01
Coalbed methane regeneration is of increasing interest, and is gaining global attention with respect to enhancement of gas recovery. The objective of this study is to determine if there are differences in methanogen nucleic acid sequences associated with low rank coals from the Powder River Basin, Wyoming, in comparison with sequences that can be recovered from coal bed-associated produced waters. Based on results obtained to date, the sequences from the coals appear to be associated with putatively deep-rooted thermophilic autotrophic methanogens, whereas the sequences from the waters are associated with thermophilic autotrophic and heterotrophic methanogens. The recovered sequences associated with coal thus appear to be both phylogenetically and functionally distinct from those that are more closely associated with the produced water. To be able to relate such recovered sequences to organisms that might be present and possibly active in these environments, it is suggested that direct observation, followed by isolation and single cell-based physiological/molecular analyses, be used to characterize methanogenic consortia possibly associated with coals and/or produced waters. It is also important to characterize the microenvironment where these microbes might be found, in both ecological and geological contexts, to be able to develop effective, ecologically relevant coalbed methane regeneration processes.
NASA Astrophysics Data System (ADS)
Liu, Yan; Deng, Honggui; Ren, Shuang; Tang, Chengying; Qian, Xuewen
2018-01-01
We propose an efficient partial transmit sequence technique based on a genetic algorithm and a peak-value optimization algorithm (GAPOA) to reduce the high peak-to-average power ratio (PAPR) in visible light communication systems based on orthogonal frequency division multiplexing (VLC-OFDM). Based on an analysis of the strengths and weaknesses of the hill-climbing algorithm, we propose the POA, which has excellent local search ability, to further process signals whose PAPR still exceeds the threshold after processing by the genetic algorithm (GA). To verify the effectiveness of the proposed technique and algorithm, we evaluate the PAPR and bit error rate (BER) performance and compare them with the partial transmit sequence (PTS) technique based on GA (GA-PTS), the PTS technique based on genetic and hill-climbing algorithms (GH-PTS), and PTS based on the shuffled frog leaping algorithm and hill-climbing algorithm (SFLAHC-PTS). The results show that our technique and algorithm achieve not only better PAPR performance but also lower computational complexity and BER than the GA-PTS, GH-PTS, and SFLAHC-PTS techniques.
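For orientation, the PTS idea that GAPOA accelerates can be sketched with an exhaustive search over phase factors; the GA and POA stages replace this brute-force loop with a heuristic search. This is a minimal sketch under stated assumptions: complex baseband samples (real VLC-OFDM systems usually impose Hermitian symmetry to obtain a real signal, omitted here), adjacent equal-size subblocks, phase factors restricted to {+1, -1}, a naive O(N^2) inverse DFT, and a hypothetical 16-subcarrier QPSK symbol.

```python
import cmath
import itertools
import math


def idft(spectrum):
    """Naive inverse DFT (O(N^2)); adequate for a small illustrative OFDM symbol."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def papr_db(samples):
    """Peak-to-average power ratio of a block of complex samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def pts_exhaustive(spectrum, num_blocks=4, phases=(1, -1)):
    """Exhaustive PTS search: return (best PAPR in dB, phase vector).

    Subcarriers are split into adjacent, equal-size subblocks; each subblock is
    transformed separately and the time-domain parts are recombined with one
    phase factor per subblock.
    """
    n = len(spectrum)
    size = n // num_blocks
    parts = [
        idft([spectrum[k] if v * size <= k < (v + 1) * size else 0 for k in range(n)])
        for v in range(num_blocks)
    ]
    best = None
    # Fix the first phase to 1: a common phase rotation does not change PAPR.
    for rest in itertools.product(phases, repeat=num_blocks - 1):
        b = (1,) + rest
        x = [sum(b[v] * parts[v][t] for v in range(num_blocks)) for t in range(n)]
        candidate = (papr_db(x), b)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


# Hypothetical 16-subcarrier QPSK symbol.
qpsk = [complex(1 if k % 3 else -1, 1 if k % 2 else -1) for k in range(16)]
original = papr_db(idft(qpsk))
reduced, phase_vector = pts_exhaustive(qpsk)
print(f"{original:.2f} dB -> {reduced:.2f} dB with phases {phase_vector}")
```

Because the all-ones phase vector reproduces the original signal, the search can never do worse than no PTS; its cost grows as 2^(V-1) in the number of subblocks V, which is what motivates heuristic searches such as GA.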
Solution to a gene divergence problem under arbitrary stable nucleotide transition probabilities
NASA Technical Reports Server (NTRS)
Holmquist, R.
1976-01-01
A nucleic acid chain, L nucleotides in length, with the specific base sequence B(1)B(2) ... B(L) is defined by the L-dimensional vector B = (B(1), B(2), ..., B(L)). Given twelve constant non-negative transition probabilities that, at a specified position, base B is replaced by base B' in a single step, an exact analytical expression is derived for the probability that the position goes from base B to B' in X steps. Assuming that each base mutates independently of the others, an exact expression is derived for the probability that the initial gene sequence B goes to a sequence B' = (B'(1), B'(2), ..., B'(L)) after X = (X(1), X(2), ..., X(L)) base replacements. The resulting equations allow a more precise accounting for the effects of Darwinian natural selection in molecular evolution than does the idealized (biologically less accurate) assumption that each of the four nucleotides is equally likely to mutate to and be fixed as one of the other three. Illustrative applications of the theory to some problems of biological evolution are given.
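The single-site model can be illustrated numerically: the twelve off-diagonal probabilities fill a 4x4 per-step transition matrix P, the X-step probability from B to B' is the corresponding entry of P^X, and under site independence the probability for a whole sequence is the product over positions. This is a minimal sketch, not the paper's closed-form derivation: it computes P^X by repeated multiplication, and it assumes any probability left over in a row stays on the diagonal (zero when every step is a strict replacement). The uniform 1/3 case below is the idealized assumption the abstract contrasts with.

```python
BASES = "ACGT"


def transition_matrix(off_diagonal):
    """Build a 4x4 per-step transition matrix from the 12 off-diagonal probabilities.

    off_diagonal: dict mapping (from_base, to_base), from_base != to_base, to a
    probability. Leftover probability in a row is placed on the diagonal
    (it is 0 when every step is a strict base replacement).
    """
    matrix = []
    for i, b in enumerate(BASES):
        row = [0.0 if b2 == b else off_diagonal.get((b, b2), 0.0) for b2 in BASES]
        stay = 1.0 - sum(row)
        if stay < -1e-12:
            raise ValueError(f"probabilities out of {b} sum to more than 1")
        row[i] = max(stay, 0.0)
        matrix.append(row)
    return matrix


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def mat_pow(p, steps):
    """P**steps by repeated multiplication (steps is a small non-negative integer)."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result


def sequence_probability(start, end, steps, p):
    """Probability that sequence start becomes end after steps[i] events at site i,
    assuming sites evolve independently under the same matrix p."""
    prob = 1.0
    for b, b2, x in zip(start, end, steps):
        prob *= mat_pow(p, x)[BASES.index(b)][BASES.index(b2)]
    return prob


# Idealized case: each base is equally likely to become any of the other three.
uniform = transition_matrix({(b, b2): 1 / 3 for b in BASES for b2 in BASES if b != b2})
print(sequence_probability("AC", "CC", [1, 2], uniform))  # (1/3) * (1/3)
```

Replacing the uniform dictionary with twelve unequal probabilities is all that is needed to move from the idealized model to the general one.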
2016-01-01
Background: Metabarcoding is becoming a common tool used to assess and compare the diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process, and several taxonomy assignment methods have been proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called the tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on the relative placements of OTUs and reference sequences on the cladogram and the support that these placements receive. New information: In the tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires a high-quality reference dataset. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences, as well as by the alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of the tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have a detrimental effect on the resolution of cladograms used in the tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topologies and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives the highest resolution for the particular reference dataset. Completing the above-mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach. PMID:27932919
Holovachov, Oleksandr
2016-01-01
Metabarcoding is becoming a common tool used to assess and compare the diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process, and several taxonomy assignment methods have been proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called the tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on the relative placements of OTUs and reference sequences on the cladogram and the support that these placements receive. In the tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires a high-quality reference dataset. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences, as well as by the alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of the tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have a detrimental effect on the resolution of cladograms used in the tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topologies and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives the highest resolution for the particular reference dataset. Completing the above-mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach.
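The placement rule of the tree-based approach can be sketched as follows. This is a minimal sketch under stated assumptions, not the publication's procedure: the cladogram is already inferred, with bootstrap support stored on internal nodes; an OTU receives the taxon of the smallest clade that contains it, reaches a support threshold (70 here, a hypothetical cutoff), and whose reference sequences all belong to a single taxon; otherwise it stays unassigned. The `Node` class and all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: Optional[str] = None  # leaf label (OTU or reference sequence)
    support: float = 100.0  # bootstrap support for the clade below this node
    children: List["Node"] = field(default_factory=list)


def leaf_names(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def path_to_leaf(node: Node, target: str) -> Optional[List[Node]]:
    """Nodes from the root down to the leaf named target, or None if absent."""
    if not node.children:
        return [node] if node.name == target else None
    for child in node.children:
        sub = path_to_leaf(child, target)
        if sub is not None:
            return [node] + sub
    return None


def assign_otu(tree: Node, otu: str, ref_taxa: Dict[str, str],
               min_support: float = 70.0) -> Optional[str]:
    """Return the taxon for otu, or None if it stays unassigned."""
    path = path_to_leaf(tree, otu)
    if path is None:
        raise KeyError(otu)
    for clade in reversed(path[:-1]):  # from the OTU's parent up to the root
        if clade.support < min_support:
            continue  # poorly supported clades are not used for assignment
        taxa = {ref_taxa[n] for n in leaf_names(clade) if n in ref_taxa}
        if len(taxa) == 1:
            return taxa.pop()  # supported clade whose references share one taxon
        if len(taxa) > 1:
            return None  # references from several taxa: not monophyletic
    return None


tree = Node(children=[
    Node(support=95, children=[Node("otu1"), Node("refA1")]),
    Node(support=40, children=[Node("refB1"), Node("otu2")]),
])
refs = {"refA1": "Genus A", "refB1": "Genus B"}
print(assign_otu(tree, "otu1", refs), assign_otu(tree, "otu2", refs))  # Genus A None
```

Under this rule, a single mislabeled reference inside an otherwise clean clade makes the clade mixed and leaves its OTUs unassigned, which is why excluding erroneous sequences beforehand matters.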
Liu, Bin; Wang, Shanyi; Dong, Qiwen; Li, Shumin; Liu, Xuan
2016-04-20
DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. With the rapid development of next-generation sequencing techniques, the number of protein sequences is increasing at an unprecedented rate. Thus it is necessary to develop computational methods to identify DNA-binding proteins based only on protein sequence information. In this study, a novel method called iDNA-KACC is presented, which combines the Support Vector Machine (SVM) and the auto-cross covariance transformation. The protein sequences are first converted into a profile-based protein representation, and then converted into a series of fixed-length vectors by the auto-cross covariance transformation with Kmer composition. The sequence-order effect can be effectively captured by this scheme. These vectors are then fed into the SVM to discriminate DNA-binding proteins from non-DNA-binding ones. iDNA-KACC achieves an overall accuracy of 75.16% and a Matthews correlation coefficient of 0.5 in a rigorous jackknife test. Its performance is further improved by employing an ensemble learning approach, and the improved predictor is called iDNA-KACC-EL. Experimental results on an independent dataset show that iDNA-KACC-EL outperforms all the other state-of-the-art predictors, indicating that it would be a useful computational tool for DNA-binding protein identification.
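The fixed-length vectors come from an auto-cross covariance (ACC) transformation; a commonly used form is sketched below. Assumptions: the profile is a plain L x D matrix of per-residue scores (the exact profile construction and its combination with Kmer composition in iDNA-KACC are not reproduced), the function name is hypothetical, and the SVM stage is omitted.

```python
from typing import List


def acc_features(profile: List[List[float]], max_lag: int) -> List[float]:
    """Auto-cross covariance features of a per-residue profile.

    profile: L rows (residues), each with D scores (e.g. 20 profile columns).
    Returns max_lag * D * D values: for each lag g and each pair of columns
    (i1, i2), the average product of mean-centred scores g residues apart.
    Entries with i1 == i2 are auto covariances; the rest are cross covariances.
    """
    length, dims = len(profile), len(profile[0])
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    means = [sum(row[i] for row in profile) / length for i in range(dims)]
    features = []
    for g in range(1, max_lag + 1):
        for i1 in range(dims):
            for i2 in range(dims):
                total = sum(
                    (profile[j][i1] - means[i1]) * (profile[j + g][i2] - means[i2])
                    for j in range(length - g)
                )
                features.append(total / (length - g))
    return features


toy = [[1, 0], [0, 1], [1, 0], [0, 1]]  # hypothetical 4-residue, 2-column profile
print(acc_features(toy, max_lag=1))  # [-0.25, 0.25, 0.25, -0.25]
```

The output length depends only on `max_lag` and D, not on the protein length L, which is what lets proteins of different lengths be fed to the same classifier; the lag terms are how sequence-order information enters the vector.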
Human Y chromosome copy number variation in the next generation sequencing era and beyond.
Massaia, Andrea; Xue, Yali
2017-05-01
The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches that have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules, and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.
NASA Astrophysics Data System (ADS)
Bridges, Jon P.
Improving the STEM readiness of students from historically underserved groups is a moral and economic imperative requiring greater attention and effort than has been shown to date. The current literature suggests that a high school science sequence beginning with physics and centered on developing conceptual understanding, using inquiry labs and modeling to allow students to explore new ideas, and addressing and correcting student misconceptions can increase student interest in and preparation for STEM careers. The purpose of this study was to determine whether the science college readiness of historically underserved students can be improved by implementing an inquiry-based high school science sequence composed of coursework in physics, chemistry, and biology for every student. The study used a retrospective cohort observational design to address the primary research question: are there differences between historically underserved students completing a Physics First science sequence and their peers completing a traditional science sequence in 1) science college-readiness test scores, 2) rates of science college- and career-readiness, and 3) interest in STEM? Small positive effects were found for all three outcomes for historically underserved students in the Physics First sequence.
Research on Image Encryption Based on DNA Sequence and Chaos Theory
NASA Astrophysics Data System (ADS)
Zhang, Tian; Yan, Shan Jun; Gu, Cheng Yan; Ren, Ran; Liao, Kai Xin
2018-04-01
Nowadays, encryption is a common technique for protecting image data from unauthorized access. In recent years, many researchers have proposed encryption algorithms based on DNA sequences, providing a new idea for the design of image encryption algorithms. Therefore, this paper proposes a new image encryption method based on DNA computing technology, in which the original image is encrypted using DNA coding and a 1-D logistic chaotic map. First, the algorithm uses two modules as the encryption key: the first module uses a real DNA sequence, and the second is generated by a one-dimensional logistic chaotic map. Second, the algorithm uses DNA complementary rules to encode the original image, and uses the key and DNA computing technology to compute each pixel value of the original image, so as to encrypt the whole image. Simulation results show that the algorithm has a good encryption effect and security.
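A heavily simplified sketch of the DNA-coding plus logistic-map idea follows. It is illustrative only, not the paper's algorithm and not secure for real use. Assumptions: a single fixed coding rule (A=00, C=01, G=10, T=11), the logistic parameter r = 3.99 and seed x0 act as the key, and key bytes are taken as floor(256 x). The paper's real-DNA-sequence key module and complementary rules are omitted; note that with one fixed rule, base-wise DNA XOR is equivalent to bitwise XOR of the bytes.

```python
ENCODING = "ACGT"  # one of the eight valid DNA coding rules: A=00, C=01, G=10, T=11


def byte_to_dna(value: int) -> str:
    """Encode one 8-bit pixel as four bases, most significant bits first."""
    return "".join(ENCODING[(value >> shift) & 0b11] for shift in (6, 4, 2, 0))


def dna_to_byte(bases: str) -> int:
    value = 0
    for base in bases:
        value = (value << 2) | ENCODING.index(base)
    return value


def dna_xor(a: str, b: str) -> str:
    """Base-wise XOR of two DNA strings under the coding rule above."""
    return "".join(ENCODING[ENCODING.index(x) ^ ENCODING.index(y)] for x, y in zip(a, b))


def logistic_keystream(x0: float, count: int, r: float = 3.99, burn_in: int = 100):
    """Key bytes from the 1-D logistic map x -> r*x*(1-x), after discarding transients."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    keys = []
    for _ in range(count):
        x = r * x * (1 - x)
        keys.append(int(x * 256) % 256)
    return keys


def encrypt(pixels, x0: float):
    keys = logistic_keystream(x0, len(pixels))
    return [dna_to_byte(dna_xor(byte_to_dna(p), byte_to_dna(k))) for p, k in zip(pixels, keys)]


decrypt = encrypt  # XOR with the same keystream undoes itself


image = [0, 17, 128, 255, 42]  # hypothetical flattened 8-bit grayscale pixels
cipher = encrypt(image, x0=0.3141)
print(byte_to_dna(27), decrypt(cipher, x0=0.3141) == image)  # ACGT True
```

Published schemes typically vary the coding rule per pixel or block and add permutation stages; this sketch only shows the encode, combine with chaotic key, and decode pipeline.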
NASA Astrophysics Data System (ADS)
Yang, Hongxin; Su, Fulin
2018-01-01
We propose a moving target analysis algorithm using speeded-up robust features (SURF) and regular moments in inverse synthetic aperture radar (ISAR) image sequences. In our study, we first extract interest points from ISAR image sequences using SURF. Unlike traditional feature point extraction methods, SURF-based feature points are invariant to scattering intensity, target rotation, and image size. Then, we employ a bilateral feature registering model to match these feature points. The feature registering scheme can not only search for isotropic feature points to link the image sequences but also reduce mismatched pairs. After that, the target centroid is detected using regular moments. Subsequently, a cost function based on the correlation coefficient is adopted to analyze the motion information. Experimental results based on simulated and real data validate the effectiveness and practicability of the proposed method.
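The centroid step relies on regular (geometric) moments: with M_pq = sum over pixels of x^p y^q I(x, y), the centroid is (M10/M00, M01/M00). The sketch below assumes each ISAR frame is a plain 2-D list of non-negative intensities indexed as image[y][x]; the frames are hypothetical, and the SURF registration and correlation-coefficient cost function are not shown.

```python
def raw_moment(image, p, q):
    """Regular moment M_pq = sum over pixels of x**p * y**q * intensity (image[y][x])."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


def centroid(image):
    m00 = raw_moment(image, 0, 0)
    if m00 == 0:
        raise ValueError("image has zero total intensity")
    return raw_moment(image, 1, 0) / m00, raw_moment(image, 0, 1) / m00


frames = [  # hypothetical 3x3 frames with a bright scatterer moving right
    [[0, 0, 0], [4, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 4, 0], [0, 0, 0]],
]
centres = [centroid(f) for f in frames]
print(centres)  # [(0.0, 1.0), (1.0, 1.0)]
```

Differencing successive centroids gives a per-frame displacement estimate that can then feed a motion analysis.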
NASA Astrophysics Data System (ADS)
Mangia, Mauro; Pareschi, Fabio; Rovatti, Riccardo; Setti, Gianluca
This paper presents a way to cope with the need to simultaneously reject narrowband interference and multi-access interference in a UWB system based on direct-sequence CDMA. To this end, we rely on a closed-form expression for the system bit error probability in the presence of both effects. Using this formula, we evaluate the effect of spectrum-shaping techniques applied to the spreading sequences. The availability of a certain number of degrees of freedom in choosing the spectral profile allows us to cope with different configurations, depending not only on the relative interfering power but also on the relative position of the signal center frequency and the narrowband interferer.
NASA Technical Reports Server (NTRS)
Heuer, H.; Spijkers, W.; Kiesswetter, E.; Schmidtke, V.
1998-01-01
Tacit knowledge is part of many professional skills and can be studied experimentally with implicit-learning paradigms. The authors explored the effects of 2 different stressors, loss of sleep and mental fatigue, on implicit learning in a serial-response time (RT) task. In the 1st experiment, 1 night of sleep deprivation was shown to impair implicit but not explicit sequence learning. In the 2nd experiment, neither type of sequence learning was impaired after 1.5 hr of mental work. Serial-RT performance, in contrast, suffered from both stressors. These findings suggest that sleep deprivation induces specific risks for automatic, skill-based behavior that are not present in consciously controlled performance.
Zillmann, M; Limauro, S E; Goodchild, J
1997-01-01
By truncating helix II to two base pairs in a hammerhead ribozyme having long flanking sequences (greater than 30 bases), the rate of cleavage in 1 mM magnesium can be increased roughly 100-fold. Replacing most of the nucleotides in a typical stem-loop II with 1-4 randomized nucleotides gave an RNA library that, even before selection, was more active in 1 mM magnesium than the parent ribozyme, but considerably less active than the truncated stem-loop II ribozyme. A novel, multiround selection for intermolecular cleavage was exploited to optimize this library for cleavage in low concentrations of magnesium. After three rounds of selection at sequentially lower concentrations of magnesium, the library cleaved substrate RNA 20-fold faster than the initial pool and was cloned. This pool was heavily enriched for one particular sequence (5'-CGUG-3') that represented 16 of 52 isolates (the next most common sequence was represented only six times). This sequence was also the most active, exceeding the activity of the short helix II variant under the conditions of the selection, thereby demonstrating the effectiveness of the selection technique. Analysis of the cleavage rates of RNAs made from eight isolates having different four-base insert sequences allowed assignment of highly preferred bases at each position in the insert. Analysis of pool clones having inserts of differing lengths showed that, in general, activity decreased as the length of the insert decreased from 4 to 1. This supports the suggested role of stem-loop II in stabilizing the non-Watson-Crick interactions between the conserved bases of the catalytic core. PMID:9214657
USDA-ARS's Scientific Manuscript database
Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylat...
The comparison between motor imagery and verbal rehearsal on the learning of sequential movements
Saimpont, Arnaud; Lafleur, Martin F.; Malouin, Francine; Richards, Carol L.; Doyon, Julien; Jackson, Philip L.
2013-01-01
Mental practice refers to the cognitive rehearsal of a physical activity. It is widely used by athletes to enhance their performance, and its efficacy in helping to train motor function in people with physical disabilities is now recognized. Mental practice is generally based on motor imagery (MI), i.e., the conscious simulation of a movement without its actual execution. It may also be based on verbal rehearsal (VR), i.e., the silent rehearsal of the labels associated with an action. In this study, the effect of MI or VR training on the learning and retention of a foot-sequence task was investigated. Thirty right-footed subjects, aged between 22 and 37 years (mean: 27.4 ± 4.1 years) and randomly assigned to one of three groups, practiced a serial reaction time task involving a sequence of three dorsiflexions and three plantar flexions with the left foot. One group (n = 10) mentally practiced the sequence with MI for 5 weeks, another group (n = 10) mentally practiced the sequence with VR of the foot positions for the same duration, and a control group (n = 10) did not practice the sequence mentally. The time to perform the practiced sequence as well as an unpracticed sequence was recorded before training, immediately after training and 6 months after training (retention). The main results showed that the speed improvement after training was significantly greater in the MI group compared to the control group and tended to be greater in the VR group compared to the control group. The improvement in performance did not differ between the MI and VR groups. At retention, however, no difference in response times was found among the three groups, indicating that the effect of mental practice did not last over a long period without training. Interestingly, this pattern of results was similar for the practiced and non-practiced sequences. 
Overall, these results suggest that both MI training and VR help to improve motor performance and that mental practice may induce non-specific effects. PMID:24302905
Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong
2015-01-01
Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequences are becoming increasingly important. The majority of current algorithms rely on feature-based prediction, but their accuracy still needs to be improved. Here we propose a sequence-based hybrid algorithm, SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder), which merges a feature predictor, SNBRFinderF, and a template predictor, SNBRFinderT. SNBRFinderF was built using a support vector machine whose inputs include the sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with a sequence alignment algorithm based on profile hidden Markov models to capture weakly homologous templates of the query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and that SNBRFinderT can achieve performance comparable to structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has clear advantages over existing sequence-based prediction algorithms. The value of our algorithm is highlighted by an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Garrido-Martín, Diego; Pazos, Florencio
2018-02-27
The exponential accumulation of new sequences in public databases is expected to improve the performance of all approaches for predicting protein structural and functional features. Nevertheless, this has never been assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as their only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change over time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do this by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows us to quantify the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. These results are informative for the methods' developers and final users, and may have implications for the design of new sequencing initiatives.
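Detecting fully conserved positions, the simpler of the two residue types mentioned, can be sketched with per-column Shannon entropy: a gap-free column made of a single residue type has entropy 0. This is a minimal sketch, not any of the five evaluated methods; the gap handling (columns containing gaps are never called fully conserved) and the toy alignment are assumptions, and family-dependent (subfamily) conservation is not covered.

```python
import math
from collections import Counter
from typing import List, Optional


def column_entropy(column: str, gap: str = "-") -> Optional[float]:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return None
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def fully_conserved_positions(alignment: List[str], gap: str = "-") -> List[int]:
    """0-based columns in which every sequence has the same residue and no gap."""
    hits = []
    for i in range(len(alignment[0])):
        column = "".join(seq[i] for seq in alignment)
        if gap not in column and column_entropy(column, gap) == 0:
            hits.append(i)
    return hits


msa = ["MKVL", "MRVL", "MKIL"]  # hypothetical toy alignment
print(fully_conserved_positions(msa))  # [0, 3]
```

Rerunning such a function on alignments built from successive database snapshots is one simple way to see how the set of conserved positions shifts as more sequences are added.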
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ton, H.; Yeung, E.S.
1997-02-15
An integrated on-line prototype for coupling a microreactor to capillary electrophoresis for DNA sequencing has been demonstrated. A dye-labeled terminator cycle-sequencing reaction is performed in a fused-silica capillary. Subsequently, the sequencing ladder is directly injected into a size-exclusion chromatographic column operated at nearly 95 °C for purification. On-line injection to a capillary for electrophoresis is accomplished at a junction set at nearly 70 °C. High temperature at the purification column and injection junction prevents the renaturation of DNA fragments during on-line transfer without affecting the separation. The high solubility of DNA in and the relatively low ionic strength of 1 x TE buffer permit both effective purification and electrokinetic injection of the DNA sample. The system is compatible with highly efficient separations by a replaceable poly(ethylene oxide) polymer solution in uncoated capillary tubes. Future automation and adaptation to a multiple-capillary array system should allow high-speed, high-throughput DNA sequencing from templates to called bases in one step. 32 refs., 5 figs.
NASA Astrophysics Data System (ADS)
Zhang, Chongfu; Qiu, Kun; Zhou, Heng; Ling, Yun; Wang, Yawei; Xu, Bo
2010-03-01
In this paper, tunable multiple optical orthogonal codes sequences (MOOCS)-based optical labels for optical packet switching (OPS) (MOOCS-OPS) are experimentally demonstrated for the first time. The tunable MOOCS-based optical label is generated using a group of fiber Bragg grating (FBG)-based optical en/decoders and optical switches configured by a Field Programmable Gate Array (FPGA), and the optical label is erased using a Semiconductor Optical Amplifier (SOA). Waveforms of the MOOCS-based optical label and of the optical packet, including the MOOCS-based optical label and the payloads, are obtained; the switching control mechanism and the switching matrix are discussed; and the bit error rate (BER) performance of the system is also studied. These experimental results show that the tunable MOOCS-OPS scheme is effective.
PomBase: a comprehensive online resource for fission yeast
Wood, Valerie; Harris, Midori A.; McDowall, Mark D.; Rutherford, Kim; Vaughan, Brendan W.; Staines, Daniel M.; Aslett, Martin; Lock, Antonia; Bähler, Jürg; Kersey, Paul J.; Oliver, Stephen G.
2012-01-01
PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance. PMID:22039153
Learner Attention to Form in ACCESS Task-Based Interaction
ERIC Educational Resources Information Center
Dao, Phung; Iwashita, Noriko; Gatbonton, Elizabeth
2017-01-01
This study explored the potential effects, on learner attention to form in task-based interaction, of communicative tasks developed using Automatization in Communicative Contexts of Essential Speech Sequences (ACCESS), a reformulation of task-based language teaching that includes the automatization of language elements as one of its goals. The…
Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle.
van Binsbergen, Rianne; Calus, Mario P L; Bink, Marco C A M; van Eeuwijk, Fred A; Schrooten, Chris; Veerkamp, Roel F
2015-09-17
In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data. Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training. Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed. Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. 
To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.
Music performance and the perception of key.
Thompson, W F; Cuddy, L L
1997-02-01
The effect of music performance on perceived key movement was examined. Listeners judged key movement in sequences presented without performance expression (mechanical) in Experiment 1 and with performance expression in Experiment 2. Modulation distance varied. Judgments corresponded to predictions based on the cycle of fifths and toroidal models of key relatedness, with the highest correspondence for performed versions with the toroidal model. In Experiment 3, listeners compared mechanical sequences with either performed sequences or modifications of performed sequences. Modifications preserved expressive differences between chords, but not between voices. Predictions from Experiments 1 and 2 held only for performed sequences, suggesting that differences between voices are informative of key movement. Experiment 4 confirmed that modifications did not disrupt musicality. Analyses of performances further suggested a link between performance expression and key.
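One of the two key-relatedness models, the cycle of fifths, can be made concrete as a distance between keys. This is a minimal sketch under stated assumptions: major keys only, sharp spellings for tonics, and "distance" taken as the shortest number of steps around the cycle; the study's exact prediction scheme and the toroidal model are not reproduced.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fifths_position(tonic: str) -> int:
    """Index of a major key on the cycle of fifths (C=0, G=1, D=2, ..., F=11).

    Moving up a fifth adds 7 semitones; because 7 * 7 = 49 = 1 (mod 12),
    multiplying a pitch class by 7 converts it to its position on the cycle.
    """
    return (NOTE_NAMES.index(tonic) * 7) % 12


def fifths_distance(key_a: str, key_b: str) -> int:
    """Shortest number of steps between two major keys around the cycle (0 to 6)."""
    d = abs(fifths_position(key_a) - fifths_position(key_b))
    return min(d, 12 - d)


for target in ["G", "F", "D", "F#"]:
    print("C ->", target, fifths_distance("C", target))  # distances 1, 1, 2, 6
```

Under this model, a modulation from C major to G or F major counts as close, while C to F# major is the most distant possible move.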
Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen
2015-04-15
In order to develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent DNA sequences. To facilitate the study of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups composed of 15 features. The first group calculates three nucleic acid composition features describing the local sequence information by means of kmers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated from both built-in and user-defined properties using repDNA. The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. Contact: bliu@insun.hit.edu.cn or kcchou@gordonlifescience.org. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
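The simplest of these representations, kmer composition, can be sketched in plain Python. This is not repDNA's interface (its functions and parameters are not reproduced here); the function name is hypothetical, and skipping windows that contain non-ACGT characters is an assumption.

```python
from itertools import product
from typing import List


def kmer_composition(seq: str, k: int = 2, alphabet: str = "ACGT") -> List[float]:
    """Normalized frequencies of all len(alphabet)**k k-mers, in lexicographic order.

    Windows containing characters outside the alphabet (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


features = kmer_composition("ACGTACGT", k=2)
print(len(features))  # 16
```

The vector length is fixed at 4^k regardless of sequence length, which is what makes it usable as classifier input; it discards sequence order beyond k bases, the gap that the autocorrelation and pseudo nucleotide composition groups are meant to fill.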
Xia, Xia-Yu; Ge, Meng; Hsi, Jenny H; He, Xiang; Ruan, Yu-Hua; Wang, Zhi-Xin; Shao, Yi-Ming; Pan, Xian-Ming
2014-01-01
Accurate estimates of HIV-1 incidence are essential for monitoring epidemic trends and evaluating intervention efforts. However, the long asymptomatic stage of HIV-1 infection makes it difficult to distinguish incident infections from chronic ones. Current incidence assays based on serology or on viral sequence diversity both still lack accuracy. In the present work, a sequence clustering based diversity (SCBD) assay was devised by exploiting two facts: viral sequences derived from each transmitted/founder (T/F) strain tend to cluster together at an early stage, and only the intra-cluster diversity is correlated with the time since HIV-1 infection. Dot-matrix pairwise alignment was used to eliminate the disproportionate impact of insertions/deletions (indels) and recombination events, and the proportion of clusterable sequences (Pc) was used as an index to identify late chronic infections with reduced viral genetic diversity. Tested on a dataset of 398 incident and 163 chronic infection cases collected from the Los Alamos HIV database (last modified 2/8/2012), the SCBD method achieved 99.5% sensitivity and 98.8% specificity, with an overall accuracy of 99.3%. Further analysis and evaluation also suggested that its performance was not affected by factors such as viral subtype and transmission route. The SCBD method demonstrates the potential of sequencing-based techniques for identifying incident infections. Its use may be most advantageous in settings with low to moderate incidence relative to available resources. The online service is available at http://www.bioinfo.tsinghua.edu.cn:8080/SCBD/index.jsp.
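The SCBD idea is to measure diversity within clusters rather than across the whole sample, and to report the proportion of clusterable sequences (Pc). The sketch below illustrates that logic under several assumptions. It takes sequences that are already aligned and of equal length, and it does not reproduce the paper's dot-matrix alignment. It uses single-linkage clustering with an arbitrary p-distance threshold and treats only clusters with two or more members as "clusterable". All names and the threshold are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_clusters(seqs, threshold):
    """Union-find single linkage: join any pair at distance <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def intra_cluster_diversity(seqs, threshold=0.02):
    """Return (mean within-cluster pairwise distance, fraction of clusterable sequences)."""
    clusters = single_linkage_clusters(seqs, threshold)
    dists = [p_distance(seqs[i], seqs[j])
             for c in clusters for i, j in combinations(c, 2)]
    clusterable = sum(len(c) for c in clusters if len(c) > 1)
    mean_div = sum(dists) / len(dists) if dists else 0.0
    return mean_div, clusterable / len(seqs)
```

Pairs that fall in different clusters (for example, descendants of different T/F strains) do not inflate the diversity estimate. This is the property the assay relies on.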
An Assessment of Cumulative Axial and Torsional Fatigue in a Cobalt-Base Superalloy
NASA Technical Reports Server (NTRS)
Kalluri, Sreeramesh; Bonacuse, Peter J.
2010-01-01
Cumulative fatigue under axial and torsional loading conditions can include both load-order (high/low and low/high) and load-type sequence (axial/torsional and torsional/axial) effects. Previously reported experimental studies on a cobalt-base superalloy, Haynes 188 at 538 °C, addressed these effects. These studies characterized the cumulative axial and torsional fatigue behavior under high amplitude followed by low amplitude (Kalluri, S. and Bonacuse, P. J., "Cumulative Axial and Torsional Fatigue: An Investigation of Load-Type Sequence Effects," in Multiaxial Fatigue and Deformation: Testing and Prediction, ASTM STP 1387, S. Kalluri and P. J. Bonacuse, Eds., American Society for Testing and Materials, West Conshohocken, PA, 2000, pp. 281-301) and low amplitude followed by high amplitude (Bonacuse, P. and Kalluri, S., "Sequenced Axial and Torsional Cumulative Fatigue: Low Amplitude Followed by High Amplitude Loading," Biaxial/Multiaxial Fatigue and Fracture, ESIS Publication 31, A. Carpinteri, M. De Freitas, and A. Spagnoli, Eds., Elsevier, New York, 2003, pp. 165-182) conditions. In both studies, experiments with the following four load-type sequences were performed: (a) axial/axial, (b) torsional/torsional, (c) axial/torsional, and (d) torsional/axial. In this paper, the cumulative axial and torsional fatigue data generated in the two previous studies are combined to generate a comprehensive cumulative fatigue database on both the load-order and load-type sequence effects. This comprehensive database is used to examine the applicability of the Palmgren-Langer-Miner linear damage rule and a nonlinear damage curve approach for Haynes 188 subjected to the load-order and load-type sequencing described above. Summations of life fractions from the experiments are compared to the predictions from both the linear and nonlinear cumulative fatigue damage approaches.
The significance of load-order versus load-type sequence effects for axial and torsional loading conditions is discussed. Possible reasons for the observed differences between the computed and observed summations of cycle fractions are rationalized in terms of the observed evolutions of cyclic axial and shear stress ranges in the cumulative fatigue tests.
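For two-block loading, the two damage rules make different predictions for the remaining life fraction n2/N2 in the second block after n1 cycles at life N1. The linear Palmgren-Langer-Miner rule sets n1/N1 + n2/N2 = 1. The sketch below assumes the nonlinear approach takes the widely cited Manson-Halford two-level damage curve form, (n1/N1)^((N1/N2)^0.4) + n2/N2 = 1. The paper's exact formulation and exponent may differ, and the function names are hypothetical.

```python
def remaining_fraction_linear(n1_over_N1: float) -> float:
    """Palmgren-Langer-Miner: life fractions sum to 1."""
    return 1.0 - n1_over_N1


def remaining_fraction_dca(n1_over_N1: float, N1: float, N2: float,
                           alpha: float = 0.4) -> float:
    """Assumed two-level damage curve approach:
    (n1/N1) ** ((N1/N2) ** alpha) + n2/N2 = 1, solved for n2/N2."""
    exponent = (N1 / N2) ** alpha
    return 1.0 - n1_over_N1 ** exponent
```

Under this form, a high/low sequence (N1 < N2) leaves less than the linear prediction, and a low/high sequence leaves more. When N1 equals N2, the two rules coincide.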
ERIC Educational Resources Information Center
Savinainen, Antti; Mäkynen, Asko; Nieminen, Pasi; Viiri, Jouni
2017-01-01
This paper presents a research-based teaching-learning sequence (TLS) that focuses on the notion of interaction in teaching Newton's third law (N3 law), which is, as earlier studies have shown, a challenging topic for students to learn. The TLS made systematic use of a visual representation tool--an interaction diagram (ID)--highlighting…
Yousef, Abdulaziz; Moghadam Charkari, Nasrollah
2013-11-07
Protein-protein interactions (PPIs) are among the most important data for understanding cellular processes. Many interesting methods have been proposed to predict PPIs; however, methods that use protein sequence as prior knowledge are more universally applicable. In this paper, a sequence-based, fast, and adaptive PPI prediction method is introduced that assigns a pair of proteins to an interaction class (yes or no). First, to improve the representation of the sequences, twelve physicochemical properties of amino acids are used by different representation methods to transform the sequences of protein pairs into different feature vectors. Then, to speed up learning and reduce the effect of noisy PPI data, principal component analysis (PCA) is applied as a feature extraction algorithm. Finally, a new adaptive Learning Vector Quantization (LVQ) predictor is designed to handle different kinds of datasets, classified as balanced or imbalanced. Accuracies of 93.88%, 90.03%, and 89.72% were obtained on the S. cerevisiae, H. pylori, and independent datasets, respectively. The results of various experiments indicate the efficiency and validity of the method. © 2013 Published by Elsevier Ltd.
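The final stage of the method is an LVQ classifier. The sketch below implements plain LVQ1 in NumPy, not the paper's adaptive variant. The nearest prototype moves toward a sample of its own class and away from a sample of another class. The feature vectors are assumed to be already extracted (for example, PCA-reduced physicochemical encodings). The learning-rate schedule and all names are illustrative assumptions.

```python
import numpy as np


def train_lvq1(X, y, prototypes_per_class=1, lr=0.1, epochs=30, seed=0):
    """Plain LVQ1 training; returns (prototypes, prototype labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    protos, proto_labels = [], []
    for c in np.unique(y):
        # Initialize each class's prototypes from random samples of that class.
        idx = rng.choice(np.flatnonzero(y == c), size=prototypes_per_class, replace=False)
        protos.append(X[idx])
        proto_labels.extend([c] * prototypes_per_class)
    W = np.vstack(protos)
    proto_labels = np.array(proto_labels)
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for i in rng.permutation(len(X)):
            j = np.argmin(((W - X[i]) ** 2).sum(axis=1))  # winning prototype
            sign = 1.0 if proto_labels[j] == y[i] else -1.0
            W[j] += sign * rate * (X[i] - W[j])
    return W, proto_labels


def predict_lvq(W, proto_labels, X):
    """Label each sample with the class of its nearest prototype."""
    d = ((np.asarray(X, dtype=float)[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return proto_labels[np.argmin(d, axis=1)]
```

Here, class 1 would stand for "interacting pair" and class 0 for "non-interacting pair". Handling imbalanced datasets, which the paper addresses adaptively, is not covered by this sketch.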
Wittevrongel, Benjamin; Van Wolputte, Elia; Van Hulle, Marc M
2017-11-08
When visual targets are encoded using various lagged versions of a pseudorandom binary sequence of luminance changes, the EEG signal recorded over the viewer's occipital pole exhibits so-called code-modulated visual evoked potentials (cVEPs), whose phase lags can be tied to these targets. The cVEP paradigm has attracted interest in the brain-computer interfacing (BCI) community for its reported high information transfer rates (ITR, in bits/min). In this study, we introduce a novel decoding algorithm based on spatiotemporal beamforming and show that this algorithm is able to accurately identify the gazed target. Especially for a small number of repetitions of the coding sequence, our beamforming approach significantly outperforms an optimised support vector machine (SVM)-based classifier, which is considered state-of-the-art in cVEP-based BCI. In addition to the traditional 60 Hz stimulus presentation rate for the coding sequence, we also explore the 120 Hz rate and show that the latter enables faster communication, with a maximal median ITR of 172.87 bits/min. Finally, we also report on a transition effect in the EEG signal following the onset of the stimulus sequence, and recommend excluding the first 150 ms of the trials from decoding when relying on a single presentation of the stimulus sequence.
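A common way to build a spatiotemporal beamformer uses the linearly constrained minimum variance (LCMV) filter w = C⁻¹a / (aᵀC⁻¹a). Here C is the covariance of channel-by-time epochs flattened into vectors and a is the flattened average response (the activation pattern). The sketch assumes this standard LCMV form and a small shrinkage term for invertibility. It is not necessarily the authors' exact implementation, and the names are hypothetical.

```python
import numpy as np


def lcmv_filter(epochs, template, reg=1e-6):
    """Spatiotemporal LCMV filter w = C^-1 a / (a^T C^-1 a).

    epochs:   array (n_trials, n_channels, n_samples) of EEG segments
    template: array (n_channels, n_samples), e.g. the average cVEP response
    """
    X = epochs.reshape(epochs.shape[0], -1)  # concatenate channels x time per trial
    a = template.reshape(-1)
    C = np.cov(X, rowvar=False)              # spatiotemporal covariance
    C += reg * np.trace(C) / C.shape[0] * np.eye(C.shape[0])  # shrinkage
    Ci_a = np.linalg.solve(C, a)
    return Ci_a / (a @ Ci_a)                 # unit gain on the template


def beamformer_output(w, epoch):
    """Scalar filter output for one epoch."""
    return float(w @ epoch.reshape(-1))
```

One way to identify the gazed target would be to build one filter (or template) per circular lag of the code and pick the lag whose output is largest. That decision rule is also an assumption here.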
Repetitive sequences in plant nuclear DNA: types, distribution, evolution and function.
Mehrotra, Shweta; Goyal, Vinod
2014-08-01
Repetitive DNA sequences are a major component of eukaryotic genomes and may account for up to 90% of the genome size. They can be divided into minisatellite, microsatellite and satellite sequences. Satellite DNA sequences are considered to be a fast-evolving component of eukaryotic genomes, comprising tandemly-arrayed, highly-repetitive and highly-conserved monomer sequences. The monomer unit of satellite DNA is 150-400 base pairs (bp) in length. Repetitive sequences may be species- or genus-specific, and may be centromeric or subtelomeric in nature. They exhibit cohesive and concerted evolution caused by molecular drive, leading to high sequence homogeneity. Repetitive sequences accumulate variations in sequence and copy number during evolution; hence they are important tools for taxonomic and phylogenetic studies, and are known as "tuning knobs" in evolution. Therefore, knowledge of repetitive sequences assists our understanding of the organization, evolution and behavior of eukaryotic genomes. Repetitive sequences have cytoplasmic, cellular and developmental effects and play a role in chromosomal recombination. In the post-genomics era, with the introduction of next-generation sequencing technology, it is possible to evaluate complex genomes for analyzing repetitive sequences and deciphering the yet unknown functional potential of repetitive sequences. Copyright © 2014 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.
Hu, Lanying; Lim, Kah Wai; Bouaziz, Serge; Phan, Anh Tuân
2009-11-25
Recently, it has been shown that in K+ solution the human telomeric sequence d[TAGGG(TTAGGG)3] forms a (3 + 1) intramolecular G-quadruplex, while the Bombyx mori telomeric sequence d[TAGG(TTAGG)3], which differs from the human counterpart only by one G deletion in each repeat, forms a chair-type intramolecular G-quadruplex, indicating an effect of G-tract length on the folding topology of G-quadruplexes. To explore the effect of loop length and sequence on the folding topology of G-quadruplexes, here we examine the structure of the four-repeat Giardia telomeric sequence d[TAGGG(TAGGG)3], which differs from the human counterpart only by one T deletion within the non-G linker in each repeat. We show by NMR that this sequence forms two different intramolecular G-quadruplexes in K+ solution. The first is a novel basket-type antiparallel-stranded G-quadruplex containing two G-tetrads, a G·(A-G) triad, and two A·T base pairs; the three loops are consecutively edgewise-diagonal-edgewise. The second is a propeller-type parallel-stranded G-quadruplex involving three G-tetrads; the three loops are all double-chain-reversal. Recurrence of several structural elements in the observed structures suggests a "cut and paste" principle for the design and prediction of G-quadruplex topologies, in which different elements could be extracted from one G-quadruplex and inserted into another.
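The three telomeric sequences differ only in G-tract length or linker length. The sketch below expands the repeat notation and lists the G-tracts and the linkers between them, which makes those differences explicit. Defining a G-tract as a run of two or more G is an assumption for illustration.

```python
import re


def expand_repeat(prefix: str, unit: str, copies: int) -> str:
    """Expand notation such as d[TAGGG(TTAGGG)3] into the full sequence."""
    return prefix + unit * copies


def g_tracts_and_linkers(seq: str):
    """Return G-tracts (runs of >= 2 G) and the non-G linkers between them."""
    spans = [m.span() for m in re.finditer(r"G{2,}", seq)]
    tracts = [seq[s:e] for s, e in spans]
    linkers = [seq[e:s2] for (_, e), (s2, _) in zip(spans, spans[1:])]
    return tracts, linkers
```

For the human sequence this yields four GGG tracts joined by TTA linkers. Bombyx mori gives GG tracts with the same TTA linkers, and Giardia gives GGG tracts with shorter TA linkers.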
Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.
Hua, Wei; Wang, Jiasong; Zhao, Jian
2014-01-01
Based on the study of Ramanujan sums and Ramanujan coefficients, this paper proposes the concepts of the discrete Ramanujan transform and spectrum. Using the Voss numerical representation, a symbolic DNA strand is mapped to a numerical DNA sequence, from which the discrete Ramanujan spectrum is derived. It is well known that the discrete Fourier power spectrum of protein coding sequences has an important feature, 3-base periodicity, which is widely used in DNA sequence analysis based on the discrete Fourier transform; the analysis uses the signal-to-noise ratio at frequency N/3 as a criterion, where N is the length of the sequence. The results presented in this paper show that, in the discrete Ramanujan spectrum, 3-base periodicity appears as a prominent spike at period 3 only for protein coding regions. A signal-to-noise ratio for the discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used to distinguish protein coding regions from noncoding regions. All exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans were tested, and the histograms and tables of the computational results illustrate the reliability of the method. In addition, our theoretical analysis shows that the algorithm for calculating the discrete Ramanujan spectrum has lower computational complexity and higher computational accuracy. The computational experiments show that classifying DNA sequences by their discrete Ramanujan spectrum is a fast and effective method. Copyright © 2014 Elsevier Ltd. All rights reserved.
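The sketch below shows the Fourier baseline described in the abstract and the Ramanujan sums on which the proposed transform builds. The Voss representation gives four binary indicator sequences, one per base. Their DFT power is summed, and the signal-to-noise ratio is taken as the power at k = N/3 divided by the mean power. The Ramanujan sum is c_q(n) = Σ cos(2πkn/q), summed over 1 ≤ k ≤ q with gcd(k, q) = 1. The exact SNR normalization and the paper's full Ramanujan spectrum definition are not given in the abstract, so both are assumptions here.

```python
import math

import numpy as np


def voss_indicators(seq: str) -> np.ndarray:
    """Voss representation: binary indicator rows for A, C, G, T."""
    seq = seq.upper()
    return np.array([[1.0 if ch == b else 0.0 for ch in seq] for b in "ACGT"])


def fourier_snr_period3(seq: str) -> float:
    """Total DFT power at k = N/3 over mean power across all k (assumes 3 divides N)."""
    u = voss_indicators(seq)
    n = u.shape[1]
    power = (np.abs(np.fft.fft(u, axis=1)) ** 2).sum(axis=0)
    return float(power[n // 3] / power.mean())


def ramanujan_sum(q: int, n: int) -> float:
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) = 1."""
    return sum(math.cos(2 * math.pi * k * n / q)
               for k in range(1, q + 1) if math.gcd(k, q) == 1)
```

c_3(n) follows the pattern 2, -1, -1, 2, -1, -1, and so on, so correlating an indicator sequence with it responds to period-3 structure. For a strictly period-3 sequence such as "ATG" repeated 30 times, the Fourier SNR above evaluates to 30.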
Lu, Fu-Hao; McKenzie, Neil; Kettleborough, George; Heavens, Darren; Clark, Matthew D; Bevan, Michael W
2018-05-01
The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics. Here we use a Fosill 38-kb jumping library to assess the medium- and long-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values. Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures than BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina assemblies. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.
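N50, the statistic used above to report scaffolding gains, is the length L such that contigs or scaffolds of length ≥ L together cover at least half of the assembly. A minimal sketch:

```python
def n50(lengths):
    """Smallest length L such that pieces of length >= L cover half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly
```

For lengths 2 through 10 (total 54), the cumulative sum first reaches 27 at the piece of length 8, so N50 is 8.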
Belotserkovskii, Boris P.; Neil, Alexander J.; Saleh, Syed Shayon; Shin, Jane Hae Soo; Mirkin, Sergei M.; Hanawalt, Philip C.
2013-01-01
The ability of DNA to adopt non-canonical structures can affect transcription and has broad implications for genome functioning. We have recently reported that guanine-rich (G-rich) homopurine-homopyrimidine sequences cause significant blockage of transcription in vitro in a strictly orientation-dependent manner: when the G-rich strand serves as the non-template strand [Belotserkovskii et al. (2010) Mechanisms and implications of transcription blockage by guanine-rich DNA sequences., Proc. Natl Acad. Sci. USA, 107, 12816–12821]. We have now systematically studied the effect of the sequence composition and single-stranded breaks on this blockage. Although substitution of guanine by any other base reduced the blockage, cytosine and thymine reduced the blockage more significantly than adenine substitutions, affirming the importance of both G-richness and the homopurine-homopyrimidine character of the sequence for this effect. A single-strand break in the non-template strand adjacent to the G-rich stretch dramatically increased the blockage. Breaks in the non-template strand result in much weaker blockage signals extending downstream from the break even in the absence of the G-rich stretch. Our combined data support the notion that transcription blockage at homopurine-homopyrimidine sequences is caused by R-loop formation. PMID:23275544
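The blockage depends on stretches that are both G-rich and homopurine-homopyrimidine. A purine-only run on one strand implies a pyrimidine-only complement on the other. The sketch below scans a given strand (for example, the non-template strand) for such runs. The length and G-fraction thresholds are hypothetical values chosen for illustration.

```python
import re


def g_rich_homopurine_runs(strand: str, min_len: int = 20, min_g_fraction: float = 0.6):
    """Return (start, end, G fraction) for purine-only (A/G) runs that are G-rich."""
    hits = []
    for m in re.finditer(r"[AG]+", strand.upper()):
        run = m.group()
        g_frac = run.count("G") / len(run)
        if len(run) >= min_len and g_frac >= min_g_fraction:
            hits.append((m.start(), m.end(), g_frac))
    return hits
```

Running the same scan on the reverse complement would find stretches where the G-rich strand is the other strand. That helps when checking the orientation dependence described above.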
Mohkam, Milad; Nezafat, Navid; Berenjian, Aydin; Mobasher, Mohammad Ali; Ghasemi, Younes
2016-03-01
Some Bacillus species, especially those of the Bacillus subtilis and Bacillus pumilus groups, have highly similar 16S rRNA gene sequences and are therefore hard to identify by 16S rDNA sequence analysis. To overcome this limitation, rpoB and recA sequence analysis, along with randomly amplified polymorphic DNA (RAPD) fingerprinting, was examined as an alternative method for differentiating Bacillus species. The 16S rRNA, rpoB and recA genes were amplified by polymerase chain reaction using their specific primers. The resulting PCR amplicons were sequenced, and phylogenetic analysis was performed with MEGA 6 software. Identification based on 16S rRNA gene sequencing was supported by rpoB and recA gene sequencing as well as by the RAPD-PCR technique. Subsequently, concatenation and phylogenetic analysis showed that the extent of diversity and similarity was better resolved with the rpoB and recA primers, which was also reinforced by RAPD-PCR. However, these approaches failed to identify one isolate; combining them with phenotypic methods resolved this issue. Overall, RAPD fingerprinting and rpoB, recA and concatenated gene sequence analysis discriminated closely related Bacillus species, highlighting the value of a multigenic approach for more precisely distinguishing Bacillus strains. This research emphasizes the advantage of RAPD fingerprinting and rpoB and recA sequence analysis over 16S rRNA gene sequence analysis for the suitable and effective identification of Bacillus species, as recommended for probiotic products.
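Concatenating loci pools the signal from several genes, so isolates with identical 16S rRNA sequences can still be separated by differences in rpoB or recA. The sketch below joins per-gene sequences into one row per isolate and computes uncorrected p-distances, skipping gap columns. It assumes each gene is already aligned across isolates. The data layout and names are hypothetical, and MEGA's tree-building is not reproduced.

```python
def concatenate_loci(isolate_loci: dict, loci: list) -> dict:
    """Join per-gene aligned sequences into one supermatrix row per isolate."""
    return {iso: "".join(genes[locus] for locus in loci)
            for iso, genes in isolate_loci.items()}


def pairwise_p_distance(concat: dict) -> dict:
    """Uncorrected p-distance for every pair of isolates, ignoring gap columns."""
    names = sorted(concat)
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sites = [(x, y) for x, y in zip(concat[a], concat[b]) if x != "-" and y != "-"]
            out[(a, b)] = (sum(x != y for x, y in sites) / len(sites)
                           if sites else float("nan"))
    return out
```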
SPRINT: ultrafast protein-protein interaction prediction of the entire human interactome.
Li, Yiwei; Ilie, Lucian
2017-11-15
Proteins usually perform their functions by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high error rate. Many computational methods have been proposed, among which sequence-based ones are very promising. However, so far no such method has been able to predict the entire human interactome effectively: they require too much time or memory. We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on the seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory. SPRINT is the only sequence-based program that can effectively predict the entire human interactome: it requires between 15 and 100 min, depending on the dataset. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task. The source code of SPRINT is freely available from https://github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/ .
Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H
2012-05-01
Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.
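In the SFVT approach, a sequence feature is a defined set of positions, and each distinct residue combination at those positions is a variant type. The sketch below groups aligned protein sequences by their residues at one feature's positions. The positions, the 1-based alignment coordinates and the "VT-n" numbering by frequency are assumptions for illustration. They are not the Influenza Research Database's definitions.

```python
from collections import defaultdict


def variant_types(aligned: dict, sf_positions: list) -> dict:
    """Group aligned sequences by their residues at one sequence feature's positions.

    aligned:      {strain_name: aligned protein sequence}
    sf_positions: 1-based alignment columns of the (hypothetical) feature
    """
    groups = defaultdict(list)
    for strain, seq in aligned.items():
        vt = "".join(seq[p - 1] for p in sf_positions)
        groups[vt].append(strain)
    # Number variant types by decreasing frequency; ties broken by residue string.
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {f"VT-{i + 1}": (vt, sorted(strains))
            for i, (vt, strains) in enumerate(ranked)}
```

Each strain's variant type can then be tabulated against a phenotype, such as host range, for a genotype-phenotype association test.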
Does order matter? Investigating the effect of sequence on glance duration during on-road driving
Roberts, Shannon C.; Reimer, Bryan; Mehler, Bruce
2017-01-01
Previous literature has shown that vehicle crash risk increases as drivers' off-road glance duration increases. Many factors influence drivers' glance duration, such as individual differences, the driving environment, and task characteristics. Theories and past studies suggest that glance duration increases as a task progresses, but the exact relationship between glance sequence and glance duration is not fully understood. The purpose of this study was to examine the effect of glance sequence on glance duration among drivers completing a visual-manual radio tuning task and an auditory-vocal based multi-modal navigation entry task. Eighty participants drove a vehicle on urban highways while completing radio tuning and navigation entry tasks. Forty participants drove under an experimental protocol that required three button presses followed by rotation of a tuning knob to complete the radio tuning task, while the other forty participants completed the task with one less button press. Multiple statistical analyses were conducted to measure the effect of glance sequence on glance duration. Results showed that across both tasks and a variety of statistical tests, glance sequence had inconsistent effects on glance duration: the effects varied according to the number of glances, task type, and data set being evaluated. Results suggest that other aspects of the task, as well as interface design, affect glance duration and should be considered when examining driver attention or lack thereof. Overall, interface design and task characteristics have a greater influence on glance duration than glance sequence, suggesting that classical design considerations affecting driver attention, such as the size and location of buttons, remain fundamental in designing in-vehicle interfaces. PMID:28158301
Carbonell, Alberto; Fahlgren, Noah; Mitchell, Skyler; ...
2015-05-20
Artificial microRNAs (amiRNAs) are used for selective gene silencing in plants. However, current methods for producing amiRNA constructs to silence transcripts in monocot species are not suitable for simple, cost-effective and large-scale synthesis. Here, a series of expression vectors based on the Oryza sativa MIR390 (OsMIR390) precursor was developed for high-throughput cloning and high expression of amiRNAs in monocots. Four different amiRNA sequences designed to specifically target endogenous genes and expressed from OsMIR390-based vectors were validated in transgenic Brachypodium distachyon plants. Surprisingly, amiRNAs accumulated to higher levels and were processed more accurately when expressed from chimeric OsMIR390-based precursors that include distal stem-loop sequences from Arabidopsis thaliana MIR390a (AtMIR390a). In all cases, transgenic plants displayed the predicted phenotypes induced by target gene repression, and accumulated high levels of amiRNAs and low levels of the corresponding target transcripts. Genome-wide transcriptome profiling combined with 5'-RLM-RACE analysis in transgenic plants confirmed that the amiRNAs were highly specific. Significance Statement: A series of amiRNA vectors based on the Oryza sativa MIR390 (OsMIR390) precursor was developed for simple, cost-effective and large-scale synthesis of amiRNA constructs to silence genes in monocots. Unexpectedly, amiRNAs produced from chimeric OsMIR390-based precursors including Arabidopsis thaliana MIR390a distal stem-loop sequences accumulated to elevated levels and were highly effective and specific in transgenic Brachypodium distachyon plants.
2014-01-01
Background: Owing to rapid genome sequencing, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detection of close homologs of a protein of interest, so that functional information can be transferred from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has altered the sequence while largely preserving the structure. Structure-based comparisons can detect remote homologs, but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, and is further supported by the analysis in this paper, that such maps preserve the functional co-localization of protein structure space. Methods: Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model to structural comparisons of proteins, we propose higher-order, LDA-derived, topic-based representations of protein structures as an alternative route to remote homology detection and to the organization of protein structure space in a few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results: We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content of the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership.
Conclusions: This work opens exciting avenues for designing novel representations to extract information about protein structures, as well as for organizing and mining protein structure space with mature text mining tools. PMID:25080993
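In a topic-based representation, each protein structure is treated as a "document" whose "words" are structural fragment types. LDA then summarizes each protein as a low-dimensional topic-proportion vector. The sketch below is a small collapsed Gibbs sampler for LDA in NumPy. It is not the authors' implementation. Encoding proteins as lists of integer fragment-cluster IDs, and the hyperparameter values, are assumptions for illustration.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions.

    docs: list of documents, each a list of integer word IDs (here, fragment types).
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # topic counts per document
    nkw = np.zeros((n_topics, vocab_size))  # word counts per topic
    nk = np.zeros(n_topics)                 # total words per topic
    z = []
    for d, doc in enumerate(docs):          # random initial topic assignments
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                 # remove the current assignment
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z = k | rest), up to a constant.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                 # add the new assignment
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    return (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
```

The resulting topic vectors can be compared, for example by cosine similarity, to rank candidate remote homologs or to project proteins into a few dimensions.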
Rudnizky, Sergei; Khamis, Hadeel; Malik, Omri; Squires, Allison H; Meller, Amit; Melamed, Philippa
2018-01-01
Most functional transcription factor (TF) binding sites deviate from their ‘consensus’ recognition motif, although their sites and flanking sequences are often conserved across species. Here, we used single-molecule DNA unzipping with optical tweezers to study how Egr-1, a TF harboring three zinc fingers (ZF1, ZF2 and ZF3), is modulated by the sequence and context of its functional sites in the Lhb gene promoter. We find that both the core 9 bp bound by Egr-1 in each of the sites and the base pairs flanking them modulate the affinity and structure of the protein–DNA complex. The effect of the flanking sequences is asymmetric, with a stronger effect for the sequence flanking ZF3. Characterization of the dissociation time of Egr-1 revealed that a local, mechanical perturbation of the interactions of ZF3 destabilizes the complex more effectively than a perturbation of the ZF1 interactions. Our results reveal a novel role for ZF3 in the interaction of Egr-1 with other proteins and the DNA, providing insight into the regulation of Lhb and other genes by Egr-1. Moreover, our findings reveal the potential of small changes in DNA sequence to alter transcriptional regulation, and may shed light on the organization of regulatory elements at promoters. PMID:29253225
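To see how a functional site deviates from the consensus finger by finger, each 3-bp subsite can be compared separately. The sketch below assumes the commonly cited Egr-1 (Zif268) consensus 5'-GCG TGG GCG-3' and the standard antiparallel binding mode: ZF3 contacts the 5' triplet, ZF2 the middle triplet and ZF1 the 3' triplet. Flanking-sequence effects, which the study shows to be important, are not modeled.

```python
CONSENSUS = "GCGTGGGCG"  # assumed Egr-1 (Zif268) 9-bp consensus, written 5'->3'


def per_finger_mismatches(site: str) -> dict:
    """Count mismatches to the consensus in each finger's 3-bp subsite.

    Assumes antiparallel binding: ZF3 on the 5' triplet, ZF1 on the 3' triplet.
    """
    site = site.upper()
    if len(site) != 9:
        raise ValueError("expected a 9-bp core site")
    triplets = {"ZF3": slice(0, 3), "ZF2": slice(3, 6), "ZF1": slice(6, 9)}
    return {zf: sum(a != b for a, b in zip(site[s], CONSENSUS[s]))
            for zf, s in triplets.items()}
```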
NASA Astrophysics Data System (ADS)
Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian
Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-prone amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and cost inherent in current technologies. However, because of the high error rates of nanopore sequencing, single-base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of a nucleotide, arising from the influence of neighboring nucleotides. In this work we calculate the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm that accounts for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired threshold. Using this method, we show that DNA methylation and other base modifications can be clearly distinguished from the tunneling current readings.
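One way to let a base caller account for neighbor effects is to make each current level depend on the current base and its predecessor, then decode the most likely sequence with dynamic programming (Viterbi). The sketch below takes that approach. It is not the authors' algorithm. The five-letter alphabet (with "M" for methylated cytosine), the made-up context-dependent mean currents and the Gaussian noise model are all hypothetical.

```python
ALPHABET = "ACGTM"  # M: methylated cytosine (hypothetical label)


def context_mean(prev, cur):
    """Hypothetical mean current for base `cur` preceded by `prev` (None at the start)."""
    if prev is None:
        return float(ALPHABET.index(cur))
    return ALPHABET.index(cur) + 0.2 * ALPHABET.index(prev)


def viterbi_call(currents, sigma=0.1):
    """Most likely base sequence under a Gaussian, nearest-neighbor-context model."""
    def ll(x, mu):  # Gaussian log-likelihood up to a constant
        return -0.5 * ((x - mu) / sigma) ** 2

    score = {b: ll(currents[0], context_mean(None, b)) for b in ALPHABET}
    back = []
    for x in currents[1:]:
        new_score, ptr = {}, {}
        for cur in ALPHABET:
            # Best predecessor given that the observation depends on (prev, cur).
            best = max(ALPHABET, key=lambda p: score[p] + ll(x, context_mean(p, cur)))
            new_score[cur] = score[best] + ll(x, context_mean(best, cur))
            ptr[cur] = best
        score = new_score
        back.append(ptr)
    path = [max(ALPHABET, key=score.get)]
    for ptr in reversed(back):  # trace back through the stored pointers
        path.append(ptr[path[-1]])
    return "".join(reversed(path))
```

In this toy model a base read in isolation would be ambiguous once noise is added. Decoding jointly over neighbors uses the context shift instead of treating it as noise.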
ERIC Educational Resources Information Center
Taylor, Joseph; Kowalski, Susan; Getty, Stephen; Wilson, Christopher; Carlson, Janet
2013-01-01
Effective instructional materials can be valuable interventions to improve student interest and achievement in science (National Research Council [NRC], 2007); yet, analyses indicate that many science instructional materials and curricula are fragmented, lack coherence, and are not carefully articulated through a sequence of grade levels (AAAS,…
Long-range barcode labeling-sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Feng; Zhang, Tao; Singh, Kanwar K.
Methods for sequencing single large DNA molecules by clonal multiple displacement amplification using barcoded primers. Sequences are binned based on barcode sequences and sequenced using a microdroplet-based method for sequencing large polynucleotide templates to enable assembly of haplotype-resolved complex genomes and metagenomes.
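Binning by barcode means grouping reads that carry the same barcode, so that each bin corresponds to one amplified source molecule. The sketch below assumes a fixed-length barcode at the 5' end of each read and exact barcode matching. Both are illustrative assumptions, since the description does not specify the barcode design. Real pipelines usually also tolerate barcode sequencing errors.

```python
from collections import defaultdict


def bin_by_barcode(reads, barcode_len=16):
    """Group reads by an assumed fixed-length 5' barcode and strip it from each read."""
    bins = defaultdict(list)
    for read in reads:
        if len(read) > barcode_len:  # skip reads too short to carry an insert
            bins[read[:barcode_len]].append(read[barcode_len:])
    return dict(bins)
```

Each bin can then be assembled separately, which is what makes per-molecule, haplotype-resolved assembly possible.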
Fine, Eric M; Delis, Dean C; Holdnack, James
2011-11-01
The Delis-Kaplan Executive Function System (D-KEFS) Trail Making Test (TMT), a modification of the original TMT, was created to isolate set-shifting (Letter-Number Switching) from other component skills. This was accomplished by including four baseline conditions (Visual Scanning, Number Sequencing, Letter Sequencing, and Motor Speed) and by placing equal numbers of stimuli in the three sequencing conditions. Given that some studies with the original TMT demonstrated a significant effect of education and intellectual functioning on performance, we used the D-KEFS national standardization sample to examine the effects of education and vocabulary level (i.e., the Vocabulary subtest from the Wechsler Abbreviated Scale of Intelligence (WASI)) on the D-KEFS TMT. The results indicate a significant effect of these variables on each D-KEFS TMT condition. Normative tables for education- and vocabulary-adjusted scaled scores based on the database from the D-KEFS national normative study were generated.
Shi, YingWu; Yang, Hongmei; Zhang, Tao; Sun, Jian; Lou, Kai
2014-01-01
Plants harbor complex and variable microbial communities. Endophytic bacteria can play an important role in developing sustainable crop production systems. To examine how endophytic bacteria in sugar beet (Beta vulgaris L.) vary across both host growth period and location, PCR-based Illumina sequencing was applied to reveal the diversity and stability of endophytic bacteria in sugar beet on the north slope of the Tianshan Mountains, China. A total of 60.84 M effective sequences of the 16S rRNA gene V3 region were obtained from sugar beet samples. These sequences revealed a large number of operational taxonomic units (OTUs) in sugar beet, namely 19-121 OTUs per beet sample, at a 3% cutoff level and a sequencing depth of 30,000 sequences. We identified 13 classes from the resulting 449,585 sequences. Alphaproteobacteria were the dominant class in all sugar beets, followed by Acidobacteria, Gemmatimonadetes and Actinobacteria. A marked difference in the diversity of endophytic bacteria in sugar beet across growth periods was evident. The greatest numbers of OTUs were detected during rosette formation (109 OTUs) and tuber growth (146 OTUs). Endophytic bacterial diversity was reduced during seedling growth (66 OTUs) and sucrose accumulation (95 OTUs). Forty-three OTUs were common to all four periods. There were more tags of Alphaproteobacteria and Gammaproteobacteria in Shihezi than in Changji. The dynamics of endophytic bacterial communities were influenced by plant genotype and plant growth stage. To the best of our knowledge, this study is the first application of PCR-based Illumina sequencing to characterize and compare multiple sugar beet samples.
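OTUs at a 3% cutoff group reads that share at least 97% identity. The sketch below shows greedy centroid clustering, a simplification of what tools in the UCLUST/USEARCH family do. It assumes equal-length, pre-aligned reads, so identity is simply the fraction of matching positions. Reads are processed from most to least abundant, and each joins the first centroid it matches at or above the threshold. The names are hypothetical.

```python
from collections import Counter


def greedy_otus(reads, identity=0.97):
    """Greedy centroid OTU clustering of equal-length, pre-aligned reads.

    Returns a list of (centroid, number of reads assigned).
    """
    abundance = Counter(reads)
    centroids, sizes = [], []
    for read, count in abundance.most_common():  # most abundant reads seed OTUs first
        for idx, c in enumerate(centroids):
            same = sum(a == b for a, b in zip(read, c)) / len(c)
            if same >= identity:
                sizes[idx] += count
                break
        else:  # no centroid close enough: this read founds a new OTU
            centroids.append(read)
            sizes.append(count)
    return list(zip(centroids, sizes))
```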
Flow cytometry for enrichment and titration in massively parallel DNA sequencing
Sandberg, Julia; Ståhl, Patrik L.; Ahmadian, Afshin; Bjursell, Magnus K.; Lundeberg, Joakim
2009-01-01
Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for costly pilot sequencing of samples during titration, is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols. PMID:19304748
[Molecular identification of astragali radix and its adulterants by ITS sequences].
Cui, Zhan-Hu; Li, Yue; Yuan, Qing-Jun; Zhou, Li-She; Li, Min-Hui
2012-12-01
This study explored a new method for distinguishing Astragali Radix from its adulterants using ITS sequences. Thirteen samples of different Astragali Radix materials and 6 samples of adulterants (roots of Hedysarum polybotrys, Medicago sativa and Althaea rosea) were collected. The ITS region was amplified by PCR and sequenced unidirectionally. The interspecific K-2-P distances between Astragali Radix and its adulterants were calculated, and NJ and UPGMA trees were constructed in MEGA 4. ITS sequences were obtained from all 19 samples, with lengths of 646-650 bp for Astragali Radix, 664 bp for H. polybotrys, 659 bp for Medicago sativa and 728 bp for Althaea rosea; all were registered in GenBank. Phylogenetic trees reconstructed by NJ and UPGMA analysis of the ITS nucleotide sequences effectively distinguish Astragali Radix from its adulterants. The ITS sequence can therefore be used to identify Astragali Radix and is an efficient molecular marker for authenticating Astragali Radix against its adulterants.
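The K-2-P (Kimura 2-parameter) distance used above separates transitions (A<->G, C<->T) from transversions. As a minimal sketch, assuming two sequences that are already aligned to equal length and skipping gap or ambiguous sites (pairwise deletion), the standard formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q) can be computed as below. This is not MEGA 4's implementation, and the function name `k2p_distance` is hypothetical.

```python
import math

# Nucleotide classes used to separate transitions from transversions.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two pre-aligned sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # gap or ambiguity code
        sites += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition; otherwise a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # P: proportion of transitional differences
    q = transversions / sites  # Q: proportion of transversional differences
    # math.log raises ValueError if the sequences are too divergent for the formula.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116
```

One A->G transition in ten sites gives P = 0.1 and Q = 0, so d = -1/2 ln(0.8). A matrix of such pairwise distances is the input that NJ and UPGMA tree building works from.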
2011-01-01
Background The advent of genomics-based technologies has revolutionized many fields of biological enquiry. However, chromosome walking or flanking sequence cloning is still a necessary and important procedure for determining gene structure. Such methods are used to identify T-DNA insertion sites and so are especially relevant for organisms for which large T-DNA insertion libraries have been created, such as rice and Arabidopsis. The currently available methods for flanking sequence cloning, including the popular TAIL-PCR technique, are relatively laborious and slow. Results Here, we report a simple and effective fusion primer and nested integrated PCR method (FPNI-PCR) for the identification and cloning of unknown genomic regions flanked by known sequences. In brief, a set of universal primers was designed that consisted of various 15-16 base arbitrary degenerate oligonucleotides. These arbitrary degenerate primers were fused to the 3' end of an adaptor oligonucleotide that provided a known sequence without degenerate nucleotides, thereby forming the fusion primers (FPs). These fusion primers are employed in the first step of an integrated nested PCR strategy which defines the overall FPNI-PCR protocol. To demonstrate the efficacy of this novel strategy, we used it to isolate multiple genomic sequences, namely 21 orthologs of genes in various species of Rosaceae, 4 MYB genes of Rosa rugosa, 3 promoters of transcription factors of Petunia hybrida, 4 flanking sequences of T-DNA insertion sites in transgenic tobacco lines, and 6 specific genes from the sequenced genomes of rice and Arabidopsis. Conclusions The successful amplification of target products through FPNI-PCR verified that this novel strategy is an effective, low-cost and simple procedure. Furthermore, FPNI-PCR represents a more sensitive, rapid and accurate technique than the established TAIL-PCR and hiTAIL-PCR procedures. PMID:22093809
Convergence of decision rules for value-based pricing of new innovative drugs.
Gandjour, Afschin
2015-04-01
Given the high costs of innovative new drugs, most European countries have introduced policies for price control, in particular value-based pricing (VBP) and international reference pricing. The purpose of this study is to describe how profit-maximizing manufacturers would optimally adjust their launch sequence to these policies and how VBP countries may best respond. To decide on the launch sequence, a manufacturer must consider the tradeoff between price and sales volume in any given country as well as the effect of the price in a VBP country on prices in international reference pricing countries. Based on the manufacturer's rationale, it is best for VBP countries in Europe to implicitly collude in the long term and set cost-effectiveness thresholds at the level of the lowest acceptable VBP country. This way, international reference pricing countries would also converge towards the lowest acceptable threshold in Europe.
NASA Astrophysics Data System (ADS)
Han, Pingping; Zhang, Haitian; Chen, Lingqi; Zhang, Xiaoan
2018-01-01
Models of a doubly fed induction generator (DFIG) and its grid-side converter (GSC) under unbalanced grid conditions are established in DIgSILENT/PowerFactory. From the mathematical model, the vector equations of the positive- and negative-sequence voltages and currents are derived in the positive-sequence synchronous rotating reference frame d-q-0, taking the characteristics of the simulation software fully into account. Moreover, by incorporating the national limits on unbalanced current, the reference values of the GSC current components in the positive-sequence d-q-0 frame under unbalanced conditions can be obtained, improving the traditional control of the GSC. The simulation results indicate that the control strategy effectively suppresses the negative-sequence current and the double-frequency power oscillation on the ac side of the GSC. The DC-bus voltage is kept constant, ultimately ensuring uninterrupted operation of the DFIG under unbalanced grid conditions.
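The "positive sequence" and "negative sequence" quantities above come from splitting an unbalanced three-phase set into symmetrical components. The minimal phasor-domain sketch below uses the standard Fortescue transform with operator a = 1∠120°. It is not the paper's DIgSILENT model or its d-q-0 derivation, and the helper names `sequence_components` and `phasor` are hypothetical. In a d-q frame rotating at the positive-sequence speed, the positive-sequence part appears as DC and the negative-sequence part oscillates at twice the grid frequency, which is the origin of the double-frequency power oscillation the control strategy targets.

```python
import cmath
import math

# Fortescue operator a = 1∠120°.
A = cmath.exp(2j * math.pi / 3)


def phasor(magnitude: float, angle_deg: float) -> complex:
    """Complex phasor from magnitude and angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))


def sequence_components(va: complex, vb: complex, vc: complex):
    """Split phase phasors (a, b, c) into zero-, positive- and negative-sequence parts."""
    v0 = (va + vb + vc) / 3
    v1 = (va + A * vb + A**2 * vc) / 3
    v2 = (va + A**2 * vb + A * vc) / 3
    return v0, v1, v2


# A 10% dip on phase a of an otherwise balanced 1 pu set (abc order).
v0, v1, v2 = sequence_components(phasor(0.9, 0), phasor(1.0, -120), phasor(1.0, 120))
print(abs(v1), abs(v2))
```

For this example |V1| = 2.9/3 ≈ 0.967 pu and |V2| = 0.1/3 ≈ 0.033 pu; a perfectly balanced set gives V2 = 0, so any negative-sequence component is a direct measure of the unbalance.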
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Ivanova, N; Barry, Kerrie
2007-01-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
Use of simulated data sets to evaluate the fidelity of Metagenomic processing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri
2006-12-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
Zhang, Kai; Fan, Guangyu; Zhang, Xinxin; Zhao, Fang; Wei, Wei; Du, Guohua; Feng, Xiaolei; Wang, Xiaoming; Wang, Feng; Song, Guoliang; Zou, Hongfeng; Zhang, Xiaolei; Li, Shuangdong; Ni, Xuemei; Zhang, Gengyun; Zhao, Zhihai
2017-01-01
Foxtail millet (Setaria italica) is an important crop possessing C4 photosynthesis capability. The S. italica genome was de novo sequenced in 2012, but the sequence lacked high-density genetic maps with agronomic and yield trait linkages. In the present study, we resequenced a foxtail millet population of 439 recombinant inbred lines (RILs) and developed a high-resolution bin map and high-density SNP markers, which could provide an effective approach for gene identification. A total of 59 QTL for 14 agronomic traits in plants grown under long- and short-day photoperiods were identified. The phenotypic variation explained ranged from 4.9 to 43.94%. In addition, we suggest that chromosome 6 may show segregation distortion significantly skewed toward Zhang gu. The newly identified QTL will provide a platform for sequence-based research on the S. italica genome, and for molecular marker-assisted breeding. PMID:28364039
Zhang, Kai; Fan, Guangyu; Zhang, Xinxin; Zhao, Fang; Wei, Wei; Du, Guohua; Feng, Xiaolei; Wang, Xiaoming; Wang, Feng; Song, Guoliang; Zou, Hongfeng; Zhang, Xiaolei; Li, Shuangdong; Ni, Xuemei; Zhang, Gengyun; Zhao, Zhihai
2017-05-05
Foxtail millet (Setaria italica) is an important crop possessing C4 photosynthesis capability. The S. italica genome was de novo sequenced in 2012, but the sequence lacked high-density genetic maps with agronomic and yield trait linkages. In the present study, we resequenced a foxtail millet population of 439 recombinant inbred lines (RILs) and developed a high-resolution bin map and high-density SNP markers, which could provide an effective approach for gene identification. A total of 59 QTL for 14 agronomic traits in plants grown under long- and short-day photoperiods were identified. The phenotypic variation explained ranged from 4.9 to 43.94%. In addition, we suggest that chromosome 6 may show segregation distortion significantly skewed toward Zhang gu. The newly identified QTL will provide a platform for sequence-based research on the S. italica genome, and for molecular marker-assisted breeding. Copyright © 2017 Zhang et al.
Adamiak, Paul; Vanderkooi, Otto G; Kellner, James D; Schryvers, Anthony B; Bettinger, Julie A; Alcantara, Joenel
2014-06-03
Multi-locus sequence typing (MLST) is a portable, broadly applicable method for classifying bacterial isolates at an intra-species level. This methodology provides clinical and scientific investigators with a standardized means of monitoring evolution within bacterial populations. MLST uses the DNA sequences from a set of genes such that each unique combination of sequences defines an isolate's sequence type. In order to reliably determine the sequence of a typing gene, matching sequence reads for both strands of the gene must be obtained. This study assesses the ability of both the standard, and an alternative set of, Streptococcus pneumoniae MLST primers to completely sequence, in both directions, the required typing alleles. The results demonstrated that for five (aroE, recP, spi, xpt, ddl) of the seven S. pneumoniae typing alleles, the standard primers were unable to obtain the complete forward and reverse sequences. This is due to the standard primers annealing too closely to the target regions, and current sequencing technology failing to sequence the bases that are too close to the primer. The alternative primer set described here, which includes a combination of primers proposed by the CDC and several designed as part of this study, addresses this limitation by annealing to highly conserved segments further from the target region. This primer set was subsequently employed to sequence-type 105 S. pneumoniae isolates collected by the Canadian Immunization Monitoring Program ACTive (IMPACT) over a period of 18 years. The inability of several of the standard S. pneumoniae MLST primers to fully sequence the required region was consistently observed and is the result of a shift in sequencing technology occurring after the original primers were designed. 
The results presented here introduce clear documentation describing this phenomenon into the literature, and provide additional guidance, through the introduction of a widely validated set of alternative primers, to research groups seeking to undertake S. pneumoniae MLST based studies.
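Because an isolate's sequence type is defined by its combination of allele numbers across the typing loci, a locus that cannot be fully sequenced on both strands leaves the profile incomplete and no ST can be assigned. The sketch below illustrates that lookup, assuming the seven S. pneumoniae MLST loci (aroE, gdh, gki, recP, spi, xpt, ddl) in profile order. The function names and the profile table, including the labels "ST-X" and "ST-Y", are hypothetical; real allele and ST numbers come from the MLST database.

```python
from typing import Mapping, Optional, Tuple

# The seven S. pneumoniae MLST housekeeping loci, in profile order.
LOCI: Tuple[str, ...] = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")

Profile = Tuple[int, ...]


def allelic_profile(alleles: Mapping[str, int]) -> Profile:
    """Order allele numbers by locus and refuse incomplete profiles."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        # e.g. a locus whose forward and reverse reads did not cover the full typing region
        raise ValueError("incomplete allelic profile, missing: " + ", ".join(missing))
    return tuple(alleles[locus] for locus in LOCI)


def assign_sequence_type(alleles: Mapping[str, int],
                         profiles: Mapping[Profile, str]) -> Optional[str]:
    """Return the ST for an exact profile match, or None for a new allele combination."""
    return profiles.get(allelic_profile(alleles))


# Hypothetical profile table (illustrative numbers only).
profiles = {(1, 1, 1, 1, 1, 1, 1): "ST-X", (2, 1, 1, 1, 1, 1, 1): "ST-Y"}
isolate = dict(zip(LOCI, (2, 1, 1, 1, 1, 1, 1)))
print(assign_sequence_type(isolate, profiles))  # ST-Y
```

Raising an error rather than guessing mirrors the requirement stated above: a typing allele is only called once both strands of the target region have been read.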
Sideris, Eleftherios; Corbett, Mark; Palmer, Stephen; Woolacott, Nerys; Bojke, Laura
2016-11-01
As part of the National Institute for Health and Clinical Excellence (NICE) single technology appraisal (STA) process, the manufacturer of apremilast was invited to submit evidence of its clinical and cost effectiveness for the treatment of active psoriatic arthritis (PsA) in patients for whom disease-modifying anti-rheumatic drugs (DMARDs) have been inadequately effective, not tolerated or contraindicated. The Centre for Reviews and Dissemination and Centre for Health Economics at the University of York were commissioned to act as the independent Evidence Review Group (ERG). This paper describes the ERG's review of the company's submission and the ERG report, and summarises the NICE Appraisal Committee's subsequent guidance (December 2015). In the company's initial submission, the base-case analysis resulted in an incremental cost-effectiveness ratio (ICER) of £14,683 per quality-adjusted life-year (QALY) gained for the sequence including apremilast (positioned before tumour necrosis factor [TNF]-α inhibitors) versus a comparator sequence without apremilast. However, the ERG considered that the base-case sequence proposed by the company represented a limited set of potentially relevant treatment sequences and positions for apremilast. The company's base-case results were therefore not a sufficient basis to inform the most efficient use and position of apremilast. The exploratory ERG analyses indicated that apremilast is more effective (i.e. produces higher health gains) when positioned after TNF-α inhibitor therapies. Furthermore, assumptions made regarding a potential beneficial effect of apremilast on long-term Health Assessment Questionnaire (HAQ) progression, which cannot be substantiated, have a substantial impact on the results. 
The NICE Appraisal Committee (AC), when taking into account their preferred assumptions for HAQ progression for patients on treatment with apremilast, placebo response and monitoring costs for apremilast, concluded that the addition of apremilast resulted in cost savings but also a QALY loss. These cost savings were not high enough to compensate for the clinical effectiveness that would be lost. The AC thus decided that apremilast alone or in combination with DMARD therapy is not recommended for treating adults with active PsA that has not responded to prior DMARD therapy, or where such therapy is not tolerated.
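The ICER quoted above is the difference in cost divided by the difference in QALYs between two treatment sequences. The ratio alone can hide which quadrant of the cost-effectiveness plane a result falls in: the Committee's conclusion (cost savings together with a QALY loss) also yields a positive ratio, which there means savings per QALY forgone. The minimal sketch below computes both; the names `Strategy`, `icer` and `quadrant` are hypothetical, and all numbers are illustrative, not values from the appraisal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    cost: float   # total (discounted) cost per patient
    qalys: float  # total (discounted) QALYs per patient


def icer(new: Strategy, comparator: Strategy) -> float:
    """Incremental cost-effectiveness ratio: delta cost / delta QALY."""
    d_cost = new.cost - comparator.cost
    d_qaly = new.qalys - comparator.qalys
    if d_qaly == 0:
        raise ZeroDivisionError("ICER is undefined when QALYs are equal")
    return d_cost / d_qaly


def quadrant(new: Strategy, comparator: Strategy) -> str:
    """Quadrant of the cost-effectiveness plane, which the ICER value alone does not reveal."""
    d_cost = new.cost - comparator.cost
    d_qaly = new.qalys - comparator.qalys
    if d_qaly > 0:
        return "more effective, more costly" if d_cost > 0 else "dominant"
    if d_qaly < 0:
        return "less effective, cost saving" if d_cost < 0 else "dominated"
    return "equally effective"


# Hypothetical per-patient totals (illustrative only).
base = Strategy(cost=10_000.0, qalys=2.0)
ne = Strategy(cost=11_000.0, qalys=2.5)
sw = Strategy(cost=9_000.0, qalys=1.5)
print(icer(ne, base), quadrant(ne, base))  # 2000.0 more effective, more costly
print(icer(sw, base), quadrant(sw, base))  # 2000.0 less effective, cost saving
```

Both strategies give the same ratio, yet they call for opposite readings: for the second, the question is whether the savings per QALY lost are large enough, which is the judgement the Committee made against apremilast.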
Comparison of next generation sequencing technologies for transcriptome characterization
2009-01-01
Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. 
In terms of sequence coverage alone, NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
NASA Astrophysics Data System (ADS)
Abdollahi Azghan, Mehdi; Eslami-Farsani, Reza
2018-02-01
The current study investigated the effects of different stacking sequences and thermal cycling on the flexural properties of fibre metal laminates (FMLs). The FMLs were composed of two aluminium alloy 2024-T3 sheets and an epoxy polymer-matrix composite with four layers of basalt and/or glass fibres in five different stacking sequences. Each thermal cycle, from 25 °C to 115 °C, lasted about 6 min. Flexural properties were evaluated after 55 thermal cycles and compared with those of non-exposed samples. The aluminium surfaces were modified by electrochemical treatment (anodizing) and examined by scanning electron microscopy (SEM). The flexural failure mechanisms were also investigated by optical microscopy of fractured surfaces. SEM images indicated that the porosity of the aluminium surface increased after anodizing. The flexural modulus was highest for the basalt-fibre-based FML and lowest for the glass-fibre-based FML, with the basalt/glass-fibre-based FML in between. Because of a change in failure mechanism, basalt/glass-fibre-based FMLs with glass fibres in the outer layers of the polymer composite had lower flexural strength than both the glass- and basalt-fibre-based FMLs. After thermal cycling, the flexural properties of the basalt-fibre-based FMLs decreased less than those of the other composites, owing to the good thermal properties of basalt fibres.
NASA Astrophysics Data System (ADS)
Zhang, Ji; Li, Tao; Zheng, Shiqiang; Li, Yiyong
2015-03-01
This study aimed to reduce the effects of respiratory motion on quantitative analysis based on single-mode liver contrast-enhanced ultrasound (CEUS) image sequences. An image gating method and an iterative registration method using a model image were adopted to register the single-mode liver CEUS image sequences. The feasibility of the proposed respiratory motion correction method was explored preliminarily in 10 CEUS cases of hepatocellular carcinoma. The positions of the lesions in the time series of 2D ultrasound images after correction were visually evaluated. Before and after correction, the quality of the weighted sum of transit time (WSTT) parametric images was also compared in terms of accuracy and spatial resolution. For the corrected and uncorrected sequences, the mean deviation values (mDVs) of time-intensity curve (TIC) fitting derived from the CEUS sequences were measured. After correction, the positions of the lesions in the time series of 2D ultrasound images were almost invariant, whereas the lesions in the uncorrected images all shifted noticeably. The quality of the WSTT parametric maps derived from the liver CEUS image sequences was markedly improved. Moreover, after correction the mDVs of TIC fitting decreased by an average of 48.48 ± 42.15. The proposed correction method could improve the accuracy of quantitative analysis based on single-mode liver CEUS image sequences, which would help improve the efficiency of differential diagnosis of liver tumors.
A hybrid model based on neural networks for biomedical relation extraction.
Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Zhang, Shaowu; Sun, Yuanyuan; Yang, Liang
2018-05-01
Biomedical relation extraction can automatically extract high-quality biomedical relations from biomedical texts, which is a vital step for mining the biomedical knowledge hidden in the literature. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are two major neural network models for biomedical relation extraction. Neural network-based methods for biomedical relation extraction typically focus on the sentence sequence and employ RNNs or CNNs separately to learn latent features from sentence sequences. However, RNNs and CNNs each have their own advantages for biomedical relation extraction, so combining them may improve performance. In this paper, we present a hybrid model for the extraction of biomedical relations that combines RNNs and CNNs. First, the shortest dependency path (SDP) is generated based on the dependency graph of the candidate sentence. To make full use of the SDP, we divide the SDP into a dependency word sequence and a relation sequence. Then, RNNs and CNNs are employed to automatically learn the features from the sentence sequence and the dependency sequences, respectively. Finally, the output features of the RNNs and CNNs are combined to detect and extract biomedical relations. We evaluate our hybrid model using five public protein-protein interaction (PPI) corpora and a drug-drug interaction (DDI) corpus. The experimental results suggest that the advantages of RNNs and CNNs in biomedical relation extraction are complementary. Combining RNNs and CNNs can effectively boost biomedical relation extraction performance. Copyright © 2018 Elsevier Inc. All rights reserved.
Ma, Youlong; Teng, Feiyue; Libera, Matthew
2018-06-05
Solid-phase oligonucleotide amplification is of interest because of possible applications to next-generation sequencing, multiplexed microarray-based detection, and cell-free synthetic biology. Its efficiency is, however, less than that of traditional liquid-phase amplification involving unconstrained primers and enzymes, and understanding how to optimize the solid-phase amplification process remains challenging. Here, we demonstrate the concept of solid-phase nucleic acid sequence-based amplification (SP-NASBA) and use it to study the effect of tethering density on amplification efficiency. SP-NASBA involves two enzymes, avian myeloblastosis virus reverse transcriptase (AMV-RT) and RNase H, to convert tethered forward and reverse primers into tethered double-stranded DNA (ds-DNA) bridges from which RNA amplicons can be generated by a third enzyme, T7 RNA polymerase. We create microgels on silicon surfaces using electron-beam patterning of thin-film blends of hydroxyl-terminated and biotin-terminated poly(ethylene glycol) (PEG-OH, PEG-B). The tethering density is linearly related to the PEG-B concentration, and biotinylated primers and molecular beacon detection probes are tethered to streptavidin-activated microgels. While SP-NASBA is very efficient at low tethering densities, the efficiency decreases dramatically with increasing tethering density due to three effects: (a) a reduced hybridization efficiency of tethered molecular beacon detection probes; (b) a decrease in T7 RNA polymerase efficiency; (c) inhibition of T7 RNA polymerase activity by AMV-RT.
Walia, Rasna R; Xue, Li C; Wilkins, Katherine; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2014-01-01
Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. 
An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/.
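The predictors above are compared by the Matthews correlation coefficient (MCC), computed over residues labelled RNA-binding or non-binding: MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The minimal sketch below uses that standard definition; returning 0.0 when any marginal total is zero is a common convention assumed here, and the function names are hypothetical rather than part of HomPRIP or RNABindRPlus.

```python
import math
from typing import Sequence


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any marginal total is zero (an assumption)
    return (tp * tn - fp * fn) / denom


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-residue MCC, with 1 = RNA-binding residue and 0 = non-binding residue."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return mcc_from_counts(tp, tn, fp, fn)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). Because binding residues are usually a minority of a protein's residues, MCC is less flattering than plain accuracy for a predictor that mostly outputs "non-binding".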
Sequence-similar, structure-dissimilar protein pairs in the PDB.
Kosloff, Mickey; Kolodny, Rachel
2008-05-01
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. 
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
Spherical: an iterative workflow for assembling metagenomic datasets.
Hitch, Thomas C A; Creevey, Christopher J
2018-01-24
The consensus emerging from the study of microbiomes is that they are far more complex than previously thought, requiring better assemblies and increasingly deeper sequencing. However, current metagenomic assembly techniques regularly fail to incorporate all, or in some cases even the majority, of the sequence information generated for many microbiomes, negating this effort. This can especially bias the information gathered and the perceived importance of the minor taxa in a microbiome. We propose a simple but effective approach, implemented in Python, to address this problem. Based on an iterative methodology, our workflow (called Spherical) carries out successive rounds of assemblies with the sequencing reads not yet utilised. This approach also allows the user to reduce the resources required for very large datasets, by assembling random subsets of the whole in a "divide and conquer" manner. We demonstrate the accuracy of Spherical using simulated data based on completely sequenced genomes and the effectiveness of the workflow at retrieving lost information for taxa in three published metagenomics studies of varying sizes. Our results show that Spherical increased the number of reads utilised in the assembly by up to 109% compared to the base assembly. The additional contigs assembled by the Spherical workflow resulted in significant (P < 0.05) changes in the predicted taxonomic profile of all datasets analysed. Spherical is implemented in Python 2.7 and freely available for use under the MIT license. Source code and documentation are hosted publicly at: https://github.com/thh32/Spherical .
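The iterative idea described above (reassemble only the reads not yet utilised, optionally on random subsets) can be sketched as a simple loop. This is not Spherical's source code, which targets Python 2.7. The sketch is written for Python 3, and `assemble` and `read_used` are hypothetical plug-in callables standing in for a real assembler and a read-mapping check. Stopping as soon as a round recruits no new reads is an assumption of this sketch.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical plug-in points: any assembler, and any test of whether a read
# is already represented in the contigs (in practice, e.g. by read mapping).
Assembler = Callable[[Sequence[str]], List[str]]
ReadUsed = Callable[[str, List[str]], bool]


def iterative_assembly(
    reads: Sequence[str],
    assemble: Assembler,
    read_used: ReadUsed,
    max_rounds: int = 5,
    subset_fraction: float = 1.0,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Reassemble reads left unused by earlier rounds; return (contigs, unused reads)."""
    rng = random.Random(seed)
    remaining = list(reads)
    contigs: List[str] = []
    for _ in range(max_rounds):
        if not remaining:
            break
        # Optional "divide and conquer": assemble a random subset of the remaining reads.
        if subset_fraction < 1.0:
            batch = rng.sample(remaining, max(1, int(len(remaining) * subset_fraction)))
        else:
            batch = remaining
        contigs.extend(assemble(batch))
        # Keep only reads not yet utilised by any contig assembled so far.
        still_unused = [r for r in remaining if not read_used(r, contigs)]
        if len(still_unused) == len(remaining):
            break  # this round recruited no new reads, so stop
        remaining = still_unused
    return contigs, remaining
```

With a toy assembler that returns the longest read in a batch and a substring test for `read_used`, the reads AAAA, AAA, CCCC, CC and GG are absorbed over three rounds, whereas a single round leaves CCCC, CC and GG unused. The gap between those two outcomes is the "lost information" that the iterative rounds are meant to recover.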
A Randomized Field Trial of the Fast ForWord Language Computer-Based Training Program
ERIC Educational Resources Information Center
Borman, Geoffrey D.; Benson, James G.; Overman, Laura
2009-01-01
This article describes an independent assessment of the Fast ForWord Language computer-based training program developed by Scientific Learning Corporation. Previous laboratory research involving children with language-based learning impairments showed strong effects on their abilities to recognize brief and fast sequences of nonspeech and speech…
Hearts and Minds: The Priority of Affective versus Cognitive Factors in Person Perception.
ERIC Educational Resources Information Center
Edwards, Kari; Hippel, William von
1995-01-01
In two experiments, affect-based and cognition-based attitudes toward a person were induced by varying the sequence of affective and cognitive information presented to subjects while holding content constant. Results indicated affect-based attitudes were most effectively changed by affective persuasive appeals, whether these appeals were produced by…
A Novel Locomotion-based Validation Assay for Candidate Drugs Using Drosophila DYT1 Disease Model
2013-11-01
the genome using the same parental fly line, minimizing the effect of surrounding sequences and genetic variations on the ...locomotion and GTP cyclohydrolase protein levels; (3) supplementation of dopamine can partially rescue the locomotion defects of Drosophila larvae...’-GCGAACAACCAAAAAATCATTGAGATAATAAACTCCTCCATTAG-3’) to make dtorsin cDNA that lacks GAC (D307) (Fig. 1) respectively. After confirming mutated sequences, the insert was again
Conformation and Stability of Intramolecular Telomeric G-Quadruplexes: Sequence Effects in the Loops
Sattin, Giovanna; Artese, Anna; Nadai, Matteo; Costa, Giosuè; Parrotta, Lucia; Alcaro, Stefano; Palumbo, Manlio; Richter, Sara N.
2013-01-01
Telomeres are guanine-rich sequences that protect the ends of chromosomes. These regions can fold into G-quadruplex structures, and their stabilization by G-quadruplex ligands has been employed as an anticancer strategy. Genetic analysis in human telomeres revealed extensive allelic variation restricted to loop bases, indicating that the variant telomeric sequences maintain the ability to fold into G-quadruplexes. To assess the effect of mutations in loop bases on G-quadruplex folding and stability, we performed a comprehensive analysis of mutant telomeric sequences by spectroscopic techniques, molecular dynamics simulations and gel electrophoresis. We found that when the first position in the loop was mutated from T to C or A, the resulting structure adopted a less stable antiparallel topology; when the second position was mutated to C or A, lower thermal stability and no evident conformational change were observed; in contrast, substitution of the third position from A to C induced a more stable, novel hybrid conformation, while mutation to T did not significantly affect G-quadruplex topology and stability. Our results indicate that allelic variations generate G-quadruplex telomeric structures with variable conformation and stability. This aspect needs to be taken into account when designing new potential anticancer molecules. PMID:24367632
Plasmonic SERS nanochips and nanoprobes for medical diagnostics and bio-energy applications
NASA Astrophysics Data System (ADS)
Ngo, Hoan T.; Wang, Hsin-Neng; Crawford, Bridget M.; Fales, Andrew M.; Vo-Dinh, Tuan
2017-02-01
The development of rapid, easy-to-use, cost-effective, highly accurate, and highly sensitive DNA detection methods for molecular diagnostics has been receiving increasing interest. Over the last five years, our laboratory has developed several chip-based DNA detection techniques, including the molecular sentinel-on-chip (MSC), the multiplex MSC, and the inverse molecular sentinel-on-chip (iMS-on-Chip). In these techniques, plasmonic surface-enhanced Raman scattering (SERS) Nanowave chips were functionalized with DNA probes for single-step DNA detection. Sensing was based on hybridization of target sequences to the DNA probes, which changed the distance between the SERS reporters and the Nanowave chip's gold surface. This distance change altered the SERS intensity, thus indicating the presence and capture of the target sequences. Target sequences were detected by simply delivering sample solutions onto the DNA probe-functionalized Nanowave chips and measuring SERS signals after 1–2 h of incubation. Neither target labeling nor washing to remove unreacted components was required, making the techniques simple, easy to use, and cost-effective. The usefulness of the techniques for medical diagnostics was illustrated by the detection of genetic biomarkers for respiratory viral infection and of dengue virus 4 DNA.
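Why a probe-induced distance change alters the signal can be illustrated with a commonly cited empirical model for the distance dependence of electromagnetic SERS enhancement, I(r) ∝ (1 + r/a)^-10, where r is the reporter-to-surface distance and a is the radius of curvature of the enhancing feature. The model, the 10 nm radius, and the 1 nm and 7 nm distances below are assumptions for illustration, not parameters of the Nanowave chips.

```python
def relative_sers_intensity(distance_nm: float, radius_nm: float) -> float:
    """Empirical (1 + r/a)**-10 decay of SERS enhancement with distance."""
    if distance_nm < 0 or radius_nm <= 0:
        raise ValueError("distance must be >= 0 and radius > 0")
    return (1.0 + distance_nm / radius_nm) ** -10

# Hypothetical signal-off geometry: a hairpin probe holds the reporter about
# 1 nm from the gold, and hybridization stretches it to about 7 nm.
before = relative_sers_intensity(1.0, 10.0)
after = relative_sers_intensity(7.0, 10.0)
print(f"signal falls to {after / before:.1%} of its pre-hybridization value")
```

The steep tenth-power decay is why a few nanometres of reporter movement is enough to read out hybridization without washing.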
Mining dynamic noteworthy functions in software execution sequences
Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong
2017-01-01
As the quality of crucial entities can directly affect that of software, their identification and protection are an important premise for effective software development, management, maintenance and testing, and thus contribute to improving software quality and its ability to defend against attacks. Most analyses and evaluations of important entities, such as code-based static structure analysis, are detached from the software's actual execution. In this paper, from the perspective of the software execution process, we propose an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, execution traces composed of a series of function addresses were acquired by decompiling the software and tracking stack changes. These traces were then modeled as execution sequences and simplified to obtain simplified sequences (SFS), from which patterns were extracted using a pattern extraction (PE) algorithm. Next, the evaluation indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in the DNFM algorithm. Finally, the functions were sorted by their noteworthiness. The experimental results were compared with those of two traditional complex-network-based node mining methods, PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely. PMID:28278276
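PageRank and DegreeRank, the two baselines named above, rank the nodes of a graph. A minimal sketch of how they could be applied to execution traces, assuming each trace is a list of function names or addresses and that consecutive calls define weighted directed edges; this illustrates the baselines only, not the DNFM algorithm, and the graph construction is our assumption.

```python
from collections import defaultdict

def transition_edges(traces):
    """Weighted directed edges a -> b for consecutive, distinct functions."""
    edges = defaultdict(int)
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            if a != b:
                edges[(a, b)] += 1
    return dict(edges)

def degree_rank(edges):
    """Functions sorted by weighted in+out degree (ties broken by name)."""
    degree = defaultdict(int)
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return sorted(degree, key=lambda n: (-degree[n], n))

def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; dangling nodes spread their rank uniformly."""
    nodes = sorted({n for edge in edges for n in edge})
    n_nodes = len(nodes)
    out = defaultdict(list)
    for (a, b), w in edges.items():
        out[a].append((b, w))
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if n not in out)
        new = {n: (1 - damping) / n_nodes + damping * dangling / n_nodes for n in nodes}
        for a, targets in out.items():
            total = sum(w for _, w in targets)
            for b, w in targets:
                new[b] += damping * rank[a] * w / total
        rank = new
    return rank
```

On traces where many paths converge on a logging routine, PageRank favours that sink while degree ranking favours the busiest intermediate function, which is the kind of disagreement a trace-aware method such as DNFM is meant to resolve.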
Parikh, Rohan C; Du, Xianglin L; Robert, Morgan O; Lairson, David R
2017-01-01
Treatment patterns for metastatic colorectal cancer (mCRC) patients have changed considerably over the last decade with the introduction of new chemotherapies and targeted biologics. These treatments are often administered in various sequences, with limited evidence regarding their cost-effectiveness. The objective of this study was to conduct a pharmacoeconomic evaluation of commonly administered treatment sequences among elderly mCRC patients. A probabilistic discrete event simulation model assuming a Weibull distribution was developed to evaluate the cost-effectiveness of the following common treatment sequences: (a) first-line oxaliplatin/irinotecan followed by second-line oxaliplatin/irinotecan + bevacizumab (OI-OIB); (b) first-line oxaliplatin/irinotecan + bevacizumab followed by second-line oxaliplatin/irinotecan + bevacizumab (OIB-OIB); (c) OI-OIB followed by a third-line targeted biologic (OI-OIB-TB); and (d) OIB-OIB followed by a third-line targeted biologic (OIB-OIB-TB). Input parameters for the model were primarily obtained from the Surveillance, Epidemiology, and End Results-Medicare linked dataset for incident mCRC patients aged 65 years and older diagnosed from January 2004 through December 2009. A probabilistic sensitivity analysis was performed to account for parameter uncertainty. Costs (2014 U.S. dollars) and effectiveness were discounted at an annual rate of 3%. In the base case analyses, at the willingness-to-pay (WTP) threshold of $100,000/quality-adjusted life-year (QALY) gained, the treatment sequence OIB-OIB (vs. OI-OIB) was not cost-effective, with an incremental cost-effectiveness ratio (ICER) per patient of $119,007/QALY; OI-OIB-TB (vs. OIB-OIB) was dominated; and OIB-OIB-TB (vs. OIB-OIB) was not cost-effective, with an ICER of $405,857/QALY. Results similar to the base case analysis were obtained assuming a log-normal distribution. 
Cost-effectiveness acceptability curves derived from a probabilistic sensitivity analysis showed that, at a WTP of $100,000/QALY gained, the probability of being cost-effective was 34% for sequence OI-OIB, followed by OIB-OIB (31%), OI-OIB-TB (20%), and OIB-OIB-TB (15%). Overall, survival increases marginally, at substantial cost, with the addition of targeted biologics such as bevacizumab at first line and third line. Treatment sequences with bevacizumab at first line and targeted biologics at third line may not be cost-effective at the commonly used threshold of $100,000/QALY gained, but a marginal decrease in the cost of bevacizumab may make treatment sequences with first-line bevacizumab cost-effective. Future economic evaluations should validate the study results using parameters from ongoing clinical trials. This study was supported in part by a grant from the Agency for Healthcare Research and Quality (R01-HS018956) and in part by a grant from the Cancer Prevention and Research Institute of Texas (RP130051), which were obtained by Du. The authors report no conflicts of interest. Study concept and design were primarily contributed by Parikh, along with the other authors. All authors participated in data collection, and Parikh took the lead in data interpretation and analysis, along with Lairson and Morgan, with assistance from Du. The manuscript was written primarily by Parikh, along with Lairson, Morgan, and Du, and revised by Parikh.
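The decision rules used above can be made concrete. A minimal sketch with hypothetical costs and QALYs (not the study's inputs): an ICER is the incremental cost divided by the incremental QALYs; a strategy that costs at least as much and gains no QALYs is dominated; and one point on an acceptability curve is the share of probabilistic-sensitivity-analysis draws in which a strategy has the highest net monetary benefit (WTP × QALY − cost). The discounting helper assumes yearly values with the first year undiscounted, which is one common convention.

```python
from collections import Counter

def discounted_total(yearly_values, rate=0.03):
    """Present value of yearly costs or QALYs; year 0 is undiscounted."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(yearly_values))

def icer(cost_new, qaly_new, cost_ref, qaly_ref):
    """Incremental cost per QALY gained, or a dominance label."""
    d_cost = cost_new - cost_ref
    d_qaly = qaly_new - qaly_ref
    if d_cost >= 0 and d_qaly <= 0:
        return "dominated"  # costs at least as much, gains nothing
    if d_cost <= 0 and d_qaly >= 0:
        return "dominant"  # costs no more, gains at least as much
    # Note: a cheaper but less effective option also yields a positive ratio,
    # which must be interpreted differently.
    return d_cost / d_qaly

def acceptability(draws, wtp):
    """Share of PSA draws in which each strategy maximizes net monetary benefit.

    draws: list of dicts mapping strategy name -> (cost, qaly).
    """
    wins = Counter(
        max(draw, key=lambda s: wtp * draw[s][1] - draw[s][0]) for draw in draws
    )
    return {s: wins[s] / len(draws) for s in draws[0]}
```

A strategy is usually called cost-effective when its ICER against the next-best non-dominated option falls below the WTP threshold, which is how the $119,007/QALY and $405,857/QALY figures above compare with $100,000/QALY.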
Huang, Xuan; Zheng, Jing; Chen, Min; Zhao, Yangyu; Zhang, Chunlei; Liu, Lifu; Xie, Weiwei; Shi, Shuqiong; Wei, Yuan; Lei, Dongzhu; Xu, Chenming; Wu, Qichang; Guo, Xiaoling; Shi, Xiaomei; Zhou, Yi; Liu, Qiufang; Gao, Ya; Jiang, Fuman; Zhang, Hongyun; Su, Fengxia; Ge, Huijuan; Li, Xuchao; Pan, Xiaoyu; Chen, Shengpei; Chen, Fang; Fang, Qun; Jiang, Hui; Lau, Tze Kin; Wang, Wei
2014-04-01
The objective of this study is to assess the performance of noninvasive prenatal testing for trisomies 21 and 18 on the basis of massively parallel sequencing of cell-free DNA from maternal plasma in twin pregnancies. A double-blind study was performed over 12 months. A total of 189 pregnant women carrying twins were recruited from seven hospitals. Maternal plasma DNA sequencing was performed to detect trisomies 21 and 18. The fetal karyotype was used as the gold standard to estimate the sensitivity and specificity of the sequencing-based noninvasive prenatal test. There were nine cases of trisomy 21 and two cases of trisomy 18 confirmed by karyotyping. Plasma DNA sequencing correctly identified nine cases of trisomy 21 and one case of trisomy 18. The discordant case of trisomy 18 was an unusual case of monozygotic twins with discordant fetal karyotypes (one normal and the other trisomy 18). The sensitivity and specificity of maternal plasma DNA sequencing were both 100% for fetal trisomy 21, and 50% and 100%, respectively, for fetal trisomy 18. Our study further supports the conclusion that sequencing-based noninvasive prenatal testing for trisomy 21 in twin pregnancies can be achieved with high accuracy, which could avoid almost 95% of invasive prenatal diagnosis procedures. © 2013 John Wiley & Sons, Ltd.
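The reported rates follow directly from the case counts. A minimal check using the trisomy 21 and trisomy 18 detection counts above; the true-negative count of 187 (189 pregnancies minus the two trisomy 18 cases) and the zero false positives are assumptions consistent with the reported 100% specificity, not figures stated in the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of affected pregnancies that the test flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unaffected pregnancies that the test clears."""
    return true_neg / (true_neg + false_pos)

print(sensitivity(9, 0))    # trisomy 21: 9 of 9 confirmed cases detected
print(sensitivity(1, 1))    # trisomy 18: 1 of 2 confirmed cases detected
print(specificity(187, 0))  # assumed: 187 unaffected, no false positives
```

With only two trisomy 18 cases, a single miss halves the sensitivity, so that 50% figure carries very wide uncertainty.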
Intraspecific variation in Cryptocaryon irritans.
Diggles, B K; Adlard, R D
1997-01-01
Intraspecific variation in the ciliate Cryptocaryon irritans was examined using sequences of the first internal transcribed spacer region (ITS-1) of ribosomal DNA (rDNA) combined with developmental and morphological characters. Amplified rDNA sequences consisting of 151 bases of the flanking 18S and 5.8S regions, and the entire ITS-1 region (169 or 170 bases), were determined and compared for 16 isolates of C. irritans from Australia, Israel and the USA. There was one variable base between isolates in the 18S region and 11 variable bases in the ITS-1 region. Despite their similar morphology, significant sequence variation (4.1% divergence) and developmental differences indicate that Australian C. irritans isolates from estuarine (Moreton Bay) and coral reef (Heron Island) environments are distinct. The Heron Island isolate was genetically closer to morphologically dissimilar isolates from Israel (1.8% divergence) and the USA (2.3% divergence) than it was to the Moreton Bay isolates. Three isolates maintained in our laboratory since February 1994 differed in sequence from earlier laboratory isolates (2.9% to 3.5% divergence), even though all were similar morphologically and originated from the same source. During this time the sequence of the isolates from wild fish in Moreton Bay remained unchanged. These genetic differences indicate the existence of a founder effect in laboratory populations of C. irritans. The genetic variation found here, combined with known morphological and developmental differences, is used to characterise four strains of C. irritans.
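Percent divergence between two aligned sequences is commonly computed as an uncorrected p-distance: mismatched sites divided by compared sites. A minimal sketch, assuming the sequences are already aligned and that gap positions are skipped; the study may have handled gaps or applied a substitution-model correction differently.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance (as a percentage) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        mismatches += a != b
    return 100.0 * mismatches / compared
```

Because the ITS-1 region here is only about 170 bases long, each additional variable site shifts the percentage noticeably.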
Experimental Influences in the Accurate Measurement of Cartilage Thickness in MRI.
Wang, Nian; Badar, Farid; Xia, Yang
2018-01-01
Objective To study the experimental influences on the measurement of cartilage thickness by magnetic resonance imaging (MRI). Design The complete thicknesses of healthy and trypsin-degraded cartilage were measured with high-resolution MRI under different conditions, using two intensity-based imaging sequences (ultra-short echo [UTE] and multislice-multiecho [MSME]) and three quantitative relaxation imaging sequences (T1, T2, and T1ρ). Other variables included different orientations in the magnet, two soaking solutions (saline and phosphate buffered saline [PBS]), and external loading. Results With cartilage soaked in saline, UTE and T1 methods yielded complete and consistent measurements of cartilage thickness, while the thickness measurements by T2, T1ρ, and MSME methods were orientation dependent. The effect of external loading on cartilage thickness was also sequence and orientation dependent. All variations in cartilage thickness in MRI could be eliminated by using 100 mM PBS or by imaging with the UTE sequence. Conclusions The appearance of articular cartilage and the accuracy of cartilage thickness measurement in MRI can be influenced by a number of experimental factors in ex vivo MRI, from the use of various pulse sequences and soaking solutions to the health of the tissue. T2-based imaging sequences, both proton-intensity and quantitative relaxation sequences, produced the largest variations. With adequate resolution, accurate measurement of whole cartilage tissue in clinical MRI could be used to detect differences between healthy and osteoarthritic cartilage after compression.
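Quantitative relaxation maps such as the T2 maps mentioned above are typically obtained by fitting a mono-exponential decay, S(TE) = S0·exp(−TE/T2), to one pixel's signal at several echo times. A minimal log-linear least-squares sketch; the mono-exponential model and the example echo times are assumptions, and real cartilage data may need noise-floor handling or multi-exponential fits.

```python
import math

def fit_t2(echo_times, signals):
    """Fit ln S = ln S0 - TE / T2 by ordinary least squares; return (S0, T2)."""
    xs = list(echo_times)
    ys = [math.log(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # equals -1 / T2 for a noise-free decay
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -1.0 / slope
```

Running this per pixel across a stack of echo images yields a T2 map; where the tissue boundary falls on that map is what the thickness measurement ultimately depends on.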
Nanopore sequencing in microgravity
McIntyre, Alexa B R; Rizzardi, Lindsay; Yu, Angela M; Alexander, Noah; Rosen, Gail L; Botkin, Douglas J; Stahl, Sarah E; John, Kristen K; Castro-Wallace, Sarah L; McGrath, Ken; Burton, Aaron S; Feinberg, Andrew P; Mason, Christopher E
2016-01-01
Rapid DNA sequencing and analysis have been a long-sought goal in remote research and point-of-care medicine. In microgravity, DNA sequencing can facilitate novel astrobiological research and close monitoring of crew health, but spaceflight places stringent restrictions on the mass and volume of instruments, crew operation time, and instrument functionality. The recent emergence of portable, nanopore-based tools with streamlined sample preparation protocols finally enables DNA sequencing on missions in microgravity. As a first step toward sequencing in space and aboard the International Space Station (ISS), we tested the Oxford Nanopore Technologies MinION during a parabolic flight to understand the effects of variable gravity on the instrument and data. In a successful proof-of-principle experiment, we found that the instrument generated DNA reads over the course of the flight, including the first reads ever sequenced in microgravity, and additional reads measured after the flight concluded its parabolas. Here we detail modifications to the sample-loading procedures to facilitate nanopore sequencing aboard the ISS and in other microgravity environments. We also evaluate existing analysis methods and outline two new approaches, the first based on a wave-fingerprint method and the second on entropy signal mapping. Computationally light analysis methods offer the potential for in situ species identification, but are limited by the error profiles (stays, skips, and mismatches) of older nanopore data. Higher accuracies attainable with modified sample processing methods and the latest version of flow cells will further enable the use of nanopore sequencers for diagnostics and research in space. PMID:28725742
Ochoa, David; García-Gutiérrez, Ponciano; Juan, David; Valencia, Alfonso; Pazos, Florencio
2013-01-27
A widespread family of methods for studying and predicting protein interactions using sequence information is based on co-evolution, quantified as similarity of phylogenetic trees. Part of the co-evolution observed between interacting proteins could be due to co-adaptation caused by inter-protein contacts. In this case, the co-evolution is expected to be more evident when evaluated on the surface of the proteins or the internal layers close to it. In this work we study the effect of incorporating predicted solvent accessibility into three methods for predicting protein interactions based on similarity of phylogenetic trees. We evaluate the performance of these methods in predicting different types of protein associations when trees based on positions with different characteristics of predicted accessibility are used as input. We found that predicted accessibility improves the results of two recent versions of the mirrortree methodology in predicting direct binary physical interactions, while it neither improves these methods, nor the original mirrortree method, in predicting other types of interactions. That improvement comes at no cost in terms of applicability, since accessibility can be predicted for any sequence. We also found that predictions of protein-protein interactions are improved when multiple sequence alignments with a richer representation of sequences (including paralogs) are incorporated in the accessibility prediction.
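The core mirrortree score is the correlation between the inter-species distance matrices of two protein families. A minimal sketch, assuming both matrices are already restricted to the same species in the same order; the accessibility-based variants described above would change which alignment positions feed those distances, which is not shown here.

```python
import math

def mirrortree_score(dist_a, dist_b):
    """Pearson correlation between the upper triangles of two square
    distance matrices whose rows and columns refer to the same species."""
    n = len(dist_a)
    xs = [dist_a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [dist_b[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Only the upper triangle is used because the matrices are symmetric with zero diagonals; including both halves would double-count every species pair.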
Kheiri, Ahmed; Keedwell, Ed
2017-01-01
Operations research is a well-established field that uses computational systems to support decisions in business and public life. Good solutions to operations research problems can make a large difference to the efficient running of businesses and organisations, so the field continually searches for new methods to improve these solutions. The high school timetabling problem is an example of an operations research problem and is a challenging task that requires assigning events and resources to time slots subject to a set of constraints. In this article, we present an easy-to-implement, easy-to-maintain, and effective sequence-based selection hyper-heuristic for high school timetabling and evaluate it on a benchmark of unified real-world instances collected from different countries. We show that with sequence-based methods, it is possible to discover new best known solutions for a number of the problems in the timetabling domain. Through this investigation, the usefulness of sequence-based selection hyper-heuristics has been demonstrated, and their capability has been shown to exceed the state of the art.
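A sequence-based selection hyper-heuristic picks the next low-level heuristic conditioned on the previous one and reinforces transitions that led to improvements. The sketch below is a deliberately simplified version of that idea (a score matrix with roulette-wheel selection and an accept-if-not-worse rule), not the authors' full method; the toy instance (`CLASHES`, `move_event`, `swap_events`) is hypothetical.

```python
import random

def sequence_hyper_heuristic(initial, cost, heuristics, iterations=2000, seed=0):
    """Simplified sequence-based selection hyper-heuristic (minimization).

    scores[i][j] is the learned preference for applying heuristic j right
    after heuristic i; it grows whenever a sequence yields a new best.
    """
    rng = random.Random(seed)
    k = len(heuristics)
    scores = [[1] * k for _ in range(k)]
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    prev, sequence = rng.randrange(k), []
    for _ in range(iterations):
        nxt = rng.choices(range(k), weights=scores[prev])[0]
        sequence.append((prev, nxt))
        candidate = heuristics[nxt](current, rng)
        candidate_cost = cost(candidate)
        if candidate_cost <= current_cost:  # accept improving or equal moves
            current, current_cost = candidate, candidate_cost
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
            for a, b in sequence:  # reward every transition in the sequence
                scores[a][b] += 1
            sequence = []
        prev = nxt
    return best, best_cost

# Hypothetical toy instance: 6 events, 3 time slots, pairs that must not clash.
CLASHES = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (0, 3)]

def clashes(slots):
    return sum(slots[a] == slots[b] for a, b in CLASHES)

def move_event(slots, rng):
    new = list(slots)
    new[rng.randrange(len(new))] = rng.randrange(3)
    return new

def swap_events(slots, rng):
    new = list(slots)
    i, j = rng.randrange(len(new)), rng.randrange(len(new))
    new[i], new[j] = new[j], new[i]
    return new

schedule, remaining = sequence_hyper_heuristic([0] * 6, clashes, [move_event, swap_events])
```

The learned matrix lets the search favour productive pairs such as "move, then swap" without any problem-specific tuning, which is the property that makes such methods easy to maintain across instances.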
Scaling up discovery of hidden diversity in fungi: impacts of barcoding approaches.
Yahr, Rebecca; Schoch, Conrad L; Dentinger, Bryn T M
2016-09-05
The fungal kingdom is a hyperdiverse group of multicellular eukaryotes with profound impacts on human society and ecosystem function. The challenge of documenting and describing fungal diversity is exacerbated by their typically cryptic nature, their ability to produce seemingly unrelated morphologies from a single individual and their similarity in appearance to distantly related taxa. This multiplicity of hurdles resulted in the early adoption of DNA-based comparisons to study fungal diversity, including linking curated DNA sequence data to expertly identified voucher specimens. DNA-barcoding approaches in fungi were first applied in specimen-based studies for identification and discovery of taxonomic diversity, but are now widely deployed for community characterization based on sequencing of environmental samples. Collectively, fungal barcoding approaches have yielded important advances across biological scales and research applications, from taxonomic, ecological, industrial and health perspectives. A major outstanding issue is the growing problem of 'sequences without names' that are somewhat uncoupled from the traditional framework of fungal classification based on morphology and preserved specimens. This review summarizes some of the most significant impacts of fungal barcoding, its limitations, and progress towards the challenge of effective utilization of the exponentially growing volume of data gathered from high-throughput sequencing technologies. This article is part of the themed issue 'From DNA barcodes to biomes'. © 2016 The Authors.
Zopf, Agnes; Raim, Roman; Danzer, Martin; Niklas, Norbert; Spilka, Rita; Pröll, Johannes; Gabriel, Christian; Nechansky, Andreas; Roucka, Markus
2015-03-01
The detection of KRAS mutations in codons 12 and 13 is critical for anti-EGFR therapy strategies; however, only those methodologies with high sensitivity, specificity, and accuracy as well as the best cost and turnaround balance are suitable for routine daily testing. Here we compared the performance of compact sequencing using the novel hybcell technology with 454 next-generation sequencing (454-NGS), Sanger sequencing, and pyrosequencing, using an evaluation panel of 35 specimens. A total of 32 mutations and 10 wild-type cases were reported using 454-NGS as the reference method. Specificity ranged from 100% for Sanger sequencing to 80% for pyrosequencing. Sanger sequencing and hybcell-based compact sequencing achieved a sensitivity of 96%, whereas pyrosequencing had a sensitivity of 88%. Accuracy was 97% for Sanger sequencing, 85% for pyrosequencing, and 94% for hybcell-based compact sequencing. Quantitative results were obtained for 454-NGS and hybcell-based compact sequencing data, resulting in a significant correlation (r = 0.914). Whereas pyrosequencing and Sanger sequencing were not able to detect multiple mutated cell clones within one tumor specimen, 454-NGS and the hybcell-based compact sequencing detected multiple mutations in two specimens. Our comparison shows that the hybcell-based compact sequencing is a valuable alternative to state-of-the-art methodologies used for detection of clinically relevant point mutations.
Using ProMED-Mail and MedWorm blogs for cross-domain pattern analysis in epidemic intelligence.
Stewart, Avaré; Denecke, Kerstin
2010-01-01
In this work we motivate the use of user-generated content from medical blogs for gathering facts about disease reporting events to support biosurveillance investigation. Given the characteristics of blogs, the extraction of such events is made more difficult by noise and data abundance. We address the problem of automatically inferring disease reporting event extraction patterns in this noisier setting. The sublanguage used in outbreak reports is exploited to align with the sequences of disease reporting sentences in blogs. Using our Cross Domain Pattern Analysis Framework, experimental results show that Phrase-Level sequences tend to produce more overlap across the domains than Word-Level sequences. The cross-domain alignment process is effective at filtering noisy sequences from blogs and extracting good candidate sequence patterns from an abundance of text.
The Impact of Transcription Writing Interventions for First-Grade Students
Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie; Kim, Young-Suk Grace
2016-01-01
We examined the effects of transcription instruction for students in first grade. Students in the lowest 70% of the participating schools were selected for the study. These 81 students were randomly assigned to: (a) spelling instruction, (b) handwriting instruction, (c) combination spelling and handwriting instruction, or (d) no intervention. Intervention was provided in small groups of 4 students, 25 min a day, 4 days a week for 8 weeks. Students in the spelling condition outperformed the control group on spelling measures with moderate effect sizes noted on curriculum-based writing measures (e.g., correct word sequence; g range = 0.34 to 0.68). Students in the handwriting condition outperformed the control group on correct word sequences with small to moderate effects on other handwriting and writing measures (g range = 0.31 to 0.71). Students in the combined condition outperformed the control group on correct word sequences with a small effect on total words written (g range = 0.39 to 0.84). PMID:28989267
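The effect sizes above are reported as Hedges' g, a standardized mean difference with a small-sample correction. A minimal sketch of the usual formula; the group summaries in the check are hypothetical, and the study may have estimated g from adjusted models rather than raw means.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with Hedges' small-sample correction."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)  # shrinks d slightly for small n
    return d * correction
```

With groups of roughly 20 students, as here, the correction factor is about 0.98, so g and the uncorrected d differ only slightly.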
Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.
Haghverdi, Laleh; Lun, Aaron T L; Morgan, Michael D; Marioni, John C
2018-06-01
Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
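The central step, finding mutual nearest neighbours (MNNs) between batches and using them to estimate a correction, can be sketched as follows. This is a simplification under stated assumptions: Euclidean distance on small dense vectors, and a single average correction vector applied to every cell of the second batch, whereas the published method first cosine-normalizes the data and computes cell-specific correction vectors smoothed with a Gaussian kernel.

```python
import math

def nearest(point, others, k):
    """Indices of the k rows of `others` closest to `point` (Euclidean)."""
    order = sorted(range(len(others)), key=lambda j: math.dist(point, others[j]))
    return set(order[:k])

def mnn_pairs(batch_a, batch_b, k):
    """(i, j) pairs where a_i and b_j are each among the other's k nearest."""
    a_to_b = [nearest(x, batch_b, k) for x in batch_a]
    b_to_a = [nearest(y, batch_a, k) for y in batch_b]
    return [
        (i, j)
        for i in range(len(batch_a))
        for j in sorted(a_to_b[i])
        if i in b_to_a[j]
    ]

def correct_batch(batch_a, batch_b, k=20):
    """Shift batch_b by the mean difference across its MNN pairs with batch_a."""
    pairs = mnn_pairs(batch_a, batch_b, k)
    if not pairs:
        raise ValueError("no mutual nearest neighbours: no shared population found")
    dims = len(batch_a[0])
    shift = [
        sum(batch_a[i][d] - batch_b[j][d] for i, j in pairs) / len(pairs)
        for d in range(dims)
    ]
    return [[y[d] + shift[d] for d in range(dims)] for y in batch_b]
```

Cells from a population present in only one batch rarely form mutual pairs, so they do not distort the estimated shift; this is why the approach needs only a shared subset rather than identical compositions.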
Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan
2016-01-01
Molecular characterization of the sequences flanking exogenous fragment insertions is essential for safety assessment and labeling of genetically modified organisms (GMOs). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybean lines, GE-J16 and ZH10-6, using whole-genome sequencing (WGS). More than 22.4 Gb of sequence data (∼21× coverage) were generated for each line on the Illumina HiSeq 2500 platform. The junction reads mapping to the boundaries of the T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with the soybean reference genome and the sequence of the transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses indicated that exogenous T-DNA fragments were integrated at positions Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of the genomic insertion sites of the G2-EPSPS and GAT transgenes will facilitate the utilization of their glyphosate-tolerant traits in soybean breeding programs. These results also demonstrate that WGS is a cost-effective and rapid method for identifying sites of T-DNA insertions and flanking sequences in soybean.
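Junction reads are those that span the boundary between the T-DNA and the host genome. A toy sketch of that classification, assuming exact k-mer matching of each read's two ends against the vector and the reference; real pipelines align reads with a read mapper instead, and every name and sequence here is hypothetical.

```python
def end_source(kmer, vector, genome):
    """Return 'vector' or 'genome' if the k-mer occurs in exactly one of them."""
    in_vector, in_genome = kmer in vector, kmer in genome
    if in_vector and not in_genome:
        return "vector"
    if in_genome and not in_vector:
        return "genome"
    return None  # absent from both, or ambiguous

def junction_reads(reads, vector, genome, k=6):
    """Reads whose two ends come from different sources (T-DNA vs. host)."""
    hits = []
    for read in reads:
        ends = {
            end_source(read[:k], vector, genome),
            end_source(read[-k:], vector, genome),
        }
        if ends == {"vector", "genome"}:
            hits.append(read)
    return hits
```

Discarding k-mers found in both sources matters in practice because transformation vectors often carry plant-derived elements such as promoters that also occur in the host genome.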
PeanutDB: an integrated bioinformatics web portal for Arachis hypogaea transcriptomics
2012-01-01
Background The peanut (Arachis hypogaea) is an important crop cultivated worldwide for oil production and food. Its complex genetic architecture (e.g., the large, tetraploid genome, possibly arising from a unique cross of wild diploid relatives and subsequent chromosome duplication: 2n = 4x = 40, AABB, 2800 Mb) presents a major challenge for genome sequencing and makes it a less-studied crop. Without a doubt, transcriptome sequencing is the most effective way to explore the genome structure and gene expression dynamics of this non-model species, which has limited genomic resources. Description With the development of next-generation sequencing technologies such as 454 pyrosequencing and Illumina sequencing by synthesis, peanut transcriptomic data are accumulating rapidly in both public databases and the private sector. Integrating 187,636 Sanger reads (103,685,419 bases), 1,165,168 Roche 454 reads (333,862,593 bases) and 57,135,995 Illumina reads (4,073,740,115 bases), we generated the first release of our peanut transcriptome assembly, which contains 32,619 contigs. We provided EC, KEGG and GO functional annotations for these contigs and detected SSRs, SNPs and other genetic polymorphisms for each contig. Based on both open-source and in-house tools, PeanutDB presents many seamlessly integrated web interfaces that allow users to easily search, filter, navigate and visualize the whole transcript assembly, its annotations, and the detected polymorphisms and simple sequence repeats. For each contig, the sequence alignment is presented both in a bird’s-eye view and at nucleotide-level resolution, with color-highlighted regions of mismatches, indels and repeats that facilitate close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors. 
Conclusion As a public genomic database that integrates peanut transcriptome data from different sources, PeanutDB (http://bioinfolab.muohio.edu/txid3818v1) provides the Peanut research community with an easy-to-use web portal that will definitely facilitate genomics research and molecular breeding in this less-studied crop. PMID:22712730
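Simple sequence repeats (SSRs) like those PeanutDB reports for each contig are tandem repeats of 1–6 bp motifs. A minimal regex-based detector, assuming minimum repeat counts similar to those often used with the MISA tool (10 for mononucleotides, 6 for dinucleotides, 5 for longer motifs); the thresholds PeanutDB actually used are not stated here.

```python
import re

MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}  # assumed thresholds

def is_primitive(motif):
    """False for motifs that are themselves repeats, e.g. 'AA' or 'ATAT'."""
    return motif not in (motif + motif)[1:-1]

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_count in MIN_REPEATS.items():
        # A unit-length motif followed by at least (min_count - 1) copies of itself.
        pattern = re.compile("([ACGT]{%d})\\1{%d,}" % (unit, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)
```

The primitivity check stops a long mononucleotide run from also being reported as a dinucleotide or tetranucleotide repeat.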
High dynamic range image acquisition based on multiplex cameras
NASA Astrophysics Data System (ADS)
Zeng, Hairui; Sun, Huayan; Zhang, Tinghua
2018-03-01
High dynamic range (HDR) imaging is an important photoelectric information acquisition technology: it provides a higher dynamic range and more image detail, and it better reflects the real environment, light, and color information. Current methods that synthesize HDR images from image sequences with different exposures cannot adapt to dynamic scenes; they fail to overcome the effects of moving targets, resulting in ghosting. Therefore, a new HDR image acquisition method based on a multiplex camera system was proposed. First, image sequences with different exposures were captured with a camera array, and the deviation between images was obtained with a derivative optical flow method based on color gradients and used to align the images. Then, an HDR image fusion weighting function was established by combining the inverse camera response function with the deviation between images, and was applied to generate an HDR image. The experiments show that the proposed method can effectively obtain HDR images of dynamic scenes and achieves good results.
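Fusing differently exposed frames through an inverse camera response function f⁻¹ can be sketched per pixel in the classic weighted log-radiance form, ln E = Σ w(Z)[ln f⁻¹(Z) − ln Δt] / Σ w(Z), with a hat weight that discounts under- and over-exposed values. This is the standard Debevec–Malik-style fusion, shown as an illustration under assumptions: it omits the paper's optical-flow alignment and deviation-based weighting, and the linear response used in the check is hypothetical.

```python
import math

def hat_weight(z, z_min=0, z_max=255):
    """Triangle weight: largest for mid-range values, zero at the extremes."""
    return min(z - z_min, z_max - z)

def fuse_pixel(values, exposure_times, inverse_response):
    """Weighted average of per-frame log-radiance estimates for one pixel."""
    num = den = 0.0
    for z, t in zip(values, exposure_times):
        w = hat_weight(z)
        if w <= 0:
            continue  # saturated or black: carries no reliable information
        num += w * (math.log(inverse_response(z)) - math.log(t))
        den += w
    if den == 0:
        raise ValueError("every sample is saturated or black")
    return math.exp(num / den)
```

In the method described above, the weight would additionally depend on the inter-image deviation, so that pixels on moving targets contribute less and ghosting is suppressed.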
Zhao, Jiaduo; Gong, Weiguo; Tang, Yuzhen; Li, Weihong
2016-01-20
In this paper, we propose an effective method for recognizing human and nonhuman pyroelectric infrared (PIR) signals to reduce PIR detector false alarms. First, using the mathematical model of the PIR detector, we analyze the physical characteristics of human and nonhuman PIR signals; second, based on the analysis results, we propose an empirical mode decomposition (EMD)-based symbolic dynamic analysis method for recognizing human and nonhuman PIR signals. In the proposed method, we first extract the detailed features of a PIR signal into five symbol sequences using an EMD-based symbolization method, and then generate five feature descriptors for each PIR signal by constructing five probabilistic finite state automata from the symbol sequences. Finally, we use a weighted voting classification strategy to classify the PIR signals with their feature descriptors. Comparative experiments show that the proposed method can effectively classify human and nonhuman PIR signals and reduce PIR detector false alarms.
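The symbolic-dynamics pipeline can be illustrated on a single sequence: partition the signal into symbols, estimate a probabilistic finite state automaton as a symbol-transition matrix (here a depth-1 model in which each symbol is a state), and combine per-descriptor classifier outputs by weighted voting. This sketch skips the EMD stage, and the equal-frequency partition, the depth-1 automaton, and the vote weights are assumptions rather than the authors' settings.

```python
from collections import defaultdict

def symbolize(signal, alphabet_size):
    """Map samples to symbols 0..alphabet_size-1 using equal-frequency cuts."""
    ranked = sorted(signal)
    n = len(ranked)
    cuts = [ranked[(i * n) // alphabet_size] for i in range(1, alphabet_size)]
    return [sum(x >= c for c in cuts) for x in signal]

def transition_matrix(symbols, alphabet_size):
    """Row-normalized symbol transition probabilities (depth-1 automaton)."""
    counts = [[0] * alphabet_size for _ in range(alphabet_size)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Unvisited states get a uniform row so every row sums to 1.
        matrix.append([c / total if total else 1 / alphabet_size for c in row])
    return matrix

def weighted_vote(labels, weights):
    """Pick the label with the largest summed weight."""
    score = defaultdict(float)
    for label, weight in zip(labels, weights):
        score[label] += weight
    return max(score, key=score.get)
```

Flattening each transition matrix gives a fixed-length feature descriptor regardless of signal length, which is what makes the automata convenient inputs for a classifier.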
Panda, Rakhi; Fiedler, Katherine L; Cho, Chung Y; Cheng, Raymond; Stutts, Whitney L; Jackson, Lauren S; Garber, Eric A E
2015-12-09
The effectiveness of a proline endopeptidase (PEP) in hydrolyzing gluten and its putative immunopathogenic sequences was examined using antibody-based methods and mass spectrometry (MS). According to the antibody-based methods, fermentation of sorghum beer containing wheat gluten reduced the detectable gluten concentration, and the addition of PEP reduced it further. Only one sandwich ELISA was able to detect the apparently low levels of gluten present in the beers. A competitive ELISA using a pepsin-trypsin hydrolysate calibrant was unreliable because the peptide profiles of the beers were inconsistent with that of the calibrant. MS analysis indicated that PEP enhanced the loss of a fragment of an immunopathogenic 33-mer peptide in the beer. However, Western blot results indicated that the high molecular weight (HMW) glutenins were partially resistant to PEP, calling into question the ability of PEP to digest all immunopathogenic sequences present in gluten.
Malhotra, Karan; Noor, M Omair; Krull, Ulrich J
2018-05-29
Diagnostic technology that makes use of paper platforms in conjunction with the ubiquitous availability of digital cameras in cellular telephones and personal assistive devices offers opportunities for development of bioassays that are cost effective and widely distributed. Assays that operate effectively in aqueous solution require further development for implementation in paper substrates, overcoming issues associated with surface interactions on a matrix that offers a large surface-to-volume ratio and constraints on convective mixing. This report presents and compares two related methods for determination of oligonucleotides that serve as indicators of cystic fibrosis, differentiating between the normal wild-type sequence, and a mutant-type sequence that has a 3-base replacement. The transduction strategy operates by selective hybridization of oligonucleotide probes that are conjugated to fluorescent quantum dots, where hybridization of target sequences causes a molecular fluorophore to approach the quantum dot and become emissive through fluorescence resonance energy transfer. Detection can rely on hybridization of a target that is labelled with Cy3 fluorophore, or in the presence of an unlabelled target when a sandwich assay format is implemented with a labelled reporter oligonucleotide. Selectivity to determine the presence of mismatched sequences involves appropriate selection of nucleotide sequences to set melt temperatures, in conjunction with control of stringency conditions using formamide as a chaotrope. It was determined that both direct and sandwich assays on paper substrates are able to distinguish between wild-type and mutant-type samples.
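For reference, the standard Förster relation behind this transduction scheme gives the energy transfer efficiency E for a single donor-acceptor pair as a function of their separation r, where R_0 is the Förster distance of the quantum dot/Cy3 pair (the separation at which E = 0.5). Hybridization that brings the Cy3 fluorophore close to the quantum dot reduces r and, because of the sixth-power dependence, raises E sharply; the expression is modified when several acceptors surround one quantum dot.

```latex
E = \frac{1}{1 + \left( r / R_0 \right)^{6}} = \frac{R_0^{6}}{R_0^{6} + r^{6}}
```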
Alam, Nuhu; Kim, Jeong Hwa; Shim, Mi Ja; Lee, U Youn
2010-01-01
This study evaluated the optimal vegetative growth conditions and molecular phylogenetic relationships of eleven strains of Agrocybe cylindracea collected from different ecological regions of Korea, China, and Taiwan. The optimal temperature and pH for mycelial growth were 25°C and pH 6. Potato dextrose agar and Henneberg media were favorable for vegetative growth, whereas glucose tryptone was unfavorable. Dextrin, maltose, and fructose were the most effective carbon sources. The most suitable nitrogen sources were arginine and glycine, whereas methionine, alanine, histidine, and urea were the least effective for the mycelial propagation of A. cylindracea. The internal transcribed spacer (ITS) regions of rDNA were amplified using PCR. The ITS2 sequence was more variable than that of ITS1, while the 5.8S sequences were identical. The reciprocal homologies of the ITS sequences ranged from 98 to 100%. The strains were also analyzed by random amplification of polymorphic DNA (RAPD) using 20 arbitrary primers, fifteen of which efficiently amplified the genomic DNA. The average number of polymorphic bands observed per primer was 3.8. The number of amplified bands varied with the primer and strain, with polymorphic fragments ranging from 0.1 to 2.9 kb. The RAPD results were consistent with those from the ITS region sequences. These results indicate that RAPD and ITS techniques are well suited to detecting the genetic diversity of all A. cylindracea strains tested. PMID:23956633
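RAPD gels are commonly scored as presence/absence band profiles per strain. A minimal sketch, assuming such 0/1 profiles, of the Dice (Nei-Li) band-sharing coefficient and of counting polymorphic bands (present in some but not all strains). The abstract does not state which similarity coefficient the authors used.

```python
from typing import Sequence


def dice_similarity(bands_a: Sequence[int], bands_b: Sequence[int]) -> float:
    """Dice (Nei-Li) coefficient 2a / (2a + b + c) for two 0/1 band profiles."""
    shared = sum(1 for x, y in zip(bands_a, bands_b) if x and y)
    total = sum(bands_a) + sum(bands_b)  # equals 2a + b + c
    return 2 * shared / total if total else 1.0


def polymorphic_band_count(profiles: Sequence[Sequence[int]]) -> int:
    """Count band positions present in some, but not all, strains."""
    n = len(profiles)
    return sum(1 for column in zip(*profiles) if 0 < sum(column) < n)
```

Dividing the polymorphic band total by the number of informative primers gives a per-primer average of the kind reported above.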
Genetic risk prediction using a spatial autoregressive model with adaptive lasso.
Wen, Yalu; Shen, Xiaoxi; Lu, Qing
2018-05-31
With rapidly evolving high-throughput technologies, studies are being initiated to accelerate progress toward precision medicine. The collection of vast amounts of sequencing data provides great opportunities to systematically study the role of a deep catalog of sequencing variants in risk prediction. Nevertheless, the massive amount of noise signals and the low frequencies of rare variants in sequencing data pose great analytical challenges for risk prediction modeling. Motivated by developments in spatial statistics, we propose a spatial autoregressive model with adaptive lasso (SARAL) for risk prediction modeling using high-dimensional sequencing data. SARAL is a set-based approach; thus, it reduces the data dimension and accumulates genetic effects within a single-nucleotide variant (SNV) set. Moreover, it allows different SNV sets to have effect sizes of different magnitudes and directions, which reflects the nature of complex diseases. With the adaptive lasso implemented, SARAL can shrink the effects of noise SNV sets to zero and thus further improve prediction accuracy. Through simulation studies, we demonstrate that, overall, SARAL is comparable to, if not better than, the genomic best linear unbiased prediction method. The method is further illustrated by an application to the sequencing data from the Alzheimer's Disease Neuroimaging Initiative. Copyright © 2018 John Wiley & Sons, Ltd.
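The adaptive lasso component can be sketched with coordinate descent. This sketch covers only the adaptive lasso on a generic design matrix (for example, hypothetical set-level scores) and omits the spatial autoregressive part of SARAL. The penalty weights `1 / |beta_init|**gamma` follow the usual adaptive lasso definition, and `beta_init` (e.g. OLS or ridge estimates) is assumed to be supplied by the caller.

```python
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Lasso shrinkage operator S(z, t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_lasso(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    beta_init: Sequence[float],
    lam: float,
    gamma: float = 1.0,
    n_iter: int = 200,
) -> List[float]:
    """Minimise (1/2n)||y - X b||^2 + lam * sum_j w_j |b_j| by coordinate descent,
    with adaptive weights w_j = 1 / |beta_init_j|**gamma."""
    n, p = len(X), len(X[0])
    weights = [1.0 / abs(b) ** gamma if b != 0 else float("inf") for b in beta_init]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0 or weights[j] == float("inf"):
                continue  # zero initial estimate: coefficient stays at 0
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

In an orthogonal toy design with `beta_init = [2.0, 0.1]` and `lam = 0.1`, the strong effect is shrunk only slightly (to 1.9) while the weak one is set exactly to zero, which is the selective shrinkage described above.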
Sequence search on a supercomputer.
Gotoh, O; Tagashira, Y
1986-01-10
A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR data base Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.
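The abstract does not name the comparison algorithm. As a generic illustration of database similarity search, the following sketch scores a query against each entry with Smith-Waterman local alignment (linear gap penalty, hypothetical scoring values) and ranks the hits; it is not the authors' vectorized FORTRAN implementation.

```python
from typing import Dict, List, Tuple


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty (two-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query: str, database: Dict[str, str], top: int = 3) -> List[Tuple[int, str]]:
    """Score the query against every entry and return the best `top` hits."""
    hits = [(smith_waterman_score(query, seq), name) for name, seq in database.items()]
    hits.sort(reverse=True)
    return hits[:top]
```

The work grows with query length times database size, which is why such exhaustive searches benefited from vector hardware.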
Specific minor groove solvation is a crucial determinant of DNA binding site recognition
Harris, Lydia-Ann; Williams, Loren Dean; Koudelka, Gerald B.
2014-01-01
The DNA sequence preferences of nearly all sequence specific DNA binding proteins are influenced by the identities of bases that are not directly contacted by protein. Discrimination between non-contacted base sequences is commonly based on the differential abilities of DNA sequences to allow narrowing of the DNA minor groove. However, the factors that govern the propensity of minor groove narrowing are not completely understood. Here we show that the differential abilities of various DNA sequences to support formation of a highly ordered and stable minor groove solvation network are a key determinant of non-contacted base recognition by a sequence-specific binding protein. In addition, disrupting the solvent network in the non-contacted region of the binding site alters the protein's ability to recognize contacted base sequences at positions 5–6 bases away. This observation suggests that DNA solvent interactions link contacted and non-contacted base recognition by the protein. PMID:25429976
Chakrapani, Sunil Kishore; Barnard, Daniel J; Dayal, Vinay
2016-05-01
This paper presents a study of the influence of laminate sequence and fabric type on the baseline acoustic nonlinearity of fiber-reinforced composites. Nonlinear elastic wave techniques are becoming increasingly popular for detecting damage in composite materials. The authors previously observed that the non-classical nonlinear response of fiber-reinforced composites is influenced by fiber orientation [Chakrapani, Barnard, and Dayal, J. Acoust. Soc. Am. 137(2), 617-624 (2015)]. The current study extends this effort to investigate the effect of laminate sequence and fabric type on the non-classical nonlinear response. Two hypotheses were developed from the previous results and the theory of interlaminar stresses. Each hypothesis was tested by performing nonlinear resonance spectroscopy and measuring frequency shifts, loss factors, and higher harmonics. The laminate sequence was observed to either increase or decrease the nonlinear response, depending on the stacking sequence. Similarly, tests comparing unidirectional and woven fabric showed that woven fabric exhibited a lower nonlinear response than unidirectional fabric. Conjectures based on the matrix properties and interlaminar stresses were used in an attempt to explain the observed nonlinear responses for the different configurations.
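One common way to quantify the frequency-shift part of nonlinear resonance spectroscopy is to fit the relative resonance shift against excitation amplitude and take the negated slope as a nonlinearity parameter. A minimal sketch, assuming amplitude values and measured resonance frequencies are already available; the name `alpha` for the resulting parameter is illustrative, and the loss-factor and harmonic measurements are not covered.

```python
from typing import List, Sequence, Tuple


def relative_shifts(freqs: Sequence[float], f0: float) -> List[float]:
    """Relative resonance-frequency shift (f - f0) / f0 for each drive level."""
    return [(f - f0) / f0 for f in freqs]


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def nonlinearity_parameter(amplitudes: Sequence[float], freqs: Sequence[float],
                           f0: float) -> float:
    """alpha = -d(delta f / f0) / d(amplitude); positive when the resonance softens."""
    slope, _ = fit_line(amplitudes, relative_shifts(freqs, f0))
    return -slope
```

Comparing `alpha` across stacking sequences or fabric types is one way to rank their baseline nonlinearity.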
Phylo-VISTA: Interactive visualization of multiple DNA sequence alignments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.
The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient, these visualization algorithms must consistently accommodate a wide range of evolutionary distances in a comparison framework based on phylogenetic relationships. Results: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of the visual presentation is effectively organized using a framework based on interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that allow users to select and view data at varying resolutions. The combination of multiresolution data visualization and analysis with the phylogenetic framework for interspecies comparison produces a highly flexible and powerful tool for visual data analysis of multiple sequence alignments. Availability: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plugin 1.4.2, and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu
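VISTA-style plots display conservation as percent identity in a sliding window along an alignment. A minimal pairwise sketch, assuming two rows of a gapped alignment and a hypothetical window size and cutoff; Phylo-VISTA's actual similarity measure and its tree-based multi-species aggregation are not reproduced here.

```python
from typing import List


def window_identity(ref: str, other: str, window: int = 100) -> List[float]:
    """Percent identity of `other` to `ref` in a window centred on each column.

    Both strings are rows of the same gapped alignment ('-' marks a gap).
    """
    if len(ref) != len(other):
        raise ValueError("alignment rows must have equal length")
    match = [1 if r == o and r != "-" else 0 for r, o in zip(ref, other)]
    # Prefix sums make each window sum O(1).
    prefix = [0]
    for m in match:
        prefix.append(prefix[-1] + m)
    half = window // 2
    n = len(ref)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(100.0 * (prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def conserved_columns(identity: List[float], threshold: float = 70.0) -> List[int]:
    """Columns whose window identity meets a (hypothetical) conservation cutoff."""
    return [i for i, v in enumerate(identity) if v >= threshold]
```

Recomputing the curve with a larger window is a simple analogue of viewing the data at a coarser resolution.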
Liu, Maoyan; Liu, Xiangning; Li, Xun; Zhang, Deyong; Dai, Liangyin; Tang, Qianjun
2016-03-01
The genome sequence of pepper vein yellows virus (PeVYV) (PeVYV-HN, accession number KP326573), isolated from pepper plants (Capsicum annuum L.) grown at the Hunan Vegetables Institute (Changsha, Hunan, China), was determined by deep sequencing of small RNAs. The PeVYV-HN genome consists of 6244 nucleotides, contains six open reading frames (ORFs), and is similar to that of an isolate (AB594828) from Japan. Its genomic organization is similar to that of members of the genus Polerovirus. Sequence analysis revealed that PeVYV-HN shared 92% sequence identity with the Japanese PeVYV genome at both the nucleotide and amino acid levels. Evolutionary analysis based on the coat protein (CP), movement protein (MP), and RNA-dependent RNA polymerase (RdRP) showed that PeVYV could be divided into two major lineages corresponding to their geographical origins. The Asian isolates have a higher population expansion frequency than the African isolates. Negative selection and genetic drift (founder effect) were found to be the potential drivers of the molecular evolution of PeVYV, whereas recombination did not appear to be a distinct driver of PeVYV evolution. This is the first report of a complete genomic sequence of PeVYV in China.
Pham, Nikki T.; Wei, Tong; Schackwitz, Wendy S.; Lipzen, Anna M.; Duong, Phat Q.; Jones, Kyle C.; Ruan, Deling; Bauer, Diane; Peng, Yi; Schmutz, Jeremy
2017-01-01
The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake (Oryza sativa ssp. japonica), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. This work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations. PMID:28576844
Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes
Liu, Kuan-Liang; Porras-Alfaro, Andrea; Eichorst, Stephanie A.
2012-01-01
Taxonomic and phylogenetic fingerprinting based on sequence analysis of gene fragments from the large-subunit rRNA (LSU) gene or the internal transcribed spacer (ITS) region is becoming an integral part of fungal classification. The lack of an accurate and robust classification tool trained by a validated sequence database for taxonomic placement of fungal LSU genes is a severe limitation in taxonomic analysis of fungal isolates or large data sets obtained from environmental surveys. Using a hand-curated set of 8,506 fungal LSU gene fragments, we determined the performance characteristics of a naïve Bayesian classifier across multiple taxonomic levels and compared the classifier performance to that of a sequence similarity-based (BLASTN) approach. The naïve Bayesian classifier was computationally more rapid (>460-fold with our system) than the BLASTN approach, and it provided equal or superior classification accuracy. Classifier accuracies were compared using sequence fragments of 100 bp and 400 bp and two different PCR primer anchor points to mimic sequence read lengths commonly obtained using current high-throughput sequencing technologies. Accuracy was higher with 400-bp sequence reads than with 100-bp reads. It was also significantly affected by sequence location across the 1,400-bp test region. The highest accuracy was obtained across either the D1 or D2 variable region. The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys. The training set and tool are publicly available through the Ribosomal Database Project (http://rdp.cme.msu.edu/classifier/classifier.jsp). PMID:22194300
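A compact sketch of a word-based naïve Bayesian classifier in the spirit of the RDP classifier: each sequence is reduced to its set of k-mer words, word priors are P_w = (n(w) + 0.5) / (N + 1), and genus-conditional probabilities are (m(w) + P_w) / (M + 1). The bootstrap confidence estimate is omitted, and the training-data layout (`genus -> list of sequences`) and the small `k` used in the example are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def words(seq: str, k: int) -> Set[str]:
    """Set of overlapping k-mer words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class KmerNaiveBayes:
    def __init__(self, k: int = 8):
        self.k = k

    def fit(self, training: Dict[str, List[str]]) -> "KmerNaiveBayes":
        sets = {g: [words(s, self.k) for s in seqs] for g, seqs in training.items()}
        n_total = sum(len(v) for v in sets.values())
        n_w: Dict[str, int] = defaultdict(int)  # training sequences containing w
        for word_sets in sets.values():
            for ws in word_sets:
                for w in ws:
                    n_w[w] += 1
        self.prior = {w: (c + 0.5) / (n_total + 1) for w, c in n_w.items()}
        self.default_prior = 0.5 / (n_total + 1)  # word never seen in training
        self.genus_counts: Dict[str, Dict[str, int]] = {}
        self.genus_size: Dict[str, int] = {}
        for g, word_sets in sets.items():
            m: Dict[str, int] = defaultdict(int)  # genus sequences containing w
            for ws in word_sets:
                for w in ws:
                    m[w] += 1
            self.genus_counts[g] = m
            self.genus_size[g] = len(word_sets)
        return self

    def log_score(self, seq: str, genus: str) -> float:
        m, size = self.genus_counts[genus], self.genus_size[genus]
        total = 0.0
        for w in words(seq, self.k):
            p_w = self.prior.get(w, self.default_prior)
            total += math.log((m.get(w, 0) + p_w) / (size + 1))
        return total

    def classify(self, seq: str) -> str:
        return max(self.genus_counts, key=lambda g: self.log_score(seq, g))
```

Because classification only needs word lookups rather than pairwise alignment, this style of classifier scales well to large environmental datasets.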
Sastre, Natalia; Ravera, Ivan; Villanueva, Sergio; Altet, Laura; Bardagí, Mar; Sánchez, Armand; Francino, Olga; Ferrer, Lluís
2012-12-01
The historical classification of Demodex mites has been based on their hosts and morphological features. Genome sequencing has proved to be a very effective taxonomic tool in phylogenetic studies and has been applied in the classification of Demodex. Mitochondrial 16S rDNA has been demonstrated to be an especially useful marker for establishing phylogenetic relationships. The aim of this study was to amplify and sequence a segment of the mitochondrial 16S rDNA from Demodex canis and Demodex injai, as well as from the short-bodied mite unofficially called D. cornei, and to determine their genetic proximity. Demodex mites were examined microscopically and classified as Demodex folliculorum (one sample), D. canis (four samples), D. injai (two samples) or the short-bodied species D. cornei (three samples). DNA was extracted, and a 338 bp fragment of the 16S rDNA was amplified and sequenced. The sequences of the four D. canis mites were identical and shared 99.6 and 97.3% identity with two D. canis sequences available at GenBank. The sequences of the D. cornei isolates were identical and showed 97.8, 98.2 and 99.6% identity with the D. canis isolates. The sequences of the two D. injai isolates were also identical and showed 76.6% identity with the D. canis sequence. Demodex canis and D. injai are two different species, with a genetic distance of 23.3%. It would seem that the short-bodied Demodex mite D. cornei is a morphological variant of D. canis. © 2012 The Authors. Veterinary Dermatology © 2012 ESVD and ACVD.
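Percent identity between aligned sequences, and the simple p-distance derived from it, can be computed as follows. This sketch assumes pre-aligned sequences and skips gapped columns; the exact distance measure used in the study is not stated in the abstract.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    same = sum(x == y for x, y in compared)
    return 100.0 * same / len(compared)


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance, as a percentage of compared sites."""
    return 100.0 - percent_identity(a, b)
```

Model-corrected distances (e.g. Kimura two-parameter) would give somewhat larger values than the p-distance for divergent pairs.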
Wuriyanghan, Hada; Falk, Bryce W.
2013-01-01
The potato/tomato psyllid, Bactericera cockerelli (B. cockerelli), is an important plant pest and the vector of the phloem-limited bacterium Candidatus Liberibacter psyllaurous (solanacearum), which is associated with the zebra chip disease of potatoes. Previously, we reported induction of RNA interference effects in B. cockerelli via in vitro-prepared dsRNA/siRNAs after intrathoracic injection, and after feeding of artificial diets containing these effector RNAs. In order to deliver RNAi effectors via plant hosts and to rapidly identify effective target sequences in plant-feeding B. cockerelli, here we developed a plant virus vector-based in planta system for evaluating candidate sequences. We show that recombinant Tobacco mosaic virus (TMV) containing B. cockerelli sequences can efficiently infect and generate small interfering RNAs in tomato (Solanum lycopersicum), tomatillo (Physalis philadelphica) and tobacco (Nicotiana tabacum) plants, and more importantly delivery of interfering sequences via TMV induces RNAi effects, as measured by actin and V-ATPase mRNA reductions, in B. cockerelli feeding on these plants. RNAi effects were primarily detected in the B. cockerelli guts. In contrast to our results with TMV, recombinant Potato virus X (PVX) and Tobacco rattle virus (TRV) did not give robust infections in all plants and did not induce detectable RNAi effects in B. cockerelli. The greatest RNA interference effects were observed when B. cockerelli nymphs were allowed to feed on leaf discs collected from inoculated or lower expanded leaves from corresponding TMV-infected plants. Tomatillo plants infected with recombinant TMV containing B. cockerelli actin or V-ATPase sequences also showed phenotypic effects resulting in decreased B. cockerelli progeny production as compared to plants infected by recombinant TMV containing GFP. These results showed that RNAi effects can be achieved in plants against the phloem feeder, B. cockerelli, and the TMV-plant system will provide a faster and more convenient method for screening of suitable RNAi target sequences in planta. PMID:23824081
GrTEdb: the first web-based database of transposable elements in cotton (Gossypium raimondii).
Xu, Zhenzhen; Liu, Jing; Ni, Wanchao; Peng, Zhen; Guo, Yue; Ye, Wuwei; Huang, Fang; Zhang, Xianggui; Xu, Peng; Guo, Qi; Shen, Xinlian; Du, Jianchang
2017-01-01
Although several diploid and tetraploid Gossypium species genomes have been sequenced, a well-annotated web-based database of transposable elements (TEs) is lacking. To better understand the roles of TEs in the structural, functional, and evolutionary dynamics of the cotton genome, a comprehensive, specific, and user-friendly web-based database, the Gossypium raimondii transposable elements database (GrTEdb), was constructed. A total of 14 332 TEs were structurally annotated and clearly categorized in the G. raimondii genome, and these elements have been classified into seven distinct superfamilies based on the order of protein-coding domains, structures, and/or sequence similarity, including 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1, 12 Mutators, 435 PIF-Harbingers, 275 CACTAs, and 14 Helitrons. Web-based sequence browsing, searching, downloading, and BLAST tools were also implemented to help users easily and effectively annotate TEs or TE fragments in genomic sequences from G. raimondii and other closely related Gossypium species. GrTEdb provides resources and information related to TEs in G. raimondii and will facilitate gene and genome analyses within or across Gossypium species, evaluation of the impact of TEs on their host genomes, and investigation of the potential interaction between TEs and protein-coding genes in Gossypium species. http://www.grtedb.org/. © The Author(s) 2017. Published by Oxford University Press.
An integrated semiconductor device enabling non-optical genome sequencing.
Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James
2011-07-20
The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.
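The abstract does not describe base calling. Assuming nucleotides are flowed one at a time in a repeating order, and that the normalized signal in each flow is proportional to the number of bases incorporated (a homopolymer run releases proportionally more hydrogen ions), a naive caller simply rounds each flow signal. Real pipelines also model phasing and signal decay; this sketch does not, and the flow order shown is hypothetical.

```python
from typing import Sequence


def call_bases(flow_order: str, signals: Sequence[float]) -> str:
    """Naive flow-space base caller: round each normalised signal to a run length."""
    read = []
    for i, s in enumerate(signals):
        base = flow_order[i % len(flow_order)]  # nucleotide present in this flow
        run_length = max(0, round(s))  # ~0 means no incorporation
        read.append(base * run_length)
    return "".join(read)
```

Homopolymer runs are where this rounding is least reliable, since the difference between, say, 5 and 6 incorporations is a small relative change in signal.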
Swisher, Jascha D; Sexton, John A; Gatenby, J Christopher; Gore, John C; Tong, Frank
2012-01-01
High-resolution functional MRI is a leading application for very high field (7 Tesla) human MR imaging. Though higher field strengths promise improvements in signal-to-noise ratios (SNR) and BOLD contrast relative to fMRI at 3 Tesla, these benefits may be partially offset by accompanying increases in geometric distortion and other off-resonance effects. Such effects may be especially pronounced with the single-shot EPI pulse sequences typically used for fMRI at standard field strengths. As an alternative, one might consider multishot pulse sequences, which may lead to somewhat lower temporal SNR than standard EPI, but which are also often substantially less susceptible to off-resonance effects. Here we consider retinotopic mapping of human visual cortex as a practical test case by which to compare examples of these sequence types for high-resolution fMRI at 7 Tesla. We performed polar angle retinotopic mapping at each of 3 isotropic resolutions (2.0, 1.7, and 1.1 mm) using both accelerated single-shot 2D EPI and accelerated multishot 3D gradient-echo pulse sequences. We found that single-shot EPI indeed led to greater temporal SNR and contrast-to-noise ratios (CNR) than the multishot sequences. However, additional distortion correction in postprocessing was required in order to fully realize these advantages, particularly at higher resolutions. The retinotopic maps produced by both sequence types were qualitatively comparable, and showed equivalent test/retest reliability. Thus, when surface-based analyses are planned, or in other circumstances where geometric distortion is of particular concern, multishot pulse sequences could provide a viable alternative to single-shot EPI. PMID:22514646
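Temporal SNR, one of the figures of merit compared here, is conventionally computed per voxel as the mean of the time series divided by its standard deviation. A minimal sketch, assuming (detrended) voxel time series given as plain lists.

```python
import statistics
from typing import List, Sequence


def temporal_snr(timeseries: Sequence[float]) -> float:
    """tSNR of one voxel: temporal mean divided by temporal standard deviation."""
    sd = statistics.stdev(timeseries)
    return statistics.mean(timeseries) / sd if sd > 0 else float("inf")


def tsnr_map(voxels: Sequence[Sequence[float]]) -> List[float]:
    """Apply temporal_snr to every voxel time series."""
    return [temporal_snr(ts) for ts in voxels]
```

Slow drifts inflate the standard deviation, so time series are usually detrended before tSNR is compared across sequences.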
High-Resolution Melting Analysis for Rapid Detection of Sequence Type 131 Escherichia coli.
Harrison, Lucas B; Hanson, Nancy D
2017-06-01
Escherichia coli isolates belonging to the sequence type 131 (ST131) clonal complex have been associated with the global distribution of fluoroquinolone and β-lactam resistance. Whole-genome sequencing and multilocus sequence typing identify sequence type but are expensive when evaluating large numbers of samples. This study was designed to develop a cost-effective screening tool using high-resolution melting (HRM) analysis to differentiate ST131 from non-ST131 E. coli in large sample populations in the absence of sequence analysis. The method was optimized using DNA from 12 E. coli isolates. Singleplex PCR was performed using 10 ng of DNA, Type-it HRM buffer, and multilocus sequence typing primers and was followed by multiplex PCR. The amplicon sizes ranged from 630 to 737 bp. Melt temperature peaks were determined by performing HRM analysis at 0.1°C resolution from 50 to 95°C on a Rotor-Gene Q 5-plex HRM system. Derivative melt curves were compared between sequence types and analyzed by principal component analysis. A blinded study of 191 E. coli isolates of ST131 and unknown sequence types validated this methodology. This methodology returned 99.2% specificity (124 true negatives and 1 false positive) and 100% sensitivity (66 true positives and 0 false negatives). This HRM methodology distinguishes ST131 from non-ST131 E. coli without sequence analysis. The analysis can be accomplished in about 3 h in any laboratory with an HRM-capable instrument and principal component analysis software. Therefore, this assay is a fast and cost-effective alternative to sequencing-based ST131 identification. Copyright © 2017 Harrison and Hanson.
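Two pieces of this workflow can be illustrated directly: the derivative melt curve (-dF/dT), whose peak gives the melt temperature, and the reported accuracy figures. This sketch assumes raw fluorescence readings at known temperatures; the principal component analysis of derivative curves is not shown. With the abstract's counts (66 true positives, 0 false negatives, 124 true negatives, 1 false positive), sensitivity is 66/66 = 100% and specificity is 124/125 = 99.2%.

```python
from typing import List, Sequence, Tuple


def derivative_melt_curve(temps: Sequence[float],
                          fluor: Sequence[float]) -> Tuple[List[float], List[float]]:
    """-dF/dT by finite differences, reported at interval midpoints."""
    mids = [(t0 + t1) / 2 for t0, t1 in zip(temps, temps[1:])]
    deriv = [-(f1 - f0) / (t1 - t0)
             for (t0, f0), (t1, f1) in zip(zip(temps, fluor), zip(temps[1:], fluor[1:]))]
    return mids, deriv


def melt_peak(temps: Sequence[float], fluor: Sequence[float]) -> float:
    """Temperature at which fluorescence drops fastest (the melt peak)."""
    mids, deriv = derivative_melt_curve(temps, fluor)
    return mids[max(range(len(deriv)), key=deriv.__getitem__)]


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    return tp / (tp + fn), tn / (tn + fp)
```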
High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.
Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C
2012-09-11
Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis of the newly sequenced isolates together with thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
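The coverage figures follow from the standard relation c = N × L / G (number of reads × read length / genome size). A sketch, assuming a V. cholerae genome of roughly 4 Mb (an approximation, not a figure from the abstract) and 100 bp paired-end reads; the Lander-Waterman estimate e^(-c) of the uncovered fraction assumes uniformly random read placement, which real data only approximate.

```python
import math


def coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean depth c = N * L / G."""
    return n_reads * read_len / genome_size


def read_pairs_for_coverage(target: float, genome_size: int, read_len: int = 100) -> int:
    """Read pairs needed for a target depth; each pair contributes 2 * read_len bases."""
    return math.ceil(target * genome_size / (2 * read_len))


def expected_uncovered_fraction(c: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases with zero coverage."""
    return math.exp(-c)
```

For example, 50x depth on a 4 Mb genome needs about 1,000,000 read pairs (50 × 4,000,000 / 200); in practice GC bias and repeats leave gaps that the Poisson estimate does not predict.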
A sequence-dependent rigid-base model of DNA
NASA Astrophysics Data System (ADS)
Gonzalez, O.; Petkevičiūtė, D.; Maddocks, J. H.
2013-02-01
A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. 
For example, due to the presence of frustration, the model can successfully predict the nonlocal changes in the minimum energy configuration of an oligomer that are consequent upon a local change of sequence at the level of a single point mutation.
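The abstract compares probability density functions using the Kullback-Leibler divergence. When both densities are Gaussian, as in the Gaussian version of the model, the divergence has a standard closed form. The following is a minimal NumPy sketch of that textbook formula, not the authors' code; the function name gaussian_kl is hypothetical, and it takes mean vectors and covariance matrices as input.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL(N0 || N1) for multivariate normals N0 = N(mu0, cov0), N1 = N(mu1, cov1).

    Hypothetical helper illustrating the closed-form Gaussian KL divergence.
    """
    mu0 = np.atleast_1d(np.asarray(mu0, dtype=float))
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    cov0 = np.atleast_2d(np.asarray(cov0, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    k = mu0.size                      # dimension of the configuration space
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # slogdet avoids overflow/underflow of det() in high dimensions
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - k + logdet1 - logdet0)
```

The divergence is not symmetric, so the order of the two densities matters when, for example, a coarse-grain model is compared against MD-derived statistics.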
A sequence-dependent rigid-base model of DNA.
Gonzalez, O; Petkevičiūtė, D; Maddocks, J H
2013-02-07
Constructing DNA Barcode Sets Based on Particle Swarm Optimization.
Wang, Bin; Zheng, Xuedong; Zhou, Shihua; Zhou, Changjun; Wei, Xiaopeng; Zhang, Qiang; Wei, Ziqi
2018-01-01
Following the completion of the human genome project, a large amount of high-throughput bio-data was generated. To analyze these data, massively parallel sequencing, namely next-generation sequencing, was rapidly developed. When attached at the beginning or end of sequencing reads, DNA barcodes are used to identify which sample each sequence belongs to. Constructing DNA barcode sets provides the candidate DNA barcodes for this application. To increase the accuracy of DNA barcode sets, a particle swarm optimization (PSO) algorithm has been modified and used to construct the DNA barcode sets in this paper. Compared with existing results, some lower bounds on DNA barcode set sizes are improved. The results show that the proposed algorithm is effective in constructing DNA barcode sets.
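The abstract does not list the constraints a valid barcode set must satisfy. Two constraints commonly used in the barcode-design literature are a minimum pairwise Hamming distance and a bounded GC content; the sketch below assumes just those two (the bounds are illustrative) and shows the feasibility check any optimizer, PSO included, would have to respect. The greedy builder is a simple baseline, not the authors' modified PSO: any valid set it finds of size N shows that the best achievable set size is at least N, which is what a "lower bound" on set size means.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))

def gc_ok(s: str, lo: float = 0.4, hi: float = 0.6) -> bool:
    """Assumed GC-content constraint: fraction of G/C bases within [lo, hi]."""
    gc = sum(c in "GC" for c in s) / len(s)
    return lo <= gc <= hi

def is_valid_set(codes, d: int) -> bool:
    """Check GC content of every barcode and minimum pairwise Hamming distance d."""
    codes = list(codes)
    return all(gc_ok(c) for c in codes) and all(
        hamming(codes[i], codes[j]) >= d
        for i in range(len(codes)) for j in range(i + 1, len(codes)))

def greedy_barcodes(n: int, d: int):
    """Hypothetical baseline: scan all 4^n sequences in lexicographic order and
    keep each one that satisfies the constraints against those already kept."""
    chosen = []
    for t in product("ACGT", repeat=n):
        s = "".join(t)
        if gc_ok(s) and all(hamming(s, c) >= d for c in chosen):
            chosen.append(s)
    return chosen
```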
Taylor, James; Tyekucheva, Svitlana; King, David C; Hardison, Ross C; Miller, Webb; Chiaromonte, Francesca
2006-12-01
Genomic sequence signals - such as base composition, presence of particular motifs, or evolutionary constraint - have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy (approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).
Whiley, David M; Jacob, Kevin; Nakos, Jennifer; Bletchly, Cheryl; Nimmo, Graeme R; Nissen, Michael D; Sloots, Theo P
2012-06-01
Numerous real-time PCR assays have been described for detection of the influenza A H275Y alteration. However, the performance of these methods can be undermined by sequence variation in the regions flanking the codon of interest. This is a problem encountered more broadly in microbial diagnostics. In this study, we developed a modification of hybridization probe-based melting curve analysis, whereby primers are used to mask proximal mutations in the sequence targets of hybridization probes, so as to limit the potential for sequence variation to interfere with typing. The approach was applied to the H275Y alteration of the influenza A (H1N1) 2009 strain, as well as a Neisseria gonorrhoeae mutation associated with antimicrobial resistance. Assay performances were assessed using influenza A and N. gonorrhoeae strains characterized by DNA sequencing. The modified hybridization probe-based approach proved successful in limiting the effects of proximal mutations, with the results of melting curve analyses being 100% consistent with the results of DNA sequencing for all influenza A and N. gonorrhoeae strains tested. Notably, these included influenza A and N. gonorrhoeae strains exhibiting additional mutations in hybridization probe targets. Of particular interest was that the H275Y assay correctly typed influenza A strains harbouring a T822C nucleotide substitution, previously shown to interfere with H275Y typing methods. Overall, our modified hybridization probe-based approach provides a simple means of circumventing problems caused by sequence variation, and offers improved detection of the influenza A H275Y alteration and potentially other resistance mechanisms.
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable.
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
PMID:29250096
Burrow-generated false facies and phantom sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wanless, H.R.; Tagett, M.
Callianassa (=Ophiomorpha) and other burrowers deeply rework shallow marine sequences. Through in-situ reworking, they create false sedimentary facies and stratigraphic sequences. The key to Callianassa's effectiveness is that it expels sand and mud from burrow excavations but concentrates coarse material at the base of the burrow complex. Coarse material can be derived by falling into the burrow entrance, by reworking the existing sediment sequence, or by a combination of both. Examples come from shallow marine carbonate environments of south Florida and the Turks and Caicos Islands, British West Indies. Many mudbanks in south Florida are formed as stacks of layered mudstone units 20-100 cm thick. Between events, seagrasses may recolonize, and a burrowing benthic community may repopulate the substrate. The layered mudstone beneath older areas of mudbank flats can gradually be converted to a bioturbated skeletal wackestone by the deep burrowing community. Burrowing also causes mixing of faunal assemblages. On Caicos Bank, an extensive carbonate tidal flat (3-4 m thick) is slowly being transgressed. About 1 m of tidal-flat sequence is eroded at the shoreline. The remaining 2-3 m could be preserved as part of the transgressive sequence. Callianassa burrowing, however, quickly reworks the sequence, replacing tidal-flat sands and muds with marine peloidal and skeletal sediment. Within 100 m of the shoreline, the only evidence of the tidal-flat sequence is a concentration of high-spired gastropods in Callianassa burrows at the base of the Holocene sequence and a few patches of tidal-flat sediment that burrowers missed. What looks like a basal transgressive lag is in fact a biogenic concentrate from in-situ reworking of a now phantom sequence.
2011-01-01
Background: BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results: This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions: Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed. PMID:21794110
Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos
2011-07-27
Subjective quality evaluation of low-bit-rate video
NASA Astrophysics Data System (ADS)
Masry, Mark; Hemami, Sheila S.; Osberger, Wilfried M.; Rohaly, Ann M.
2001-06-01
A subjective quality evaluation was performed to qualify viewer responses to visual defects that appear in low-bit-rate video at full and reduced frame rates. The stimuli were eight sequences compressed by three motion-compensated encoders (Sorenson Video, H.263+, and a wavelet-based coder) operating at five bit-rate/frame-rate combinations. The stimulus sequences exhibited obvious coding artifacts whose nature differed across the three coders. The subjective evaluation was performed using the Single Stimulus Continuous Quality Evaluation method of ITU-R Rec. BT.500-8. Viewers watched concatenated coded test sequences and continuously registered the perceived quality using a slider device. Data from 19 viewers were collected. An analysis of their responses to the presence of various artifacts across the range of possible coding conditions and content is presented. The effects of blockiness and blurriness on perceived quality are examined. The effects of changes in frame rate on perceived quality are found to be related to the nature of the motion in the sequence.
Cumulative Axial and Torsional Fatigue: An Investigation of Load-Type Sequencing Effects
NASA Technical Reports Server (NTRS)
Kalluri, Sreeramesh; Bonacuse, Peter J.
2000-01-01
Cumulative fatigue behavior of a wrought cobalt-base superalloy, Haynes 188, was investigated at 538 °C under various single-step sequences of axial and torsional loading conditions. Initially, fully reversed axial and torsional fatigue tests were conducted under strain control at 538 °C on thin-walled tubular specimens to establish baseline fatigue life relationships. Subsequently, four sequences (axial/axial, torsional/torsional, axial/torsional, and torsional/axial) of two load-level fatigue tests were conducted to characterize both the load-order (high/low) and load-type sequencing effects. For the two load-level tests, summations of life fractions and the remaining fatigue lives at the second load level were computed by Miner's Linear Damage Rule (LDR) and a nonlinear Damage Curve Approach (DCA). In general, LDR predictions were unconservative for all four cases. Predictions by the DCA were within a factor of two of the experimentally observed fatigue lives for a majority of the cumulative axial and torsional fatigue tests.
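For a two load-level test with n1 cycles applied at a level whose baseline life is N1, Miner's LDR predicts a remaining life fraction at the second level of n2/N2 = 1 - n1/N1. The abstract gives no DCA formula; the sketch below assumes the form commonly attributed to Manson and Halford, (n1/N1)^((N1/N2)^0.4) + n2/N2 = 1, with the usually quoted exponent 0.4 exposed as a parameter. Function names are hypothetical and the cycle counts in the checks are illustrative, not data from this study.

```python
def remaining_fraction_ldr(n1: float, N1: float) -> float:
    """Miner's linear damage rule: remaining life fraction n2/N2 at level 2."""
    return 1.0 - n1 / N1

def remaining_fraction_dca(n1: float, N1: float, N2: float, alpha: float = 0.4) -> float:
    """Assumed Manson-Halford damage curve approach:
    (n1/N1) ** ((N1/N2) ** alpha) + n2/N2 = 1, solved for n2/N2."""
    return 1.0 - (n1 / N1) ** ((N1 / N2) ** alpha)

# High/low order: short-life (high) level first, long-life (low) level second.
print(remaining_fraction_ldr(500, 1000))            # 0.5
print(remaining_fraction_dca(500, 1000, 100_000))   # well below 0.5
```

In a high/low sequence the DCA predicts less remaining life than the LDR, which is the sense in which an LDR prediction is unconservative; with N1 = N2 the two rules coincide.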
NASA Astrophysics Data System (ADS)
Wang, Guanxi; Tie, Yun; Qi, Lin
2017-07-01
In this paper, we propose a novel approach that computes Multi-Scale Histograms of Oriented Gradients (MSHOG) from sequences of depth maps to recognize actions. Each depth frame in a depth video sequence is projected onto three orthogonal Cartesian planes. Under each projection view, the absolute difference between two consecutive projected maps is accumulated through the depth video sequence to form a Depth Map, called a Depth Motion Trail Image (DMTI). The MSHOG is then computed from the Depth Maps to represent an action. In addition, we apply L2-Regularized Collaborative Representation (L2-CRC) to classify actions. We evaluate the proposed approach on the MSR Action3D and MSRGesture3D datasets. Promising experimental results demonstrate the effectiveness of the proposed method.
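The DMTI accumulation described above can be sketched with NumPy as follows. The abstract does not say how the side and top projections are formed; this sketch assumes binary occupancy maps over a fixed number of depth bins (the parameters depth_bins and max_depth, and both function names, are hypothetical) and stops before the multi-scale HOG and L2-CRC stages.

```python
import numpy as np

def project_views(frame, depth_bins=64, max_depth=4000):
    """Project one depth frame (H x W, depth in mm, 0 = no reading) onto three planes.

    front: the depth values themselves (x-y plane)
    side:  binary occupancy over (row, depth bin) (y-z plane)
    top:   binary occupancy over (depth bin, column) (x-z plane)
    """
    h, w = frame.shape
    front = frame.astype(np.float64)
    side = np.zeros((h, depth_bins))
    top = np.zeros((depth_bins, w))
    ys, xs = np.nonzero(frame)
    # Cast before multiplying so uint16 depth values cannot overflow.
    zs = np.clip(frame[ys, xs].astype(np.int64) * depth_bins // max_depth,
                 0, depth_bins - 1)
    side[ys, zs] = 1.0
    top[zs, xs] = 1.0
    return front, side, top

def depth_motion_trail_images(video, **kw):
    """Accumulate |P_{t+1} - P_t| for each of the three views over a (T, H, W) video."""
    prev = project_views(video[0], **kw)
    acc = [np.zeros_like(p) for p in prev]
    for frame in video[1:]:
        cur = project_views(frame, **kw)
        for total, p, c in zip(acc, prev, cur):
            total += np.abs(c - p)   # in-place update of the accumulated map
        prev = cur
    return acc
```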
On construction of stochastic genetic networks based on gene expression sequences.
Ching, Wai-Ki; Ng, Michael M; Fung, Eric S; Akutsu, Tatsuya
2005-08-01
Reconstruction of genetic regulatory networks from time series data of gene expression patterns is an important research topic in bioinformatics. Probabilistic Boolean Networks (PBNs) have been proposed as an effective model for gene regulatory networks. PBNs are able to cope with uncertainty, incorporate rule-based dependencies between genes, and discover the sensitivity of genes in their interactions with other genes. However, PBNs are difficult to use directly in practice because of the huge computational cost of obtaining predictors and their corresponding probabilities. In this paper, we propose a multivariate Markov model for approximating PBNs and describing the dynamics of a genetic network for gene expression sequences. The main contribution of the new model is to preserve the strength of PBNs while reducing the complexity of the networks. The number of parameters of our proposed model is O(n^2), where n is the number of genes involved. We also develop efficient estimation methods for the model parameters. Numerical examples on synthetic data sets and practical yeast data sequences are given to demonstrate the effectiveness of the proposed model.
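The abstract does not write out the model. A common form of the multivariate Markov chain model used by Ching and co-authors updates each gene's state-probability vector as a weighted sum over all genes, x_j(t+1) = Σ_k λ_jk P^(jk) x_k(t), with nonnegative weights λ_jk that sum to 1 over k. With n genes this gives n² transition matrices and n² weights, consistent with the O(n^2) parameter count. The sketch below assumes that form, estimates each P^(jk) by counting transitions, and applies one update step; estimating the weights λ is omitted, and the helper names are hypothetical.

```python
import numpy as np

def transition_matrix(src, dst, m=2):
    """Column-stochastic estimate of P^(jk): entry [a, b] ~ Pr(dst at t+1 = a | src at t = b).

    src, dst: integer state sequences (0..m-1) of genes k and j.
    Source states never observed get a uniform column.
    """
    counts = np.zeros((m, m))
    for b, a in zip(src[:-1], dst[1:]):
        counts[a, b] += 1
    col = counts.sum(axis=0, keepdims=True)
    safe = np.where(col > 0, col, 1.0)          # avoid division by zero
    return np.where(col > 0, counts / safe, 1.0 / m)

def step(x, P, lam):
    """One update: x[j] <- sum_k lam[j][k] * P[j][k] @ x[k].

    x: list of n probability vectors; P[j][k]: m x m matrices; lam: n x n, rows sum to 1.
    """
    n = len(x)
    return [sum(lam[j][k] * (P[j][k] @ x[k]) for k in range(n)) for j in range(n)]
```

Because each P^(jk) is column-stochastic and each row of λ sums to 1, every updated vector remains a probability vector.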
HomPPI: a class of sequence homology based protein-protein interface prediction methods
2011-01-01
Background: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins.
The partner-specific classifier, PS-HomPPI, can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions: Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
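The CC, sensitivity, and specificity figures above are per-residue scores computed from a confusion matrix of predicted versus true interface residues. The sketch below uses the standard definitions, taking CC as the Matthews correlation coefficient; this is an assumption, since the abstract gives no formulas, and some interface-prediction papers use "specificity" for TP/(TP+FP), so precision is reported as well. The function name is hypothetical.

```python
import math

def interface_metrics(pred, truth):
    """Per-residue scores for binary interface labels (1 = interface residue)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0      # TN / (TN + FP)
    prec = tp / (tp + fp) if tp + fp else 0.0      # alternative "specificity"
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"cc": cc, "sensitivity": sens, "specificity": spec, "precision": prec}
```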
Repair of DNA damage caused by cytosine deamination in mitochondrial DNA of forensic case samples.
Gorden, Erin M; Sturk-Andreaggi, Kimberly; Marshall, Charla
2018-05-01
DNA sequence damage from cytosine deamination is well documented in degraded samples, such as those from ancient and forensic contexts. This study examined the effect of a DNA repair treatment on mitochondrial DNA (mtDNA) from aged and degraded skeletal samples. DNA extracts from 21 non-probative, degraded skeletal samples (aged 50-70 years) were utilized for the analysis. A portion of each sample extract was subjected to DNA repair using a commercial repair kit, the New England BioLabs' NEBNext FFPE DNA Repair Kit (Ipswich, MA). MtDNA was enriched using PCR and targeted capture in a side-by-side experiment of untreated and repaired DNA. Sequencing was performed using both traditional (Sanger-type; STS) and next-generation sequencing (NGS) methods. Although cytosine deamination was evident in the mtDNA sequence data, the observed level of damaged bases varied by sequencing method as well as by enrichment type. The STS PCR amplicon data did not show evidence of cytosine deamination that could be distinguished from background signal in either the untreated or repaired sample set. However, the same PCR amplicons showed 850 C → T/G → A substitutions consistent with cytosine deamination, with variant frequencies (VFs) of up to 25%, when sequenced using NGS methods. The occurrence of base misincorporation due to cytosine deamination was reduced by 98% (to 10) in the NGS amplicon data after repair. The NGS capture data indicated low levels (1-2%) of cytosine deamination in mtDNA fragments that was effectively mitigated by DNA repair. The observed difference in the level of cytosine deamination between the PCR and capture enrichment methods can be attributed to the greater propensity for stochastic effects from the PCR enrichment technique employed (e.g., low template input, increased PCR cycles). Altogether, these results indicate that DNA repair may be required when sequencing PCR-amplified DNA from degraded forensic case samples with NGS methods.
Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
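A variant frequency is the fraction of reads at a position that support the alternative base. The sketch below flags C→T and G→A calls (the substitution types consistent with cytosine deamination on either strand) from per-position base counts and reports their VFs. The input format (one dict of base counts per reference position), the function name, and the 5% reporting threshold are hypothetical, not taken from the study.

```python
def deamination_like_variants(reference, counts, min_vf=0.05):
    """Return (position, substitution, VF) for C>T and G>A calls with VF >= min_vf.

    reference: reference sequence as a string.
    counts: list of dicts mapping base -> read count, one per reference position.
    """
    hits = []
    for pos, (ref, c) in enumerate(zip(reference, counts)):
        alt = {"C": "T", "G": "A"}.get(ref)   # only deamination-type changes
        depth = sum(c.values())
        if alt is None or depth == 0:
            continue
        vf = c.get(alt, 0) / depth
        if vf >= min_vf:
            hits.append((pos, f"{ref}>{alt}", vf))
    return hits
```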
Dynamic updating atlas for heart segmentation with a nonlinear field-based model.
Cai, Ken; Yang, Rongqian; Yue, Hongwei; Li, Lihua; Ou, Shanxing; Liu, Feng
2017-09-01
Segmentation of cardiac computed tomography (CT) images is an effective method for assessing the dynamic function of the heart and lungs. In the atlas-based heart segmentation approach, the quality of segmentation usually relies upon atlas images, and the selection of those reference images is a key step. The optimal goal in this selection process is to have the reference images as close to the target image as possible. This study proposes an atlas dynamic update algorithm using a scheme of nonlinear deformation field. The proposed method is based on the features among double-source CT (DSCT) slices. The extraction of these features will form a base to construct an average model and the created reference atlas image is updated during the registration process. A nonlinear field-based model was used to effectively implement a 4D cardiac segmentation. The proposed segmentation framework was validated with 14 4D cardiac CT sequences. The algorithm achieved an acceptable accuracy (1.0-2.8 mm). Our proposed method that combines a nonlinear field-based model and dynamic updating atlas strategies can provide an effective and accurate way for whole heart segmentation. The success of the proposed method largely relies on the effective use of the prior knowledge of the atlas and the similarity explored among the to-be-segmented DSCT sequences. Copyright © 2016 John Wiley & Sons, Ltd.
Church, George M.; Kieffer-Higgins, Stephen
1992-01-01
This invention features vectors and a method for sequencing DNA. The method includes the steps of: a) ligating the DNA into a vector comprising a tag sequence, the tag sequence includes at least 15 bases, wherein the tag sequence will not hybridize to the DNA under stringent hybridization conditions and is unique in the vector, to form a hybrid vector, b) treating the hybrid vector in a plurality of vessels to produce fragments comprising the tag sequence, wherein the fragments differ in length and terminate at a fixed known base or bases, wherein the fixed known base or bases differs in each vessel, c) separating the fragments from each vessel according to their size, d) hybridizing the fragments with an oligonucleotide able to hybridize specifically with the tag sequence, and e) detecting the pattern of hybridization of the tag sequence, wherein the pattern reflects the nucleotide sequence of the DNA.
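Steps (b) to (e) follow the logic of chain-termination sequencing: each vessel yields fragments that end at one known base, so ordering all detected fragment lengths across vessels spells out the sequence. The sketch below assumes idealized, error-free length measurements counted from a fixed start and assumes every position is represented; the function name and input format are hypothetical and not taken from the patent.

```python
def read_ladder(lengths_by_base):
    """Reconstruct a sequence from fragment lengths grouped by terminating base.

    lengths_by_base: dict mapping a base (one per vessel) to the fragment
    lengths detected in that vessel's hybridization pattern.
    """
    by_length = {}
    for base, lengths in lengths_by_base.items():
        for n in lengths:
            if n in by_length:
                # In an ideal ladder each length terminates at exactly one base.
                raise ValueError(f"length {n} observed in two vessels")
            by_length[n] = base
    # Shortest fragment gives the first base, next shortest the second, etc.
    return "".join(by_length[n] for n in sorted(by_length))
```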
Using high throughput sequencing to explore the biodiversity in oral bacterial communities.
Diaz, P I; Dupuy, A K; Abusleme, L; Reese, B; Obergfell, C; Choquette, L; Dongari-Bagtzoglou, A; Peterson, D E; Terzi, E; Strausbaugh, L D
2012-06-01
High throughput sequencing of 16S ribosomal RNA gene amplicons is a cost-effective method for characterization of oral bacterial communities. However, before undertaking large-scale studies, it is necessary to understand the technique-associated limitations and intrinsic variability of the oral ecosystem. In this work we evaluated bias in species representation using an in vitro-assembled mock community of oral bacteria. We then characterized the bacterial communities in saliva and buccal mucosa of five healthy subjects to investigate the power of high throughput sequencing in revealing their diversity and biogeography patterns. Mock community analysis showed primer and DNA isolation biases and an overestimation of diversity that was reduced after eliminating singleton operational taxonomic units (OTUs). Sequencing of salivary and mucosal communities found a total of 455 OTUs (0.3% dissimilarity) with only 78 of these present in all subjects. We demonstrate that this variability was partly the result of incomplete richness coverage even at great sequencing depths, and so comparing communities by their structure was more effective than comparisons based solely on membership. With respect to oral biogeography, we found inter-subject variability in community structure was lower than site differences between salivary and mucosal communities within subjects. These differences were evident at very low sequencing depths and were mostly caused by the abundance of Streptococcus mitis and Gemella haemolysans in mucosa. In summary, we present an experimental and data analysis framework that will facilitate design and interpretation of pyrosequencing-based studies. Despite challenges associated with this technique, we demonstrate its power for evaluation of oral diversity and biogeography patterns. © 2012 John Wiley & Sons A/S.
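Membership-based comparisons use only the presence or absence of OTUs, whereas structure-based comparisons also weight their abundance. The abstract does not name the indices used, so the sketch below assumes the Jaccard distance for membership and the Bray-Curtis dissimilarity (on raw counts) for structure, after dropping singleton OTUs, taken here as OTUs with a single read across the samples being compared. The function names and the toy counts are hypothetical.

```python
def drop_singletons(*samples):
    """Remove OTUs whose total count across all given samples is 1."""
    totals = {}
    for s in samples:
        for otu, n in s.items():
            totals[otu] = totals.get(otu, 0) + n
    return [{o: n for o, n in s.items() if totals[o] > 1} for s in samples]

def jaccard_distance(a, b):
    """Membership: 1 - |shared OTUs| / |all OTUs present|."""
    sa = {o for o, n in a.items() if n > 0}
    sb = {o for o, n in b.items() if n > 0}
    union = sa | sb
    return 1 - len(sa & sb) / len(union) if union else 0.0

def bray_curtis(a, b):
    """Structure: abundance-weighted dissimilarity (normalize first if depths differ)."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    den = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return num / den if den else 0.0
```

Two communities sharing the same OTUs at very different abundances have a Jaccard distance of 0 but a large Bray-Curtis dissimilarity, which is why structure can separate communities that membership cannot.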
Przybilski, Rita; Hammann, Christian
2007-01-01
Tertiary interacting elements are important features of functional RNA molecules, for example, in all small nucleolytic ribozymes. The recent crystal structure of a tertiary stabilized type I hammerhead ribozyme revealed a conventional Watson–Crick base pair in the catalytic core, formed between nucleotides C3 and G8. We show that any Watson–Crick base pair between these positions retains cleavage competence in two type III ribozymes. In the Arabidopsis thaliana sequence, only moderate differences in cleavage rates are observed for the different base pairs, while the peach latent mosaic viroid (PLMVd) ribozyme exhibits a preference for a pyrimidine at position 3 and a purine at position 8. To understand these differences, we created a series of chimeric ribozymes in which we swapped sequence elements that surround the catalytic core. The kinetic characterization of the resulting ribozymes revealed that the tertiary interacting loop sequences of the PLMVd ribozyme are sufficient to induce the preference for Y3–R8 base pairs in the A. thaliana hammerhead ribozyme. In contrast to this, only when the entire stem–loops I and II of the A. thaliana sequences are grafted on the PLMVd ribozyme is any Watson–Crick base pair similarly tolerated. The data provide evidence for a complex interplay of secondary and tertiary structure elements that lead, mediated by long-range effects, to an individual modulation of the local structure in the catalytic core of different hammerhead ribozymes. PMID:17666711
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won
2014-08-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but doing so is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.
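The abstract does not spell out how the ensemble genotyping works. As a rough illustration of the general idea only, the sketch below assumes several independent callers emit VCF-style genotype strings for the same site and keeps a genotype only when a strict majority of callers agree; the voting rule, the genotype encoding and the de novo check are assumptions for illustration, not the authors' procedure, and the LR filter is not shown.

```python
from collections import Counter
from typing import List, Optional

Call = Optional[str]  # VCF-style genotype such as "0/1"; None = no call


def ensemble_genotype(calls: List[Call]) -> Call:
    """Keep a genotype only if a strict majority of callers report it."""
    if not calls:
        return None
    votes = Counter(c for c in calls if c is not None)
    if not votes:
        return None
    genotype, n = votes.most_common(1)[0]
    # At most one genotype can reach a strict majority, so ties are harmless.
    return genotype if 2 * n > len(calls) else None


def is_de_novo_candidate(child: List[Call], mother: List[Call], father: List[Call]) -> bool:
    """Flag a heterozygous child call whose parents are both homozygous reference."""
    return (
        ensemble_genotype(child) == "0/1"
        and ensemble_genotype(mother) == "0/0"
        and ensemble_genotype(father) == "0/0"
    )


# Three hypothetical callers per individual.
print(is_de_novo_candidate(["0/1", "0/1", "0/1"], ["0/0"] * 3, ["0/0"] * 3))  # True
print(is_de_novo_candidate(["0/1", "0/0", None], ["0/0"] * 3, ["0/0"] * 3))  # False
```

Requiring agreement across callers trades a little sensitivity for a large drop in caller-specific artifacts, which is the kind of trade-off the reported DNM numbers describe.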
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choi, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B.; Gupta, Neha; Kohane, Isaac S.; Green, Robert C.; Kong, Sek Won
2014-01-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but doing so is costly. Here we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous SNVs; 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false positive DNM candidates. PMID:24829188
Clément, Nathalie; Velu, Thierry; Brandenburger, Annick
2002-09-01
The production of currently available vectors derived from autonomous parvoviruses requires the expression of capsid proteins in trans, from helper sequences. Cotransfection of a helper plasmid always generates significant amounts of replication-competent virus (RCV), which can be reduced by the integration of helper sequences into a packaging cell line. Although stocks of minute virus of mice (MVM)-based vectors with no detectable RCV could be produced by transfection into packaging cells, RCVs appear after one or two rounds of replication, precluding further amplification of the vector stock. Indeed, once RCVs become detectable, they are efficiently amplified and rapidly take over the culture. Theoretically, RCV-free vector stocks could be produced if all homology between vector and helper DNA is eliminated, thus preventing homologous recombination. We constructed new vectors based on the structure of spontaneously occurring defective particles of MVM. Based on published observations related to the size of vectors and the sequence of the viral origin of replication, these vectors were modified by the insertion of foreign DNA sequences downstream of the transgene and by the introduction of a consensus NS-1 nick site near the origin of replication to optimize their production. In one of the vectors, the inserted fragment of mouse genomic DNA had a synergistic effect with the modified origin of replication in increasing vector production.
Southam, Lorraine; Gilly, Arthur; Süveges, Dániel; Farmaki, Aliki-Eleni; Schwartzentruber, Jeremy; Tachmazidou, Ioanna; Matchan, Angela; Rayner, Nigel W.; Tsafantakis, Emmanouil; Karaleftheri, Maria; Xue, Yali; Dedoussis, George; Zeggini, Eleftheria
2017-01-01
Next-generation association studies can be empowered by sequence-based imputation and by studying founder populations. Here we report ∼9.5 million variants from whole-genome sequencing (WGS) of a Cretan-isolated population, and show enrichment of rare and low-frequency variants with predicted functional consequences. We use a WGS-based imputation approach utilizing 10,422 reference haplotypes to perform genome-wide association analyses and observe 17 genome-wide significant, independent signals, including replicating evidence for association at eight novel low-frequency variant signals. Two novel cardiometabolic associations are at lead variants unique to the founder population sequences: chr16:70790626 (high-density lipoprotein levels beta −1.71 (SE 0.25), P=1.57 × 10^−11, effect allele frequency (EAF) 0.006); and rs145556679 (triglycerides levels beta −1.13 (SE 0.17), P=2.53 × 10^−11, EAF 0.013). Our findings add empirical support to the contribution of low-frequency variants in complex traits, demonstrate the advantage of including population-specific sequences in imputation panels and exemplify the power gains afforded by population isolates. PMID:28548082
Clustering evolving proteins into homologous families.
Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A
2013-04-08
Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. 
For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
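The abstract compares MCL with UCLUST without describing MCL itself. As a rough illustration, the expansion/inflation loop at the heart of Markov clustering can be sketched in pure Python. The input is assumed to be a symmetric similarity matrix given as a list of lists (for example, derived from local-alignment scores); the inflation value of 2.0, the added self-loops and the convergence tolerance are illustrative choices, not the settings used in the study, and production MCL software additionally prunes small values and uses sparse matrices.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (a column-stochastic flow matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj: Matrix, inflation: float = 2.0, max_iter: int = 100, tol: float = 1e-6) -> List[List[int]]:
    n = len(adj)
    # Add self-loops, then turn similarities into transition probabilities.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along longer paths
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep non-negligible flow (attractors) define the clusters.
    clusters: List[frozenset] = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return [sorted(c) for c in clusters]


# Two hypothetical families of three sequences each, with no cross-family similarity.
triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(triangles))  # [[0, 1, 2], [3, 4, 5]]
```

Raising the inflation parameter generally yields finer clusters, which is one reason clustering results depend on parameter settings, as noted above for both MCL and UCLUST.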
GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank.
You, Ronghui; Zhang, Zihan; Xiong, Yi; Sun, Fengzhu; Mamitsuka, Hiroshi; Zhu, Shanfeng
2018-03-07
Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently, only <1% of the more than 70 million proteins in UniProtKB have experimental GO annotations, implying a strong need for automated function prediction (AFP) of proteins. AFP is a hard multilabel classification problem because a single protein can be annotated with a varying number of GO terms. Most of these proteins have only sequences as input information, indicating the importance of sequence-based AFP (SAFP: sequences are the only input). Furthermore, homology-based SAFP tools are competitive in AFP competitions, but they do not necessarily work well for so-called difficult proteins, which have <60% sequence identity to proteins that already have annotations. Thus, the vital and challenging problem now is how to develop a method for SAFP, particularly for difficult proteins. The key to such a method is to extract not only homology information but also diverse, deep-rooted information/evidence from sequence inputs and to integrate them into a predictor in a manner that is both effective and efficient. We propose GOLabeler, which integrates five component classifiers, trained from different features (GO term frequency, sequence alignment, amino acid trigram, domains and motifs, and biophysical properties), in the framework of learning to rank (LTR), a machine learning paradigm that is especially powerful for multilabel classification. Extensive evaluation of GOLabeler on large-scale datasets revealed numerous favorable aspects, including a significant performance advantage over state-of-the-art AFP methods. http://datamining-iip.fudan.edu.cn/golabeler. zhusf@fudan.edu.cn. Supplementary data are available at Bioinformatics online.
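One of the five feature sets named above is amino acid trigrams. A minimal sketch of turning a protein sequence into a trigram frequency vector is shown below; normalizing by the total trigram count and restricting the vocabulary to the 20 standard residues are assumptions, since the abstract does not specify how the trigram features were encoded or which classifier consumed them.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIGRAMS = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]  # 20**3 = 8000
TRIGRAM_INDEX = {t: i for i, t in enumerate(TRIGRAMS)}


def trigram_vector(sequence: str) -> List[float]:
    """Relative frequencies of overlapping amino acid trigrams (8000 dimensions)."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    # Trigrams containing non-standard letters (e.g. X, U) are ignored.
    kept = {t: c for t, c in counts.items() if t in TRIGRAM_INDEX}
    total = sum(kept.values())
    vector = [0.0] * len(TRIGRAMS)
    for t, c in kept.items():
        vector[TRIGRAM_INDEX[t]] = c / total
    return vector


v = trigram_vector("MKKMKK")  # overlapping trigrams: MKK, KKM, KMK, MKK
print(v[TRIGRAM_INDEX["MKK"]], v[TRIGRAM_INDEX["KMK"]])  # 0.5 0.25
```

A fixed-length vector like this lets sequences of any length feed the same per-term classifier, whose scores could then be one input among several to a ranking model.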
Weight distributions for turbo codes using random and nonrandom permutations
NASA Technical Reports Server (NTRS)
Dolinar, S.; Divsalar, D.
1995-01-01
This article takes a preliminary look at the weight distributions achievable for turbo codes using random, nonrandom, and semirandom permutations. Due to the recursiveness of the encoders, it is important to distinguish between self-terminating and non-self-terminating input sequences. The non-self-terminating sequences have little effect on decoder performance, because they accumulate high encoded weight until they are artificially terminated at the end of the block. From probabilistic arguments based on selecting the permutations randomly, it is concluded that the self-terminating weight-2 data sequences are the most important consideration in the design of constituent codes; higher-weight self-terminating sequences have successively decreasing importance. Also, increasing the number of codes and, correspondingly, the number of permutations makes it more and more likely that the bad input sequences will be broken up by one or more of the permuters. It is possible to design nonrandom permutations that ensure that the minimum distance due to weight-2 input sequences grows roughly as the square root of (2N), where N is the block length. However, these nonrandom permutations amplify the bad effects of higher-weight inputs, and as a result they are inferior in performance to randomly selected permutations. But there are 'semirandom' permutations that perform nearly as well as the designed nonrandom permutations with respect to weight-2 input sequences and are not as susceptible to being foiled by higher-weight inputs.
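The abstract does not define its "semirandom" permutations. One widely used construction of this kind is the S-random interleaver, which draws values at random but rejects any value that lies within distance s of one of the s most recently placed values, so that input positions close together are spread apart after permutation. The sketch below assumes that construction; the value of s, the restart strategy and the seed are illustrative, and in practice the greedy search tends to stall as s approaches roughly sqrt(N/2).

```python
import random
from typing import List


def s_random_permutation(n: int, s: int, seed: int = 0, max_restarts: int = 1000) -> List[int]:
    """Greedy S-random construction: each newly placed value must differ by
    more than s from each of the previous s placed values."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        pool = list(range(n))
        rng.shuffle(pool)
        perm: List[int] = []
        while pool:
            recent = perm[max(0, len(perm) - s):]
            for idx, candidate in enumerate(pool):
                if all(abs(candidate - prev) > s for prev in recent):
                    perm.append(pool.pop(idx))
                    break
            else:
                break  # dead end: no remaining value fits, so restart
        if not pool:
            return perm
    raise RuntimeError("no S-random permutation found; try a smaller s")


def spread_ok(perm: List[int], s: int) -> bool:
    """Check the S-random property for every pair of positions at most s apart."""
    n = len(perm)
    return all(
        abs(perm[i] - perm[j]) > s
        for i in range(n)
        for j in range(i + 1, min(n, i + s + 1))
    )


perm = s_random_permutation(64, 3, seed=1)
print(sorted(perm) == list(range(64)), spread_ok(perm, 3))
```

The spread constraint targets the short weight-2 input patterns that the abstract identifies as most important, while the random draws avoid the rigid structure that lets higher-weight inputs defeat fully designed permutations.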
Genotype diversity of hepatitis C virus (HCV) in HCV-associated liver disease patients in Indonesia.
Utama, Andi; Tania, Navessa Padma; Dhenni, Rama; Gani, Rino Alvani; Hasan, Irsan; Sanityoso, Andri; Lelosutan, Syafruddin A R; Martamala, Ruswhandi; Lesmana, Laurentius Adrianus; Sulaiman, Ali; Tai, Susan
2010-09-01
Hepatitis C virus (HCV) genotype distribution in Indonesia has been reported; however, HCV genotype identification in those reports was based on the 5'-UTR or NS5B sequence. This study aimed to examine HCV core sequence variation among patients with HCV-associated liver disease in Jakarta, and to analyse HCV genotype diversity based on the core sequence. Sixty-eight patients with chronic hepatitis (CH), 48 with liver cirrhosis (LC) and 34 with hepatocellular carcinoma (HCC) were included in this study. HCV core variation was analysed by direct sequencing. Alignment of HCV core sequences demonstrated that the core sequence was relatively variable among genotypes. Indeed, 237 bases of the core sequence could classify the HCV subtype; however, 236 bases failed to differentiate several subtypes. Based on 237 bases of the core sequences, the HCV strains were classified into genotypes 1 (subtypes 1a, 1b and 1c), 2 (subtypes 2a, 2e and 2f) and 3 (subtypes 3a and 3k). HCV 1b (47.3%) was the most prevalent, followed by subtypes 1c (18.7%), 3k (10.7%), 2a (10.0%), 1a (6.7%), 2e (5.3%), 2f (0.7%) and 3a (0.7%). HCV 1b was the most common subtype in all patient groups, and its prevalence increased with the severity of liver disease (36.8% in CH, 54.2% in LC and 58.8% in HCC). These results were similar to those of a previous report based on NS5B sequence analysis. A 237-base HCV core sequence could identify the HCV subtype, and the prevalence of HCV subtypes based on the core sequence was similar to that based on the NS5B region.
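The reported percentages are internally consistent with a denominator of all 150 enrolled patients (68 + 48 + 34). The integer counts below are back-calculated here for illustration only; they are not reported in the abstract, and the assumption that every enrolled patient was genotyped is not stated explicitly. The snippet reproduces each published percentage from those counts.

```python
# Back-calculated counts (an assumption, not reported in the abstract),
# with all 150 enrolled patients (68 CH + 48 LC + 34 HCC) as denominator.
GROUP_SIZES = {"CH": 68, "LC": 48, "HCC": 34}
TOTAL = sum(GROUP_SIZES.values())  # 150

SUBTYPE_COUNTS = {"1b": 71, "1c": 28, "3k": 16, "2a": 15,
                  "1a": 10, "2e": 8, "2f": 1, "3a": 1}
SUBTYPE_1B_BY_GROUP = {"CH": 25, "LC": 26, "HCC": 20}


def pct(k: int, n: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * k / n, 1)


print({st: pct(k, TOTAL) for st, k in SUBTYPE_COUNTS.items()})
print({g: pct(k, GROUP_SIZES[g]) for g, k in SUBTYPE_1B_BY_GROUP.items()})
print(sum(SUBTYPE_1B_BY_GROUP.values()) == SUBTYPE_COUNTS["1b"])  # 25 + 26 + 20 == 71
```

The single 2f and 3a cases each correspond to 1/150 ≈ 0.7%, and the per-group 1b counts add up to the overall 1b count, so the overall and per-group figures describe the same set of patients.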
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.
Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf
2015-08-01
RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis, based on simultaneous alignment and folding, suffers from an extreme time complexity of O(n^6) in the sequence length n. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (O(n^4), quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.
Pembleton, Luke W; Drayton, Michelle C; Bain, Melissa; Baillie, Rebecca C; Inch, Courtney; Spangenberg, German C; Wang, Junping; Forster, John W; Cogan, Noel O I
2016-05-01
A targeted amplicon-based genotyping-by-sequencing approach has permitted cost-effective and accurate discrimination between ryegrass species (perennial, Italian and inter-species hybrid), and identification of cultivars based on bulked samples. Perennial ryegrass and Italian ryegrass are the most important temperate forage species for global agriculture, and are represented in the commercial pasture seed market by numerous cultivars each composed of multiple highly heterozygous individuals. Previous studies have identified difficulties in the use of morphophysiological criteria to discriminate between these two closely related taxa. Recently, a highly multiplexed single nucleotide polymorphism (SNP)-based genotyping assay has been developed that permits accurate differentiation between both species and cultivars of ryegrasses at the genetic level. This assay has since been further developed into an amplicon-based genotyping-by-sequencing (GBS) approach implemented on a second-generation sequencing platform, allowing accelerated throughput and an approximately sixfold reduction in cost. Using the GBS approach, 63 cultivars of perennial, Italian and interspecific hybrid ryegrasses, as well as intergeneric Festulolium hybrids, were genotyped. The genetic relationships between cultivars were interpreted in terms of known breeding histories and indistinct species boundaries within the Lolium genus, as well as suitability of current cultivar registration methodologies. An example of applicability to quality assurance and control (QA/QC) of seed purity is also described. Rapid, low-cost genotypic assays provide new opportunities for breeders to more fully explore genetic diversity within breeding programs, allowing the combination of novel unique genetic backgrounds. Such tools also offer the potential to more accurately define cultivar identities, allowing protection of varieties in the commercial market and supporting processes of cultivar accreditation and quality assurance.
Kowialiewski, Benjamin; Majerus, Steve
2016-01-01
Several models in the verbal domain of short-term memory (STM) consider a dissociation between item and order processing. This view is supported by data demonstrating that different types of time-based interference have a greater effect on memory for the order of to-be-remembered items than on memory for the items themselves. The present study investigated the domain-generality of the item versus serial order dissociation by comparing the differential effects of time-based interfering tasks, such as rhythmic interference and articulatory suppression, on item and order processing in verbal and musical STM domains. In Experiment 1, participants maintained sequences of verbal or musical information in STM and were then presented with a probe sequence, under different conditions of interference (no interference, rhythmic interference, articulatory suppression). They were required to decide whether all items of the probe list matched those of the memory list (item condition) or whether the order of the items in the probe sequence matched the order in the memory list (order condition). In Experiment 2, participants performed a serial order probe recognition task for verbal and musical sequences ensuring sequential maintenance processes, under no-interference or rhythmic interference conditions. In Experiment 1, serial order recognition was not significantly more impacted by interfering tasks than was item recognition, in both the verbal and musical domains. In Experiment 2, we observed selective interference of the rhythmic interference condition on both musical and verbal order STM tasks. Overall, the results suggest a similar and selective sensitivity to time-based interference for serial order STM in verbal and musical domains, but only when the STM tasks ensure sequential maintenance processes. PMID:27992565
Gorin, Simon; Kowialiewski, Benjamin; Majerus, Steve
2016-01-01
Several models in the verbal domain of short-term memory (STM) consider a dissociation between item and order processing. This view is supported by data demonstrating that different types of time-based interference have a greater effect on memory for the order of to-be-remembered items than on memory for the items themselves. The present study investigated the domain-generality of the item versus serial order dissociation by comparing the differential effects of time-based interfering tasks, such as rhythmic interference and articulatory suppression, on item and order processing in verbal and musical STM domains. In Experiment 1, participants maintained sequences of verbal or musical information in STM and were then presented with a probe sequence, under different conditions of interference (no interference, rhythmic interference, articulatory suppression). They were required to decide whether all items of the probe list matched those of the memory list (item condition) or whether the order of the items in the probe sequence matched the order in the memory list (order condition). In Experiment 2, participants performed a serial order probe recognition task for verbal and musical sequences ensuring sequential maintenance processes, under no-interference or rhythmic interference conditions. In Experiment 1, serial order recognition was not significantly more impacted by interfering tasks than was item recognition, in both the verbal and musical domains. In Experiment 2, we observed selective interference of the rhythmic interference condition on both musical and verbal order STM tasks. Overall, the results suggest a similar and selective sensitivity to time-based interference for serial order STM in verbal and musical domains, but only when the STM tasks ensure sequential maintenance processes.
SHAPE Selection (SHAPES) enrich for RNA structure signal in SHAPE sequencing-based probing data
Poulsen, Line Dahl; Kielpinski, Lukasz Jan; Salama, Sofie R.; Krogh, Anders; Vinther, Jeppe
2015-01-01
Selective 2′ Hydroxyl Acylation analyzed by Primer Extension (SHAPE) is an accurate method for probing RNA secondary structure. In existing SHAPE methods, the SHAPE probing signal is normalized to a no-reagent control to correct for the background caused by premature termination of the reverse transcriptase. Here, we introduce a SHAPE Selection (SHAPES) reagent, N-propanone isatoic anhydride (NPIA), which retains the ability of SHAPE reagents to accurately probe RNA structure, but also allows covalent coupling between the SHAPES reagent and a biotin molecule. We demonstrate that SHAPES-based selection of cDNA–RNA hybrids on streptavidin beads effectively removes the large majority of background signal present in SHAPE probing data and that sequencing-based SHAPES data contain the same amount of RNA structure information as regular sequencing-based SHAPE data obtained through normalization to a no-reagent control. Moreover, the selection efficiently enriches for probed RNAs, suggesting that the SHAPES strategy will be useful for applications with high background and low probing signal, such as in vivo RNA structure probing. PMID:25805860
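To make the "normalized to a no-reagent control" step concrete, the sketch below shows one simple form of background correction: per-position reverse-transcription stop rates in the reagent-treated sample minus those in the no-reagent control, floored at zero. This estimator and the counts are assumptions for illustration; the abstract does not specify the normalization used by existing SHAPE-seq pipelines, and SHAPES itself aims to remove most background experimentally rather than by subtraction.

```python
from typing import List


def stop_rates(stops: List[int], coverage: List[int]) -> List[float]:
    """Per-position fraction of reverse-transcription reads that terminate there."""
    return [s / c if c else 0.0 for s, c in zip(stops, coverage)]


def background_subtracted(treated_stops: List[int], treated_cov: List[int],
                          control_stops: List[int], control_cov: List[int]) -> List[float]:
    """Reagent-induced stop rate minus the no-reagent background, floored at zero."""
    treated = stop_rates(treated_stops, treated_cov)
    control = stop_rates(control_stops, control_cov)
    return [max(0.0, t - c) for t, c in zip(treated, control)]


# Hypothetical counts for three nucleotides with equal coverage.
print(background_subtracted([10, 50, 5], [100, 100, 100], [10, 5, 8], [100, 100, 100]))
```

Position 1 stops just as often without reagent, so its signal is attributed to background; when background is large relative to the probing signal, this subtraction becomes noisy, which is the situation SHAPES is designed to address.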
Effects of Notch2 and Notch3 on Cell Proliferation and Apoptosis of Trophoblast Cell Lines.
Zhao, Wei-Xiu; Zhuang, Xu; Huang, Tao-Tao; Feng, Ran; Lin, Jian-Hua
2015-01-01
This study investigated the effect of Notch2 and Notch3 on cell proliferation and apoptosis of two trophoblast cell lines, BeWo and JAR. Notch2 and Notch3 expression in BeWo and JAR cells was upregulated or downregulated using lentivirus-mediated overexpression or RNA interference. Lentivirus-based overexpression vectors were constructed by cloning the full-length coding sequences of human Notch2 and Notch3 C-terminally tagged with GFP, or GFP alone (control), into a lentivirus-based expression vector. Lentivirus-based gene silencing vectors were prepared by cloning small interfering sequences targeting human Notch2 and Notch3 and a scrambled control RNA sequence into a lentivirus-based gene knockdown vector. The effect of Notch2 and Notch3 on cell proliferation was assessed by the CCK-8 assay, and their effect on the apoptosis of BeWo and JAR cells was evaluated by flow cytometry using the Annexin V-PE Apoptosis kit. We found that the downregulation of Notch2 and Notch3 gene expression in BeWo and JAR cells resulted in an increase in cell proliferation, while upregulation of Notch3 and Notch2 expression led to a decrease in cell proliferation. Moreover, the overexpression of Notch3 and Notch2 in BeWo and JAR cells reduced apoptosis in these trophoblast cell lines, whereas apoptosis was increased in the cells in which the expression of Notch3 and Notch2 was downregulated. Notch2 and Notch3 inhibited both cell proliferation and cell apoptosis in BeWo and JAR trophoblast cell lines.
Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart
2010-07-01
High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C, which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.
Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart
2010-01-01
High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C, which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users. PMID:20501601
Okamoto, Hidehiko; Stracke, Henning; Lagemann, Lothar; Pantev, Christo
2010-01-01
The capability of involuntarily tracking certain sound signals in the simultaneous presence of noise is essential in human daily life. Previous studies have demonstrated that top-down auditory focused attention can enhance excitatory and inhibitory neural activity, resulting in sharpening of frequency tuning of auditory neurons. In the present study, we investigated bottom-up driven involuntary neural processing of sound signals in noisy environments by means of magnetoencephalography. We contrasted two sound signal sequencing conditions: "constant sequencing" versus "random sequencing." Drawn from a pool of 16 different frequencies, either identical (constant sequencing) or pseudorandomly chosen (random sequencing) test frequencies were presented blockwise together with band-eliminated noises to nonattending subjects. The results demonstrated that the auditory evoked fields elicited in the constant sequencing condition were significantly enhanced compared with the random sequencing condition. However, the enhancement was not significantly different between different band-eliminated noise conditions. Thus, the present study confirms that constant sound signal sequencing under nonattentive listening can enhance, but not sharpen, neural activity in human auditory cortex. Our results indicate that bottom-up driven involuntary neural processing may mainly amplify excitatory neural networks, but may not effectively enhance inhibitory neural circuits.
Multiplex De Novo Sequencing of Peptide Antibiotics
NASA Astrophysics Data System (ADS)
Mohimani, Hosein; Liu, Wei-Ting; Yang, Yu-Liang; Gaudêncio, Susana P.; Fenical, William; Dorrestein, Pieter C.; Pevzner, Pavel A.
Proliferation of drug-resistant diseases raises the challenge of searching for new, more efficient antibiotics. Currently, some of the most effective antibiotics (e.g., vancomycin and daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways (non-ribosomal peptides, NRPs). The isolation and sequencing of cyclic peptide antibiotics, unlike that of linear peptides, is time-consuming and error-prone. The dominant technique for sequencing cyclic peptides is NMR-based and requires large amounts (milligrams) of purified material that, for most compounds, cannot be obtained. Given these facts, there is a need for new tools to sequence cyclic NRPs using picograms of material. Since nearly all cyclic NRPs are produced along with related analogs, we develop a mass spectrometry approach for sequencing all related peptides at once (in contrast to the existing approach that analyzes individual peptides). Our results suggest that instead of attempting to isolate and NMR-sequence the most abundant compound, one should acquire spectra of many related compounds and sequence all of them simultaneously using tandem mass spectrometry. We illustrate applications of this approach by sequencing new variants of cyclic peptide antibiotics from Bacillus brevis, as well as a previously unknown family of cyclic NRPs produced by marine bacteria.
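Part of what makes cyclic peptides harder to interpret is that any contiguous stretch of the ring, including stretches that wrap around, can appear as a fragment. The sketch below compares linear and cyclic theoretical spectra under the simplified model common in teaching material: integer residue masses, one fragment per contiguous subpeptide, and no ion types, charges or neutral losses. It illustrates the combinatorics only and is not the multiplex sequencing algorithm described in the abstract.

```python
from typing import List

# Integer (nominal) residue masses in daltons; I/L and K/Q coincide at this resolution.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
                "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
                "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def linear_spectrum(peptide: str) -> List[int]:
    """Masses of all contiguous subpeptides of a linear peptide, plus 0."""
    m = [RESIDUE_MASS[a] for a in peptide]
    n = len(m)
    return sorted([0] + [sum(m[i:j]) for i in range(n) for j in range(i + 1, n + 1)])


def cyclic_spectrum(peptide: str) -> List[int]:
    """Same, but subpeptides may wrap around the ring; the full ring counts once."""
    m = [RESIDUE_MASS[a] for a in peptide]
    n = len(m)
    spectrum = [0, sum(m)]
    for length in range(1, n):
        for start in range(n):
            spectrum.append(sum(m[(start + k) % n] for k in range(length)))
    return sorted(spectrum)


print(linear_spectrum("NQEL"))
print(cyclic_spectrum("NQEL"))  # adds 227, 355, 356 from fragments spanning the L-N junction
```

A cyclic peptide of n residues yields n(n - 1) + 2 theoretical masses versus n(n + 1)/2 + 1 for its linear counterpart, and the extra wrap-around fragments remove the fixed start and end that simplify linear interpretation.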
Mohd-Yusoff, Nur Fatihah; Ruperao, Pradeep; Tomoyoshi, Nurain Emylia; Edwards, David; Gresshoff, Peter M.; Biswas, Bandana; Batley, Jacqueline
2015-01-01
Genetic structure can be altered by chemical mutagenesis, which is a common method applied in molecular biology and genetics. Second-generation sequencing provides a platform to reveal base alterations occurring in the whole genome due to mutagenesis. A model legume, Lotus japonicus ecotype Miyakojima, was chemically mutated with alkylating ethyl methanesulfonate (EMS) for the scanning of DNA lesions throughout the genome. Using second-generation sequencing, two individually mutated third-generation progeny (M3, named AM and AS) were sequenced and analyzed to identify single nucleotide polymorphisms and reveal the effects of EMS on nucleotide sequences in these mutant genomes. Single-nucleotide polymorphisms were found once every 208 kb (AS) and 202 kb (AM), with a bias towards G/C-to-A/T changes at a low percentage. Most mutations were intergenic. The mutation spectrum of the genomes was comparable across individual chromosomes; however, each mutated genome has unique alterations, which are useful to identify causal mutations for their phenotypic changes. The data obtained demonstrate that whole-genome sequencing is applicable as a high-throughput tool to investigate genomic changes due to mutagenesis. The identification of these single-point mutations will facilitate the identification of phenotypically causative mutations in EMS-mutated germplasm. PMID:25660167
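The two summary figures above, SNP density ("once every ~200 kb") and the share of G/C-to-A/T changes, can be computed from a list of called SNVs as sketched below. EMS alkylates guanine, so its characteristic changes appear on the reference strand as G->A or C->T. The SNV list, the sequenced length and the function names are hypothetical and chosen only to show the arithmetic.

```python
from typing import Iterable, List, Tuple

Snv = Tuple[str, str]  # (reference base, alternative base) on the reference strand


def is_gc_to_at(snv: Snv) -> bool:
    """G->A or C->T: the G/C-to-A/T transition class expected from EMS alkylation."""
    return snv in {("G", "A"), ("C", "T")}


def gc_to_at_fraction(snvs: Iterable[Snv]) -> float:
    """Share of SNVs that are G/C-to-A/T transitions (0.0 for an empty list)."""
    calls: List[Snv] = list(snvs)
    if not calls:
        return 0.0
    return sum(is_gc_to_at(s) for s in calls) / len(calls)


def mean_spacing_kb(sequenced_bp: int, n_snvs: int) -> float:
    """Average distance between SNVs, i.e. 'one SNV every X kb'."""
    return sequenced_bp / n_snvs / 1000


calls = [("G", "A"), ("C", "T"), ("A", "G"), ("T", "C")]  # hypothetical
print(gc_to_at_fraction(calls), mean_spacing_kb(2_000_000, 10))  # 0.5 200.0
```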
He, Zongxiao; Zhang, Di; Renton, Alan E; Li, Biao; Zhao, Linhai; Wang, Gao T; Goate, Alison M; Mayeux, Richard; Leal, Suzanne M
2017-02-02
Whole-genome and exome sequence data can be cost-effectively generated for the detection of rare-variant (RV) associations in families. Causal variants that aggregate in families usually have larger effect sizes than those found in sporadic cases, so family-based designs can be a more powerful approach than population-based designs. Moreover, some family-based designs are robust to confounding due to population admixture or substructure. We developed a RV extension of the generalized disequilibrium test (GDT) to analyze sequence data obtained from nuclear and extended families. The GDT utilizes genotype differences of all discordant relative pairs to assess associations within a family, and the RV extension combines the single-variant GDT statistic over a genomic region of interest. The RV-GDT has increased power by efficiently incorporating information beyond first-degree relatives and allows for the inclusion of covariates. Using simulated genetic data, we demonstrated that the RV-GDT method has well-controlled type I error rates, even when applied to admixed populations and populations with substructure. It is more powerful than existing family-based RV association methods, particularly for the analysis of extended pedigrees and pedigrees with missing data. We analyzed whole-genome sequence data from families affected by Alzheimer disease to illustrate the application of the RV-GDT. Given the capability of the RV-GDT to adequately control for population admixture or substructure and analyze pedigrees with missing genotype data and its superior power over other family-based methods, it is an effective tool for elucidating the involvement of RVs in the etiology of complex traits. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
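The core idea of the GDT, comparing genotypes across discordant relative pairs within each family and then pooling families, can be illustrated with a deliberately simplified, unweighted, burden-style sketch. Here each family contributes D_i, the summed genotype difference (affected minus unaffected) over all discordant pairs and all variants in the region, and the statistic is Z = ΣD_i / sqrt(ΣD_i²). The published RV-GDT weights relative pairs and variants and supports covariates; none of that is reproduced, and all names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Genotypes = Dict[str, Sequence[int]]              # individual -> minor-allele counts per variant
Family = Tuple[Genotypes, List[str], List[str]]   # (genotypes, affected ids, unaffected ids)

def family_difference(genotypes: Genotypes, affected: List[str], unaffected: List[str]) -> float:
    """D_i: genotype differences summed over all discordant (affected, unaffected)
    pairs in one family, aggregated across the variants of the region."""
    d = 0.0
    for a, u in product(affected, unaffected):
        d += sum(ga - gu for ga, gu in zip(genotypes[a], genotypes[u]))
    return d

def gdt_like_z(families: List[Family]) -> float:
    """Z = sum(D_i) / sqrt(sum(D_i^2)); roughly N(0, 1) under no association."""
    ds = [family_difference(*fam) for fam in families]
    denom = math.sqrt(sum(d * d for d in ds))
    return sum(ds) / denom if denom > 0 else 0.0
```

Because only within-family differences enter the statistic, allele-frequency differences between families (for example from admixture) do not shift its expectation, which is the intuition behind the robustness claimed above.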
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.
Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab
2012-01-01
RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and ATGC base contents along with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer, Random DNA Analyser; GUI, Graphical user interface; XAML, Extensible Application Markup Language.
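The Nussinov algorithm mentioned above maximizes the number of nested base pairs with a dynamic program over subsequences: a base is left unpaired, pairs with the other end, or the interval splits in two. The sketch below implements the plain algorithm for DNA with Watson-Crick pairs only and an assumed minimum hairpin loop of 3 bases, plus the reverse-complement utility. It is not RDNAnalyzer's C# code, and the extensions the tool makes are not reproduced.

```python
from typing import List, Tuple

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (the other strand read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]

def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, List[Tuple[int, int]]]:
    """Maximum number of nested Watson-Crick pairs and one optimal pairing.
    A pair (i, j) must enclose at least `min_loop` unpaired bases."""
    n = len(seq)
    if n == 0:
        return 0, []
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])          # i or j left unpaired
            if (seq[i], seq[j]) in WATSON_CRICK:
                best = max(best, dp[i + 1][j - 1] + 1)       # i pairs with j
            for k in range(i + 1, j):                        # bifurcation
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    # Traceback: recover one pairing that achieves dp[0][n - 1].
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or dp[i][j] == 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in WATSON_CRICK and dp[i][j] == dp[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return dp[0][n - 1], sorted(pairs)

print(nussinov("GGGAAACCC"))  # (3, [(0, 8), (1, 7), (2, 6)])
```

Base-pair maximization is the simplest structural objective; energy-based models usually give more realistic structures, which is one reason such tools extend the basic recursion.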
Breathing dynamics based parameter sensitivity analysis of hetero-polymeric DNA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Talukder, Srijeeta; Sen, Shrabani; Chaudhury, Pinaki, E-mail: pinakc@rediffmail.com
We study the parameter sensitivity of hetero-polymeric DNA within the purview of DNA breathing dynamics. For this purpose, the degree of correlation between the mean bubble size and the model parameters is estimated for three different DNA sequences. The analysis leads us to a better understanding of the sequence-dependent nature of the breathing dynamics of hetero-polymeric DNA. Out of the 14 model parameters for DNA stability in the statistical Poland-Scheraga approach, the hydrogen bond interaction ε_hb(AT) for an AT base pair and the ring factor ξ turn out to be the most sensitive parameters. In addition, the stacking interaction ε_st(TA-TA) for a TA-TA nearest-neighbor pair of base pairs is found to be the most sensitive one among all stacking interactions. Moreover, we establish that it is the nature of the stacking interaction, not the number of times a particular stacking interaction appears in a sequence, that has a deciding effect on the DNA breathing dynamics. We show that the sensitivity analysis can be used as an effective measure to guide a stochastic optimization technique to find the kinetic rate constants related to the dynamics, as opposed to the case where the rate constants are measured using the conventional unbiased way of optimization.
Daikoku, Tatsuya; Takahashi, Yuji; Futagami, Hiroko; Tarumoto, Nagayoshi; Yasuda, Hideki
2017-02-01
In real-world auditory environments, humans are exposed to overlapping auditory information, such as that produced by human voices and musical instruments, even during routine physical activities such as walking and cycling. The present study investigated how concurrent physical exercise affects performance of incidental and intentional learning of overlapping auditory streams, and whether physical fitness modulates learning performance. Participants were divided into two groups of 11, with lower and higher fitness, based on their VO2max values. They were presented with simultaneous auditory sequences, each with a distinct statistical regularity (i.e., statistical learning), while pedaling a bike and while sitting on a bike at rest. In experiment 1, they were instructed to attend to one of the two sequences and ignore the other. In experiment 2, they were instructed to attend to both sequences. After exposure to the sequences, learning effects were evaluated by a familiarity test. In experiment 1, performance of statistical learning of ignored sequences during concurrent pedaling could be higher in participants with high than with low physical fitness, whereas for the attended sequence there was no significant difference in statistical learning performance between high and low physical fitness. Furthermore, there was no significant effect of physical fitness on learning while resting. In experiment 2, participants with both high and low physical fitness could perform intentional statistical learning of two simultaneous sequences in both the exercise and rest sessions. Improvement in physical fitness might facilitate incidental, but not intentional, statistical learning of simultaneous auditory sequences during concurrent physical exercise.
RAD tag sequencing as a source of SNP markers in Cynara cardunculus L
2012-01-01
Background: The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops, sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. Results: RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays, and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mer distribution and CpG rate of C. cardunculus, compared with data derived from three whole-genome-sequenced dicot species, provided further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. Conclusion: The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. Our approach allowed us to generate large and robust SNP datasets through the adoption of optimized filtering criteria. PMID:22214349
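The k-mer distribution and CpG rate used above as a check on random genome sampling are simple composition statistics. The sketch below counts overlapping k-mers and computes the CpG observed/expected ratio, (#CG × length) / (#C × #G). The abstract does not say which CpG measure was used, so the o/e ratio here is an assumption; values well below 1 indicate CpG depletion.

```python
from collections import Counter

def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any that contain non-ACGT characters."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def cpg_observed_expected(seq: str) -> float:
    """CpG o/e = (#CG * length) / (#C * #G); 0.0 if the sequence lacks C or G."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cg = kmer_counts(seq, 2)["CG"]
    return cg * len(seq) / (c * g) if c and g else 0.0

print(cpg_observed_expected("ACGTACGT"))  # 4.0
```

Comparing these profiles between RAD-derived contigs and whole-genome assemblies of related species is one way to test whether RAD sampling is biased toward particular sequence classes.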
The Use of Audio and Animation in Computer Based Instruction.
ERIC Educational Resources Information Center
Koroghlanian, Carol; Klein, James D.
This study investigated the effects of audio, animation, and spatial ability in a computer-based instructional program for biology. The program presented instructional material via text or via audio with lean text and included eight instructional sequences presented either via static illustrations or animations. High school students enrolled in a…
Teaching/Learning Methods and Students' Classification of Food Items
ERIC Educational Resources Information Center
Hamilton-Ekeke, Joy-Telu; Thomas, Malcolm
2011-01-01
Purpose: This study aims to investigate the effectiveness of a teaching method (TLS (Teaching/Learning Sequence)) based on a social constructivist paradigm on students' conceptualisation of classification of food. Design/methodology/approach: The study compared the TLS model developed by the researcher based on the social constructivist paradigm…
The Listening and Reading Comprehension (LARC) Program....Experiential Based Sequential Training.
ERIC Educational Resources Information Center
Blumenstyk, Holly; And Others
The LARC (Listening and Reading Comprehension) Program, an experiential-based story grammar approach to listening and reading comprehension, is described, and a pilot study of its effectiveness with communication handicapped children is reviewed. The LARC framework translates children's own recent experiences into sequenced story episodes which are…
Hong, Jungeui; Gresham, David
2017-11-01
Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
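The deduplication rule described above, where reads count as PCR duplicates only if they share both mapping coordinates and UMI, can be sketched as follows. The `Read` record and its fields are hypothetical stand-ins for parsed alignments; keeping the first read per key is a simplification, and real tools usually also tolerate UMI sequencing errors and prefer the highest-quality read.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

class Read(NamedTuple):
    name: str
    chrom: str
    pos: int      # 5' alignment coordinate
    strand: str   # "+" or "-"
    umi: str

def dedup_by_umi(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read for each (chrom, pos, strand, UMI) key.
    Reads at identical coordinates with different UMIs are kept as
    independent molecules rather than discarded as PCR duplicates."""
    seen: Set[Tuple[str, int, str, str]] = set()
    kept: List[Read] = []
    for r in reads:
        key = (r.chrom, r.pos, r.strand, r.umi)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept
```

Coordinate-only deduplication would collapse reads r1 to r3 in the test data below into one molecule; with UMIs, r3 survives as a distinct molecule, which is the source of the improved allele-frequency and expression estimates reported above.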
Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L
2014-01-05
Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.
Generate Optimized Genetic Rhythm for Enzyme Expression in Non-native systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-11-03
Most amino acids are represented by more than one codon, resulting in redundancy in the genetic code. Silent codon substitutions that do not alter the amino acid sequence still have an effect on protein expression. We have developed an algorithm, GoGREEN, to enhance the expression of foreign proteins in a host organism. GoGREEN selects codons according to frequency patterns seen in the gene of interest, using the codon usage table from the host organism. GoGREEN is also designed to accommodate gaps in the sequence. This software takes as input (1) the aligned protein sequences for the genes the user wishes to express, (2) the codon usage table for the host organism, and (3) the DNA sequence for the target protein found in the host organism. The program will select codons based on codon usage patterns for the target DNA sequence. The program will also select codons for “gaps” found in the aligned protein sequences using the codon usage table from the host organism.
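The description above leaves the exact per-position rule open. One plausible reading, sketched here as an assumption and not as GoGREEN's actual code, walks the alignment column by column. Where the host homolog carries the same amino acid, its codon is reused. At host gaps or mismatches, the code falls back to the host's most frequent codon for that amino acid. The usage table values in the test are made up for illustration.

```python
from typing import Dict, List, Optional

HostUsage = Dict[str, Dict[str, float]]  # amino acid -> {codon: usage frequency}

def most_frequent_codon(aa: str, usage: HostUsage) -> str:
    codons = usage[aa]
    return max(codons, key=lambda c: codons[c])

def select_codons(foreign_aln: str, host_aln: str, host_dna: str, usage: HostUsage) -> str:
    """Back-translate an aligned foreign protein ('-' marks gaps), reusing the
    host homolog's codons where residues match and the usage table elsewhere."""
    if len(foreign_aln) != len(host_aln):
        raise ValueError("aligned sequences must have equal length")
    host_codons = [host_dna[i:i + 3] for i in range(0, len(host_dna), 3)]
    out: List[str] = []
    h = 0  # index of the next host codon; advances only on host residues
    for f_aa, h_aa in zip(foreign_aln, host_aln):
        host_codon: Optional[str] = None
        if h_aa != "-":
            host_codon = host_codons[h]
            h += 1
        if f_aa == "-":
            continue  # column absent from the foreign protein
        if host_codon is not None and f_aa == h_aa:
            out.append(host_codon)  # follow the host gene's codon choice
        else:
            out.append(most_frequent_codon(f_aa, usage))  # gap or mismatch
    return "".join(out)
```

Reusing codons from a native host homolog preserves any local codon-usage rhythm of that gene, while the usage-table fallback only fills positions where the alignment offers no host codon.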
Torque measurements reveal sequence-specific cooperative transitions in supercoiled DNA
Oberstrass, Florian C.; Fernandes, Louis E.; Bryant, Zev
2012-01-01
B-DNA becomes unstable under superhelical stress and is able to adopt a wide range of alternative conformations including strand-separated DNA and Z-DNA. Localized sequence-dependent structural transitions are important for the regulation of biological processes such as DNA replication and transcription. To directly probe the effect of sequence on structural transitions driven by torque, we have measured the torsional response of a panel of DNA sequences using single molecule assays that employ nanosphere rotational probes to achieve high torque resolution. The responses of Z-forming d(pGpC)n sequences match our predictions based on a theoretical treatment of cooperative transitions in helical polymers. “Bubble” templates containing 50–100 bp mismatch regions show cooperative structural transitions similar to B-DNA, although less torque is required to disrupt strand–strand interactions. Our mechanical measurements, including direct characterization of the torsional rigidity of strand-separated DNA, establish a framework for quantitative predictions of the complex torsional response of arbitrary sequences in their biological context. PMID:22474350
DOE Office of Scientific and Technical Information (OSTI.GOV)
O’Shea, Tuathan P., E-mail: tuathan.oshea@icr.ac.uk; Bamber, Jeffrey C.; Harris, Emma J.
Purpose: Ultrasound-based motion estimation is an expanding subfield of image-guided radiation therapy. Although ultrasound can detect tissue motion that is a fraction of a millimeter, its accuracy is variable. For controlling linear accelerator tracking and gating, ultrasound motion estimates must remain highly accurate throughout the imaging sequence. This study presents a temporal regularization method for correlation-based template matching which aims to improve the accuracy of motion estimates. Methods: Liver ultrasound sequences (15–23 Hz imaging rate, 2.5–5.5 min length) from ten healthy volunteers under free breathing were used. Anatomical features (blood vessels) in each sequence were manually annotated for comparison with normalized cross-correlation based template matching. Five sequences from a Siemens Acuson™ scanner were used for algorithm development (training set). Results from incremental tracking (IT) were compared with a temporal regularization method, which included a highly specific similarity metric and state observer, known as the α–β filter/similarity threshold (ABST). A further five sequences from an Elekta Clarity™ system were used for validation, without alteration of the tracking algorithm (validation set). Results: Overall, the ABST method produced marked improvements in vessel tracking accuracy. For the training set, the mean and 95th percentile (95%) errors (defined as the difference from manual annotations) were 1.6 and 1.4 mm, respectively (compared to 6.2 and 9.1 mm, respectively, for IT). For each sequence, the use of the state observer leads to improvement in the 95% error. For the validation set, the mean and 95% errors for the ABST method were 0.8 and 1.5 mm, respectively. Conclusions: Ultrasound-based motion estimation has potential to monitor liver translation over long time periods with high accuracy. Nonrigid motion (strain) and the quality of the ultrasound data are likely to have an impact on tracking performance.
A future study will investigate spatial uniformity of motion and its effect on the motion estimation errors.
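The two ingredients named above, normalized cross-correlation (NCC) template matching and an α–β state observer gated by a similarity threshold, can be sketched in pure Python. Several choices here are assumptions, not the published ABST settings: exhaustive search, a 1-D position, and the α, β and threshold values. When the NCC score of a match falls below the threshold, the measurement is ignored and the filter's constant-velocity prediction is used instead.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]

def ncc(a: Image, b: Image) -> float:
    """Normalized cross-correlation of two equally sized patches (range -1..1)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den > 0 else 0.0

def match_template(frame: Image, template: Image) -> Tuple[Tuple[int, int], float]:
    """Exhaustive search for the (row, col) offset with the highest NCC."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = (0, 0), -2.0
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

def alpha_beta_track(measurements: Sequence[float], similarities: Sequence[float],
                     alpha: float = 0.5, beta: float = 0.1, dt: float = 1.0,
                     threshold: float = 0.5, x0: float = 0.0, v0: float = 0.0) -> List[float]:
    """1-D alpha-beta filter that skips measurements whose similarity is below threshold."""
    x, v = x0, v0
    estimates: List[float] = []
    for z, s in zip(measurements, similarities):
        x_pred = x + v * dt            # constant-velocity prediction
        if s >= threshold:
            r = z - x_pred             # innovation (residual)
            x = x_pred + alpha * r
            v = v + (beta / dt) * r
        else:
            x = x_pred                 # low similarity: trust the prediction
        estimates.append(x)
    return estimates

print(alpha_beta_track([1, 100, 3], [0.9, 0.1, 0.9], x0=0.0, v0=1.0))  # [1.0, 2.0, 3.0]
```

In the usage line, the outlier measurement of 100 arrives with low similarity and is rejected, so the track stays on the predicted path. This is the kind of gross tracking error that incremental tracking alone cannot suppress.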
Blue, Elizabeth Marchani; Sun, Lei; Tintle, Nathan L.; Wijsman, Ellen M.
2014-01-01
When analyzing family data, we dream of perfectly informative data, even whole genome sequences (WGS) for all family members. Reality intervenes, and we find next-generation sequence (NGS) data have error, and are often too expensive or impossible to collect on everyone. Genetic Analysis Workshop 18 groups “Quality Control” and “Dropping WGS through families using GWAS framework” focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single nucleotide polymorphisms, NGS, and imputed data are generally concordant, but that errors are particularly likely at rare variants, homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelateds. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Both genotype and pedigree errors had an adverse effect on subsequent analyses. Computationally fast rules-based imputation was accurate, but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods, and suggest possible future directions. Topics include improving communication between those performing data collection and analysis, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184
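Error detection from Mendelian transmission, mentioned above, rests on a simple check. At an autosomal site, a child's genotype must be explainable as one allele from the mother and one from the father. The minimal sketch below flags inconsistent sites in a parent-offspring trio; genotypes are unordered allele pairs, and the function names are illustrative. A flagged site can reflect a genotype error, a pedigree error, or a genuine de novo mutation, which is why such checks also support estimating the de novo mutation rate.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # unordered allele pair at one autosomal site

def mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could carry one allele from each parent."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)

def mendelian_errors(child: List[Genotype], mother: List[Genotype],
                     father: List[Genotype]) -> List[int]:
    """Indices of sites where the trio genotypes are Mendelian-inconsistent."""
    return [i for i, (c, m, f) in enumerate(zip(child, mother, father))
            if not mendelian_consistent(c, m, f)]
```

Note that a consistent trio does not prove the genotypes are correct; many errors (for example at homozygous sites in heterozygous parents) remain Mendelian-compatible and can only be caught with additional relatives or probabilistic models.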
2011-01-01
Background: The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results: We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria, we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data, we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions: The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484
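Quality filtering of the kind evaluated above starts from Phred scores, where a quality Q corresponds to an error probability of 10^(-Q/10). The sketch below decodes FASTQ quality strings and keeps reads whose expected number of errors, the sum of per-base error probabilities, is within a cutoff. This expected-errors criterion is an assumed example, not one of the paper's specific filters. The offset parameter matters for data of this era: Sanger and Illumina 1.8+ use offset 33, whereas Illumina pipelines 1.3 to 1.7 used offset 64.

```python
from typing import Iterable, List

def phred_to_error(q: int) -> float:
    """Phred quality Q -> probability that the base call is wrong, 10**(-Q/10)."""
    return 10 ** (-q / 10)

def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """FASTQ quality characters -> Phred scores."""
    return [ord(ch) - offset for ch in qual]

def expected_errors(qual: str, offset: int = 33) -> float:
    """Expected number of wrong base calls in one read."""
    return sum(phred_to_error(q) for q in decode_qualities(qual, offset))

def passing_reads(quals: Iterable[str], max_expected_errors: float, offset: int = 33) -> List[int]:
    """Indices of reads whose expected error count is within the cutoff."""
    return [i for i, q in enumerate(quals) if expected_errors(q, offset) <= max_expected_errors]
```

Such a filter relies on the quality values being well calibrated, which the study assesses separately; where calibration is poor, as in the error-dense regions noted above, filtering on quality alone will miss errors.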
Exome sequencing of a multigenerational human pedigree.
Hedges, Dale J; Hedges, Dale; Burges, Dan; Powell, Eric; Almonte, Cherylyn; Huang, Jia; Young, Stuart; Boese, Benjamin; Schmidt, Mike; Pericak-Vance, Margaret A; Martin, Eden; Zhang, Xinmin; Harkins, Timothy T; Züchner, Stephan
2009-12-14
Over the next few years, the efficient use of next-generation sequencing (NGS) in human genetics research will depend heavily upon effective mechanisms for the selective enrichment of genomic regions of interest. Recently, comprehensive exome capture arrays have become available for targeting approximately 33 Mb or approximately 180,000 coding exons across the human genome. Selective genomic enrichment of the human exome offers an attractive option for new experimental designs aiming to quickly identify potential disease-associated genetic variants, especially in family-based studies. We have evaluated a 2.1 M feature human exome capture array on eight individuals from a three-generation family pedigree. We were able to cover up to 98% of the targeted bases at a long-read sequence read depth of ≥3, 86% at a read depth of ≥10, and over 50% of all targets were covered with ≥20 reads. We identified up to 14,284 SNPs and small indels per individual exome, with up to 1,679 of these representing putative novel polymorphisms. Applying the conservative genotype calling approach HCDiff, the average rate of detection of a variant allele based on Illumina 1M BeadChip genotypes was 95.2% at ≥10x sequence coverage. Further, we propose an advantageous genotype calling strategy for low-covered targets that empirically determines cut-off thresholds at a given coverage depth based on existing genotype data. Application of this method detected >99% of SNPs covered ≥8x. Our results offer guidance for "real-world" applications in human genetics and provide further evidence that microarray-based exome capture is an efficient and reliable method to enrich for chromosomal regions of interest in next-generation sequencing experiments.
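The coverage figures above (fraction of targeted bases at depth ≥3, ≥10, ≥20) are breadth-of-coverage statistics. Assuming per-base read depths for the targeted bases are already available, for example from an alignment depth tool, the computation is a simple threshold count. The depth values in the test are made up.

```python
from typing import Dict, Iterable

def breadth_of_coverage(depths: Iterable[int], thresholds: Iterable[int]) -> Dict[int, float]:
    """Fraction of targeted bases whose read depth is >= each threshold."""
    d = list(depths)
    if not d:
        raise ValueError("no targeted bases")
    return {t: sum(1 for x in d if x >= t) / len(d) for t in thresholds}
```

Reporting several thresholds at once shows how quickly breadth falls as the depth requirement rises, which is what matters when choosing a genotype-calling cutoff.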
[Study on ITS sequences of Aconitum vilmorinianum and its medicinal adulterant].
Zhang, Xiao-nan; Du, Chun-hua; Fu, De-huan; Gao, Li; Zhou, Pei-jun; Wang, Li
2012-09-01
To analyze and compare the ITS sequences of Aconitum vilmorinianum and its medicinal adulterant Aconitum austroyunnanense. Total genomic DNA was extracted from sample materials by an improved CTAB method; ITS sequences of the samples were amplified by PCR, directly sequenced, and analyzed using the software DNAStar, ClustalX 1.81 and MEGA 4.0. In ITS1 sequences, 299 conserved sites, 19 variable sites and 13 informative sites were found; in 5.8S sequences, 162 conserved sites, 2 variable sites and 1 informative site; and in ITS2 sequences, 217 conserved sites, 3 variable sites and 1 informative site. Comparing the ITS sequence data matrix, no base transitions or transversions were found in the 5.8S sequences, 2 transitions and 1 transversion were found in the ITS1 sequences, and only 1 transversion was found in the ITS2 sequences. By analyzing the ITS sequence data matrix from 2 populations of Aconitum vilmorinianum and 3 populations of Aconitum austroyunnanense, we found a stable informative site at the 596th base in the ITS2 sequences: in all samples of Aconitum vilmorinianum the base was C, and in all samples of Aconitum austroyunnanense the base was A. Aconitum vilmorinianum and Aconitum austroyunnanense can be identified by the characters of their ITS sequences, and ITS1 sequences contain more variable sites than ITS2 sequences.
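The site counts above follow standard alignment-column definitions. A conserved site has one state across all sequences. A variable site has two or more states. A parsimony-informative site has at least two states, each present in at least two sequences. The sketch below applies these definitions, classifies substitutions as transitions (A↔G, C↔T) or transversions, and finds diagnostic columns fixed for different bases in two groups, like the 596th ITS2 position. Skipping columns with gaps or ambiguous bases is an assumption; MEGA offers several gap-handling options.

```python
from collections import Counter
from typing import Dict, List

PURINES, PYRIMIDINES = set("AG"), set("CT")

def classify_sites(alignment: List[str]) -> Dict[str, int]:
    """Count conserved, variable and parsimony-informative columns,
    skipping columns that contain gaps or ambiguous bases."""
    counts = {"conserved": 0, "variable": 0, "informative": 0}
    for column in zip(*alignment):
        if any(b not in "ACGT" for b in column):
            continue
        states = Counter(column)
        if len(states) == 1:
            counts["conserved"] += 1
        else:
            counts["variable"] += 1
            if sum(1 for n in states.values() if n >= 2) >= 2:
                counts["informative"] += 1
    return counts

def substitution_type(a: str, b: str) -> str:
    if a == b:
        return "identical"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"

def diagnostic_sites(group_a: List[str], group_b: List[str]) -> List[int]:
    """1-based positions fixed for one base in group_a and a different base in group_b."""
    sites = []
    for pos, column in enumerate(zip(*(group_a + group_b)), start=1):
        a_states, b_states = set(column[:len(group_a)]), set(column[len(group_a):])
        if (len(a_states) == 1 and len(b_states) == 1 and a_states != b_states
                and (a_states | b_states) <= set("ACGT")):
            sites.append(pos)
    return sites
```

A diagnostic site is only as reliable as the sampling behind it. Here it rests on 2 and 3 populations, so screening more populations would strengthen its use for identification.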
Yleaf: Software for Human Y-Chromosomal Haplogroup Inference from Next-Generation Sequencing Data.
Ralf, Arwin; Montiel González, Diego; Zhong, Kaiyin; Kayser, Manfred
2018-05-01
Next-generation sequencing (NGS) technologies offer immense possibilities given the large amount of genomic data they simultaneously deliver. The human Y-chromosome serves as a good example of how NGS benefits various applications in evolution, anthropology, genealogy, and forensics. Prior to NGS, the Y-chromosome phylogenetic tree consisted of a few hundred branches; based on NGS data, it now contains many thousands. The complexity of both the Y tree and NGS data poses challenges for haplogroup assignment. For effective analysis and interpretation of Y-chromosome NGS data, we present Yleaf, a publicly available, automated, user-friendly software for high-resolution Y-chromosome haplogroup inference independent of library and sequencing methods.
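Haplogroup inference amounts to placing a sample on a phylogenetic tree whose branches are defined by marker SNPs. The sketch below illustrates that general idea and is not Yleaf's actual procedure. Starting from the root, it descends into the child branch with the highest fraction of covered markers carrying the derived allele, and it stops when no child reaches a minimum fraction (assumed here to be 0.5). The tree, haplogroup names and marker positions in the test are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, List[str]]                      # haplogroup -> child haplogroups
Markers = Dict[str, List[Tuple[int, str, str]]]  # haplogroup -> (position, ancestral, derived)

def derived_fraction(markers: List[Tuple[int, str, str]], calls: Dict[int, str]) -> Optional[float]:
    """Fraction of covered markers showing the derived allele (None if none are covered)."""
    covered = [(pos, der) for pos, _anc, der in markers if pos in calls]
    if not covered:
        return None
    return sum(calls[pos] == der for pos, der in covered) / len(covered)

def infer_haplogroup(tree: Tree, markers: Markers, calls: Dict[int, str],
                     root: str, min_derived: float = 0.5) -> str:
    """Descend from the root while some child branch is supported by derived alleles."""
    current = root
    while True:
        best: Optional[str] = None
        best_frac = -1.0
        for child in tree.get(current, []):
            frac = derived_fraction(markers.get(child, []), calls)
            if frac is not None and frac >= min_derived and frac > best_frac:
                best, best_frac = child, frac
        if best is None:
            return current
        current = best
```

Requiring a fraction rather than all markers to be derived tolerates occasional miscalls in low-coverage NGS data, at the cost of a tunable threshold.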
Error reduction and parameter optimization of the TAPIR method for fast T1 mapping.
Zaitsev, M; Steinhoff, S; Shah, N J
2003-06-01
A methodology is presented for the reduction of both systematic and random errors in T(1) determination using TAPIR, a Look-Locker-based fast T(1) mapping technique. The relations between various sequence parameters were carefully investigated in order to develop recipes for choosing optimal sequence parameters. Theoretical predictions for the optimal flip angle were verified experimentally. Inversion pulse imperfections were identified as the main source of systematic errors in T(1) determination with TAPIR. An effective remedy is demonstrated which includes extension of the measurement protocol to include a special sequence for mapping the inversion efficiency itself. Copyright 2003 Wiley-Liss, Inc.
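Why inversion imperfections bias Look-Locker T1 estimates can be seen from the standard three-parameter model S(t) = A − B·exp(−t/T1*). Under repeated readouts with flip angle α and spacing TR, the continuous approximation gives 1/T1* = 1/T1 − ln(cos α)/TR and a steady state A = M0·T1*/T1. If the inversion leaves −δ·M0 of longitudinal magnetization (δ = 1 for an ideal pulse), then B − A = δ·M0, so T1 = T1*·(B − A)/(δ·A). With δ = 1 this reduces to the familiar T1 = T1*(B/A − 1). The sketch below encodes these relations; the symbol δ and the parameter values are illustrative assumptions, not TAPIR's exact protocol.

```python
import math

def t1_star(t1: float, flip_deg: float, tr: float) -> float:
    """Apparent relaxation time under repeated readouts (continuous approximation):
    1/T1* = 1/T1 - ln(cos(alpha)) / TR."""
    return 1.0 / (1.0 / t1 - math.log(math.cos(math.radians(flip_deg))) / tr)

def t1_from_fit(a: float, b: float, t1s: float, delta: float = 1.0) -> float:
    """T1 from a fit S(t) = A - B*exp(-t/T1*), where the inversion leaves
    -delta*M0 of longitudinal magnetization (delta = 1: ideal inversion)."""
    return t1s * (b - a) / (delta * a)

# Simulate fit parameters for T1 = 1.0 s with a 90%-efficient inversion.
t1, m0, delta_true = 1.0, 1.0, 0.9
t1s = t1_star(t1, 15.0, 0.01)
a = m0 * t1s / t1
b = a + delta_true * m0
print(round(t1_from_fit(a, b, t1s, delta=0.9), 6))  # 1.0 (efficiency known)
print(round(t1_from_fit(a, b, t1s), 6))             # 0.9 (ideal inversion assumed)
```

In this model, assuming an ideal inversion when the pulse inverts only 90% underestimates T1 by 10%. Measuring the inversion efficiency directly, as proposed above, removes that systematic error.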
NASA Astrophysics Data System (ADS)
Xu, Ye; Wang, Ling; Wang, Shengyao; Liu, Min
2014-09-01
In this article, an effective hybrid immune algorithm (HIA) is presented to solve the distributed permutation flow-shop scheduling problem (DPFSP). First, a decoding method is proposed to transfer a job permutation sequence to a feasible schedule considering both factory dispatching and job sequencing. Secondly, a local search with four search operators is presented based on the characteristics of the problem. Thirdly, a special crossover operator is designed for the DPFSP, and mutation and vaccination operators are also applied within the framework of the HIA to perform an immune search. The influence of parameter setting on the HIA is investigated based on the Taguchi method of design of experiment. Extensive numerical testing results based on 420 small-sized instances and 720 large-sized instances are provided. The effectiveness of the HIA is demonstrated by comparison with some existing heuristic algorithms and the variable neighbourhood descent methods. New best known solutions are obtained by the HIA for 17 out of 420 small-sized instances and 585 out of 720 large-sized instances.
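The abstract does not spell out the decoding step, so the sketch below uses a common DPFSP decoding rule as an assumption: take jobs in permutation order and append each one to the factory whose partial schedule then has the smallest makespan. Within a factory, the permutation flow-shop recurrence C[j][m] = max(C[j−1][m], C[j][m−1]) + p[j][m] gives completion times. Factories are assumed identical, as in the standard DPFSP.

```python
from typing import List, Sequence, Tuple

def flowshop_makespan(jobs: Sequence[int], p: Sequence[Sequence[int]]) -> int:
    """Makespan of jobs processed in order on a permutation flow shop;
    p[j][m] is the processing time of job j on machine m."""
    if not jobs:
        return 0
    finish = [0] * len(p[0])  # completion time of the latest job on each machine
    for j in jobs:
        for m in range(len(finish)):
            start = max(finish[m], finish[m - 1] if m > 0 else 0)
            finish[m] = start + p[j][m]
    return finish[-1]

def decode(permutation: Sequence[int], p: Sequence[Sequence[int]],
           n_factories: int) -> Tuple[List[List[int]], int]:
    """Assign each job, in permutation order, to the factory whose partial
    makespan is smallest after appending it; ties go to the lowest index."""
    factories: List[List[int]] = [[] for _ in range(n_factories)]
    for job in permutation:
        best = min(range(n_factories),
                   key=lambda f: flowshop_makespan(factories[f] + [job], p))
        factories[best].append(job)
    return factories, max(flowshop_makespan(f, p) for f in factories)

print(decode([0, 1, 2], [[3, 2], [2, 4], [4, 1]], 2))  # ([[0], [1, 2]], 7)
```

A decoding rule like this lets a metaheuristic search over plain permutations while the decoder handles factory assignment, which keeps crossover and mutation operators simple.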
A simple method for MR elastography: a gradient-echo type multi-echo sequence.
Numano, Tomokazu; Mizuhara, Kazuyuki; Hata, Junichi; Washio, Toshikatsu; Homma, Kazuhiro
2015-01-01
To demonstrate the feasibility of a novel MR elastography (MRE) technique based on a conventional gradient-echo type multi-echo MR sequence which does not need additional bipolar magnetic field gradients (motion encoding gradient: MEG), yet is sensitive to vibration. In a gradient-echo type multi-echo MR sequence, several images are produced from each echo of the train with different echo times (TEs). If these echoes are synchronized with the vibration, each readout's gradient lobes achieve a MEG-like effect, and later-generated echoes produce a greater MEG-like effect. The sequence was tested on tissue-mimicking agarose gel phantoms and on the psoas major muscles of healthy volunteers. It was confirmed that the readout gradient lobes caused an MEG-like effect and that the later-TE images had higher sensitivity to vibrations. The magnitude images of later-generated echoes suffered from T2 decay and susceptibility artifacts, but the wave images and elastograms of later-generated echoes were unaffected by these effects. In in vivo experiments, this method was able to measure the mean shear modulus of the psoas major muscle. The results of the phantom experiments and volunteer studies show that this method has clinical application potential. Copyright © 2014 Elsevier Inc. All rights reserved.
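The MEG-like effect described above follows from the phase a moving spin accumulates, φ = γ∫G(t)u(t)dt. The sketch below idealizes one readout lobe pair as a bipolar gradient lasting exactly one vibration period T, with +G0 for the first half and −G0 for the second, and sinusoidal displacement u(t) = u0·sin(2πt/T). Both halves then add to φ = 2γG0u0T/π, while a static spin accumulates no net phase. The gradient amplitude, vibration frequency and amplitude are assumed values, and real multi-echo readout waveforms are more complex than this.

```python
import math
from typing import Callable

def accumulated_phase(gradient: Callable[[float], float],
                      displacement: Callable[[float], float],
                      t_end: float, steps: int = 10000, gamma: float = 2.675e8) -> float:
    """phi = gamma * integral_0^t_end G(t) u(t) dt (midpoint rule).
    G in T/m, u in m, gamma in rad/(s*T) for 1H."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += gradient(t) * displacement(t)
    return gamma * total * dt

T, G0, u0 = 0.01, 0.01, 1e-5  # 100 Hz vibration, 10 mT/m lobes, 10 um amplitude (assumed)

def bipolar(t: float) -> float:
    return G0 if t < T / 2 else -G0

phi = accumulated_phase(bipolar, lambda t: u0 * math.sin(2 * math.pi * t / T), T)
analytic = 2.675e8 * G0 * u0 * 2 * T / math.pi
print(round(phi, 4), round(analytic, 4))
```

A spin at a fixed position gives zero net phase because the bipolar lobes have zero area. Only motion synchronized with the lobes leaves a phase signature, which is why synchronizing the echo train with the vibration matters.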
Study on multiple-hops performance of MOOC sequences-based optical labels for OPS networks
NASA Astrophysics Data System (ADS)
Zhang, Chongfu; Qiu, Kun; Ma, Chunli
2009-11-01
In this paper, we use a new analysis method, which treats the multiple optical orthogonal codes as independent, to derive the probability function of MOOCS-OPS networks, discuss the performance characteristics for a variety of parameters, and compare some characteristics of systems employing optical labels based on a single optical orthogonal code or on multiple optical orthogonal code sequences. The performance of the system is also calculated, and our results verify that the method is effective. Additionally, we find that the performance of MOOCS-OPS networks is degraded compared with single optical orthogonal code-based optical labels for optical packet switching (SOOC-OPS); however, MOOCS-OPS networks can greatly enlarge the scalability of optical packet switching networks.
Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin
2007-12-01
Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins, which could have important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and secondary structure predicted by the PSIPRED program. The results indicate that our method achieves prediction accuracies of 74.4% and 77.9%, measured at the protein level and the cysteine-pair level respectively, when averaged over proteins with two to five disulfide bridges using 4-fold cross-validation on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. The sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure significantly improves prediction accuracy, enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
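A regression model such as the SVR above yields a score for each cysteine pair, but a connectivity pattern must pair every bonded cysteine exactly once. A common way to reconcile the two is a maximum-weight perfect matching over the pair scores; this step is an assumption here, since the abstract does not describe it. Proteins with two to five bridges have at most 10 bonded cysteines, so exhaustive search is cheap: five bridges give (2·5 − 1)!! = 945 possible patterns.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Tuple

def best_connectivity(n: int, score: Dict[Tuple[int, int], float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Exhaustive maximum-weight perfect matching over n (even) cysteines;
    score[(i, j)] with i < j is a predicted bonding propensity."""
    if n % 2:
        raise ValueError("need an even number of bonded cysteines")

    @lru_cache(maxsize=None)
    def solve(remaining: FrozenSet[int]) -> Tuple[float, Tuple[Tuple[int, int], ...]]:
        if not remaining:
            return 0.0, ()
        i = min(remaining)  # pair the lowest-index cysteine first
        best: Tuple[float, Tuple[Tuple[int, int], ...]] = (float("-inf"), ())
        for j in remaining:
            if j == i:
                continue
            sub_score, sub_pairs = solve(remaining - {i, j})
            total = sub_score + score[(i, j)]
            if total > best[0]:
                best = (total, ((i, j),) + sub_pairs)
        return best

    total, pairs = solve(frozenset(range(n)))
    return total, sorted(pairs)
```

Memoizing on the set of unpaired cysteines keeps the search small; for much larger numbers of cysteines, a polynomial-time matching algorithm (e.g., Edmonds' blossom algorithm) would be the usual replacement.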
Keegan, Johnalan; Burke, Edward; Condron, James
2009-01-01
In the field of assistive technology, the electrooculogram (EOG) can be used as a channel of communication and the basis of a man-machine interface. For many people with severe motor disabilities, simple actions such as changing the TV channel require assistance. This paper describes a method of detecting saccadic eye movements and the use of a saccade sequence classification algorithm to facilitate communication and control. Saccades are fast eye movements that occur when a person's gaze jumps from one fixation point to another. The classification is based on predefined sequences of saccades, guided by a static visual template (e.g. a page or poster). The template, consisting of a table of symbols each having a clearly identifiable fixation point, is placed within view of the user. To execute a particular command, the user moves his or her gaze along a predefined path of eye movements. This produces a well-formed sequence of saccades, which is translated into a command if a match is found in a library of predefined sequences. A coordinate transformation algorithm is applied to each candidate sequence of recorded saccades to mitigate the effect of changes in the user's position and orientation relative to the visual template. Upon recognition of a saccade sequence from the library, its associated command is executed. A preliminary experiment, in which two subjects were instructed to perform a series of command sequences covering 8 different commands, is presented in the final sections. The system is also shown to be extensible to convenient text entry via an alphabetic visual template.
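The detection-then-lookup pipeline described above can be sketched as follows, under several assumptions not stated in the abstract: saccades are detected with a simple velocity threshold on the horizontal and vertical EOG channels, each saccade is quantized to one of four directions, and the direction sequence is looked up in a dictionary. The threshold, the symbols and the command name `TV_CHANNEL_UP` are hypothetical, and the paper's coordinate transformation step is not modeled.

```python
def detect_saccades(h, v, fs, velocity_threshold):
    """Return (dx, dy) displacements of movements whose speed exceeds the threshold.

    h, v: horizontal and vertical EOG samples; fs: sampling rate in Hz.
    """
    saccades, start = [], None
    for i in range(1, len(h)):
        vx = (h[i] - h[i - 1]) * fs
        vy = (v[i] - v[i - 1]) * fs
        fast = (vx * vx + vy * vy) ** 0.5 > velocity_threshold
        if fast and start is None:
            start = i - 1                      # movement begins at the previous sample
        elif not fast and start is not None:
            saccades.append((h[i - 1] - h[start], v[i - 1] - v[start]))
            start = None
    if start is not None:                      # saccade still in progress at the end
        saccades.append((h[-1] - h[start], v[-1] - v[start]))
    return saccades


def direction(dx, dy):
    """Quantize a displacement into one of four direction symbols."""
    if abs(dx) >= abs(dy):
        return "R" if dx > 0 else "L"
    return "U" if dy > 0 else "D"


def classify(saccades, library):
    """Look up the direction sequence in a command library; None if there is no match."""
    return library.get(tuple(direction(dx, dy) for dx, dy in saccades))


# Hypothetical library entry and toy signals (arbitrary units): a rightward then a downward saccade.
library = {("R", "D"): "TV_CHANNEL_UP"}
h = [0, 0, 10, 10, 10, 10]
v = [0, 0, 0, 0, -10, -10]
print(classify(detect_saccades(h, v, fs=1, velocity_threshold=1), library))  # TV_CHANNEL_UP
```

A real system would also low-pass filter the EOG and correct for drift before thresholding.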
Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi
2013-11-20
With the remarkable increase in genomic sequence data for microorganisms, novel tools are needed for comprehensive analyses of the large volume of sequence data available. The self-organizing map (SOM) is an effective tool for clustering high-dimensional data, such as oligonucleotide composition, and visualizing them on a single map. By modifying the conventional SOM, we developed the batch-learning SOM (BLSOM), which allows sequence fragments (e.g., 1 kb) to be classified according to phylotype solely on the basis of oligonucleotide composition. Metagenomic studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in the life sciences. BLSOM is most suitable for the phylogenetic assignment of metagenomic sequences, because sequence fragments can be clustered according to phylotype solely on the basis of oligonucleotide composition. We first constructed oligonucleotide BLSOMs for all available sequences from the genomes of known species; by mapping metagenomic sequences onto these large-scale BLSOMs, we can predict the phylotypes of individual metagenomic sequences and thereby reveal the community structure of uncultured microorganisms, including viruses. BLSOM has shown that influenza viruses isolated from humans and birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes in virus sequences and for surveillance of potentially hazardous strains introduced into humans from non-human sources.
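To make the two ingredients concrete, the minimal sketch below computes oligonucleotide composition vectors and trains a batch-learning SOM on them. Several simplifications are assumptions rather than details of the published BLSOM: a 1-D lattice instead of a 2-D map, a Gaussian neighbourhood with a linearly shrinking width, and nodes initialized from evenly spaced input vectors. The batch update replaces every node weight at once by a neighbourhood-weighted mean of all inputs, so the result does not depend on the order in which sequences are presented.

```python
import math
from itertools import product


def oligo_composition(seq, k=2):
    """Relative frequencies of all 4**k oligonucleotides (words with non-ACGT letters are skipped)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in words]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_node(weights, x):
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j], x))


def batch_som(data, n_nodes, epochs=20, sigma0=2.0):
    """Batch-learning SOM on a 1-D lattice with a Gaussian neighbourhood (simplified)."""
    step = max(1, len(data) // n_nodes)
    weights = [list(data[(j * step) % len(data)]) for j in range(n_nodes)]
    dim = len(data[0])
    for epoch in range(epochs):
        sigma = max(0.5, sigma0 * (1 - epoch / epochs))
        winners = [best_matching_node(weights, x) for x in data]
        new_weights = []
        for j in range(n_nodes):
            num, den = [0.0] * dim, 0.0
            for x, b in zip(data, winners):
                h = math.exp(-((j - b) ** 2) / (2 * sigma ** 2))
                den += h
                for d in range(dim):
                    num[d] += h * x[d]
            new_weights.append([value / den for value in num])
        weights = new_weights          # all nodes updated together: input order is irrelevant
    return weights


# Toy fragments with contrasting dinucleotide composition.
at_rich = oligo_composition("AT" * 50)
gc_rich = oligo_composition("GC" * 50)
nodes = batch_som([at_rich] * 3 + [gc_rich] * 3, n_nodes=4)
print(best_matching_node(nodes, at_rich), best_matching_node(nodes, gc_rich))
```

In practice, tetranucleotide (k = 4) composition of ~1-kb fragments and much larger 2-D maps would be used.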
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard
2004-09-09
Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Herein, a parallel version of the multiple-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments, which are used as a first step in multiple alignment, account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
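Strategy (a) works because the pair-wise alignments share no state, so they form an embarrassingly parallel task list. The sketch below shows that pattern with Python's `concurrent.futures`. As an assumption for illustration, a plain Needleman-Wunsch global score stands in for DIALIGN's segment-based pair-wise alignment, and the function names are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch, linear gaps) as a stand-in pair-wise aligner."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def _score_job(job):
    """Worker: align one sequence pair; returns ((i, j), score)."""
    i, j, a, b = job
    return (i, j), nw_score(a, b)


def all_pairwise_scores(seqs, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Distribute the independent pair-wise alignments over a pool of workers."""
    jobs = [(i, j, seqs[i], seqs[j]) for i, j in combinations(range(len(seqs)), 2)]
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(_score_job, jobs))
```

Call `all_pairwise_scores` from under an `if __name__ == "__main__":` guard, because `ProcessPoolExecutor` may re-import the module in its worker processes. Since each result is keyed by its pair, the output is identical whichever worker finishes first, mirroring the abstract's point that distribution has no effect on the alignments.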
Meehan, Sean K.; Randhawa, Bubblepreet; Wessel, Brenda; Boyd, Lara A.
2010-01-01
Implicit motor learning is preserved after stroke, but how the brain compensates for damage to facilitate learning is unclear. We used a random effects analysis to determine how stroke alters patterns of brain activity during implicit sequence-specific motor learning as compared with general improvements in motor control. Nine healthy participants and 9 individuals with chronic, right focal subcortical stroke performed a continuous joystick-based tracking task during an initial fMRI session, over 5 days of practice, and in a retention test during a separate fMRI session. Sequence-specific implicit motor learning was differentiated from general improvements in motor control by comparing tracking performance on novel and repeated tracking sequences during early practice and again at the retention test. Both groups demonstrated implicit sequence-specific motor learning at the retention test, yet substantial differences were apparent. At retention, healthy control participants demonstrated increased BOLD response in left dorsal premotor cortex (BA 6) but decreased BOLD response in left dorsolateral prefrontal cortex (DLPFC; BA 9) during repeated sequence tracking. In contrast, at retention, individuals with stroke did not show this reduction in DLPFC during repeated tracking. Instead, after stroke, both implicit sequence-specific motor learning and general improvements in motor control were associated with increased BOLD response in the left middle frontal gyrus (BA 8), regardless of sequence type. These data emphasize the potential importance of a prefrontal-based attentional network for implicit motor learning after stroke. The present study is the first to highlight the importance of the prefrontal cortex for implicit sequence-specific motor learning after stroke. PMID:20725908
Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred
2014-01-01
Usher syndrome is an autosomal recessive disorder characterized by both deafness and blindness. For the three clinical subtypes of Usher syndrome, causal mutations in a total of 12 genes and one modifier gene have been identified. Because of the genetic heterogeneity of Usher syndrome, molecular analysis lends itself to a comprehensive, parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing of the exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, and copy number variations of one or more exons from the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, into which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole-exon deletions in a single test. The described diagnostic approach is fast and cost-effective, with a high molecular diagnostic yield. PMID:25333064
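The abstract does not describe how exon copy number is inferred from the same sequence data. One common read-depth approach, sketched here as an assumption rather than the authors' pipeline, normalizes each exon's depth by the sample's total, divides by the mean normalized depth of control samples, and flags ratios near 0.5 as heterozygous deletions. The exon labels and cut-offs are hypothetical.

```python
def normalize(depths):
    """Scale per-exon depths by the sample's total so samples are comparable."""
    total = sum(depths.values())
    return {exon: d / total for exon, d in depths.items()}


def dosage_ratios(sample, controls):
    """Per-exon ratio of the sample's normalized depth to the mean of the controls."""
    s = normalize(sample)
    norm_controls = [normalize(c) for c in controls]
    return {exon: s[exon] / (sum(c[exon] for c in norm_controls) / len(norm_controls))
            for exon in s}


def call_exon(ratio, deletion_below=0.75, duplication_above=1.25):
    """Crude three-way call; a ratio near 0.5 suggests a heterozygous deletion."""
    if ratio < deletion_below:
        return "deletion"
    if ratio > duplication_above:
        return "duplication"
    return "normal"


# Hypothetical mean depths: exon "ex4" is at half dosage in the patient.
controls = [{"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 100}] * 2
patient = {"ex1": 100, "ex2": 100, "ex3": 100, "ex4": 50}
ratios = dosage_ratios(patient, controls)
print({exon: call_exon(r) for exon, r in ratios.items()})
```

With only four exons, the deletion inflates the other ratios (about 1.14 here); across a full gene panel that bias becomes small, and median-based normalization would reduce it further.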
Ashfaq, Muhammad; Asif, Muhammad; Anjum, Zahid Iqbal; Zafar, Yusuf
2013-07-01
Although two plastid regions have been adopted as the standard markers for plant DNA barcoding, their limited resolution has prompted consideration of other gene regions, especially in taxonomically diverse genera. The genus Gossypium (cotton) includes eight diploid genome groups (A-G, and K) and five allotetraploid species, which are difficult to discriminate morphologically. In this study, we tested the effectiveness of three widely used markers (matK, rbcL, and ITS2) in discriminating 20 diploid and five tetraploid species of cotton. Sequences were analysed locus-wise and in combinations to determine the most effective strategy for species identification. Sequence recovery was high, ranging from 92% to 100%, with mean pairwise interspecific distance highest for ITS2 (3.68%) and lowest for rbcL (0.43%). At a 0.5% threshold, the combination of matK+ITS2 produced the greatest number of species clusters. Based on 'best match' analysis, the combination of matK+ITS2 was best, while based on 'all species barcodes' analysis, ITS2 gave the highest percentage of correct species identifications (98.93%). The combination of sequences for all three markers produced the best-resolved tree. The disparity index test based on matK+rbcL+ITS2 was significant (P < 0.05) for a higher number of species pairs than the individual gene sequences. Although all three barcodes separated the species with respect to their genome type, no single combination of barcodes could differentiate all the Gossypium species, and tetraploid species were particularly difficult. © 2013 John Wiley & Sons Ltd.
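The distance and threshold analyses above rest on pairwise distances between aligned barcodes. The minimal sketch below computes the uncorrected p-distance, skipping gaps and missing data, and groups sequences by single-linkage clustering at the 0.5% threshold. The abstract does not name the software or distance model used, so the p-distance, the single-linkage rule and the species labels are assumptions for illustration.

```python
AMBIGUOUS = set("-N?")


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    compared = differences = 0
    for x, y in zip(a, b):
        if x in AMBIGUOUS or y in AMBIGUOUS:
            continue                      # skip gaps and missing data
        compared += 1
        differences += x != y
    return differences / compared if compared else float("nan")


def threshold_clusters(aligned, threshold=0.005):
    """Single-linkage clusters: link any two sequences at p-distance <= threshold."""
    names = list(aligned)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving keeps the trees shallow
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(aligned[a], aligned[b]) <= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical 400-bp aligned barcodes: sp2 differs from sp1 at 1 site, sp3 at 15 sites.
base = "ACGT" * 100
barcodes = {"sp1": base, "sp2": "G" + base[1:], "sp3": "T" * 20 + base[20:]}
print(threshold_clusters(barcodes))  # [['sp1', 'sp2'], ['sp3']]
```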
Simple data-smoothing and noise-suppression technique
NASA Technical Reports Server (NTRS)
Duty, R. L.
1970-01-01
Algorithm, based on the Borel method of summing divergent sequences, is used for smoothing noisy data where knowledge of frequency content is not required. Technique's effectiveness is demonstrated by a series of graphs.
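The brief gives no formula. For orientation only, the Borel method assigns a sequence s_0, s_1, ... the limit of the Poisson-weighted mean e^-x Σ s_n x^n / n!, which behaves like a low-pass average. The sketch below assumes the smoother replaces each sample by a finite, renormalized Borel mean of the samples from that point onward with a fixed x; this forward-looking window and the parameter value are assumptions, not Duty's published algorithm.

```python
import math


def poisson_weights(x, n_terms):
    """Borel (Poisson) weights e^-x x^n / n! for n = 0..n_terms-1, computed recursively."""
    w = [math.exp(-x)]
    for n in range(1, n_terms):
        w.append(w[-1] * x / n)
    return w


def borel_mean(seq, x):
    """Finite Borel mean of a sequence, renormalized over the available terms."""
    w = poisson_weights(x, len(seq))
    return sum(wi * si for wi, si in zip(w, seq)) / sum(w)


def borel_smooth(data, x=2.0):
    """Replace each sample by the Borel mean of the samples from that point onward."""
    return [borel_mean(data[i:], x) for i in range(len(data))]


alternating = [(-1.0) ** n for n in range(30)]      # pure high-frequency "noise"
print(round(borel_smooth(alternating)[0], 3))      # strongly attenuated toward 0
```

Larger x spreads the weights over more samples (stronger smoothing); very large x would underflow `math.exp(-x)`, so moderate values are assumed. Near the end of the record fewer samples remain and the smoothing weakens.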
NASA Technical Reports Server (NTRS)
Dutta, Soumyo; Way, David W.
2017-01-01
Mars 2020, the next planned U.S. rover mission to land on Mars, is based on the design of the successful 2012 Mars Science Laboratory (MSL) mission. Mars 2020 retains most of the entry, descent, and landing (EDL) sequences of MSL, including the closed-loop entry guidance scheme based on the Apollo guidance algorithm. However, unlike MSL, Mars 2020 will initiate parachute deployment and the descent sequence with a range trigger rather than the velocity trigger used previously. This difference will greatly reduce the size of the landing ellipses. Additionally, the switch to the range trigger has greatly changed the relative contribution of each model to the total ellipse size. This paper considers the effect of changing the trigger scheme on trajectory dispersions, as well as the contributions of these various models to trajectory and EDL performance.
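The difference between the two schemes is what the deployment decision keys on: a velocity trigger fires once the vehicle has slowed enough, wherever it is, while a range trigger fires once the remaining distance to the target is small enough, so along-track position errors at deployment shrink. The sketch below assumes a spherical Mars (mean radius 3389.5 km) and a haversine range-to-go; the flight software's actual range computation and trigger thresholds are not given in the abstract, so both thresholds are hypothetical parameters.

```python
import math

MARS_MEAN_RADIUS_KM = 3389.5   # a spherical Mars is assumed


def range_to_go_km(lat, lon, target_lat, target_lon, radius_km=MARS_MEAN_RADIUS_KM):
    """Great-circle (haversine) distance from the current position to the target, in km."""
    phi1, phi2 = math.radians(lat), math.radians(target_lat)
    dphi = phi2 - phi1
    dlam = math.radians(target_lon - lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(h)))


def velocity_trigger(speed_mps, threshold_mps):
    """MSL-style: deploy once the vehicle has slowed to the threshold speed."""
    return speed_mps <= threshold_mps


def range_trigger(lat, lon, target_lat, target_lon, threshold_km):
    """Mars 2020-style: deploy once the remaining range to the target is small enough."""
    return range_to_go_km(lat, lon, target_lat, target_lon) <= threshold_km
```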
Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling
Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien
2012-01-01
The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge because of the lack of appropriate tools for analyzing heterozygous base-calling fluorescence chromatogram data. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format by comparison with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papillomavirus (HPV) genotypes by searching current viral databases in cases of double infection; and (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
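For the simplest case, a substitution-only heterozygote whose mixed peaks are written as two-base IUPAC ambiguity codes, "identifying two distinct sequences" amounts to resolving each ambiguity code into its two bases and using the reference to decide which base belongs to which allele. The sketch below shows that case only; it assumes the mixed call string is already aligned to the reference without indels, whereas MSR also handles the frame-shifted mixtures produced by indels and STRs.

```python
# Two-base IUPAC ambiguity codes produced by heterozygous base calling.
IUPAC_PAIRS = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def split_heterozygous(mixed, reference):
    """Split a mixed base-call string into two allele strings, guided by a reference.

    Assumes `mixed` and `reference` are aligned with no indels, so only
    substitution (SNP-type) heterozygosity is resolved.
    """
    allele1, allele2 = [], []
    for m, r in zip(mixed, reference):
        if m in IUPAC_PAIRS:
            bases = IUPAC_PAIRS[m]
            if r in bases:                       # the reference base goes to allele 1
                allele1.append(r)
                allele2.append(bases.replace(r, ""))
            else:                                # neither base matches the reference
                allele1.append(bases[0])
                allele2.append(bases[1])
        else:                                    # homozygous position
            allele1.append(m)
            allele2.append(m)
    return "".join(allele1), "".join(allele2)


print(split_heterozygous("ACRT", "ACGT"))  # ('ACGT', 'ACAT')
```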
Kück, Patrick; Meusemann, Karen; Dambach, Johannes; Thormann, Birthe; von Reumont, Björn M; Wägele, Johann W; Misof, Bernhard
2010-03-31
Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstruction, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented the routine application of alignment masking. In this study, using different data sets and alignment methods, we compared the effects on tree reconstruction of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with those of a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window. While the GBLOCKS approach excludes variable sections above a certain threshold whose choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and is therefore more objective. ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling produces an even distribution of scores for randomly similar sequences, against which the randomness of the observed sequence similarity is assessed. When performance was tested on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. In addition, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Alignment masking improves the signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction.
Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood based models of sequence evolution which opens the possibility of further improvements.
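The core idea of sliding-window randomness profiling can be sketched as follows: in each window, the observed similarity of two aligned sequences is compared with a null distribution obtained by resampling, and windows that do not beat the null are candidates for masking. This is a simplified analogue, not ALISCORE's scoring: it assumes plain percent identity as the score and within-window shuffling of one sequence as the resampling scheme, whereas ALISCORE uses its own Monte Carlo resampling and substitution matrices.

```python
import random


def identity(a, b):
    """Fraction of aligned positions with identical characters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window_is_nonrandom(a, b, n_resamples=100, quantile=0.95, seed=0):
    """True if the observed identity of two aligned windows beats a shuffled null.

    Shuffling `b` within the window keeps its composition but destroys the
    positional correspondence with `a`.
    """
    rng = random.Random(seed)
    observed = identity(a, b)
    shuffled = list(b)
    null = []
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        null.append(identity(a, shuffled))
    null.sort()
    cutoff = null[min(len(null) - 1, int(quantile * len(null)))]
    return observed > cutoff


def profile(a, b, window=8):
    """Slide a window along two aligned sequences; False marks random-like windows to mask."""
    return [window_is_nonrandom(a[i:i + window], b[i:i + window])
            for i in range(len(a) - window + 1)]
```

Note that a low-complexity window such as "AAAAAAAA" aligned to itself is flagged as random-like: its shuffles are indistinguishable from the observed alignment, so it carries no positional signal.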
Budak, Hikmet; Kantar, Melda
2015-07-01
MicroRNAs (miRNAs) are small, endogenous, non-coding RNA molecules that regulate gene expression at the post-transcriptional level. As high-throughput next-generation sequencing (NGS) data and Big Data rapidly accumulate for various species, efforts toward in silico identification of miRNAs are intensifying. Surprisingly, the effect of the input genomic sequence on the robustness of miRNA prediction has not yet been evaluated in detail. In the present study, we performed a homology-based miRNA and isomiRNA prediction for the 5D chromosome of the bread wheat progenitor Aegilops tauschii, using two distinct sequence data sets as input: (1) raw sequence reads obtained from the 454-GS FLX Titanium sequencing platform and (2) an assembly constructed from these reads. We also compared this method with a number of available plant sequence datasets. We report here the identification of 62 and 22 miRNAs from the raw reads and the assembly, respectively, of which 16 were predicted with high confidence from both datasets. While raw reads promoted sensitivity, as reflected in the high number of miRNAs predicted, 55% (12 out of 22) of the assembly-based predictions were supported by previous observations, indicating higher specificity than the read-based predictions, of which only 37% were supported. Importantly, raw reads identified several repeat-related miRNAs that could not be detected with the assembly. However, raw reads could not capture 6 miRNAs whose stem-loops could only be covered by the relatively longer sequences from the assembly. In summary, the comparison of the miRNA datasets obtained by these two strategies revealed that raw reads and assemblies each have distinct advantages and disadvantages as input for in silico prediction. Consideration of these nuances can benefit future miRNA identification efforts in the current age of NGS- and Big Data-driven life sciences innovation.
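The counts reported above can be cross-checked with inclusion-exclusion. The snippet uses only numbers stated in the abstract. It shows that the two strategies together yield 68 distinct miRNAs, and that the 6 assembly-only miRNAs match the 6 that raw reads could not capture.

```python
raw_read_mirnas = 62
assembly_mirnas = 22
found_in_both = 16

distinct = raw_read_mirnas + assembly_mirnas - found_in_both   # inclusion-exclusion
only_in_reads = raw_read_mirnas - found_in_both
only_in_assembly = assembly_mirnas - found_in_both
assembly_support = 12 / assembly_mirnas                         # supported assembly predictions

print(distinct, only_in_reads, only_in_assembly, round(100 * assembly_support))
# 68 46 6 55
```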
Watanabe, Manabu; Kusano, Junko; Ohtaki, Shinsaku; Ishikura, Takashi; Katayama, Jin; Koguchi, Akira; Paumen, Michael; Hayashi, Yoshiharu
2014-09-01
Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and to obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (an adenocarcinomic human alveolar basal epithelial cell line) were used as a model. Single-cell capture was performed by laser capture microdissection (LCM) with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈10^6 cells) were subjected to whole genome amplification (WGA). For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel) was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 10^31 to 10^35. For cancer gene profiling, mutation profiling was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed on a semiconductor-based benchtop sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentages of sequence covered at more than 100× were 99.04% for the HID panel and 98.83% for the Cancer panel, whereas for the single cell they were 55.93% for the HID panel and 65.96% for the Cancer panel. These differences were probably caused by partial amplification failure, randomly distributed non-amplified regions across single-cell samples during the WGA procedure, or random allele dropout. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.
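The coverage figures above are the fraction of target bases whose read depth exceeds 100×. A minimal sketch of that metric, assuming per-base depths in the three-column "chromosome, position, depth" tab-separated layout written by `samtools depth`, follows. The abstract does not name the tool used, and `samtools depth` omits zero-depth positions unless run with `-a` (restricted to the targets with `-b`), which matters for single-cell samples with dropout.

```python
def parse_depth(lines):
    """Read per-base depths from tab-separated 'chrom, pos, depth' lines (samtools depth layout)."""
    depths = []
    for line in lines:
        if line.strip():
            _chrom, _pos, depth = line.split("\t")[:3]
            depths.append(int(depth))
    return depths


def fraction_above(depths, min_depth=100, strict=True):
    """Fraction of target positions with depth above (strict) or at least min_depth."""
    if not depths:
        return 0.0
    hits = sum((d > min_depth) if strict else (d >= min_depth) for d in depths)
    return hits / len(depths)


example = ["chr1\t1\t150\n", "chr1\t2\t99\n", "chr1\t3\t100\n", "chr1\t4\t200\n"]
print(fraction_above(parse_depth(example)))  # 0.5
```

The `strict` flag exists because "more than 100×" and "at least 100×" differ for positions sitting exactly at the cut-off.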
ZifBASE: a database of zinc finger proteins and associated resources.
Jayakanthan, Mannu; Muthukumaran, Jayaraman; Chandrasekar, Sanniyasi; Chawla, Konika; Punetha, Ankita; Sundar, Durai
2009-09-09
Information on the occurrence of zinc finger protein motifs in genomes is crucial to the developing field of molecular genome engineering. Knowledge of their target DNA-binding sequences is vital for developing chimeric proteins for targeted genome engineering and site-specific gene correction. There is a need for a computational resource of zinc finger proteins (ZFPs) that identifies potential binding sites and their locations, which would reduce the time spent on in vivo work and overcome the difficulty of selecting the specific type of zinc finger protein and the target site in the DNA sequence. ZifBASE provides an extensive collection of natural and engineered ZFPs. It uses standard names and a genetic and structural classification scheme to present data retrieved from UniProtKB, GenBank, Protein Data Bank, ModBase, Protein Model Portal and the literature. It also incorporates specialized features of ZFPs, including finger sequences and positions, number of fingers, physicochemical properties, classes, framework, and PubMed citations, with links to experimental structures (PDB, if available) and modeled structures of natural zinc finger proteins. ZifBASE provides information on zinc finger proteins (both natural and engineered), the number of finger units in each zinc finger protein (for those with multiple fingers), the synergy between adjacent fingers, and their positions. Additionally, it gives the individual finger sequences and the target DNA sites to which they bind, for a clearer understanding of the interactions between adjacent fingers. The current version of ZifBASE contains 139 entries, of which 89 are engineered ZFPs containing 3-7 fingers (F), totaling 296 fingers. There are 50 natural zinc finger protein entries with 2-13 fingers, totaling 307 fingers. It has sequences and structures from the literature, Protein Data Bank, ModBase and Protein Model Portal.
The interface is cross-linked to other public databases, such as UniProtKB, PDB, ModBase, Protein Model Portal and PubMed, to make it more informative. A database is established to maintain information on the sequence features of both natural and engineered zinc finger proteins, including the class, framework, number of fingers, residues, position, recognition site and physicochemical properties (molecular weight, isoelectric point), as well as dissociation constants for a few of them. ZifBASE can provide a more effective and efficient way of accessing zinc finger protein sequences and their target binding sites, with links to their three-dimensional structures. All the data and functions are available at the advanced web-based search interface http://web.iitd.ac.in/~sundar/zifbase.
Zhang, Boyang; Huang, Kunlun; Zhu, Liye; Luo, Yunbo; Xu, Wentao
2017-07-01
In this review, we introduce a new concept, precision toxicology: the mode of action of chemical- or drug-induced toxicity can be sensitively and specifically investigated by isolating a small group of cells or even a single cell with typical phenotype of interest followed by a single cell sequencing-based analysis. Precision toxicology can contribute to the better detection of subtle intracellular changes in response to exogenous substrates, and thus help researchers find solutions to control or relieve the toxicological effects that are serious threats to human health. We give examples for single cell isolation and recommend laser capture microdissection for in vivo studies and flow cytometric sorting for in vitro studies. In addition, we introduce the procedures for single cell sequencing and describe the expected application of these techniques to toxicological evaluations and mechanism exploration, which we believe will become a trend in toxicology.
2012-01-01
Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. 
The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems. PMID:22643026
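The abstract describes the index only as a spatial-statistics measure of how strongly conserved residues cluster on the tertiary structure, with higher clustering giving larger scores. As an illustration of that kind of statistic, not the authors' index, the sketch below computes Moran's I, a standard spatial autocorrelation measure, for per-residue conservation scores. The binary contact weights and the distance cutoff are assumptions.

```python
import math


def morans_i(coords, scores, cutoff=8.0):
    """Moran's I of per-residue conservation scores with binary contact weights.

    coords: one (x, y, z) tuple per residue; w_ij = 1 if i != j and distance <= cutoff.
    Positive values mean similar scores sit close together in space.
    """
    n = len(scores)
    mean = sum(scores) / n
    dev = [s - mean for s in scores]
    denom = sum(d * d for d in dev)
    num = w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                num += dev[i] * dev[j]
                w_sum += 1
    if w_sum == 0 or denom == 0:
        return float("nan")
    return (n / w_sum) * num / denom


# Toy chain with 3.8-angstrom spacing; with a 4-angstrom cutoff only sequence neighbours are in contact.
coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
clustered = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scattered = [1, 0] * 5
print(morans_i(coords, clustered, cutoff=4.0), morans_i(coords, scattered, cutoff=4.0))
```

In the selection scheme described above, such a score would be computed once per candidate set of homologous sequences and the set with the highest value retained.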
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and can be considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and important information for reconstructing the three-dimensional structure of a protein from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and could give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55 and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve the prediction performance, to a CC of 0.57 and an RMSE of 0.79. In addition, incorporating the secondary structure predicted by PSIPRED significantly improved the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to the performance of other existing methods. The SVR method shows a prediction performance competitive with, or at least comparable to, that of previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is well suited to estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the view that support vector regression is a powerful tool for extracting protein sequence-structure relationships and for estimating protein structural profiles from amino acid sequences.
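The abstract does not give the RWCO formula. The minimal sketch below assumes a common form: for residue i, sum the sequence separations |i - j| of all residues j in spatial contact with it (here, within a hard 12-angstrom cutoff and at least 3 residues apart) and divide by the chain length L. It also shows the two evaluation metrics quoted above, Pearson CC and RMSE. The cutoff, the minimum separation and the hard (rather than smooth) contact definition are all assumptions, and the abstract does not say whether RWCO values were normalized before computing the RMSE.

```python
import math


def rwco(coords, cutoff=12.0, min_separation=3):
    """Residue-wise contact order from one representative atom per residue.

    RWCO_i = (1 / L) * sum of |i - j| over residues j with |i - j| >= min_separation
    and distance(i, j) <= cutoff (a hard contact definition, assumed here).
    """
    L = len(coords)
    values = []
    for i in range(L):
        total = 0
        for j in range(L):
            sep = abs(i - j)
            if sep >= min_separation and math.dist(coords[i], coords[j]) <= cutoff:
                total += sep
        values.append(total / L)
    return values


def pearson_cc(pred, obs):
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


# Toy straight chain with 3.8-angstrom spacing: only residues exactly 3 apart are in contact.
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
print(rwco(chain))
```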
Targeted DNA sequencing and in situ mutation analysis using mobile phone microscopy
NASA Astrophysics Data System (ADS)
Kühnemund, Malte; Wei, Qingshan; Darai, Evangelia; Wang, Yingjie; Hernández-Neuta, Iván; Yang, Zhao; Tseng, Derek; Ahlford, Annika; Mathot, Lucy; Sjöblom, Tobias; Ozcan, Aydogan; Nilsson, Mats
2017-01-01
Molecular diagnostics is typically outsourced to well-equipped centralized laboratories, often far from the patient. We developed molecular assays and portable optical imaging designs that permit on-site diagnostics with a cost-effective mobile-phone-based multimodal microscope. We demonstrate that targeted next-generation DNA sequencing reactions and in situ point mutation detection assays in preserved tumour samples can be imaged and analysed using mobile phone microscopy, achieving a new milestone for tele-medicine technologies.
Molecular determinants of origin discrimination by Orc1 initiators in archaea.
Dueber, Erin C; Costa, Alessandro; Corn, Jacob E; Bell, Stephen D; Berger, James M
2011-05-01
Unlike bacteria, many eukaryotes initiate DNA replication from genomic sites that lack apparent sequence conservation. These loci are identified and bound by the origin recognition complex (ORC) and subsequently activated by a cascade of events that includes recruitment of an additional factor, Cdc6. Archaeal organisms generally possess one or more Orc1/Cdc6 homologs, belonging to the Initiator clade of the ATPases associated with various cellular activities (AAA+) superfamily; however, these proteins recognize specific sequences within replication origins. Atomic-resolution studies have shown that archaeal Orc1 proteins contact double-stranded DNA through an N-terminal AAA+ domain and a C-terminal winged-helix domain (WHD), but use remarkably few base-specific contacts. To investigate the biochemical effects of these associations, we mutated the DNA-interacting elements of the Orc1-1 and Orc1-3 paralogs from the archaeon Sulfolobus solfataricus and tested their effect on origin binding and deformation. We find that the AAA+ domain has an unpredicted role in controlling the sequence selectivity of DNA binding, despite an absence of base-specific contacts to this region. Our results show that both the WHD and the ATPase region influence origin recognition by Orc1/Cdc6, and suggest that not only DNA sequence but also local DNA structure helps define archaeal initiator binding sites. © The Author(s) 2011. Published by Oxford University Press.
Xiong, Ai-Sheng; Yao, Quan-Hong; Peng, Ri-He; Li, Xian; Fan, Hui-Qin; Cheng, Zong-Ming; Li, Yi
2004-07-07
Chemical synthesis of DNA sequences provides a powerful tool for modifying genes and for studying gene function, structure and expression. Here, we report a simple, high-fidelity and cost-effective PCR-based two-step DNA synthesis (PTDS) method for synthesizing long segments of DNA. The method involves two steps. (i) Synthesis of individual fragments of the DNA of interest: ten to twelve 60mer oligonucleotides with 20 bp overlaps are mixed, and a PCR is carried out with the high-fidelity DNA polymerase Pfu to produce DNA fragments approximately 500 bp in length. (ii) Synthesis of the entire sequence of the DNA of interest: five to ten PCR products from the first step are combined and used as the template for a second PCR with the high-fidelity DNA polymerase Pyrobest, using the two outermost oligonucleotides as primers. Compared with previously published methods, the PTDS method is rapid (5-7 days) and suitable for synthesizing long segments of DNA (5-6 kb) with high G + C content, repetitive sequences or complex secondary structures. Thus, the PTDS method provides an alternative tool for synthesizing and assembling long genes with complex structures. Using the newly developed PTDS method, we have successfully obtained several genes of interest with sizes ranging from 1.0 to 5.4 kb.
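The overlap arithmetic in step (i) can be checked with a short sketch. It assumes adjacent 60mer oligonucleotides overlap by 20 nt and alternate between the two strands (a common layout for PCR assembly; the abstract does not state the layout). Under that assumption n oligos span 60 + 40(n - 1) bp, so twelve oligos give 500 bp and ten give 420 bp. The helper names (`split_into_oligos`, `assemble`, `random_dna`) are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def covered_length(n_oligos: int, length: int = 60, overlap: int = 20) -> int:
    """Fragment length spanned by n overlapping oligos: length + (n - 1) * (length - overlap)."""
    return length + (n_oligos - 1) * (length - overlap)


def split_into_oligos(target: str, length: int = 60, overlap: int = 20) -> list[str]:
    """Tile `target` with oligos sharing `overlap` nt; odd-numbered oligos are written
    as reverse complements so neighbours anneal (assumed layout)."""
    step = length - overlap
    if len(target) < length or (len(target) - overlap) % step != 0:
        raise ValueError("target length must equal length + k * (length - overlap)")
    oligos = []
    for i, start in enumerate(range(0, len(target) - overlap, step)):
        piece = target[start:start + length]
        oligos.append(piece if i % 2 == 0 else revcomp(piece))
    return oligos


def assemble(oligos: list[str], overlap: int = 20) -> str:
    """Join the tiles back into one sense strand, checking every overlap."""
    seq = oligos[0]
    for i, oligo in enumerate(oligos[1:], start=1):
        sense = oligo if i % 2 == 0 else revcomp(oligo)
        if seq[-overlap:] != sense[:overlap]:
            raise ValueError(f"oligo {i} does not overlap its neighbour")
        seq += sense[overlap:]
    return seq


def random_dna(n: int, seed: int = 0) -> str:
    """Reproducible random test sequence."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))
```

For example, `split_into_oligos(random_dna(500))` yields twelve 60mers, and `assemble` recovers the original 500 bp fragment.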
Genome-wide Target Enrichment-aided Chip Design: a 66 K SNP Chip for Cashmere Goat.
Qiao, Xian; Su, Rui; Wang, Yang; Wang, Ruijun; Yang, Ting; Li, Xiaokai; Chen, Wei; He, Shiyang; Jiang, Yu; Xu, Qiwu; Wan, Wenting; Zhang, Yaolei; Zhang, Wenguang; Chen, Jiang; Liu, Bin; Liu, Xin; Fan, Yixing; Chen, Duoyuan; Jiang, Huaizhi; Fang, Dongming; Liu, Zhihong; Wang, Xiaowen; Zhang, Yanjun; Mao, Danqing; Wang, Zhiying; Di, Ran; Zhao, Qianjun; Zhong, Tao; Yang, Huanming; Wang, Jian; Wang, Wen; Dong, Yang; Chen, Xiaoli; Xu, Xun; Li, Jinquan
2017-08-17
Compared with commercially available single nucleotide polymorphism (SNP) chips based on BeadChip technology, a solution hybrid selection (SHS)-based target enrichment SNP chip is not only flexible in design but also cost-effective for sequencing-based genotyping. In this study, we propose, for the first time, to design an animal SNP chip using the SHS-based target enrichment strategy. As an update to the international collaboration on goat research, a 66 K SNP chip for cashmere goat was created from the whole-genome sequencing data of 73 individuals. Verification of this 66 K SNP chip against the whole-genome sequencing data of 436 cashmere goats showed SNP call rates between 95.3% and 99.8%. The average sequencing depth for target SNPs was 40X. The capture regions were shown to span the 200 bp flanking the target SNPs. This chip was further tested in a genome-wide association analysis of cashmere fineness (fiber diameter). Several top hit loci were found to be marginally associated with signaling pathways involved in hair growth. These results demonstrate that the 66 K SNP chip is a useful tool in the genomic analyses of cashmere goats. The successful chip design shows that the SHS-based target enrichment strategy could be applied to SNP chip design in other species.
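To make the call-rate figures concrete, here is a minimal sketch in which the per-sample call rate is taken as the fraction of target SNPs that received a genotype call. This definition, the `None` encoding for a missing call, and the function names are assumptions for illustration, not details from the study.

```python
from typing import Optional, Sequence


def call_rate(calls: Sequence[Optional[str]]) -> float:
    """Fraction of target SNPs that received a genotype (None = no call)."""
    if not calls:
        raise ValueError("no target SNPs")
    return sum(call is not None for call in calls) / len(calls)


def call_rate_range(samples: dict[str, Sequence[Optional[str]]]) -> tuple[float, float]:
    """Minimum and maximum per-sample call rate across a panel of animals."""
    rates = [call_rate(calls) for calls in samples.values()]
    return min(rates), max(rates)
```

Applied to every verified goat, `call_rate_range` would return the kind of interval reported in the abstract.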
Oliva, M L; Santomauro-Vaz, E M; Andrade, S A; Juliano, M A; Pott, V J; Sampaio, M U; Sampaio, C A
2001-01-01
We have previously described Kunitz-type serine proteinase inhibitors purified from Bauhinia seeds. Human plasma kallikrein shows different susceptibility to those inhibitors. In this communication, we describe the interaction of human plasma kallikrein with fluorogenic and non-fluorogenic peptides based on the reactive sites of the Bauhinia inhibitors. Hydrolysis of the substrate based on the reactive-site sequence of the B. variegata inhibitor, Abz-VVISALPRSVFIQ-EDDnp (Km 1.42 µM, kcat 0.06 s^-1, kcat/Km 4.23 × 10^4 M^-1 s^-1), is more favorable than that of Abz-VMIAALPRTMFIQ-EDDnp, related to the B. ungulata sequence (Km 0.43 µM, kcat 0.00017 s^-1, kcat/Km 3.9 × 10^2 M^-1 s^-1). Human plasma kallikrein does not hydrolyze the substrates Abz-RPGLPVRFESPL-EDDnp and Abz-FESPLRINIIKE-EDDnp, which are based on the reactive-site sequence of the B. bauhinioides inhibitor, the most effective inhibitor of the enzyme. These peptides are competitive inhibitors with Ki values in the nM range. The synthetic peptide containing 19 amino acids based on the B. bauhinioides inhibitor reactive site (RPGLPVRFESPL) is poorly cleaved by kallikrein. The given substrates are highly specific for trypsin and chymotrypsin hydrolysis. Other serine proteinases such as factor Xa, factor XII, thrombin and plasmin do not hydrolyze B. bauhinioides inhibitor-related substrates.
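The reported catalytic efficiencies follow from kcat/Km once Km is converted from µM to M. The sketch below reproduces both values and shows that the B. variegata-based substrate is hydrolyzed roughly 100-fold more efficiently than the B. ungulata-based one. The function name is illustrative.

```python
def catalytic_efficiency(kcat_per_s: float, km_micromolar: float) -> float:
    """kcat/Km in M^-1 s^-1, with Km supplied in micromolar."""
    return kcat_per_s / (km_micromolar * 1e-6)


variegata = catalytic_efficiency(0.06, 1.42)    # B. variegata-based substrate, ~4.23e4
ungulata = catalytic_efficiency(0.00017, 0.43)  # B. ungulata-based substrate, ~3.95e2
print(f"{variegata:.3g} vs {ungulata:.3g} M^-1 s^-1; ratio {variegata / ungulata:.0f}")
```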
ERIC Educational Resources Information Center
Schlenker, Richard M.; Schlenker, Karl R.
2000-01-01
Presents a five-activity sequence designed to help students understand the effects of population doubling. Activities consider the effects of population doubling on human interactions, drinking water supplies, and food supply. Students also develop graphs of data and write research papers. (WRM)
ERIC Educational Resources Information Center
Hanley, Gregory P.; Piazza, Cathleen C.; Fisher, Wayne W.; Maglieri, Kristen A.
2005-01-01
The current study describes an assessment sequence that may be used to identify individualized, effective, and preferred interventions for severe problem behavior in lieu of relying on a restricted set of treatment options that are assumed to be in the best interest of consumers. The relative effectiveness of functional communication training…
Determination of a mutational spectrum
Thilly, William G.; Keohavong, Phouthone
1991-01-01
A method of resolving (physically separating) mutant DNA from nonmutant DNA and a method of defining or establishing a mutational spectrum or profile of alterations present in nucleic acid sequences from a sample to be analyzed, such as a tissue or body fluid. The present method is based on the fact that it is possible, through the use of denaturing gradient gel electrophoresis (DGGE), to separate nucleic acid sequences that differ by only a single base change, and on the ability to detect the separated mutant molecules. The present invention, in another aspect, relates to a method for determining a mutational spectrum in a DNA sequence of interest present in a population of cells. The method of the present invention is useful as a diagnostic or analytical tool in forensic science in assessing environmental and/or occupational exposures to potentially genetically toxic materials (also referred to as potential mutagens); in biotechnology, particularly in the study of the relationship between the amino acid sequence of enzymes and other biologically-active proteins or protein-containing substances and their respective functions; and in determining the effects of drugs, cosmetics and other chemicals for which toxicity data must be obtained.
SPMBR: a scalable algorithm for mining sequential patterns based on bitmaps
NASA Astrophysics Data System (ADS)
Xu, Xiwei; Zhang, Changhai
2013-12-01
Some current sequential pattern mining algorithms generate too many candidate sequences, which increases the processing cost of support counting. We therefore present an effective and scalable algorithm called SPMBR (Sequential Patterns Mining based on Bitmap Representation) for mining sequential patterns in large databases. Our method differs from previous work on sequential pattern mining mainly in that the sequence database is represented by bitmaps, for which a simplified bitmap structure is proposed. The algorithm first generates candidate sequences by SE (Sequence Extension) and IE (Item Extension), and then obtains all frequent sequences by comparing the original bitmap with the extended item bitmap. This simplifies the problem of mining sequential patterns and avoids the high processing cost of support counting. Both theory and experiments indicate that SPMBR performs better on large transaction databases, requires much less memory for storing temporary data during mining, and can feasibly mine all sequential patterns.
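The abstract does not give SPMBR's bitmap layout, so the following is a sketch of the closely related vertical-bitmap scheme used by SPAM, not SPMBR itself. Each item gets one bitmap per sequence (bit j set if the item occurs in itemset j). A sequence extension (SE) keeps only positions after a pattern's first match and ANDs the result with the item bitmap; an item extension (IE) ANDs the two bitmaps directly; support is the number of sequences with a non-zero result, so counting needs no rescan of the database. All names are hypothetical.

```python
Pattern = tuple[tuple[str, ...], ...]


def s_step(bits: int) -> int:
    """Keep only positions strictly after the first set bit. The result may be a
    negative int (infinite leading ones in Python); that is harmless because it is
    always ANDed with a non-negative item bitmap."""
    if bits == 0:
        return 0
    lowest = bits & -bits
    return ~((lowest << 1) - 1)


def mine_sequential_patterns(db: list[list[set[str]]], min_support: int) -> dict[Pattern, int]:
    # Vertical bitmaps: bitmaps[item][s] has bit j set if item is in itemset j of sequence s.
    items = sorted({item for seq in db for itemset in seq for item in itemset})
    bitmaps = {item: [0] * len(db) for item in items}
    for s, seq in enumerate(db):
        for j, itemset in enumerate(seq):
            for item in itemset:
                bitmaps[item][s] |= 1 << j

    def support(bm: list[int]) -> int:
        # Number of sequences that contain the pattern at least once.
        return sum(1 for bits in bm if bits)

    frequent = [item for item in items if support(bitmaps[item]) >= min_support]
    results: dict[Pattern, int] = {}

    def grow(pattern: Pattern, bm: list[int]) -> None:
        results[pattern] = support(bm)
        # Sequence extension: append a new itemset {item} after the pattern.
        shifted = [s_step(bits) for bits in bm]
        for item in frequent:
            new_bm = [a & b for a, b in zip(shifted, bitmaps[item])]
            if support(new_bm) >= min_support:
                grow(pattern + ((item,),), new_bm)
        # Item extension: add a lexicographically larger item to the last itemset.
        last = pattern[-1]
        for item in frequent:
            if item > last[-1]:
                new_bm = [a & b for a, b in zip(bm, bitmaps[item])]
                if support(new_bm) >= min_support:
                    grow(pattern[:-1] + (last + (item,),), new_bm)

    for item in frequent:
        grow(((item,),), bitmaps[item])
    return results
```

Restricting extensions to frequent single items is valid pruning because support can only fall as a pattern grows.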
Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri
2016-01-01
Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue–amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein–DNA complexes by means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs that can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774
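Empirical-potential interaction energies of this kind are typically sums of pairwise Lennard-Jones and Coulomb terms, with the dielectric constant scaling the electrostatics. The sketch below assumes a 12-6 Lennard-Jones form, Lorentz-Berthelot combining rules and a constant dielectric; the authors' force field and parameters are not given in the abstract, and the `Atom` record and constant value are illustrative.

```python
import math
from dataclasses import dataclass

# Coulomb constant in kcal*Angstrom/(mol*e^2); approximate and force-field dependent.
COULOMB_KCAL = 332.0636


@dataclass(frozen=True)
class Atom:
    x: float
    y: float
    z: float
    charge: float   # partial charge, units of e
    sigma: float    # Lennard-Jones radius parameter, Angstrom
    epsilon: float  # Lennard-Jones well depth, kcal/mol


def pair_energy(a: Atom, b: Atom, dielectric: float = 1.0) -> float:
    """12-6 Lennard-Jones plus dielectric-screened Coulomb energy (kcal/mol) for one pair."""
    r = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    sigma = 0.5 * (a.sigma + b.sigma)           # Lorentz rule
    epsilon = math.sqrt(a.epsilon * b.epsilon)  # Berthelot rule
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_KCAL * a.charge * b.charge / (dielectric * r)
    return lennard_jones + coulomb


def interaction_energy(group_a, group_b, dielectric: float = 1.0) -> float:
    """Sum of pair energies between two atom groups, e.g. a base and a side chain."""
    return sum(pair_energy(a, b, dielectric) for a in group_a for b in group_b)
```

Re-evaluating `interaction_energy` at several `dielectric` values is one simple way to probe how robust a base-side chain preference is to the environment.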
Conceptual issues in Bayesian divergence time estimation
2016-01-01
Bayesian inference of species divergence times is an unusual statistical problem, because the divergence time parameters are not identifiable unless both fossil calibrations and sequence data are available. Commonly used marginal priors on divergence times derived from fossil calibrations may conflict with node order on the phylogenetic tree causing a change in the prior on divergence times for a particular topology. Care should be taken to avoid confusing this effect with changes due to informative sequence data. This effect is illustrated with examples. A topology-consistent prior that preserves the marginal priors is defined and examples are constructed. Conflicts between fossil calibrations and relative branch lengths (based on sequence data) can cause estimates of divergence times that are grossly incorrect, yet have a narrow posterior distribution. An example of this effect is given; it is recommended that overly narrow posterior distributions of divergence times should be carefully scrutinized. This article is part of the themed issue ‘Dating species divergences using rocks and clocks’. PMID:27325831
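A minimal way to see the identifiability problem, assuming a strict molecular clock (a simplification not taken from the article): the sequence likelihood depends on the rate r and time t only through their product, so rescaling one against the other leaves it unchanged, and only the calibration prior on t separates them.

```latex
% Strict clock: expected substitutions per site on a branch, b = r t
\ell(r, t \mid D) = f(D \mid b = r\,t)
\;\Rightarrow\;
\ell(c\,r,\; t/c \mid D) = \ell(r, t \mid D) \quad \text{for all } c > 0
% Sequence data D constrain only the product r t; the fossil-calibration
% prior p(t) is what separates t from r in the posterior:
p(r, t \mid D) \propto f(D \mid r\,t)\, p(r)\, p(t)
```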
Conceptual issues in Bayesian divergence time estimation.
Rannala, Bruce
2016-07-19
Bayesian inference of species divergence times is an unusual statistical problem, because the divergence time parameters are not identifiable unless both fossil calibrations and sequence data are available. Commonly used marginal priors on divergence times derived from fossil calibrations may conflict with node order on the phylogenetic tree causing a change in the prior on divergence times for a particular topology. Care should be taken to avoid confusing this effect with changes due to informative sequence data. This effect is illustrated with examples. A topology-consistent prior that preserves the marginal priors is defined and examples are constructed. Conflicts between fossil calibrations and relative branch lengths (based on sequence data) can cause estimates of divergence times that are grossly incorrect, yet have a narrow posterior distribution. An example of this effect is given; it is recommended that overly narrow posterior distributions of divergence times should be carefully scrutinized. This article is part of the themed issue 'Dating species divergences using rocks and clocks'. © 2016 The Author(s).
Extra projection data identification method for fast-continuous-rotation industrial cone-beam CT.
Yang, Min; Duan, Shengling; Duan, Jinghui; Wang, Xiaolong; Li, Xingdong; Meng, Fanyong; Zhang, Jianhai
2013-01-01
Fast continuous rotation is an effective way to increase scanning speed and decrease radiation dose in cone-beam CT. However, because of the acceleration and deceleration of the motor, as well as the response lag of the scanning control terminals to the host PC, unevenly distributed and redundant projections are inevitably created, which seriously degrade the quality of the reconstructed images. In this paper, we first analyzed the theoretical sequence chart of the fast-continuous-rotation mode. Then, an optimized sequence chart was proposed that extends the rotation angle span to ensure that the effective 2π-span projections lie within the stable rotation stage. To match the rotation angle to the projection image accurately, the structural similarity (SSIM) index was used as a control parameter for extracting the effective projection sequence, which constitutes exactly the complete projection data for image reconstruction. The experimental results showed that the SSIM-based method located projection views with high accuracy and was easy to implement.
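The SSIM index used for projection matching can be sketched as follows. This is the single-window (global) form of SSIM with the usual constants K1 = 0.01 and K2 = 0.03, applied to flattened projection images; the standard index averages the same expression over local windows, and the paper's exact windowing and matching rule are not given in the abstract. Picking the candidate frame with the highest SSIM against a reference projection is an assumed, simplified use.

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-sized images given as flat sequences
    of pixel intensities (population variances and covariance)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def best_matching_index(reference, candidates):
    """Index of the candidate projection most similar (highest SSIM) to the reference."""
    scores = [global_ssim(reference, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```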
Enantiospecific recognition of DNA sequences by a proflavine Tröger base.
Bailly, C; Laine, W; Demeunynck, M; Lhomme, J
2000-07-05
The DNA interaction of a chiral Tröger base derived from proflavine was investigated by DNA melting temperature measurements and complementary biochemical assays. DNase I footprinting experiments demonstrate that the binding of the proflavine-based Tröger base is both enantio- and sequence-specific. The (+)-isomer interacts poorly with DNA, in a non-sequence-selective fashion. In sharp contrast, the corresponding (-)-isomer preferentially recognizes certain DNA sequences containing both A·T and G·C base pairs, such as the motifs 5'-GTT·AAC and 5'-ATGA·TCAT. This is the first experimental demonstration that acridine-type Tröger bases can be used for enantiospecific recognition of DNA sequences. Copyright 2000 Academic Press.
Li, Guotian; Jain, Rashmi; Chern, Mawsheng; Pham, Nikki T; Martin, Joel A; Wei, Tong; Schackwitz, Wendy S; Lipzen, Anna M; Duong, Phat Q; Jones, Kyle C; Jiang, Liangrong; Ruan, Deling; Bauer, Diane; Peng, Yi; Barry, Kerrie W; Schmutz, Jeremy; Ronald, Pamela C
2017-06-01
The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake (Oryza sativa ssp. japonica), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. This work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations. © 2017 American Society of Plant Biologists. All rights reserved.
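The summary statistics can be cross-checked with simple arithmetic: 91,513 mutations over 1504 lines gives the reported mean of about 61 per line. The bracket on the total gene count is an inference from the rounded 58% figure, not a number reported in the abstract.

```python
mutations = 91_513
mutant_lines = 1_504
genes_hit = 32_307
reported_fraction = 0.58  # "58% of all rice genes", rounded in the abstract

mean_per_line = mutations / mutant_lines            # about 60.8
implied_genes = genes_hit / reported_fraction       # central estimate
# Because 58% is rounded, it only brackets the implied gene count.
implied_genes_low = genes_hit / 0.585
implied_genes_high = genes_hit / 0.575

print(f"mean mutations per line: {mean_per_line:.1f}")
print(f"implied annotated genes: ~{implied_genes:,.0f} "
      f"({implied_genes_low:,.0f} to {implied_genes_high:,.0f})")
```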
2014-01-01
Background: The advent of the human genome sequencing project has led to a surge in the number of protein sequences in the databanks. The success of structure-based drug discovery hinges critically on the availability of structures. Despite significant progress in experimental protein structure determination, the sequence-structure gap continues to widen. Data-driven, homology-based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarity. As the similarity of query sequences dwindles, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures, with advancing drug design as one of its goals. Results: The Bhageerath-H web server was validated on 75 CASP10 targets, yielding TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5 Å from the native structure in 58% of the targets, which is well above the CASP10 watermark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring the best decoys and refining low-resolution models to high and medium resolution. Conclusion: The Bhageerath-H methodology is available to the scientific community as a freely accessible web server. The methodology is fielded in the ongoing CASP11 experiment. PMID:25521245
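For reference, the two accuracy measures quoted in the Results can be computed from per-residue Cα distances between a model and the native structure after superposition. The sketch uses the standard TM-score distance scale d0 = 1.24(L - 15)^(1/3) - 1.8, floored at 0.5. It assumes the superposition has already been done, whereas the TM-score program also searches for the superposition that maximizes the score; the function names are illustrative.

```python
import math


def tm_d0(target_length: int) -> float:
    """TM-score distance scale d0 (Angstrom), floored at 0.5 for short chains."""
    if target_length <= 15:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1 / 3) - 1.8, 0.5)


def tm_score(distances, target_length: int) -> float:
    """TM-score from Calpha distances of aligned residues after superposition.
    Unaligned residues simply contribute nothing to the sum."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances) -> float:
    """Root-mean-square deviation over the same per-residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

Unlike RMSD, the TM-score saturates for badly placed residues, which is why the 0.5 threshold is commonly read as "same overall fold".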
Sanger sequencing as a first-line approach for molecular diagnosis of Andersen-Tawil syndrome.
Totomoch-Serra, Armando; Marquez, Manlio F; Cervantes-Barragán, David E
2017-01-01
In 1977, Frederick Sanger developed a new method for DNA sequencing based on chain termination, now known as the Sanger sequencing method (SSM). Recently, massively parallel sequencing, better known as next-generation sequencing (NGS), has been replacing the SSM for detecting mutations in cardiovascular diseases with a genetic background. This opinion article argues that "targeted" SSM is still effective as a first-line approach for the molecular diagnosis of some specific conditions, as is the case for Andersen-Tawil syndrome (ATS). ATS is described as a rare multisystemic autosomal dominant channelopathy syndrome caused mainly by a heterozygous mutation in the KCNJ2 gene. KCNJ2 has particular characteristics that make it attractive for "directed" SSM: it has a sequence of 17,510 base pairs (bp) and a short coding region with two exons (exon 1=166 bp and exon 2=5220 bp); half of the mutations are located in the C-terminal cytosolic domain; a mutational hotspot has been described at residue Arg218; and this gene explains the phenotype in 60% of ATS cases that fulfill all the clinical criteria of the disease. To improve the diagnosis of ATS, we urge cardiologists to search for facial and muscular abnormalities in subjects with frequent ventricular arrhythmias (especially bigeminy) and prominent U waves on the electrocardiogram.
Sanger sequencing as a first-line approach for molecular diagnosis of Andersen-Tawil syndrome
Totomoch-Serra, Armando; Marquez, Manlio F.; Cervantes-Barragán, David E.
2017-01-01
In 1977, Frederick Sanger developed a new method for DNA sequencing based on chain termination, now known as the Sanger sequencing method (SSM). Recently, massively parallel sequencing, better known as next-generation sequencing (NGS), has been replacing the SSM for detecting mutations in cardiovascular diseases with a genetic background. This opinion article argues that “targeted” SSM is still effective as a first-line approach for the molecular diagnosis of some specific conditions, as is the case for Andersen-Tawil syndrome (ATS). ATS is described as a rare multisystemic autosomal dominant channelopathy syndrome caused mainly by a heterozygous mutation in the KCNJ2 gene. KCNJ2 has particular characteristics that make it attractive for “directed” SSM: it has a sequence of 17,510 base pairs (bp) and a short coding region with two exons (exon 1=166 bp and exon 2=5220 bp); half of the mutations are located in the C-terminal cytosolic domain; a mutational hotspot has been described at residue Arg218; and this gene explains the phenotype in 60% of ATS cases that fulfill all the clinical criteria of the disease. To improve the diagnosis of ATS, we urge cardiologists to search for facial and muscular abnormalities in subjects with frequent ventricular arrhythmias (especially bigeminy) and prominent U waves on the electrocardiogram. PMID:29093808
Li, Guotian; Jain, Rashmi; Chern, Mawsheng; ...
2017-06-02
The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake (Oryza sativa ssp japonica), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. In conclusion, this work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations.
Ryberg, Martin; Kristiansson, Erik; Sjökvist, Elisabet; Nilsson, R Henrik
2009-01-01
The environmental and distributional data associated with fungal internal transcribed spacer (ITS) sequences in GenBank are investigated and a new web-based tool with which these sequences can be explored is introduced. All fungal ITS sequences in GenBank were classified as either identified to species level or insufficiently identified and compared using BLAST. The results are made available as a biweekly updated web service that can be queried to retrieve all insufficiently identified sequences (IIS) associated with any fungal genus. The most commonly available annotation items in GenBank are isolation source (55%); country of origin (50%); and specific host (38%). The molecular sampling of fungi shows a bias towards North America, Europe, China, and Japan whereas vast geographical areas remain effectively unexplored. Mycorrhizal and parasitic genera are on average associated with more IIS than are saprophytic taxa. Glomus, Alternaria, and Tomentella are the genera represented by the highest number of insufficiently identified ITS sequences in GenBank. The web service presented (http://andromeda.botany.gu.se/emerencia.html#genus_search) offers new means, particularly for mycorrhizal and plant pathogenic fungi, to examine the IIS in GenBank in a taxon-oriented framework and to explore their metadata in an easily accessible and time-efficient manner.
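The split into sequences identified to species level and insufficiently identified sequences (IIS) can be approximated from the GenBank organism name. The rules below (treating names containing `sp.`, `cf.` or `aff.`, names with an `uncultured`/`unidentified` prefix, and names lacking a species epithet as insufficiently identified) are assumed for illustration and are not the criteria used by the authors.

```python
from collections import Counter

# Assumed markers of an unresolved name; not the authors' criteria.
OPEN_NOMENCLATURE = {"sp.", "sp", "spp.", "cf.", "aff.", "nr."}
VAGUE_PREFIXES = {"uncultured", "unidentified", "fungal", "fungus"}


def is_identified_to_species(name: str) -> bool:
    """Heuristic: a capitalized genus followed by a lowercase species epithet."""
    tokens = name.split()
    if len(tokens) < 2:
        return False
    if any(t.lower() in OPEN_NOMENCLATURE for t in tokens):
        return False
    if tokens[0].lower() in VAGUE_PREFIXES:
        return False
    genus, epithet = tokens[0], tokens[1]
    return genus[:1].isupper() and genus[1:].islower() and epithet.isalpha() and epithet.islower()


def insufficient_by_genus(names):
    """Count insufficiently identified names per genus (first capitalized token)."""
    counts = Counter()
    for name in names:
        if is_identified_to_species(name):
            continue
        genus = next((t for t in name.split() if t[:1].isupper()), None)
        if genus is not None:
            counts[genus] += 1
    return counts
```

A per-genus count like this is the kind of index a taxon-oriented query ("all IIS for genus X") would be built on.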
Ramirez, Agnese; Crisafulli, Sebastiano G.; Rizzuti, Mafalda; Bresolin, Nereo; Comi, Giacomo P.; Corti, Stefania
2018-01-01
Spinal muscular atrophy (SMA) is an autosomal-recessive childhood motor neuron disease and the main genetic cause of infant mortality. SMA is caused by deletions or mutations in the survival motor neuron 1 (SMN1) gene, which result in SMN protein deficiency. Only one approved drug has recently become available; it allows the correction of aberrant splicing of the paralogous SMN2 gene by antisense oligonucleotides (ASOs), leading to production of full-length SMN protein. We have already demonstrated that a sequence of an ASO variant, Morpholino (MO), is particularly suitable because of its safety and efficacy profile and is able both to increase SMN levels and to rescue the murine SMA phenotype. Here, we optimized this strategy by testing the efficacy of four new MO sequences targeting SMN2. Two of the four new MO sequences showed better efficacy in terms of SMN protein production both in SMA induced pluripotent stem cells (iPSCs) and in SMAΔ7 mice. Further, the effect was enhanced when different MO sequences were administered in combination. Our data provide important insight into MO-based treatment for SMA. Optimization of the target sequence and validation of a treatment based on a combination of different MO sequences could support further pre-clinical studies and the progression toward future clinical trials. PMID:29316633
Ramirez, Agnese; Crisafulli, Sebastiano G; Rizzuti, Mafalda; Bresolin, Nereo; Comi, Giacomo P; Corti, Stefania; Nizzardo, Monica
2018-01-06
Spinal muscular atrophy (SMA) is an autosomal-recessive childhood motor neuron disease and the main genetic cause of infant mortality. SMA is caused by deletions or mutations in the survival motor neuron 1 (SMN1) gene, which result in SMN protein deficiency. Only one approved drug has recently become available; it allows the correction of aberrant splicing of the paralogous SMN2 gene by antisense oligonucleotides (ASOs), leading to production of full-length SMN protein. We have already demonstrated that a sequence of an ASO variant, Morpholino (MO), is particularly suitable because of its safety and efficacy profile and is able both to increase SMN levels and to rescue the murine SMA phenotype. Here, we optimized this strategy by testing the efficacy of four new MO sequences targeting SMN2. Two of the four new MO sequences showed better efficacy in terms of SMN protein production both in SMA induced pluripotent stem cells (iPSCs) and in SMAΔ7 mice. Further, the effect was enhanced when different MO sequences were administered in combination. Our data provide important insight into MO-based treatment for SMA. Optimization of the target sequence and validation of a treatment based on a combination of different MO sequences could support further pre-clinical studies and the progression toward future clinical trials.
Qiao, Jun-Qin; Liang, Chao; Wei, Lan-Chun; Cao, Zhao-Ming; Lian, Hong-Zhen
2016-12-01
Studies of nucleic acid retention in ion-pair reversed-phase high-performance liquid chromatography have mainly focused on size dependence; other factors influencing retention behavior have not yet been comprehensively clarified. In the present work, the retention behaviors of oligonucleotides and double-stranded DNAs were investigated on a silica-based C18 stationary phase by ion-pair reversed-phase high-performance liquid chromatography. We found that the retention of oligonucleotides was influenced by base composition and base sequence as well as size, and that oligonucleotides prone to self-dimerization are retained more weakly than those with the same base composition that are not prone to self-dimerization. However, homo-oligonucleotides are suitable for size-dependent separation as a special case of oligonucleotides. For double-stranded DNAs, retention is also influenced by base composition and base sequence, as well as size. This may be attributed to the interaction of bases exposed in the major or minor grooves with the hydrophobic alkyl chains of the stationary phase. In addition, no specific influence of guanine and cytosine content on the retention of double-stranded DNAs was confirmed. Notably, the steric effect resulting from the three-dimensional structure of nucleic acids also influences retention behavior in ion-pair reversed-phase high-performance liquid chromatography. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Ni, Guiyan; Cavero, David; Fangmann, Anna; Erbe, Malena; Simianer, Henner
2017-01-16
With the availability of next-generation sequencing technologies, genomic prediction based on whole-genome sequencing (WGS) data is now feasible in animal breeding schemes and was expected to lead to higher predictive ability, since such data may contain all genomic variants including causal mutations. Our objective was to compare prediction ability with high-density (HD) array data and WGS data in a commercial brown layer line with genomic best linear unbiased prediction (GBLUP) models using various approaches to weight single nucleotide polymorphisms (SNPs). A total of 892 chickens from a commercial brown layer line were genotyped with 336 K segregating SNPs (array data) that included 157 K genic SNPs (i.e. SNPs in or around a gene). For these individuals, genome-wide sequence information was imputed based on data from re-sequencing runs of 25 individuals, leading to 5.2 million (M) imputed SNPs (WGS data), including 2.6 M genic SNPs. De-regressed proofs (DRP) for eggshell strength, feed intake and laying rate were used as quasi-phenotypic data in genomic prediction analyses. Four weighting factors for building a trait-specific genomic relationship matrix were investigated: identical weights, -log10(P) from genome-wide association study results, squares of SNP effects from random regression BLUP, and variable selection based weights (known as BLUP|GA). Predictive ability was measured as the correlation between DRP and direct genomic breeding values in five replications of a fivefold cross-validation. Averaged over the three traits, the highest predictive ability (0.366 ± 0.075) was obtained when only genic SNPs from WGS data were used. Predictive abilities with genic SNPs and all SNPs from HD array data were 0.361 ± 0.072 and 0.353 ± 0.074, respectively. 
Prediction with -log10(P) or squares of SNP effects as weighting factors for building a genomic relationship matrix or BLUP|GA did not increase accuracy, compared to that with identical weights, regardless of the SNP set used. Our results show that little or no benefit was gained when using all imputed WGS data to perform genomic prediction compared to using HD array data regardless of the weighting factors tested. However, using only genic SNPs from WGS data had a positive effect on prediction ability.
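One common formulation of a trait-specific genomic relationship matrix scales a SNP-weighted cross-product of centered genotypes, G = Z D Z' / sum_j d_j 2 p_j (1 - p_j), where D holds the SNP weights; identical weights give the usual VanRaden matrix. The sketch below assumes this scaling, which may differ from the study's, and computes predictive ability as the Pearson correlation between DRP and direct genomic breeding values. Solving the GBLUP mixed-model equations is omitted, and the function names are illustrative.

```python
import math
from statistics import fmean


def allele_freqs(genotypes):
    """Frequency of the counted allele per SNP; genotypes coded 0/1/2 per individual."""
    n = len(genotypes)
    return [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(len(genotypes[0]))]


def weighted_grm(genotypes, weights=None):
    """G = Z D Z' / sum_j d_j * 2 p_j (1 - p_j); identical weights give VanRaden's G."""
    p = allele_freqs(genotypes)
    m = len(p)
    d = [1.0] * m if weights is None else list(weights)
    z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genotypes]  # centered genotypes
    scale = sum(d[j] * 2 * p[j] * (1 - p[j]) for j in range(m))
    if scale == 0:
        raise ValueError("no polymorphic SNP carries non-zero weight")
    n = len(genotypes)
    return [[sum(d[j] * z[i][j] * z[k][j] for j in range(m)) / scale for k in range(n)]
            for i in range(n)]


def pearson(x, y):
    """Predictive ability: correlation between DRP and predicted breeding values."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Restricting the genotype columns to genic SNPs before calling `weighted_grm` corresponds to the genic-only SNP sets compared in the study.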
Liu, Yanli; Huangfu, Jie; Qi, Feng; Kaleem, Imdad; E, Wenwen; Li, Chun
2012-01-01
We cloned the β-glucuronidase gene (AtGUS) from Aspergillus terreus Li-20, which encodes 657 amino acids (aa) and can transform glycyrrhizin into glycyrrhetinic acid monoglucuronide (GAMG) and glycyrrhetinic acid (GA). Sequence alignment showed that the C-terminal non-conservative sequence had low identity with those of other species; thus, the partial sequence AtGUS(-3t) (1–592 aa) was amplified to determine the effects of the non-conservative sequence on the enzymatic properties. AtGUS and AtGUS(-3t) were expressed in E. coli BL21, producing AtGUS-E and AtGUS(-3t)-E, respectively. Under similar optimum temperature (55°C) and pH (AtGUS-E, 6.6; AtGUS(-3t)-E, 7.0) conditions, AtGUS(-3t)-E showed enhanced thermal stability at 65°C, and the metal ions Co2+, Ca2+ and Ni2+ had opposite effects on AtGUS-E and AtGUS(-3t)-E. Furthermore, the Km of AtGUS(-3t)-E (1.95 mM) was roughly one-seventh that of AtGUS-E (12.9 mM), whereas the catalytic efficiency of AtGUS(-3t)-E was 3.2-fold higher than that of AtGUS-E (7.16 vs. 2.24 mM−1 s−1), revealing that truncation of the non-conservative sequence can significantly improve the catalytic efficiency of AtGUS. Conformational analysis by circular dichroism (CD) revealed a significant difference in secondary structure between AtGUS-E and AtGUS(-3t)-E. The results show that truncation of the non-conservative sequence can favorably alter the stability and catalytic efficiency of the enzyme. PMID:22347419
EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.
Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan
2018-01-01
Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.
Chu, Brian C; Colognori, Daniela B; Yang, Guang; Xie, Min-ge; Lindsey Bergman, R; Piacentini, John
2015-05-01
Behavioral engagement and cognitive coping have been hypothesized to mediate the effectiveness of exposure-based therapies. Identifying which specific child factors mediate successful therapy and which therapist factors facilitate change can help make our evidence-based treatments more efficient and robust. The current study examines the specificity and temporal sequence of relations among hypothesized client and therapist mediators in exposure therapy for pediatric Obsessive Compulsive Disorder (OCD). Youth coping (cognitive, behavioral), youth safety behaviors (avoidance, escape, compulsive behaviors), therapist interventions (cognitive, exposure extensiveness), and youth anxiety were rated via observational ratings of therapy sessions of OCD youth (N=43; ages=8-17; 62.8% male) who had received Exposure and Response Prevention (ERP). Regression analysis using Generalized Estimating Equations and cross-lagged panel analysis (CLPA) were conducted to model anxiety change within and across sessions, to determine formal mediators of anxiety change, and to establish the sequence of effects. Anxiety ratings decreased linearly across exposures within sessions. Youth coping and therapist interventions significantly mediated anxiety change across exposures, and youth-interfering behavior mediated anxiety change at the trend level. In CLPA, youth-interfering behaviors predicted, and were predicted by, changes in anxiety. Youth coping was predicted by prior anxiety change. The study provides a preliminary examination of specificity and temporal sequence among child and therapist behaviors in predicting youth anxiety. Results suggest that therapists should educate clients about the natural rebound effects of anxiety between sessions and should be aware of the negatively reinforcing properties of avoidance during exposure. Copyright © 2015. Published by Elsevier Ltd.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis
Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab
2012-01-01
RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate DNA sequences, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biologists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases and ATGC base contents with their respective percentages), and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation, and it provides a user-friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html Abbreviations: RDNAnalyzer, Random DNA Analyser; GUI, graphical user interface; XAML, Extensible Application Markup Language. PMID:23055611
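The Nussinov algorithm that RDNAnalyzer builds on can be sketched as a dynamic program that maximises the number of nested base pairs. The version below is the textbook base recursion only, not RDNAnalyzer's extension of it. It assumes that only Watson-Crick A-T and G-C pairs are allowed and that every hairpin encloses at least `min_loop` unpaired bases; a reverse-complement helper is included because the tool also offers that operation. Function names are hypothetical.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}  # assumed allowed pairs

def nussinov(seq, min_loop=3):
    """Maximise nested base pairs; returns (pair_count, sorted (i, j) pairs)."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]            # N[i][j]: best score for seq[i..j]
    for span in range(min_loop + 1, n):        # shorter spans cannot pair
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]                 # case 1: base j stays unpaired
            for k in range(i, j - min_loop):   # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + N[k + 1][j - 1])
            N[i][j] = best

    pairs = []

    def trace(i, j):
        """Recover one optimal structure by retracing the recursion."""
        if j - i <= min_loop:
            return
        if N[i][j] == N[i][j - 1]:
            trace(i, j - 1)
            return
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = N[i][k - 1] if k > i else 0
                if N[i][j] == left + 1 + N[k + 1][j - 1]:
                    pairs.append((k, j))
                    trace(i, k - 1)
                    trace(k + 1, j - 1)
                    return

    if n:
        trace(0, n - 1)
    return (N[0][n - 1] if n else 0), sorted(pairs)

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq.upper()))
```

For example, `nussinov("GGGAAATCC")` finds two G-C pairs; the central A-T pair is excluded because it would close a loop shorter than `min_loop`. Real secondary-structure tools typically replace pair counting with nearest-neighbour free energies.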
Worley, K C; Wiese, B A; Smith, R F
1995-09-01
BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. 
BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL: http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html).
Save the last dance for me: unwanted serial position effects in jury evaluations.
Bruine de Bruin, Wändi
2005-03-01
Whenever competing options are considered in sequence, their evaluations may be affected by order of appearance. Such serial position effects would threaten the fairness of competitions using jury evaluations. Randomization cannot reduce potential order effects, but it does give candidates an equal chance of being assigned to preferred serial positions. Whether, or what, serial position effects emerge may depend on the cognitive demands of the judgment task. In end-of-sequence procedures, final scores are not given until all candidates have performed, possibly burdening judges' memory. If judges' evaluations are based on how well they remember performances, serial position effects may resemble those found with free recall. Candidates may also be evaluated step-by-step, immediately after each performance. This procedure should not burden memory, though it may produce different serial position effects. Yet, this paper reports similar serial position effects with end-of-sequence and step-by-step procedures used for the Eurovision Song Contest: Ratings increased with serial position. The linear order effect was replicated in the step-by-step judgments of World and European Figure Skating Contests. It is proposed that, independent of the evaluation procedure, judges' initial impressions of sequentially appearing candidates may be formed step-by-step, yielding serial position effects.
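A linear order effect like the one reported here can be summarised as the least-squares slope of rating on serial position, where a positive slope means that later candidates tend to receive higher ratings. The sketch below computes that slope in plain Python. The ratings in the usage note are invented for illustration, and this is not necessarily the exact statistic used in the paper.

```python
def linear_trend(positions, ratings):
    """Ordinary least-squares slope and intercept of rating on serial position."""
    n = len(positions)
    mean_x = sum(positions) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in positions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, ratings))
    slope = sxy / sxx                      # change in rating per position
    return slope, mean_y - slope * mean_x  # (slope, intercept)
```

With hypothetical ratings `[2, 3, 5, 4, 6]` for positions 1 to 5, `linear_trend` returns a slope of 0.9 (intercept about 1.3), i.e. ratings rise with serial position.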
Pure Perceptual-Based Sequence Learning: A Role for Visuospatial Attention
ERIC Educational Resources Information Center
Remillard, Gilbert
2009-01-01
Learning the structure of a sequence of target locations when target location is not the response dimension and the sequence of target locations is uncorrelated with the sequence of responses is called pure perceptual-based sequence learning. The paradigm introduced by G. Remillard (2003) was used to determine whether orienting of visuospatial…
A rule of seven in Watson-Crick base-pairing of mismatched sequences.
Cisse, Ibrahim I; Kim, Hajin; Ha, Taekjip
2012-05-13
Sequence recognition through base-pairing is essential for DNA repair and gene regulation, but the basic rules governing this process remain elusive. In particular, the kinetics of annealing between two imperfectly matched strands is not well characterized, despite its potential importance in nucleic acid-based biotechnologies and gene silencing. Here we use single-molecule fluorescence to visualize the multiple annealing and melting reactions of two untethered strands inside a porous vesicle, allowing us to precisely quantify the annealing and melting rates. The data as a function of mismatch position suggest that seven contiguous base pairs are needed for rapid annealing of DNA and RNA. This phenomenological rule of seven may underlie the requirement for seven nucleotides of complementarity to seed gene silencing by small noncoding RNA and may help guide performance improvement in DNA- and RNA-based bio- and nanotechnologies, in which off-target effects can be detrimental.
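The phenomenological rule of seven can be expressed as a simple check on the longest stretch of contiguous Watson-Crick pairs left between mismatches. The sketch below encodes only that threshold; it deliberately ignores mismatch type, G·C versus A·T content and salt conditions, and the function names and 0-based mismatch indexing are assumptions made for illustration.

```python
def longest_contiguous_match(mismatch_positions, length):
    """Longest run of consecutive paired positions in a duplex of `length`
    positions, given the 0-based positions of mismatches."""
    bad = set(mismatch_positions)
    best = run = 0
    for pos in range(length):
        run = 0 if pos in bad else run + 1    # a mismatch breaks the run
        best = max(best, run)
    return best

def predicted_fast_annealing(mismatch_positions, length, threshold=7):
    """Rule-of-seven heuristic: rapid annealing needs >= threshold contiguous pairs."""
    return longest_contiguous_match(mismatch_positions, length) >= threshold
```

For a 12-position duplex, a mismatch at position 5 leaves runs of 5 and 6 pairs, so the rule predicts slow annealing; moving the mismatch to position 8 leaves a run of 8 and the rule predicts rapid annealing.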
Analysis of DNA Sequences by an Optical Time-Integrating Correlator: Proof-of-Concept Experiments.
1992-05-01
Contents (excerpt): DNA Analysis Strategy (2.1 Representation of DNA Bases; 2.2 DNA Analysis Strategy); 3.0 Custom Generators for DNA Sequences (3.1 Hardware Design) … Figures: … of the DNA bases, where each base is represented by a 7-bit-long pseudorandom sequence; Figure 4: Coarse analysis of a DNA sequence; Figure 5: Fine … a 20-base-long database. Table 1: Short representations of the DNA bases, where each base is represented by a 7-bit-long …
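The excerpt indicates that each base is represented by a 7-bit pseudorandom code and that sequences are located by correlation. A digital analogue can be sketched as follows. The specific codes are an assumption, since the report's actual codes are not given here: each base is assigned a different cyclic shift of the length-7 maximal-length sequence 1110010, mapped to ±1 chips. With these codes, a matching base contributes +7 to the correlation and a mismatching base contributes −1. This sketch does not model the optical time-integrating hardware, and all names are hypothetical.

```python
import numpy as np

# Hypothetical 7-chip codes: cyclic shifts of the m-sequence 1110010 (0 -> -1, 1 -> +1)
_M_SEQ = np.array([1, 1, 1, 0, 0, 1, 0]) * 2 - 1
CODES = {base: np.roll(_M_SEQ, shift) for base, shift in zip("ACGT", range(4))}

def encode(seq):
    """Concatenate the 7-chip code of every base."""
    return np.concatenate([CODES[b] for b in seq])

def correlate(database, query):
    """Correlation of the encoded query against the encoded database at every chip lag."""
    return np.correlate(encode(database), encode(query), mode="valid")

def best_match(database, query):
    """Base offset and score of the correlation peak over base-aligned lags."""
    aligned = correlate(database, query)[::7]   # lags that are multiples of 7 chips
    offset = int(np.argmax(aligned))
    return offset, int(aligned[offset])
```

At a base-aligned lag with m matching bases out of L, the score is 7m − (L − m), so an exact match of a 4-base query peaks at 28. For example, `best_match("ACGTTGCAAG", "TGCA")` returns `(4, 28)`.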
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics
Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf
2015-01-01
Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis, based on simultaneous alignment and folding, suffers from an extreme time complexity of O(n⁶). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465
The rapid evolution of molecular genetic diagnostics in neuromuscular diseases.
Volk, Alexander E; Kubisch, Christian
2017-10-01
The development of massively parallel sequencing (MPS) has revolutionized molecular genetic diagnostics in monogenic disorders. The present review gives a brief overview of different MPS-based approaches used in clinical diagnostics of neuromuscular disorders (NMDs) and highlights their advantages and limitations. MPS-based approaches like gene panel sequencing, (whole) exome sequencing, (whole) genome sequencing, and RNA sequencing have been used to identify the genetic cause in NMDs. Although gene panel sequencing has evolved into a standard test for heterogeneous diseases, it is still debated, mainly because of financial issues and unsolved problems of variant interpretation, whether genome sequencing (and, to a lesser extent, exome sequencing) of single patients can already be regarded as routine diagnostics. However, it has been shown that including parents and additional family members often substantially increases the diagnostic yield of exome-wide/genome-wide MPS approaches. In addition, MPS-based RNA sequencing is just entering the research and diagnostic scene. Next-generation sequencing increasingly enables the detection of the genetic cause in highly heterogeneous diseases like NMDs in an efficient and affordable way. Gene panel sequencing and family-based exome sequencing have proven to be potent and cost-efficient diagnostic tools. Although clinical validation and interpretation of genome sequencing is still challenging, diagnostic RNA sequencing represents a promising tool to bypass some hurdles of diagnostics using genomic DNA.
Re-evaluating microglia expression profiles using RiboTag and cell isolation strategies.
Haimon, Zhana; Volaski, Alon; Orthgiess, Johannes; Boura-Halfon, Sigalit; Varol, Diana; Shemer, Anat; Yona, Simon; Zuckerman, Binyamin; David, Eyal; Chappell-Maor, Louise; Bechmann, Ingo; Gericke, Martin; Ulitsky, Igor; Jung, Steffen
2018-06-01
Transcriptome profiling is widely used to infer functional states of specific cell types, as well as their responses to stimuli, to define contributions to physiology and pathophysiology. Focusing on microglia, the brain's macrophages, we report here a side-by-side comparison of classical cell-sorting-based transcriptome sequencing and the 'RiboTag' method, which avoids cell retrieval from tissue context and yields translatome sequencing information. Conventional whole-cell microglial transcriptomes were found to be significantly tainted by artifacts introduced by tissue dissociation, cargo contamination and transcripts sequestered from ribosomes. Conversely, our data highlight the added value of RiboTag profiling for assessing the lineage accuracy of Cre recombinase expression in transgenic mice. Collectively, this study indicates method-based biases, reveals observer effects and establishes RiboTag-based translatome profiling as a valuable complement to standard sorting-based profiling strategies.
Leray, Matthieu; Knowlton, Nancy
2017-01-01
DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of the variation in OTU presence-absence, whereas biases associated with indexed PCRs accounted for a larger amount of variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to the low reproducibility of rare OTUs. We suggest that the strategy for handling rare OTUs should depend on the objectives of the study. 
Systematic removal of rare OTUs may avoid inflating diversity based on common β descriptors but will exclude positive records of taxa that are functionally important. Our results further reinforce the need for technical replicates (parallel PCR and sequencing from the same sample) in metabarcoding experimental designs. Data reproducibility should be determined empirically as it will depend upon the sequencing depth, the type of sample, the sequence analysis pipeline, and the number of replicates. Moreover, estimating relative biomasses or abundances based on read counts remains elusive at the OTU level.
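The replicate-based reasoning above can be turned into a simple filtering rule. The sketch below drops an OTU only when it is both rare (fewer than `min_reads` reads in total) and inconsistent (detected in fewer than `min_replicates` technical replicates). The thresholds, the dict-based input layout and the function name are illustrative assumptions. As the abstract stresses, the right rule depends on the study's objectives, and any such filter may remove genuine records of rare taxa.

```python
def filter_rare_otus(counts, min_reads=10, min_replicates=2):
    """Keep OTUs that are either abundant enough or consistent across replicates.

    counts: dict mapping OTU id -> list of read counts, one per technical replicate.
    An OTU is dropped only if its total reads are below `min_reads` AND it is
    detected (count > 0) in fewer than `min_replicates` replicates.
    """
    kept = {}
    for otu, reps in counts.items():
        total = sum(reps)
        detected = sum(1 for c in reps if c > 0)
        if total >= min_reads or detected >= min_replicates:
            kept[otu] = reps
    return kept
```

A stricter variant could require both conditions to hold; that removes more spurious OTUs but also more real low-abundance taxa.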
Data-Driven Sequence of Changes to Anatomical Brain Connectivity in Sporadic Alzheimer's Disease.
Oxtoby, Neil P; Garbarino, Sara; Firth, Nicholas C; Warren, Jason D; Schott, Jonathan M; Alexander, Daniel C
2017-01-01
Model-based investigations of transneuronal spreading mechanisms in neurodegenerative diseases relate the pattern of pathology severity to the brain's connectivity matrix, which reveals information about how pathology propagates through the connectivity network. Such network models typically use networks based on functional or structural connectivity in young and healthy individuals, and only end-stage patterns of pathology, thereby ignoring the effects of normal aging and disease progression. Here, we examine the sequence of changes in the elderly brain's anatomical connectivity over the course of a neurodegenerative disease. We do this in a data-driven manner that does not depend on clinical disease stage, by using event-based disease progression modeling. Using data from the Alzheimer's Disease Neuroimaging Initiative dataset, we sequence the progressive decline of anatomical connectivity, as quantified by graph-theory metrics, in the Alzheimer's disease brain. Ours is the first single model to contribute to understanding all three aspects of the changes to anatomical connectivity in the human brain due to Alzheimer's disease: their nature, their location, and their sequence. Our experimental results offer a new insight into Alzheimer's disease: degeneration of anatomical connectivity in the brain may be a viable, even early, biomarker and should be considered when studying such neurodegenerative diseases.
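Event-based disease progression modeling infers the ordering of biomarker "events" (transitions from normal to abnormal) that best explains cross-sectional data. The sketch below shows the basic likelihood: each subject is marginalised over disease stages, where stage k means the first k events in the ordering have occurred. Several assumptions apply. The per-subject probabilities of each biomarker value under the "abnormal" and "normal" hypotheses are taken as already available (in practice they come from fitted mixture models). The stage prior is uniform. The search is an exhaustive permutation search, which is feasible only for a handful of events; larger problems call for greedy or sampling-based search. This is not the study's own implementation, and the function names are hypothetical.

```python
import itertools
import numpy as np

def sequence_log_likelihood(order, p_abn, p_norm):
    """Log-likelihood of an event ordering under a basic event-based model.

    order:  tuple of event (biomarker) indices, earliest event first.
    p_abn:  (n_subjects, n_events) array, P(x_ij | event i has occurred).
    p_norm: (n_subjects, n_events) array, P(x_ij | event i has not occurred).
    Stages k = 0..n_events receive a uniform prior.
    """
    n_subj, n_ev = p_abn.shape
    a = p_abn[:, list(order)]          # columns rearranged into event order
    b = p_norm[:, list(order)]
    stage_lik = np.zeros((n_subj, n_ev + 1))
    for k in range(n_ev + 1):
        # first k events have occurred (abnormal), the remaining ones have not
        stage_lik[:, k] = np.prod(a[:, :k], axis=1) * np.prod(b[:, k:], axis=1)
    # marginalise over stages with the uniform prior, then sum log-likelihoods
    return float(np.sum(np.log(stage_lik.mean(axis=1))))

def most_likely_sequence(p_abn, p_norm):
    """Exhaustive search over all orderings (only practical for few events)."""
    n_ev = p_abn.shape[1]
    return max(itertools.permutations(range(n_ev)),
               key=lambda o: sequence_log_likelihood(o, p_abn, p_norm))
```

Given synthetic subjects generated from a known ordering, the search recovers that ordering whenever intermediate stages are represented in the data.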
Waddington, Hannah; Sigafoos, Jeff; Lancioni, Giulio E; O'Reilly, Mark F; van der Meer, Larah; Carnett, Amarie; Stevens, Michelle; Roche, Laura; Hodis, Flaviu; Green, Vanessa A; Sutherland, Dean; Lang, Russell; Marschik, Peter B
2014-12-01
Many children with autism spectrum disorder (ASD) have limited or absent speech and might therefore benefit from learning to use a speech-generating device (SGD). The purpose of this study was to evaluate a procedure aimed at teaching three children with ASD to use an iPad(®)-based SGD to make a general request for access to toys, then make a specific request for one of two toys, and then communicate a thank-you response after receiving the requested toy. A multiple-baseline across participants design was used to determine whether systematic instruction involving least-to-most-prompting, time delay, error correction, and reinforcement was effective in teaching the three children to engage in this requesting and social communication sequence. Generalization and follow-up probes were conducted for two of the three participants. With intervention, all three children showed improvement in performing the communication sequence. This improvement was maintained with an unfamiliar communication partner and during the follow-up sessions. With systematic instruction, children with ASD and severe communication impairment can learn to use an iPad-based SGD to complete multi-step communication sequences that involve requesting and social communication functions. Copyright © 2014 ISDN. Published by Elsevier Ltd. All rights reserved.
Giessner-Prettre, C; Ribas Prado, F; Pullman, B; Kan, L; Kast, J R; Ts'o, P O
1981-01-01
A FORTRAN computer program called SHIFTS is described. With SHIFTS, one can calculate the NMR chemical shifts of the proton resonances of single- and double-stranded nucleic acids of known sequence and predetermined conformation. The program can handle RNA and DNA of arbitrary sequence built from any 4 of the 6 base types A, U, G, C, I and T. Data files for the geometrical parameters are available for the A-, A'-, B-, D- and S-conformations. The positions of all the atoms are calculated using a modified version of the SEQ program [1]. Then, based on this geometry, three chemical shift effects exerted by the atoms of the neighboring nucleotides on the protons of each monomeric unit are calculated separately: the ring-current shielding effect; the local atomic magnetic susceptibility effect (including both diamagnetic and paramagnetic terms); and the polarization or electric field effect. The results of the program are compared with experimental results for a gamma (ApApGpCpUpU)2 helical duplex, and with results calculated for this same helix from model building of the A'-form and B-form and from a graphical procedure for evaluating the ring-current effects.
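To illustrate the geometric dependence of a ring-current contribution, the sketch below uses the simple point-dipole approximation, Δδ = I·B·(1 − 3cos²θ)/r³. Here r is the distance from the ring centre to the proton, θ is the angle between the ring normal and the centre-to-proton vector, B is a geometric constant and I is a per-base intensity factor. The abstract does not say which ring-current model SHIFTS uses, so this is a swapped-in approximation. No literature values for B or I are assumed; both are caller-supplied, and the function name is hypothetical.

```python
import math

def ring_current_shift(proton, ring_center, ring_normal, b_const, intensity=1.0):
    """Point-dipole estimate of a ring-current shift contribution (ppm).

    proton, ring_center, ring_normal: 3-D coordinates/vectors (Å).
    b_const: geometric constant (ppm·Å^3), supplied by the caller.
    intensity: relative ring-current intensity of the base, supplied by the caller.
    Negative values are upfield (shielding), as for protons stacked above a ring.
    """
    v = [p - c for p, c in zip(proton, ring_center)]
    r = math.sqrt(sum(x * x for x in v))
    n_len = math.sqrt(sum(x * x for x in ring_normal))
    cos_t = sum(a * b for a, b in zip(v, ring_normal)) / (r * n_len)
    return intensity * b_const * (1.0 - 3.0 * cos_t ** 2) / r ** 3
```

A proton directly above a neighbouring base (θ = 0) gets a negative, shielding contribution, while a proton in the ring plane (θ = 90°) is deshielded. This is the qualitative pattern that makes stacking geometry visible in nucleic acid proton shifts.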
The future scalability of pH-based genome sequencers: A theoretical perspective
NASA Astrophysics Data System (ADS)
Go, Jonghyun; Alam, Muhammad A.
2013-10-01
Sequencing of the human genome is an essential prerequisite for personalized medicine and early prognosis of various genetic diseases. State-of-the-art, high-throughput genome sequencing technologies provide improved sequencing; however, their reliance on relatively expensive optical detection schemes has prevented widespread adoption of the technology in routine care. In contrast, the recently announced pH-based electronic genome sequencers achieve fast sequencing at low cost because of their compatibility with current microelectronics technology. While technology development has progressed rapidly, the physics of the sequencing chips and the potential for future scaling (and therefore cost reduction) remain unexplored. In this article, we develop a theoretical framework and a scaling theory to explain the principle of operation of pH-based sequencing chips, and we use the framework to explore various perceived scaling limits of the technology related to signal-to-noise ratio, well-to-well crosstalk, and sequencing accuracy. We also address several limitations inherent to the key steps of pH-based genome sequencers, which are shared by many other sequencing platforms on the market but have not been properly explained so far.
A generic, cost-effective, and scalable cell lineage analysis platform
Biezuner, Tamir; Spiro, Adam; Raz, Ofir; Amir, Shiran; Milo, Lilach; Adar, Rivka; Chapal-Ilani, Noa; Berman, Veronika; Fried, Yael; Ainbinder, Elena; Cohen, Galit; Barr, Haim M.; Halaban, Ruth; Shapiro, Ehud
2016-01-01
Advances in single-cell genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells. Current sequencing-based methods for cell lineage analysis depend on low-resolution bulk analysis or rely on extensive single-cell sequencing, which is not scalable and could be biased by functional dependencies. Here we show an integrated biochemical-computational platform for generic single-cell lineage analysis that is retrospective, cost-effective, and scalable. It consists of a biochemical-computational pipeline that inputs individual cells, produces targeted single-cell sequencing data, and uses it to generate a lineage tree of the input cells. We validated the platform by applying it to cells sampled from an ex vivo grown tree and analyzed its feasibility landscape by computer simulations. We conclude that the platform may serve as a generic tool for lineage analysis and thus pave the way toward large-scale human cell lineage discovery. PMID:27558250
Validation of two ribosomal RNA removal methods for microbial metatranscriptomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
He, Shaomei; Wurtzel, Omri; Singh, Kanwar
2010-10-01
The predominance of rRNAs in the transcriptome is a major technical challenge in sequence-based analysis of cDNAs from microbial isolates and communities. Several approaches have been applied to deplete rRNAs from (meta)transcriptomes, but no systematic investigation of potential biases introduced by any of these approaches has been reported. Here we validated the effectiveness and fidelity of the two most commonly used approaches, subtractive hybridization and exonuclease digestion, as well as combinations of these treatments, on two synthetic five-microorganism metatranscriptomes using massively parallel sequencing. We found that the effectiveness of rRNA removal was a function of community composition and RNA integrity for these treatments. Subtractive hybridization alone introduced the least bias in relative transcript abundance, whereas exonuclease and in particular combined treatments greatly compromised mRNA abundance fidelity. Illumina sequencing itself also can compromise quantitative data analysis by introducing a G+C bias between runs.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fleischmann, R.D.; Adams, M.D.; White, O.
1995-07-28
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.
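The whole-genome random (shotgun) strategy relies on reconstructing a sequence from overlapping fragments. The toy greedy assembler below illustrates the core idea by repeatedly merging the two fragments with the longest suffix-prefix overlap. It ignores sequencing errors, repeats and reverse-complement reads, so it is a teaching sketch rather than the assembly procedure used for H. influenzae. The function names and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: merge the best-overlapping pair until none remain."""
    contigs = list(dict.fromkeys(reads))                # drop duplicate reads
    # drop reads fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break                                       # no overlaps left: separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

For example, the reads `"ATGCGTAC"`, `"GTACGTTA"` and `"CGTTAGC"` are merged into the single contig `"ATGCGTACGTTAGC"`.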
Sequence-structure mapping errors in the PDB: OB-fold domains
Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee
2004-01-01
The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing the oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computational techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest the use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161
Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi
2014-09-18
Proteins that share high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the human serum albumin-binding GA domain and the IgG-binding GB domain have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. Ordinary analyses such as BLAST and conservation analysis alone could hardly distinguish which structure a given sequence would adopt. However, by combining these analyses with the analysis based on inter-residue average distance statistics and our sequence tendency analysis, we could infer which parts of the sequence may play an important role in structure formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
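Percent identity, the measure behind the 98% figure, is the fraction of aligned positions with identical residues. The minimal sketch below assumes two already aligned, gap-free sequences of equal length, and the function name is hypothetical. For instance, a single substitution in a hypothetical 56-residue alignment still yields about 98.2% identity, so a figure of 98% can correspond to only a handful of differing positions.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two aligned, gap-free sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    same = sum(a == b for a, b in zip(seq_a, seq_b))   # identical positions
    return 100.0 * same / len(seq_a)
```

Because this measure counts positions without weighting them, it cannot tell which of the few differing residues might switch the fold. That limitation is why the study turns to distance statistics and sequence tendency analysis.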
Zhang, Yan; Zhao, Fuzheng; Deng, Yongfeng; Zhao, Yanping; Ren, Hongqiang
2015-04-03
Disinfection byproducts (DBPs) in drinking water have been linked to various diseases, including colon, colorectal, rectal, and bladder cancer. Trichloroacetamide (TCAcAm) is an emerging nitrogenous DBP, and our previous study found that TCAcAm could induce some changes associated with host-gut microbiota co-metabolism. In this study, we used an integrated approach combining metagenomics, based on high-throughput sequencing, and metabolomics, based on nuclear magnetic resonance (NMR), to evaluate the toxic effects of TCAcAm exposure on the gut microbiome and urine metabolome. High-throughput sequencing revealed that the gut microbiome's composition and function were significantly altered after TCAcAm exposure for 90 days in Mus musculus mice. In addition, metabolomic analysis showed that a number of gut microbiota-related metabolites were dramatically perturbed in the urine of the mice. These results may provide novel insight into evaluating the health risk of environmental pollutants as well as revealing the potential mechanism of TCAcAm's toxic effects.
Sequence-specific bias correction for RNA-seq data using recurrent neural networks.
Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru
2017-01-25
The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatics problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation of biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and that RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.
Training set extension for SVM ensemble in P300-speller with familiar face paradigm.
Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou
2018-03-27
P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
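The training-set extension step can be sketched as follows. This assumes that each sequence yields an ordered list of labelled epochs (flattened feature vectors), and that "corresponding" means the epoch at the same position carrying the same target/non-target label. The function and type names are hypothetical, and the SVM ensemble itself is not shown.

```python
from typing import List, Sequence, Tuple

# (feature vector, label): label 1 = target epoch, 0 = non-target (assumed encoding)
Sample = Tuple[List[float], int]


def extend_training_set(seq_a: Sequence[Sample], seq_b: Sequence[Sample]) -> List[Sample]:
    """Return the original samples of both sequences plus the averages of corresponding pairs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must contain the same number of samples")
    averaged: List[Sample] = []
    for (xa, ya), (xb, yb) in zip(seq_a, seq_b):
        if ya != yb or len(xa) != len(xb):
            raise ValueError("corresponding samples must share label and length")
        # Superpose the two epochs and average them point by point.
        averaged.append(([(a + b) / 2 for a, b in zip(xa, xb)], ya))
    return list(seq_a) + list(seq_b) + averaged
```

With two sequences of n epochs each, this yields 3n training samples instead of 2n, without any additional recording time.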
Zhang, Yue; Feng, Shiqian; Zeng, Yiying; Ning, Hong; Liu, Lijun; Zhao, Zihua; Jiang, Fan; Li, Zhihong
2018-06-23
Bactrocera tsuneonis (Miyake), generally known as the Japanese orange fly, is considered to be a major pest of commercial citrus crops. It has a limited distribution in China, Japan and Vietnam, but it has the potential to invade areas outside of Asia. Because B. tsuneonis is difficult to distinguish from Bactrocera minax on the basis of morphological features, more genetic information on B. tsuneonis is needed to develop effective methods for rapid and accurate molecular identification. We report here the whole mitochondrial genome of B. tsuneonis sequenced by next-generation sequencing. This mitogenome sequence is 15,865 bp long and forms a typical circular molecule comprising 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The structure and organization of the molecule were typical and similar to those of the published homologous sequences of other fruit flies in Tephritidae. Phylogenetic analyses based on the mitochondrial genome data revealed a close genetic relationship between B. tsuneonis and B. minax. This is the first report of the complete mitochondrial genome of B. tsuneonis, and it can be used in further studies of species diagnosis, evolutionary biology, prevention and control. Copyright © 2018. Published by Elsevier B.V.
Tracing cell lineages in videos of lens-free microscopy.
Rempfler, Markus; Stierle, Valentin; Ditzel, Konstantin; Kumar, Sanjeev; Paulitschke, Philipp; Andres, Bjoern; Menze, Bjoern H
2018-06-05
In vitro experiments with cultured cells are essential for studying their growth and migration patterns and thus for gaining a better understanding of cancer progression and its treatment. Recent progress in lens-free microscopy (LFM) has rendered it an inexpensive tool for label-free, continuous live cell imaging, yet there has been little work on analysing such time-lapse image sequences. We propose (1) a cell detector for LFM images based on fully convolutional networks and residual learning, and (2) a probabilistic model based on moral lineage tracing that explicitly handles multiple detections and temporal successor hypotheses by clustering and tracking simultaneously. (3) We benchmark our method in terms of detection and tracking scores on a dataset of three annotated sequences of several hours of LFM, where we demonstrate that our method produces high-quality lineages. (4) We evaluate its performance on a somewhat more challenging problem: estimating cell lineages from the LFM sequence as would be possible from a corresponding fluorescence microscopy sequence. We present experiments on 16 LFM sequences for which we acquired fluorescence microscopy in parallel and generated annotations from them. Finally, (5) we showcase our method's effectiveness for quantifying cell dynamics in an experiment with skin cancer cells. Copyright © 2018 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Guotian; Jain, Rashmi; Chern, Mawsheng
The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake (Oryza sativa ssp japonica), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. In conclusion, this work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations.
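The reported average can be checked with simple arithmetic on the stated totals. Note that the genome-wide gene total implied by the 58% figure is only approximate, because that percentage is rounded.

```python
mutations = 91_513        # total mutations identified
lines_sequenced = 1_504   # mutant lines sequenced
genes_hit = 32_307        # genes affected
fraction_of_genes = 0.58  # "58% of all rice genes" (rounded)

per_line = mutations / lines_sequenced
print(f"{per_line:.1f} mutations per line")  # 60.8, i.e. about 61

# Rough implied number of annotated rice genes (approximate, since 58% is rounded).
implied_total_genes = genes_hit / fraction_of_genes
print(round(implied_total_genes))
```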
Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data
Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei
2013-01-01
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042
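Coval's error correction is described as depending on base quality and allele frequency at non-reference positions, followed by filtering of reads that still carry mismatches. The sketch below illustrates that idea for a single pileup column and a simple per-read mismatch filter. The thresholds (`min_af`, `min_q`, `max_mismatches`) and the function names are hypothetical and are not Coval's actual parameters or defaults.

```python
from collections import Counter
from typing import List


def correct_column(bases: List[str], quals: List[int], ref_base: str,
                   min_af: float = 0.2, min_q: int = 20) -> List[str]:
    """Revert likely sequencing errors at one reference position (one pileup column).

    A non-reference base is treated as an error and replaced by the reference
    base only when BOTH its allele frequency in this column and its base
    quality are below the (hypothetical) thresholds.
    """
    counts = Counter(bases)
    n = len(bases)
    return [ref_base if b != ref_base and counts[b] / n < min_af and q < min_q else b
            for b, q in zip(bases, quals)]


def passes_mismatch_filter(read: str, ref: str, max_mismatches: int = 2) -> bool:
    """Keep a read only if few mismatches against the reference remain after correction."""
    return sum(1 for a, b in zip(read, ref) if a != b) <= max_mismatches
```

Requiring both conditions keeps genuine variants (high frequency, or high quality) while removing isolated low-quality mismatches.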
Effects of short read quality and quantity on a de novo vertebrate transcriptome assembly.
Garcia, T I; Shen, Y; Catchen, J; Amores, A; Schartl, M; Postlethwait, J; Walter, R B
2012-01-01
For many researchers, next-generation sequencing data holds the key to answering a category of questions previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short read assembly can pose a significant challenge due to the lack of pre-existing EST or genome reference libraries. While many publications describe the overall process of sequencing and assembly, few address the topic of how many and what types of reads are best for assembly. The goal of this project was to use real-world data to explore the effects of read quantity and short read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities, we produced many assemblies in an automated manner. We observe how the properties of read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set. Published by Elsevier Inc.
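One simple way to generate read sets of varying quality and quantity for assembly experiments of this kind is to filter reads by mean Phred score and then subsample them. The following is a minimal sketch, assuming Sanger/Illumina 1.8+ quality strings (Phred+33 offset). The function names and the mean-quality criterion are illustrative assumptions, not the authors' pipeline.

```python
import random
from typing import List, Tuple

Read = Tuple[str, str]  # (nucleotide sequence, Phred+33 quality string)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (each character encodes Q = ord(c) - offset)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def select_reads(reads: List[Read], min_mean_q: float, n: int, seed: int = 0) -> List[Read]:
    """Keep reads whose mean quality reaches min_mean_q, then draw up to n of them at random."""
    passing = [r for r in reads if mean_quality(r[1]) >= min_mean_q]
    rng = random.Random(seed)  # fixed seed so each assembly input is reproducible
    return rng.sample(passing, min(n, len(passing)))
```

Sweeping `min_mean_q` and `n` over a grid would produce one read set per (quality, quantity) combination to feed to the assembler.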
Boosting antibody developability through rational sequence optimization.
Seeliger, Daniel; Schulz, Patrick; Litzenburger, Tobias; Spitz, Julia; Hoerer, Stefan; Blech, Michaela; Enenkel, Barbara; Studts, Joey M; Garidel, Patrick; Karow, Anne R
2015-01-01
The application of monoclonal antibodies as commercial therapeutics places substantial demands on the stability and properties of an antibody. Therapeutic molecules that exhibit favorable properties increase the success rate in development. However, it is not yet fully understood how the protein sequence of an antibody translates into favorable in vitro molecule properties. In this work, computational design strategies based on heuristic sequence analysis were used to systematically modify an antibody that tended to precipitate in vitro. The resulting series of closely related antibodies showed improved stability as assessed by biophysical methods and long-term stability experiments. As a notable observation, expression levels also improved in comparison with the wild-type candidate. The methods employed to optimize the protein sequences, as well as the biophysical data used to determine the effect on stability under conditions commonly used in the formulation of therapeutic proteins, are described. Together, the experimental and computational data led to consistent conclusions regarding the effect of the introduced mutations. Our approach exemplifies how computational methods can be used to guide antibody optimization for increased stability.
Iterative refinement of structure-based sequence alignments by Seed Extension
Kim, Changhoon; Tai, Chin-Hsien; Lee, Byungkook
2009-01-01
Background Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. Results RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. Conclusion RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. 
It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs. PMID:19589133
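The evaluation measures the fraction of correctly aligned residues against reference alignments, optionally tolerating shift errors. The following is a minimal sketch of such a score. It assumes that alignments are given as gapped strings and that a test pair (i, j) counts as correct when the reference aligns residue i within `shift` positions of j. The names are hypothetical, and this is not the RSE or Seed Extension code.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (residue index in protein A, residue index in protein B)


def aligned_pairs(row_a: str, row_b: str) -> Set[Pair]:
    """Residue pairs that share a column in a gapped pairwise alignment."""
    pairs: Set[Pair] = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def correct_fraction(test: Set[Pair], reference: Set[Pair], shift: int = 0) -> float:
    """Fraction of reference pairs reproduced by the test alignment, within a shift tolerance."""
    ref_partner = dict(reference)  # residue i of A -> its reference partner in B
    correct = sum(1 for i, j in test
                  if i in ref_partner and abs(ref_partner[i] - j) <= shift)
    return correct / len(reference) if reference else 0.0
```

For example, an alignment that is off by one residue everywhere scores 0 with `shift=0` but recovers most pairs with `shift=1`. This illustrates why results are reported for the no-shift case separately.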
Googling DNA sequences on the World Wide Web.
Hajibabaei, Mehrdad; Singer, Gregory A C
2009-11-10
New web-based technologies provide an excellent opportunity for sharing and accessing information and for using the web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment-independent, character-based algorithm that divides a sequence library (DNA barcodes) and a query sequence into words. The actual search is conducted by conventional search tools such as the freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre- and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
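The word-based idea can be illustrated without a search engine. Each library sequence and the query are split into fixed-length words, and library entries are ranked by the number of words they share with the query. The following is a minimal sketch, assuming overlapping k-mers of length 8 as the "words" and a plain shared-word count as the score (both assumptions). In the described system, indexing and lookup are delegated to a desktop search tool instead.

```python
from typing import Dict, List, Set, Tuple


def words(seq: str, k: int = 8) -> Set[str]:
    """All overlapping k-length words of a sequence (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_library(query: str, library: Dict[str, str], k: int = 8) -> List[Tuple[str, int]]:
    """Rank library entries by how many query words they contain; no alignment is performed."""
    q = words(query.upper(), k)
    scores = {name: len(q & words(seq.upper(), k)) for name, seq in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because scoring only uses set intersections of words, each library entry can be pre-split once and indexed like a text document, which is what makes an ordinary full-text search engine usable here.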
Shin, Jeong Hong; Jung, Soobin; Ramakrishna, Suresh; Kim, Hyongbum Henry; Lee, Junwon
2018-07-07
Genome editing technology using programmable nucleases has rapidly evolved in recent years. Precise integration of a transgene is achieved mainly through homology-directed repair (HDR). However, an HDR-based genome-editing approach is less efficient than non-homologous end-joining (NHEJ). Recently, a microhomology-mediated end-joining (MMEJ)-based transgene integration approach was developed, showing feasibility both in vitro and in vivo. We expanded this method to achieve targeted sequence substitution (TSS) of mutated sequences with normal sequences in vitro and in vivo, using double-guide RNAs (gRNAs) and a donor template flanked by the microhomologies and the target sequences of the gRNAs. Our method achieved more efficient sequence substitution than the HDR-based method in vitro using a reporter cell line, and led to the survival of a hereditary tyrosinemia mouse model in vivo. The proposed MMEJ-based TSS approach could provide a novel therapeutic strategy, in addition to HDR, to achieve gene correction from a mutated sequence to a normal sequence. Copyright © 2018 Elsevier Inc. All rights reserved.
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and there was a clear need for a tool that combines pairwise structure alignments while allowing gaps to be inserted in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign were demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
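The abstract does not describe CombAlign's internal algorithm. The following is a minimal sketch of one way to merge pairwise alignments that share a reference into a one-to-many gapped alignment (a star-style merge). It assumes that every pairwise alignment contains the full, identical reference sequence. Insertions that other proteins have relative to the reference become gap columns in the reference row, padded to the widest insertion at each position. `merge_pairwise` is a hypothetical name, not CombAlign's API.

```python
from typing import List, Tuple


def merge_pairwise(pairs: List[Tuple[str, str]]) -> List[str]:
    """Merge (reference_row, other_row) pairwise alignments into one gapped MSA.

    Returns [reference_row, other_row_1, other_row_2, ...], all of equal length.
    """
    if not pairs:
        raise ValueError("need at least one pairwise alignment")
    ref_seq = pairs[0][0].replace("-", "")
    n_ref = len(ref_seq)
    parsed = []
    for ref_row, other_row in pairs:
        if len(ref_row) != len(other_row) or ref_row.replace("-", "") != ref_seq:
            raise ValueError("every alignment must contain the same reference sequence")
        inserts = [""] * (n_ref + 1)  # inserts[k]: chars opposite reference gaps before residue k
        aligned: List[str] = []       # aligned[k]: char opposite reference residue k
        k = 0
        for r, o in zip(ref_row, other_row):
            if r == "-":
                inserts[k] += o
            else:
                aligned.append(o)
                k += 1
        parsed.append((inserts, aligned))

    # Widest insertion at each slot across all alignments.
    width = [max(len(ins[k]) for ins, _ in parsed) for k in range(n_ref + 1)]
    ref_out: List[str] = []
    rows: List[List[str]] = [[] for _ in parsed]
    for k in range(n_ref + 1):
        ref_out.append("-" * width[k])  # gap columns inserted into the reference
        for row, (ins, _) in zip(rows, parsed):
            row.append(ins[k] + "-" * (width[k] - len(ins[k])))
        if k < n_ref:
            ref_out.append(ref_seq[k])
            for row, (_, al) in zip(rows, parsed):
                row.append(al[k])
    return ["".join(ref_out)] + ["".join(r) for r in rows]
```

For example, merging `("AC-GT", "ACXGT")` and `("ACGT-", "A-GTY")` yields the reference row `AC-GT-`. The second protein's insertion `X` and the third protein's trailing `Y` each open a gap column that every other row receives.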
Sequence Determinants of Compaction in Intrinsically Disordered Proteins
Marsh, Joseph A.; Forman-Kay, Julie D.
2010-01-01
Intrinsically disordered proteins (IDPs), which lack folded structure and are disordered under nondenaturing conditions, have been shown to perform important functions in a large number of cellular processes. These proteins have interesting structural properties that deviate from the random-coil-like behavior exhibited by chemically denatured proteins. In particular, IDPs are often observed to exhibit significant compaction. In this study, we have analyzed the hydrodynamic radii of a number of IDPs to investigate the sequence determinants of this compaction. Net charge and proline content are observed to be strongly correlated with increased hydrodynamic radii, suggesting that these are the dominant contributors to compaction. Hydrophobicity and secondary structure, on the other hand, appear to have negligible effects on compaction, which implies that the determinants of structure in folded and intrinsically disordered proteins are profoundly different. Finally, we observe that polyhistidine tags seem to increase IDP compaction, which suggests that these tags have significant perturbing effects and thus should be removed before any structural characterizations of IDPs. Using the relationships observed in this analysis, we have developed a sequence-based predictor of hydrodynamic radius for IDPs that shows substantial improvement over a simple model based upon chain length alone. PMID:20483348
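The sequence features named above and the chain-length-only baseline can be computed as follows. This is a minimal sketch only. Net charge is approximated at neutral pH as (K + R) − (D + E), ignoring histidine and the termini. The baseline is written as a generic polymer power law, Rh = R0 · N^ν, where `r0` and `nu` are placeholders to be fitted, not the values reported by the authors. How the published predictor combines these features is not reproduced here.

```python
def net_charge(seq: str) -> int:
    """Approximate net charge near neutral pH: (K + R) - (D + E); His and termini ignored."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def proline_fraction(seq: str) -> float:
    """Fraction of residues that are proline."""
    return seq.count("P") / len(seq)


def rh_length_only(n_residues: int, r0: float, nu: float) -> float:
    """Length-only baseline Rh = R0 * N**nu (r0 and nu are fit parameters, placeholders here)."""
    return r0 * n_residues ** nu
```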
Biophysics of protein evolution and evolutionary protein biophysics
Sikosek, Tobias; Chan, Hue Sun
2014-01-01
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for the marginal stability of natural globular proteins and how the requirement for kinetic stability and the avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by ‘hidden’ conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution. PMID:25165599
Phytophthora-ID.org: A sequence-based Phytophthora identification tool
N.J. Grünwald; F.N. Martin; M.M. Larsen; C.M. Sullivan; C.M. Press; M.D. Coffey; E.M. Hansen; J.L. Parke
2010-01-01
Contemporary species identification relies strongly on sequence-based identification, yet resources for identification of many fungal and oomycete pathogens are rare. We developed two web-based, searchable databases for rapid identification of Phytophthora spp. based on sequencing of the internal transcribed spacer (ITS) or the cytochrome oxidase...
Mixing by Unstirring: Hyperuniform Dispersion of Interacting Particles upon Chaotic Advection
NASA Astrophysics Data System (ADS)
Weijs, Joost H.; Bartolo, Denis
2017-07-01
We show how to achieve both fast and hyperuniform dispersions of particles in viscous fluids. To do so, we first extend the concept of critical random organization to chaotic drives. We show how palindromic sequences of chaotic advection cause microscopic particles to effectively interact at long range, thereby inhibiting critical self-organization. Building on this understanding, we circumvent this limitation and design sequences of stirring and unstirring that simultaneously optimize the speed of particle spreading and the homogeneity of the resulting dispersions.
Targeted DNA sequencing and in situ mutation analysis using mobile phone microscopy
Kühnemund, Malte; Wei, Qingshan; Darai, Evangelia; Wang, Yingjie; Hernández-Neuta, Iván; Yang, Zhao; Tseng, Derek; Ahlford, Annika; Mathot, Lucy; Sjöblom, Tobias; Ozcan, Aydogan; Nilsson, Mats
2017-01-01
Molecular diagnostics is typically outsourced to well-equipped centralized laboratories, often far from the patient. We developed molecular assays and portable optical imaging designs that permit on-site diagnostics with a cost-effective mobile-phone-based multimodal microscope. We demonstrate that targeted next-generation DNA sequencing reactions and in situ point mutation detection assays in preserved tumour samples can be imaged and analysed using mobile phone microscopy, achieving a new milestone for tele-medicine technologies. PMID:28094784
A Fast Event Preprocessor and Sequencer for the Simbol-X Low Energy Detector
NASA Astrophysics Data System (ADS)
Schanz, T.; Tenzer, C.; Maier, D.; Kendziorra, E.; Santangelo, A.
2009-05-01
The Simbol-X Low Energy Detector (LED), a 128×128 pixel DEPFET (Depleted Field Effect Transistor) array, will be read out at a very high rate (8000 frames/second) and therefore requires very fast on-board electronics. We present FPGA-based LED camera electronics consisting of an Event Preprocessor (EPP) for on-board data preprocessing and filtering of the Simbol-X low-energy detector data, and a related Sequencer (SEQ) that generates the signals necessary to control the readout.
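The "very fast" requirement follows from simple arithmetic on the stated array size and frame rate:

```python
pixels_per_frame = 128 * 128                   # 16,384 pixels
frame_rate_hz = 8000                           # frames per second
pixel_rate = pixels_per_frame * frame_rate_hz  # 131,072,000 pixel values per second
frame_time_us = 1e6 / frame_rate_hz            # 125.0 microseconds per frame
```

Roughly 1.3 × 10^8 pixel values per second must therefore be preprocessed on board, and the sequencer has only 125 µs to complete each full-frame readout cycle.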
Student Mental Models of the Greenhouse Effect: Retention Months After Interventions
NASA Astrophysics Data System (ADS)
Harris, S. E.; Gold, A. U.
2013-12-01
Individual understanding of climate science, and of the greenhouse effect in particular, is one factor important for societal decision-making. Ideally, learning opportunities about the greenhouse effect will not only move people toward expert-like ideas but will also have long-lasting effects for those individuals. We assessed university students' mental models of the greenhouse effect before and after specific learning experiences, on a final exam, and then again a few months later. Our aim was to measure retention after students had not necessarily been thinking about, or studying, the greenhouse effect recently. How sticky were the ideas learned? A total of 164 students in an introductory science course participated in a sequence of two learning activities and assessments regarding the greenhouse effect. The first lesson involved the full class; for the second lesson, half the students completed a simulation-based activity and the other half completed a data-driven activity. We assessed student thinking through concept sketches, multiple-choice questions and short-answer questions. All students generated concept sketches four times and completed a set of multiple-choice questions (MCQs) and short-answer questions twice. Later, 3-4 months after the course ended, 27 students ('retention students') completed an additional concept sketch and answered the questions again as a retention assessment. These 27 students were nearly evenly split between the two contrasting second lessons in the sequence and included both high- and low-achieving students. We then compared student sketches and scores to 'expert' answers. The general pattern over time showed a significant increase in student scores from before the lesson sequence to after, both on concept sketches and MCQs, followed by an additional increase in concept sketch scores on the final exam (MCQs were not asked on the final exam). The scores for the retention students were not significantly different from those of the full class.
Within the retention group, there was also no difference in scores based on which contrasting lesson a student did. Students in both of the contrasting lessons scored significantly higher on the retention test than on the initial pre-test. Their concept sketch scores on the retention test were slightly lower than their scores on the final exam (not significantly), but matched their post-lesson-sequence scores. Their MCQ scores were slightly higher on the retention test than on the post-lesson-sequence test (also not significantly). These results imply that students both learned and retained new ideas about the greenhouse effect for at least a few months after the end of the course and did not regress to their pre-lesson ideas. Further analysis should show which particular aspects of student mental models changed over the full temporal sequence.
NASA Astrophysics Data System (ADS)
Bódi, Erika; Buday, Tamás; McIntosh, Richard William
2013-04-01
Defining extraction-modified flow patterns with hydrodynamic models is a pivotal question in preserving groundwater resources in terms of both quality and quantity. Modeling is the first step in groundwater protection; its main result is the delineation of the protection area, which depends on the amount of extracted water. Solid models have a significant effect on hydrodynamic models, because the latter are built on them. Under the relevant legislation, certain restrictions must be applied in protection areas, which has firm consequences for economic activities. Hungarian regulations give no clear instructions for setting up either geological or hydrodynamic models; however, modeling itself is obligatory. The choice of modeling method is a key consideration for further numerical calculations and is decisive for the shape and size of the groundwater protection area. The geometry of the hydrodynamic model layers is derived from the solid model. Different geological approaches exist, including lithological and sequence stratigraphic classifications; furthermore, in the case of regional models, formation-based hydrostratigraphic units are also applicable. Lithological classification is based on assigning and mapping lithotypes. When the geometry (e.g., tectonic characteristics) of the research area is not known, horizontal bedding is assumed, although the likelihood of this cannot be assessed from lithology alone. If the geological correlation is based on sequence stratigraphic studies, the cyclicity of sediment deposition is also considered. This method is more integrated: numerous parameters (e.g., electrofacies) are taken into account when studying the geological conditions, which makes the modeling more reliable. Layers of sequence stratigraphic models can be either lithologically homogeneous or may include larger sedimentary cycles and therefore several lithological units.
The advantage of this is that the modeling can handle pinching-out lithological units and lenticular bodies more easily, whereas most hydrodynamic software cannot handle flow units related to such model layers. The interpretation of tectonic disturbances is similar. In Hungary, groundwater is extracted mainly from Pleistocene and Pannonian aquifers, whose sediments were deposited in the ancient Pannonian Lake. When the basin lost its open-marine connection, eustasy no longer had a direct effect on facies changes; subsidence and sediment supply therefore became the main controlling factors. Various basin-filling facies developed, including alluvial plain facies, different delta facies types and pelitic deep-basin facies. Creating solid models with sequence stratigraphic methods requires more raw data, a genetic approach and more working hours, so this method is seldom used in practice. Lithology-based models can be transformed into sequence stratigraphic models by extending the database (e.g., by acquiring more survey data). Where the two kinds of model differ significantly, notable differences can occur in the supply directions and in groundwater travel times, even under identical extraction conditions. Our study aims to call attention to the consequences of using different solid models for typical depositional systems of the Great Hungarian Plain and to their effects on groundwater protection.
Mining for class-specific motifs in protein sequence classification
2013-01-01
Background In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features actually contribute to the accuracy. We hypothesize that short subsequences (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function first harvests the entire set of 4- to 8-grams from the protein sequences of the different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution resulted in a large increase in the number of discriminatory n-grams harvested. Because the dataset is unbalanced, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to n-grams that appear in fewer classes and vice versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences.
Our method fared well compared to an existing motif-finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and can thus be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The use of amino acid substitution scores for similarity detection and of the dampening factor to normalize the unbalanced datasets has a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms. PMID:23496846
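The harvesting and normalization steps can be sketched as follows. The abstract does not give the exact dampening formula, so this sketch assumes an IDF-style factor, log(number of classes / number of classes containing the n-gram). This factor is zero for n-grams shared by every class and largest for n-grams unique to one class. The BLOSUM62-based merging of similar n-grams is omitted. All names and the scoring formula are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def harvest(seqs: List[str], n_min: int = 4, n_max: int = 8) -> Counter:
    """Count every n-gram of length n_min..n_max in one class's sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for n in range(n_min, n_max + 1):
            for i in range(len(s) - n + 1):
                counts[s[i:i + n]] += 1
    return counts


def discriminative_ngrams(class_seqs: Dict[str, List[str]],
                          threshold: float) -> Dict[str, Dict[str, float]]:
    """Score each class's n-grams by relative frequency x dampening; keep those above threshold."""
    per_class = {cls: harvest(seqs) for cls, seqs in class_seqs.items()}
    n_classes = len(per_class)
    presence: Counter = Counter()  # number of classes in which each n-gram occurs
    for counts in per_class.values():
        presence.update(counts.keys())
    result: Dict[str, Dict[str, float]] = {}
    for cls, counts in per_class.items():
        total = sum(counts.values()) or 1
        result[cls] = {}
        for gram, freq in counts.items():
            # Hypothetical IDF-style dampening: 0 for n-grams present in every class.
            damp = math.log(n_classes / presence[gram])
            score = (freq / total) * damp
            if score >= threshold:
                result[cls][gram] = score
    return result
```

Normalizing by each class's total n-gram count keeps large classes from dominating, which is the role the abstract assigns to handling unbalanced data.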
Manigart, Olivier; Boeras, Debrah I; Karita, Etienne; Hawkins, Paulina A; Vwalika, Cheswa; Makombe, Nathan; Mulenga, Joseph; Derdeyn, Cynthia A; Allen, Susan; Hunter, Eric
2012-12-01
A critical step in HIV-1 transmission studies is the rapid and accurate identification of epidemiologically linked transmission pairs. To date, this has been accomplished by comparing polymerase chain reaction (PCR)-amplified nucleotide sequences from potential transmission pairs, which can be cost-prohibitive in resource-limited settings. Here we describe a rapid, cost-effective approach to determining transmission linkage based on the heteroduplex mobility assay (HMA), and validate it by comparison with nucleotide sequencing. A total of 102 HIV-1-infected Zambian and Rwandan couples with known linkage were analyzed by gp41-HMA. A 400-base pair fragment within the envelope gp41 region of the HIV proviral genome was PCR amplified, and HMA was applied to both partners' amplicons separately (autologous) and as a mixture (heterologous). If the diversity between gp41 sequences was low (<5%), a homoduplex was observed upon gel electrophoresis and the transmission was characterized as having occurred between partners (linked). If a new heteroduplex formed in the heterologous mixture, the transmission was classified as unlinked. Initial blind validation of gp41-HMA demonstrated 90% concordance between HMA and sequencing, with 100% concordance for linked transmissions. Following validation, 25 newly infected partners in Kigali and 12 in Lusaka were evaluated prospectively using both HMA and nucleotide sequencing. Concordant results were obtained in all but one case (36 of 37; 97.3%). The gp41-HMA technique is a reliable and feasible tool for detecting linked transmissions in the field. All unlinked results should be confirmed by sequence analysis.
Repeated-Sprint Sequences During Female Soccer Matches Using Fixed and Individual Speed Thresholds.
Nakamura, Fábio Y; Pereira, Lucas A; Loturco, Irineu; Rosseti, Marcelo; Moura, Felipe A; Bradley, Paul S
2017-07-01
Nakamura, FY, Pereira, LA, Loturco, I, Rosseti, M, Moura, FA, and Bradley, PS. Repeated-sprint sequences during female soccer matches using fixed and individual speed thresholds. J Strength Cond Res 31(7): 1802-1810, 2017-The main objective of this study was to characterize the occurrence of single sprints and repeated-sprint sequences (RSS) during elite female soccer matches, using fixed (20 km·h⁻¹) and individually based speed thresholds (>90% of the mean speed from a 20-m sprint test). Eleven elite female soccer players from the same team participated in the study. All players performed a 20-m linear sprint test and were assessed in up to 10 official matches using Global Positioning System technology. Magnitude-based inferences were used to test for meaningful differences. Results revealed that, irrespective of whether fixed or individual speed thresholds were adopted, female players produced only a few RSS during matches (2.3 ± 2.4 sequences using the fixed threshold and 3.3 ± 3.0 sequences using the individually based threshold), with most sequences consisting of just 2 sprints. Additionally, central defenders performed fewer sprints (10.2 ± 4.1) than other positions (fullbacks: 28.1 ± 5.5; midfielders: 21.9 ± 10.5; forwards: 31.9 ± 11.1; differences rated likely to almost certain, with effect sizes ranging from 1.65 to 2.72), and sprinting ability declined in the second half. The data do not support the notion that RSS occur frequently during soccer matches in female players, irrespective of whether fixed or individual speed thresholds are used to define sprint occurrence. However, repeated-sprint ability development should not be excluded from soccer training programs because of its association with match-related performance.
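The distinction between single sprints and RSS can be made concrete with a short sketch. It assumes a GPS speed trace sampled at a fixed interval, treats a sprint as any continuous period above the speed threshold, and groups sprints into an RSS when the recovery between them is at most 60 s. The abstract does not state the recovery criterion, so that value (`max_recovery_s`) and all function names are hypothetical; only the individual threshold (90% of the mean 20-m sprint speed) follows the text.

```python
def individual_threshold(mean_sprint_speed_kmh, fraction=0.9):
    """Individual sprint threshold: a fraction of the player's mean 20-m sprint speed."""
    return fraction * mean_sprint_speed_kmh

def detect_sprints(speeds_kmh, dt_s, threshold_kmh):
    """Return (start_s, end_s) for each continuous run of samples above the threshold."""
    sprints, start = [], None
    for i, v in enumerate(speeds_kmh):
        if v > threshold_kmh and start is None:
            start = i * dt_s
        elif v <= threshold_kmh and start is not None:
            sprints.append((start, i * dt_s))
            start = None
    if start is not None:  # trace ended mid-sprint
        sprints.append((start, len(speeds_kmh) * dt_s))
    return sprints

def group_repeated_sprints(sprints, max_recovery_s=60.0):
    """Group sprints into sequences of >= 2 separated by short recoveries (assumed criterion)."""
    sequences, current = [], []
    for sprint in sprints:
        if current and sprint[0] - current[-1][1] <= max_recovery_s:
            current.append(sprint)
        else:
            if len(current) >= 2:
                sequences.append(current)
            current = [sprint]
    if len(current) >= 2:
        sequences.append(current)
    return sequences
```

Using the fixed threshold instead only changes the `threshold_kmh` argument (20.0), which is how the two definitions in the study can be compared on the same trace.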
A Novel Partial Sequence Alignment Tool for Finding Large Deletions
Aruk, Taner; Ustek, Duran; Kursun, Olcay
2012-01-01
Finding large deletions in genome sequences has become increasingly useful in bioinformatics, for example in clinical research and diagnosis. Although a number of next-generation sequencing mapping and sequence alignment programs are publicly available, these software packages do not correctly align fragments containing deletions larger than 1 kb. We present a fast alignment software package, BinaryPartialAlign, that wet-lab scientists can use to find long structural variations in their experiments. BinaryPartialAlign uses the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps, which we call partial alignment. The BinaryPartialAlign implementation is compared with other straightforward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (runtime and accuracy) of the proposed method. PMID:22566777
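The idea of using binary search to split a read around a large deletion can be illustrated with a deliberately simplified sketch. It is not BinaryPartialAlign: exact substring matching stands in for Smith-Waterman scoring, and the function names are hypothetical. Binary search is valid here because if a prefix of length k occurs in the reference, every shorter prefix also occurs, so the longest matching prefix is found in O(log n) membership tests; the unmatched remainder of the read is then located downstream, and the gap between the two pieces gives the deletion.

```python
def longest_prefix_match(read, ref):
    """Return (k, pos): the longest prefix length k of read found in ref, and its first position."""
    lo, hi = 0, len(read)
    # Invariant: read[:lo] occurs in ref (the empty prefix always does).
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if read[:mid] in ref:
            lo = mid
        else:
            hi = mid - 1
    return lo, ref.find(read[:lo])

def locate_deletion(read, ref):
    """Return (deletion_start, deletion_length) in ref, or None if no split alignment is found."""
    k, start = longest_prefix_match(read, ref)
    suffix = read[k:]
    if not suffix:
        return None  # the whole read aligns contiguously
    resume = ref.find(suffix, start + k)  # search downstream of the matched prefix
    if resume < 0:
        return None
    return start + k, resume - (start + k)
```

For a reference `"ACGTAC" + "GGGGGG" + "TTTT"` and a read `"ACGTACTTTT"`, the sketch reports a 6-base deletion starting at position 6. Real reads contain mismatches, which is why a scored local alignment such as SW is needed in practice.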
Social network analysis of a project-based introductory physics course
NASA Astrophysics Data System (ADS)
Oakley, Christopher
2016-03-01
Research suggests that students benefit from peer interaction and active engagement in the classroom. The quality, nature, and effect of these interactions are currently being explored by Physics Education Researchers. Spelman College offers an introductory physics sequence that addresses content and research skills by engaging students in open-ended research projects, a form of Project-Based Learning. Students have been surveyed at regular intervals during the second semester of the trigonometry-based course to determine the frequency of interactions in and out of class. These interactions can be with current or past students, tutors, and instructors. This line of inquiry focuses on Social Network Analysis metrics, such as the centrality of participants and the segmentation of groups. Further research will refine and highlight deeper questions regarding student performance in this pedagogy and course sequence.
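Centrality, one of the network metrics mentioned above, can be computed directly from survey responses. The sketch below assumes each reported interaction is recorded as an undirected pair of participant identifiers (students, tutors or instructors). It computes normalized degree centrality, the fraction of the other participants each person interacted with. The identifiers and function name are illustrative.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected interaction network.

    edges: iterable of (participant_a, participant_b) pairs; self-pairs are ignored.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {v: 0.0 for v in neighbors}
    return {v: len(nbrs) / (n - 1) for v, nbrs in neighbors.items()}
```

Repeated reports of the same pair count once here; weighting edges by reported frequency would be a natural extension when interaction frequency is the quantity of interest.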
Research on wind field algorithm of wind lidar based on BP neural network and grey prediction
NASA Astrophysics Data System (ADS)
Chen, Yong; Chen, Chun-Li; Luo, Xiong; Zhang, Yan; Yang, Ze-hou; Zhou, Jie; Shi, Xiao-ding; Wang, Lei
2018-01-01
This paper uses a BP neural network and the grey algorithm to forecast the radar wind field. To reduce the residual error of wind field prediction, the residual error function is minimized: a BP neural network is trained on the residuals of the grey algorithm, the trained network model forecasts the residual sequence, and the predicted residual sequence is used to correct the forecast sequence of the grey algorithm. The test data show that the grey algorithm corrected by the BP neural network effectively reduces the residuals and improves prediction precision.
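The grey part of such a scheme is commonly the GM(1,1) model. The sketch below assumes that form, since the abstract does not specify it. It computes the fitted and forecast values together with the residual sequence that, in the described method, would be used to train the BP network. The BP correction step itself is left out, and the function names are hypothetical.

```python
import numpy as np

def gm11_fit(x0):
    """Estimate the GM(1,1) development coefficient a and grey input b by least squares."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                      # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])           # background values
    B = np.column_stack((-z1, np.ones_like(z1)))
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    return a, b

def gm11_predict(x0, a, b, n_ahead=0):
    """Fitted values for the observed points plus n_ahead forecasts."""
    x0 = np.asarray(x0, dtype=float)
    k = np.arange(len(x0) + n_ahead)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # time-response function
    return np.concatenate(([x0[0]], np.diff(x1_hat)))   # inverse AGO

def gm11_residuals(x0, a, b):
    """Residual sequence that a correction model (e.g. a BP network) would learn."""
    x0 = np.asarray(x0, dtype=float)
    return x0 - gm11_predict(x0, a, b)
```

In the corrected forecast, the residuals predicted by the trained network would simply be added to the `gm11_predict` output for the forecast horizon.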
Li, Yantao; Fu, Tuo; Liu, Tao; Guo, Huaizu; Guo, Qingcheng; Xu, Jin; Zhang, Dapeng; Qian, Weizhu; Dai, Jianxin; Li, Bohua; Guo, Yajun; Hou, Sheng; Wang, Hao
2016-07-01
Nivolumab is a therapeutic fully human IgG4 antibody against programmed death 1 (PD-1). In this study, a nivolumab biosimilar produced in our laboratory was analyzed and characterized. Sequence variants that contain undesired amino acid sequences may cause concern during biosimilar bioprocess development. We found that low levels of sequence variants were detected in the heavy chain of the nivolumab biosimilar by ultra performance liquid chromatography (UPLC) and tandem mass spectrometry. The variant was further identified by UPLC-MS/MS after IdeS or trypsin digestion, and was confirmed by adding a synthetic mutant peptide. Subsequently, a mixed base signal from the normal and mutant sequences was detected by DNA sequencing. The relative level of the A424V mutant in the Fc region of the heavy chain was determined to be 12.25% and 13.54% by base peak intensity (BPI) and UV chromatography of the tryptic peptide map, respectively. The A424V variant was also quantified by real-time PCR (RT-PCR) at the DNA and RNA levels, at 19.2% and 16.8%, respectively. The relative content of the mutant was consistent at the DNA, RNA and protein levels, indicating that the A424V mutation may have little influence at the transcriptional or translational level. These results demonstrate that orthogonal state-of-the-art techniques such as LC-UV-MS and RT-PCR should be used to characterize recombinant proteins and cell lines during biosimilar development. Our study suggests that it is important to establish an integrated and effective analytical method to monitor and characterize sequence variants during antibody drug development, especially for antibody biosimilar products.
Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G
2018-01-01
Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.
Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.
Fang, Chao; Zhong, Huanzi; Lin, Yuxiang; Chen, Bing; Han, Mo; Ren, Huahui; Lu, Haorong; Luber, Jacob M; Xia, Min; Li, Wangsheng; Stein, Shayna; Xu, Xun; Zhang, Wenwei; Drmanac, Radoje; Wang, Jian; Yang, Huanming; Hammarström, Lennart; Kostic, Aleksandar D; Kristiansen, Karsten; Li, Junhua
2018-03-01
More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of 2 Illumina platforms. Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform vs the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the 2 Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high-quality reads (96.06% of raw reads) per sample, with 90.56% of bases scoring Q30 and above, was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias toward genes with higher GC content being enriched on the HiSeq platforms. Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.
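The Q30 figure quoted above is the share of base calls whose Phred quality score is at least 30. A minimal sketch of that calculation, assuming FASTQ quality strings in the common Phred+33 (Sanger / Illumina 1.8+) encoding; the function name is hypothetical.

```python
def q30_fraction(quality_strings, offset=33):
    """Fraction of base calls with Phred quality >= 30 (Phred+33 encoding assumed)."""
    total = high = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:
                high += 1
    return high / total if total else 0.0
```

In Phred+33, the character `?` encodes exactly Q30 and `I` encodes Q40, so `q30_fraction(["IIII", "####"])` gives 0.5.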
A Bayesian Framework for Human Body Pose Tracking from Depth Image Sequences
Zhu, Youding; Fujimura, Kikuo
2010-01-01
This paper addresses the problem of accurate and robust tracking of 3D human body pose from depth image sequences. Recovering the large number of degrees of freedom in human body movements from a depth image sequence is challenging because of the need to resolve the depth ambiguity caused by self-occlusions and the difficulty of recovering from tracking failure. Human body poses can be estimated through model fitting using dense correspondences between depth data and an articulated human model (local optimization method). Although this usually achieves high accuracy owing to the dense correspondences, it may fail to recover from tracking failure. Alternatively, human pose may be reconstructed by detecting and tracking anatomical landmarks of the human body (key-points) based on low-level depth image analysis. While this method (key-point based method) is robust and recovers from tracking failure, its pose estimation accuracy depends solely on the image-based localization accuracy of the key-points. To address these limitations, we present a flexible Bayesian framework for integrating pose estimation results obtained by key-point based and local optimization methods. Experimental results and a performance comparison are presented to demonstrate the effectiveness of the proposed approach. PMID:22399933
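As a simplified illustration of Bayesian fusion, the sketch below combines two independent Gaussian estimates of the same joint position, one from key-point detection and one from local model fitting, by multiplying their densities (inverse-variance weighting). This is an assumption made for illustration: the paper's framework is more general and its exact form is not given in the abstract. The function name and variances are hypothetical.

```python
import numpy as np

def fuse_gaussian(mu_a, var_a, mu_b, var_b):
    """Posterior mean and variance from two independent Gaussian estimates of the same quantity.

    Inputs may be scalars or per-coordinate arrays (e.g. x, y, z of one joint).
    """
    mu_a, var_a, mu_b, var_b = (np.asarray(x, dtype=float) for x in (mu_a, var_a, mu_b, var_b))
    var = 1.0 / (1.0 / var_a + 1.0 / var_b)      # precisions add
    mu = var * (mu_a / var_a + mu_b / var_b)      # precision-weighted mean
    return mu, var
```

One design consequence: inflating the variance of whichever estimator is currently unreliable (for example, the model fit after a tracking failure) lets the fused estimate lean on the other one automatically.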
Tracking tumor boundary in MV-EPID images without implanted markers: A feasibility study.
Zhang, Xiaoyong; Homma, Noriyasu; Ichiji, Kei; Takai, Yoshihiro; Yoshizawa, Makoto
2015-05-01
To develop a markerless tracking algorithm to track the tumor boundary in megavoltage (MV)-electronic portal imaging device (EPID) images for image-guided radiation therapy. A level set method (LSM)-based algorithm is developed to track tumor boundary in EPID image sequences. Given an EPID image sequence, an initial curve is manually specified in the first frame. Driven by a region-scalable energy fitting function, the initial curve automatically evolves toward the tumor boundary and stops on the desired boundary while the energy function reaches its minimum. For the subsequent frames, the tracking algorithm updates the initial curve by using the tracking result in the previous frame and reuses the LSM to detect the tumor boundary in the subsequent frame so that the tracking processing can be continued without user intervention. The tracking algorithm is tested on three image datasets, including a 4-D phantom EPID image sequence, four digitally deformable phantom image sequences with different noise levels, and four clinical EPID image sequences acquired in lung cancer treatment. The tracking accuracy is evaluated based on two metrics: centroid localization error (CLE) and volume overlap index (VOI) between the tracking result and the ground truth. For the 4-D phantom image sequence, the CLE is 0.23 ± 0.20 mm, and VOI is 95.6% ± 0.2%. For the digital phantom image sequences, the total CLE and VOI are 0.11 ± 0.08 mm and 96.7% ± 0.7%, respectively. In addition, for the clinical EPID image sequences, the proposed algorithm achieves 0.32 ± 0.77 mm in the CLE and 72.1% ± 5.5% in the VOI. These results demonstrate the effectiveness of the authors' proposed method both in tumor localization and boundary tracking in EPID images. In addition, compared with two existing tracking algorithms, the proposed method achieves a higher accuracy in tumor localization. 
In this paper, the authors presented a feasibility study of tracking the tumor boundary in EPID images using an LSM-based algorithm. Experimental results on phantom and clinical EPID images demonstrated the effectiveness of the tracking algorithm for visible tumor targets. Compared with previous tracking methods, the authors' algorithm has the potential to improve tracking accuracy in radiation therapy. In addition, real-time tumor boundary information within the irradiation field will potentially be useful for further applications, such as adaptive beam delivery and dose evaluation.
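The two evaluation metrics can be sketched for binary segmentation masks. The sketch assumes isotropic pixel spacing and defines VOI as intersection over union; the abstract does not give the exact VOI formula (a Dice-style 2|A∩B|/(|A|+|B|) is another common choice), and the function names are hypothetical.

```python
import numpy as np

def centroid_localization_error(mask_track, mask_truth, pixel_mm=1.0):
    """Euclidean distance (mm) between the centroids of two binary masks."""
    c_track = np.argwhere(mask_track).mean(axis=0)
    c_truth = np.argwhere(mask_truth).mean(axis=0)
    return float(np.linalg.norm(c_track - c_truth) * pixel_mm)

def volume_overlap_index(mask_track, mask_truth):
    """Overlap as intersection over union (assumed definition)."""
    a = np.asarray(mask_track, dtype=bool)
    b = np.asarray(mask_truth, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0
```

For two 4 × 4 squares offset by one column at 0.5 mm per pixel, the CLE is 0.5 mm and the intersection-over-union VOI is 12/20 = 0.6.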
Tracking tumor boundary in MV-EPID images without implanted markers: A feasibility study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xiaoyong, E-mail: xiaoyong@ieee.org; Homma, Noriyasu, E-mail: homma@ieee.org; Ichiji, Kei, E-mail: ichiji@yoshizawa.ecei.tohoku.ac.jp
2015-05-15
Purpose: To develop a markerless tracking algorithm to track the tumor boundary in megavoltage (MV)-electronic portal imaging device (EPID) images for image-guided radiation therapy. Methods: A level set method (LSM)-based algorithm is developed to track tumor boundary in EPID image sequences. Given an EPID image sequence, an initial curve is manually specified in the first frame. Driven by a region-scalable energy fitting function, the initial curve automatically evolves toward the tumor boundary and stops on the desired boundary while the energy function reaches its minimum. For the subsequent frames, the tracking algorithm updates the initial curve by using the tracking result in the previous frame and reuses the LSM to detect the tumor boundary in the subsequent frame so that the tracking processing can be continued without user intervention. The tracking algorithm is tested on three image datasets, including a 4-D phantom EPID image sequence, four digitally deformable phantom image sequences with different noise levels, and four clinical EPID image sequences acquired in lung cancer treatment. The tracking accuracy is evaluated based on two metrics: centroid localization error (CLE) and volume overlap index (VOI) between the tracking result and the ground truth. Results: For the 4-D phantom image sequence, the CLE is 0.23 ± 0.20 mm, and VOI is 95.6% ± 0.2%. For the digital phantom image sequences, the total CLE and VOI are 0.11 ± 0.08 mm and 96.7% ± 0.7%, respectively. In addition, for the clinical EPID image sequences, the proposed algorithm achieves 0.32 ± 0.77 mm in the CLE and 72.1% ± 5.5% in the VOI. These results demonstrate the effectiveness of the authors’ proposed method both in tumor localization and boundary tracking in EPID images. In addition, compared with two existing tracking algorithms, the proposed method achieves a higher accuracy in tumor localization.
Conclusions: In this paper, the authors presented a feasibility study of tracking the tumor boundary in EPID images using an LSM-based algorithm. Experimental results on phantom and clinical EPID images demonstrated the effectiveness of the tracking algorithm for visible tumor targets. Compared with previous tracking methods, the authors’ algorithm has the potential to improve tracking accuracy in radiation therapy. In addition, real-time tumor boundary information within the irradiation field will potentially be useful for further applications, such as adaptive beam delivery and dose evaluation.
Control method of Three-phase Four-leg converter based on repetitive control
NASA Astrophysics Data System (ADS)
Hui, Wang
2018-03-01
This research takes a magnetic levitation wind power generation system as its object. To address the power quality problems caused by unbalanced loads in the power supply system, we combined the characteristics of the magnetic levitation wind power generation system with the repetitive control principle and proposed an independent control strategy for a three-phase four-leg converter. Based on the symmetric component method, a second-order generalized integrator is used to generate the positive- and negative-sequence signals, and decoupling control is carried out in the synchronous rotating reference frame, where the positive- and negative-sequence voltages are regulated by double closed-loop PI control. To eliminate the steady-state error caused by the fundamental-frequency fluctuation of the zero-sequence component, a PI regulator with repetitive control is introduced. Simulation results in Matlab/Simulink show that the proposed control scheme effectively suppresses the disturbance caused by unbalanced loads and maintains load voltage balance. The scheme is easy to implement and markedly improves the quality of the independent power supply system.
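The symmetric component method referred to above splits three unbalanced phase phasors into zero-, positive- and negative-sequence sets using the Fortescue transform. The sketch below applies it to complex phasors. In the described controller, a second-order generalized integrator would supply the quadrature signals needed to obtain these components from time-domain measurements in real time; that part is not reproduced, and the function name is hypothetical.

```python
import cmath

A = cmath.exp(2j * cmath.pi / 3)  # Fortescue operator: 120-degree rotation

def symmetrical_components(va, vb, vc):
    """Return (zero, positive, negative) sequence phasors referred to phase a."""
    v0 = (va + vb + vc) / 3
    v1 = (va + A * vb + A**2 * vc) / 3
    v2 = (va + A**2 * vb + A * vc) / 3
    return v0, v1, v2
```

A balanced positive-sequence set (Va = 1, Vb = a², Vc = a) yields V1 = 1 and V0 = V2 = 0; any nonzero V0 is the zero-sequence component that the fourth converter leg and the repetitive controller are meant to handle.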
Peng, Yousong; Yang, Lei; Li, Honglei; Zou, Yuanqiang; Deng, Lizong; Wu, Aiping; Du, Xiangjun; Wang, Dayan; Shu, Yuelong; Jiang, Taijiao
2016-08-15
Timely surveillance of the antigenic dynamics of the influenza virus is critical for accurate selection of vaccine strains, which is important for effective prevention of viral spread and infection. Here, we provide a computational platform, called PREDAC-H3, for antigenic surveillance of human influenza A(H3N2) virus based on the sequence of the surface protein hemagglutinin (HA). PREDAC-H3 not only determines, from HA sequences, the antigenic variants and the antigenic cluster (grouped by similar antigenicity) to which a virus belongs, but also allows visualization of the spatial distribution and temporal dynamics of antigenic clusters of viruses isolated from around the world, thus assisting in antigenic surveillance of human influenza A(H3N2) virus. It is publicly available at http://biocloud.hnu.edu.cn/influ411/html/index.php. Contact: yshu@cnic.org.cn or taijiao@moon.ibp.ac.cn. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Measuring patterns in team interaction sequences using a discrete recurrence approach.
Gorman, Jamie C; Cooke, Nancy J; Amazeen, Polemnia G; Fouse, Shannon
2012-08-01
Recurrence-based measures of communication determinism and pattern information are described and validated using previously collected team interaction data. Research on team coordination dynamics has revealed that "mixing" team membership can lead to flexible interaction processes, whereas keeping a team "intact" can lead to rigid interaction processes. We hypothesized that the communication of intact teams would have greater determinism and higher pattern information than that of mixed teams. Determinism and pattern information were measured from the communication sequences of three-person Uninhabited Air Vehicle teams over a series of 40-minute missions. Because team members communicated using push-to-talk buttons, communication sequences were generated automatically during each mission. The Composition × Mission determinism effect was significant: intact teams' determinism increased over missions, whereas mixed teams' determinism did not change. Intact teams had significantly higher maximum pattern information than mixed teams. Results from these new communication analysis methods converge with content-based methods and support our hypotheses. Because they are not content based, and because they are automatic and fast, these new methods may be amenable to real-time communication pattern analysis.
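Determinism, as used in recurrence quantification analysis, is the share of recurrent points in a recurrence plot that fall on diagonal lines, i.e. repeated runs of the same sequence of events. For a discrete sequence, point (i, j) is recurrent when events i and j are identical. A minimal sketch, assuming communication is coded as one symbol per event (for example, which team member pressed the push-to-talk button) and a minimum line length of 2; the pattern-information measure is not reproduced, and the function name is hypothetical.

```python
from itertools import groupby

def determinism(events, l_min=2):
    """Fraction of recurrent points lying on diagonal lines of length >= l_min.

    events: a categorical sequence, e.g. speaker codes in order of talk.
    The main diagonal (line of identity) is excluded; because the plot is
    symmetric, scanning the upper triangle gives the same ratio.
    """
    n = len(events)
    recurrent = on_lines = 0
    for offset in range(1, n):
        diagonal = (events[i] == events[i + offset] for i in range(n - offset))
        for is_recurrent, run in groupby(diagonal):
            length = sum(1 for _ in run)
            if is_recurrent:
                recurrent += length
                if length >= l_min:
                    on_lines += length
    return on_lines / recurrent if recurrent else 0.0
```

A strictly repeating turn order such as `["P1", "P2", "P3", "P1", "P2", "P3"]` gives a determinism of 1.0, while isolated repeats that never extend into runs give 0.0.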
Welch, Brandon M; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku
2014-01-01
Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time.
Programming for Effective Concept Learning: Where Should the Branches Go and Why?
ERIC Educational Resources Information Center
Driscoll, Marcy P.; And Others
The two studies detailed in this paper investigated the effects of adaptive sequencing of examples and adaptive feedback on concept learning using computer-based instruction. In the first study, two groups of undergraduate students progressed through a set of five behavior management concepts presented in the rational set generator framework.…
Sargsyan, Ori
2012-05-25
Hitchhiking and severe bottleneck effects affect the dynamics of genetic diversity of a population by inducing homogenization at a single locus and at the genome-wide scale, respectively. As a result, identifying and differentiating the signatures of such events from DNA sequence data at a single locus is challenging. This study develops an analytical framework for identifying and differentiating recent homogenization events at multiple neutral loci in low-recombination regions. The dynamics of genetic diversity at a locus after a recent homogenization event are modeled according to the infinite-sites mutation model and the Wright-Fisher model of reproduction with constant population size. In this setting, I derive analytical expressions for the distribution, mean, and variance of the number of polymorphic sites in a random sample of DNA sequences from a locus affected by a recent homogenization event. Based on this framework, three likelihood-ratio based tests are presented for identifying and differentiating recent homogenization events at multiple loci. Lastly, I apply the framework to two data sets. First, I consider human DNA sequences from four non-coding loci on different chromosomes to infer the evolutionary history of modern human populations. The results suggest, in particular, that recent homogenization events at the loci are identifiable when the effective human population size is 50000 or greater, in contrast to 10000, and the estimates of the recent homogenization events agree with the “Out of Africa” hypothesis. Second, I use HIV DNA sequences from HIV-1-infected patients to infer the times of HIV seroconversion. The estimates are contrasted with other estimates derived as the midpoint between the last HIV-negative and first HIV-positive screening tests. Finally, the results show that significant discrepancies can exist between the estimates.
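The statistic at the centre of this framework, the number of polymorphic (segregating) sites in a sample, is straightforward to compute from aligned sequences. For comparison, the sketch also gives Watterson's expectation E[S] = θ·Σ_{i=1}^{n-1} 1/i for a neutral population at equilibrium. This is the standard baseline, not the post-homogenization distribution derived in the study, and the function names are hypothetical.

```python
def segregating_sites(aligned_seqs):
    """Number of alignment columns with more than one distinct base."""
    if not aligned_seqs:
        return 0
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for column in zip(*aligned_seqs) if len(set(column)) > 1)

def watterson_expected_sites(theta, n):
    """Expected segregating sites for n sequences under the standard neutral model."""
    return theta * sum(1.0 / i for i in range(1, n))
```

A recent homogenization event would be expected to push the observed count well below the equilibrium expectation, which is the kind of contrast the likelihood-ratio tests formalize.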
Wang, Xiaolong; Li, Lin; Zhao, Jiaxin; Li, Fangliang; Guo, Wei; Chen, Xia
2017-04-01
To evaluate the effects of different preservation methods (storage in a -20°C ice chest, preservation in liquid nitrogen, and drying in silica gel) and preservation times (three weeks and three years) on inter simple sequence repeat (ISSR) and random amplified polymorphic DNA (RAPD) analyses of various botanical specimens (including broad-leaved, needle-leaved and succulent plants), we performed a statistical analysis based on the number of bands, genetic index and cluster analysis. The results demonstrate that these preservation methods can provide sufficient amounts of genomic DNA for ISSR and RAPD analyses; however, the effects of the different preservation methods on these analyses vary significantly, whereas preservation time has little effect. Our results provide a reference for researchers selecting the most suitable preservation method for their study subject in molecular marker analyses based on genomic DNA. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.
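Band-based similarity is a common input to the cluster analysis of ISSR and RAPD profiles. A minimal sketch, assuming each sample is scored as a binary presence/absence vector over the same set of bands. Dice and Jaccard coefficients are standard choices, but the abstract does not say which index the authors used, and the function name is hypothetical.

```python
def band_similarity(a, b):
    """Dice and Jaccard similarity between two binary band profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same bands")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    n_a = sum(1 for x in a if x)
    n_b = sum(1 for y in b if y)
    union = n_a + n_b - shared
    dice = 2 * shared / (n_a + n_b) if (n_a + n_b) else 1.0
    jaccard = shared / union if union else 1.0
    return dice, jaccard
```

A pairwise matrix of these coefficients (or 1 minus them, as distances) can then be passed to a hierarchical clustering routine such as UPGMA.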
Mulkern, Robert; Haker, Steven; Mamata, Hatsuho; Lee, Edward; Mitsouras, Dimitrios; Oshio, Koichi; Balasubramanian, Mukund; Hatabu, Hiroto
2014-03-01
Lung parenchyma is challenging to image with proton MRI. The large air space results in ~1/5th as many signal-generating protons compared to other organs. Air/tissue magnetic susceptibility differences lead to strong magnetic field gradients throughout the lungs and to broad frequency distributions, much broader than within other organs. Such distributions have been the subject of experimental and theoretical analyses which may reveal aspects of lung microarchitecture useful for diagnosis. Their most immediate relevance to current imaging practice is to cause rapid signal decays, commonly discussed in terms of short T2* values of 1 ms or lower at typical imaging field strengths. Herein we provide a brief review of previous studies describing and interpreting proton lung spectra. We then link these broad frequency distributions to rapid signal decays, though not necessarily the exponential decays generally used to define T2* values. We examine how these decays influence observed signal intensities and spatial mapping features associated with the most prominent torso imaging sequences, including spoiled gradient and spin echo sequences. Effects of imperfect refocusing pulses on the multiple echo signal decays in single shot fast spin echo (SSFSE) sequences and effects of broad frequency distributions on balanced steady state free precession (bSSFP) sequence signal intensities are also provided. The theoretical analyses are based on the concept of explicitly separating the effects of reversible and irreversible transverse relaxation processes, thus providing a somewhat novel and more general framework from which to estimate lung signal intensity behavior in modern imaging practice.
Mulkern, Robert; Haker, Steven; Mamata, Hatsuho; Lee, Edward; Mitsouras, Dimitrios; Oshio, Koichi; Balasubramanian, Mukund; Hatabu, Hiroto
2014-01-01
Lung parenchyma is challenging to image with proton MRI. The large air space results in ~1/5th as many signal-generating protons compared to other organs. Air/tissue magnetic susceptibility differences lead to strong magnetic field gradients throughout the lungs and to broad frequency distributions, much broader than within other organs. Such distributions have been the subject of experimental and theoretical analyses which may reveal aspects of lung microarchitecture useful for diagnosis. Their most immediate relevance to current imaging practice is to cause rapid signal decays, commonly discussed in terms of short T2* values of 1 ms or lower at typical imaging field strengths. Herein we provide a brief review of previous studies describing and interpreting proton lung spectra. We then link these broad frequency distributions to rapid signal decays, though not necessarily the exponential decays generally used to define T2* values. We examine how these decays influence observed signal intensities and spatial mapping features associated with the most prominent torso imaging sequences, including spoiled gradient and spin echo sequences. Effects of imperfect refocusing pulses on the multiple echo signal decays in single shot fast spin echo (SSFSE) sequences and effects of broad frequency distributions on balanced steady state free precession (bSSFP) sequence signal intensities are also provided. The theoretical analyses are based on the concept of explicitly separating the effects of reversible and irreversible transverse relaxation processes, thus providing a somewhat novel and more general framework from which to estimate lung signal intensity behavior in modern imaging practice. PMID:25228852
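The link between a broad frequency distribution and rapid signal decay can be checked numerically: the signal at time t is, up to phase, the Fourier transform of the frequency distribution, |S(t)| = |Σ p(f) e^{i2πft}|. A Lorentzian distribution gives an exponential decay (the usual T2* model), whereas a Gaussian distribution with standard deviation σ gives exp(-2π²σ²t²), a non-exponential decay. The sketch below assumes a discretized distribution on a uniform frequency grid and ignores irreversible (T2) relaxation; the function names are hypothetical.

```python
import numpy as np

def signal_magnitude(freqs_hz, weights, times_s):
    """|S(t)| for spins distributed over off-resonance frequencies (T2 decay ignored)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                    # normalize so |S(0)| = 1
    phases = 2j * np.pi * np.outer(np.atleast_1d(times_s), freqs_hz)
    return np.abs(np.exp(phases) @ w)

def gaussian_decay(sigma_hz, t_s):
    """Closed form for a Gaussian distribution with standard deviation sigma_hz."""
    return np.exp(-2 * np.pi**2 * sigma_hz**2 * np.asarray(t_s, dtype=float) ** 2)
```

For example, a Gaussian with σ = 200 Hz sampled every 1 Hz out to ±1600 Hz gives |S(1 ms)| close to `gaussian_decay(200, 1e-3)`, about 0.45. Replacing the Gaussian weights with a Lorentzian of half-width Δ reproduces the exponential decay exp(-2πΔt), i.e. T2* = 1/(2πΔ).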
Simulation-Based Evaluation of Learning Sequences for Instructional Technologies
ERIC Educational Resources Information Center
McEneaney, John E.
2016-01-01
Instructional technologies critically depend on systematic design, and learning hierarchies are a commonly advocated tool for designing instructional sequences. But hierarchies routinely allow numerous sequences, and choosing an optimal sequence remains an unsolved problem. This study explores a simulation-based approach to modeling learning…
PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.
Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred
2018-01-01
The design of highly diverse phage display libraries is based on the assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing the randomness, and therefore the diversity, of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences derived from base-to-residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage at both the DNA base and translated protein residue levels, which has been missing from existing toolsets and the literature. The open source program PuLSE is available in two formats: a C++ source code package for compilation and integration into existing bioinformatics pipelines, and precompiled binaries for ease of use.
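The kind of profiling described above (unique DNA sequence counts plus translated peptide counts from fastq reads) can be sketched in a few lines. This is not PuLSE's code, which is written in C++. It is a minimal Python illustration assuming in-frame reads of the randomized region, the standard genetic code, and hypothetical helper names (`fastq_sequences`, `translate`, `profile`); the expected-frequency normalization that PuLSE also performs is not shown.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record (header, sequence, '+', qualities)."""
    lines = [ln.rstrip("\n") for ln in lines]
    for i in range(0, len(lines) - 3, 4):
        if lines[i].startswith("@"):
            yield lines[i + 1].upper()

def translate(seq):
    """Translate complete codons in frame 0; '*' marks stop codons, 'X' marks codons with ambiguous bases."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))

def profile(seqs):
    """Return (unique DNA sequence counts, translated peptide counts)."""
    seqs = list(seqs)
    return Counter(seqs), Counter(translate(s) for s in seqs)

fq = ["@r1", "ATGGCC", "+", "IIIIII",
      "@r2", "ATGGCC", "+", "IIIIII",
      "@r3", "ATGGCT", "+", "IIIIII"]
dna_counts, peptide_counts = profile(fastq_sequences(fq))
```

In this toy input, two distinct DNA sequences collapse to one peptide, which is why counting at both the base and residue levels is informative.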
Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences
Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.
2012-01-01
ssRNA viruses have high levels of genomic divergence, which can make genomic characterization of new viruses difficult with traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. A draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads, with orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid levels, respectively, with the complete Oropouche virus L segment, and 73% and 81% identity at the nucleotide and amino acid levels, respectively, with a partial Caraparu virus L segment. The results demonstrate the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136
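Identity figures such as "63% at the nucleotide level" depend on how the denominator is defined, and the abstract does not state which definition was used. The sketch below uses one common convention: matches divided by the number of aligned columns in which neither sequence has a gap. Both the convention and the function name are assumptions for illustration, and the input must already be a pairwise alignment.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap ('-').

    Assumed convention: gap columns are excluded from the denominator. Other tools
    divide by alignment length or by the shorter sequence length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0

print(percent_identity("ACGT-A", "ACGTTT"))
```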
Oh, Ja-Young; Do, Hyun Jung; Lee, Seungok; Jang, Ja-Hyun; Cho, Eun-Hae; Jang, Dae-Hyun
2016-12-01
Next-generation sequencing (NGS) approaches, such as whole-genome sequencing, whole-exome sequencing, and targeted panel sequencing, have been applied to the diagnosis of many genetic diseases and are in the process of replacing traditional methods of genetic analysis. Clinical exome sequencing (CES), which provides not only sequence variation data but also clinical interpretation, aids in reaching a final conclusion with regard to genetic diagnosis. Sequencing genes with clinical relevance, rather than the whole exome, might be more suitable for the diagnosis of known hereditary diseases with genetic heterogeneity. Here, we present the clinical usefulness of CES for the diagnosis of hereditary spastic paraplegia (HSP). We report the case of a patient who was strongly suspected of having HSP based on her clinical manifestations. HSP is a disease with high genetic heterogeneity: 72 different loci and 59 genes have been identified so far. Therefore, the traditional approach to diagnosing HSP with genetic analysis is very challenging and time-consuming. CES with the TruSight One Sequencing Panel, which enriches about 4,800 genes with clinical relevance, revealed compound heterozygous mutations in SPG11. A single workflow and procedure can provide the results of genetic analysis, and CES with enrichment of clinically relevant genes is a cost-effective and time-saving diagnostic tool for diseases with genetic heterogeneity, including HSP.
Effect of base sequence on the DNA cross-linking properties of pyrrolobenzodiazepine (PBD) dimers
Rahman, Khondaker M.; James, Colin H.; Thurston, David E.
2011-01-01
Pyrrolo[2,1-c][1,4]benzodiazepine (PBD) dimers are synthetic sequence-selective DNA minor-groove cross-linking agents that possess two electrophilic imine moieties (or their equivalent) capable of forming covalent aminal linkages with guanine C2-NH2 functionalities. The PBD dimer SJG-136, which has a C8–O–(CH2)3–O–C8′ central linker joining the two PBD moieties, is currently undergoing phase II clinical trials, and current research is focused on developing analogues of SJG-136 with different linker lengths and substitution patterns. Using a reversed-phase ion-pair HPLC/MS method to evaluate interactions with oligonucleotides of varying length and sequence, we recently reported (JACS, 2009, 131, 13756) that SJG-136 can form three different types of adducts: inter- and intrastrand cross-linked adducts, and mono-alkylated adducts. These studies have now been extended to include PBD dimers with a longer central linker (C8–O–(CH2)5–O–C8′), demonstrating that the type and distribution of adducts appear to depend on (i) the length of the C8/C8′ linker connecting the two PBD units, (ii) the positioning of the two reactive guanine bases on the same or opposite strands, and (iii) their separation (i.e. the number of base pairs, usually ATs, between them). Based on these data, a set of rules is emerging that can be used to predict the DNA-interaction behaviour of a PBD dimer of a particular C8–C8′ linker length towards a given DNA sequence. These observations suggest that it may be possible to design PBD dimers to target specific DNA sequences. PMID:21427082
Buchmueller, Karen L; Staples, Andrew M; Howard, Cameron M; Horick, Sarah M; Uthe, Peter B; Le, N Minh; Cox, Kari K; Nguyen, Binh; Pacheco, Kimberly A O; Wilson, W David; Lee, Moses
2005-01-19
Pyrrole (Py) and imidazole (Im) polyamides can be designed to target specific DNA sequences. The effects of the pyrrole and imidazole arrangement, and of the DNA sequence, on sequence specificity and binding affinity have been investigated using DNA melting (ΔTM), circular dichroism (CD), and surface plasmon resonance (SPR) studies. SPR results obtained from a complete set of triheterocyclic polyamides show a dramatic difference between the affinity of f-ImPyIm for its cognate DNA (Keq = 1.9 × 10⁸ M⁻¹) and that of f-PyPyIm for its cognate DNA (Keq = 5.9 × 10⁵ M⁻¹), which could not have been anticipated prior to characterization of these compounds. Moreover, f-ImPyIm has a 10-fold greater affinity for CGCG than distamycin A has for its cognate, AATT. To understand this difference, the triamide dimers are divided into two structural groupings: central and terminal pairings. The four possible central pairings show decreasing selectivity and affinity for their respective cognate sequences: -ImPy- > -PyPy- > -PyIm- ≈ -ImIm-. These results extend the language of current design motifs for polyamide sequence recognition to include the use of "words" for recognizing two adjacent base pairs, rather than "letters" for binding to single base pairs. Thus, polyamides designed to target Watson-Crick base pairs should use the strength of -ImPy- and -PyPy- central pairings. The f/Im and f/Py terminal groups yielded no advantage for their respective C/G or T/A base pairs. The exception is the -ImPy- central pairing, for which f/Im has a 10-fold greater affinity for C/G than f/Py has for T/A.
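One way to read the roughly 320-fold difference between the two SPR association constants is as a free-energy gap, using ΔG° = −RT ln Keq with a 1 M standard state. The sketch below assumes 25 °C. The abstract does not give the measurement temperature, so the resulting numbers are illustrative rather than values reported by the authors.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def binding_free_energy(k_eq_per_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from an association constant in M^-1 (1 M standard state)."""
    return -R_KCAL * temp_k * math.log(k_eq_per_molar)

dg_impyim = binding_free_energy(1.9e8)  # f-ImPyIm with its cognate DNA
dg_pypyim = binding_free_energy(5.9e5)  # f-PyPyIm with its cognate DNA
ddg = dg_pypyim - dg_impyim             # positive: f-ImPyIm binds more tightly
print(round(dg_impyim, 2), round(dg_pypyim, 2), round(ddg, 2))
```

At the assumed temperature, the 1.9 × 10⁸ versus 5.9 × 10⁵ M⁻¹ comparison corresponds to a gap of roughly 3.4 kcal/mol.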
SOMKE: kernel density estimation over data streams by sequences of self-organizing maps.
Cao, Yuan; He, Haibo; Man, Hong
2012-08-01
In this paper, we propose a novel method, SOMKE, for kernel density estimation (KDE) over data streams based on sequences of self-organizing maps (SOMs). In many stream data mining applications, traditional KDE methods are infeasible because of their high computational cost, processing time, and memory requirements. To reduce the time and space complexity, we propose a SOM structure that obtains well-defined data clusters for estimating the underlying probability distributions of incoming data streams. The main idea of this paper is to build a series of SOMs over the data streams via two operations: creating and merging SOM sequences. The creation phase produces SOM sequence entries for windows of the data, capturing clustering information about the incoming data streams. The size of the SOM sequences can be further reduced by combining consecutive entries in the sequence based on their Kullback-Leibler divergence. Finally, the probability density functions over arbitrary time periods along the data streams can be estimated using these SOM sequences. We compare SOMKE with two other KDE methods for data streams, the M-kernel approach and the cluster kernel approach, in terms of accuracy and processing time for various stationary data streams. Furthermore, we investigate the use of SOMKE over nonstationary (evolving) data streams, including a synthetic nonstationary data stream, a real-world financial data stream and a group of network traffic data streams. The simulation results illustrate the effectiveness and efficiency of the proposed approach.
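The merge step described above can be sketched in one dimension, assuming each sequence entry has already been reduced to SOM prototype positions plus the number of samples mapped to each prototype. SOM training is not shown. The shared Gaussian bandwidth, the midpoint-rule KL estimate, the merge threshold, and all function names are hypothetical choices for illustration, not SOMKE's published formulation.

```python
import math

def prototype_kde(prototypes, counts, bandwidth):
    """Density as a count-weighted mixture of Gaussian kernels centred on (assumed) SOM prototypes."""
    total = sum(counts)
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return sum(c * norm * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
                   for p, c in zip(prototypes, counts)) / total
    return density

def kl_divergence(p, q, lo, hi, n=2000):
    """Numerical KL(p || q) on [lo, hi] using the midpoint rule."""
    dx = (hi - lo) / n
    kl = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        px, qx = p(x), q(x)
        if px > 0 and qx > 0:
            kl += px * math.log(px / qx) * dx
    return kl

def merge_if_similar(entry_a, entry_b, bandwidth, threshold, lo, hi):
    """Merge two consecutive (prototypes, counts) entries when their densities are close; else return None."""
    pa = prototype_kde(*entry_a, bandwidth)
    pb = prototype_kde(*entry_b, bandwidth)
    if kl_divergence(pa, pb, lo, hi) < threshold:
        return (entry_a[0] + entry_b[0], entry_a[1] + entry_b[1])
    return None
```

Merging by concatenating prototypes and counts keeps the combined density equal to the count-weighted average of the two window densities. This is one simple way to make later queries over longer time periods cheap.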
Jiang, F; Jin, Q; Liang, L; Zhang, A B; Li, Z H
2014-11-01
Fruit flies in the family Tephritidae are economically important pests that include many species complexes. DNA barcoding has increasingly been shown to be an effective tool for identifying species across a wide range of taxonomic groups, and there are several publications on rapid and accurate identification of fruit flies based on this technique; however, comprehensive analyses of the effectiveness of DNA barcoding for fruit fly identification across large and new taxa have been rare. In this study, we evaluated COI barcode sequences for the diagnosis of fruit flies using 1426 sequences from 73 species of Bactrocera distributed worldwide. Tree-based [neighbour-joining (NJ)], distance-based [Best Match (BM), Best Close Match (BCM) and Minimum Distance (MD)] and character-based methods were used to evaluate barcoding success rates under three treatments: retaining species complexes in the data set, treating each species complex as a single taxon unit, and removing species complexes. Our results indicate that the average divergence between species was 14.04% (0.00-25.16%), whereas within species it was 0.81% (0.00-9.71%). The existence of species complexes greatly reduced barcoding success for Tephritidae: relatively low success rates (74.4% based on BM and BCM and 84.8% based on MD) were obtained when sequences from species complexes were included in the analysis, whereas significantly higher success rates were achieved when the species complexes were treated as single taxa (BM 98.9%, BCM 98.5% and MD 97.5%) or removed from the data set (BM 98.1%, BCM 97.4% and MD 98.2%). © 2014 John Wiley & Sons Ltd.
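The distance-based criteria named above can be sketched as follows, assuming pre-aligned barcode sequences and uncorrected p-distances. Under Best Match, a query takes the species of its closest reference, and ties between different species are ambiguous. Best Close Match additionally requires that distance to fall under a threshold, often derived from intraspecific distances in practice. The threshold value, species labels, and function names here are illustrative assumptions, not the study's settings.

```python
def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences, skipping gap/N sites."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def best_match(query, references):
    """references: list of (species, aligned_sequence). Returns (label, distance); ties across species are ambiguous."""
    scored = [(p_distance(query, seq), sp) for sp, seq in references]
    best = min(d for d, _ in scored)
    species = {sp for d, sp in scored if d == best}
    return (species.pop() if len(species) == 1 else "ambiguous"), best

def best_close_match(query, references, threshold):
    """Best Match restricted to hits within a (hypothetical) distance threshold."""
    label, dist = best_match(query, references)
    return "no match" if dist > threshold else label

refs = [("B. dorsalis", "ACGTACGTAC"), ("B. cucurbitae", "ACGTTCGTTC"), ("B. zonata", "TCGTACGTAC")]
print(best_match("ACGTACGTAA", refs))
```

The `"ambiguous"` outcome is what species complexes tend to produce, because near-identical barcodes from different nominal species tie for the closest match.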
MotionFlow: Visual Abstraction and Aggregation of Sequential Patterns in Human Motion Tracking Data.
Jang, Sujin; Elmqvist, Niklas; Ramani, Karthik
2016-01-01
Pattern analysis of human motion, which is useful in many research areas, requires understanding and comparing different styles of motion patterns. However, working with human motion tracking data to support such analysis poses great challenges. In this paper, we propose MotionFlow, a visual analytics system that provides an effective overview of various motion patterns based on an interactive flow visualization. This visualization formulates a motion sequence as transitions between static poses and aggregates these sequences into a tree diagram to construct a set of motion patterns. The system also allows users to directly reflect the context of the data and their perception of pose similarities when generating representative pose states. We provide local and global controls over the partition-based clustering process. To support users in organizing unstructured motion data into pattern groups, we designed a set of interactions that enable searching for similar motion sequences in the data, detailed exploration of data subsets, and creating and modifying groups of motion patterns. To evaluate the usability of MotionFlow, we conducted a user study with six researchers with expertise in gesture-based interaction design. They used MotionFlow to explore and organize unstructured motion tracking data. Results show that the researchers were able to easily learn how to use MotionFlow, and the system effectively supported their pattern analysis activities, including leveraging their perception and domain knowledge.
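The aggregation idea, in which each motion becomes a sequence of pose-state transitions and shared prefixes merge into a tree, can be sketched as a prefix tree with visit counts. This sketch assumes frames have already been labeled with pose-state identifiers by some clustering step, which is not shown. The dictionary-based tree and the function names are hypothetical and not MotionFlow's implementation.

```python
def collapse_repeats(frame_labels):
    """Turn per-frame pose labels into transitions: consecutive duplicate labels collapse to one state."""
    states = []
    for label in frame_labels:
        if not states or states[-1] != label:
            states.append(label)
    return states

def build_pattern_tree(sequences):
    """Aggregate pose-state sequences into a prefix tree; each node counts the sequences passing through it."""
    root = {"count": 0, "children": {}}
    for seq in sequences:
        node = root
        node["count"] += 1
        for state in collapse_repeats(seq):
            node = node["children"].setdefault(state, {"count": 0, "children": {}})
            node["count"] += 1
    return root

tree = build_pattern_tree([["A", "A", "B", "C"], ["A", "B", "B"], ["A", "D"]])
```

Node counts give edge widths directly in a flow-style rendering: wide branches correspond to common motion patterns, and thin branches correspond to rare variants.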
Feliubadaló, Lídia; Lopez-Doriga, Adriana; Castellsagué, Ester; del Valle, Jesús; Menéndez, Mireia; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Gómez, Carolina; Campos, Olga; Pineda, Marta; González, Sara; Moreno, Victor; Brunet, Joan; Blanco, Ignacio; Serra, Eduard; Capellá, Gabriel; Lázaro, Conxi
2013-01-01
Next-generation sequencing (NGS) is changing genetic diagnosis due to its huge sequencing capacity and cost-effectiveness. The aim of this study was to develop an NGS-based workflow for routine diagnostics for hereditary breast and ovarian cancer syndrome (HBOCS), to improve genetic testing for BRCA1 and BRCA2. An NGS-based workflow was designed using BRCA MASTR kit amplicon libraries followed by GS Junior pyrosequencing. Data analysis combined the freely available Variant Identification Pipeline software with ad hoc R scripts, including a cascade of filters to generate coverage and variant calling reports. A BRCA homopolymer assay was performed in parallel. The research scheme was designed in two parts. First, a Training Set of 28 DNA samples containing 23 unique pathogenic mutations and 213 other variants (33 unique) was used. Second, the workflow was validated in a set of 14 samples from HBOCS families in parallel with the current diagnostic workflow (Validation Set). The NGS-based workflow permitted the identification of all pathogenic mutations and genetic variants, including those located in or close to homopolymers. The use of NGS for detecting copy-number alterations was also investigated. The workflow meets the sensitivity and specificity requirements for the genetic diagnosis of HBOCS and improves on the cost-effectiveness of current approaches. PMID:23249957
Improving protein complex classification accuracy using amino acid composition profile.
Huang, Chien-Hung; Chou, Szu-Yu; Ng, Ka-Lok
2013-09-01
Protein complex prediction approaches are based on the assumptions that complexes have dense protein-protein interactions and that their subunits have high functional similarity. We investigated those assumptions by studying the subunits' interaction topology, sequence similarity and molecular function for human and yeast protein complexes. Including the physicochemical properties of amino acids can provide a better understanding of protein complex properties. Principal component analysis is carried out to determine the major features. Adopting amino acid composition profile information with the SVM classifier serves as an effective post-processing step for complex classification. The improvement relies on primary sequence information only, which is easy to obtain. Copyright © 2013 Elsevier Ltd. All rights reserved.
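An amino acid composition (AAC) profile is simply the fraction of each of the 20 standard residues in a sequence. The sketch below computes it per subunit and, as an assumed way to get one fixed-length vector per complex, averages across subunits. The averaging step and the function names are illustrative choices, and the abstract does not specify how subunit profiles were combined. The resulting 20-dimensional vectors could then be passed to any SVM implementation; that step is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed feature order

def aa_composition(sequence):
    """Fraction of each standard residue in a protein sequence; non-standard symbols are ignored."""
    residues = [a for a in sequence.upper() if a in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]

def complex_profile(subunit_sequences):
    """Assumed complex-level feature: mean AAC vector over the complex's subunits."""
    profiles = [aa_composition(s) for s in subunit_sequences]
    return [sum(column) / len(profiles) for column in zip(*profiles)]

features = complex_profile(["MKTAYIAKQR", "MSDNELLKAL"])
```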
USDA-ARS's Scientific Manuscript database
A bacterial artificial chromosome (BAC) library and BAC-end sequences for Gossypium hirsutum L. have recently been developed. Here we report on genomic-based genome-wide SNP mining utilizing re-sequencing data with a BAC-end sequence reference for twelve G. hirsutum L. lines, one G. barbadense L. li...
Edwards, Jan; Beckman, Mary E; Munson, Benjamin
2004-04-01
Adults' performance on a variety of tasks suggests that phonological processing of nonwords is grounded in generalizations about sublexical patterns over all known words. A small body of research suggests that children's phonological acquisition is similarly based on generalizations over the lexicon. To test this account, production accuracy and fluency were examined in nonword repetitions by 104 children and 22 adults. Stimuli were 22 pairs of nonwords, in which one nonword contained a low-frequency or unattested two-phoneme sequence and the other contained a high-frequency sequence. For a subset of these nonword pairs, segment durations were measured. The same sound was produced with a longer duration (less fluently) when it appeared in a low-frequency sequence, as compared to a high-frequency sequence. Low-frequency sequences were also repeated with lower accuracy than high-frequency sequences. Moreover, children with smaller vocabularies showed a larger influence of frequency on accuracy than children with larger vocabularies. Taken together, these results provide support for a model of phonological acquisition in which knowledge of sublexical units emerges from generalizations made over lexical items.
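The frequency contrast in these stimuli rests on how often each two-phoneme sequence occurs across the lexicon. A minimal way to compute this, assuming a lexicon already transcribed into phoneme lists and using type counts (each word counted once), is sketched below. The study may have used a different, for example token-weighted or position-specific, measure. The phoneme symbols and function names are hypothetical.

```python
from collections import Counter

def biphone_counts(lexicon):
    """Type frequency of adjacent two-phoneme sequences over a lexicon of phoneme lists."""
    counts = Counter()
    for word in lexicon:
        counts.update(zip(word, word[1:]))
    return counts

def min_biphone_frequency(nonword, counts):
    """Lowest lexicon count among the nonword's biphones; 0 means it contains an unattested sequence."""
    pairs = list(zip(nonword, nonword[1:]))
    return min(counts[p] for p in pairs) if pairs else 0

lexicon = [["k", "ae", "t"], ["b", "ae", "t"], ["k", "ae", "p"]]
counts = biphone_counts(lexicon)
```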
Jain, Vipin; Hilton, Benjamin; Lin, Bin; Jain, Anshu; MacKerell, Alexander D.; Zou, Yue; Cho, Bongsup P.
2014-01-01
Cluster DNA damage refers to two or more lesions within a single turn of the DNA helix. Such clustering may occur with bulky DNA lesions and may be responsible for their sequence-dependent repair and mutational outcomes. Here we prepared three 16-mer cluster duplexes in which two fluoroacetylaminofluorene adducts (dG-FAAF) are separated by zero, one and two nucleotides in the E. coli NarI mutational hot spot (5'-CTCTCG1G2CG3CCATCAC-3'): i.e. 5'--CG1*G2*CG3CC--3', 5'--CG1G2*CG3*CC--3', and 5'--CG1*G2CG3*CC--3' [G* = dG-FAAF], respectively. We conducted spectroscopic, thermodynamic, and molecular dynamics studies of these di-FAAF duplexes and compared the results with those of the corresponding mono-FAAF adducts in the same NarI sequence (Nucleic Acids Res. 2012, 3939–3951). Our nucleotide excision repair results showed greater reparability of the di-adducts compared with the corresponding mono-adducts. Moreover, we observed dramatic flanking base sequence effects on their repair efficiency, in the order NarI-G2G3 > -G1G3 > -G1G2. The NMR/CD/UV-melting and MD-simulation results revealed that, in contrast to the mono-adducts, the di-adducts produced a synergistic effect on duplex destabilization. In addition, dG-FAAF adducts at G2G3 and G1G3 destack the neighboring bases, with greater destabilization occurring with the former. Overall, the results indicate the importance of base stacking and related thermal/thermodynamic destabilization in the repair of bulky cluster arylamine DNA adducts. PMID:23841451
2014-01-01
Background Foxtail millet (Setaria italica (L.) Beauv.) is an important gramineous grain-food and forage crop. It is grown worldwide for human and livestock consumption. Its small genome and diploid nature have made foxtail millet a rapidly emerging model for investigating plant architecture, drought tolerance and C4 photosynthesis of grain and bioenergy crops. Therefore, cost-effective, reliable and highly polymorphic molecular markers covering the entire genome are required for diversity, mapping and functional genomics studies in this model species. Results A total of 5,020 highly repetitive microsatellite motifs were isolated from the released genome of the genotype 'Yugu1' by sequence scanning. Based on sequence comparison between S. italica and S. viridis, a set of 788 SSR primer pairs was designed. Of these primers, 733 produced reproducible amplicons and were polymorphic among 28 Setaria genotypes selected from diverse geographical locations. The number of alleles detected by these SSR markers ranged from 2 to 16, with an average polymorphism information content of 0.67. Neighbor-joining cluster analysis of the 28 Setaria genotypes, based on Nei's genetic distance of the SSR data, showed that these SSR markers are highly polymorphic and effective. Conclusions A large set of highly polymorphic SSR markers was successfully and efficiently developed based on genomic sequence comparison between different genotypes of the genus Setaria. The large number of new SSR markers and their placement on the physical map represent a valuable resource for studying diversity, constructing genetic maps, functional gene mapping, QTL exploration and molecular breeding in foxtail millet and its closely related species. PMID:24472631
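Scanning a genome for microsatellite motifs can be sketched with regular-expression backreferences: a unit of k bases followed by at least n−1 copies of itself. The minimum repeat numbers below are hypothetical thresholds, not those used for the 'Yugu1' scan. Motifs that are themselves repeats of a shorter unit (for example "ATAT") are skipped so that they are reported only at their primitive length. Primer design and the S. italica versus S. viridis comparison are not shown.

```python
import re

# Hypothetical minimum repeat numbers per motif length (mono- to hexanucleotide).
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif):
    """True unless the motif is a repeat of a shorter unit (e.g. 'ATAT' = 'AT' * 2)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)

def find_ssrs(sequence, min_repeats=DEFAULT_MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = sequence.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one unit; \1{n-1,} requires at least n-1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)

demo = "CC" + "AG" * 8 + "TT" + "A" * 12 + "C"
print(find_ssrs(demo))
```

Dedicated SSR finders also handle imperfect and compound repeats and report canonical motifs. This sketch finds only perfect repeats and reports each one in the frame where its match starts.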
Zhang, Shuo; Tang, Chanjuan; Zhao, Qiang; Li, Jing; Yang, Lifang; Qie, Lufeng; Fan, Xingke; Li, Lin; Zhang, Ning; Zhao, Meicheng; Liu, Xiaotong; Chai, Yang; Zhang, Xue; Wang, Hailong; Li, Yingtao; Li, Wen; Zhi, Hui; Jia, Guanqing; Diao, Xianmin
2014-01-28
Foxtail millet (Setaria italica (L.) Beauv.) is an important gramineous grain-food and forage crop. It is grown worldwide for human and livestock consumption. Its small genome and diploid nature have made foxtail millet a rapidly emerging model for investigating plant architecture, drought tolerance and C4 photosynthesis of grain and bioenergy crops. Therefore, cost-effective, reliable and highly polymorphic molecular markers covering the entire genome are required for diversity, mapping and functional genomics studies in this model species. A total of 5,020 highly repetitive microsatellite motifs were isolated from the released genome of the genotype 'Yugu1' by sequence scanning. Based on sequence comparison between S. italica and S. viridis, a set of 788 SSR primer pairs was designed. Of these primers, 733 produced reproducible amplicons and were polymorphic among 28 Setaria genotypes selected from diverse geographical locations. The number of alleles detected by these SSR markers ranged from 2 to 16, with an average polymorphism information content of 0.67. Neighbor-joining cluster analysis of the 28 Setaria genotypes, based on Nei's genetic distance of the SSR data, showed that these SSR markers are highly polymorphic and effective. A large set of highly polymorphic SSR markers was successfully and efficiently developed based on genomic sequence comparison between different genotypes of the genus Setaria. The large number of new SSR markers and their placement on the physical map represent a valuable resource for studying diversity, constructing genetic maps, functional gene mapping, QTL exploration and molecular breeding in foxtail millet and its closely related species.
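The polymorphism information content (PIC) quoted above summarizes, for each marker, how informative its allele frequencies are. The sketch uses Botstein's definition, PIC = 1 − Σ pᵢ² − Σ_{i<j} 2pᵢ²pⱼ². Many SSR studies instead report the simpler 1 − Σ pᵢ², and the abstract does not say which was used, so treat the choice of formula as an assumption.

```python
from itertools import combinations

def pic(allele_counts):
    """Botstein PIC from allele counts or frequencies (normalized internally)."""
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1 - homozygosity - correction

# Averaging per-marker PIC values gives a panel-level figure like the 0.67 reported.
markers = [[10, 10], [5, 5, 5, 5], [18, 1, 1]]
mean_pic = sum(pic(m) for m in markers) / len(markers)
```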
Model-based quality assessment and base-calling for second-generation sequencing data.
Bravo, Héctor Corrada; Irizarry, Rafael A
2010-09-01
Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. Comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G or T between 30 and 100 characters long), which are produced by base-calling, a complex processing of noisy continuous fluorescence intensity measurements. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single-nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. 
Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.
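Per-base quality is conventionally reported on the Phred scale, Q = −10 log₁₀ p, where p is the estimated probability that the base call is wrong. The sketch below shows that standard conversion and the expected number of errors in a read. It is not the authors' model. The ASCII offset of 33 is an assumption: early Illumina/Solexa pipelines used an offset of 64 and, in some versions, the Solexa odds-based scale Q = −10 log₁₀(p/(1−p)).

```python
import math

def phred_to_error(q):
    """Error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred quality for an error probability: Q = -10 log10(p)."""
    return -10 * math.log10(p)

def decode_quality(qual_string, offset=33):
    """Decode an ASCII-encoded quality string (offset 33 assumed; older Solexa/Illumina data used 64)."""
    return [ord(c) - offset for c in qual_string]

def expected_errors(qual_string, offset=33):
    """Expected number of miscalled bases in a read: sum of per-base error probabilities."""
    return sum(phred_to_error(q) for q in decode_quality(qual_string, offset))
```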
Popovtzer, Aron; Ibrahim, Mohannad; Tatro, Daniel; Feng, Felix Y; Ten Haken, Randall K; Eisbruch, Avraham
2014-09-01
Magnetic resonance imaging (MRI) has been found to be better than computed tomography for defining the extent of primary gross tumor volume (GTV) in advanced nasopharyngeal cancer. It is routinely applied for target delineation in planning radiotherapy. However, the specific MRI sequences/planes that should be used are unknown. Twelve patients with nasopharyngeal cancer underwent primary GTV evaluation with gadolinium-enhanced axial T1 weighted image (T1) and T2 weighted image (T2), coronal T1, and sagittal T1 sequences. Each sequence was registered with the planning computed tomography scans. Planning target volumes (PTVs) were derived by uniform expansions of the GTVs. The volumes encompassed by the various sequences/planes, and the volumes common to all sequences/planes, were compared quantitatively and anatomically to the volume delineated by the commonly used axial T1-based dataset. Addition of the axial T2 sequence increased the axial T1-based GTV by 12% on average (p = 0.004), and composite evaluations that included the coronal T1 and sagittal T1 planes increased the axial T1-based GTVs by 30% on average (p = 0.003). The axial T1-based PTVs were increased by 20% by the additional sequences (p = 0.04). Each sequence/plane added unique volume extensions. The GTVs common to all the T1 planes accounted for 38% of the total volumes of all the T1 planes. Anatomically, addition of the coronal and sagittal-based GTVs extended the axial T1-based GTV caudally and cranially, notably to the base of the skull. Adding MRI planes and sequences to the traditional axial T1 sequence yields significant quantitative and anatomically important extensions of the GTVs and PTVs. For accurate target delineation in nasopharyngeal cancer, we recommend that GTVs be outlined in all MRI sequences/planes and registered with the planning computed tomography scans.
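The composite and common volumes compared above are the union and intersection of the contours once they are registered to the planning CT. A minimal sketch represents each registered contour as a set of voxel indices on the CT grid. The rasterization and registration steps, the sequence names, and the voxel size are assumptions for illustration, and PTV expansion is not shown.

```python
def volume_stats(masks, voxel_volume_ml):
    """masks: dict of name -> set of (i, j, k) voxel indices, all on the same planning-CT grid."""
    union = set().union(*masks.values())
    common = set.intersection(*masks.values())
    return {
        "composite_ml": len(union) * voxel_volume_ml,
        "common_ml": len(common) * voxel_volume_ml,
        "common_fraction": len(common) / len(union) if union else 0.0,
    }

def percent_increase(reference, composite):
    """Percent growth of a composite voxel set over a reference voxel set (e.g. axial T1 alone)."""
    return 100.0 * (len(composite) - len(reference)) / len(reference)

axial_t1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}   # toy voxel sets
axial_t2 = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}
stats = volume_stats({"axial_T1": axial_t1, "axial_T2": axial_t2}, voxel_volume_ml=0.001)
```

Here `common_fraction` corresponds to the kind of "common to all planes as a share of the total" figure quoted in the abstract.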
Ishikawa, Sohta A; Inagaki, Yuji; Hashimoto, Tetsuo
2012-01-01
In phylogenetic analyses of nucleotide sequences, 'homogeneous' substitution models, which assume stationary base composition across a tree, are widely used, although individual sequences may have distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that acquired similar base frequencies in parallel. This potential difficulty can be countered by two approaches, 'RY-coding' and 'non-homogeneous' models. The former converts the four bases into purines and pyrimidines to normalize base frequencies across a tree, while the latter explicitly incorporates heterogeneity in base frequency. Both approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined in earlier simulation studies. Here, we assessed the performance of maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses performed better than homogeneous model-based analyses. Curiously, the performance of the RY-coding analysis appeared to be more strongly affected by the substitution-process settings used for sequence simulation than that of the non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
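RY-coding itself is a simple recoding: purines (A, G) become R, and pyrimidines (C, T/U) become Y. One consequence is that a GC-rich sequence and an AT-rich sequence with the same purine/pyrimidine pattern become indistinguishable. This is how the recoding removes GC-content convergence as a signal. The sketch keeps gaps and maps anything else to N. These handling choices and the function names are illustrative, not taken from the study.

```python
RY_MAP = {"A": "R", "G": "R", "R": "R", "C": "Y", "T": "Y", "U": "Y", "Y": "Y", "-": "-"}

def ry_code(seq):
    """Recode nucleotides as purine (R) / pyrimidine (Y); gaps are kept, other symbols become N."""
    return "".join(RY_MAP.get(base, "N") for base in seq.upper())

def composition(seq, alphabet):
    """Relative frequency of each symbol in `alphabet`, ignoring symbols outside it."""
    symbols = [c for c in seq if c in alphabet]
    return {c: symbols.count(c) / len(symbols) for c in alphabet}

gc_rich, at_rich = "GGCC", "AATT"
print(composition(gc_rich, "ACGT"), composition(at_rich, "ACGT"))
print(composition(ry_code(gc_rich), "RY"), composition(ry_code(at_rich), "RY"))
```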
Spring-Connell, Alexander M.; Evich, Marina G.; Debelak, Harald; Seela, Frank; Germann, Markus W.
2016-01-01
A truly universal nucleobase enables a host of novel applications, such as simplified templates for PCR primers, randomized sequencing and DNA-based devices. A universal base must pair indiscriminately with each of the canonical bases with little or, preferably, no destabilization of the overall duplex. In reality, many candidates either destabilize the duplex or do not pair indiscriminately. The novel base 8-aza-7-deazaadenine (pyrazolo[3,4-d]pyrimidin-4-amine) N8-(2′-deoxyribonucleoside), a deoxyadenosine analog (UB), pairs with each of the natural DNA bases with little sequence preference. We have used NMR complemented with molecular dynamics calculations to characterize the structure and dynamics of a UB incorporated into a DNA duplex. The UB participates in base stacking with little to no perturbation of the local structure, yet forms an unusual base pair that samples multiple conformations. These local dynamics result in the complete disappearance of a single UB proton resonance under native conditions. Accommodation of the UB is additionally stabilized via heightened backbone conformational sampling. NMR combined with various computational techniques has allowed a comprehensive characterization of both the structural and dynamic effects of the UB in a DNA duplex and underlines that the UB is a strong candidate for universal base applications. PMID:27566150
Xie, Jing; Lu, Xiongxiong; Wu, Xue; Lin, Xiaoyi; Zhang, Chao; Huang, Xiaofang; Chang, Zhili; Wang, Xinjing; Wen, Chenlei; Tang, Xiaomei; Shi, Minmin; Zhan, Qian; Chen, Hao; Deng, Xiaxing; Peng, Chenghong; Li, Hongwei; Fang, Yuan; Shao, Yang; Shen, Baiyong
2016-05-01
Targeted therapies, including monoclonal antibodies and small-molecule inhibitors, have dramatically changed the treatment of cancer over the past 10 years. Their therapeutic advantages are greater tumor specificity and fewer side effects. For precisely tailoring available targeted therapies to each individual or a subset of cancer patients, next-generation sequencing (NGS) has been used as a promising diagnostic tool owing to its accuracy, sensitivity, and high throughput. We developed and validated an NGS-based cancer genomic diagnostic assay targeting 115 prognosis- and therapy-relevant genes on multiple specimen types, including blood, tumor tissue, and body fluid, from 10 patients with different cancer types. The sequencing data were then analyzed by clinically applicable analytical pipelines developed in house. We assessed the analytical sensitivity, specificity, and accuracy of the NGS-based molecular diagnosis. Our analytical pipelines were also capable of detecting base substitutions, indels, and gene copy number variations (CNVs). For instance, several actionable mutations in EGFR, PIK3CA, TP53, and KRAS were detected, indicating drug susceptibility and resistance in the cases of lung cancer. Our study shows that NGS-based molecular diagnosis is more sensitive and comprehensive in detecting genomic alterations in cancer, and supports its direct clinical use for guiding targeted therapy.
Kutyavin, Igor V.
2013-01-01
This article describes a new approach for the sequence-specific detection of nucleic acids in real-time polymerase chain reaction (PCR) using fluorescently labeled oligonucleotide probes. The method is based on the production of PCR amplicons that fold into dumbbell-like secondary structures carrying a specially designed ‘probe-luring’ sequence at their 5′ ends. Hybridization of this sequence to a complementary ‘anchoring’ tail introduced at the 3′ end of a fluorescent probe enables the probe to bind to its target during PCR, and subsequent probe cleavage produces the fluorescence signal. As shown in the study, this amplicon-endorsed and guided formation of the probe–target duplex allows the use of extremely short oligonucleotide probes, as short as tetranucleotides. In particular, the short length of the fluorescent probes makes it possible to develop a ‘universal’ probe inventory that is relatively small yet represents all possible sequence variations. The unparalleled cost-effectiveness of the inventory approach is discussed. Despite the short length of the probes, this new method, named Angler real-time PCR, remains highly sequence-specific, and the results of the study indicate that it can be used effectively for quantitative PCR and the detection of polymorphic variations. PMID:24013564
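The claim that a small inventory can cover all sequence variations follows from simple counting: there are 4^k distinct DNA sequences of length k, so a complete set of tetranucleotide probes has only 4^4 = 256 members. The Python sketch below enumerates such an inventory; the function name `probe_inventory` is hypothetical, and it assumes one probe per k-mer without collapsing reverse complements or applying any further design constraints the article may use.

```python
from itertools import product


def probe_inventory(k: int) -> list[str]:
    """Return every DNA sequence of length k: one hypothetical probe per k-mer."""
    return ["".join(bases) for bases in product("ACGT", repeat=k)]


tetramers = probe_inventory(4)
print(len(tetramers))  # 256, i.e. 4**4
print(tetramers[:3])   # ['AAAA', 'AAAC', 'AAAG']

# Inventory size grows fourfold with each extra nucleotide of probe length.
for k in range(4, 9):
    print(k, 4 ** k)
```

Because each added nucleotide quadruples the number of sequences, only very short probes keep a complete inventory small; at eight nucleotides the same approach already needs 65,536 probes.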
SNP discovery by high-throughput sequencing in soybean
2010-01-01
Background: With the advance of new massively parallel genotyping technologies, quantitative trait loci (QTL) fine mapping and map-based cloning have become more feasible approaches for identifying genes underlying important and complex traits. Development of high-density genetic markers in the QTL regions of specific mapping populations is essential for fine mapping and map-based cloning of economically important genes. Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation between the diverse genotypes usually used in QTL mapping studies. The massively parallel sequencing technologies (Roche GS/454, Illumina GA/Solexa, and ABI/SOLiD) have been widely applied to identify genome-wide sequence variations. However, it remains unclear whether sequence data at a low sequencing depth are sufficient to detect the variation present in a given QTL region of interest in a crop genome, and how to prepare sequencing samples for a complex genome such as soybean. Therefore, to identify SNP markers cost-effectively for fine mapping several QTL regions, and to test the validation rate of putative SNPs predicted from Solexa short reads at a low sequencing depth, we evaluated a pooled DNA fragment reduced representation library and SNP detection methods applied to short-read sequences generated by Solexa high-throughput sequencing technology.
Results: A total of 39,022 putative SNPs were identified by the Illumina/Solexa sequencing system using a reduced representation DNA library of two parental lines of a mapping population. The validation rates of these putative SNPs predicted with low and high stringency were 72% and 85%, respectively. One hundred sixty-four SNP markers resulting from the validation of putative SNPs were selected to target a known QTL, thereby increasing the marker density of the targeted region to one marker per 42 kbp.
Conclusions: We have demonstrated how to quickly identify large numbers of SNPs for fine mapping of QTL regions by applying massively parallel sequencing combined with genome complexity reduction techniques. This SNP discovery approach is more efficient for targeting multiple QTL regions in the same genetic population and can be applied to other crops. PMID:20701770
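The abstract does not specify how putative SNPs were called or what the low- and high-stringency settings were. As a hedged illustration only, the Python sketch below calls a SNP between two parental lines from the bases observed at one aligned position, requiring a minimum read depth and a minimum majority-allele fraction in each parent. The function `call_snp`, both threshold sets, and the assumption that each parental line is homozygous are hypothetical and are not the authors' method.

```python
from collections import Counter
from typing import Optional, Tuple


def call_snp(reads_a: str, reads_b: str,
             min_depth: int, min_fraction: float) -> Optional[Tuple[str, str]]:
    """Compare two parental lines at one aligned position.

    reads_a / reads_b hold the base seen in each read covering the position
    (one character per read). Returns (allele_a, allele_b) when both parents
    pass the depth and allele-fraction filters and carry different alleles,
    otherwise None.
    """
    alleles = []
    for reads in (reads_a, reads_b):
        if len(reads) < max(min_depth, 1):
            return None  # too few reads to trust this parent's call
        allele, count = Counter(reads).most_common(1)[0]
        if count / len(reads) < min_fraction:
            return None  # mixed evidence; unexpected if the line is homozygous (assumed)
        alleles.append(allele)
    return (alleles[0], alleles[1]) if alleles[0] != alleles[1] else None


# Hypothetical stringency settings; the study's actual criteria are not given here.
LOW = {"min_depth": 2, "min_fraction": 0.8}
HIGH = {"min_depth": 4, "min_fraction": 0.95}

print(call_snp("AAA", "GGG", **LOW))   # ('A', 'G')
print(call_snp("AAA", "GGG", **HIGH))  # None: depth 3 is below 4
```

Raising the thresholds discards weakly supported sites, trading the number of putative SNPs for a higher expected validation rate; this matches the direction of the 72% versus 85% validation rates reported for low and high stringency, though not necessarily the mechanism the authors used.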