Science.gov

Sample records for novo sequencing approach

  1. A hybrid approach for de novo human genome sequence assembly and phasing.

    PubMed

    Mostovoy, Yulia; Levy-Sakin, Michal; Lam, Jessica; Lam, Ernest T; Hastie, Alex R; Marks, Patrick; Lee, Joyce; Chu, Catherine; Lin, Chin; Džakula, Željko; Cao, Han; Schlebusch, Stephen A; Giorda, Kristina; Schnall-Levin, Michael; Wall, Jeffrey D; Kwok, Pui-Yan

    2016-07-01

    Despite tremendous progress in genome sequencing, the basic goal of producing a phased (haplotype-resolved) genome sequence with end-to-end contiguity for each chromosome at reasonable cost and effort is still unrealized. In this study, we describe an approach to performing de novo genome assembly and experimental phasing by integrating the data from Illumina short-read sequencing, 10X Genomics linked-read sequencing, and BioNano Genomics genome mapping to yield a high-quality, phased, de novo assembled human genome.

  2. A Machine Learning Based Approach to de novo Sequencing of Glycans from Tandem Mass Spectrometry Spectrum.

    PubMed

    Kumozaki, Shotaro; Sato, Kengo; Sakakibara, Yasubumi

    2015-01-01

    Recently, glycomics has been actively studied and various technologies for glycomics have been rapidly developed. Currently, tandem mass spectrometry (MS/MS) is one of the key experimental tools for identification of structures of oligosaccharides. MS/MS can observe MS/MS peaks of fragmented glycan ions including cross-ring ions resulting from internal cleavages, which provide valuable information to infer glycan structures. Thus, the aim of de novo sequencing of glycans is to find the most probable assignments of observed MS/MS peaks to glycan substructures without databases. However, there are few satisfactory algorithms for glycan de novo sequencing from MS/MS spectra. We present a machine learning based approach to de novo sequencing of glycans from MS/MS spectra. First, we build a suitable model for the fragmentation of glycans including cross-ring ions, and implement a solver that employs Lagrangian relaxation with a dynamic programming technique. Then, to optimize scores for the algorithm, we introduce a machine learning technique called structured support vector machines that enables us to learn parameters including scores for cross-ring ions from training data, i.e., known glycan mass spectra. Furthermore, we implement additional constraints for core structures of well-known glycan types including N-linked glycans and O-linked glycans. This enables us to predict more accurate glycan structures if the glycan type of given spectra is known. Computational experiments show that our algorithm performs accurate de novo sequencing of glycans. The implementation of our algorithm and the datasets are available at http://glyfon.dna.bio.keio.ac.jp/. PMID:26671799

  3. Antilope--a Lagrangian relaxation approach to the de novo peptide sequencing problem.

    PubMed

    Andreotti, Sandro; Klau, Gunnar W; Reinert, Knut

    2012-01-01

    Peptide sequencing from mass spectrometry data is a key step in proteome research. Especially de novo sequencing, the identification of a peptide from its spectrum alone, is still a challenge even for state-of-the-art algorithmic approaches. In this paper, we present ANTILOPE, a new fast and flexible approach based on mathematical programming. It builds on the spectrum graph model and works with a variety of scoring schemes. ANTILOPE combines Lagrangian relaxation for solving an integer linear programming formulation with an adaptation of Yen’s k shortest paths algorithm. It shows a significant improvement in running time compared to mixed integer optimization and performs at the same speed as other state-of-the-art tools. We also implemented a generic probabilistic scoring scheme that can be trained automatically for a data set of annotated spectra and is independent of the mass spectrometer type. Evaluations on benchmark data show that ANTILOPE is competitive with the popular state-of-the-art programs PepNovo and NovoHMM both in terms of runtime and accuracy. Furthermore, it offers increased flexibility in the number of considered ion types. ANTILOPE will be freely available as part of the open source proteomics library OpenMS.
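
    The spectrum graph model mentioned in this abstract can be illustrated with a minimal sketch. It assumes idealized input (a sorted list of prefix masses with one score per node) and uses a plain dynamic program for the single best path; it is not ANTILOPE's Lagrangian relaxation or its k shortest paths step. The function name `best_path`, the reduced residue alphabet, and the tolerance are illustrative assumptions.

```python
# Monoisotopic residue masses (Da) for a small illustrative alphabet.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "P": 97.05276, "V": 99.06841,
}


def best_path(prefix_masses, scores, tol=0.01):
    """Return (sequence, score) of the best path through a spectrum graph.

    prefix_masses: sorted node masses, starting at 0.0 and ending at the
                   total residue mass of the peptide (hypothetical input).
    scores:        one score per node, same order as prefix_masses.
    Two nodes are joined by an edge when their mass difference matches a
    residue mass within `tol` Da.
    """
    n = len(prefix_masses)
    neg_inf = float("-inf")
    best = [neg_inf] * n      # best score of any path from node 0 to node j
    back = [None] * n         # (previous node, residue) on that best path
    best[0] = scores[0]
    for j in range(1, n):
        for i in range(j):
            if best[i] == neg_inf:
                continue      # node i is not reachable from the start node
            diff = prefix_masses[j] - prefix_masses[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol and best[i] + scores[j] > best[j]:
                    best[j] = best[i] + scores[j]
                    back[j] = (i, aa)
    if best[-1] == neg_inf:
        return None           # no path explains the full precursor mass
    residues = []
    j = n - 1
    while back[j] is not None:
        i, aa = back[j]
        residues.append(aa)
        j = i
    return "".join(reversed(residues)), best[-1]


# Nodes for the peptide "GAS" plus one unreachable noise node at 100.0 Da.
masses = [0.0, 57.02146, 100.0, 128.05857, 215.0906]
node_scores = [0, 1, 5, 1, 1]
result = best_path(masses, node_scores)
```

    In this toy input the noise node carries the highest score but cannot be connected by any residue mass, so it does not appear on the reported path.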

  4. New Approaches and Technologies to Sequence de novo Plant reference Genomes (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    SciTech Connect

    Schmutz, Jeremy

    2013-03-01

    Jeremy Schmutz of the HudsonAlpha Institute for Biotechnology on "New approaches and technologies to sequence de novo plant reference genomes" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.

  5. A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    SciTech Connect

    Pan, Chongle; Park, Byung H; McDonald, W Hayes; Carey, Patricia A; Banfield, Jillian F.; Verberkmoes, Nathan C; Hettich, Robert {Bob} L; Samatova, Nagiza F

    2010-01-01

    Background High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms. Results In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy. Conclusions Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.
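
    Why mass accuracy matters for sequence tags can be shown with a small sketch. It assumes singly charged fragment ions of the same series, so that the m/z gap between neighbouring peaks equals one residue mass. The helper names `ppm_to_da` and `candidate_residues` and the peak values are hypothetical, and this is not Vonode's scoring system.

```python
# Monoisotopic residue masses (Da); Q and K differ by only about 0.036 Da.
RESIDUE_MASS = {"Q": 128.05858, "K": 128.09496, "E": 129.04259}


def ppm_to_da(mz, ppm):
    """Convert a relative tolerance in ppm to an absolute tolerance in Da."""
    return mz * ppm * 1e-6


def candidate_residues(mz_low, mz_high, tol_da):
    """List residues whose mass matches the gap between two fragment peaks."""
    diff = mz_high - mz_low
    return sorted(aa for aa, mass in RESIDUE_MASS.items()
                  if abs(diff - mass) <= tol_da)


# Two hypothetical fragment peaks separated by one lysine residue.
low_peak, high_peak = 400.0, 528.09496

# 10 ppm at m/z 528 is roughly 0.005 Da, enough to separate K from Q.
high_res = candidate_residues(low_peak, high_peak, ppm_to_da(high_peak, 10))

# A 0.5 Da tolerance, typical of low-resolution data, leaves both candidates.
low_res = candidate_residues(low_peak, high_peak, 0.5)
```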

  6. Two different high throughput sequencing approaches identify thousands of de novo genomic markers for the genetically depleted Bornean elephant.

    PubMed

    Sharma, Reeta; Goossens, Benoit; Kun-Rodrigues, Célia; Teixeira, Tatiana; Othman, Nurzhafarina; Boone, Jason Q; Jue, Nathaniel K; Obergfell, Craig; O'Neill, Rachel J; Chikhi, Lounès

    2012-01-01

    High throughput sequencing technologies are being applied to an increasing number of model species with a high-quality reference genome. The application and analyses of whole-genome sequence data in non-model species with no prior genomic information are currently under way. Recent sequencing technologies provide new opportunities for gathering genomic data in natural populations, laying the empirical foundation for future research in the field of conservation and population genomics. Here we present the case study of the Bornean elephant, which is the most endangered subspecies of Asian elephant and exhibits very low genetic diversity. We used two different sequencing platforms, the Roche 454 FLX (shotgun) and Illumina GAIIx (Restriction site associated DNA, RAD), to evaluate the feasibility of the two methodologies for the discovery of de novo markers (single nucleotide polymorphisms, SNPs, and microsatellites) using low coverage data. Approximately 6,683 (shotgun) and 14,724 (RAD) SNPs were detected within our elephant sequence dataset. Genotyping of a representative sample of 194 SNPs resulted in a SNP validation rate of ~83 to 94% and 17% of the loci were polymorphic with a low diversity (H(o)=0.057). Different numbers of microsatellites were identified through shotgun (27,226) and RAD (868) techniques. Out of all di-, tri-, and tetra-microsatellite loci, 1,706 loci had sufficient flanking regions (shotgun) while only 7 were found with RAD. All microsatellites were monomorphic in the Bornean but polymorphic in another elephant subspecies. Despite using different sample sizes, and the well known differences in the two platforms used regarding sequence length and throughput, the two approaches showed high validation rates. The approaches used here for marker development in a threatened species demonstrate the utility of high throughput sequencing technologies as a starting point for the development of genomic tools in a non-model species and in particular for a

  7. A combined de novo protein sequencing and cDNA library approach to the venomic analysis of Chinese spider Araneus ventricosus.

    PubMed

    Duan, Zhigui; Cao, Rui; Jiang, Liping; Liang, Songping

    2013-01-14

    In past years, spider venoms have attracted increasing attention due to their extraordinary chemical and pharmacological diversity. The recently popularized proteomic method has greatly improved our ability to analyze the proteins in the venom. However, the lack of sequence information for isolated venom proteins dramatically limits the ability to confidently identify venom proteins. In the present paper, the venom from Araneus ventricosus was analyzed using two complementary approaches: 2-DE/Shotgun-LC-MS/MS coupled to MASCOT search and 2-DE/Shotgun-LC-MS/MS coupled to manual de novo sequencing followed by local venom protein database (LVPD) search. The LVPD was constructed with toxin-like protein sequences obtained from the analysis of a cDNA library from A. ventricosus venom glands. Our results indicate that a total of 130 toxin-like protein sequences were unambiguously identified by manual de novo sequencing coupled to LVPD search, accounting for 86.67% of all toxin-like proteins in LVPD. Thus manual de novo sequencing coupled to LVPD search proved to be an extremely effective approach for the analysis of venom proteins. In addition, the approach shows a clear advantage in validating mutant positions of isoforms from the same toxin-like family. Intriguingly, methyl esterification of glutamic acid was discovered for the first time in animal venom proteins by manual de novo sequencing.

  8. De novo sequencing of unique sequence tags for discovery of post-translational modifications of proteins

    SciTech Connect

    Shen, Yufeng; Tolic, Nikola; Hixson, Kim K.; Purvine, Samuel O.; Anderson, Gordon A.; Smith, Richard D.

    2008-10-15

    De novo sequencing holds promise for discovering protein post-translational modifications; however, such an approach is still in its infancy and is not widely applied in proteomics practice due to its limited reliability. In this work, we describe a de novo sequencing approach for discovery of protein modifications through identification of the UStags (Anal. Chem. 2008, 80, 1871-1882). The de novo information was obtained from Fourier-transform tandem mass spectrometry for peptides and polypeptides in a yeast lysate, and the de novo sequences obtained were filtered to define a more limited set of UStags. The DNA-predicted database protein sequences were then compared to the UStags, and the differences observed across or in the UStags (i.e., the UStags’ prefix and suffix sequences and the UStags themselves) were used to infer the possible sequence modifications. With this de novo-UStag approach, we uncovered some unexpected variances of yeast protein sequences due to amino acid mutations and/or multiple modifications to the predicted protein sequences. Random matching of the de novo sequences to the predicted sequences was examined using two random (false) databases, and ~3% false discovery rates were estimated for the de novo-UStag approach. The factors affecting the reliability (e.g., existence of de novo sequencing noise residues and redundant sequences) and the sensitivity are described. The de novo-UStag approach complements the UStag method previously reported by enabling discovery of new protein modifications.
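
    The abstract reports a false discovery rate estimated from two random databases but does not give the formula. One common way to make such an estimate, assumed here and not confirmed by the source, is to divide the average number of matches against the random databases by the number of matches against the real database. The function name `estimate_fdr` and the counts below are invented for illustration only.

```python
def estimate_fdr(target_matches, decoy_matches_per_db):
    """Estimate FDR as (mean matches to random databases) / (target matches).

    target_matches:       number of sequences matching the real database.
    decoy_matches_per_db: match counts, one per random (false) database.
    Both inputs are hypothetical; the source does not list these counts.
    """
    if target_matches <= 0:
        raise ValueError("target_matches must be positive")
    if not decoy_matches_per_db:
        raise ValueError("at least one random database is required")
    mean_decoy = sum(decoy_matches_per_db) / len(decoy_matches_per_db)
    return mean_decoy / target_matches


# Made-up counts: 1000 target matches, 28 and 32 matches to two random databases.
fdr = estimate_fdr(1000, [28, 32])
```

    Averaging over two independent random databases reduces the effect of chance fluctuations in either one.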

  9. A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

    SciTech Connect

    Pan, Chongle; Park, Byung H; McDonald, W Hayes; Banfield, Jillian F.; Verberkmoes, Nathan C; Hettich, Robert {Bob} L; Samatova, Nagiza F

    2010-01-01

    High-resolution tandem mass spectra can now readily be acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. In this study, a new de novo sequencing algorithm, Vonode, has been developed specifically for such high-resolution tandem mass spectra. To fully exploit the high mass accuracy, sparse noise, and low background of these spectra, a unique scoring system is used to evaluate sequence tags based mainly on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. The performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, especially in terms of the number of de novo sequenced spectra.

  10. Identification of Disulfide Bonds in Protein Proteolytic Degradation Products Using de Novo-Protein Unique Sequence Tags Approach

    SciTech Connect

    Shen, Yufeng; Tolic, Nikola; Purvine, Samuel O.; Smith, Richard D.

    2010-08-01

    Disulfide bonds are a form of posttranslational modification that often determines protein structure(s) and function(s). In this work, we report a mass spectrometry method for identification of disulfides in degradation products of proteins, and specifically endogenous peptides in the human blood plasma peptidome. LC-Fourier transform tandem mass spectrometry (FT MS/MS) was used for acquiring mass spectra that were de novo sequenced and then searched against the IPI human protein database. Through the use of unique sequence tags (UStags) we unambiguously correlated the spectra to specific database proteins. Examination of the UStags’ prefix and/or suffix sequences that contain cysteine(s) in conjunction with sequences of the UStags-specified database proteins is shown to enable the unambiguous determination of disulfide bonds. Using this method, we identified the intermolecular and intramolecular disulfides in human blood plasma peptidome peptides that have molecular weights of up to ~10 kDa.

  11. Identification of disulfide bonds in protein proteolytic degradation products using de novo-protein unique sequence tags approach.

    PubMed

    Shen, Yufeng; Tolić, Nikola; Purvine, Samuel O; Smith, Richard D

    2010-08-01

    Disulfide bonds are a form of post-translational modification that often determines protein structure(s) and function(s). In this work, we report a mass spectrometry method for identification of disulfides in degradation products of proteins, specifically endogenous peptides in the human blood plasma peptidome. LC-Fourier transform tandem mass spectrometry (FT MS/MS) was used for acquiring mass spectra that were de novo sequenced and then searched against the IPI human protein database. Through the use of unique sequence tags (UStags), we unambiguously correlated the spectra to specific database proteins. Examination of the UStags' prefix and/or suffix sequences that contain cysteine(s) in conjunction with sequences of the UStags-specified database proteins is shown to enable the unambiguous determination of disulfide bonds. Using this method, we identified the intermolecular and intramolecular disulfides in human blood plasma peptidome peptides that have molecular weights of up to approximately 10 kDa. PMID:20590115

  12. RNA-Seq Analysis of Cocos nucifera: Transcriptome Sequencing and De Novo Assembly for Subsequent Functional Genomics Approaches

    PubMed Central

    Xia, Wei; Mason, Annaliese S.; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

    2013-01-01

    Background Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. Methodology/Principal Findings To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the Genbank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Conclusions/Significance Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species. PMID:23555859

  13. De novo sequencing of unique sequence tags for discovery of post-translational modifications of proteins.

    PubMed

    Shen, Yufeng; Tolić, Nikola; Hixson, Kim K; Purvine, Samuel O; Anderson, Gordon A; Smith, Richard D

    2008-10-15

    De novo sequencing is a spectrum analysis approach for mass spectrometry data to discover post-translational modifications in proteins; however, such an approach is still in its infancy and is still not widely applied to proteomic practices due to its limited reliability. In this work, we describe a de novo sequencing approach for the discovery of protein modifications based on identification of the proteome UStags (Shen, Y.; Tolić, N.; Hixson, K. K.; Purvine, S. O.; Pasa-Tolić, L.; Qian, W. J.; Adkins, J. N.; Moore, R. J.; Smith, R. D. Anal. Chem. 2008, 80, 1871-1882). The de novo information was obtained from Fourier-transform tandem mass spectrometry data for peptides and polypeptides from a yeast lysate, and the de novo sequences obtained were selected based on filter levels designed to provide a limited yet high quality subset of UStags. The DNA-predicted database protein sequences were then compared to the UStags, and the differences observed across or in the UStags (i.e., the UStags' prefix and suffix sequences and the UStags themselves) were used to infer possible sequence modifications. With this de novo-UStag approach, we uncovered some unexpected variances within several yeast protein sequences due to amino acid mutations and/or multiple modifications to the predicted protein sequences. To determine false discovery rates, two random (false) databases were independently used for sequence matching, and ~3% false discovery rates were estimated for the de novo-UStag approach. The factors affecting the reliability (e.g., existence of de novo sequencing noise residues and redundant sequences) and the sensitivity of the approach were investigated and described. The combined de novo-UStag approach complements the UStag method previously reported by enabling the discovery of new protein modifications. PMID:18783246

  14. Novor: Real-Time Peptide de Novo Sequencing Software

    NASA Astrophysics Data System (ADS)

    Ma, Bin

    2015-11-01

    De novo sequencing software has been widely used in proteomics to sequence new peptides from tandem mass spectrometry data. This study presents a new software tool, Novor, to greatly improve both the speed and accuracy of today's peptide de novo sequencing analyses. To improve the accuracy, Novor's scoring functions are based on two large decision trees built from a peptide spectral library with more than 300,000 spectra with machine learning. Important knowledge about peptide fragmentation is extracted automatically from the library and incorporated into the scoring functions. The decision tree model also enables efficient score calculation and contributes to the speed improvement. To further improve the speed, a two-stage algorithmic approach, namely dynamic programming and refinement, is used. The software program was also carefully optimized. On the testing datasets, Novor sequenced 7%-37% more correct residues than the state-of-the-art de novo sequencing tool, PEAKS, while being an order of magnitude faster. Novor can de novo sequence more than 300 MS/MS spectra per second on a laptop computer. The speed surpasses the acquisition speed of today's mass spectrometer and, therefore, opens a new possibility to de novo sequence in real time while the spectrometer is acquiring the spectral data.

  15. Novor: real-time peptide de novo sequencing software.

    PubMed

    Ma, Bin

    2015-11-01

    De novo sequencing software has been widely used in proteomics to sequence new peptides from tandem mass spectrometry data. This study presents a new software tool, Novor, to greatly improve both the speed and accuracy of today's peptide de novo sequencing analyses. To improve the accuracy, Novor's scoring functions are based on two large decision trees built from a peptide spectral library with more than 300,000 spectra with machine learning. Important knowledge about peptide fragmentation is extracted automatically from the library and incorporated into the scoring functions. The decision tree model also enables efficient score calculation and contributes to the speed improvement. To further improve the speed, a two-stage algorithmic approach, namely dynamic programming and refinement, is used. The software program was also carefully optimized. On the testing datasets, Novor sequenced 7%-37% more correct residues than the state-of-the-art de novo sequencing tool, PEAKS, while being an order of magnitude faster. Novor can de novo sequence more than 300 MS/MS spectra per second on a laptop computer. The speed surpasses the acquisition speed of today's mass spectrometer and, therefore, opens a new possibility to de novo sequence in real time while the spectrometer is acquiring the spectral data.

  16. SPIDER: software for protein identification from sequence tags with de novo sequencing error.

    PubMed

    Han, Yonghua; Ma, Bin; Zhang, Kaizhong

    2005-06-01

    For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting for amino acid mutations, the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software. PMID:16108090
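
    The error type described in this abstract, one segment replaced by another of nearly the same mass, can be made concrete with a short sketch. It only checks whether two segments are mass-equivalent within a tolerance; it does not implement SPIDER's matching algorithm. The function names and the 0.02 Da tolerance are illustrative assumptions.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def segment_mass(segment):
    """Sum of residue masses for a string of one-letter amino acid codes."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def mass_equivalent(seg_a, seg_b, tol=0.02):
    """True when two segments differ in mass by at most `tol` Da."""
    return abs(segment_mass(seg_a) - segment_mass(seg_b)) <= tol


# Classic confusions: GG vs N and GA vs Q have almost identical masses.
gg_vs_n = mass_equivalent("GG", "N")
ga_vs_q = mass_equivalent("GA", "Q")
# GA vs K differs by about 0.036 Da, outside the 0.02 Da tolerance used here.
ga_vs_k = mass_equivalent("GA", "K")
```

    A tag containing "GG" where the true peptide has "N" would fail an exact string match, which is why a matcher for de novo tags has to tolerate such substitutions.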

  17. SPIDER: software for protein identification from sequence tags with de novo sequencing error.

    PubMed

    Han, Yonghua; Ma, Bin; Zhang, Kaizhong

    2004-01-01

    For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting for amino acid mutations, the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software. PMID:16448014

  18. Application of de Novo Sequencing to Large-Scale Complex Proteomics Data Sets.

    PubMed

    Devabhaktuni, Arun; Elias, Joshua E

    2016-03-01

    Dependent on concise, predefined protein sequence databases, traditional search algorithms perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part due to a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large-scale proteomics data sets and present a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) that leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. LADS improves sequencing accuracy on longer peptides relative to that of other algorithms and improves discriminability of correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences not identifiable using database search-based approaches. PMID:26743026

  19. Multiplex De Novo Sequencing of Peptide Antibiotics

    NASA Astrophysics Data System (ADS)

    Mohimani, Hosein; Liu, Wei-Ting; Yang, Yu-Liang; Gaudêncio, Susana P.; Fenical, William; Dorrestein, Pieter C.; Pevzner, Pavel A.

    Proliferation of drug-resistant diseases raises the challenge of searching for new, more efficient antibiotics. Currently, some of the most effective antibiotics (e.g., Vancomycin and Daptomycin) are cyclic peptides produced by non-ribosomal biosynthetic pathways. The isolation and sequencing of cyclic peptide antibiotics, unlike the same activity with linear peptides, is time-consuming and error-prone. The dominant technique for sequencing cyclic peptides is NMR-based and requires large amounts (milligrams) of purified materials that, for most compounds, are not possible to obtain. Given these facts, there is a need for new tools to sequence cyclic NRPs using picograms of material. Since nearly all cyclic NRPs are produced along with related analogs, we develop a mass spectrometry approach for sequencing all related peptides at once (in contrast to the existing approach that analyzes individual peptides). Our results suggest that instead of attempting to isolate and NMR-sequence the most abundant compound, one should acquire spectra of many related compounds and sequence all of them simultaneously using tandem mass spectrometry. We illustrate applications of this approach by sequencing new variants of cyclic peptide antibiotics from Bacillus brevis, as well as sequencing a previously unknown family of cyclic NRPs produced by marine bacteria.

  20. DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.

    PubMed

    Muth, Thilo; Weilnböck, Lisa; Rapp, Erdmann; Huber, Christian G; Martens, Lennart; Vaudel, Marc; Barsnes, Harald

    2014-02-01

    De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissive Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com.

  1. De Novo Sequencing and Homology Searching

    PubMed Central

    Ma, Bin; Johnson, Richard

    2012-01-01

    In proteomics, de novo sequencing is the process of deriving peptide sequences from tandem mass spectra without the assistance of a sequence database. Such analyses have traditionally been performed manually by human experts, and more recently by computer programs that have been developed because of the need for higher throughput. Although powerful, de novo sequencing often can only determine partially correct sequence tags because of imperfect tandem mass spectra. However, these sequence tags can then be searched in a sequence database to identify the exact or a homologous peptide. Homology searches are particularly useful for the study of organisms whose genomes have not been sequenced. This tutorial will present background important to understanding de novo sequencing, suggestions on how to do this manually, plus descriptions of computer algorithms used to automate this process and to subsequently carry out homology-based database searches. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP 1). PMID:22090170

  2. Current challenges in de novo plant genome sequencing and assembly

    PubMed Central

    2012-01-01

    Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community. PMID:22546054

  3. Ameliorated de novo transcriptome assembly using Illumina paired end sequence data with Trinity Assembler

    PubMed Central

    Bankar, Kiran Gopinath; Todur, Vivek Nagaraj; Shukla, Rohit Nandan; Vasudevan, Madavan

    2015-01-01

    The advent of next-generation sequencing has made de novo transcriptome assembly possible for organisms without a complete genome sequence. Among the various sequencing platforms available, Illumina is the most widely used based on data quality, quantity and cost. Various de novo transcriptome assemblers are also available today. In this study, we aimed at obtaining an ameliorated de novo transcriptome assembly with sequence reads obtained from the Illumina platform and assembled using Trinity Assembler. We found that the primary transcriptome assembly produced by Trinity can be ameliorated on the basis of transcript length, coverage and depth, and protein homology. Our approach to amelioration is reproducible and could enhance the sensitivity and specificity of the assembled transcriptome, which could be critical for validation of the assembled transcripts and for planning various downstream biological assays. PMID:26484285
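
    A filter on transcript length, read depth, and protein homology, the criteria named in this abstract, might look like the following sketch. The abstract gives no thresholds and does not say how the criteria are combined, so the cutoffs, the requirement that all three criteria hold, and the `Transcript` record are assumptions for illustration, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str               # transcript identifier (hypothetical)
    length: int             # length in bases
    mean_depth: float       # mean read depth across the transcript
    has_protein_hit: bool   # True if a protein homology search found a match


def keep(transcript, min_length=300, min_depth=5.0):
    """Keep a transcript only if it passes all three illustrative criteria."""
    return (transcript.length >= min_length
            and transcript.mean_depth >= min_depth
            and transcript.has_protein_hit)


# Made-up assembly output.
assembly = [
    Transcript("t1", 1200, 32.5, True),
    Transcript("t2", 180, 40.0, True),    # too short
    Transcript("t3", 950, 1.2, True),     # too little read support
    Transcript("t4", 2100, 18.0, False),  # no protein homology
]
filtered = [t.name for t in assembly if keep(t)]
```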

  4. Complete De Novo Assembly of Monoclonal Antibody Sequences

    PubMed Central

    Tran, Ngoc Hieu; Rahman, M. Ziaur; He, Lin; Xin, Lei; Shan, Baozhen; Li, Ming

    2016-01-01

    De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies for which genome information is often limited or not available. However, due to limitations in peptides fragmentation and coverage, as well as ambiguities in spectra interpretation, complete de novo assembly of unknown protein sequences still remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequencing peptides, their quality scores and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated ALPS performance on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216–441 AA, at 100% coverage, and 96.64–100% accuracy. PMID:27562653
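    As a rough illustration of the weighted de Bruijn graph idea mentioned above, the following sketch builds a graph whose edges are k-mers from scored peptides and then walks it greedily. This is not the ALPS implementation: the function names, the choice of k = 3, the greedy traversal, and the example peptides and scores are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(peptides, k=3):
    """Weighted de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    `peptides` is a list of (sequence, score) pairs. Each edge accumulates
    the scores of the peptides that contain the corresponding k-mer.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for seq, score in peptides:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += score
    return graph


def greedy_walk(graph, start):
    """Follow the heaviest unused edge from `start` until none remains."""
    path, node, used = start, start, set()
    while True:
        options = [(weight, nxt) for nxt, weight in graph.get(node, {}).items()
                   if (node, nxt) not in used]
        if not options:
            return path
        _, nxt = max(options)
        used.add((node, nxt))
        path += nxt[-1]
        node = nxt


# Hypothetical overlapping de novo peptides with confidence scores.
peptides = [("EVQLVE", 0.9), ("QLVESG", 0.8), ("VESGGG", 0.7)]
print(greedy_walk(build_graph(peptides), "EV"))  # EVQLVESGGG
```

    Summing scores on shared edges means that k-mers supported by several confident peptides outweigh isolated, possibly erroneous ones, which is the intuition behind weighting the graph.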

  5. Complete De Novo Assembly of Monoclonal Antibody Sequences.

    PubMed

    Tran, Ngoc Hieu; Rahman, M Ziaur; He, Lin; Xin, Lei; Shan, Baozhen; Li, Ming

    2016-01-01

    De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies for which genome information is often limited or not available. However, due to limitations in peptides fragmentation and coverage, as well as ambiguities in spectra interpretation, complete de novo assembly of unknown protein sequences still remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequencing peptides, their quality scores and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated ALPS performance on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216-441 AA, at 100% coverage, and 96.64-100% accuracy. PMID:27562653

  6. Considering Transposable Element Diversification in De Novo Annotation Approaches

    PubMed Central

    Flutre, Timothée; Duprat, Elodie; Feuillet, Catherine; Quesneville, Hadi

    2011-01-01

    Transposable elements (TEs) are mobile, repetitive DNA sequences that are almost ubiquitous in prokaryotic and eukaryotic genomes. They have a large impact on genome structure, function and evolution. With the recent development of high-throughput sequencing methods, many genome sequences have become available, making possible comparative studies of TE dynamics at an unprecedented scale. Several methods have been proposed for the de novo identification of TEs in sequenced genomes. Most begin with the detection of genomic repeats, but the subsequent steps for defining TE families differ. High-quality TE annotations are available for the Drosophila melanogaster and Arabidopsis thaliana genome sequences, providing a solid basis for the benchmarking of such methods. We compared the performance of specific algorithms for the clustering of interspersed repeats and found that only a particular combination of algorithms detected TE families with good recovery of the reference sequences. We then applied a new procedure for reconciling the different clustering results and classifying TE sequences. The whole approach was implemented in a pipeline using the REPET package. Finally, we show that our combined approach highlights the dynamics of well defined TE families by making it possible to identify structural variations among their copies. This approach makes it possible to annotate TE families and to study their diversification in a single analysis, improving our understanding of TE dynamics at the whole-genome scale and for diverse species. PMID:21304975

  7. Exome sequencing supports a de novo mutational paradigm for schizophrenia.

    PubMed

    Xu, Bin; Roos, J Louw; Dexheimer, Phillip; Boone, Braden; Plummer, Brooks; Levy, Shawn; Gogos, Joseph A; Karayiorgou, Maria

    2011-09-01

    Despite its high heritability, a large fraction of individuals with schizophrenia do not have a family history of the disease (sporadic cases). Here we examined the possibility that rare de novo protein-altering mutations contribute to the genetic component of schizophrenia by sequencing the exomes of 53 sporadic cases, 22 unaffected controls and their parents. We identified 40 de novo mutations in 27 cases affecting 40 genes, including a potentially disruptive mutation in DGCR2, a gene located in the schizophrenia-predisposing 22q11.2 microdeletion region. A comparison to rare inherited variants indicated that the identified de novo mutations show a large excess of non-synonymous changes in schizophrenia cases, as well as a greater potential to affect protein structure and function. Our analyses suggest a major role for de novo mutations in schizophrenia as well as a large mutational target, which together provide a plausible explanation for the high global incidence and persistence of the disease. PMID:21822266

  8. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-03-19

    The genus Endornavirus is a double-stranded RNA virus that infects a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our result demonstrates the successful application of RNA-Seq to obtain a complete viral genome sequence from the transcriptome data.

  9. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-01-01

    The genus Endornavirus is a double-stranded RNA virus that infects a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our result demonstrates the successful application of RNA-Seq to obtain a complete viral genome sequence from the transcriptome data. PMID:25792042

  10. Identification of a novel Plasmopara halstedii elicitor protein combining de novo peptide sequencing algorithms and RACE-PCR

    PubMed Central

    2010-01-01

    Background: Often, high-quality MS/MS spectra of tryptic peptides do not match any database entry because genomes are only partially sequenced, and protein identification therefore requires de novo peptide sequencing. To achieve protein identification of the economically important but still unsequenced plant pathogenic oomycete Plasmopara halstedii, we first evaluated the performance of three different de novo peptide sequencing algorithms applied to protein digests of standard proteins using a quadrupole TOF (QStar Pulsar i). Results: The performance order of the algorithms was PEAKS online > PepNovo > CompNovo. In summary, PEAKS online correctly predicted 45% of measured peptides for a protein test data set. All three de novo peptide sequencing algorithms were used to identify MS/MS spectra of tryptic peptides of an unknown 57 kDa protein of P. halstedii. We found ten de novo sequenced peptides that showed homology to a Phytophthora infestans protein, a closely related organism of P. halstedii. Employing a second complementary approach, verification of peptide prediction and protein identification was performed by creation of degenerate primers for RACE-PCR and led to an ORF of 1,589 bp for a hypothetical phosphoenolpyruvate carboxykinase. Conclusions: Our study demonstrated that identification of proteins within minute amounts of sample material improved significantly by combining sensitive LC-MS methods with different de novo peptide sequencing algorithms. In addition, this is the first study that verified protein prediction from MS data by also employing a second complementary approach, in which RACE-PCR led to identification of a novel elicitor protein in P. halstedii. PMID:20459704

  11. LESSONS IN DE NOVO PEPTIDE SEQUENCING BY TANDEM MASS SPECTROMETRY

    PubMed Central

    Medzihradszky, Katalin F.; Chalkley, Robert J.

    2015-01-01

    Mass spectrometry has become the method of choice for the qualitative and quantitative characterization of protein mixtures isolated from all kinds of living organisms. The raw data in these studies are MS/MS spectra, usually of peptides produced by proteolytic digestion of a protein. These spectra are “translated” into peptide sequences, normally with the help of various search engines. Data acquisition and interpretation have both been automated, and most researchers look only at the summary of the identifications without ever viewing the underlying raw data used for assignments. Automated analysis of data is essential due to the volume produced. However, being familiar with the finer intricacies of peptide fragmentation processes, and experiencing the difficulties of manual data interpretation allow a researcher to be able to more critically evaluate key results, particularly because there are many known rules of peptide fragmentation that are not incorporated into search engine scoring. Since the most commonly used MS/MS activation method is collision-induced dissociation (CID), in this article we present a brief review of the history of peptide CID analysis. Next, we provide a detailed tutorial on how to determine peptide sequences from CID data. Although the focus of the tutorial is de novo sequencing, the lessons learned and resources supplied are useful for data interpretation in general. PMID:25667941

  12. Streamlined analysis of duplex sequencing data with Du Novo.

    PubMed

    Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried; Makova, Kateryna D; Nekrutenko, Anton

    2016-01-01

    Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show the approach performs well on simulated data and precisely reproduces previously published results and apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components as well as integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex . PMID:27566673
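    The general principle behind duplex sequencing, that a variant is trusted only when it is seen on both strands of the original molecule, can be sketched as follows. This is not the Du Novo code: the function names, the `min_frac` threshold, and the toy reads are hypothetical, and the sketch assumes the reads in each family are already aligned, of equal length, and in the same orientation.

```python
from collections import Counter


def consensus(reads, min_frac=0.6):
    """Per-position majority vote over equal-length reads.

    A position becomes 'N' when no base reaches `min_frac` of the reads.
    """
    out = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_frac else "N")
    return "".join(out)


def duplex_consensus(strand_a_reads, strand_b_reads):
    """Keep a base only where the two single-strand consensuses agree."""
    a = consensus(strand_a_reads)
    b = consensus(strand_b_reads)
    return "".join(x if x == y else "N" for x, y in zip(a, b))


# Toy read families sharing one barcode; strand B carries an error that
# is absent from strand A, so the last position is masked.
strand_a = ["ACGT", "ACGT", "ACTT"]
strand_b = ["ACGT", "ACGA", "ACGA"]
print(duplex_consensus(strand_a, strand_b))  # ACGN
```

    Because an artifact introduced on one strand rarely has a matching complementary change on the other, requiring agreement between strands suppresses most amplification and sequencing errors.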

  13. Streamlined analysis of duplex sequencing data with Du Novo.

    PubMed

    Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried; Makova, Kateryna D; Nekrutenko, Anton

    2016-08-26

    Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show the approach performs well on simulated data and precisely reproduces previously published results and apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components as well as integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex .

  14. Evaluation and validation of de novo and hybrid assembly techniques to derive high quality genome sequences

    SciTech Connect

    Utturkar, Sagar M.; Klingeman, Dawn Marie

    2014-06-14

    Our motivation with this work was to assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. Our results show Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. As to availability and implementation–all assembly tools except CLC Genomics Workbench are freely available under GNU General Public License.
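    The N50 summary statistic mentioned above can be computed as in the sketch below, which uses the common definition: the contig length L such that contigs of length L or longer together cover at least half of the total assembly. The function name and the example lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are sorted from longest to shortest and their lengths summed
    until the running total reaches half of the assembly size. The length
    of the contig that crosses that threshold is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths (total 290, half 145): 80 + 70 = 150 >= 145.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

    A higher N50 generally indicates a more contiguous assembly, but it says nothing about correctness, which is why the study also relied on additional validation.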

  15. Evaluation and validation of de novo and hybrid assembly techniques to derive high quality genome sequences

    DOE PAGES

    Utturkar, Sagar M.; Klingeman, Dawn Marie; Land, Miriam L.; Schadt, Christopher Warren; Doktycz, Mitchel John; Pelletier, Dale A.; Brown, Steven D.

    2014-06-14

    Our motivation with this work was to assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences. Our results show Illumina, 454 and PacBio sequencing technologies were used to generate de novo and hybrid genome assemblies for four different bacteria, which were assessed for quality using summary statistics (e.g. number of contigs, N50) and in silico evaluation tools. Differences in predictions of multiple copies of rDNA operons for each respective bacterium were evaluated by PCR and Sanger sequencing, and then the validated results were applied as an additional criterion to rank assemblies. In general, assemblies using longer PacBio reads were better able to resolve repetitive regions. In this study, the combination of Illumina and PacBio sequence data assembled through the ALLPATHS-LG algorithm gave the best summary statistics and most accurate rDNA operon number predictions. This study will aid others looking to improve existing draft genome assemblies. As to availability and implementation–all assembly tools except CLC Genomics Workbench are freely available under GNU General Public License.

  16. The de novo assembly of mitochondrial genomes of the extinct passenger pigeon (Ectopistes migratorius) with next generation sequencing.

    PubMed

    Hung, Chih-Ming; Lin, Rong-Chien; Chu, Jui-Hua; Yeh, Chia-Fen; Yao, Chiou-Ju; Li, Shou-Hsien

    2013-01-01

    The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species.

  17. The De Novo Assembly of Mitochondrial Genomes of the Extinct Passenger Pigeon (Ectopistes migratorius) with Next Generation Sequencing

    PubMed Central

    Hung, Chih-Ming; Lin, Rong-Chien; Chu, Jui-Hua; Yeh, Chia-Fen; Yao, Chiou-Ju; Li, Shou-Hsien

    2013-01-01

    The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species. PMID:23437111

  18. Identifying wrong assemblies in de novo short read primary sequence assembly contigs.

    PubMed

    Chawla, Vandna; Kumar, Rajnish; Shankar, Ravi

    2016-09-01

    With the advent of short-read-based genome sequencing approaches, a large number of organisms are being sequenced all over the world. Most of these assemblies are produced using de novo short-read assemblers and related approaches. However, the contigs produced this way are prone to mis-assembly. So far, there is a conspicuous dearth of reliable tools to identify mis-assembled contigs. Mis-assemblies can result from incorrectly deleted or wrongly arranged genomic sequences. In the present work, various factors related to sequence, sequencing and assembly have been assessed for their role in causing mis-assembly, using different genome sequencing data. Some mis-assembly detection tools have also been evaluated for their ability to detect wrongly assembled primary contigs, suggesting considerable scope for improvement in this area. The present work also proposes a simple, novel, unsupervised learning-based approach to identify mis-assemblies in contigs, which was found to perform reasonably well when compared with existing tools for reporting mis-assembled contigs. It was observed that the proposed methodology may work as a complementary system to the existing tools to enhance their accuracy. PMID:27581937

  19. De Novo Sequencing of Heparan Sulfate Oligosaccharides by Electron-Activated Dissociation

    PubMed Central

    Huang, Yu; Yu, Xiang; Mao, Yang; Costello, Catherine E.; Zaia, Joseph; Lin, Cheng

    2014-01-01

    Structural characterization of highly sulfated glycosaminoglycans (GAGs) by collisionally activated dissociation (CAD) is challenging because of the extensive sulfate losses mediated by free protons. While removal of the free protons may be achieved through the use of derivatization, metal cation adducts, and/or electrospray supercharging reagents, these steps add complexity to the experimental workflow. It is therefore desirable to develop an analytical approach for GAG sequencing that does not require derivatization or addition of reagents to the electrospray solution. Electron detachment dissociation (EDD) can produce extensive and informative fragmentation for GAGs without the need to remove free protons from the precursor ions. However, EDD is an inefficient process, often requiring consumption of large sample quantities (typically several micrograms), particularly for highly sulfated GAG ions. Here, we report that with improved instrumentation, optimization of the ionization and ion transfer parameters, and enhanced EDD efficiency, it is possible to generate highly informative EDD spectra of highly sulfated GAGs on the liquid chromatography (LC) time-scale, with consumption of only a few nanograms of sample. We further show that negative electron transfer dissociation (NETD) is an even more effective fragmentation technique for GAG sequencing, producing fewer sulfate losses while consuming smaller amounts of sample. Finally, a simple algorithm was developed for de novo sequencing of heparan sulfate (HS) oligosaccharides based on their high-resolution tandem mass spectra. These results demonstrate the potential of EDD and NETD as sensitive analytical tools for detailed, high-throughput, de novo structural analyses of highly sulfated GAGs. PMID:24224699

  20. Combining phage display with de novo protein sequencing for reverse engineering of monoclonal antibodies.

    PubMed

    Rickert, Keith W; Grinberg, Luba; Woods, Robert M; Wilson, Susan; Bowen, Michael A; Baca, Manuel

    2016-01-01

    The enormous diversity created by gene recombination and somatic hypermutation makes de novo protein sequencing of monoclonal antibodies a uniquely challenging problem. Modern mass spectrometry-based sequencing will rarely, if ever, provide a single unambiguous sequence for the variable domains. A more likely outcome is computation of an ensemble of highly similar sequences that can satisfy the experimental data. This outcome can result in the need for empirical testing of many candidate sequences, sometimes iteratively, to identify one that can replicate the activity of the parental antibody. Here we describe an improved approach to antibody protein sequencing by using phage display technology to generate a combinatorial library of sequences that satisfy the mass spectrometry data, and selecting for functional candidates that bind antigen. This approach was used to reverse engineer 2 commercially obtained monoclonal antibodies against murine CD137. Proteomic data enabled us to assign the majority of the variable domain sequences, with the exception of 3-5% of the sequence located within or adjacent to complementarity-determining regions. To efficiently resolve the sequence in these regions, small phage-displayed libraries were generated and subjected to antigen binding selection. Following enrichment of antigen-binding clones, 2 clones were selected for each antibody and recombinantly expressed as antigen-binding fragments (Fabs). In both cases, the reverse-engineered Fabs exhibited the same antigen binding affinity, within error, as Fabs produced from the commercial IgGs. This combination of proteomic and protein engineering techniques provides a useful approach to simplifying the technically challenging process of reverse engineering monoclonal antibodies from protein material.

  1. Genomic Resources for Water Yam (Dioscorea alata L.): Analyses of EST-Sequences, De Novo Sequencing and GBS Libraries

    PubMed Central

    Saski, Christopher A.; Bhattacharjee, Ranjana; Scheffler, Brian E.; Asiedu, Robert

    2015-01-01

    The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources in several model and non-model plant species. Yam (Dioscorea spp.) is a major food and cash crop in many countries but research efforts have been limited to understand the genetics and generate genomic information for the crop. The availability of a large number of genomic resources including genome-wide molecular markers will accelerate the breeding efforts and application of genomic selection in yams. In the present study, several methods including expressed sequence tags (EST)-sequencing, de novo sequencing, and genotyping-by-sequencing (GBS) profiles on two yam (Dioscorea alata L.) genotypes (TDa 95/00328 and TDa 95-310) were performed to generate genomic resources for use in its improvement programs. This includes a comprehensive set of EST-SSRs, genomic SSRs, whole genome SNPs, and reduced representation SNPs. A total of 1,152 EST-SSRs were developed from >40,000 EST-sequences generated from the two genotypes. A set of 388 EST-SSRs were validated as polymorphic showing a polymorphism rate of 34% when tested on two diverse parents targeted for anthracnose disease. In addition, approximately 40X de novo whole genome sequence coverage was generated for each of the two genotypes, and a total of 18,584 and 15,952 genomic SSRs were identified for TDa 95/00328 and TDa 95-310, respectively. A custom-made pipeline resulted in the selection of 573 genomic SSRs common across the two genotypes, of which only eight failed, 478 being polymorphic and 62 monomorphic indicating a polymorphic rate of 83.5%. Additionally, 288,505 high quality SNPs were also identified between these two genotypes. Genotyping by sequencing reads on these two genotypes also revealed 36,790 overlapping SNP positions that are distributed throughout the genome.
Our efforts in using different approaches

  2. Genomic Resources for Water Yam (Dioscorea alata L.): Analyses of EST-Sequences, De Novo Sequencing and GBS Libraries.

    PubMed

    Saski, Christopher A; Bhattacharjee, Ranjana; Scheffler, Brian E; Asiedu, Robert

    2015-01-01

    The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources in several model and non-model plant species. Yam (Dioscorea spp.) is a major food and cash crop in many countries but research efforts have been limited to understand the genetics and generate genomic information for the crop. The availability of a large number of genomic resources including genome-wide molecular markers will accelerate the breeding efforts and application of genomic selection in yams. In the present study, several methods including expressed sequence tags (EST)-sequencing, de novo sequencing, and genotyping-by-sequencing (GBS) profiles on two yam (Dioscorea alata L.) genotypes (TDa 95/00328 and TDa 95-310) were performed to generate genomic resources for use in its improvement programs. This includes a comprehensive set of EST-SSRs, genomic SSRs, whole genome SNPs, and reduced representation SNPs. A total of 1,152 EST-SSRs were developed from >40,000 EST-sequences generated from the two genotypes. A set of 388 EST-SSRs were validated as polymorphic showing a polymorphism rate of 34% when tested on two diverse parents targeted for anthracnose disease. In addition, approximately 40X de novo whole genome sequence coverage was generated for each of the two genotypes, and a total of 18,584 and 15,952 genomic SSRs were identified for TDa 95/00328 and TDa 95-310, respectively. A custom-made pipeline resulted in the selection of 573 genomic SSRs common across the two genotypes, of which only eight failed, 478 being polymorphic and 62 monomorphic indicating a polymorphic rate of 83.5%. Additionally, 288,505 high quality SNPs were also identified between these two genotypes. Genotyping by sequencing reads on these two genotypes also revealed 36,790 overlapping SNP positions that are distributed throughout the genome.
Our efforts in using different approaches

  3. Genomic Resources for Water Yam (Dioscorea alata L.): Analyses of EST-Sequences, De Novo Sequencing and GBS Libraries.

    PubMed

    Saski, Christopher A; Bhattacharjee, Ranjana; Scheffler, Brian E; Asiedu, Robert

    2015-01-01

    The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources in several model and non-model plant species. Yam (Dioscorea spp.) is a major food and cash crop in many countries but research efforts have been limited to understand the genetics and generate genomic information for the crop. The availability of a large number of genomic resources including genome-wide molecular markers will accelerate the breeding efforts and application of genomic selection in yams. In the present study, several methods including expressed sequence tags (EST)-sequencing, de novo sequencing, and genotyping-by-sequencing (GBS) profiles on two yam (Dioscorea alata L.) genotypes (TDa 95/00328 and TDa 95-310) were performed to generate genomic resources for use in its improvement programs. This includes a comprehensive set of EST-SSRs, genomic SSRs, whole genome SNPs, and reduced representation SNPs. A total of 1,152 EST-SSRs were developed from >40,000 EST-sequences generated from the two genotypes. A set of 388 EST-SSRs were validated as polymorphic showing a polymorphism rate of 34% when tested on two diverse parents targeted for anthracnose disease. In addition, approximately 40X de novo whole genome sequence coverage was generated for each of the two genotypes, and a total of 18,584 and 15,952 genomic SSRs were identified for TDa 95/00328 and TDa 95-310, respectively. A custom-made pipeline resulted in the selection of 573 genomic SSRs common across the two genotypes, of which only eight failed, 478 being polymorphic and 62 monomorphic indicating a polymorphic rate of 83.5%. Additionally, 288,505 high quality SNPs were also identified between these two genotypes. Genotyping by sequencing reads on these two genotypes also revealed 36,790 overlapping SNP positions that are distributed throughout the genome.
Our efforts in using different approaches

  4. Partial de novo sequencing and unusual CID fragmentation of a 7 kDa, disulfide-bridged toxin.

    PubMed

    Medzihradszky, Katalin F; Bohlen, Christopher J

    2012-05-01

    A 7 kDa toxin isolated from the venom of the Texas coral snake (Micrurus tener tener) was subjected to collision-induced dissociation (CID) and electron-transfer dissociation (ETD) analyses both before and after reduction at low pH. Manual and automated approaches to de novo sequencing are compared in detail. Manual de novo sequencing utilizing the combination of high accuracy CID and ETD data and an acid-related cleavage yielded the N-terminal half of the sequence from the reduced species. The intact polypeptide, containing 3 disulfide bridges produced a series of unusual fragments in ion trap CID experiments: abundant internal amino acid losses were detected, and also one of the disulfide-linkage positions could be determined from fragments formed by the cleavage of two bonds. In addition, internal and c-type fragments were also observed.

  5. Using Illumina next generation sequencing technologies to sequence multigene families in de novo species.

    PubMed

    Hughes, Graham M; Gang, Li; Murphy, William J; Higgins, Desmond G; Teeling, Emma C

    2013-05-01

    The advent of Next Generation Sequencing Technology (NGST) has revolutionized molecular biology research, allowing for rapid gene/genome sequencing from a multitude of diverse species. As high throughput sequencing becomes more accessible, more efficient workflows must be developed to deal with the amounts of data produced and better assemble the genomes of de novo lineages. We combine traditional laboratory methods with Illumina NGST to amplify and sequence the largest mammalian multigene family, the Olfactory Receptor gene family, for species with and without a reference genome. We develop novel assembly methods to annotate and filter these data, which can be utilized for any gene family or any species. We find no significant difference between the ratio of genes within their respective gene families of our data compared with available genomic data. Using simulated data we explore the limitations of short-read sequence data and our assembly in recovering this gene family. We highlight the benefits and shortcomings of these methods. Compared with data generated from traditional polymerase chain reaction, cloning and Sanger sequencing methodologies, sequence data generated using our pipeline increases yield and sequencing efficiency without reducing the number of unique genes amplified. A cloning step is not required, therefore shortening data generation time. The novel downstream methodologies and workflows described provide a tool to be utilized by many fields of biology, to access and analyze the vast quantities of data generated. By combining laboratory and in silico methods, we provide a means of extracting genomic information for multigene families without complete genome sequencing. PMID:23480365

  6. A general approach for discriminative de novo motif discovery from high-throughput data

    PubMed Central

    Grau, Jan; Posch, Stefan; Grosse, Ivo; Keilwagen, Jens

    2013-01-01

    De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research. PMID:24057214

  7. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  8. An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data

    PubMed Central

    Deng, Xutao; Naccache, Samia N.; Ng, Terry; Federman, Scot; Li, Linlin; Chiu, Charles Y.; Delwart, Eric L.

    2015-01-01

    Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarities to annotated genes to confidently predict gene function or homology. Such recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also proposed new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy tested using in silico spike-in, clinical and environmental NGS datasets achieved significantly better contigs than current approaches. PMID:25586223
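
    The abstract above mentions de Bruijn graph assemblers without describing them. As a minimal sketch of the general idea only (not the authors' ensemble strategy), the following Python builds a de Bruijn graph from reads and extends a contig along unambiguous edges. The function names, the toy reads, and the choice of k are illustrative assumptions; real assemblers also handle sequencing errors, reverse complements, and coverage.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Extend a contig from `start` while each node has one distinct successor."""
    contig = start
    node = start
    seen = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch: stop extending
        node = successors.pop()
        if node in seen:
            break  # cycle: stop to avoid looping forever
        seen.add(node)
        contig += node[-1]
    return contig


# Hypothetical overlapping reads drawn from the toy sequence ATGGCGTACC.
reads = ["ATGGCG", "GGCGTA", "CGTACC"]
graph = de_bruijn_edges(reads, k=4)
print(extend_unambiguous(graph, "ATG"))
```

    With these three toy reads the walk visits every node once and spells out the sequence the reads were drawn from; a branch or repeat in the graph would end the contig early, which is one reason short reads fragment assemblies.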

  9. De novo sequencing of peptides from top-down tandem mass spectra

    SciTech Connect

    Vyatkina, Kira; Wu, Si; Dekker, Leendert J.; vanDuijn, Martijn M.; Liu, Xiaowen; Tolic, Nikola; Dvorkin, Mikhail; Alexandrova, Sonya; Luider, Theo N.; Pasa-Tolic, Ljiljana; Pevzner, Pavel A.

    2015-09-28

    De novo sequencing of proteins and peptides is one of the most important problems in mass spectrometry-driven proteomics. A variety of methods have been developed to accomplish this task from a set of bottom-up tandem (MS/MS) mass spectra. However, the more recently emerged top-down technology, which is gaining popularity, opens new perspectives for protein analysis and characterization and creates a need for efficient algorithms for processing this kind of MS/MS data. Here we describe a method that retrieves long and accurate sequence fragments of the proteins contained in a sample from a set of top-down MS/MS spectra. To this end, we outline a strategy for generating high-quality sequence tags from top-down spectra, and introduce the concept of a T-Bruijn graph by adapting the notion of an A-Bruijn graph, widely used in genomics, to the case of tags. The output of the proposed approach is the set of amino acid strings spelled out by optimal paths in the connected components of a T-Bruijn graph. We illustrate its performance on top-down datasets acquired from carbonic anhydrase 2 (CAH2) and the Fab region of alemtuzumab.
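
    To illustrate what a sequence tag is, the sketch below reads residues off the mass differences between consecutive fragment peaks. It is a simplified illustration, not the tag-generation strategy or the T-Bruijn graph of the paper: the function name, the tolerance, and the toy peak list are assumptions, and only a subset of the standard monoisotopic residue masses is included.

```python
# Monoisotopic residue masses in Da (subset); "L" stands for isobaric Leu/Ile.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "T": 101.04768,
    "L": 113.08406,
}


def tag_from_peaks(peaks, tol=0.01):
    """Return a tag spelled by gaps between consecutive peaks ('?' if no match)."""
    ordered = sorted(peaks)
    tag = []
    for low, high in zip(ordered, ordered[1:]):
        diff = high - low
        matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(diff - mass) <= tol]
        tag.append(matches[0] if matches else "?")
    return "".join(tag)


# Hypothetical fragment masses spaced by Gly, Ala and Ser residues.
print(tag_from_peaks([200.0, 257.02146, 328.05857, 415.0906]))
```

    Real spectra contain noise peaks and missing fragments, so practical tools score many candidate tags rather than trusting one chain of adjacent peaks.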

  10. REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads

    PubMed Central

    Chu, Chong; Nielsen, Rasmus; Wu, Yufeng

    2016-01-01

    Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo. PMID:26977803

  11. De novo sequencing and variant calling with nanopores using PoreSeq

    PubMed Central

    Szalay, Tamas; Golovchenko, Jene A.

    2016-01-01

    The single-molecule accuracy of nanopore sequencing has been an area of rapid academic and commercial advancement, but remains challenging for the de novo analysis of genomes. We introduce here a novel algorithm for the error correction of nanopore data, utilizing statistical models of the physical system in order to obtain high accuracy de novo sequences at a range of coverage depths. We demonstrate the technique by sequencing M13 bacteriophage DNA to 99% accuracy at moderate coverage as well as its use in an assembly pipeline by sequencing E. coli and λ DNA at a range of coverages. We also show the algorithm’s ability to accurately classify sequence variants at far lower coverage than existing methods. PMID:26352647

  12. De Novo Genome Sequence of Yersinia aleksiciae Y159T

    PubMed Central

    Neubauer, Heinrich

    2015-01-01

    We report here on the genome sequence of Yersinia aleksiciae Y159T, isolated in Finland in 1981. The genome has a size of 4 Mb, a G+C content of 49%, and is predicted to contain 3,423 coding sequences. PMID:26383649

  13. DIME: a novel framework for de novo metagenomic sequence assembly.

    PubMed

    Guo, Xuan; Yu, Ning; Ding, Xiaojun; Wang, Jianxin; Pan, Yi

    2015-02-01

    The recently developed next generation sequencing platforms not only decrease the cost of metagenomics data analysis, but also greatly enlarge the size of metagenomic sequence datasets. A common bottleneck of available assemblers is that the trade-off between the noise of the resulting contigs and the gain in sequence length for better annotation has not received enough attention in large-scale sequencing projects, especially for datasets with low coverage and a large number of nonoverlapping contigs. To address this limitation and promote both accuracy and efficiency, we develop a novel metagenomic sequence assembly framework, DIME, based on the DIvide, conquer, and MErge strategies. In addition, we give two MapReduce implementations of DIME, DIME-cap3 and DIME-genovo, on the Apache Hadoop platform. For a systematic comparison of the performance of the assembly tasks, we tested DIME and five other popular short read assembly programs, Cap3, Genovo, MetaVelvet, SOAPdenovo, and SPAdes, on four synthetic and three real metagenomic sequence datasets with sizes ranging from fifty thousand to a couple million reads. The experimental results demonstrate that our method not only partitions the sequence reads with extremely high accuracy, but also reconstructs more bases, generates higher quality assembled consensus, and yields higher assembly scores, including corrected N50 and BLAST-score-per-base, than other tools with a nearly theoretical speed-up. Results indicate that DIME offers great improvement in assembly across a range of sequence abundances and thus is robust to decreasing coverage. PMID:25684202

  14. DIME: A Novel Framework for De Novo Metagenomic Sequence Assembly

    PubMed Central

    Guo, Xuan; Yu, Ning; Ding, Xiaojun; Wang, Jianxin

    2015-01-01

    The recently developed next generation sequencing platforms not only decrease the cost of metagenomics data analysis, but also greatly enlarge the size of metagenomic sequence datasets. A common bottleneck of available assemblers is that the trade-off between the noise of the resulting contigs and the gain in sequence length for better annotation has not received enough attention in large-scale sequencing projects, especially for datasets with low coverage and a large number of nonoverlapping contigs. To address this limitation and promote both accuracy and efficiency, we develop a novel metagenomic sequence assembly framework, DIME, based on the DIvide, conquer, and MErge strategies. In addition, we give two MapReduce implementations of DIME, DIME-cap3 and DIME-genovo, on the Apache Hadoop platform. For a systematic comparison of the performance of the assembly tasks, we tested DIME and five other popular short read assembly programs, Cap3, Genovo, MetaVelvet, SOAPdenovo, and SPAdes, on four synthetic and three real metagenomic sequence datasets with sizes ranging from fifty thousand to a couple million reads. The experimental results demonstrate that our method not only partitions the sequence reads with extremely high accuracy, but also reconstructs more bases, generates higher quality assembled consensus, and yields higher assembly scores, including corrected N50 and BLAST-score-per-base, than other tools with a nearly theoretical speed-up. Results indicate that DIME offers great improvement in assembly across a range of sequence abundances and thus is robust to decreasing coverage. PMID:25684202

  15. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2014-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.
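
    The screening step described above (keeping de novo sequences that pass quality-score and length criteria) can be sketched as a simple filter. The record type, the field names, and both threshold values are hypothetical placeholders; the abstract does not state the criteria the authors used.

```python
from typing import NamedTuple


class DeNovoPeptide(NamedTuple):
    sequence: str
    score: float  # quality score reported by a de novo sequencing tool


def screen(peptides, min_score=80.0, min_length=8):
    """Keep peptides meeting both thresholds (threshold values are assumptions)."""
    return [
        p for p in peptides
        if p.score >= min_score and len(p.sequence) >= min_length
    ]


# Hypothetical candidates: only the first passes both criteria.
candidates = [
    DeNovoPeptide("LVNELTEFAK", 92.0),
    DeNovoPeptide("GASPK", 95.0),        # too short
    DeNovoPeptide("AEFVEVTKLVTD", 41.0),  # score too low
]
print([p.sequence for p in screen(candidates)])
```

    Sequences that survive such a filter would then be passed to a similarity search such as BLAST, as the abstract describes.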

  16. Proteomics of Soil and Sediment: Protein Identification by De Novo Sequencing of Mass Spectra Complements Traditional Database Searching

    NASA Astrophysics Data System (ADS)

    Miller, S.; Rizzo, A. I.; Waldbauer, J.

    2015-12-01

    Proteomics has the potential to elucidate the metabolic pathways and taxa responsible for in situ biogeochemical transformations. However, low rates of protein identification from high resolution mass spectra have been a barrier to the development of proteomics in complex environmental samples. Much of the difficulty lies in the computational challenge of linking mass spectra to their corresponding proteins. Traditional database search methods for matching peptide sequences to mass spectra are often inadequate due to the complexity of environmental proteomes and the large database search space, as we demonstrate with soil and sediment proteomes generated via a range of extraction methods. One alternative to traditional database searching is de novo sequencing, which identifies peptide sequences without the need for a database. BLAST can then be used to match de novo sequences to similar genetic sequences. Assigning confidence to putative identifications has been one hurdle for the implementation of de novo sequencing. We found that accurate de novo sequences can be screened by quality score and length. Screening criteria are verified by comparing the results of de novo sequencing and traditional database searching for well-characterized proteomes from simple biological systems. The BLAST hits of screened sequences are interrogated for taxonomic and functional information. We applied de novo sequencing to organic topsoil and marine sediment proteomes. Peak-rich proteomes, which can result from various extraction techniques, yield thousands of high-confidence protein identifications, an improvement over previous proteomic studies of soil and sediment. User-friendly software tools for de novo metaproteomics analysis have been developed. This "De Novo Analysis" Pipeline is also a faster method of data analysis than constructing a tailored sequence database for traditional database searching.

  17. De novo assembly and characterization of the Trichuris trichiura adult worm transcriptome using Ion Torrent sequencing.

    PubMed

    Santos, Leonardo N; Silva, Eduardo S; Santos, André S; De Sá, Pablo H; Ramos, Rommel T; Silva, Artur; Cooper, Philip J; Barreto, Maurício L; Loureiro, Sebastião; Pinheiro, Carina S; Alcantara-Neves, Neuza M; Pacheco, Luis G C

    2016-07-01

    Infection with helminthic parasites, including the soil-transmitted helminth Trichuris trichiura (human whipworm), has been shown to modulate host immune responses and, consequently, to have an impact on the development and manifestation of chronic human inflammatory diseases. De novo derivation of helminth proteomes from sequencing of transcriptomes will provide valuable data to aid identification of parasite proteins that could be evaluated as potential immunotherapeutic molecules in the near future. Herein, we characterized the transcriptome of the adult stage of the human whipworm T. trichiura, using next-generation sequencing technology and a de novo assembly strategy. Nearly 17.6 million high-quality clean reads were assembled into 6414 contiguous sequences, with an N50 of 1606 bp. In total, 5673 protein-encoding sequences were confidently identified in the T. trichiura adult worm transcriptome; of these, 1013 sequences represent potential newly discovered proteins for the species, most of which have orthologs already annotated in the related species T. suis. A number of transcripts representing probable novel non-coding transcripts for the species T. trichiura were also identified. Among the most abundant transcripts, we found sequences that code for proteins involved in lipid transport, such as vitellogenins, and several chitin-binding proteins. Through a cross-species expression analysis of gene orthologs shared by T. trichiura and the closely related parasites T. suis and T. muris, it was possible to find twenty-six protein-encoding genes that are consistently highly expressed in the adult stages of the three helminth species. Additionally, twenty transcripts could be identified that code for proteins previously detected by mass spectrometry analysis of protein fractions of the whipworm somatic extract that present immunomodulatory activities. Five of these transcripts were amongst the most highly expressed protein-encoding sequences in the T

  19. De novo assembly and characterization of the carrot mitochondrial genome using next generation sequencing data from whole genomic DNA provides first evidence of DNA transfer into an angiosperm plastid genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sequence analysis of organelle genomes has revealed important aspects of plant cell evolution. The scope of this study was to develop an approach for de novo assembly of the carrot mitochondrial genome using next generation sequence data from total genomic DNA. Sequencing data from a carrot 454 whol...

  20. De Novo Centromere Formation and Centromeric Sequence Expansion in Wheat and its Wide Hybrids

    PubMed Central

    Fu, Shulan; Wang, Jing; Zhang, Xiangqi; Hu, Zanmin; Han, Fangpu

    2016-01-01

    Centromeres typically contain tandem repeat sequences, but centromere function does not necessarily depend on these sequences. We identified functional centromeres with significant quantitative changes in the centromeric retrotransposons of wheat (CRW) contents in wheat aneuploids (Triticum aestivum) and the offspring of wheat wide hybrids. The CRW signals were strongly reduced or essentially lost in some wheat ditelosomic lines and in the addition lines from the wide hybrids. The total loss of the CRW sequences but the presence of CENH3 in these lines suggests that the centromeres were formed de novo. In wheat and its wide hybrids, which carry large complex genomes or no sequenced genome, we performed CENH3-ChIP-dot-blot methods alone or in combination with CENH3-ChIP-seq and identified the ectopic genomic sequences present at the new centromeres. In addition, the transcription of the identified DNA sequences was remarkably increased at the new centromere, suggesting that the transcription of the corresponding sequences may be associated with de novo centromere formation. Stable alien chromosomes with two and three regions containing CRW sequences induced by centromere breakage were observed in the wheat-Th. elongatum hybrid derivatives, but only one was a functional centromere. In wheat-rye (Secale cereale) hybrids, the rye centromere-specific sequences spread along the chromosome arms and may have caused centromere expansion. Frequent and significant quantitative alterations in the centromere sequence via chromosomal rearrangement have been systematically described in wheat wide hybridizations, which may affect the retention or loss of the alien chromosomes in the hybrids. Thus, the centromere behavior in wide crosses likely has an important impact on the generation of biodiversity, which ultimately has implications for speciation. PMID:27110907

  1. Long-read sequencing and de novo assembly of a Chinese genome

    PubMed Central

    Shi, Lingling; Guo, Yunfei; Dong, Chengliang; Huddleston, John; Yang, Hui; Han, Xiaolu; Fu, Aisi; Li, Quan; Li, Na; Gong, Siyi; Lintner, Katherine E.; Ding, Qiong; Wang, Zou; Hu, Jiang; Wang, Depeng; Wang, Feng; Wang, Lin; Lyon, Gholson J.; Guan, Yongtao; Shen, Yufeng; Evgrafov, Oleg V.; Knowles, James A.; Thibaud-Nissen, Francoise; Schneider, Valerie; Yu, Chack-Yung; Zhou, Libing; Eichler, Evan E.; So, Kwok-Fai; Wang, Kai

    2016-01-01

    Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations. PMID:27356984
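
    The contig and scaffold N50 values quoted above follow the standard definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one sequence with positive length")


# Toy assembly of 290 bases: 80 + 70 = 150 covers at least half, so N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))
```

    Because N50 depends only on the length distribution, it says nothing about correctness, which is why assemblies are also assessed against physical maps or reference genomes.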

  2. Long-read sequencing and de novo assembly of a Chinese genome.

    PubMed

    Shi, Lingling; Guo, Yunfei; Dong, Chengliang; Huddleston, John; Yang, Hui; Han, Xiaolu; Fu, Aisi; Li, Quan; Li, Na; Gong, Siyi; Lintner, Katherine E; Ding, Qiong; Wang, Zou; Hu, Jiang; Wang, Depeng; Wang, Feng; Wang, Lin; Lyon, Gholson J; Guan, Yongtao; Shen, Yufeng; Evgrafov, Oleg V; Knowles, James A; Thibaud-Nissen, Francoise; Schneider, Valerie; Yu, Chack-Yung; Zhou, Libing; Eichler, Evan E; So, Kwok-Fai; Wang, Kai

    2016-01-01

    Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations. PMID:27356984

  3. A Proteomic Workflow Using High-Throughput De Novo Sequencing Towards Complementation of Genome Information for Improved Comparative Crop Science.

    PubMed

    Turetschek, Reinhard; Lyon, David; Desalegn, Getinet; Kaul, Hans-Peter; Wienkoop, Stefanie

    2016-01-01

    The proteomic study of non-model organisms, such as many crop plants, is challenging due to the lack of comprehensive genome information. Changing environmental conditions require the study and selection of adapted cultivars. Mutations, inherent to cultivars, hamper protein identification and thus considerably complicate the qualitative and quantitative comparison in large-scale systems biology approaches. With this workflow, cultivar-specific mutations are detected from high-throughput comparative MS analyses, by extracting sequence polymorphisms with de novo sequencing. Stringent criteria are suggested to filter for high-confidence mutations. Subsequently, these polymorphisms complement the initially used database, which is ready to use with any preferred database search algorithm. In our example, we thereby identified 26 specific mutations in two cultivars of Pisum sativum and achieved an increased number (17%) of peptide spectrum matches.

  4. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    PubMed Central

    2011-01-01

    Background: Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales, a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects. Results: Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. Conclusions: 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated. PMID:21232141

  5. Genome Calligrapher: A Web Tool for Refactoring Bacterial Genome Sequences for de Novo DNA Synthesis.

    PubMed

    Christen, Matthias; Deutsch, Samuel; Christen, Beat

    2015-08-21

    Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that the Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher.
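
    GC content, one of the properties the tool is said to optimize, is the fraction of G and C bases in a sequence. The sketch below computes it per window so that local extremes can be spotted. It is an illustration only and not the Genome Calligrapher recoding algorithm; the function names, window size, and toy sequence are assumptions.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window, step):
    """Return (start, GC fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence with a GC-only stretch followed by an AT-only stretch.
print(gc_windows("GGGCCCATATAT", window=6, step=6))
```

    A recoding tool would go further and swap synonymous codons inside windows whose GC fraction falls outside the range a synthesis provider accepts.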

  6. Sequencing, de novo assembly and comparative analysis of Raphanus sativus transcriptome.

    PubMed

    Wu, Gang; Zhang, Libin; Yin, Yongtai; Wu, Jiangsheng; Yu, Longjiang; Zhou, Yanhong; Li, Maoteng

    2015-01-01

    Raphanus sativus is an important Brassicaceae plant and also an edible vegetable with great economic value. However, there is currently not enough transcriptome information on R. sativus tissues, which impedes further functional genomics research on R. sativus. In this study, RNA-seq technology was employed to characterize the transcriptome of leaf tissues. Approximately 70 million clean paired-end reads were obtained and used for de novo assembly with the Trinity program, which generated 68,086 unigenes with an average length of 576 bp. All the unigenes were annotated against the GO and KEGG databases. Meanwhile, we merged the leaf sequencing data with existing root sequencing data and obtained a better de novo assembly of R. sativus using the Oases program. Accordingly, potential simple sequence repeats (SSRs), transcription factors (TFs) and enzyme codes were identified in R. sativus. Additionally, we detected a total of 3563 significantly differentially expressed genes (DEGs, P = 0.05) and tissue-specific biological processes between leaf and root tissues. Furthermore, a TF-based regulation network was constructed using Cytoscape software. Taken together, these results not only provide a comprehensive genomic resource for R. sativus but also shed light on functional genomic and proteomic research on R. sativus in the future. PMID:26029219

  7. CYCLONE—A Utility for De Novo Sequencing of Microbial Cyclic Peptides

    NASA Astrophysics Data System (ADS)

    Kavan, Daniel; Kuzma, Marek; Lemr, Karel; Schug, Kevin A.; Havlicek, Vladimir

    2013-08-01

    We have developed a de novo sequencing software tool (CYCLONE) and applied it for determination of cyclic peptides. The program uses a non-redundant database of 312 nonribosomal building blocks identified to date in bacteria and fungi (more than 230 additional residues in the database list were isobaric). The software was used to fully characterize the tandem mass spectrum of several cyclic peptides and provide sequence tags. The general strategy of the script was based on fragment ion pre-characterization to accomplish unambiguous b-ion series assignments. Showcase examples were a cyclic tetradepsipeptide beauverolide, a cyclic hexadepsipeptide roseotoxin A, a lasso-like hexapeptide pseudacyclin A, and a cyclic undecapeptide cyclosporin A. The extent of ion scrambling in smaller peptides was as low as 5% of total ion current; this demonstrated the feasibility of CYCLONE de novo sequencing. The robustness of the script was also tested against database sets of various sizes and isotope-containing data. It can be downloaded from http://ms.biomed.cas.cz/MSTools/.

  8. Sequence analysis of two de novo mutation alleles at the DXS10011 locus.

    PubMed

    Tamura, Akiyoshi; Iwata, Misa; Takase, Izumi; Miyazaki, Tokiko; Matsui, Kiyoshi; Nishio, Hajime; Suzuki, Koichi

    2003-09-01

    We have detected two unusual alleles at the DXS10011 locus in two paternity trio cases. In one case, one allele of the daughter was found not to have been derived from the mother but the other allele was shared with the father. In the other case, the mother and the son shared no bands. Paternity in both cases was established using conventional polymorphic markers in addition to DNA markers (probabilities: >0.999999). Sequencing showed that the two de novo alleles of the children acquired a single unit (GAAA).

  9. DNApi: A De Novo Adapter Prediction Algorithm for Small RNA Sequencing Data

    PubMed Central

    Tsuji, Junko; Weng, Zhiping

    2016-01-01

    With the rapid accumulation of publicly available small RNA sequencing datasets, third-party meta-analysis across many datasets is becoming increasingly powerful. Although removing the 3′ adapter is an essential step for small RNA sequencing analysis, the adapter sequence information is not always available in the metadata. The information can also be erroneous even when it is available. In this study, we developed DNApi, a lightweight Python software package that predicts the 3′ adapter sequence de novo and provides the user with cleansed small RNA sequences ready for downstream analysis. Tested on 539 publicly available small RNA libraries accompanied by 3′ adapter sequences in their metadata, DNApi shows near-perfect accuracy (98.5%) with fast runtime (~2.85 seconds per library) and efficient memory usage (~43 MB on average). In addition to 3′ adapter prediction, it is also important to classify whether the input small RNA libraries were already processed, i.e. whether the 3′ adapters were removed. Given another batch of datasets, DNApi correctly judged all 192 publicly available processed libraries to be “ready-to-map” small RNA sequences. DNApi is compatible with Python 2 and 3, and is available at https://github.com/jnktsj/DNApi. The 731 small RNA libraries used for DNApi evaluation were from human tissues and were carefully and manually collected. This study also provides readers with the curated datasets that can be integrated into their studies. PMID:27736901
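
    The intuition behind de novo adapter prediction can be sketched as follows. This is not DNApi's algorithm or API; it is a naive heuristic, written under the assumption that every read consists of a variable insert followed by the same 3′ adapter, so that k-mers from the adapter occur in far more reads than k-mers from any single insert. The function name and the default k are hypothetical choices.

```python
from collections import Counter


def candidate_adapter_kmer(reads, k=8):
    """Return (kmer, fraction_of_reads) for the k-mer seen in the most reads.

    Each k-mer is counted at most once per read, so a repeat inside a single
    insert cannot outvote a k-mer shared by many reads. Returns None when no
    read is at least k bases long.
    """
    counts = Counter()
    for read in reads:
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        counts.update(kmers)
    if not counts:
        return None
    kmer, n = counts.most_common(1)[0]
    return kmer, n / len(reads)
```

    A highly abundant small RNA would also produce frequent k-mers, so a practical tool has to separate insert-derived from adapter-derived k-mers; this sketch makes no attempt to do so.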

  10. Sequencing and de novo transcriptome assembly of Brachypodium sylvaticum (Poaceae)

    PubMed Central

    Fox, Samuel E.; Preece, Justin; Kimbrel, Jeffrey A.; Marchini, Gina L.; Sage, Abigail; Youens-Clark, Ken; Cruzan, Mitchell B.; Jaiswal, Pankaj

    2013-01-01

    • Premise of the study: We report the de novo assembly and characterization of the transcriptomes of Brachypodium sylvaticum (slender false-brome) accessions from native populations of Spain and Greece, and an invasive population west of Corvallis, Oregon, USA. • Methods and Results: More than 350 million sequence reads from the mRNA libraries prepared from three B. sylvaticum genotypes were assembled into 120,091 (Corvallis), 104,950 (Spain), and 177,682 (Greece) transcript contigs. In comparison with the B. distachyon Bd21 reference genome and GenBank protein sequences, we estimate >90% exome coverage for B. sylvaticum. The transcripts were assigned Gene Ontology and InterPro annotations. Brachypodium sylvaticum sequence reads aligned against the Bd21 genome revealed 394,654 single-nucleotide polymorphisms (SNPs) and >20,000 simple sequence repeat (SSR) DNA sites. • Conclusions: To our knowledge, this is the first report of transcriptome sequencing of invasive plant species with a closely related sequenced reference genome. The sequences and identified SNP variant and SSR sites will provide tools for developing novel genetic markers for use in genotyping and characterization of invasive behavior of B. sylvaticum. PMID:25202520

  11. CycloBranch: De Novo Sequencing of Nonribosomal Peptides from Accurate Product Ion Mass Spectra

    NASA Astrophysics Data System (ADS)

    Novák, Jiří; Lemr, Karel; Schug, Kevin A.; Havlíček, Vladimír

    2015-07-01

    Nonribosomal peptides have a wide range of biological and medical applications. Their identification by tandem mass spectrometry remains a challenging task. A new open-source de novo peptide identification engine CycloBranch was developed and successfully applied in identification or detailed characterization of 11 linear, cyclic, branched, and branch-cyclic peptides. CycloBranch is based on annotated building block databases the size of which is defined by the user according to ribosomal or nonribosomal peptide origin. The current number of involved nonisobaric and isobaric building blocks is 287 and 521, respectively. Contrary to all other peptide sequencing tools utilizing either peptide libraries or peptide fragment libraries, CycloBranch represents a true de novo sequencing engine developed for accurate mass spectrometric data. It is a stand-alone and cross-platform application with a graphical and user-friendly interface; it supports mzML, mzXML, mgf, txt, and baf file formats and can be run in parallel on multiple threads. It can be downloaded for free from http://ms.biomed.cas.cz/cyclobranch/, where the User's manual and video tutorials can be found.

  12. De novo sequences of Haloquadratum walsbyi from Lake Tyrrell, Australia, reveal a variable genomic landscape.

    PubMed

    Tully, Benjamin J; Emerson, Joanne B; Andrade, Karen; Brocks, Jochen J; Allen, Eric E; Banfield, Jillian F; Heidelberg, Karla B

    2015-01-01

    Hypersaline systems near salt saturation levels represent an extreme environment, in which organisms grow and survive near the limits of life. One of the abundant members of the microbial communities in hypersaline systems is the square archaeon, Haloquadratum walsbyi. Utilizing a short-read metagenome from Lake Tyrrell, a hypersaline ecosystem in Victoria, Australia, we performed a comparative genomic analysis of H. walsbyi to better understand the extent of variation between strains/subspecies. Results revealed that previously isolated strains/subspecies do not fully describe the complete repertoire of the genomic landscape present in H. walsbyi. Rearrangements, insertions, and deletions were observed for the Lake Tyrrell derived Haloquadratum genomes and were supported by environmental de novo sequences, including shifts in the dominant genomic landscape of the two most abundant strains. Analysis pertaining to halomucins indicated that homologs for this large protein are not a feature common for all species of Haloquadratum. Further, we analyzed ATP-binding cassette transporters (ABC-type transporters) for evidence of niche partitioning between different strains/subspecies. We were able to identify unique and variable transporter subunits from all five genomes analyzed and the de novo environmental sequences, suggesting that differences in nutrient and carbon source acquisition may play a role in maintaining distinct strains/subspecies.

  14. A Real-Time de novo DNA Sequencing Assembly Platform Based on an FPGA Implementation.

    PubMed

    Hu, Yuanqi; Georgiou, Pantelis

    2016-01-01

    This paper presents an FPGA based DNA comparison platform which can be run concurrently with the sensing phase of DNA sequencing and shortens the overall time needed for de novo DNA assembly. A hybrid overlap searching algorithm is applied which is scalable and can deal with incremental detection of new bases. To handle the incomplete data set which gradually increases during sequencing time, all-against-all comparisons are broken down into successive window-against-window comparison phases and executed using a novel dynamic suffix comparison algorithm combined with a partitioned dynamic programming method. The complete system has been designed to facilitate parallel processing in hardware, which allows real-time comparison and full scalability as well as a decrease in the number of computations required. A base pair comparison rate of 51.2 G/s is achieved when implemented on an FPGA with successful DNA comparison when using data sets from real genomes. PMID:27045828
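
    Overlap searching, the core operation the platform accelerates, can be illustrated in software with a minimal sketch. This is a plain suffix-prefix comparison between two reads, not the paper's dynamic suffix comparison algorithm or its partitioned dynamic programming method, and it tolerates no mismatches; the function name and the min_len default are hypothetical.

```python
def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Only exact matches of at least min_len bases count; returns 0 otherwise.
    """
    shortest = max(min_len, 1)  # guard: a[-0:] would be the whole string
    for n in range(min(len(a), len(b)), shortest - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0
```

    An all-against-all assembly step would call such a comparison for every ordered pair of reads, which is the quadratic workload that motivates breaking the work into window-against-window phases and running it in parallel hardware.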

  15. Exome sequencing of case-unaffected-parents trios reveals recessive and de novo genetic variants in sporadic ALS

    PubMed Central

    Steinberg, Karyn Meltz; Yu, Bing; Koboldt, Daniel C.; Mardis, Elaine R.; Pamphlett, Roger

    2015-01-01

    The contribution of genetic variants to sporadic amyotrophic lateral sclerosis (ALS) remains largely unknown. Either recessive or de novo variants could result in an apparently sporadic occurrence of ALS. In an attempt to find such variants we sequenced the exomes of 44 ALS-unaffected-parents trios. Rare and potentially damaging compound heterozygous variants were found in 27% of ALS patients, homozygous recessive variants in 14% and coding de novo variants in 27%. In 20% of patients more than one of the above variants was present. Genes with recessive variants were enriched in nucleotide binding capacity, ATPase activity, and the dynein heavy chain. Genes with de novo variants were enriched in transcription regulation and cell cycle processes. This trio study indicates that rare private recessive variants could be a mechanism underlying some cases of sporadic ALS, and that de novo mutations are also likely to play a part in the disease. PMID:25773295
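
    The trio logic behind a de novo call can be shown with a minimal sketch: a candidate is an allele carried by the child that neither parent carries at the same site. The data layout (dictionaries mapping a site to a tuple of alleles) and the function name are assumptions made for illustration; the study's actual pipeline, quality filters and validation steps are not described in the abstract.

```python
def de_novo_candidates(child, mother, father):
    """List (site, novel_alleles) where the child has an allele absent in both parents.

    Each argument maps a site, e.g. ("chr1", 100), to a tuple of called alleles.
    Sites missing from either parent are skipped, since absence of a call is
    not evidence that the parent lacks the allele.
    """
    hits = []
    for site, alleles in sorted(child.items()):
        if site not in mother or site not in father:
            continue
        parental = set(mother[site]) | set(father[site])
        novel = set(alleles) - parental
        if novel:
            hits.append((site, sorted(novel)))
    return hits
```

    In real data most such candidates are sequencing or genotyping errors, so read depth and genotype quality in all three family members would need to be checked before any candidate is trusted.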

  16. Exome sequencing for bipolar disorder points to roles of de novo loss-of-function and protein-altering mutations.

    PubMed

    Kataoka, M; Matoba, N; Sawada, T; Kazuno, A-A; Ishiwata, M; Fujii, K; Matsuo, K; Takata, A; Kato, T

    2016-07-01

    Although numerous genetic studies have been conducted for bipolar disorder (BD), its genetic architecture remains elusive. Here we perform, to the best of our knowledge, the first trio-based exome sequencing study for BD to investigate potential roles of de novo mutations in the disease etiology. We identified 71 de novo point mutations and one de novo copy-number mutation in 79 BD probands. Among the genes hit by de novo loss-of-function (LOF; nonsense, splice site or frameshift) or protein-altering (LOF, missense and inframe indel) mutations, we found significant enrichment of genes highly intolerant (first percentile of intolerant genes assessed by Residual Variation Intolerance Score) to protein-altering variants in general population, an observation that is also reported in autism and schizophrenia. When we performed a joint analysis using the data of schizoaffective disorder in published studies, we found global enrichment of de novo LOF and protein-altering mutations in the combined group of bipolar I and schizoaffective disorders. Considering relationship between de novo mutations and clinical phenotypes, we observed significantly earlier disease onset among the BD probands with de novo protein-altering mutations when compared with non-carriers. Gene ontology enrichment analysis of genes hit by de novo protein-altering mutations in bipolar I and schizoaffective disorders did not identify any significant enrichment. These results of exploratory analyses collectively point to the roles of de novo LOF and protein-altering mutations in the etiology of bipolar disorder and warrant further large-scale studies. PMID:27217147

  18. De Novo Sequencing and Characterization of the Floral Transcriptome of Dendrocalamus latiflorus (Poaceae: Bambusoideae)

    PubMed Central

    Li, De-Zhu; Guo, Zhen-Hua

    2012-01-01

    Background Transcriptome sequencing can be used to determine gene sequences and transcript abundance in non-model species, and the advent of next-generation sequencing (NGS) technologies has greatly decreased the cost and time required for this process. Transcriptome data are especially desirable in bamboo species, as certain members constitute an economically and culturally important group of mostly semelparous plants with remarkable flowering features, yet little bamboo genomic research has been performed. Here we present, for the first time, extensive sequence and transcript abundance data for the floral transcriptome of a key bamboo species, Dendrocalamus latiflorus, obtained using the Illumina GAII sequencing platform. Our further goal was to identify patterns of gene expression during bamboo flower development. Results Approximately 96 million sequencing reads were generated and assembled de novo, yielding 146,395 high quality unigenes with an average length of 461 bp. Of these, 80,418 were identified as putative homologs of annotated sequences in the public protein databases, of which 290 were associated with the floral transition and 47 were related to flower development. Digital abundance analysis identified 26,529 transcripts differentially enriched between two developmental stages, young flower buds and older developing flowers. Unigenes found at each stage were categorized according to their putative functional categories. These sequence and putative function data comprise a resource for future investigation of the floral transition and flower development in bamboo species. Conclusions Our results present the first broad survey of a bamboo floral transcriptome. Although it will be necessary to validate the functions carried out by these genes, these results represent a starting point for future functional research on D. latiflorus and related species. PMID:22916120

  19. De novo Sequencing, Characterization, and Comparison of Inflorescence Transcriptomes of Cornus canadensis and C. florida (Cornaceae)

    PubMed Central

    Zhang, Jian; Franks, Robert G.; Liu, Xiang; Kang, Ming; Keebler, Jonathan E. M.; Schaff, Jennifer E.; Huang, Hong-Wen; Xiang, Qiu-Yun (Jenny)

    2013-01-01

    Background Transcriptome sequencing analysis is a powerful tool in molecular genetics and evolutionary biology. Here we report the results of de novo 454 sequencing, characterization, and comparison of inflorescence transcriptomes of two closely related dogwood species, Cornus canadensis and C. florida (Cornaceae). Our goals were to build a preliminary source of genome sequence data, and to identify genes potentially expressed differentially between the inflorescence transcriptomes for these important horticultural species. Results The sequencing of cDNAs from inflorescence buds of C. canadensis (cc) and C. florida (cf), and normalized cDNAs from leaves of C. canadensis resulted in 251799 (ccBud), 96245 (ccLeaf) and 114648 (cfBud) raw reads, respectively. The de novo assembly of the high quality (HQ) reads resulted in 36088, 17802 and 21210 unigenes for ccBud, ccLeaf and cfBud. A reference transcriptome for C. canadensis was built by assembling HQ reads of ccBud and ccLeaf, containing 40884 unigenes. Reference mapping and comparative analyses found 10926 sequences were putatively specific to ccBud, and 6979 putatively specific to cfBud. Putative differentially expressed genes between ccBud and cfBud that are related to flower development and/or stress response were identified among 7718 shared sequences by ccBud and cfBud. Bi-directional BLAST found 87 (41.83% of 208) of Arabidopsis genes related to inflorescence development had putative orthologs in the dogwood transcriptomes. Comparisons of the shared sequences by ccBud and cfBud yielded 65931 high quality SNPs between two species. The twenty unigenes with the most SNPs are listed as potential genetic markers for evolutionary studies. Conclusions The data provide an important, although preliminary, information platform for functional genomics and evolutionary developmental biology in Cornus. The study identified putative candidates potentially involved in the genetic regulation of inflorescence evolution and

  20. De novo assembly and validation of planaria transcriptome by massive parallel sequencing and shotgun proteomics.

    PubMed

    Adamidi, Catherine; Wang, Yongbo; Gruen, Dominic; Mastrobuoni, Guido; You, Xintian; Tolle, Dominic; Dodt, Matthias; Mackowiak, Sebastian D; Gogol-Doering, Andreas; Oenal, Pinar; Rybak, Agnieszka; Ross, Eric; Sánchez Alvarado, Alejandro; Kempa, Stefan; Dieterich, Christoph; Rajewsky, Nikolaus; Chen, Wei

    2011-07-01

    Freshwater planaria are a very attractive model system for stem cell biology, tissue homeostasis, and regeneration. The genome of the planarian Schmidtea mediterranea has recently been sequenced and is estimated to contain >20,000 protein-encoding genes. However, the characterization of its transcriptome is far from complete. Furthermore, not a single proteome of the entire phylum has been assayed on a genome-wide level. We devised an efficient sequencing strategy that allowed us to de novo assemble a major fraction of the S. mediterranea transcriptome. We then used independent assays and massive shotgun proteomics to validate the authenticity of transcripts. In total, our de novo assembly yielded 18,619 candidate transcripts with a mean length of 1118 nt after filtering. A total of 17,564 candidate transcripts could be mapped to 15,284 distinct loci on the current genome reference sequence. RACE confirmed complete or almost complete 5' and 3' ends for 22/24 transcripts. The frequencies of frame shifts, fusion, and fission events in the assembled transcripts were computationally estimated to be 4.2%-13%, 0%-3.7%, and 2.6%, respectively. Our shotgun proteomics produced 16,135 distinct peptides that validated 4200 transcripts (FDR ≤1%). The catalog of transcripts assembled in this study, together with the identified peptides, dramatically expands and refines planarian gene annotation, demonstrated by validation of several previously unknown transcripts with stem cell-dependent expression patterns. In addition, our robust transcriptome characterization pipeline could be applied to other organisms without genome assembly. All of our data, including homology annotation, are freely available at SmedGD, the S. mediterranea genome database.

  1. De novo assembly and validation of planaria transcriptome by massive parallel sequencing and shotgun proteomics

    PubMed Central

    Adamidi, Catherine; Wang, Yongbo; Gruen, Dominic; Mastrobuoni, Guido; You, Xintian; Tolle, Dominic; Dodt, Matthias; Mackowiak, Sebastian D.; Gogol-Doering, Andreas; Oenal, Pinar; Rybak, Agnieszka; Ross, Eric; Alvarado, Alejandro Sánchez; Kempa, Stefan; Dieterich, Christoph; Rajewsky, Nikolaus; Chen, Wei

    2011-01-01

    Freshwater planaria are a very attractive model system for stem cell biology, tissue homeostasis, and regeneration. The genome of the planarian Schmidtea mediterranea has recently been sequenced and is estimated to contain >20,000 protein-encoding genes. However, the characterization of its transcriptome is far from complete. Furthermore, not a single proteome of the entire phylum has been assayed on a genome-wide level. We devised an efficient sequencing strategy that allowed us to de novo assemble a major fraction of the S. mediterranea transcriptome. We then used independent assays and massive shotgun proteomics to validate the authenticity of transcripts. In total, our de novo assembly yielded 18,619 candidate transcripts with a mean length of 1118 nt after filtering. A total of 17,564 candidate transcripts could be mapped to 15,284 distinct loci on the current genome reference sequence. RACE confirmed complete or almost complete 5′ and 3′ ends for 22/24 transcripts. The frequencies of frame shifts, fusion, and fission events in the assembled transcripts were computationally estimated to be 4.2%–13%, 0%–3.7%, and 2.6%, respectively. Our shotgun proteomics produced 16,135 distinct peptides that validated 4200 transcripts (FDR ≤1%). The catalog of transcripts assembled in this study, together with the identified peptides, dramatically expands and refines planarian gene annotation, demonstrated by validation of several previously unknown transcripts with stem cell-dependent expression patterns. In addition, our robust transcriptome characterization pipeline could be applied to other organisms without genome assembly. All of our data, including homology annotation, are freely available at SmedGD, the S. mediterranea genome database. PMID:21536722

  2. A De Novo Genome Sequence Assembly of the Arabidopsis thaliana Accession Niederzenz-1 Displays Presence/Absence Variation and Strong Synteny

    PubMed Central

    Pucker, Boas; Holtgräwe, Daniela; Rosleff Sörensen, Thomas; Stracke, Ralf; Viehöver, Prisca

    2016-01-01

    Arabidopsis thaliana is the most important model organism for fundamental plant biology. The genome diversity of different accessions of this species has been intensively studied, for example in the 1001 genome project which led to the identification of many small nucleotide polymorphisms (SNPs) and small insertions and deletions (InDels). In addition, presence/absence variation (PAV), copy number variation (CNV) and mobile genetic elements contribute to genomic differences between A. thaliana accessions. To address larger genome rearrangements between the A. thaliana reference accession Columbia-0 (Col-0) and another accession of about average distance to Col-0, we created a de novo next generation sequencing (NGS)-based assembly from the accession Niederzenz-1 (Nd-1). The result was evaluated with respect to assembly strategy and synteny to Col-0. We provide a high quality genome sequence of the A. thaliana accession (Nd-1, LXSY01000000). The assembly displays an N50 of 0.590 Mbp and covers 99% of the Col-0 reference sequence. Scaffolds from the de novo assembly were positioned on the basis of sequence similarity to the reference. Errors in this automatic scaffold anchoring were manually corrected based on analyzing reciprocal best BLAST hits (RBHs) of genes. Comparison of the final Nd-1 assembly to the reference revealed duplications and deletions (PAV). We identified 826 insertions and 746 deletions in Nd-1. Randomly selected candidates of PAV were experimentally validated. Our Nd-1 de novo assembly allowed reliable identification of larger genic and intergenic variants, which was difficult or error-prone by short read mapping approaches alone. While overall sequence similarity as well as synteny is very high, we detected short and larger (affecting more than 100 bp) differences between Col-0 and Nd-1 based on bi-directional comparisons. The de novo assembly provided here and additional assemblies that will certainly be published in the future will allow to
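
    The N50 statistic quoted for the assembly has a simple definition: it is the length of the scaffold at which the cumulative length of scaffolds, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch, with a hypothetical function name and made-up lengths in the usage note:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Raises ValueError on empty input, since N50 is undefined there.
    """
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
```

    For lengths 2, 3, 4, 5 and 6 the total is 20; the two longest scaffolds (6 and 5) together reach 11, which passes the halfway mark of 10, so the N50 is 5.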

  3. Transcriptome Sequencing and De Novo Analysis for Yesso Scallop (Patinopecten yessoensis) Using 454 GS FLX

    PubMed Central

    Hou, Rui; Bao, Zhenmin; Wang, Shan; Su, Hailin; Li, Yan; Du, Huixia; Hu, Jingjie; Wang, Shi; Hu, Xiaoli

    2011-01-01

    Background Bivalves comprise 30,000 extant species, constituting the second largest group of mollusks. However, limited genetic research has focused on this group of animals so far, which is, in part, due to the lack of genomic resources. The advent of high-throughput sequencing technologies enables generation of genomic resources in a short time and at a minimal cost, and therefore provides a turning point for bivalve research. In the present study, we performed de novo transcriptome sequencing to first produce a comprehensive expressed sequence tag (EST) dataset for the Yesso scallop (Patinopecten yessoensis). Results In a single 454 sequencing run, 805,330 reads were produced and then assembled into 32,590 contigs, with about six-fold sequencing coverage. A total of 25,237 unique protein-coding genes were identified from a variety of developmental stages and adult tissues based on sequence similarities with known proteins. As determined by GO annotation and KEGG pathway mapping, functional annotation of the unigenes recovered diverse biological functions and processes. Transcripts putatively involved in growth, reproduction and stress/immune-response were identified. More than 49,000 single nucleotide polymorphisms (SNPs) and 2,700 simple sequence repeats (SSRs) were also detected. Conclusion Our data provide the most comprehensive transcriptomic resource currently available for P. yessoensis. Candidate genes potentially involved in growth, reproduction, and stress/immunity-response were identified, and are worthy of further investigation. A large number of SNPs and SSRs were also identified and ready for marker development. This resource should lay an important foundation for future genetic or genomic studies on this species. PMID:21720557
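
    Simple sequence repeat (SSR) detection of the kind reported here can be sketched with a regular expression. The abstract does not say which tool or thresholds were used, so the motif length range (2 to 6 bases), the minimum of four tandem copies, and the function name are all assumptions for illustration.

```python
import re


def find_ssrs(seq, min_repeats=4, min_motif=2, max_motif=6):
    """Return (start, motif, copies) for each perfect tandem repeat in seq.

    The lazy group prefers the shortest motif, so "GAGAGAGA" is reported as
    GA x 4 rather than GAGA x 2. Start positions are 0-based.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]
```

    One caveat of this sketch is that a long homopolymer run such as AAAAAAAA is reported as a dinucleotide repeat (AA x 4); dedicated SSR tools treat mononucleotide runs separately and usually apply different copy thresholds per motif length.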

  6. De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses

    PubMed Central

    Meena, Seema; Kumar, Sarma R.; Venkata Rao, D. K.; Dwivedi, Varun; Shilpashree, H. B.; Rastogi, Shubhra; Shasany, Ajit K.; Nagegowda, Dinesh A.

    2016-01-01

    Aromatic grasses of the genus Cymbopogon (Poaceae family) represent a unique group of plants that produce monoterpene-rich essential oils of diverse composition, which have great value in the flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) using Illumina sequencing. Mining of the transcriptome data and subsequent phylogenetic analysis led to the identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of the identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of the identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome, many of them linked to terpene pathway genes, including genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a valuable resource for biotechnological and molecular breeding approaches to modulate essential oil composition. PMID:27516768

  7. De Novo Transcriptome Sequencing of Oryza officinalis Wall ex Watt to Identify Disease-Resistance Genes.

    PubMed

    He, Bin; Gu, Yinghong; Tao, Xiang; Cheng, Xiaojie; Wei, Changhe; Fu, Jian; Cheng, Zaiquan; Zhang, Yizheng

    2015-12-10

    Oryza officinalis Wall ex Watt is one of the most important wild relatives of cultivated rice and exhibits high resistance to many diseases. It has been used as a source of genes for introgression into cultivated rice. However, there are limited genomic resources and little genetic information publicly reported for this species. To better understand the pathways and factors involved in disease resistance and to accelerate rice breeding, we carried out de novo transcriptome sequencing of O. officinalis. In this study, 137,229 contigs ranging from 200 to 19,214 bp, with an N50 of 2,331 bp, were obtained through de novo assembly of reads from leaves, stems, and roots of O. officinalis sequenced on an Illumina HiSeq 2000 platform. Based on sequence similarity searches against a non-redundant protein database, a total of 88,249 contigs were annotated with gene descriptions and 75,589 transcripts were further assigned to GO terms. Candidate genes for plant-pathogen interaction and plant hormone regulation pathways involved in disease resistance were identified. Further analyses of gene expression profiles showed that the majority of genes related to disease resistance were expressed in all three tissues. In addition, there are two kinds of rice bacterial blight-resistance genes in O. officinalis, including two Xa1 genes and three Xa26 genes. Both Xa1 genes showed their highest expression level in stem, whereas one Xa26 gene was expressed predominantly in leaf and the other two Xa26 genes displayed low expression levels in all three tissues. This transcriptomic database provides an opportunity for identifying the genes involved in disease resistance and will provide a basis for studying the functional genomics of O. officinalis and the genetic improvement of cultivated rice in the future.
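
    The N50 quoted above is a standard contiguity statistic: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembled length. A minimal sketch of that definition follows. The function name and the toy contig lengths are illustrative assumptions and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2             # half of the assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # first contig crossing the halfway mark
            return length


# Hypothetical toy assembly, chosen only to illustrate the calculation.
toy_contigs = [200, 350, 500, 800, 1200, 2331, 4000]
print(n50(toy_contigs))
```

    Because N50 depends only on the length distribution, it says nothing about whether the contigs are correct, which is why abstracts such as this one report annotation rates alongside it.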

  8. Sequencing and De Novo Assembly of the Gonadal Transcriptome of the Endangered Chinese Sturgeon (Acipenser sinensis)

    PubMed Central

    Du, Hao; Zhang, Shuhuan; Wei, Qiwei

    2015-01-01

    Background The Chinese sturgeon (Acipenser sinensis) is endangered through anthropogenic activities including over-fishing, damming, shipping, and pollution. Controlled reproduction has been adopted and successfully conducted for conservation. However, little information is available on the reproductive regulation of the species. In this study, we conducted de novo transcriptome assembly of the gonad tissue to create a comprehensive dataset for A. sinensis. Results The Illumina sequencing platform was adopted to obtain 47,333,701 and 47,229,705 high quality reads from testis and ovary cDNA libraries generated from three-year-old A. sinensis. We identified 86,027 unigenes of which 30,268 were annotated in the NCBI non-redundant protein database and 28,281 were annotated in the Swiss-prot database. Among the annotated unigenes, 26,152 and 7,734 unigenes, respectively, were assigned to gene ontology categories and clusters of orthologous groups. In addition, 12,557 unigenes were mapped to 231 pathways in the Kyoto Encyclopedia of Genes and Genomes Pathway database. A total of 1,896 unigenes, potentially differentially expressed between the two gonad types, were found, with 1,894 predicted to be up-regulated in ovary and only two in testis. Fifty-five potential gametogenesis-related genes were screened in the transcriptome and 34 genes with significant matches were found. Besides, more paralogs of 11 genes in three gene families (sox, apolipoprotein and cyclin) were found in A. sinensis compared to their orthologs in the diploid Danio rerio. In addition, 12,151 putative simple sequence repeats (SSRs) were detected. Conclusions This study provides the first de novo transcriptome analysis currently available for A. sinensis. The transcriptomic data represents the fundamental resource for future research on the mechanism of early gametogenesis in sturgeons. The SSRs identified in this work will be valuable for assessment of genetic diversity of wild fish and genealogy

  9. De Novo Transcriptome Sequencing of Oryza officinalis Wall ex Watt to Identify Disease-Resistance Genes

    PubMed Central

    He, Bin; Gu, Yinghong; Tao, Xiang; Cheng, Xiaojie; Wei, Changhe; Fu, Jian; Cheng, Zaiquan; Zhang, Yizheng

    2015-01-01

    Oryza officinalis Wall ex Watt is one of the most important wild relatives of cultivated rice and exhibits high resistance to many diseases. It has been used as a source of genes for introgression into cultivated rice. However, there are limited genomic resources and little genetic information publicly reported for this species. To better understand the pathways and factors involved in disease resistance and to accelerate rice breeding, we carried out de novo transcriptome sequencing of O. officinalis. In this study, 137,229 contigs ranging from 200 to 19,214 bp, with an N50 of 2,331 bp, were obtained through de novo assembly of reads from leaves, stems, and roots of O. officinalis sequenced on an Illumina HiSeq 2000 platform. Based on sequence similarity searches against a non-redundant protein database, a total of 88,249 contigs were annotated with gene descriptions and 75,589 transcripts were further assigned to GO terms. Candidate genes for plant–pathogen interaction and plant hormone regulation pathways involved in disease resistance were identified. Further analyses of gene expression profiles showed that the majority of genes related to disease resistance were expressed in all three tissues. In addition, there are two kinds of rice bacterial blight-resistance genes in O. officinalis, including two Xa1 genes and three Xa26 genes. Both Xa1 genes showed their highest expression level in stem, whereas one Xa26 gene was expressed predominantly in leaf and the other two Xa26 genes displayed low expression levels in all three tissues. This transcriptomic database provides an opportunity for identifying the genes involved in disease resistance and will provide a basis for studying the functional genomics of O. officinalis and the genetic improvement of cultivated rice in the future. PMID:26690414

  11. Mining Novel Allergens from Coconut Pollen Employing Manual De Novo Sequencing and Homology-Driven Proteomics.

    PubMed

    Saha, Bodhisattwa; Sircar, Gaurab; Pandey, Naren; Gupta Bhattacharya, Swati

    2015-11-01

    Coconut pollen, one of the major palm pollen grains, is an important constituent among vectors of inhalant allergens in India and a major sensitizer for respiratory allergy in susceptible patients. To gain insight into its allergenic components, pollen proteins were analyzed by two-dimensional electrophoresis and immunoblotted with sera from coconut pollen-sensitive patients, followed by mass spectrometry of IgE-reactive proteins. Because coconut is largely unsequenced, a proteomic workflow was devised that combines conventional database-dependent analysis of tandem mass spectral data with manual de novo sequencing followed by a homology-based search to identify the allergenic proteins. N-terminal acetylation helped to distinguish "b" ions from others, facilitating reliable sequencing. This led to the identification of 12 allergenic proteins. Cluster analysis with individual patient sera recognized vicilin-like protein as a major allergen, which was purified to assess its in vitro allergenicity and then partially sequenced. Other IgE-sensitive spots showed significant homology with well-known allergenic proteins such as 11S globulin, enolase, and isoflavone reductase, along with a few that are reported as novel allergens. The allergens identified can be used as potential candidates to develop hypoallergenic vaccines, to design specific immunotherapy trials, and to enrich the repertoire of existing IgE-reactive proteins. PMID:26426307

  12. Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome

    PubMed Central

    Goodwin, Sara; Gurtowski, James; Ethe-Sayers, Scott; Deshpande, Panchajanya; Schatz, Michael C.; McCombie, W. Richard

    2015-01-01

    Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used this for sequencing the Saccharomyces cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr, specifically for Oxford Nanopore reads, because existing packages were incapable of assembling the long read lengths (5–50 kbp) at such high error rates (between ∼5% and 40% error). With this new method, we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has >99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly. PMID:26447147

  13. De novo Ixodes ricinus salivary gland transcriptome analysis using two next-generation sequencing methodologies

    PubMed Central

    Schwarz, Alexandra; von Reumont, Björn M.; Erhart, Jan; Chagas, Andrezza C.; Ribeiro, José M. C.; Kotsyfakis, Michalis

    2013-01-01

    Tick salivary gland (SG) proteins possess powerful pharmacologic properties that facilitate tick feeding and pathogen transmission. For the first time, SG transcriptomes of Ixodes ricinus, an important disease vector for humans and animals, were analyzed using next-generation sequencing. SGs were collected from different tick life stages fed on various animal species, including cofeeding of nymphs and adults on the same host. Four cDNA samples were sequenced, discriminating tick SG transcriptomes of early- and late-feeding nymphs or adults. In total, 441,381 reads from 454 pyrosequencing and 67,703,183 Illumina reads were assembled into 272,220 contigs, of which 34,560 extensively annotated coding sequences are disclosed; 8686 coding sequences were submitted to GenBank. Overall, 13% of contigs were classified as secreted proteins that showed significant differences in transcript representation among the 4 SG samples, including high numbers of sample-specific transcripts. Detailed phylogenetic reconstructions of two relatively abundant SG-secreted protein families demonstrated how this study improves our understanding of the molecular evolution of hematophagy in arthropods. Our data significantly increase the available genomic information for I. ricinus and form a solid basis for future tick genome/transcriptome assemblies and the functional analysis of effectors that mediate the feeding physiology and parasite-vector interaction of I. ricinus. PMID:23964076

  14. Sequencing and De Novo Assembly of the Transcriptome of the Glassy-Winged Sharpshooter (Homalodisca vitripennis)

    PubMed Central

    Nandety, Raja Sekhar; Kamita, Shizuo G.; Hammock, Bruce D.; Falk, Bryce W.

    2013-01-01

    Background The glassy-winged sharpshooter Homalodisca vitripennis (Hemiptera: Cicadellidae) is a xylem-feeding leafhopper and an important vector of the bacterium Xylella fastidiosa, the causal agent of Pierce’s disease of grapevines. The functional complexity of the transcriptome of H. vitripennis has not been elucidated thus far. It is a necessary blueprint for an understanding of the development of H. vitripennis and for designing efficient biorational control strategies, including those based on RNA interference. Results Here we elucidate and explore the transcriptome of adult H. vitripennis using high-throughput paired-end deep sequencing and de novo assembly. A total of 32,803,656 paired-end reads were obtained with an average transcript length of 624 nucleotides. We assembled 32.9 Mb of the transcriptome of H. vitripennis that spanned 47,265 loci and 52,708 transcripts. Comparison with the non-redundant database showed that 45% of the deduced proteins of H. vitripennis exhibit identity (e-value ≤ 1e−5) with known proteins. We assigned Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations, and potential Pfam domains to each transcript isoform. In order to gain insight into the molecular basis of key regulatory genes of H. vitripennis, we characterized predicted proteins involved in the metabolism of juvenile hormone and the biogenesis of small RNAs (Dicer and Piwi sequences) from the transcriptomic sequences. Analysis of transposable element sequences of H. vitripennis indicated that the genome is less expanded in comparison to many other insects, with approximately 1% of the transcriptome carrying transposable elements. Conclusions Our data significantly enhance the molecular resources available for future study and control of this economically important hemipteran. This transcriptional information not only provides a more nuanced understanding of the underlying biological and physiological mechanisms that govern H

  15. Bromine isotopic signature facilitates de novo sequencing of peptides in free-radical-initiated peptide sequencing (FRIPS) mass spectrometry.

    PubMed

    Nam, Jungjoo; Kwon, Hyuksu; Jang, Inae; Jeon, Aeran; Moon, Jingyu; Lee, Sun Young; Kang, Dukjin; Han, Sang Yun; Moon, Bongjin; Oh, Han Bin

    2015-02-01

    We recently showed that free-radical-initiated peptide sequencing mass spectrometry (FRIPS MS) assisted by the remarkable thermochemical stability of (2,2,6,6-tetramethyl-piperidin-1-yl)oxyl (TEMPO) is another attractive radical-driven peptide fragmentation MS tool. Facile homolytic cleavage of the bond between the benzylic carbon and the oxygen of the TEMPO moiety in o-TEMPO-Bz-C(O)-peptide and the high reactivity of the benzylic radical species generated in •Bz-C(O)-peptide are key elements leading to extensive radical-driven peptide backbone fragmentation. In the present study, we demonstrate that the incorporation of bromine into the benzene ring, i.e. o-TEMPO-Bz(Br)-C(O)-peptide, allows unambiguous distinction of the N-terminal peptide fragments from the C-terminal fragments through the unique bromine doublet isotopic signature. Furthermore, bromine substitution does not alter the overall radical-driven peptide backbone dissociation pathways of o-TEMPO-Bz-C(O)-peptide. From a practical perspective, the presence of the bromine isotopic signature in the N-terminal peptide fragments in TEMPO-assisted FRIPS MS represents a useful and cost-effective opportunity for de novo peptide sequencing.
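
    The bromine doublet mentioned above arises because natural bromine is a roughly 1:1 mixture of 79Br and 81Br, so any fragment retaining the tag appears as two peaks of similar height about 1.998 Da apart (divided by the charge state). The following is a minimal sketch of how such doublets could be flagged in a peak list. The function name, tolerances, and toy peak list are illustrative assumptions and do not come from the paper.

```python
BR_MASS_DIFF = 1.99795  # Da, approximate mass difference between 81Br and 79Br


def find_bromine_doublets(peaks, charge=1, mz_tol=0.01, min_ratio=0.7):
    """Return (light_mz, heavy_mz) pairs that look like a Br isotopic doublet.

    peaks     -- iterable of (m/z, intensity) tuples
    mz_tol    -- allowed deviation from the expected spacing (assumed value)
    min_ratio -- minimum weaker/stronger intensity ratio (assumed value)
    """
    spacing = BR_MASS_DIFF / charge
    ordered = sorted(peaks)
    doublets = []
    for i, (mz_light, int_light) in enumerate(ordered):
        for mz_heavy, int_heavy in ordered[i + 1:]:
            gap = mz_heavy - mz_light
            if gap > spacing + mz_tol:
                break                      # peaks are sorted, nothing closer follows
            strongest = max(int_light, int_heavy)
            if abs(gap - spacing) <= mz_tol and strongest > 0:
                if min(int_light, int_heavy) / strongest >= min_ratio:
                    doublets.append((mz_light, mz_heavy))
    return doublets


# Hypothetical peak list: one brominated fragment and three unrelated peaks.
toy_peaks = [(500.20, 1000), (502.198, 950), (610.30, 800),
             (611.30, 300), (750.40, 400)]
print(find_bromine_doublets(toy_peaks))
```

    In the scheme described in the abstract, peaks flagged this way would be candidates for N-terminal fragments, since only those retain the brominated tag.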

  16. De Novo Assembly and Transcriptome Characterization of Canine Retina Using High-Throughput Sequencing

    PubMed Central

    Reddy, Bhaskar; Patel, Amrutlal K.; Singh, Krishna M.; Patil, Deepak B.; Parikh, Pinesh V.; Kelawala, Divyesh N.; Koringa, Prakash G.; Bhatt, Vaibhav D.; Rao, Mandava V.; Joshi, Chaitanya G.

    2015-01-01

    We performed transcriptome sequencing of canine retinal tissue by 454 GS-FLX and Ion Torrent PGM platforms. RNA-Seq analysis by CLC Genomics Workbench mapped expression of 10,360 genes. Gene ontology analysis of retinal transcriptome revealed abundance of transcripts known to be involved in vision associated processes. The de novo assembly of the sequences using CAP3 generated 29,683 contigs with mean length of 560.9 and N50 of 619 bases. Further analysis of contigs predicted 3,827 full-length cDNAs and 29,481 (99%) open reading frames (ORFs). In addition, 3,782 contigs were assigned to 316 KEGG pathways which included melanogenesis, phototransduction, and retinol metabolism with 33, 15, and 11 contigs, respectively. Among the identified microsatellites, dinucleotide repeats were 68.84%, followed by trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides in proportions of 25.76, 9.40, 2.52, and 0.96%, respectively. This study will serve as a valuable resource for understanding the biology and function of canine retina. PMID:26788372

  17. De novo sequencing and characterization of Picrorhiza kurrooa transcriptome at two temperatures showed major transcriptome adjustments

    PubMed Central

    2012-01-01

    Background Picrorhiza kurrooa Royle ex Benth. is an endangered plant species of medicinal importance. The medicinal property is attributed to the monoterpenoids picroside I and II, which are modulated by temperature. Transcriptome information for this species is limited to a few hundred expressed sequence tags (ESTs) in the public databases. In order to gain insight into temperature-mediated molecular changes, high-throughput de novo transcriptome sequencing and analyses were carried out at 15°C and 25°C, the temperatures known to modulate picroside content. Results Using paired-end (PE) Illumina sequencing technology, a total of 20,593,412 and 44,229,272 PE reads were obtained after quality filtering for 15°C and 25°C, respectively. Available (e.g., de Bruijn/Eulerian graph-based) and in-house developed bioinformatics tools were used for assembly and annotation of the transcriptome. A total of 74,336 assembled transcript sequences were obtained, with an average coverage of 76.6 and an average length of 439.5. Guanine-cytosine (GC) content was observed to be 44.6%, while the transcriptome exhibited an abundance of trinucleotide simple sequence repeat (SSR; 45.63%) markers. Large-scale expression profiling through "reads per exon kilobase per million (RPKM)" showed changes in several biological processes and metabolic pathways, including cytochrome P450s (CYPs), UDP-glycosyltransferases (UGTs), and those associated with picroside biosynthesis. RPKM data were validated by reverse transcriptase-polymerase chain reaction using a set of 19 genes, of which 11 behaved concordantly between the two expression methods. Conclusions This study generated the transcriptome of P. kurrooa at two different temperatures. Large-scale expression profiling through RPKM showed major transcriptome changes in response to temperature, reflecting alterations in major biological processes and metabolic pathways, and provided insight into GC content and SSR markers. Analysis also identified
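
    RPKM, used above for expression profiling, normalizes a raw read count by both transcript length and sequencing depth. A minimal sketch of the usual formula (reads × 10^9 / (length in bp × total mapped reads)) follows. The function name and the example numbers are illustrative assumptions and are not taken from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> millions of reads)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical transcript: 500 reads on a 2,000 bp transcript
# in a library of 20 million mapped reads.
print(rpkm(500, 2000, 20_000_000))
```

    Because library size sits in the denominator, the same transcript can be compared between libraries of different depth, such as the two temperature conditions here.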

  18. Transcriptome Sequencing and De Novo Assembly of Golden Cuttlefish Sepia esculenta Hoyle

    PubMed Central

    Liu, Changlin; Zhao, Fazhen; Yan, Jingping; Liu, Chunsheng; Liu, Siwei; Chen, Siqing

    2016-01-01

    Golden cuttlefish Sepia esculenta Hoyle is an economically important cephalopod species. However, artificial hatching is currently challenged by the low survival rate of larvae due to abnormal embryonic development. Dissecting the genetic foundation and regulatory mechanisms of embryonic development requires genomic background knowledge. Therefore, we carried out transcriptome sequencing of Sepia embryos and larvae via mRNA-Seq. A total of 32,597,241 raw reads were filtered and assembled into 98,615 unigenes (N50 length of 911 bp), which were annotated against the NR, GO, and KEGG databases. Digital gene expression analysis was carried out on cleavage-stage embryos, healthy larvae, and malformed larvae. Unigenes functioning in cell proliferation exhibited higher transcriptional levels at the cleavage stage, while those related to animal disease and organ development showed increased transcription in malformed larvae. Homologs of key genes in regulatory pathways related to early development of animals were identified in Sepia. Most of them exhibit higher transcriptional levels in the cleavage stage than in larvae, suggesting potential roles in the embryonic development of Sepia. The de novo assembly of the Sepia transcriptome provides a fundamental genetic background for further Sepia research. Our demonstration of the transcriptional variation of genes across three developmental stages will provide new perspectives for understanding the molecular mechanisms of early embryonic development in cuttlefish. PMID:27782082

  19. A highly conserved sequence is a novel gene involved in de novo vitamin B6 biosynthesis

    PubMed Central

    Ehrenshaft, Marilyn; Bilski, Piotr; Li, Ming Y.; Chignell, Colin F.; Daub, Margaret E.

    1999-01-01

    The Cercospora nicotianae SOR1 (singlet oxygen resistance) gene was identified previously as a gene involved in resistance of this fungus to singlet-oxygen-generating phototoxins. Although homologues to SOR1 occur in organisms in four kingdoms and encode one of the most highly conserved proteins yet identified, the precise function of this protein has, until now, remained unknown. We show that SOR1 is essential in pyridoxine (vitamin B6) synthesis in C. nicotianae and Aspergillus flavus, although it shows no homology to previously identified pyridoxine synthesis genes identified in Escherichia coli. Sequence database analysis demonstrated that organisms encode either SOR1 or E. coli pyridoxine biosynthesis genes, but not both, suggesting that there are two divergent pathways for de novo pyridoxine biosynthesis in nature. Pathway divergence appears to have occurred during the evolution of the eubacteria. We also present data showing that pyridoxine quenches singlet oxygen at a rate comparable to that of vitamins C and E, two of the most highly efficient biological antioxidants, suggesting a previously unknown role for pyridoxine in active oxygen resistance. PMID:10430950

  20. De novo transcriptome sequencing and discovery of genes related to copper tolerance in Paeonia ostii.

    PubMed

    Wang, Yanjie; Dong, Chunlan; Xue, Zeyun; Jin, Qijiang; Xu, Yingchun

    2016-01-15

    Paeonia ostii, an important ornamental and medicinal plant, grows normally on copper (Cu) mines with widespread Cu contamination of soils, and it has the ability to lower Cu contents in the Cu-contaminated soils. However, very little molecular information concerned with Cu resistance of P. ostii is available. In this study, high-throughput de novo transcriptome sequencing was carried out for P. ostii with and without Cu treatment using Illumina HiSeq 2000 platform. A total of 77,704 All-unigenes were obtained with a mean length of 710 bp. Of these unigenes, 47,461 were annotated with public databases based on sequence similarities. Comparative transcript profiling allowed the discovery of 4324 differentially expressed genes (DEGs), with 2207 up-regulated and 2117 down-regulated unigenes in Cu-treated library as compared to the control counterpart. Based on these DEGs, Gene Ontology (GO) enrichment analysis indicated Cu stress-relevant terms, such as 'membrane' and 'antioxidant activity'. Meanwhile, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis uncovered some important pathways, including 'biosynthesis of secondary metabolites' and 'metabolic pathways'. In addition, expression patterns of 12 selected DEGs derived from quantitative real-time polymerase chain reaction (qRT-PCR) were consistent with their transcript abundance changes obtained by transcriptomic analyses, suggesting that all the 12 genes were authentically involved in Cu tolerance in P. ostii. This is the first report to identify genes related to Cu stress responses in P. ostii, which could offer valuable information on the molecular mechanisms of Cu resistance, and provide a basis for further genomics research on this and related ornamental species for phytoremediation.

  1. Stable isotope N-phosphorylation labeling for Peptide de novo sequencing and protein quantification based on organic phosphorus chemistry.

    PubMed

    Gao, Xiang; Wu, Hanzhi; Lee, Kim-Chung; Liu, Hongxia; Zhao, Yufen; Cai, Zongwei; Jiang, Yuyang

    2012-12-01

    In this paper, we describe the development of a novel stable isotope N-phosphorylation labeling (SIPL) strategy for peptide de novo sequencing and protein quantification based on organic phosphorus chemistry. The labeling reaction could be performed easily and completed within 40 min in a one-pot reaction without additional cleanup procedures. It was found that N-phosphorylation labeling reagents were activated in situ to form highly reactive labeling intermediates targeting the N-terminus and the ε-amino groups of lysine under mild reaction conditions. The introduction of the N-terminal-labeled phosphoryl group not only improved the ionization efficiency of peptides and increased the protein sequence coverage for peptide mass fingerprints but also greatly enhanced the intensities of b ions, suppressed the internal fragments, and reduced the complexity of the tandem mass spectrometry (MS/MS) fragmentation patterns of peptides. By using nano liquid chromatography chip/time-of-flight mass spectrometry (nano LC-chip/TOF MS) for the protein quantification, the obtained results showed excellent correlation of the measured ratios to theoretical ratios, with relative errors ranging from 0.5% to 6.7% and relative standard deviation of less than 10.6%, indicating that the developed method was reproducible and precise. The isotope effect was negligible because the deuterium atoms were placed adjacent to the neutral phosphoryl group with high electrophilicity and moderately small size. Moreover, the SIPL approach used inexpensive reagents and was amenable to samples from various sources, including cell culture, biological fluids, and tissues. The method development based on organic phosphorus chemistry offered a new approach for quantitative proteomics by using novel stable isotope labeling reagents.

  2. De Novo Transcriptome Assembly of the Chinese Swamp Buffalo by RNA Sequencing and SSR Marker Discovery

    PubMed Central

    Lu, Xingrong; Zhu, Peng; Duan, Anqin; Tan, Zhengzhun; Huang, Jian; Li, Hui; Chen, Mingtan; Liang, Xianwei

    2016-01-01

    The Chinese swamp buffalo (Bubalus bubalis) is vital to the lives of small farmers and has tremendous economic importance. However, a lack of genomic information has hampered research on augmenting marker-assisted breeding programs in this species. Thus, high-throughput transcriptomic sequencing of B. bubalis was conducted to generate a transcriptomic sequence dataset for gene discovery and molecular marker development. Illumina paired-end sequencing generated a total of 54,109,173 raw reads. After trimming, de novo assembly was performed, which yielded 86,017 unigenes, with an average length of 972.41 bp, an N50 of 1,505 bp, and an average GC content of 49.92%. A total of 62,337 unigenes were successfully annotated. Among the annotated unigenes, 27,025 (43.35%) and 23,232 (37.27%) unigenes showed significant similarity to known proteins in NCBI non-redundant protein and Swiss-Prot databases (E-value < 1.0E-5), respectively. Of these annotated unigenes, 14,439 and 15,813 unigenes were assigned to the Gene Ontology (GO) categories and EuKaryotic Ortholog Group (KOG) cluster, respectively. In addition, a total of 14,167 unigenes were assigned to 331 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Furthermore, 17,401 simple sequence repeats (SSRs) were identified as potential molecular markers. One hundred and fifteen primer pairs were randomly selected for amplification to detect polymorphisms. The results revealed that 110 primer pairs (95.65%) yielded PCR amplicons and 69 primer pairs (60.00%) presented polymorphisms in 35 individual buffaloes. A phylogenetic analysis showed that the five swamp buffalo populations were clustered together, whereas two river buffalo breeds clustered separately. In the present study, the Illumina RNA-seq technology was utilized to perform transcriptome analysis and SSR marker discovery in the swamp buffalo without using a reference genome. Our findings will enrich the current SSR marker resources and help spearhead molecular
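
    The N50 figure quoted in the abstract above is a standard assembly statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. A minimal sketch of that calculation follows. The function name and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until the running
    total reaches half of all assembled bases; the length of the
    sequence that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy unigene lengths (hypothetical, for illustration only).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

    For the toy lengths, the total is 1,500 bases, and the two longest sequences (500 and 400) are the first to reach the halfway mark of 750, so the N50 is 400.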

  3. An integer programming approach to DNA sequence assembly.

    PubMed

    Chang, Youngjung; Sahinidis, Nikolaos V

    2011-08-10

    De novo sequence assembly is a ubiquitous combinatorial problem in all DNA sequencing technologies. In the presence of errors in the experimental data, the assembly problem is computationally challenging, and its solution may not lead to a unique reconstruct. The enumeration of all alternative solutions is important in drawing a reliable conclusion on the target sequence, and is often overlooked in the heuristic approaches that are currently available. In this paper, we develop an integer programming formulation and global optimization solution strategy to solve the sequence assembly problem with errors in the data. We also propose an efficient technique to identify all alternative reconstructs. When applied to examples of sequencing-by-hybridization, our approach dramatically increases the length of DNA sequences that can be handled with global optimality certificate to over 10,000, which is more than 10 times longer than previously reported. For some problem instances, alternative solutions exhibited a wide range of different ability in reproducing the target DNA sequence. Therefore, it is important to utilize the methodology proposed in this paper in order to obtain all alternative solutions to reliably infer the true reconstruct. These alternative solutions can be used to refine the obtained results and guide the design of further experiments to correctly reconstruct the target DNA sequence. PMID:21864794

  4. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Sakakibara, Yasubumi [Keio University]

    2016-07-12

    Keio University's Yasubumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  5. De novo sequencing and transcriptome analysis of Ustilaginoidea virens by using Illumina paired-end sequencing and development of simple sequence repeat markers.

    PubMed

    Yu, Mina; Yu, Junjie; Gu, Chenhao; Nie, Yafeng; Chen, Zhiyi; Yin, Xiaole; Liu, Yongfeng

    2014-09-01

    Ustilaginoidea virens is the causal agent of rice false smut, a rice disease of increasing importance worldwide that causes both quantitative and qualitative rice losses. However, research on the pathogenic mechanism of U. virens is limited. In this study, we report the de novo assembly, annotation, and characterization of the transcriptome and the development of simple sequence repeat (SSR) markers of U. virens. U. virens transcripts from a mixture of mycelia and conidia were sequenced using Illumina RNA-seq technology. A total of 52,554,142 clean reads were assembled into 36,496 transcripts representing 18,534 unigenes. Assembled unigenes were annotated through sequence comparison with known protein databases, and 48.48% of the unigenes were without hits in any of these databases. Clusters of orthologous groups for eukaryotic complete genome analysis identified the largest set of genes associated with posttranslational modification, protein turnover and chaperones. Kyoto Encyclopedia of Genes and Genomes pathway analyses identified a number of genes associated with mitogen-activated protein kinase and calcium-calcineurin pathways. The study also identified several putative pathogenicity determinants and candidate effectors in U. virens by using the pathogen-host interaction database. In addition, bioinformatics analysis revealed the presence of 12,298 SSR markers. This study provides a better understanding of the biology of U. virens and is an excellent resource for discovering candidate genes required for pathogenesis.
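
    SSR markers such as those counted above are short motifs repeated in tandem, and a simple scan can locate them in assembled sequences. The sketch below uses a regular expression with a backreference. The abstract does not state the detection criteria or the tool used, so the function name, the motif sizes (2 to 6 bases) and the minimum of 4 repeats are assumptions chosen only for illustration.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for tandem repeats in seq.

    The lazy quantifier makes the regex try the shortest repeat unit
    first, and the backreference \\1 requires that same unit to recur
    immediately. Thresholds are hypothetical defaults. Note that a
    homopolymer run such as AAAAAAAA would be reported as an 'AA' repeat.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence (hypothetical): five copies of AC flanked by GG and TT.
toy_hits = find_ssrs("GGACACACACACTT")
```

    On the toy sequence this reports one dinucleotide repeat, (2, "AC", 5). Dedicated SSR tools apply separate thresholds per motif length, which this sketch does not attempt.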

  6. Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly

    PubMed Central

    Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka

    2010-01-01

    Background: Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore a fast and efficient sequencing approach was needed. Methodology: We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by conventional primer walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii to confirm that it performs well even for non-human species. Conclusions: The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877

  7. De novo sequencing and analysis of root transcriptome using 454 pyrosequencing to discover putative genes associated with drought tolerance in Ammopiptanthus mongolicus

    PubMed Central

    2012-01-01

    Background: De novo assembly of transcript sequences produced by next-generation sequencing technologies offers a rapid approach to obtain expressed gene sequences for non-model organisms. Ammopiptanthus mongolicus, a super-xerophytic broadleaf evergreen wood, is an ecologically important foundation species in desert ecosystems and exhibits substantial drought tolerance in the Mid-Asia desert. Roots play an important role in plant water absorption. There are insufficient transcriptomic and genomic data in public databases for understanding the molecular mechanism underlying the drought tolerance of A. mongolicus. Thus, high-throughput transcriptome sequencing of A. mongolicus root is helpful to generate a large number of transcript sequences for gene discovery and molecular marker development. Results: A total of 672,002 sequencing reads were obtained from a 454 GS XLR70 Titanium pyrosequencer with a mean length of 279 bp. These reads were assembled into 29,056 unique sequences including 15,173 contigs and 13,883 singlets. In our assembled sequences, 1,827 potential simple sequence repeat (SSR) molecular markers were discovered. Based on sequence similarity with known plant proteins, the assembled sequences represent approximately 9,771 proteins in PlantGDB. Based on the Gene Ontology (GO) analysis, hundreds of drought stress-related genes were found. We further analyzed the gene expression profiles of 27 putative genes involved in drought tolerance using a quantitative real-time PCR (qRT-PCR) assay. Conclusions: Our sequence collection represents a major transcriptomic resource for A. mongolicus, and the large number of genetic markers predicted should contribute to future research in the Ammopiptanthus genus. The potential drought stress-related transcripts identified in this study provide a good start for further investigation into drought adaptation in Ammopiptanthus. PMID:22721448

  8. Hybrid error correction and de novo assembly of single-molecule sequencing reads.

    PubMed

    Koren, Sergey; Schatz, Michael C; Walenz, Brian P; Martin, Jeffrey; Howard, Jason T; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A; McCombie, W Richard; Jarvis, Erich D; Phillippy, Adam M

    2012-07-01

    Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. PMID:22750884

  9. Identification of a De Novo Heterozygous Missense FLNB Mutation in Lethal Atelosteogenesis Type I by Exome Sequencing

    PubMed Central

    Jeon, Ga Won; Lee, Mi-Na; Jung, Ji Mi; Hong, Seong Yeon; Kim, Young Nam; Sin, Jong Beom

    2014-01-01

    Background: Atelosteogenesis type I (AO-I) is a rare lethal skeletal dysplastic disorder characterized by severe short-limbed dwarfism and dislocated hips, knees, and elbows. AO-I is caused by mutations in the filamin B (FLNB) gene; however, several other genes can cause AO-like lethal skeletal dysplasias. Methods: In order to screen all possible genes associated with AO-like lethal skeletal dysplasias simultaneously, we performed whole-exome sequencing in a female newborn having clinical features of AO-I. Results: Exome sequencing identified a novel missense variant (c.517G>A; p.Ala173Thr) in exon 2 of the FLNB gene in the patient. Sanger sequencing validated this variant, and genetic analysis of the patient's parents suggested a de novo occurrence of the variant. Conclusions: This study shows that exome sequencing can be a useful tool for the identification of causative mutations in lethal skeletal dysplasia patients. PMID:24624349

  10. Rapid genome mapping in nano channel array for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences...

  11. Sequencing crop genomes: approaches and applications

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Plant genome sequencing methodology parallels the sequencing of the human genome. The first projects were slow and very expensive. BAC by BAC approaches were utilized first and whole-genome shotgun sequencing rapidly replaced that approach. So-called 'next generation' technologies such as short rea...

  12. Somatic mutations and germline sequence variants in the expressed tyrosine kinase genes of patients with de novo acute myeloid leukemia

    PubMed Central

    Xiang, Zhifu; Walgren, Richard; Zhao, Yu; Kasai, Yumi; Miner, Tracie; Ries, Rhonda E.; Lubman, Olga; Fremont, Daved H.; McLellan, Michael D.; Payton, Jacqueline E.; Westervelt, Peter; DiPersio, John F.; Link, Daniel C.; Walter, Matthew J.; Graubert, Timothy A.; Watson, Mark; Baty, Jack; Heath, Sharon; Shannon, William D.; Nagarajan, Rakesh; Bloomfield, Clara D.; Mardis, Elaine R.; Wilson, Richard K.; Ley, Timothy J.

    2008-01-01

    Activating mutations in tyrosine kinase (TK) genes (eg, FLT3 and KIT) are found in more than 30% of patients with de novo acute myeloid leukemia (AML); many groups have speculated that mutations in other TK genes may be present in the remaining 70%. We performed high-throughput resequencing of the kinase domains of 26 TK genes (11 receptor TK; 15 cytoplasmic TK) expressed in most AML patients using genomic DNA from the bone marrow (tumor) and matched skin biopsy samples (“germline”) from 94 patients with de novo AML; sequence variants were validated in an additional 94 AML tumor samples (14.3 million base pairs of sequence were obtained and analyzed). We identified known somatic mutations in FLT3, KIT, and JAK2 TK genes at the expected frequencies and found 4 novel somatic mutations, JAK1 V623A, JAK1 T478S, DDR1 A803V, and NTRK1 S677N, once each in 4 respective patients of 188 tested. We also identified novel germline sequence changes encoding amino acid substitutions (ie, nonsynonymous changes) in 14 TK genes, including TYK2, which had the largest number of nonsynonymous sequence variants (11 total detected). Additional studies will be required to define the roles that these somatic and germline TK gene variants play in AML pathogenesis. PMID:18270328

  13. De Novo Designed Proteins from a Library of Artificial Sequences Function in Escherichia Coli and Enable Cell Growth

    PubMed Central

    Fisher, Michael A.; McKinley, Kara L.; Bradley, Luke H.; Viola, Sara R.; Hecht, Michael H.

    2011-01-01

    A central challenge of synthetic biology is to enable the growth of living systems using parts that are not derived from nature, but designed and synthesized in the laboratory. As an initial step toward achieving this goal, we probed the ability of a collection of >10^6 de novo designed proteins to provide biological functions necessary to sustain cell growth. Our collection of proteins was drawn from a combinatorial library of 102-residue sequences, designed by binary patterning of polar and nonpolar residues to fold into stable 4-helix bundles. We probed the capacity of proteins from this library to function in vivo by testing their abilities to rescue 27 different knockout strains of Escherichia coli, each deleted for a conditionally essential gene. Four different strains – ΔserB, ΔgltA, ΔilvA, and Δfes – were rescued by specific sequences from our library. Further experiments demonstrated that a strain simultaneously deleted for all four genes was rescued by co-expression of four novel sequences. Thus, cells deleted for ∼0.1% of the E. coli genome (and ∼1% of the genes required for growth under nutrient-poor conditions) can be sustained by sequences designed de novo. PMID:21245923

  14. Sequencing and de novo Analysis of Crassostrea angulata (Fujian Oyster) from 8 Different Developing Phases Using 454 GSFlx

    PubMed Central

    Chen, Jun; Zou, Quan; You, Weiwei; Ke, Caihuan

    2012-01-01

    Research on the mechanism for early development of shellfish, such as body plan, shell formation, settlement and metamorphosis is currently an active research field. However, studies were still limited and not deep enough because of the lack of genomic resources such as genome or transcriptome sequences. In the present research, de novo transcriptome sequencing was performed for Crassostrea angulata, the most economically important cultured oyster species in China, at eight early developmental stages using the 454 sequencing technology. A total of 555,215 reads were produced with an average length of 309 nucleotides that were then assembled into 10,462 contigs. As determined by GO annotation and KEGG pathway mapping, functional annotation of the unigenes recovered diverse biological functions and processes. Six unique sequences related to settlement, metamorphosis and growth were subsequently analyzed by real-time PCR. Given the lack of whole genome information for oysters, transcriptome and de novo analysis of C. angulata from the eight different developing phases will provide important and useful information on early development mechanism and help genetic breeding of shellfish. PMID:22952730

  15. Comparison of Illumina de novo assembled and Sanger sequenced viral genomes: A case study for RNA viruses recovered from the plant pathogenic fungus Sclerotinia sclerotiorum.

    PubMed

    Khalifa, Mahmoud E; Varsani, Arvind; Ganley, Austen R D; Pearson, Michael N

    2016-07-01

    The advent of 'next generation sequencing' (NGS) technologies has led to the discovery of many novel mycoviruses, the majority of which are sufficiently different from previously sequenced viruses that there is no appropriate reference sequence on which to base the sequence assembly. Although many new genome sequences are generated by NGS, confirmation of the sequence by Sanger sequencing is still essential for formal classification by the International Committee for the Taxonomy of Viruses (ICTV), although this is currently under review. To empirically test the validity of de novo assembled mycovirus genomes from dsRNA extracts, we compared the results from Illumina sequencing with those from random cloning plus targeted PCR coupled with Sanger sequencing for viruses from five Sclerotinia sclerotiorum isolates. Through Sanger sequencing we detected nine viral genomes, while through Illumina sequencing we detected the same nine viruses plus one additional virus from the same samples. Critically, the Illumina-derived sequences share >99.3% identity with those obtained by cloning and Sanger sequencing. Although there is scope for errors in de novo assembled viral genomes, our results demonstrate that by maximising the proportion of viral sequence in the data and using sufficiently rigorous quality controls, it is possible to generate de novo genome sequences from Illumina sequencing of comparable accuracy to those obtained by Sanger sequencing. PMID:26581665

  16. PERGA: A Paired-End Read Guided De Novo Assembler for Extending Contigs Using SVM and Look Ahead Approach

    PubMed Central

    Chin, Francis Y. L.; Yiu, Siu Ming; Quan, Guangri; Liu, Bo; Wang, Yadong

    2014-01-01

    Because the read lengths of high-throughput sequencing (HTS) technologies are short, de novo assembly, which plays a significant role in many applications, remains a great challenge. Most state-of-the-art approaches are based on the de Bruijn graph strategy and the overlap-layout strategy. However, these approaches, which depend on k-mers or read overlaps, do not fully utilize the information in paired-end and single-end reads when resolving branches. Because they treat all single-end reads whose overlap length exceeds a fixed threshold equally, they fail to prioritize the more reliable long-overlap reads during assembly and mix them with the relatively short-overlap reads. Moreover, these approaches have not been specifically designed to handle tandem repeats (repeats that occur adjacently in the genome), and they usually break contigs near tandem repeats. We present PERGA (Paired-End Reads Guided Assembler), a novel sequence-reads-guided de novo assembly approach, which adopts a greedy-like prediction strategy for assembling reads into contigs and scaffolds, using paired-end reads and different read overlap sizes ranging from Omax to Omin to resolve gaps and branches. By constructing a decision model using a machine learning approach based on branch features, PERGA can determine the correct extension in 99.7% of cases. When the correct extension cannot be determined, PERGA tries to extend the contig by all feasible extensions and determines the correct extension using a look-ahead approach. Many branches that are difficult to resolve are due to tandem repeats that are close together in the genome. PERGA detects such different copies of the repeats to resolve the branches, making the extension much longer and more accurate. We evaluated PERGA on both real and simulated Illumina datasets ranging from small bacterial genomes to a large human chromosome, and it constructed longer and more accurate contigs and scaffolds than other state-of-the-art assemblers. PERGA can be freely downloaded at https

  17. De novo Sequencing of the Leaf Transcriptome Reveals Complex Light-Responsive Regulatory Networks in Camellia sinensis cv. Baijiguan.

    PubMed

    Wu, Quanjin; Chen, Zhidan; Sun, Weijiang; Deng, Tingting; Chen, Mingjie

    2016-01-01

    Tea plants (Camellia sinensis L.) possess high genetic diversity that is important for breeding. One cultivar, Baijiguan, exhibits a yellow leaf phenotype, reduced chlorophyll (Chl) content, and aberrant chloroplast structures under high light intensity. In contrast, under low light intensity, the flush shoot from Baijiguan becomes green, the Chl content increases significantly, and the chloroplasts exhibit normal structures. To understand the underlying molecular mechanisms for these observations, we performed de novo transcriptome sequencing and digital gene expression (DGE) profiling using Illumina sequencing technology. De novo transcriptome assembly identified 88,788 unigenes, including 1652 transcription factors from 25 families. In total, 1993 and 2576 differentially expressed genes (DEGs) were identified in Baijiguan plants treated with 3 and 6 days of shade, respectively. Gene Ontology (GO) and pathway enrichment analyses indicated that the DEGs are predominantly involved in the ROS scavenging system, chloroplast development, photosynthetic pigment synthesis, secondary metabolism, and circadian systems. The light-responsive gene POR (protochlorophyllide oxidoreductase) and transcription factor HY5 were identified. Quantitative real-time PCR (qRT-PCR) analysis of 20 selected DEGs confirmed the RNA-sequencing (RNA-Seq) results. Overall, these findings suggest that high light intensity inhibits the expression of photosystem II 10-kDa protein (PsbR) in Baijiguan, thus affecting PSII stability, chloroplast development and chlorophyll biosynthesis.

  18. Exome sequencing identified a novel de novo OPA1 mutation in a consanguineous family presenting with optic atrophy.

    PubMed

    Cohen, Lior; Tzur, Shay; Goldenberg-Cohen, Nitza; Bormans, Concetta; Behar, Doron M; Reinstein, Eyal

    2016-01-01

    Inherited optic neuropathies are a heterogeneous group of disorders characterized by mild to severe visual loss, colour vision deficit, central or paracentral visual field defects and optic disc pallor. Optic atrophies can be classified into isolated or non-syndromic and syndromic forms. While multiple modes of inheritance have been reported, autosomal dominant optic atrophy and mitochondrial inherited Leber's hereditary optic neuropathy are the most common forms. Optic atrophy type 1, caused by mutations in the OPA1 gene, is believed to be the most common hereditary optic neuropathy, and most patients inherit a mutation from an affected parent. In this study we used whole-exome sequencing to investigate the genetic aetiology in a patient affected with isolated optic atrophy. Since the proband was the only affected individual in his extended family, and was a product of consanguineous marriage, homozygosity mapping followed by whole-exome sequencing was pursued. Exome results identified a novel de novo OPA1 mutation in the proband. We conclude that, though de novo OPA1 mutations are uncommon, testing of common optic atrophy-associated genes such as mitochondrial mutations and OPA1 gene sequencing should be performed first in single individuals presenting with optic neuropathy, even when dominant inheritance is not apparent.

  19. De novo Sequencing of the Leaf Transcriptome Reveals Complex Light-Responsive Regulatory Networks in Camellia sinensis cv. Baijiguan

    PubMed Central

    Wu, Quanjin; Chen, Zhidan; Sun, Weijiang; Deng, Tingting; Chen, Mingjie

    2016-01-01

    Tea plants (Camellia sinensis L.) possess high genetic diversity that is important for breeding. One cultivar, Baijiguan, exhibits a yellow leaf phenotype, reduced chlorophyll (Chl) content, and aberrant chloroplast structures under high light intensity. In contrast, under low light intensity, the flush shoot from Baijiguan becomes green, the Chl content increases significantly, and the chloroplasts exhibit normal structures. To understand the underlying molecular mechanisms for these observations, we performed de novo transcriptome sequencing and digital gene expression (DGE) profiling using Illumina sequencing technology. De novo transcriptome assembly identified 88,788 unigenes, including 1652 transcription factors from 25 families. In total, 1993 and 2576 differentially expressed genes (DEGs) were identified in Baijiguan plants treated with 3 and 6 days of shade, respectively. Gene Ontology (GO) and pathway enrichment analyses indicated that the DEGs are predominantly involved in the ROS scavenging system, chloroplast development, photosynthetic pigment synthesis, secondary metabolism, and circadian systems. The light-responsive gene POR (protochlorophyllide oxidoreductase) and transcription factor HY5 were identified. Quantitative real-time PCR (qRT-PCR) analysis of 20 selected DEGs confirmed the RNA-sequencing (RNA-Seq) results. Overall, these findings suggest that high light intensity inhibits the expression of photosystem II 10-kDa protein (PsbR) in Baijiguan, thus affecting PSII stability, chloroplast development and chlorophyll biosynthesis. PMID:27047513

  20. A Quantitative Tool to Distinguish Isobaric Leucine and Isoleucine Residues for Mass Spectrometry-Based De Novo Monoclonal Antibody Sequencing

    NASA Astrophysics Data System (ADS)

    Poston, Chloe N.; Higgs, Richard E.; You, Jinsam; Gelfanova, Valentina; Hale, John E.; Knierman, Michael D.; Siegel, Robert; Gutierrez, Jesus A.

    2014-07-01

    De novo sequencing by mass spectrometry (MS) allows for the determination of the complete amino acid (AA) sequence of a given protein based on the mass difference of detected ions from MS/MS fragmentation spectra. The technique relies on obtaining specific masses that can be attributed to characteristic theoretical masses of AAs. A major limitation of de novo sequencing by MS is the inability to distinguish between the isobaric residues leucine (Leu) and isoleucine (Ile). Incorrect identification of Ile as Leu or vice versa often results in loss of activity in recombinant antibodies. This functional ambiguity is commonly resolved with costly and time-consuming AA mutation and peptide sequencing experiments. Here, we describe a set of orthogonal biochemical protocols that experimentally determine the identity of Ile or Leu residues in monoclonal antibodies (mAb) based on the selectivity that leucine aminopeptidase shows for N-terminal Leu residues and the cleavage preference for Leu by chymotrypsin. The resulting observations are combined with germline frequencies and incorporated into a logistic regression model, called Predictor for Xle Sites (PXleS), to provide a statistical likelihood for the identity of Leu at an ambiguous site. We demonstrate that PXleS can generate a probability for an Xle site in mAbs with 96% accuracy. The implementation of PXleS precludes the expression of several possible sequences and, therefore, reduces the overall time and resources required to go from spectra generation to a biologically active sequence for a mAb when an Ile or Leu residue is in question.
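
    The Leu/Ile ambiguity described above follows directly from how mass-difference sequencing works: each gap between consecutive fragment ions is matched against a table of residue masses, and Leu and Ile share the same entry. The sketch below illustrates this with standard monoisotopic residue masses. The function name, the 0.01 Da tolerance and the toy peak list are assumptions for illustration and are not part of PXleS.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_mass_difference(delta, tol=0.01):
    """Return every residue whose mass lies within tol Da of delta."""
    return sorted(
        aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol
    )


# Hypothetical ladder of consecutive fragment masses (not real data).
toy_peaks = [100.0, 171.03711, 284.12117, 341.14263]
toy_calls = [
    residues_for_mass_difference(high - low)
    for low, high in zip(toy_peaks, toy_peaks[1:])
]
```

    For the toy ladder the three gaps resolve to A, then the pair I/L, then G. No mass tolerance can separate the middle call, which is why an orthogonal source of evidence is needed at such sites.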

  1. Functional categorization of unique expressed sequence tags obtained from the yeast-like growth phase of the elm pathogen Ophiostoma novo-ulmi

    PubMed Central

    2011-01-01

    Background: The highly aggressive pathogenic fungus Ophiostoma novo-ulmi continues to be a serious threat to the American elm (Ulmus americana) in North America. Extensive studies have been conducted in North America to understand the mechanisms of virulence of this introduced pathogen and its evolving population structure, with a view to identifying potential strategies for the control of Dutch elm disease. As part of a larger study to examine the genomes of economically important Ophiostoma spp. and the genetic basis of virulence, we have constructed an expressed sequence tag (EST) library using total RNA extracted from the yeast-like growth phase of O. novo-ulmi (isolate H327). Results: A total of 4,386 readable EST sequences were annotated by determining their closest matches to known or theoretical sequences in public databases by BLASTX analysis. Searches matched 2,093 sequences to entries found in GenBank, including 1,761 matches with known proteins and 332 matches with unknown (hypothetical/predicted) proteins. Known proteins included a collection of 880 unique transcripts which were categorized to obtain a functional profile of the transcriptome and to evaluate physiological function. These assignments yielded 20 primary functional categories (FunCat), the largest including Metabolism (FunCat 01, 20.28% of total), Sub-cellular localization (70, 10.23%), Protein synthesis (12, 10.14%), Transcription (11, 8.27%), Biogenesis of cellular components (42, 8.15%), Cellular transport, facilitation and routes (20, 6.08%), Classification unresolved (98, 5.80%), Cell rescue, defence and virulence (32, 5.31%) and the unclassified category, or known sequences of unknown metabolic function (99, 7.5%). A list of specific transcripts of interest was compiled to initiate an evaluation of their impact upon strain virulence in subsequent studies. Conclusions: This is the first large-scale study of the O. novo-ulmi transcriptome. The expression profile obtained from the yeast

  2. Whole Genome Sequencing Reveals a De Novo SHANK3 Mutation in Familial Autism Spectrum Disorder

    PubMed Central

    Nemirovsky, Sergio I.; Córdoba, Marta; Zaiat, Jonathan J.; Completa, Sabrina P.; Vega, Patricia A.; González-Morón, Dolores; Medina, Nancy M.; Fabbro, Mónica; Romero, Soledad; Brun, Bianca; Revale, Santiago; Ogara, María Florencia; Pecci, Adali; Marti, Marcelo; Vazquez, Martin; Turjanski, Adrián; Kauffman, Marcelo A.

    2015-01-01

    Introduction Clinical genomics promises to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD). Here we present three siblings with ASD in whom we evaluated the usefulness of Whole Genome Sequencing (WGS) for the diagnostic approach to ASD. Methods We identified a family segregating ASD in three siblings with an unidentified cause. We performed WGS in the three probands, used a state-of-the-art comprehensive bioinformatic analysis pipeline, and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents. Results Three male siblings presented with a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy, with a negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6). Conclusions We reported an infrequent form of familial ASD where WGS proved useful in the clinic. We identified a mutation in SHANK3 that underscores its relevance in Autism Spectrum Disorder. PMID:25646853
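
    The frameshift mechanism described in the record above (a single-base deletion that shifts the reading frame, altering downstream codons until a premature stop appears) can be illustrated with a minimal Python sketch. The DNA sequence below is invented for illustration and is not SHANK3; the helper names `translate` and `delete_base` are hypothetical. In this toy case the deletion yields one altered codon followed by a stop, whereas the reported variant yields five.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate codon by codon from position 0, stopping at the first stop ('*')."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def delete_base(dna, pos):
    """Return the sequence with the base at 0-based index `pos` removed."""
    return dna[:pos] + dna[pos + 1:]


# Invented toy coding sequence (assumption: NOT the real SHANK3 sequence).
reference = "ATGTCCGCAGTGAACTTGACCGGT"
mutant = delete_base(reference, 4)  # delete one C from the second codon

print(translate(reference))
print(translate(mutant))
```

    Comparing the two translations shows where the protein first diverges and where it terminates, which is the information the `fs*N` suffix of a protein-level variant description summarizes.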

  3. Genome Walking by Next Generation Sequencing Approaches

    PubMed Central

    Volpicella, Mariateresa; Leoni, Claudia; Costanza, Alessandra; Fanizza, Immacolata; Placido, Antonio; Ceci, Luigi R.

    2012-01-01

    Genome Walking (GW) comprises a number of PCR-based methods for the identification of nucleotide sequences flanking known regions. The different methods have been used for several purposes: from de novo sequencing, useful for the identification of unknown regions, to the characterization of insertion sites for viruses and transposons. In the latter cases Genome Walking methods have been recently boosted by coupling to Next Generation Sequencing technologies. This review will focus on the development of several protocols for the application of Next Generation Sequencing (NGS) technologies to GW, which have been developed in the course of analysis of insertional libraries. These analyses find broad application in protocols for functional genomics and gene therapy. Thanks to the application of NGS technologies, the original vision of GW as a procedure for walking along an unknown genome is now changing into the possibility of observing the parallel marching of hundreds of thousands of primers across the borders of inserted DNA molecules in host genomes. PMID:24832505

  5. Fast, cheap and out of control--Insights into thermodynamic and informatic constraints on natural protein sequences from de novo protein design.

    PubMed

    Brisendine, Joseph M; Koder, Ronald L

    2016-05-01

    The accumulated results of thirty years of rational and computational de novo protein design have taught us important lessons about the stability, information content, and evolution of natural proteins. First, de novo protein design has complicated the assertion that biological function is equivalent to biological structure - demonstrating the capacity to abstract active sites from natural contexts and paste them into non-native topologies without loss of function. The structure-function relationship has thus been revealed to be either a generality or strictly true only in a local sense. Second, the simplification to "maquette" topologies carried out by rational protein design also has demonstrated that even sophisticated functions such as conformational switching, cooperative ligand binding, and light-activated electron transfer can be achieved with low-information design approaches. This is because for simple topologies the functional footprint in sequence space is enormous and easily exceeds the number of structures which could have possibly existed in the history of life on Earth. Finally, the pervasiveness of extraordinary stability in designed proteins challenges accepted models for the "marginal stability" of natural proteins, suggesting that there must be a selection pressure against highly stable proteins. This can be explained using recent theories which relate non-equilibrium thermodynamics and self-replication. This article is part of a Special Issue entitled Biodesign for Bioenergetics--The design and engineering of electronic transfer cofactors, proteins and protein networks, edited by Ronald L. Koder and J.L. Ross Anderson. PMID:26498191

  6. A multi-method approach toward de novo glycan characterization: a Man-5 case study.

    PubMed

    Prien, Justin M; Prater, Bradley D; Cockrill, Steven L

    2010-05-01

    Regulatory agencies' expectations for biotherapeutic approval are becoming more stringent with regard to product characterization, where minor species as low as 0.1% of a given profile are typically identified. The mission of this manuscript is to demonstrate a multi-method approach toward de novo glycan characterization and quantitation, including minor species at or approaching the 0.1% benchmark. Recently, unexpected isomers of the Man(5)GlcNAc(2) (M(5)) were reported (Prien JM, Ashline DJ, Lapadula AJ, Zhang H, Reinhold VN. 2009. The high mannose glycans from bovine ribonuclease B isomer characterization by ion trap mass spectrometry (MS). J Am Soc Mass Spectrom. 20:539-556). In the current study, quantitative analysis of these isomers found in commercial M(5) standard demonstrated that they are in low abundance (<1% of the total) and therefore an exemplary "litmus test" for minor species characterization. A simple workflow devised around three core well-established analytical procedures: (1) fluorescence derivatization; (2) online rapid resolution reversed-phase separation coupled with negative-mode sequential mass spectrometry (RRRP-(-)-MS(n)); and (3) permethylation derivatization with nanospray sequential mass spectrometry (NSI-MS(n)) provides comprehensive glycan structural determination. All methods have limitations; however, a multi-method workflow is an at-line stopgap/solution which mitigates each method's individual shortcoming(s) providing greater opportunity for more comprehensive characterization. This manuscript is the first to demonstrate quantitative chromatographic separation of the M(5) isomers and the use of a commercially available stable isotope variant of 2-aminobenzoic acid to detect and chromatographically resolve multiple M(5) isomers in bovine ribonuclease B. With this multi-method approach, we have the capabilities to comprehensively characterize a biotherapeutic's glycan array in a de novo manner, including structural isomers at >/=0

  7. De Novo Transcriptome Sequencing of Desert Herbaceous Achnatherum splendens (Achnatherum) Seedlings and Identification of Salt Tolerance Genes.

    PubMed

    Liu, Jiangtao; Zhou, Yuelong; Luo, Changxin; Xiang, Yun; An, Lizhe

    2016-01-01

    Achnatherum splendens is an important forage herb in Northwestern China. It has a high tolerance to salinity and is, thus, considered one of the most important constructive plants in saline and alkaline areas of land in Northwest China. However, the mechanisms of salt stress tolerance in A. splendens remain unknown. Next-generation sequencing (NGS) technologies can be used for global gene expression profiling. In this study, we examined sequence and transcript abundance data for the root/leaf transcriptome of A. splendens obtained using an Illumina HiSeq 2500. Over 35 million clean reads were obtained from the leaf and root libraries. All of the RNA sequencing (RNA-seq) reads were assembled de novo into a total of 126,235 unigenes and 36,511 coding DNA sequences (CDS). We further identified 1663 differentially-expressed genes (DEGs) between the salt stress treatment and control. Functional annotation of the DEGs by gene ontology (GO), using Arabidopsis and rice as references, revealed enrichment of salt stress-related GO categories, including "oxidation reduction", "transcription factor activity", and "ion channel transporter". Thus, this global transcriptome analysis of A. splendens has provided an important genetic resource for the study of salt tolerance in this halophyte. The identified sequences and their putative functional data will facilitate future investigations of the tolerance of Achnatherum species to various types of abiotic stress. PMID:27023614

  9. De Novo Transcriptome Sequencing of Desert Herbaceous Achnatherum splendens (Achnatherum) Seedlings and Identification of Salt Tolerance Genes

    PubMed Central

    Liu, Jiangtao; Zhou, Yuelong; Luo, Changxin; Xiang, Yun; An, Lizhe

    2016-01-01

    Achnatherum splendens is an important forage herb in Northwestern China. It has a high tolerance to salinity and is, thus, considered one of the most important constructive plants in saline and alkaline areas of land in Northwest China. However, the mechanisms of salt stress tolerance in A. splendens remain unknown. Next-generation sequencing (NGS) technologies can be used for global gene expression profiling. In this study, we examined sequence and transcript abundance data for the root/leaf transcriptome of A. splendens obtained using an Illumina HiSeq 2500. Over 35 million clean reads were obtained from the leaf and root libraries. All of the RNA sequencing (RNA-seq) reads were assembled de novo into a total of 126,235 unigenes and 36,511 coding DNA sequences (CDS). We further identified 1663 differentially-expressed genes (DEGs) between the salt stress treatment and control. Functional annotation of the DEGs by gene ontology (GO), using Arabidopsis and rice as references, revealed enrichment of salt stress-related GO categories, including “oxidation reduction”, “transcription factor activity”, and “ion channel transporter”. Thus, this global transcriptome analysis of A. splendens has provided an important genetic resource for the study of salt tolerance in this halophyte. The identified sequences and their putative functional data will facilitate future investigations of the tolerance of Achnatherum species to various types of abiotic stress. PMID:27023614

  10. Complete genome sequence of novel carbon monoxide oxidizing bacteria Citrobacter amalonaticus Y19, assembled de novo.

    PubMed

    Ainala, Satish Kumar; Seol, Eunhee; Park, Sunghoon

    2015-10-10

    We report here the complete genome sequence of Citrobacter amalonaticus Y19 isolated from an anaerobic digester. PacBio single-molecule real-time (SMRT) sequencing was employed, resulting in a single scaffold of 5.58Mb. The sequence of a mega plasmid of 291Kb size is also presented.

  11. De novo sequencing of Eucommia ulmoides flower bud transcriptomes for identification of genes related to floral development.

    PubMed

    Liu, Huimin; Fu, JianMin; Du, Hongyan; Hu, Jingjing; Wuyun, Tana

    2016-09-01

    Eucommia ulmoides Oliver is a woody perennial dioecious species native to China and has great economic value. However, little is known about flower bud development in this species. In this study, the transcriptomes of female and male flower buds were sequenced using the Illumina platform, a next-generation sequencing technology that provides cost-effective, highly efficient transcriptome profiling. In total, 11,558,188,080 clean reads were assembled into 75,065 unigenes with an average length of 1011 bp by de novo assembly using Trinity software. Through similarity comparisons with known protein databases, 47,071 unigenes were annotated, 146 of which were putatively related to the floral development of E. ulmoides. Fifteen of the 146 unigenes had significantly different expression levels between the two samples. Additionally, 24,346 simple sequence repeats were identified in 18,565 unigenes with 12,793 sequences suitable for the designed primers. In total, 67,447 and 58,236 single nucleotide polymorphisms were identified in male and female buds, respectively. This study provides a valuable resource for further conservation genetics and functional genomics research on E. ulmoides. PMID:27486566

  12. De novo Sequence Assembly and Characterization of Lycoris aurea Transcriptome Using GS FLX Titanium Platform of 454 Pyrosequencing

    PubMed Central

    Wang, Ren; Xu, Sheng; Jiang, Yumei; Jiang, Jingwei; Li, Xiaodan; Liang, Lijian; He, Jia; Peng, Feng; Xia, Bing

    2013-01-01

    Background Lycoris aurea, also called Golden Magic Lily, is an ornamentally and medicinally important species of the Amaryllidaceae family. Because it is a non-model organism, its whole genome sequence is not yet available. Transcriptomic information is also scarce for this species. In this study, we performed de novo transcriptome sequencing to produce the first comprehensive expressed sequence tag (EST) dataset for L. aurea using high-throughput sequencing technology. Methodology and Principal Findings Total RNA was isolated from leaves with sodium nitroprusside (SNP), salicylic acid (SA), or methyl jasmonate (MeJA) treatment, stems, and flowers at the bud, blooming, and wilting stages. Equal quantities of RNA from each tissue and stage were pooled to construct a cDNA library. Using 454 pyrosequencing technology, a total of 937,990 high quality reads (308.63 Mb) with an average read length of 329 bp were generated. Clustering and assembly of these reads produced a non-redundant set of 141,111 unique sequences, comprising 24,604 contigs and 116,507 singletons. All of the unique sequences were involved in the biological process, cellular component and molecular function categories by GO analysis. Potential genes and their functions were predicted by KEGG pathway mapping and COG analysis. Based on our sequence analysis and the published literature, many putative genes involved in Amaryllidaceae alkaloid synthesis, including PAL, TYDC, OMT, NMT, P450, and other potentially important candidate genes, were identified for the first time in this Lycoris species. Furthermore, 6,386 SSRs and 18,107 high-confidence SNPs were identified in this EST dataset. Conclusions The transcriptome provides invaluable new data as a functional genomics resource for future biological research in L. aurea. The molecular markers identified in this study will provide a material basis for future genetic linkage and quantitative trait loci analyses, and will provide useful information for functional

  13. De Novo Assembly, Gene Annotation and Marker Development Using Illumina Paired-End Transcriptome Sequences in Celery (Apium graveolens L.)

    PubMed Central

    Fu, Nan; Wang, Qian; Shen, Huo-Lin

    2013-01-01

    Background Celery is an increasingly popular vegetable species, but limited transcriptome and genomic data hinder research on it. In addition, a lack of celery molecular markers limits the process of molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding. Principal Findings Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembly was performed to generate a collection of 42,280 unigenes (average length of 502.6 bp) that represent the first transcriptome of the species. 78.43% and 48.93% of the unigenes had significant similarity with proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and Swiss-Prot database, respectively, and 10,473 (24.77%) unigenes were assigned to Clusters of Orthologous Groups (COG). 21,126 (49.97%) unigenes harboring InterPro domains were annotated, of which 15,409 (36.45%) were assigned to Gene Ontology (GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG). Large numbers of simple sequence repeats (SSRs) were identified, and the rates of successful amplification and polymorphism were then investigated among 31 celery accessions. Conclusions This study demonstrates the feasibility of generating large-scale sequence information by Illumina paired-end sequencing and efficient assembly. Our results provide a valuable resource for celery research. The developed molecular markers are the foundation of further genetic linkage analysis and gene localization, and they will be essential to accelerate the process of breeding. PMID:23469050

  14. A population-based evolutionary search approach to the multiple minima problem in de novo protein structure prediction

    PubMed Central

    2013-01-01

    Background Elucidating the native structure of a protein molecule from its sequence of amino acids, a problem known as de novo structure prediction, is a long standing challenge in computational structural biology. Difficulties in silico arise due to the high dimensionality of the protein conformational space and the ruggedness of the associated energy surface. The issue of multiple minima is a particularly troublesome hallmark of energy surfaces probed with current energy functions. In contrast to the true energy surface, these surfaces are weakly-funneled and rich in comparably deep minima populated by non-native structures. For this reason, many algorithms seek to be inclusive and obtain a broad view of the low-energy regions through an ensemble of low-energy (decoy) conformations. Conformational diversity in this ensemble is key to increasing the likelihood that the native structure has been captured. Methods We propose an evolutionary search approach to address the multiple-minima problem in decoy sampling for de novo structure prediction. Two population-based evolutionary search algorithms are presented that follow the basic approach of treating conformations as individuals in an evolving population. Coarse graining and molecular fragment replacement are used to efficiently obtain protein-like child conformations from parents. Potential energy is used both to bias parent selection and determine which subset of parents and children will be retained in the evolving population. The effect on the decoy ensemble of sampling minima directly is measured by additionally mapping a conformation to its nearest local minimum before considering it for retainment. The resulting memetic algorithm thus evolves not just a population of conformations but a population of local minima. Results and conclusions Results show that both algorithms are effective in terms of sampling conformations in proximity of the known native structure. The additional minimization is shown to be
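
    The population-based memetic search outlined in the record above (energy-biased parent selection, perturbation to produce children, mapping each child to a nearby local minimum, then retaining a subset of parents and children) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' method: the Rastrigin function stands in for a protein energy function, a random coordinate perturbation stands in for fragment replacement, and all function names and parameter values are hypothetical.

```python
import math
import random


def energy(x):
    """Toy rugged surface (Rastrigin function); global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def local_minimize(x, rng, steps=50, step_size=0.05):
    """Greedy stochastic descent: accept a small random move only if energy drops."""
    best, best_e = list(x), energy(x)
    for _ in range(steps):
        cand = [xi + rng.gauss(0, step_size) for xi in best]
        e = energy(cand)
        if e < best_e:
            best, best_e = cand, e
    return best


def mutate(x, rng, scale=1.0):
    """Perturb one coordinate; a crude stand-in for fragment replacement."""
    child = list(x)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0, scale)
    return child


def memetic_search(dim=3, pop_size=20, generations=30, seed=0):
    """Evolve a population of local minima and return it sorted by energy."""
    rng = random.Random(seed)
    pop = [local_minimize([rng.uniform(-5, 5) for _ in range(dim)], rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        # Lower energy gives a higher selection weight (Boltzmann-like bias).
        weights = [math.exp(-energy(p) / 10.0) for p in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Memetic step: every child is minimized before it competes.
        children = [local_minimize(mutate(p, rng), rng) for p in parents]
        # Truncation selection over parents and children combined.
        pop = sorted(pop + children, key=energy)[:pop_size]
    return pop


population = memetic_search()
print(round(energy(population[0]), 3), round(energy(population[-1]), 3))
```

    Plain truncation selection, used here for brevity, tends to reduce diversity; since the record stresses conformational diversity in the decoy ensemble, a fuller treatment would likely need a diversity-preserving retention rule.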

  15. Highly efficient de novo mutant identification in a Sorghum bicolor TILLING population using the ComSeq approach

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Screening large populations for carriers of known or de novo rare SNPs is required both in Targeting induced local lesions IN genomes (TILLING) experiments in plants and analogously in screening human populations. We formerly suggested an approach that combines the celebrated mathematical field of c...

  16. Analysis of alterations to the transcriptome of Loquat (Eriobotrya japonica Lindl.) under low temperature stress via de novo sequencing.

    PubMed

    Gong, R G; Lai, J; Yang, W; Liao, M A; Wang, Z H; Liang, G L

    2015-08-14

    Loquat (Eriobotrya japonica Lindl.), which originates from the cooler hill regions of southwestern China, is a typical subtropical evergreen tree. Loquat is one of the most important economic crops in China, but the available genomic information is very limited. Here, we present the first deep transcriptomic analysis of loquat. De novo assembly generated 116,723 contigs and 64,814 unigenes using Illumina sequencing technology. A total of 45,739 unigenes were annotated by Nr, GO, and COG datasets. In addition, we analyzed the gene expression profiles of loquat fruit under low temperature stress, and 4017 differentially expressed genes were identified. We found that the unigenes involved in the brassinosteroid biosynthesis and phosphatidylinositol signaling systems were upregulated, indicating that they have an important role in the resistance of plants to low temperature. Our results provide an invaluable resource for identification of specific genes and proteins involved in loquat development and response to low temperatures.

  17. De novo sequencing analysis of the Rosa roxburghii fruit transcriptome reveals putative ascorbate biosynthetic genes and EST-SSR markers.

    PubMed

    Yan, Xiuqin; Zhang, Xue; Lu, Min; He, Yong; An, Huaming

    2015-04-25

    Rosa roxburghii Tratt. is a well-known ornamental rose species native to China. In addition, the fruits of this species are valued for their nutritional and medicinal characteristics, especially their high ascorbic acid (AsA) levels. Nevertheless, AsA biosynthesis in R. roxburghii fruit has not been explored in detail because of a lack of genomic resources for this species. High-throughput transcriptomic sequencing generating large volumes of transcript sequence data can aid in gene discovery and molecular marker development. In this study, we generated more than 53 million clean reads using Illumina paired-end sequencing technology. De novo assembly yielded 106,590 unigenes, with an average length of 343 bp. On the basis of sequence similarity to known proteins, 9301 and 2393 unigenes were classified into Gene Ontology and Clusters of Orthologous Group categories, respectively. There were 7480 unigenes assigned to 124 pathways in the Kyoto Encyclopedia of Gene and Genome pathway database. BLASTx searches identified 498 unique putative transcripts encoding various transcription factors, some known to regulate fruit development. qRT-PCR validated the expressions of most of the genes encoding the main enzymes involved in ascorbate biosynthesis. In addition, 9131 potential simple sequence repeat (SSR) loci were identified among the unigenes. One hundred and two primer pairs were synthesized and 71 pairs produced an amplification product during initial screening. Among the amplified products, 30 were polymorphic in the 16 R. roxburghii germplasms tested. Our study was the first to produce a large volume of transcriptome data from R. roxburghii. The resulting sequence collection is a valuable resource for gene discovery and marker-assisted selective breeding in this rose species.
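
    Several records in this listing report simple sequence repeat (SSR) loci mined from assembled unigenes. A minimal sketch of how such loci can be located with a regular expression follows. The minimum repeat counts (6 for dinucleotide, 5 for trinucleotide, 4 for tetranucleotide units) are assumptions chosen for illustration, not thresholds stated in any of the abstracts, and `find_ssrs` is a hypothetical helper rather than part of any published pipeline.

```python
import re

# Hypothetical thresholds: unit length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter motif (e.g. 'ATAT')."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats."""
    thresholds = DEFAULT_MIN_REPEATS if min_repeats is None else min_repeats
    hits = []
    for unit_len, min_rep in sorted(thresholds.items()):
        # Group 2 captures the unit; \2{n,} requires at least n further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(2)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(1)) // unit_len))
    return sorted(hits)


# Invented toy sequence containing an (AG)7 and a (CAT)5 repeat.
toy = "GGC" + "AG" * 7 + "TTC" + "CAT" * 5 + "GA"
print(find_ssrs(toy))
```

    The sketch reports only perfect repeats on one strand; tools used in practice typically also handle compound and interrupted repeats, which this example does not attempt.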

  18. De novo Assembly and Characterization of the Global Transcriptome for Rhyacionia leptotubula Using Illumina Paired-End Sequencing

    PubMed Central

    Zhu, Jia-Ying; Li, Yong-He; Yang, Song; Li, Qin-Wen

    2013-01-01

    Background The pine tip moth, Rhyacionia leptotubula (Lepidoptera: Tortricidae), is one of the most destructive forestry pests in Yunnan Province, China. Despite its importance, little is known about most aspects of this pest. Understanding its genetic information is essential for exploring its specific traits at the molecular level. Thus, we sequenced the transcriptome of R. leptotubula with high-throughput Illumina sequencing. Methodology/Principal Findings In a single run, more than 60 million sequencing reads were generated. De novo assembly generated a collection of 46,910 unigenes with a mean length of 642 bp. Based on a BLASTX search with an E-value cut-off of 10^-5, 22,581 unigenes showed significant similarity to known proteins from the National Center for Biotechnology Information (NCBI) non-redundant (Nr) protein database. Of these annotated unigenes, 10,360, 6,937 and 13,894 were assigned to Gene Ontology (GO), Clusters of Orthologous Group (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. A total of 5,926 unigenes were annotated with functional information derived from domain similarity, of which 55 and 39 unigenes encoded the insecticide resistance-related enzymes cytochrome P450 and carboxylesterase, respectively. Using the transcriptome data, 47 unigenes belonging to the typical “stress” genes of the heat shock protein (Hsp) family were retrieved. Furthermore, 1,450 simple sequence repeats (SSRs) were detected; 3.09% of the unigenes contained SSRs. Large numbers of SSR primer pairs were designed, and 80% of the randomly verified primer pairs successfully yielded amplicons. Conclusions/Significance A large number of putative R. leptotubula transcript sequences have been obtained from the deep sequencing, which extensively increases the comprehensive and integrated genomic resources of this pest. This large-scale transcriptome dataset will be an important information platform for promoting our

  19. The genome sequence of the outbreeding globe artichoke constructed de novo incorporating a phase-aware low-pass sequencing strategy of F1 progeny.

    PubMed

    Scaglione, Davide; Reyes-Chin-Wo, Sebastian; Acquadro, Alberto; Froenicke, Lutz; Portis, Ezio; Beitel, Christopher; Tirone, Matteo; Mauro, Rosario; Lo Monaco, Antonino; Mauromicale, Giovanni; Faccioli, Primetta; Cattivelli, Luigi; Rieseberg, Loren; Michelmore, Richard; Lanteri, Sergio

    2016-01-01

    Globe artichoke (Cynara cardunculus var. scolymus) is an out-crossing, perennial, multi-use crop species that is grown worldwide and belongs to the Compositae, one of the most successful Angiosperm families. We describe the first genome sequence of globe artichoke. The assembly, comprising 13,588 scaffolds covering 725 of the 1,084 Mb genome, was generated using ~133-fold Illumina sequencing data and encodes 26,889 predicted genes. Re-sequencing (30×) of globe artichoke and cultivated cardoon (C. cardunculus var. altilis) parental genotypes and low-coverage (0.5 to 1×) genotyping-by-sequencing of 163 F1 individuals resulted in 73% of the assembled genome being anchored in 2,178 genetic bins ordered along 17 chromosomal pseudomolecules. This was achieved using a novel pipeline, SOILoCo (Scaffold Ordering by Imputation with Low Coverage), to detect heterozygous regions and assign parental haplotypes with low sequencing read depth and of unknown phase. SOILoCo provides a powerful tool for de novo genome analysis of outcrossing species. Our data will enable genome-scale analyses of evolutionary processes among crops, weeds, and wild species within and beyond the Compositae, and will facilitate the identification of economically important genes from related species. PMID:26786968

  20. De novo assembly, gene annotation, and simple sequence repeat marker development using Illumina paired-end transcriptome sequences in the pearl oyster Pinctada maxima.

    PubMed

    Deng, Yuewen; Lei, Qiannan; Tian, Qunli; Xie, Shaohe; Du, Xiaodong; Li, Junhui; Wang, Liqun; Xiong, Yuanxin

    2014-01-01

    We analyzed the mantle transcriptome of the pearl oyster Pinctada maxima and developed EST-SSR markers using Illumina HiSeq 2000 paired-end sequencing technology. A total of 49,500,748 raw reads were generated. De novo assembly generated 108,704 unigenes with an average length of 407 bp. Sequence similarity search with known proteins or nucleotides revealed that 30,200 (27.78%) and 25,824 (23.76%) consensus sequences were homologous with the sequences in the non-redundant protein and Swiss-Prot databases, respectively, and that 19,701 (18.12%) of these unigenes were possibly involved in approximately 234 known signaling pathways in the Kyoto Encyclopedia of Genes and Genomes database. Ninety-one biomineralization-related unigenes were detected. In a cultured stock, 1764 simple sequence repeats were identified and 56 primer pairs were randomly selected and tested. The rate of successful amplification was 68.3%. The developed molecular markers are helpful for further studies on genetic linkage analysis, gene localization, and quantitative trait loci mapping.

  1. The genome sequence of the outbreeding globe artichoke constructed de novo incorporating a phase-aware low-pass sequencing strategy of F1 progeny

    PubMed Central

    Scaglione, Davide; Reyes-Chin-Wo, Sebastian; Acquadro, Alberto; Froenicke, Lutz; Portis, Ezio; Beitel, Christopher; Tirone, Matteo; Mauro, Rosario; Lo Monaco, Antonino; Mauromicale, Giovanni; Faccioli, Primetta; Cattivelli, Luigi; Rieseberg, Loren; Michelmore, Richard; Lanteri, Sergio

    2016-01-01

    Globe artichoke (Cynara cardunculus var. scolymus) is an out-crossing, perennial, multi-use crop species that is grown worldwide and belongs to the Compositae, one of the most successful Angiosperm families. We describe the first genome sequence of globe artichoke. The assembly, comprising 13,588 scaffolds covering 725 of the 1,084 Mb genome, was generated using ~133-fold Illumina sequencing data and encodes 26,889 predicted genes. Re-sequencing (30×) of globe artichoke and cultivated cardoon (C. cardunculus var. altilis) parental genotypes and low-coverage (0.5 to 1×) genotyping-by-sequencing of 163 F1 individuals resulted in 73% of the assembled genome being anchored in 2,178 genetic bins ordered along 17 chromosomal pseudomolecules. This was achieved using a novel pipeline, SOILoCo (Scaffold Ordering by Imputation with Low Coverage), to detect heterozygous regions and assign parental haplotypes with low sequencing read depth and of unknown phase. SOILoCo provides a powerful tool for de novo genome analysis of outcrossing species. Our data will enable genome-scale analyses of evolutionary processes among crops, weeds, and wild species within and beyond the Compositae, and will facilitate the identification of economically important genes from related species. PMID:26786968

  3. De novo discovery of neuropeptides in the genomes of parasitic flatworms using a novel comparative approach.

    PubMed

    Koziol, Uriel; Koziol, Miguel; Preza, Matías; Costábile, Alicia; Brehm, Klaus; Castillo, Estela

    2016-10-01

    Neuropeptide mediated signalling is an ancient mechanism found in almost all animals and has been proposed as a promising target for the development of novel drugs against helminths. However, identification of neuropeptides from genomic data is challenging, and knowledge of the neuropeptide complement of parasitic flatworms is still fragmentary. In this work, we have developed an evolution-based strategy for the de novo discovery of neuropeptide precursors, based on the detection of localised sequence conservation between possible prohormone convertase cleavage sites. The method detected known neuropeptide precursors with good precision and specificity in the models Drosophila melanogaster and Caenorhabditis elegans. Furthermore, it identified novel putative neuropeptide precursors in nematodes, including the first description of allatotropin homologues in this phylum. Our search for neuropeptide precursors in the genomes of parasitic flatworms resulted in the description of 34 conserved neuropeptide precursor families, including 13 new ones, and of hundreds of new homologues of known neuropeptide precursor families. Most neuropeptide precursor families show a wide phylogenetic distribution among parasitic flatworms and show little similarity to neuropeptide precursors of other bilaterian animals. However, we could also find orthologs of some conserved bilaterian neuropeptides including pyrokinin, crustacean cardioactive peptide, myomodulin, neuropeptide-Y, neuropeptide KY and SIF-amide. Finally, we determined the expression patterns of seven putative neuropeptide precursor genes in the protoscolex of Echinococcus multilocularis. All genes were expressed in the nervous system with different patterns, indicating a hidden complexity of peptidergic signalling in cestodes. PMID:27388856
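
    The strategy described above looks for conserved sequence between possible prohormone convertase cleavage sites. The sketch below covers only the first step, under a stated simplification: candidate sites are approximated as runs of two or more basic residues (K or R), and the stretches between consecutive sites are returned as candidate peptides. The function name, the min_len parameter, and the toy precursor are hypothetical; the conservation scoring used by the authors is not reproduced here.

```python
import re

# Simplifying assumption: a candidate cleavage site is any run of >= 2 basic residues.
BASIC_RUN = re.compile(r"[KR]{2,}")


def segments_between_sites(protein, min_len=4):
    """Return (start, peptide) for stretches flanked on both sides by candidate sites."""
    protein = protein.upper()
    runs = list(BASIC_RUN.finditer(protein))
    segments = []
    # Pair each candidate site with the next one and keep what lies in between.
    for left, right in zip(runs, runs[1:]):
        peptide = protein[left.end():right.start()]
        if len(peptide) >= min_len:
            segments.append((left.end(), peptide))
    return segments


# Toy precursor (invented for illustration): two peptides flanked by KR / RR sites.
toy = "MSLLA" + "KR" + "FGGDPNFLRFG" + "KR" + "SPEEYLRFG" + "RR" + "AAAL"
for start, peptide in segments_between_sites(toy):
    print(start, peptide)
```

    In a comparative setting, the segments from orthologous proteins would then be aligned and scored for localised conservation; that step is left out of this sketch.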

  4. High throughput de novo RNA sequencing elucidates novel responses in Penicillium chrysogenum under microgravity.

    PubMed

    Sathishkumar, Yesupatham; Krishnaraj, Chandran; Rajagopal, Kalyanaraman; Sen, Dwaipayan; Lee, Yang Soo

    2016-02-01

    In this study, the transcriptional alterations in Penicillium chrysogenum under simulated microgravity conditions were analyzed for the first time using an RNA-Seq method. The increasing plethora of eukaryotic microbial flora inside the spaceship demands a basic understanding of fungal biology in the absence of a gravity vector. Penicillium species are the second most dominant fungal contaminants on the International Space Station. Penicillium chrysogenum, an industrially important organism, also has the potential to emerge as an opportunistic pathogen for astronauts during long-term space missions. To date, however, the cellular mechanisms underlying the survival and adaptation of Penicillium chrysogenum to microgravity conditions have not been clearly elucidated. A reference genome for Penicillium chrysogenum is not yet available in the NCBI database. Hence, we performed comparative de novo transcriptome analysis of Penicillium chrysogenum grown under microgravity versus normal gravity. In addition, the changes due to microgravity are documented at the molecular level. Increased response to the environmental stimulus and changes in the cell wall component ABC transporter/MFS transporters are noteworthy. Interestingly, a sustained increase in the expression of Acyl-coenzyme A: isopenicillin N acyltransferase (Acyltransferase) under microgravity revealed the significance of gravity in penicillin production, which could be exploited industrially. PMID:26603994

  6. Sequencing, De novo Assembly, Functional Annotation and Analysis of Phyllanthus amarus Leaf Transcriptome Using the Illumina Platform

    PubMed Central

    Bose Mazumdar, Aparupa; Chattopadhyay, Sharmila

    2016-01-01

    Phyllanthus amarus Schum. and Thonn., a widely distributed annual medicinal herb, has been used in traditional systems of medicine for over 2000 years. However, the lack of genomic data for P. amarus, a non-model organism, hinders research at the molecular level. In the present study, high-throughput sequencing technology was employed to improve understanding of this herb and provide comprehensive genomic information for future work. Here, the P. amarus leaf transcriptome was sequenced using the Illumina MiSeq platform. We assembled 85,927 non-redundant (nr) “unitranscript” sequences with an average length of 1548 bp from 18,060,997 raw reads. Sequence similarity analyses and annotation of these unitranscripts were performed against databases including the green plants nr protein database, Gene Ontology (GO), Clusters of Orthologous Groups (COG), PlnTFDB, and KEGG. As a result, 69,394 GO terms, 583 enzyme codes (EC), 134 KEGG maps, and 59 Transcription Factor (TF) families were generated. Functional and comparative analyses of assembled unitranscripts were also performed with the most closely related species, such as Populus trichocarpa and Ricinus communis, using TRAPID. KEGG analysis showed that a number of assembled unitranscripts were involved in secondary metabolite pathways, mainly the phenylpropanoid, flavonoid, terpenoid, alkaloid, and lignan biosynthetic pathways, which have significant medicinal attributes. Further, Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values of the identified secondary metabolite pathway genes were determined, and Reverse Transcription PCR (RT-PCR) of a few of these genes was performed to validate the de novo assembled leaf transcriptome dataset. In addition, 65,273 simple sequence repeats (SSRs) were identified. To the best of our knowledge, this is the first transcriptomic dataset of P. amarus to date. Our study provides the largest genetic resource that will lead to drug development and pave
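
    The FPKM values mentioned above follow a standard normalization: fragments mapped to a transcript, scaled per kilobase of transcript length and per million mapped fragments in the library. A minimal sketch of that formula follows; the function name and the example counts are hypothetical and are not values from the study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical counts for one transcript in one library.
value = fpkm(fragments=500, transcript_length_bp=1548, total_mapped_fragments=9_000_000)
print(round(value, 2))
```

    The factor of 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings, so a 1 kb transcript with 1,000 fragments in a library of one million mapped fragments has an FPKM of 1,000.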

  7. De Novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum.

    PubMed

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species, is highly recognized for its aphrodisiac properties, provided by saponins present in the plant. The transcriptome information of this species is limited and only a few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight into this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of the transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. A few genes of alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community interested in the molecular genetics and functional genomics of C. borivilianum.
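
    Summary figures such as the GC content and average transcript length quoted above can be computed directly from the assembled sequences. The sketch below shows one way to do so; the function names and the toy sequences are illustrative assumptions and do not reflect the study's in-house tools.

```python
def gc_content(sequences):
    """Percentage of G and C among unambiguous bases (A, C, G, T) across all sequences."""
    gc = 0
    total = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        total += sum(s.count(base) for base in "ACGT")  # ambiguous bases such as N are ignored
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


def mean_length(sequences):
    """Average sequence length in bases."""
    if not sequences:
        raise ValueError("no sequences given")
    return sum(len(s) for s in sequences) / len(sequences)


# Toy transcripts, invented for illustration.
transcripts = ["ATGCGC", "ATATAT", "GGCCNN"]
print(gc_content(transcripts), mean_length(transcripts))
```

    As a consistency check on the reported figures, 22.42 Mb spread over 101,141 transcripts gives roughly 221.7 bp per transcript, in line with the stated average of 221 bp.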

  8. Comparative Transcriptomic Approaches Exploring Contamination Stress Tolerance in Salix sp. Reveal the Importance for a Metaorganismal de Novo Assembly Approach for Nonmodel Plants

    PubMed Central

    Brereton, Nicholas J. B.; Marleau, Julie; Nissim, Werther Guidi; Labrecque, Michel; Joly, Simon; Pitre, Frederic E.

    2016-01-01

    Metatranscriptomic study of nonmodel organisms requires strategies that retain the highly resolved genetic information generated from model organisms while allowing for identification of the unexpected. A real-world biological application of phytoremediation, the field growth of 10 Salix cultivars on polluted soils, was used as an exemplar nonmodel and multifaceted crop response well-disposed to the study of gene expression. Sequence reads were assembled de novo to create 10 independent transcriptomes, a global transcriptome, and were mapped against the Salix purpurea 94006 reference genome. Annotation of assembled contigs was performed without a priori assumption of the originating organism. Global transcriptome construction from 3.03 billion paired-end reads revealed 606,880 unique contigs annotated from 1588 species, often common in all 10 cultivars. Comparisons between transcriptomic and metatranscriptomic methodologies provide clear evidence that nonnative RNA can mistakenly map to reference genomes, especially to conserved regions of common housekeeping genes, such as actin, α/β-tubulin, and elongation factor 1-α. In Salix, Rubisco activase transcripts were down-regulated in contaminated trees across all 10 cultivars, whereas thiamine thiazole synthase and CP12, a Calvin Cycle master regulator, were uniformly up-regulated. De novo assembly approaches, with unconstrained annotation, can improve data quality; care should be taken when exploring such plant genetics to reduce de facto data exclusion by mapping to a single reference genome alone. Salix gene expression patterns strongly suggest cultivar-wide alteration of specific photosynthetic apparatus and protection of the antenna complexes from oxidation damage in contaminated trees, providing an insight into common stress tolerance strategies in a real-world phytoremediation system. PMID:27002060

  9. Comparative Transcriptomic Approaches Exploring Contamination Stress Tolerance in Salix sp. Reveal the Importance for a Metaorganismal de Novo Assembly Approach for Nonmodel Plants.

    PubMed

    Brereton, Nicholas J B; Gonzalez, Emmanuel; Marleau, Julie; Nissim, Werther Guidi; Labrecque, Michel; Joly, Simon; Pitre, Frederic E

    2016-05-01

    Metatranscriptomic study of nonmodel organisms requires strategies that retain the highly resolved genetic information generated from model organisms while allowing for identification of the unexpected. A real-world biological application of phytoremediation, the field growth of 10 Salix cultivars on polluted soils, was used as an exemplar nonmodel and multifaceted crop response well-disposed to the study of gene expression. Sequence reads were assembled de novo to create 10 independent transcriptomes, a global transcriptome, and were mapped against the Salix purpurea 94006 reference genome. Annotation of assembled contigs was performed without a priori assumption of the originating organism. Global transcriptome construction from 3.03 billion paired-end reads revealed 606,880 unique contigs annotated from 1588 species, often common in all 10 cultivars. Comparisons between transcriptomic and metatranscriptomic methodologies provide clear evidence that nonnative RNA can mistakenly map to reference genomes, especially to conserved regions of common housekeeping genes, such as actin, α/β-tubulin, and elongation factor 1-α. In Salix, Rubisco activase transcripts were down-regulated in contaminated trees across all 10 cultivars, whereas thiamine thiazole synthase and CP12, a Calvin Cycle master regulator, were uniformly up-regulated. De novo assembly approaches, with unconstrained annotation, can improve data quality; care should be taken when exploring such plant genetics to reduce de facto data exclusion by mapping to a single reference genome alone. Salix gene expression patterns strongly suggest cultivar-wide alteration of specific photosynthetic apparatus and protection of the antenna complexes from oxidation damage in contaminated trees, providing an insight into common stress tolerance strategies in a real-world phytoremediation system. PMID:27002060

  10. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications.

    PubMed

    Rimmer, Andy; Phan, Hang; Mathieson, Iain; Iqbal, Zamin; Twigg, Stephen R F; Wilkie, Andrew O M; McVean, Gil; Lunter, Gerton

    2014-08-01

    High-throughput DNA sequencing technology has transformed genetic research and is starting to make an impact on clinical practice. However, analyzing high-throughput sequencing data remains challenging, particularly in clinical settings where accuracy and turnaround times are critical. We present a new approach to this problem, implemented in a software package called Platypus. Platypus achieves high sensitivity and specificity for SNPs, indels and complex polymorphisms by using local de novo assembly to generate candidate variants, followed by local realignment and probabilistic haplotype estimation. It is an order of magnitude faster than existing tools and generates calls from raw aligned read data without preprocessing. We demonstrate the performance of Platypus in clinically relevant experimental designs by comparing with SAMtools and GATK on whole-genome and exome-capture data, by identifying de novo variation in 15 parent-offspring trios with high sensitivity and specificity, and by estimating human leukocyte antigen genotypes directly from variant calls. PMID:25017105

  11. De novo assembly of the complete organelle genome sequences of azuki bean (Vigna angularis) using next-generation sequencers.

    PubMed

    Naito, Ken; Kaga, Akito; Tomooka, Norihiko; Kawase, Makoto

    2013-06-01

    Since chloroplasts and mitochondria are maternally inherited and have unique features in evolution, DNA sequences of those organelle genomes have been broadly used in phylogenetic studies. Thanks to recent progress in next-generation sequencer (NGS) technology, whole-genome sequencing can be easily performed. Here, using NGS data generated by Roche GS Titanium and Illumina Hiseq 2000, we performed a hybrid assembly of organelle genome sequences of Vigna angularis (azuki bean). Both the mitochondrial genome (mtDNA) and the chloroplast genome (cpDNA) of V. angularis have very similar size and gene content to those of V. radiata (mungbean). However, in structure, mtDNA sequences have undergone many recombination events after divergence from the common ancestor of V. angularis and V. radiata, whereas cpDNAs are almost identical between the two. The stability of cpDNAs and the variability of mtDNAs was further confirmed by comparative analysis of Vigna organelles with model plants Lotus japonicus and Arabidopsis thaliana.

  12. Sequencing and de novo draft assemblies of a fathead minnow (Pimephales promelas) reference genome.

    PubMed

    Burns, Frank R; Cogburn, Amarin L; Ankley, Gerald T; Villeneuve, Daniel L; Waits, Eric; Chang, Yun-Juan; Llaca, Victor; Deschamps, Stephane D; Jackson, Raymond E; Hoke, Robert Alan

    2016-01-01

    The present study was undertaken to provide the foundation for development of genome-scale resources for the fathead minnow (Pimephales promelas), an important model organism widely used in both aquatic toxicology research and regulatory testing. The authors report on the first sequencing and 2 draft assemblies for the reference genome of this species. Approximately 120× sequence coverage was achieved via Illumina sequencing of a combination of paired-end, mate-pair, and fosmid libraries. Evaluation and comparison of these assemblies demonstrate that they are of sufficient quality to be useful for genome-enabled studies, with 418 of 458 (91%) conserved eukaryotic genes mapping to at least 1 of the assemblies. In addition to its immediate utility, the present work provides a strong foundation on which to build further refinements of a reference genome for the fathead minnow.

  13. Transcriptome sequencing and de novo characterization of Korean endemic land snail, Koreanohadra kurodana for functional transcripts and SSR markers.

    PubMed

    Kang, Se Won; Patnaik, Bharat Bhusan; Hwang, Hee-Ju; Park, So Young; Chung, Jong Min; Song, Dae Kwon; Patnaik, Hongray Howrelia; Lee, Jae Bong; Kim, Changmu; Kim, Soonok; Park, Hong Seog; Han, Yeon Soo; Lee, Jun Sang; Lee, Yong Seok

    2016-10-01

    The Korean endemic land snail Koreanohadra kurodana (Gastropoda: Bradybaenidae), found in humid areas of broadleaf forests and shrubs, has been considered vulnerable as the number of individuals has been declining in recent years. The species is poorly characterized at the genomic level, which limits the understanding of functions at the molecular and genetic level. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset of visceral mass tissue of K. kurodana using Illumina paired-end sequencing technology. Over 234 million quality reads were assembled into a total of 315,924 contigs and 191,071 unigenes, with average and N50 lengths of 585.6 and 715 bp for contigs and 678 and 927 bp for unigenes, respectively. Overall, 36.32 % of the unigenes found matches to known protein/nucleotide sequences in the public databases. The direction of the unigenes to functional categories was determined using COG, GO, KEGG, and InterProScan protein domain search. The GO analysis search resulted in 22,967 unigenes (12.02 %) being categorized into 40 functional groups. The KEGG annotation revealed that metabolism pathway genes were enriched. The most prominent protein motifs include the zinc finger, ribonuclease H, reverse transcriptase, and ankyrin repeat domains. The simple sequence repeats (SSRs) identified from unigenes >1 kb in length show a dominance of dinucleotide repeat motifs, followed by tri- and tetranucleotide motifs. A number of unigenes were putatively assessed to belong to adaptation and defense mechanisms, including heat shock protein 70, Toll-like receptor 4, AMP-activated protein kinase, aquaporin-2, etc. Our data provide a rich source for the identification and functional characterization of new genes and candidate polymorphic SSR markers in K. kurodana. The availability of transcriptome information ( http://bioinfo.sch.ac.kr/submission/ ) would promote the utilization of the resources for phylogenetics study and genetic diversity
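
    The N50 statistic quoted above is the length L such that sequences of length L or longer together account for at least half of the total assembled bases. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative and unrelated to the study's data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or unigene lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy assembly: total is 20 bases, so the cumulative sum first reaches 10 at length 5.
print(n50([2, 3, 4, 5, 6]))
```

    Because N50 weights long sequences more heavily than the mean does, it is usually larger than the average length, as in the contig and unigene figures reported above.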

  15. De novo sequencing transcriptome of endemic Gentiana straminea (Gentianaceae) to identify genes involved in the biosynthesis of active ingredients.

    PubMed

    Zhou, Dangwei; Gao, Shan; Wang, Huan; Lei, Tianxiang; Shen, Jianwei; Gao, Jie; Chen, Shilong; Yin, Jia; Liu, Jianquan

    2016-01-01

    Gentiana straminea is a popular Tibetan medicine that has been used for thousands of years in China to treat various diseases and conditions. Although it has multiple pharmaceutical uses and is an important economic plant resource in China, knowledge of its transcriptome and molecular basis is still limited. In the flowering season, samples were collected from different tissues. Using the NGS Illumina Solexa platform, about 58.85 million sequencing reads were generated and assembled de novo, yielding 78,764 high-quality unigenes with an average length of 1090 bp. Gene Ontology (GO) and KEGG pathway mapping showed that 49,033 of these were putative homologs of annotated sequences in the protein databases. Among them, candidate genes associated with iridoid, flavonoid and anthocyanin biosynthesis were identified. Further, the key enzymes involved in the iridoid and flavonoid synthesis pathways were analyzed by quantitative real-time polymerase chain reaction (qRT-PCR) in different tissues; the flower and root showed higher expression than the leaves. In addition, 7591 SSR markers were identified from the unigenes of the G. straminea transcriptome. This transcriptome provides an important resource for molecular and functional genomics studies of G. straminea and related species on the Qinghai-Tibet Plateau.

  16. Novel proline-hydroxyproline glycopeptides from the dandelion (Taraxacum officinale Wigg.) flowers: de novo sequencing and biological activity.

    PubMed

    Astafieva, Alexandra A; Enyenihi, Atim A; Rogozhin, Eugene A; Kozlov, Sergey A; Grishin, Eugene V; Odintsova, Tatyana I; Zubarev, Roman A; Egorov, Tsezi A

    2015-09-01

    Two novel homologous peptides named ToHyp1 and ToHyp2 that show no similarity to any known proteins were isolated from Taraxacum officinale Wigg. flowers by multidimensional liquid chromatography. Amino acid and mass spectrometry analyses demonstrated that the peptides have unusual structure: they are cysteine-free, proline-hydroxyproline-rich and post-translationally glycosylated by pentoses, with 5 carbohydrates in ToHyp2 and 10 in ToHyp1. The ToHyp2 peptide with a monoisotopic molecular mass of 4350.3Da was completely sequenced by a combination of Edman degradation and de novo sequencing via top down multistage collision induced dissociation (CID) and higher energy dissociation (HCD) tandem mass spectrometry (MS(n)). ToHyp2 consists of 35 amino acids, contains eighteen proline residues, of which 8 prolines are hydroxylated. The peptide displays antifungal activity and inhibits growth of Gram-positive and Gram-negative bacteria. We further showed that carbohydrate moieties have no significant impact on the peptide structure, but are important for antifungal activity although not absolutely necessary. The deglycosylated ToHyp2 peptide was less active against the susceptible fungus Bipolaris sorokiniana than the native peptide. Unique structural features of the ToHyp2 peptide place it into a new family of plant defense peptides. The discovery of ToHyp peptides in T. officinale flowers expands the repertoire of molecules of plant origin with practical applications. PMID:26259198

  17. De Novo Transcriptome Sequencing of Low Temperature-Treated Phlox subulata and Analysis of the Genes Involved in Cold Stress

    PubMed Central

    Qu, Yanting; Zhou, Aimin; Zhang, Xing; Tang, Huanwei; Liang, Ming; Han, Hui; Zuo, Yuhu

    2015-01-01

    Phlox subulata, a perennial herbaceous flower, can survive during the winter of northeast China, where the temperature can drop to −30 °C, suggesting that P. subulata is an ideal model for studying the molecular mechanisms of cold acclimation in plants. However, little is known about the gene expression profile of P. subulata under cold stress. Here, we examined changes in cold stress-related genes in P. subulata. We sequenced three cold-treated (CT) and control (CK) samples of P. subulata. After de novo assembly and quantitative assessment of the obtained reads, 99,174 unigenes were generated. Based on similarity searches with known proteins in public protein databases, 59,994 unigenes were functionally annotated. Among all differentially expressed genes (DEGs), 8302, 10,638 and 11,021 up-regulated genes and 9898, 17,876, and 12,358 down-regulated genes were identified after treatment at 4, 0, and −10 °C, respectively. Furthermore, 3417 up-regulated unigenes were expressed only in CT samples. Twenty major cold-related genes, including transcription factors, antioxidant enzymes, osmoregulation proteins, and Ca²⁺ and ABA signaling components, were identified, and their expression levels were estimated. Overall, this is the first transcriptome sequencing of this plant species under cold stress. Studies of DEGs involved in cold-related metabolic pathways may facilitate the discovery of cold-resistance genes. PMID:25938968

  18. De Novo Assembly of Bitter Gourd Transcriptomes: Gene Expression and Sequence Variations in Gynoecious and Monoecious Lines.

    PubMed

    Shukla, Anjali; Singh, V K; Bharadwaj, D R; Kumar, Rajesh; Rai, Ashutosh; Rai, A K; Mugasimangalam, Raja; Parameswaran, Sriram; Singh, Major; Naik, P S

    2015-01-01

    Bitter gourd (Momordica charantia L.) is a nutritious vegetable crop of Asian origin, used as a medicinal herb in Indian and Chinese traditional medicine. Molecular breeding in bitter gourd is in its infancy, due to limited molecular resources, particularly on functional markers for traits such as gynoecy. We performed de novo transcriptome sequencing of bitter gourd using an Illumina next-generation sequencer, from root, flower buds, stem and leaf samples of a gynoecious line (Gy323) and a monoecious line (DRAR1). A total of 65,540 transcripts for Gy323 and 61,490 for DRAR1 were obtained. Comparisons revealed SNP and SSR variations between these lines and allowed identification of gene classes. Based on available transcripts we identified 80 WRKY transcription factors, several of which are reported in responses to biotic and abiotic stresses, and 56 ARF genes, which play a pivotal role in auxin-regulated gene expression and development. The data presented will be useful in both functional studies and breeding programs in bitter gourd. PMID:26047102

  19. De Novo Transcriptome Sequencing of Low Temperature-Treated Phlox subulata and Analysis of the Genes Involved in Cold Stress.

    PubMed

    Qu, Yanting; Zhou, Aimin; Zhang, Xing; Tang, Huanwei; Liang, Ming; Han, Hui; Zuo, Yuhu

    2015-04-29

    Phlox subulata, a perennial herbaceous flower, can survive during the winter of northeast China, where the temperature can drop to -30 °C, suggesting that P. subulata is an ideal model for studying the molecular mechanisms of cold acclimation in plants. However, little is known about the gene expression profile of P. subulata under cold stress. Here, we examined changes in cold stress-related genes in P. subulata. We sequenced three cold-treated (CT) and control (CK) samples of P. subulata. After de novo assembly and quantitative assessment of the obtained reads, 99,174 unigenes were generated. Based on similarity searches with known proteins in public protein databases, 59,994 unigenes were functionally annotated. Among all differentially expressed genes (DEGs), 8302, 10,638 and 11,021 up-regulated genes and 9898, 17,876, and 12,358 down-regulated genes were identified after treatment at 4, 0, and -10 °C, respectively. Furthermore, 3417 up-regulated unigenes were expressed only in CT samples. Twenty major cold-related genes, including transcription factors, antioxidant enzymes, osmoregulation proteins, and Ca²⁺ and ABA signaling components, were identified, and their expression levels were estimated. Overall, this is the first transcriptome sequencing of this plant species under cold stress. Studies of DEGs involved in cold-related metabolic pathways may facilitate the discovery of cold-resistance genes.

  20. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes.

    PubMed

    Xu, Xing-Li; Cheng, Tian-Yin; Yang, Hu; Yan, Fen; Yang, Ya

    2015-06-01

    Saliva plays an important role in feeding and pathogen transmission, and the identification and analysis of tick salivary gland (SG) proteins is considered a hot spot in anti-tick research. Herein, we present the first description of the SG transcriptome of Haemaphysalis flava using next-generation sequencing (NGS). A total of over 143 million high-quality reads were assembled into 54,357 unigenes, of which 20,145 (37.06%) had significant similarities to proteins in the Swiss-Prot database. 13,513 annotated sequences were associated with GO terms. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that 14,280 unigenes were assigned to 279 KEGG pathways in total. Reads per kb per million reads (RPKM) analysis showed that there were 3035 down-regulated unigenes and 2260 up-regulated unigenes in the engorged ticks (ET) compared with the semi-engorged ones (SET). Several important genes associated with blood feeding and ingestion as secreted salivary proteins, including cysteine, longipain, 4D8, calreticulin, metalloproteases, serine protease inhibitor, enolase, heat shock protein and AV422 in SG, were identified. The qRT-PCR results confirmed that the expression patterns of these genes (except for the longipain gene) were consistent with the RNA-seq results. This de novo assembly of the SG transcriptome of H. flava not only provides more opportunities for screening and cloning functional genes, but also forms a solid basis for further insight into the changes of salivary proteins during blood-feeding.
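
    The RPKM measure named in the abstract above (reads per kilobase per million reads) can be sketched as follows. This is only an illustration of the standard definition; the function name and arguments are hypothetical, and it does not reproduce the authors' pipeline.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Hypothetical helper: read_count is the number of reads assigned to one
    unigene, gene_length_bp its length in base pairs, and total_mapped_reads
    the number of mapped reads in the whole library.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

    Comparing such values between two libraries (for example ET versus SET) is what allows unigenes to be labelled up- or down-regulated; the threshold used for that call is study-specific and not assumed here.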

  1. De Novo Assembly of Bitter Gourd Transcriptomes: Gene Expression and Sequence Variations in Gynoecious and Monoecious Lines

    PubMed Central

    Shukla, Anjali; Singh, V. K.; Bharadwaj, D. R.; Kumar, Rajesh; Rai, Ashutosh; Rai, A. K.; Mugasimangalam, Raja; Parameswaran, Sriram; Singh, Major; Naik, P. S.

    2015-01-01

    Bitter gourd (Momordica charantia L.) is a nutritious vegetable crop of Asian origin, used as a medicinal herb in Indian and Chinese traditional medicine. Molecular breeding in bitter gourd is in its infancy, due to limited molecular resources, particularly on functional markers for traits such as gynoecy. We performed de novo transcriptome sequencing of bitter gourd using an Illumina next-generation sequencer, from root, flower buds, stem and leaf samples of a gynoecious line (Gy323) and a monoecious line (DRAR1). A total of 65,540 transcripts for Gy323 and 61,490 for DRAR1 were obtained. Comparisons revealed SNP and SSR variations between these lines and allowed identification of gene classes. Based on available transcripts we identified 80 WRKY transcription factors, several of which are reported in responses to biotic and abiotic stresses, and 56 ARF genes, which play a pivotal role in auxin-regulated gene expression and development. The data presented will be useful in both functional studies and breeding programs in bitter gourd. PMID:26047102

  2. De Novo Transcriptome Sequencing of Olea europaea L. to Identify Genes Involved in the Development of the Pollen Tube

    PubMed Central

    Iaria, Domenico

    2016-01-01

    In olive (Olea europaea L.), the processes controlling self-incompatibility are still unclear and the molecular basis underlying these processes is still not fully characterized. In order to determine compatibility relationships, using next-generation sequencing techniques and a de novo transcriptome assembly strategy, we show that pollen tubes from different olive plants, grown in vitro in a medium containing its own pistil and in combination pollen/pistil from self-sterile and self-fertile cultivars, have a distinct gene expression profile and many of the differentially expressed sequences between the samples fall within gene families involved in the development of the pollen tube, such as lipase, carboxylesterase, pectinesterase, pectin methylesterase, and callose synthase. Moreover, different genes involved in signal transduction, transcription, and growth are overrepresented. The analysis also allowed us to identify members in actin and actin depolymerization factor and fibrin gene family and member of the Ca²⁺ binding gene family related to the development and polarization of pollen apical tip. The whole transcriptomic analysis, through the identification of the differentially expressed transcripts set and an extended functional annotation analysis, will lead to a better understanding of the mechanisms of pollen germination and pollen tube growth in the olive. PMID:26998509

  3. De novo characterization of the Dialeurodes citri transcriptome: mining genes involved in stress resistance and simple sequence repeats (SSRs) discovery.

    PubMed

    Chen, E-H; Wei, D-D; Shen, G-M; Yuan, G-R; Bai, P-P; Wang, J-J

    2014-02-01

    The citrus whitefly, Dialeurodes citri (Ashmead), is one of the three economically important whitefly species that infest citrus plants around the world; however, limited genetic research has been focused on D. citri, partly because of lack of genomic resources. In this study, we performed de novo assembly of a transcriptome using Illumina paired-end sequencing technology (Illumina Inc., San Diego, CA, USA). In total, 36,766 unigenes with a mean length of 497 bp were identified. Of these unigenes, we identified 17,788 matched known proteins in the National Center for Biotechnology Information database, as determined by Blast search, with 5731, 4850 and 14,441 unigenes assigned to clusters of orthologous groups (COG), gene ontology (GO), and SwissProt, respectively. In total, 7507 unigenes were assigned to 308 known pathways. In-depth analysis of the data showed that 117 unigenes were identified as potentially involved in the detoxification of xenobiotics and 67 heat shock protein (Hsp) genes were associated with environmental stress. In addition, these enzymes were searched against the GO and COG database, and the results showed that the three major detoxification enzymes and Hsps were classified into 18 and 3, 6, and 8 annotations, respectively. In addition, 149 simple sequence repeats were detected. The results facilitate the investigation of molecular resistance mechanisms to insecticides and environmental stress, and contribute to molecular marker development. The findings greatly improve our genetic understanding of D. citri, and lay the foundation for future functional genomics studies on this species.

  4. Novel proline-hydroxyproline glycopeptides from the dandelion (Taraxacum officinale Wigg.) flowers: de novo sequencing and biological activity.

    PubMed

    Astafieva, Alexandra A; Enyenihi, Atim A; Rogozhin, Eugene A; Kozlov, Sergey A; Grishin, Eugene V; Odintsova, Tatyana I; Zubarev, Roman A; Egorov, Tsezi A

    2015-09-01

    Two novel homologous peptides named ToHyp1 and ToHyp2 that show no similarity to any known proteins were isolated from Taraxacum officinale Wigg. flowers by multidimensional liquid chromatography. Amino acid and mass spectrometry analyses demonstrated that the peptides have unusual structure: they are cysteine-free, proline-hydroxyproline-rich and post-translationally glycosylated by pentoses, with 5 carbohydrates in ToHyp2 and 10 in ToHyp1. The ToHyp2 peptide with a monoisotopic molecular mass of 4350.3 Da was completely sequenced by a combination of Edman degradation and de novo sequencing via top down multistage collision induced dissociation (CID) and higher energy dissociation (HCD) tandem mass spectrometry (MS(n)). ToHyp2 consists of 35 amino acids, contains eighteen proline residues, of which 8 prolines are hydroxylated. The peptide displays antifungal activity and inhibits growth of Gram-positive and Gram-negative bacteria. We further showed that carbohydrate moieties have no significant impact on the peptide structure, but are important for antifungal activity although not absolutely necessary. The deglycosylated ToHyp2 peptide was less active against the susceptible fungus Bipolaris sorokiniana than the native peptide. Unique structural features of the ToHyp2 peptide place it into a new family of plant defense peptides. The discovery of ToHyp peptides in T. officinale flowers expands the repertoire of molecules of plant origin with practical applications.

  5. Sequencing and De novo Draft Assemblies of the Fathead Minnow (Pimephales promelas) Reference Genome

    EPA Science Inventory

    This study was undertaken to develop genome-scale resources for the fathead minnow (Pimephales promelas), an important model organism widely used in both aquatic ecotoxicology research and in regulatory toxicity testing. We report on the first sequencing and two draft assemblies fo...

  6. Sequence analysis for a de novo genome assembly of Bos indicus (Nelore) cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A second draft sequence assembly of the bovine genome based on the sub-species, Bos indicus, is essential to better evaluate the genetic variation underlying the prototypical beef and dairy cattle in tropical and sub-tropical production environments. A linebred bull (Futuro), two generations remove...

  7. De Novo Genome Sequence of "Candidatus Liberibacter solanacearum" from a Single Potato Psyllid in California.

    PubMed

    Wu, F; Deng, X; Liang, G; Wallis, C; Trumble, J T; Prager, S; Chen, J

    2015-01-01

    The draft genome sequence of "Candidatus Liberibacter solanacearum" strain RSTM from a potato psyllid (Bactericera cockerelli) in California is reported here. The RSTM strain has a genome size of 1,286,787 bp, a G+C content of 35.1%, 1,211 predicted open reading frames (ORFs), and 43 RNA genes. PMID:26679599

  8. Increased Frequency of De Novo Copy Number Variations in Congenital Heart Disease by Integrative Analysis of SNP Array and Exome Sequence Data

    PubMed Central

    Rodriguez-Murillo, Laura; Fromer, Menachem; Mazaika, Erica; Vardarajan, Badri; Italia, Michael; Leipzig, Jeremy; DePalma, Steven R.; Golhar, Ryan; Sanders, Stephan J.; Yamrom, Boris; Ronemus, Michael; Iossifov, Ivan; Willsey, A. Jeremy; State, Matthew W.; Kaltman, Jonathan R.; White, Peter S.; Shen, Yufeng; Warburton, Dorothy; Brueckner, Martina; Seidman, Christine; Goldmuntz, Elizabeth; Gelb, Bruce D.; Lifton, Richard; Seidman, Jonathan; Hakonarson, Hakon; Chung, Wendy K.

    2014-01-01

    Rationale Congenital heart disease (CHD) is among the most common birth defects. Most cases are of unknown etiology. Objective To determine the contribution of de novo copy number variants (CNVs) in the etiology of sporadic CHD. Methods and Results We studied 538 CHD trios using genome-wide dense single nucleotide polymorphism (SNP) arrays and/or whole exome sequencing (WES). Results were experimentally validated using digital droplet PCR. We compared validated CNVs in CHD cases to CNVs in 1,301 healthy control trios. The two complementary high-resolution technologies identified 63 validated de novo CNVs in 51 CHD cases. A significant increase in CNV burden was observed when comparing CHD trios with healthy trios, using either SNP array (p=7×10⁻⁵, Odds Ratio (OR)=4.6) or WES data (p=6×10⁻⁴, OR=3.5) and remained after removing 16% of de novo CNV loci previously reported as pathogenic (p=0.02, OR=2.7). We observed recurrent de novo CNVs on 15q11.2 encompassing CYFIP1, NIPA1, and NIPA2 and single de novo CNVs encompassing DUSP1, JUN, JUP, MED15, MED9, PTPRE, SREBF1, TOP2A, and ZEB2, genes that interact with established CHD proteins NKX2-5 and GATA4. Integrating de novo variants in WES and CNV data suggests that ETS1 is the pathogenic gene altered by 11q24.2-q25 deletions in Jacobsen syndrome and that CTBP2 is the pathogenic gene in 10q sub-telomeric deletions. Conclusions We demonstrate a significantly increased frequency of rare de novo CNVs in CHD patients compared with healthy controls and suggest several novel genetic loci for CHD. PMID:25205790

  9. De novo sequencing and a comprehensive analysis of purple sweet potato (Ipomoea batatas L.) transcriptome.

    PubMed

    Xie, Fuliang; Burklew, Caitlin E; Yang, Yanfang; Liu, Min; Xiao, Peng; Zhang, Baohong; Qiu, Deyou

    2012-07-01

    High-throughput RNA sequencing was performed for comprehensively analyzing the transcriptome of the purple sweet potato. A total of 58,800 unigenes were obtained and ranged from 200 nt to 10,380 nt with an average length of 476 nt. The average expression of one unigene was 34 reads per kb per million reads (RPKM) with a maximum expression of 1,935 RPKM. At least 40,280 (68.5%) unigenes were identified to be protein-coding genes, in which 11,978 and 5,184 genes were homologous to Arabidopsis and rice proteins, respectively. Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) analysis showed that 19,707 (33.5%) unigenes were classified to 1,807 terms of GO including molecular functions, biological processes, and cellular components and 9,970 (17.0%) unigenes were enriched to 11,119 KEGG pathways. We found that at least 3,553 genes may be involved in the biosynthesis pathways of starch, alkaloids, anthocyanin pigments, and vitamins. Additionally, 851 potential simple sequence repeats (SSRs) were identified in all unigenes. Transcriptome sequencing on tuberous roots of the sweet potato yielded substantial transcriptional sequences and potentially useful SSR markers which provide an important data source for sweet potato research. Comparison of two RNA-sequence datasets from the purple and the yellow sweet potato showed that UDP-glucose-flavonoid 3-O-glucosyltransferase was one of the key enzymes in the pathway of anthocyanin biosynthesis and that anthocyanin-3-glucoside might be one of the major components for anthocyanin pigments in the purple sweet potato. This study contributes to understanding the molecular mechanisms of sweet potato development and metabolism and thereby increases the potential utilization of the sweet potato in food nutrition and pharmacy.

  10. Illumina-based de novo transcriptome sequencing and analysis of Amanita exitialis basidiocarps.

    PubMed

    Li, Peng; Deng, Wang-qiu; Li, Tai-hui; Song, Bin; Shen, Ya-heng

    2013-12-10

    Amanita exitialis is a lethal mushroom that was first discovered in Guangdong Province, China. The high content of amanitin in its basidiocarps makes it lethal to humans. To comprehensively characterize the A. exitialis transcriptome and analyze the Amanita toxins as well as their related gene family, transcriptome sequencing of A. exitialis was performed using Illumina HiSeq 2000 technology. A total of 25,563,688 clean reads were collected and assembled into 62,137 cDNA contigs with an average length of 481 bp and N50 length of 788 bp. A total of 27,826 proteins and 39,661 unigenes were identified among the assembled contigs. All of the unigenes were classified into 166 functional categories for understanding the gene functions and regulation pathways. The genes contributing to toxic peptide biosynthesis were analyzed. From this set, eleven gene sequences encoding the toxins or related cyclic peptides were discovered in the transcriptome. Three of these sequences matched the peptide toxins α-amanitin, β-amanitin, and phallacidin, while others matched amanexitide and seven matched unknown peptides. All of the genes encoding peptide toxins were confirmed by polymerase chain reaction (PCR) in A. exitialis, and the phylogenetic relationships among these proprotein sequences were discussed. The gene polymorphism and degeneracy of the toxin encoding sequences were found and analyzed. This study provides the first primary transcriptome of A. exitialis, which provided comprehensive gene expression information on the lethal amanitas at the transcriptional level, and could lay a strong foundation for functional genomics studies in those fungi.
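
    The N50 length quoted in the abstract above is a standard contiguity statistic: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembled bases. A minimal sketch of that definition follows; the function name is hypothetical and no claim is made about the tool the authors used.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the running sum to >= 50% of the total.
        if running * 2 >= total:
            return length
```

    Because N50 weights long contigs heavily, it is usually reported together with the mean length, as the abstract does (481 bp mean, 788 bp N50).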

  11. A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions

    PubMed Central

    Abnousi, Armen; Broschat, Shira L.; Kalyanaraman, Ananth

    2016-01-01

    Background Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges. Methods In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (k-mers) to quickly detect conserved regions, and the power of machine learning is used to improve the prediction accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable. Results We have compared NADDA with Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set an average accuracy of 63% is achieved when compared to Pfam. A boost in results in comparison with InterPro demonstrates the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground-truth. On average NADDA shows comparable accuracy, more balanced sensitivity and specificity, and being alignment-free, is significantly faster. Excluding the one-time cost of training, runtimes on a single processor were 49s, 10,566s, and 456s
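
    The abstract above describes using the abundance of exact-matching k-mers as a signal for conserved regions. The sketch below illustrates only that general idea, not NADDA itself: it omits the machine-learning step and the MapReduce implementation, and the function names and the toy sequences in the usage note are hypothetical.

```python
from collections import Counter


def kmer_support(sequences, k):
    """Map each k-mer to the number of sequences that contain it at least once."""
    support = Counter()
    for seq in sequences:
        # A set is used so that a k-mer repeated within one sequence counts once.
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return support


def position_profile(seq, support, k):
    """For each k-mer start position in seq, report how many sequences share that k-mer."""
    return [support[seq[i:i + k]] for i in range(len(seq) - k + 1)]
```

    For the toy input ["MKVLAAG", "TTVLAAC", "GGGVLAA"] with k = 4, the profile of the first sequence peaks at the position of the shared k-mer VLAA. Runs of high-support positions are the kind of candidate conserved region that a downstream classifier could then refine.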

  12. IDBA-MT: de novo assembler for metatranscriptomic data generated from next-generation sequencing technology.

    PubMed

    Leung, Henry C M; Yiu, Siu-Ming; Parkinson, John; Chin, Francis Y L

    2013-07-01

    High-throughput next-generation sequencing technology provides a great opportunity for analyzing metatranscriptomic data. However, the reads produced by these technologies are short and an assembling step is required to combine the short reads into longer contigs. As there are many repeat patterns in mRNAs from different genomes and the abundance ratio of mRNAs in a sample varies a lot, existing assemblers for genomic data, transcriptomic data, and metagenomic data do not work on metatranscriptomic data and produce chimeric contigs, that is, incorrect contigs formed by merging multiple mRNA sequences. To the best of our knowledge, there is no assembler designed for metatranscriptomic data. In this article, we introduce an assembler called IDBA-MT, which is designed for assembling reads from metatranscriptomic data. IDBA-MT produces far fewer chimeric contigs (a reduction of 50% or more) when compared with existing assemblers such as Oases, IDBA-UD, and Trinity. PMID:23829653

  13. Sequencing and de novo assembly of a Dahlia hybrid cultivar transcriptome.

    PubMed

    Lehnert, Erik M; Walbot, Virginia

    2014-01-01

    Dahlia variabilis, with an exceptionally high diversity of floral forms and colors, is a popular flower amongst both commercial growers and hobbyists. Recently, some genetic controls of pigment patterns have been elucidated. These studies have been limited, however, by the lack of comprehensive transcriptomic resources for this species. Here we report the sequencing, assembly, and annotation of the transcriptome of the developing leaves, stems, and floral buds of D. variabilis. This resulted in 35,638 contigs, most of which seem to contain the complete coding sequence, and of which 20,881 could be successfully annotated by similarity to UniProt. Furthermore, we conducted a preliminary investigation to identify contigs with expression patterns consistent with tissue-specificity. These results will accelerate research into the genetic controls of pigmentation and floral form of D. variabilis. PMID:25101098

  14. De novo transcriptome sequencing reveals a considerable bias in the incidence of simple sequence repeats towards the downstream of 'Pre-miRNAs' of black pepper.

    PubMed

    Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan

    2013-01-01

    Next generation sequencing has an advantage in transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using Illumina HiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using the Trinity program, generated 223,386 contigs and 128,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated for the in silico prediction and detection of '43 pre-miRNA candidates bearing different types of SSR motifs'. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of the transcriptome. About 0.033% of the transcriptome constituted 'pre-miRNA candidates bearing SSRs'. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted 'pre-miRNA candidates'. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of 'tandem repeats' in miRNAs.
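
    Simple sequence repeats of the kind counted in the abstract above are tandem repetitions of a short motif. A minimal regex-based sketch of SSR detection is given below; the function name, the motif-length range and the repeat threshold are illustrative assumptions and are not the parameters or software used in the study.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for each tandem repeat found in seq.

    The lazy quantifier makes the regex prefer the shortest motif that satisfies
    the repeat threshold, so ATATATAT is reported as (AT)x4 rather than (ATAT)x2.
    Runs of a single base can still be reported as a two-base motif such as AA.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

    Applied to each assembled transcript, the start positions returned here are what would allow a comparison of SSR location relative to a predicted pre-miRNA hairpin, which is the positional bias the abstract reports.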

  15. De novo genome assembly of the economically important weed horseweed using integrated data from multiple sequencing platforms.

    PubMed

    Peng, Yanhui; Lai, Zhao; Lane, Thomas; Nageswara-Rao, Madhugiri; Okada, Miki; Jasieniuk, Marie; O'Geen, Henriette; Kim, Ryan W; Sammons, R Douglas; Rieseberg, Loren H; Stewart, C Neal

    2014-11-01

    Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18), with the smallest genome of any known agricultural weed (335 Mb). Thus, it is an appropriate candidate to help us understand the genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of horseweed by combining data from multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) using various libraries with different insertion sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb) of a Tennessee-accessed, glyphosate-resistant horseweed biotype. From 116.3 Gb (approximately 350× coverage) of data, the genome was assembled into 13,966 scaffolds with 50% of the assembly = 33,561 bp. The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome is composed of 44,592 protein-coding genes. Genome resequencing of seven additional horseweed biotypes was performed. These sequence data were assembled and used to analyze genome variation. Simple sequence repeat and single-nucleotide polymorphisms were surveyed. Genomic patterns were detected that associated with glyphosate-resistant or -susceptible biotypes. The draft genome will be useful to better understand weediness and the evolution of herbicide resistance and to devise new management strategies. The genome will also be useful as another reference genome in the Compositae. To our knowledge, this article represents the first published draft genome of an agricultural weed.

  16. De novo sequencing and transcriptome analysis of Wolfiporia cocos to reveal genes related to biosynthesis of triterpenoids.

    PubMed

    Shu, Shaohua; Chen, Bei; Zhou, Mengchun; Zhao, Xinmei; Xia, Haiyang; Wang, Mo

    2013-01-01

    Wolfiporia cocos Ryvarden et Gilbertson is a saprophytic fungus in the Basidiomycetes. Its dried sclerotium is widely used as a traditional crude drug in East Asia. Especially in China, the dried sclerotium is regarded as the silver of the Chinese traditional drugs, not only for its white color, but also its medicinal value. Furthermore, triterpenoids from W. cocos are the main active compounds with antitumor and anti-inflammatory activity. Biosynthesis of the triterpenoids has rarely been researched. In this study, the de novo sequencing of the mycelia and sclerotia of W. cocos was carried out by Illumina HiSeq 2000. A total of 3,484,996,740 bp from 38,722,186 sequence reads of mycelia, and 3,573,921,960 bp from 39,710,244 high quality sequence reads of sclerotium were obtained. These raw data were assembled into 60,354 contigs and 40,939 singletons, and 56,938 contigs and 37,220 singletons for mycelia and sclerotia, respectively. The transcriptomic data clearly showed that terpenoid biosynthesis was only via the MVA pathway in W. cocos. The production of total triterpenoids and pachymic acid was examined in the dry mycelia and sclerotia. The content of total triterpenoids was 5.36% and 1.43% in mycelia and sclerotia, respectively, and the content of pachymic acid was 0.458% and 0.174%. Some genes involved in the triterpenoid biosynthetic pathway were chosen to be verified by qRT-PCR. The unigenes encoding diphosphomevalonate decarboxylase (Unigene 20430), farnesyl diphosphate synthase (Unigene 14106 and 21656), hydroxymethylglutaryl-CoA reductase (NADPH) (Unigene 6395_All) and lanosterol synthase (Unigene28001_All) were upregulated in the mycelia stage. It is likely that expression of these genes influences the biosynthesis of triterpenoids in the mycelia stage.

  17. De Novo Assembly of Auricularia polytricha Transcriptome Using Illumina Sequencing for Gene Discovery and SSR Marker Identification

    PubMed Central

    Zhou, Yan; Chen, Lianfu; Fan, Xiuzhi; Bian, Yinbing

    2014-01-01

    Auricularia polytricha (Mont.) Sacc., a type of edible black-brown mushroom with a gelatinous and modality-specific fruiting body, is in high demand in Asia due to its nutritional and medicinal properties. Illumina Solexa sequencing technology was used to generate very large transcript sequences from the mycelium and the mature fruiting body of A. polytricha for gene discovery and molecular marker development. De novo assembly generated 36,483 ESTs with an N50 length of 636 bp. A total of 28,108 ESTs demonstrated significant hits with known proteins in the nr database, and 94.03% of the annotated ESTs showed the greatest similarity to A. delicata, a related species of A. polytricha. Functional categorization of the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways revealed the conservation of genes involved in various biological processes in A. polytricha. Gene expression profile analysis indicated that a total of 2,057 ESTs were differentially expressed, including 1,020 ESTs that were up-regulated in the mycelium and 1,037 up-regulated in the fruiting body. Functional enrichment showed that the ESTs associated with biosynthesis, metabolism and assembly of proteins were more active in fruiting body development. The expression patterns of homologous transcription factors indicated that the molecular mechanisms of fruiting body formation and development were not exactly the same as for other agarics. Interestingly, an EST encoding tyrosinase was significantly up-regulated in the fruiting body, indicating that melanins accumulated during the processes of the formation of the black-brown color of the fruiting body in A. polytricha development. In addition, a total of 1,715 potential SSRs were detected in this transcriptome. The transcriptome analysis of A. polytricha provides valuable sequence resources and numerous molecular markers to facilitate further functional genomics studies and

  18. A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly

    PubMed Central

    2013-01-01

    Background: The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes. Results: We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole-animals versus tissues. With increasing reads, whole-animal assemblies show rapid increase of transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent but such errors can be mitigated with more stringent assembly parameters. Conclusions: These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole-animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly-expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies. PMID:23496952
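
The incremental design in this abstract (assembling at regular increments of reads) can be illustrated with nested random subsamples. This is only a minimal sketch, not the authors' pipeline: the function name `read_increments`, the `step` and `seed` parameters, and the use of an in-memory list of reads are all assumptions made for the example.

```python
import random


def read_increments(reads, step, seed=0):
    """Return nested random subsets of `reads` of size step, 2*step, ...

    Hypothetical helper: shuffling once and taking growing prefixes
    guarantees that each smaller subset is contained in the next one,
    so assemblies at different depths differ only by the added reads.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    rng = random.Random(seed)      # fixed seed keeps the subsets reproducible
    shuffled = list(reads)
    rng.shuffle(shuffled)
    return [shuffled[:n] for n in range(step, len(shuffled) + 1, step)]
```

Each returned subset would then be passed to the assembler of choice, and metrics such as transcript count could be plotted against subset size.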

  19. De novo assembly of transcriptome sequencing in Caragana korshinskii Kom. and characterization of EST-SSR markers.

    PubMed

    Long, Yan; Wang, Yanyan; Wu, Shanshan; Wang, Jiao; Tian, Xinjie; Pei, Xinwu

    2015-01-01

    Caragana korshinskii Kom. is widely distributed in various habitats, including gravel desert, clay desert, fixed and semi-fixed sand, and saline land in the Asian and African deserts. To date, no genomic information or EST-SSR marker has been reported for the genus Caragana Fabr. In this study, more than two billion bases of high-quality C. korshinskii sequence were generated using Illumina sequencing technology, demonstrating de novo assembly and annotation of genes without prior genome information. These reads were assembled into 86,265 unigenes (mean length = 709 bp). The similarity search indicated that 33,955 and 21,978 unigenes showed significant similarities to known proteins from the NCBI non-redundant and SwissProt protein databases, respectively. Among these annotated unigenes, 26,232 unigenes were assigned to the Gene Ontology (GO) database. When 22,756 unigenes were searched against the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database, 5,598 unigenes were assigned to 5 main categories including 32 KEGG pathways. Among the main KEGG categories, metabolism was the biggest category (2,862, 43.7%), suggesting active metabolic processes in the desert tree. In addition, a total of 19,150 EST-SSRs were identified from 15,484 unigenes, and the characteristics of the EST-SSRs were further compared with those of four other species in Fabaceae. A total of 126 potential marker sites were randomly selected to validate the assembly quality and develop EST-SSR markers. Among the 9 germplasms in the genus Caragana Fabr., the PCR success rate was 93.7%, and a phylogenetic tree was constructed based on the genotypic data. This research generated a substantial fraction of transcriptome sequences, which are very useful resources for gene annotation and discovery, molecular marker development, and genome assembly and annotation. The EST-SSR markers identified and developed in this study will facilitate marker-assisted selection breeding.

  20. De Novo Assembly of Transcriptome Sequencing in Caragana korshinskii Kom. and Characterization of EST-SSR Markers

    PubMed Central

    Long, Yan; Wang, Yanyan; Wu, Shanshan; Wang, Jiao; Tian, Xinjie; Pei, Xinwu

    2015-01-01

    Caragana korshinskii Kom. is widely distributed in various habitats, including gravel desert, clay desert, fixed and semi-fixed sand, and saline land in the Asian and African deserts. To date, no genomic information or EST-SSR marker has been reported for the genus Caragana Fabr. In this study, more than two billion bases of high-quality C. korshinskii sequence were generated using Illumina sequencing technology, demonstrating de novo assembly and annotation of genes without prior genome information. These reads were assembled into 86,265 unigenes (mean length = 709 bp). The similarity search indicated that 33,955 and 21,978 unigenes showed significant similarities to known proteins from the NCBI non-redundant and SwissProt protein databases, respectively. Among these annotated unigenes, 26,232 unigenes were assigned to the Gene Ontology (GO) database. When 22,756 unigenes were searched against the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database, 5,598 unigenes were assigned to 5 main categories including 32 KEGG pathways. Among the main KEGG categories, metabolism was the biggest category (2,862, 43.7%), suggesting active metabolic processes in the desert tree. In addition, a total of 19,150 EST-SSRs were identified from 15,484 unigenes, and the characteristics of the EST-SSRs were further compared with those of four other species in Fabaceae. A total of 126 potential marker sites were randomly selected to validate the assembly quality and develop EST-SSR markers. Among the 9 germplasms in the genus Caragana Fabr., the PCR success rate was 93.7%, and a phylogenetic tree was constructed based on the genotypic data. This research generated a substantial fraction of transcriptome sequences, which are very useful resources for gene annotation and discovery, molecular marker development, and genome assembly and annotation. The EST-SSR markers identified and developed in this study will facilitate marker-assisted selection breeding. PMID:25629164
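
The abstract does not say which tool or thresholds were used to identify EST-SSRs, so the following is only a rough sketch of how simple sequence repeats can be located in a unigene sequence. The function name `find_ssrs`, the motif length range of 2 to 6 bp, and the default of at least 5 tandem repeats are all assumptions chosen for illustration.

```python
import re


def find_ssrs(sequence, min_motif=2, max_motif=6, min_repeats=5):
    """Return (start, motif, repeat_count) for tandem repeats in a DNA string.

    Hypothetical helper: a motif of min_motif..max_motif bases must occur
    at least `min_repeats` times in a row. Runs of a single base (for
    example AAAA...) are skipped because they are not di- to hexanucleotide
    repeats.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:   # homopolymer run, not counted here
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits
```

Dedicated SSR-mining tools typically use different minimum repeat counts for each motif length; a single threshold is used here only to keep the sketch short.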

  1. De Novo Sequencing and Assembly Analysis of the Pseudostellaria heterophylla Transcriptome

    PubMed Central

    Li, Jun; Zhen, Wei; Long, Dengkai; Ding, Ling; Gong, Anhui; Xiao, Chenghong; Jiang, Weike; Liu, Xiaoqing; Zhou, Tao; Huang, Luqi

    2016-01-01

    Pseudostellaria heterophylla (Miq.) Pax is a mild tonic herb widely cultivated in the southern part of China. The tuberous roots of P. heterophylla accumulate high levels of secondary metabolism products of medicinal value such as saponins, flavonoids, and isoquinoline alkaloids. Despite numerous studies on the pharmacological importance and purification of these compounds in P. heterophylla, their biosynthesis is not well understood. In the present study, we used the Illumina HiSeq 4000 sequencing platform to sequence the RNA from flowers, leaves, stem, root cortex and xylem tissues of P. heterophylla. We obtained 616,413,316 clean reads that we assembled into 127,334 unique sequences with an N50 length of 951 bp. Among these unigenes, 53,184 unigenes (41.76%) were annotated in a public database and 39,795 unigenes were assigned to 356 KEGG pathways; 23,714 unigenes (8.82%) had high homology with the genes from Beta vulgaris. We discovered 32,095 DEGs in different tissues and performed GO and KEGG enrichment analysis. The most enriched KEGG pathway of secondary metabolism showed up-regulated expression in tuberous roots as compared with the ground parts of P. heterophylla. Moreover, we identified 72 candidate genes involved in triterpenoid saponin biosynthesis in P. heterophylla. The expression profiles of 11 candidate unigenes were analyzed by quantitative real-time PCR (RT-qPCR). Our study established a global transcriptome database of P. heterophylla for gene identification and regulation. We also identified the candidate unigenes involved in triterpenoid saponin biosynthesis. Our results provide an invaluable resource for the secondary metabolites and physiological processes in different tissues of P. heterophylla. PMID:27764127
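
Several abstracts in this listing report an N50 length. As a reminder of what that statistic measures, here is a minimal sketch using the common definition: the length L such that sequences of length L or longer contain at least half of all assembled bases. The function name `n50` is hypothetical, and individual assemblers may differ in details such as minimum length filters.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or unigene lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

For lengths 2, 3, 4, 5 and 6 (20 bases in total), the two longest sequences already cover 11 bases, so the N50 is 5.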

  2. De novo assembly and characterization of germinating lettuce seed transcriptome using Illumina paired-end sequencing.

    PubMed

    Liu, Shu-Jun; Song, Shun-Hua; Wang, Wei-Qing; Song, Song-Quan

    2015-11-01

    At supraoptimal temperature, germination of lettuce (Lactuca sativa L.) seeds exhibits a typical germination thermoinhibition, which can be alleviated by sodium nitroprusside (SNP) in a nitric oxide-dependent manner. However, the molecular mechanism of seed germination thermoinhibition and its alleviation by SNP are poorly understood. In the present study, lettuce seeds imbibed at optimal temperature in water, or at supraoptimal temperature with or without 100 μM SNP, for different periods of time were used as experimental materials. Total RNA was extracted and sequenced; we obtained 147,271,347 raw reads using the Illumina paired-end sequencing technique and assembled the transcriptome of germinating lettuce seeds. A total of 51,792 unigenes with a mean length of 849 nucleotides were obtained. Of these unigenes, a total of 29,542 were annotated by sequence similarity searching in four databases: the NCBI non-redundant protein database, the SwissProt protein database, the euKaryotic Ortholog Groups database, and the NCBI nucleotide database. Among the annotated unigenes, 22,276 were assigned to the Gene Ontology database. When all the annotated unigenes were searched against the Kyoto Encyclopedia of Genes and Genomes Pathway database, a total of 8,810 unigenes were mapped to 5 main categories including 260 pathways. We obtained, for the first time in lettuce, many unigenes encoding proteins involved in abscisic acid (ABA) signaling, including 11 ABA receptors, 94 protein phosphatase 2Cs and 16 sucrose non-fermenting 1-related protein kinases. These results will help us to better understand the molecular mechanism of seed germination, thermoinhibition of seed germination and its alleviation by SNP. PMID:26263518

  4. A Cost-Effective Approach to Sequence Hundreds of Complete Mitochondrial Genomes

    PubMed Central

    Nunez, Joaquin C. B.; Oleksiak, Marjorie F.

    2016-01-01

    We present a cost-effective approach to sequence whole mitochondrial genomes for hundreds of individuals. Our approach uses small reaction volumes and unmodified (non-phosphorylated) barcoded adaptors to minimize reagent costs. We demonstrate our approach by sequencing 383 Fundulus sp. mitochondrial genomes (192 F. heteroclitus and 191 F. majalis). Prior to sequencing, we amplified the mitochondrial genomes using 4–5 custom-made, overlapping primer pairs, and sequencing was performed on an Illumina HiSeq 2500 platform. After removing low quality and short sequences, 2.9 million and 2.8 million reads were generated for F. heteroclitus and F. majalis respectively. Individual genomes were assembled for each species by mapping barcoded reads to a reference genome. For F. majalis, the reference genome was built de novo. On average, individual consensus sequences had high coverage: 61-fold for F. heteroclitus and 57-fold for F. majalis. The approach discussed in this paper is optimized for sequencing mitochondrial genomes on an Illumina platform. However, with the proper modifications, this approach could be easily applied to other small genomes and sequencing platforms. PMID:27505419

  5. A Cost-Effective Approach to Sequence Hundreds of Complete Mitochondrial Genomes.

    PubMed

    Nunez, Joaquin C B; Oleksiak, Marjorie F

    2016-01-01

    We present a cost-effective approach to sequence whole mitochondrial genomes for hundreds of individuals. Our approach uses small reaction volumes and unmodified (non-phosphorylated) barcoded adaptors to minimize reagent costs. We demonstrate our approach by sequencing 383 Fundulus sp. mitochondrial genomes (192 F. heteroclitus and 191 F. majalis). Prior to sequencing, we amplified the mitochondrial genomes using 4-5 custom-made, overlapping primer pairs, and sequencing was performed on an Illumina HiSeq 2500 platform. After removing low quality and short sequences, 2.9 million and 2.8 million reads were generated for F. heteroclitus and F. majalis respectively. Individual genomes were assembled for each species by mapping barcoded reads to a reference genome. For F. majalis, the reference genome was built de novo. On average, individual consensus sequences had high coverage: 61-fold for F. heteroclitus and 57-fold for F. majalis. The approach discussed in this paper is optimized for sequencing mitochondrial genomes on an Illumina platform. However, with the proper modifications, this approach could be easily applied to other small genomes and sequencing platforms. PMID:27505419

  7. De Novo Sequencing and Characterization of the Transcriptome of Dwarf Polish Wheat (Triticum polonicum L.)

    PubMed Central

    Wang, Yi; Wang, Chao; Wang, Xiaolu; Peng, Fan; Wang, Ruijiao; Jiang, Yulin; Zeng, Jian; Fan, Xing; Kang, Houyang; Sha, Lina; Zhang, Haiqin; Xiao, Xue; Zhou, Yonghong

    2016-01-01

    Construction and characterization of a Polish wheat transcriptome is a crucial step in studying useful traits of Polish wheat. In this study, a transcriptome comprising 76,014 unigenes was assembled from dwarf Polish wheat (DPW) roots, stems, and leaves using Trinity. Among these unigenes, 61,748 (81.23%) were functionally annotated in public databases and classified into different functional types. When this transcriptome was aligned against the draft wheat genome released by the International Wheat Genome Sequencing Consortium (IWGSC), 57,331 (75.42%) unigenes, including 26,122 AB-specific and 2,622 D-specific unigenes, were mapped on the A, B, and/or D genomes. In a comparison with the transcriptome of T. turgidum, 56,343 unigenes were matched with 103,327 unigenes of T. turgidum. In a comparison with the genomes of barley and rice, 14,404 and 7,007 unigenes were matched with 14,608 genes of barley and 7,708 genes of rice, respectively. In addition, 2,148, 1,611, and 2,707 unigenes were expressed specifically in roots, stems, and leaves, respectively. Finally, 5,531 SSR sequences were observed from 4,531 unigenes, and 518 primer pairs were designed. PMID:27429972

  8. De novo transcriptome sequencing facilitates genomic resource generation in Tinospora cordifolia.

    PubMed

    Singh, Rakesh; Kumar, Rajesh; Mahato, Ajay Kumar; Paliwal, Ritu; Singh, Amit Kumar; Kumar, Sundeep; Marla, Soma S; Kumar, Ashok; Singh, Nagendra K

    2016-09-01

    Tinospora cordifolia is known for its medicinal properties owing to the presence of useful constituents of secondary metabolic origin such as terpenes, glycosides, steroids, alkaloids, and flavonoids. However, there is little information available pertaining to critical genomic elements (ESTs, molecular markers) necessary for judicious exploitation of its germplasm. We employed 454 GS-FLX pyrosequencing of entire transcripts, and altogether ∼25 K assembled transcripts or expressed sequence tags (ESTs) were identified. As the interest in T. cordifolia is primarily due to its secondary metabolite constituents, the ESTs pertaining to the terpenoid biosynthetic pathway were identified in the present study. Additionally, several ESTs were assigned to different transcription factor families. To validate our transcript dataset, novel EST-SSR markers were generated to assess the genetic diversity among germplasm of T. cordifolia. These EST-SSR markers were found to be polymorphic, and the dendrogram based on the Dice similarity index revealed three distinct clusters of accessions. The present study demonstrates the effectiveness of using both NEWBLER and MIRA sequence read assembler software for enriching the transcript dataset and thus enables better exploitation of EST resources for mining candidate genes and designing molecular markers. PMID:27465295
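
The dendrogram in this abstract is based on the Dice similarity index, which for two accessions is twice the number of shared marker bands divided by the total number of bands scored in both. A minimal sketch follows; the function name `dice_similarity`, the representation of each accession as a set of band labels, and the choice to return 0.0 when both sets are empty are assumptions for illustration, not details from the study.

```python
def dice_similarity(bands_a, bands_b):
    """Dice similarity between two accessions scored as sets of marker bands.

    Computes 2 * |A and B| / (|A| + |B|). Returns 0.0 when neither
    accession has any band, which is a convention of this sketch.
    """
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))
```

A pairwise matrix of these values (or of 1 minus the similarity, as a distance) is what a clustering method would then take as input to build the dendrogram.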

  9. De Novo Sequencing and Characterization of the Transcriptome of Dwarf Polish Wheat (Triticum polonicum L.).

    PubMed

    Wang, Yi; Wang, Chao; Wang, Xiaolu; Peng, Fan; Wang, Ruijiao; Jiang, Yulin; Zeng, Jian; Fan, Xing; Kang, Houyang; Sha, Lina; Zhang, Haiqin; Xiao, Xue; Zhou, Yonghong

    2016-01-01

    Construction and characterization of a Polish wheat transcriptome is a crucial step in studying useful traits of Polish wheat. In this study, a transcriptome comprising 76,014 unigenes was assembled from dwarf Polish wheat (DPW) roots, stems, and leaves using Trinity. Among these unigenes, 61,748 (81.23%) were functionally annotated in public databases and classified into different functional types. When this transcriptome was aligned against the draft wheat genome released by the International Wheat Genome Sequencing Consortium (IWGSC), 57,331 (75.42%) unigenes, including 26,122 AB-specific and 2,622 D-specific unigenes, were mapped on the A, B, and/or D genomes. In a comparison with the transcriptome of T. turgidum, 56,343 unigenes were matched with 103,327 unigenes of T. turgidum. In a comparison with the genomes of barley and rice, 14,404 and 7,007 unigenes were matched with 14,608 genes of barley and 7,708 genes of rice, respectively. In addition, 2,148, 1,611, and 2,707 unigenes were expressed specifically in roots, stems, and leaves, respectively. Finally, 5,531 SSR sequences were observed from 4,531 unigenes, and 518 primer pairs were designed. PMID:27429972

  10. Whole exome sequencing identifies de novo heterozygous CAV1 mutations associated with a novel neonatal onset lipodystrophy syndrome.

    PubMed

    Garg, Abhimanyu; Kircher, Martin; Del Campo, Miguel; Amato, R Stephen; Agarwal, Anil K

    2015-08-01

    Despite remarkable progress in identifying causal genes for many types of genetic lipodystrophies in the last decade, the molecular basis of many extremely rare lipodystrophy patients with distinctive phenotypes remains unclear. We conducted whole exome sequencing of the parents and probands from six pedigrees with neonatal onset of generalized loss of subcutaneous fat with additional distinctive phenotypic features and report de novo heterozygous null mutations, c.424C>T (p.Q142*) and c.479_480delTT (p.F160*), in CAV1 in a 7-year-old male and a 3-year-old female of European origin, respectively. Both the patients had generalized fat loss, thin mottled skin and progeroid features at birth. The male patient had cataracts requiring extraction at age 30 months and the female patient had pulmonary arterial hypertension. Dermal fibroblasts of the female patient revealed negligible CAV1 immunofluorescence staining compared to control but there were no differences in the number and morphology of caveolae upon electron microscopy examination. Based upon the similarities in the clinical features of these two patients, previous reports of CAV1 mutations in patients with lipodystrophies and pulmonary hypertension, and similar features seen in CAV1 null mice, we conclude that these variants are the most likely cause of one subtype of neonatal onset generalized lipodystrophy syndrome.

  11. De novo transcriptome sequencing and gene expression profiling of spinach (Spinacia oleracea L.) leaves under heat stress

    PubMed Central

    Yan, Jun; Yu, Li; Xuan, Jiping; Lu, Ying; Lu, Shijun; Zhu, Weimin

    2016-01-01

    Spinach (Spinacia oleracea) is cold tolerant but heat sensitive. The spinach variety ‘Island’ is suitable for summer periods. There is a lack of molecular information on the response of spinach to heat stress. In this study, high-throughput de novo transcriptome sequencing and gene expression analyses were carried out on leaves of the spinach variety ‘Island’ (grown at 24 °C (control), or exposed to 35 °C for 30 min (S1) or 5 h (S2)). A total of 133,200,898 clean reads were assembled into 59,413 unigenes (average size 1259.55 bp), of which 33,573 unigenes matched entries in public databases. The number of differentially expressed genes (DEGs) was 986 for control vs S1, 1741 for control vs S2, and 1587 for S1 vs S2. Gene Ontology (GO) and pathway enrichment analysis indicated that a great many heat-responsive genes and other stress-responsive genes were present among these DEGs, suggesting that the heat stress may have induced an extensive abiotic stress effect. Comparative transcriptome analysis found 896 unique genes in the spinach heat response transcriptome. The expression patterns of 13 selected genes were verified by RT-qPCR (quantitative real-time PCR). Our study found a series of candidate genes and pathways that may be related to heat resistance in spinach. PMID:26857466

  12. De novo assembly, functional annotation, and marker development of Asian pear (Pyrus pyrifolia) fruit transcriptome through massively parallel sequencing.

    PubMed

    Li, J F; Gao, Z; Lou, Y S; Luo, M; Song, S R; Xu, W P; Wang, S P; Zhang, C X

    2015-01-01

    This study investigated the Asian pear transcriptome using an RNA-Seq normalized fruit cDNA library to create a transcriptomic resource for unigene and marker discovery. Following the removal of low-quality reads, 127,085,054 trimmed reads were assembled de novo to yield 37,649 non-redundant unigenes with an average length of 599 bp. Alternative splicing events were detected in 4121 contigs. A total of 30,560 single nucleotide polymorphisms (SNPs) and 7443 simple sequence repeat (SSR) markers were obtained. Approximately 21,449 (56.9%) unigenes were categorized into three gene ontology groups; 3682 (9.8%) were classified into 25 clusters of orthologous groups; and 10,451 (27.8%) were assigned to six Kyoto Encyclopedia of Genes and Genomes pathways. Differentially expressed genes were investigated using the reads per kilobase of exon model per million reads (RPKM) methodology. A total of 546 unigenes showed significant differences in expression levels at different fruit developmental stages. Gene ontology categories associated with various aspects, including carbohydrate metabolic processes, transmembrane transport, and signal transduction, were enriched with genes with divergent expression. These Pyrus pyrifolia transcriptome data provide a rich resource for the discovery and identification of new genes. Furthermore, the numerous putative SSRs and SNPs detected in this study will be important resources for the future development of a linkage map or of marker-assisted breeding programs for the Asian pear.

  13. De novo transcriptome sequencing and gene expression profiling of spinach (Spinacia oleracea L.) leaves under heat stress.

    PubMed

    Yan, Jun; Yu, Li; Xuan, Jiping; Lu, Ying; Lu, Shijun; Zhu, Weimin

    2016-01-01

    Spinach (Spinacia oleracea) is cold tolerant but heat sensitive. The spinach variety 'Island' is suitable for summer periods. There is a lack of molecular information on the response of spinach to heat stress. In this study, high-throughput de novo transcriptome sequencing and gene expression analyses were carried out on leaves of the spinach variety 'Island' (grown at 24 °C (control), or exposed to 35 °C for 30 min (S1) or 5 h (S2)). A total of 133,200,898 clean reads were assembled into 59,413 unigenes (average size 1259.55 bp), of which 33,573 unigenes matched entries in public databases. The number of differentially expressed genes (DEGs) was 986 for control vs S1, 1741 for control vs S2, and 1587 for S1 vs S2. Gene Ontology (GO) and pathway enrichment analysis indicated that a great many heat-responsive genes and other stress-responsive genes were present among these DEGs, suggesting that the heat stress may have induced an extensive abiotic stress effect. Comparative transcriptome analysis found 896 unique genes in the spinach heat response transcriptome. The expression patterns of 13 selected genes were verified by RT-qPCR (quantitative real-time PCR). Our study found a series of candidate genes and pathways that may be related to heat resistance in spinach.

  14. De novo sequencing, assembly and analysis of eight different transcriptomes from the Malayan pangolin.

    PubMed

    Mohamed Yusoff, Aini; Tan, Tze King; Hari, Ranjeev; Koepfli, Klaus-Peter; Wee, Wei Yee; Antunes, Agostinho; Sitam, Frankie Thomas; Rovie-Ryan, Jeffrine Japning; Karuppannan, Kayal Vizi; Wong, Guat Jah; Lipovich, Leonard; Warren, Wesley C; O'Brien, Stephen J; Choo, Siew Woh

    2016-09-13

    Pangolins are scale-covered mammals comprising eight endangered species. Maintaining pangolins in captivity is a significant challenge, in part because little is known about their genetics. Here we provide the first large-scale sequencing of the critically endangered Manis javanica transcriptomes from eight different organs using Illumina HiSeq technology, yielding ~75 gigabases and 89,754 unigenes. We found some unigenes involved in the insect hormone biosynthesis pathway and also 747 lipid metabolism-related unigenes that may offer insight into the lipid metabolism system in pangolins. Comparative analysis between M. javanica and other mammals revealed many pangolin-specific genes significantly over-represented in stress-related processes, cell proliferation and external stimulus, probably reflecting the traits and adaptations of the analyzed pregnant female M. javanica. Our study provides an invaluable resource for future functional work that may be highly relevant for the conservation of pangolins.

  15. De novo sequencing, assembly and analysis of eight different transcriptomes from the Malayan pangolin.

    PubMed

    Mohamed Yusoff, Aini; Tan, Tze King; Hari, Ranjeev; Koepfli, Klaus-Peter; Wee, Wei Yee; Antunes, Agostinho; Sitam, Frankie Thomas; Rovie-Ryan, Jeffrine Japning; Karuppannan, Kayal Vizi; Wong, Guat Jah; Lipovich, Leonard; Warren, Wesley C; O'Brien, Stephen J; Choo, Siew Woh

    2016-01-01

    Pangolins are scale-covered mammals comprising eight endangered species. Maintaining pangolins in captivity is a significant challenge, in part because little is known about their genetics. Here we provide the first large-scale sequencing of the critically endangered Manis javanica transcriptomes from eight different organs using Illumina HiSeq technology, yielding ~75 gigabases and 89,754 unigenes. We found some unigenes involved in the insect hormone biosynthesis pathway and also 747 lipid metabolism-related unigenes that may offer insight into the lipid metabolism system in pangolins. Comparative analysis between M. javanica and other mammals revealed many pangolin-specific genes significantly over-represented in stress-related processes, cell proliferation and external stimulus, probably reflecting the traits and adaptations of the analyzed pregnant female M. javanica. Our study provides an invaluable resource for future functional work that may be highly relevant for the conservation of pangolins. PMID:27618997

  16. De novo sequencing, assembly and analysis of eight different transcriptomes from the Malayan pangolin

    PubMed Central

    Mohamed Yusoff, Aini; Tan, Tze King; Hari, Ranjeev; Koepfli, Klaus-Peter; Wee, Wei Yee; Antunes, Agostinho; Sitam, Frankie Thomas; Rovie-Ryan, Jeffrine Japning; Karuppannan, Kayal Vizi; Wong, Guat Jah; Lipovich, Leonard; Warren, Wesley C.; O’Brien, Stephen J.; Choo, Siew Woh

    2016-01-01

    Pangolins are scale-covered mammals comprising eight endangered species. Maintaining pangolins in captivity is a significant challenge, in part because little is known about their genetics. Here we provide the first large-scale sequencing of the critically endangered Manis javanica transcriptomes from eight different organs using Illumina HiSeq technology, yielding ~75 gigabases and 89,754 unigenes. We found some unigenes involved in the insect hormone biosynthesis pathway and also 747 lipid metabolism-related unigenes that may offer insight into the lipid metabolism system in pangolins. Comparative analysis between M. javanica and other mammals revealed many pangolin-specific genes significantly over-represented in stress-related processes, cell proliferation and external stimulus, probably reflecting the traits and adaptations of the analyzed pregnant female M. javanica. Our study provides an invaluable resource for future functional work that may be highly relevant for the conservation of pangolins. PMID:27618997

  17. De Novo Transcriptome Sequencing and Analysis of the Cereal Cyst Nematode, Heterodera avenae

    PubMed Central

    Kumar, Mukesh; Gantasala, Nagavara Prasad; Roychowdhury, Tanmoy; Thakur, Prasoon Kumar; Banakar, Prakash; Shukla, Rohit N.; Jones, Michael G. K.; Rao, Uma

    2014-01-01

    The cereal cyst nematode (CCN, Heterodera avenae) is a major pest of wheat (Triticum spp.) that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. After sequencing RNA extracted from the pre-parasitic infective juvenile and adult stages of the life cycle, 131 million high-quality Illumina paired-end reads were obtained, which generated 27,765 contigs with an N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences against those of other plant, animal and free-living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free-living nematode Caenorhabditis elegans, and 9,462 in common with the more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate-active enzymes (CAZy) revealed fewer glycoside hydrolases (GHs) but more glycosyl transferases (GTs) and carbohydrate esterases (CEs) when compared to Meloidogyne incognita. A total of 1,280 transcripts were found to have a secretory signature: presence of a signal peptide and absence of transmembrane domains. In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, expression levels of 30 genes with high RPKM (reads per kilobase per million mapped reads) values were analysed by qRT-PCR, which confirmed the observed differences in their expression levels. In addition, we have also developed a user-friendly resource, the Heterodera transcriptome database (HATdb), for public access to the data generated in this study. The new data provided on the transcriptome of H. avenae add to the genetic resources available to study plant-parasitic nematodes and provide an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction. PMID:24802510
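
    The abstract above reports two summary statistics, a contig N50 and RPKM expression values, without defining how they are computed. The following is a minimal sketch of the conventional definitions, not the authors' pipeline; the function names `n50` and `rpkm` and the example numbers are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L of the contig at which, walking from the longest
    contig downwards, the running total first reaches half of all
    assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Equivalent to read_count / (length_kb * mapped_reads_in_millions).
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
    print(rpkm(500, 2000, 10_000_000))
```

    Because RPKM normalizes by both transcript length and sequencing depth, values can be compared across transcripts and libraries, which is why it is used above to select highly expressed genes for qRT-PCR validation.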

  18. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae), by High-Throughput Illumina Sequencing

    PubMed Central

    Wang, Jia; Xiong, Ke-Cai; Liu, Ying-Hong

    2016-01-01

    The Chinese citrus fly, Bactrocera minax (Enderlein), is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies of the molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and the inability to rear this insect in the laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against the NCBI non-redundant protein (Nr), Swiss-Prot, Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) databases. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsp) genes, 26 glutathione S-transferase (GST) genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in the B. minax diapause stage. It has previously been found that application of 20-hydroxyecdysone (20E) to B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in the 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR) and ultraspiracle (USP), were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in the diapause program. In addition, 1,909 simple sequence repeats (SSRs) were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species. PMID:27331903

  19. De novo Transcriptome Analysis of Chinese Citrus Fly, Bactrocera minax (Diptera: Tephritidae), by High-Throughput Illumina Sequencing.

    PubMed

    Wang, Jia; Xiong, Ke-Cai; Liu, Ying-Hong

    2016-01-01

    The Chinese citrus fly, Bactrocera minax (Enderlein), is one of the most devastating pests of citrus in the temperate areas of Asia. So far, studies of the molecular biology and physiology of B. minax are still scarce, partly because of the lack of genomic information and the inability to rear this insect in the laboratory. In this study, de novo assembly of a transcriptome was performed using Illumina sequencing technology. A total of 20,928,907 clean reads were obtained and assembled into 33,324 unigenes, with an average length of 908.44 bp. Unigenes were annotated by alignment against the NCBI non-redundant protein (Nr), Swiss-Prot, Clusters of Orthologous Groups (COG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) databases. Genes potentially involved in stress tolerance, including 20 heat shock protein (Hsp) genes, 26 glutathione S-transferase (GST) genes, and 2 ferritin subunit genes, were identified. These genes may play roles in stress tolerance in the B. minax diapause stage. It has previously been found that application of 20-hydroxyecdysone (20E) to B. minax pupae could avert diapause, but the underlying mechanisms remain unknown. Thus, genes encoding enzymes in the 20E biosynthesis pathway, including Neverland, Spook, Phantom, Disembodied, Shadow, Shade, and Cyp18a1, and genes encoding 20E receptor proteins, ecdysone receptor (EcR) and ultraspiracle (USP), were identified. The expression patterns of 20E-related genes among developmental stages and between 20E-treated and untreated pupae demonstrated their roles in the diapause program. In addition, 1,909 simple sequence repeats (SSRs) were detected, which will contribute to molecular marker development. The findings in this study greatly improve our genetic understanding of B. minax, and lay the foundation for future studies on this species. PMID:27331903

  20. Transcriptomic Analysis of Flower Blooming in Jasminum sambac through De Novo RNA Sequencing.

    PubMed

    Li, Yong-Hua; Zhang, Wei; Li, Yong

    2015-06-10

    Flower blooming is a critical and complicated developmental process in flowering plants. However, insufficient information is available about the complex network that regulates flower blooming in Jasminum sambac. In this study, we used the RNA-Seq platform to analyze the molecular regulation of flower blooming in J. sambac by comparing the transcript profiles at two flower developmental stages: budding and blooming. A total of 4577 differentially-expressed genes (DEGs) were identified between the two floral stages. The Gene Ontology and the Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses revealed that the DEGs in the "oxidation-reduction process", "extracellular region", "steroid biosynthesis", "glycosphingolipid biosynthesis", "plant hormone signal transduction" and "pentose and glucuronate interconversions" might be associated with flower development. A total of 103 and 92 unigenes exhibited sequence similarities to the known flower development and floral scent genes from other plants, respectively. Among these unigenes, five flower development and 19 floral scent unigenes exhibited at least four-fold differences in expression between the two stages. Our results provide abundant genetic resources for studying the flower blooming mechanisms and molecular breeding of J. sambac.

  1. De novo transcriptome sequencing of Momordica cochinchinensis to identify genes involved in the carotenoid biosynthesis.

    PubMed

    Hyun, Tae Kyung; Rim, Yeonggil; Jang, Hui-Jeong; Kim, Cheol Hong; Park, Jongsun; Kumar, Ritesh; Lee, Sunghoon; Kim, Byung Chul; Bhak, Jong; Nguyen-Quoc, Binh; Kim, Seon-Won; Lee, Sang Yeol; Kim, Jae-Yean

    2012-07-01

    The ripe fruit of Momordica cochinchinensis Spreng, known as gac, is characterized by a very high carotenoid content. Although this plant might be a good resource for carotenoid metabolic engineering, the genes involved in the carotenoid metabolic pathways in gac have so far remained unidentified due to the lack of genomic information in public databases. In order to expedite the process of gene discovery, we have undertaken Illumina deep sequencing of mRNA prepared from the aril of gac fruit. From 51,446,670 high-quality reads, we obtained 81,404 assembled unigenes with an average length of 388 base pairs. At the protein level, gac aril transcripts showed about 81.5% similarity with cucumber proteomes. In addition, 17,104 unigenes have been assigned to specific metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes, and all of the known enzymes involved in the terpenoid backbone biosynthetic and carotenoid biosynthetic pathways were also identified in our library. To analyze the relationship between putative carotenoid biosynthesis genes and alteration of carotenoid content during fruit ripening, digital gene expression analysis was performed on three different ripening stages of aril. This study has revealed that putative phytoene synthase, 15-cis-phytoene desaturase, zeta-carotene desaturase, carotenoid isomerase and lycopene epsilon cyclase might be key factors for controlling carotenoid content during aril ripening. Taken together, this study has also made a large gene database available. This unique information for gac gene discovery would be helpful to facilitate functional studies for improving carotenoid quantities. PMID:22580955

  2. Motor Sequence Learning and Consolidation in Unilateral De Novo Patients with Parkinson’s Disease

    PubMed Central

    Doyon, Julien; Chan, Piu

    2015-01-01

    Previous research investigating motor sequence learning (MSL) and consolidation in patients with Parkinson’s disease (PD) has predominantly included heterogeneous participant samples with early and advanced disease stages; thus, little is known about the onset of potential behavioral impairments. We employed a multisession MSL paradigm to investigate whether behavioral deficits in learning and consolidation appear immediately after or prior to the detection of clinical symptoms in the tested (left) hand. Specifically, our patient sample was limited to recently diagnosed patients with pure unilateral PD. The left hand symptomatic (LH-S) patients provided an assessment of performance following the onset of clinical symptoms in the tested hand. Conversely, right hand affected (left hand asymptomatic, LH-A) patients served to investigate whether MSL impairments appear before symptoms in the tested hand. LH-S patients demonstrated impaired learning during the initial training session and both LH-S and LH-A patients demonstrated decreased performance compared to controls during the next-day retest. Critically, the impairments in later learning stages in the LH-A patients were evident even before the appearance of traditional clinical symptoms in the tested hand. Results may be explained by the progression of disease-related alterations in relevant corticostriatal networks. PMID:26222151

  3. De novo transcriptome sequencing of Momordica cochinchinensis to identify genes involved in the carotenoid biosynthesis.

    PubMed

    Hyun, Tae Kyung; Rim, Yeonggil; Jang, Hui-Jeong; Kim, Cheol Hong; Park, Jongsun; Kumar, Ritesh; Lee, Sunghoon; Kim, Byung Chul; Bhak, Jong; Nguyen-Quoc, Binh; Kim, Seon-Won; Lee, Sang Yeol; Kim, Jae-Yean

    2012-07-01

    The ripe fruit of Momordica cochinchinensis Spreng, known as gac, is characterized by a very high carotenoid content. Although this plant might be a good resource for carotenoid metabolic engineering, the genes involved in the carotenoid metabolic pathways in gac have so far remained unidentified due to the lack of genomic information in public databases. In order to expedite the process of gene discovery, we have undertaken Illumina deep sequencing of mRNA prepared from the aril of gac fruit. From 51,446,670 high-quality reads, we obtained 81,404 assembled unigenes with an average length of 388 base pairs. At the protein level, gac aril transcripts showed about 81.5% similarity with cucumber proteomes. In addition, 17,104 unigenes have been assigned to specific metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes, and all of the known enzymes involved in the terpenoid backbone biosynthetic and carotenoid biosynthetic pathways were also identified in our library. To analyze the relationship between putative carotenoid biosynthesis genes and alteration of carotenoid content during fruit ripening, digital gene expression analysis was performed on three different ripening stages of aril. This study has revealed that putative phytoene synthase, 15-cis-phytoene desaturase, zeta-carotene desaturase, carotenoid isomerase and lycopene epsilon cyclase might be key factors for controlling carotenoid content during aril ripening. Taken together, this study has also made a large gene database available. This unique information for gac gene discovery would be helpful to facilitate functional studies for improving carotenoid quantities.

  4. Sequencing, De Novo Assembly and Annotation of the Colorado Potato Beetle, Leptinotarsa decemlineata, Transcriptome

    PubMed Central

    Kumar, Abhishek; Congiu, Leonardo; Lindström, Leena; Piiroinen, Saija; Vidotto, Michele; Grapputo, Alessandro

    2014-01-01

    Background: The Colorado potato beetle (Leptinotarsa decemlineata) is a major pest and a serious threat to potato cultivation throughout the northern hemisphere. Despite its high importance for invasion biology, phenology and pest management, little is known about L. decemlineata from a genomic perspective. We subjected European L. decemlineata adult and larval transcriptome samples to 454-FLX massively-parallel DNA sequencing to characterize a basal set of genes from this species. We created a combined assembly of the adult and larval datasets including the publicly available midgut larval Roche 454 reads and provided basic annotation. We were particularly interested in diapause-specific genes and genes involved in pesticide and Bacillus thuringiensis (Bt) resistance. Results: Using 454-FLX pyrosequencing, we obtained a total of 898,048 reads which, together with the publicly available 804,056 midgut larval reads, were assembled into 121,912 contigs. We established a repository of genes of interest, with 101 out of the 108 diapause-specific genes described in Drosophila montana; and 621 contigs involved in insecticide resistance, including 221 CYP450, 45 GSTs, 13 catalases, 15 superoxide dismutases, 22 glutathione peroxidases, 194 esterases, 3 ADAM metalloproteases, 10 cadherins and 98 calmodulins. We found 460 putative miRNAs and we predicted a significant number of single nucleotide polymorphisms (29,205) and microsatellite loci (17,284). Conclusions: This report of the assembly and annotation of the transcriptome of L. decemlineata offers new insights into diapause-associated and insecticide-resistance-associated genes in this species and provides a foundation for comparative studies with other species of insects. The data will also open new avenues for researchers using L. decemlineata as a model species, and for pest management research. Our results provide the basis for performing future gene expression and functional analysis in L. decemlineata and improve our

  5. De novo transcriptome sequencing analysis and comparison of differentially expressed genes (DEGs) in Macrobrachium rosenbergii in China.

    PubMed

    Nguyen Thanh, Hai; Zhao, Liangjie; Liu, Qigen

    2014-01-01

    Giant freshwater prawn (GFP; Macrobrachium rosenbergii) is an exotic species that was introduced into China in 1976 and thereafter became a major species in freshwater aquaculture. However, gene discovery in this species has been limited to small-scale data collection in China. We used next-generation sequencing to sequence the transcriptome of hepatopancreas samples from individuals of 4 GFP groups (A1, A2, B1 and B2). De novo transcriptome sequencing generated 66,953 isogenes. Using BLASTX to search the Non-redundant (NR), Search Tool for the Retrieval of Interacting Genes (STRING), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, 21,224 unigenes were annotated; 9,552 unigenes matched the Gene Ontology (GO) classification; 5,782 unigenes matched 25 categories of Clusters of Orthologous Groups of proteins (COG); and 20,859 unigenes were assigned to 312 KEGG pathways. Between the A and B groups, 147 differentially expressed genes (DEGs) were identified; between the A1 and A2 groups, 6,860 DEGs were identified; and between the B1 and B2 groups, 5,229 DEGs were identified. After enrichment, the A and B groups identified 38 DEGs, but none of them were significantly enriched. The A1 and A2 groups identified 21,856 DEGs in three main categories based on functional groups (biological process, cellular component and molecular function); in the KEGG pathway analysis, 2,459 genes had a KEGG Ortholog ID (KO-ID) and could be categorized into 251 pathways, of which 9 were significantly enriched. The B1 and B2 groups identified 5,940 DEGs in the same three main categories; in the KEGG pathway analysis, 1,543 genes had a KO-ID and could be categorized into 240 pathways, of which 2 were significantly enriched. We investigated 99 queries (GO) related to growth of GFP in the 4 groups. After enrichment we

  6. De Novo Transcriptome Sequencing Analysis and Comparison of Differentially Expressed Genes (DEGs) in Macrobrachium rosenbergii in China

    PubMed Central

    Liu, Qigen

    2014-01-01

    Giant freshwater prawn (GFP; Macrobrachium rosenbergii) is an exotic species that was introduced into China in 1976 and thereafter became a major species in freshwater aquaculture. However, gene discovery in this species has been limited to small-scale data collection in China. We used next-generation sequencing to sequence the transcriptome of hepatopancreas samples from individuals of 4 GFP groups (A1, A2, B1 and B2). De novo transcriptome sequencing generated 66,953 isogenes. Using BLASTX to search the Non-redundant (NR), Search Tool for the Retrieval of Interacting Genes (STRING), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, 21,224 unigenes were annotated; 9,552 unigenes matched the Gene Ontology (GO) classification; 5,782 unigenes matched 25 categories of Clusters of Orthologous Groups of proteins (COG); and 20,859 unigenes were assigned to 312 KEGG pathways. Between the A and B groups, 147 differentially expressed genes (DEGs) were identified; between the A1 and A2 groups, 6,860 DEGs were identified; and between the B1 and B2 groups, 5,229 DEGs were identified. After enrichment, the A and B groups identified 38 DEGs, but none of them were significantly enriched. The A1 and A2 groups identified 21,856 DEGs in three main categories based on functional groups (biological process, cellular component and molecular function); in the KEGG pathway analysis, 2,459 genes had a KEGG Ortholog ID (KO-ID) and could be categorized into 251 pathways, of which 9 were significantly enriched. The B1 and B2 groups identified 5,940 DEGs in the same three main categories; in the KEGG pathway analysis, 1,543 genes had a KO-ID and could be categorized into 240 pathways, of which 2 were significantly enriched. We investigated 99 queries (GO) related to growth of GFP in the 4 groups. After enrichment we

  7. De novo transcriptome sequencing in Bixa orellana to identify genes involved in methylerythritol phosphate, carotenoid and bixin biosynthesis

    DOE PAGES

    Cárdenas-Conejo, Yair; Carballo-Uicab, Víctor; Lieberman, Meric; Aguilar-Espinosa, Margarita; Comai, Luca; Rivera-Madrid, Renata

    2015-10-28

    Bixin or annatto is a commercially important natural orange-red pigment derived from lycopene that is produced and stored in seeds of Bixa orellana L. An enzymatic pathway for bixin biosynthesis was inferred from homology of putative proteins encoded by differentially expressed seed cDNAs. Some activities were later validated in a heterologous system. Nevertheless, much of the pathway remains to be clarified. For example, it is essential to identify the methylerythritol phosphate (MEP) and carotenoid pathways genes. In order to investigate the MEP, carotenoid, and bixin pathways genes, total RNA from young leaves and two different developmental stages of seeds from B. orellana were used for the construction of indexed mRNA libraries, sequenced on the Illumina HiSeq 2500 platform and assembled de novo using Velvet, CLC Genomics Workbench and CAP3 software. A total of 52,549 contigs were obtained with average length of 1,924 bp. Two phylogenetic analyses of inferred proteins, in one case encoded by thirteen general, single-copy cDNAs, in the other from carotenoid and MEP cDNAs, indicated that B. orellana is closely related to sister Malvales species cacao and cotton. Using homology, we identified 7 and 14 core gene products from the MEP and carotenoid pathways, respectively. Surprisingly, previously defined bixin pathway cDNAs were not present in our transcriptome. Here we propose a new set of gene products involved in bixin pathway. In conclusion, the identification and qRT-PCR quantification of cDNAs involved in annatto production suggest a hypothetical model for bixin biosynthesis that involve coordinated activation of some MEP, carotenoid and bixin pathway genes. These findings provide a better understanding of the mechanisms regulating these pathways and will facilitate the genetic improvement of B. orellana.

  8. Synthesis of Several Cleistrioside and Cleistetroside Natural Products via a Divergent De Novo Asymmetric Approach

    PubMed Central

    Wu, Bulan; Li, Miaosheng; O’Doherty, George A.

    2010-01-01

    The de novo asymmetric syntheses of several partially acylated dodecanyl tri- and tetra-rhamnoside natural products (cleistriosides-5 & 6 and cleistetrosides-2 to 7) have been achieved (19 to 24 steps). The divergent route requires the use of three or fewer protecting groups. The asymmetry was derived via Noyori reduction of an acylfuran. The rhamno-stereochemistry was installed by a diastereoselective palladium-catalyzed glycosylation, ketone reduction and dihydroxylation. PMID:21038879

  9. From the periphery to centre stage: de novo single nucleotide variants play a key role in human genetic disease.

    PubMed

    Ku, Chee-Seng; Tan, Eng King; Cooper, David N

    2013-04-01

    Human germline mutations arise anew during meiosis in every generation. Such spontaneously occurring genetic variants are termed de novo mutations. Although the introduction of microarray based approaches led to the discovery of numerous de novo copy number variants underlying a range of human genetic conditions, de novo single nucleotide variants (SNVs) remained refractory to analysis at the whole genome level until the advent of next generation sequencing technologies such as whole genome sequencing and whole exome sequencing. These approaches have recently allowed the estimation of the mutation rate of de novo SNVs and greatly increased our understanding of their contribution to human genetic disease. Indeed, de novo SNVs have been found to underlie various common human neurodevelopmental conditions such as schizophrenia, autism and intellectual disability, as well as sporadic cases of rare Mendelian disorders. In many cases, however, confirmation of the pathogenicity of identified de novo SNVs remains a major challenge. PMID:23396985

  10. Frequency and Complexity of De Novo Structural Mutation in Autism

    PubMed Central

    Brandler, William M.; Antaki, Danny; Gujral, Madhusudan; Noor, Amina; Rosanio, Gabriel; Chapman, Timothy R.; Barrera, Daniel J.; Lin, Guan Ning; Malhotra, Dheeraj; Watts, Amanda C.; Wong, Lawrence C.; Estabillo, Jasper A.; Gadomski, Therese E.; Hong, Oanh; Fajardo, Karin V. Fuentes; Bhandari, Abhishek; Owen, Renius; Baughn, Michael; Yuan, Jeffrey; Solomon, Terry; Moyzis, Alexandra G.; Maile, Michelle S.; Sanders, Stephan J.; Reiner, Gail E.; Vaux, Keith K.; Strom, Charles M.; Zhang, Kang; Muotri, Alysson R.; Akshoomoff, Natacha; Leal, Suzanne M.; Pierce, Karen; Courchesne, Eric; Iakoucheva, Lilia M.; Corsello, Christina; Sebat, Jonathan

    2016-01-01

    Genetic studies of autism spectrum disorder (ASD) have established that de novo duplications and deletions contribute to risk. However, ascertainment of structural variants (SVs) has been restricted by the coarse resolution of current approaches. By applying a custom pipeline for SV discovery, genotyping, and de novo assembly to genome sequencing of 235 subjects (71 affected individuals, 26 healthy siblings, and their parents), we compiled an atlas of 29,719 SV loci (5,213/genome), comprising 11 different classes. We found a high diversity of de novo mutations, the majority of which were undetectable by previous methods. In addition, we observed complex mutation clusters where combinations of de novo SVs, nucleotide substitutions, and indels occurred as a single event. We estimate a high rate of structural mutation in humans (20%) and propose that genetic risk for ASD is attributable to an elevated frequency of gene-disrupting de novo SVs, but not an elevated rate of genome rearrangement. PMID:27018473

  11. Genetic variation and the de novo assembly of human genomes

    PubMed Central

    Chaisson, Mark J. P.; Wilson, Richard K.; Eichler, Evan E.

    2016-01-01

    The discovery of genetic variation and the assembly of genome sequences are both inextricably linked to advances in DNA-sequencing technology. Short-read massively parallel sequencing has revolutionized our ability to discover genetic variation but is insufficient to generate high-quality genome assemblies or resolve most structural variation. Full resolution of variation is only guaranteed by complete de novo assembly of a genome. Here, we review approaches to genome assembly, the nature of gaps or missing sequences, and biases in the assembly process. We describe the challenges of generating a complete de novo genome assembly using current technologies and the impact that being able to perfectly sequence the genome would have on understanding human disease and evolution. Finally, we summarize recent technological advances that improve both contiguity and accuracy and emphasize the importance of complete de novo assembly as opposed to read mapping as the primary means to understanding the full range of human genetic variation. PMID:26442640

  12. Large Scale Discovery and De Novo-Assisted Sequencing of Cationic Antimicrobial Peptides (CAMPs) by Microparticle Capture and Electron-Transfer Dissociation (ETD) Mass Spectrometry.

    PubMed

    Juba, Melanie L; Russo, Paul S; Devine, Megan; Barksdale, Stephanie; Rodriguez, Carlos; Vliet, Kent A; Schnur, Joel M; van Hoek, Monique L; Bishop, Barney M

    2015-10-01

    The identification and sequencing of novel cationic antimicrobial peptides (CAMPs) have proven challenging due to the limitations associated with traditional proteomics methods and difficulties sequencing peptides present in complex biomolecular mixtures. We present here a process for large-scale identification and de novo-assisted sequencing of newly discovered CAMPs using microparticle capture followed by tandem mass spectrometry equipped with electron-transfer dissociation (ETD). This process was initially evaluated and verified using known CAMPs with varying physicochemical properties. The effective parameters were then applied in the analysis of a complex mixture of peptides harvested from American alligator plasma using custom-made (Bioprospector) functionalized hydrogel particles. Here, we report the successful sequencing process for CAMPs that has led to the identification of 340 unique peptides and the discovery of five novel CAMPs from American alligator plasma. PMID:26327436

  13. Large Scale Discovery and De Novo-Assisted Sequencing of Cationic Antimicrobial Peptides (CAMPs) by Microparticle Capture and Electron-Transfer Dissociation (ETD) Mass Spectrometry.

    PubMed

    Juba, Melanie L; Russo, Paul S; Devine, Megan; Barksdale, Stephanie; Rodriguez, Carlos; Vliet, Kent A; Schnur, Joel M; van Hoek, Monique L; Bishop, Barney M

    2015-10-01

    The identification and sequencing of novel cationic antimicrobial peptides (CAMPs) have proven challenging due to the limitations associated with traditional proteomics methods and difficulties sequencing peptides present in complex biomolecular mixtures. We present here a process for large-scale identification and de novo-assisted sequencing of newly discovered CAMPs using microparticle capture followed by tandem mass spectrometry equipped with electron-transfer dissociation (ETD). This process was initially evaluated and verified using known CAMPs with varying physicochemical properties. The effective parameters were then applied in the analysis of a complex mixture of peptides harvested from American alligator plasma using custom-made (Bioprospector) functionalized hydrogel particles. Here, we report the successful sequencing process for CAMPs that has led to the identification of 340 unique peptides and the discovery of five novel CAMPs from American alligator plasma.

  14. Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing

    PubMed Central

    Vembar, Shruthi Sridhar; Seetin, Matthew; Lambert, Christine; Nattestad, Maria; Schatz, Michael C.; Baybayan, Primo; Scherf, Artur; Smith, Melissa Laird

    2016-01-01

    The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (Ex: var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90–99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission. PMID:27345719
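
    The AT-richness figures quoted above (about 80.6% genome-wide, 90 to 99% in centromeres) are base-composition fractions. As a minimal sketch of how such a fraction is computed from a sequence, and not the method used in the study, the hypothetical helper `at_fraction` below counts A and T among unambiguous bases only; ignoring N and other ambiguity codes is an assumption of this sketch.

```python
def at_fraction(sequence):
    """Fraction of unambiguous bases (A, C, G, T) that are A or T.

    Ambiguity codes such as N are ignored rather than counted in the
    denominator (an assumption made for this illustration).
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(1 for b in bases if b in "AT")
    return at_count / len(bases)


if __name__ == "__main__":
    # Toy sequence, not P. falciparum data.
    print(f"{at_fraction('AATTG'):.1%}")
```

    Applied in sliding windows along a chromosome, the same calculation would highlight the extreme-composition regions that the abstract describes as difficult for short-read assembly.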

  15. Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing.

    PubMed

    Vembar, Shruthi Sridhar; Seetin, Matthew; Lambert, Christine; Nattestad, Maria; Schatz, Michael C; Baybayan, Primo; Scherf, Artur; Smith, Melissa Laird

    2016-08-01

    The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (e.g., var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90-99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission. PMID:27345719

  16. Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios

    PubMed Central

    Besenbacher, Søren; Liu, Siyang; Izarzugaza, José M. G.; Grove, Jakob; Belling, Kirstine; Bork-Jensen, Jette; Huang, Shujia; Als, Thomas D.; Li, Shengting; Yadav, Rachita; Rubio-García, Arcadio; Lescai, Francesco; Demontis, Ditte; Rao, Junhua; Ye, Weijian; Mailund, Thomas; Friborg, Rune M.; Pedersen, Christian N. S.; Xu, Ruiqi; Sun, Jihua; Liu, Hao; Wang, Ou; Cheng, Xiaofang; Flores, David; Rydza, Emil; Rapacki, Kristoffer; Damm Sørensen, John; Chmura, Piotr; Westergaard, David; Dworzynski, Piotr; Sørensen, Thorkild I. A.; Lund, Ole; Hansen, Torben; Xu, Xun; Li, Ning; Bolund, Lars; Pedersen, Oluf; Eiberg, Hans; Krogh, Anders; Børglum, Anders D.; Brunak, Søren; Kristiansen, Karsten; Schierup, Mikkel H.; Wang, Jun; Gupta, Ramneek; Villesen, Palle; Rasmussen, Simon

    2015-01-01

    Building a population-specific catalogue of single nucleotide variants (SNVs), indels and structural variants (SVs) with frequencies, termed a national pan-genome, is critical for further advancing clinical and public health genetics in large cohorts. Here we report a Danish pan-genome obtained from sequencing 10 trios to high depth (50×). We report 536k novel SNVs and 283k novel short indels from mapping approaches and develop a population-wide de novo assembly approach to identify 132k novel indels larger than 10 nucleotides with low false discovery rates. We identify a higher proportion of indels and SVs than previous efforts showing the merits of high coverage and de novo assembly approaches. In addition, we use trio information to identify de novo mutations and use a probabilistic method to provide direct estimates of 1.27e−8 and 1.5e−9 per nucleotide per generation for SNVs and indels, respectively. PMID:25597990
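
    The trio-based step described in this abstract can be illustrated with a deliberately naive sketch. It is not the probabilistic method the authors used: it simply flags child alleles that are absent from both parents and divides a mutation count by twice the number of callable sites. The function names, the genotype representation (tuples of allele strings) and the numbers in the usage lines are all hypothetical.

```python
def candidate_de_novo_alleles(child, father, mother):
    """Return child alleles that appear in neither parent's genotype.

    Each genotype is a tuple of allele strings, e.g. ("A", "G").
    This is a simple Mendelian-inconsistency filter (an assumption for
    illustration), not a probabilistic caller.
    """
    return sorted(set(child) - set(father) - set(mother))


def naive_rate(n_mutations, callable_sites):
    """Naive per-nucleotide, per-generation rate for a single trio.

    The factor 2 accounts for the two haploid genomes the child inherits.
    """
    return n_mutations / (2 * callable_sites)


# Hypothetical inputs, chosen only to show the calling convention.
print(candidate_de_novo_alleles(("A", "G"), ("A", "A"), ("A", "A")))
print(naive_rate(n_mutations=60, callable_sites=2.5e9))
```

    A real analysis would also have to model genotyping error and the fraction of the genome in which a mutation could have been detected, which is what a probabilistic method is for.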

  17. Massively parallel sequencing approaches for characterization of structural variation.

    PubMed

    Koboldt, Daniel C; Larson, David E; Chen, Ken; Ding, Li; Wilson, Richard K

    2012-01-01

    The emergence of next-generation sequencing (NGS) technologies offers an incredible opportunity to comprehensively study DNA sequence variation in human genomes. Commercially available platforms from Roche (454), Illumina (Genome Analyzer and HiSeq 2000), and Applied Biosystems (SOLiD) have the capability to completely sequence individual genomes to high levels of coverage. NGS data is particularly advantageous for the study of structural variation (SV) because it offers the sensitivity to detect variants of various sizes and types, as well as the precision to characterize their breakpoints at base pair resolution. In this chapter, we present methods and software algorithms that have been developed to detect SVs and copy number changes using massively parallel sequencing data. We describe visualization and de novo assembly strategies for characterizing SV breakpoints and removing false positives.

  18. De novo Transcriptome Sequencing and Development of Abscission Zone-Specific Microarray as a New Molecular Tool for Analysis of Tomato Organ Abscission

    PubMed Central

    Sundaresan, Srivignesh; Philosoph-Hadas, Sonia; Riov, Joseph; Mugasimangalam, Raja; Kuravadi, Nagesh A.; Kochanek, Bettina; Salim, Shoshana; Tucker, Mark L.; Meir, Shimon

    2016-01-01

    Abscission of flower pedicels and leaf petioles of tomato (Solanum lycopersicum) can be induced by flower removal or leaf deblading, respectively, which leads to auxin depletion, resulting in increased sensitivity of the abscission zone (AZ) to ethylene. However, the molecular mechanisms that drive the acquisition of abscission competence and its modulation by auxin gradients are not yet known. We used RNA-Sequencing (RNA-Seq) to obtain a comprehensive transcriptome of tomato flower AZ (FAZ) and leaf AZ (LAZ) during abscission. RNA-Seq was performed on a pool of total RNA extracted from tomato FAZ and LAZ, at different abscission stages, followed by de novo assembly. The assembled clusters contained transcripts that are already known in the Solanaceae (SOL) genomics and NCBI databases, and over 8,823 newly identified tomato transcripts of varying sizes. An AZ-specific microarray, encompassing the novel transcripts identified in this study and all known transcripts from the SOL genomics and NCBI databases, was constructed to study the abscission process. Multiple probes for longer genes and key AZ-specific genes, including antisense probes for all transcripts, make this array a unique tool for studying abscission with a comprehensive set of transcripts, and for mining for naturally occurring antisense transcripts. We focused on comparing the global transcriptomes generated from the FAZ and the LAZ to establish the divergences and similarities in their transcriptional networks, and particularly to characterize the processes and transcriptional regulators enriched in gene clusters that are differentially regulated in these two AZs. This study is the first attempt to analyze the global gene expression in different AZs in tomato by combining the RNA-Seq technique with oligonucleotide microarrays. Our AZ-specific microarray chip provides a cost-effective approach for expression profiling and robust analysis of multiple samples in rapid succession. PMID:26834766

  19. Differential 14N/15N-Labeling of Peptides Using N-Terminal Charge Derivatization with a High-Proton Affinity for Straightforward de novo Peptide Sequencing

    PubMed Central

    Nihashi, Yoichiro; Miyashita, Masahiro; Awane, Hiroyuki; Miyagawa, Hisashi

    2013-01-01

    While de novo peptide sequencing is essential in many situations, it remains a difficult task. This is because peptide fragmentation results in complicated and often incomplete product ion spectra. In a previous study, we demonstrated that N-terminal charge derivatization with 4-amidinobenzoic acid (Aba) resulted in improved peptide fragmentation under low-energy CID conditions. However, even with this derivatization, some ambiguity exists, due to difficulties in discriminating between N- and C-terminal fragments. In this study, to specifically identify b-ions from complex product ion spectra, the differential 14N/15N-labeling of peptides was performed using Aba derivatization. 15N-Labeled Aba was synthesized in the form of a succinimide ester. Peptides were derivatized individually with 14N-Aba or 15N-Aba and analyzed by ESI-MS/MS using a linear ion trap-Orbitrap hybrid FTMS system. The N-terminal fragments (i.e., b-ions) were then identified based on m/z differences arising from isotope labeling. By comparing the spectra between 14N- and 15N-Aba derivatized peptides, b-ions could be successfully identified based on the m/z shifts, which provided reliable sequencing results for all of the peptides examined in this study. The method developed in this study allows the easy and reliable de novo sequencing of peptides, which is useful in peptidomics and proteomics studies. PMID:24860714
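
    The b-ion assignment described in this abstract rests on comparing two spectra for a fixed m/z shift. The following is a minimal sketch of that comparison, not the authors' software. The 15N−14N mass difference (about 0.997035 Da) is a physical constant, but the number of labeled nitrogens per tag (n_labels), the charge state, the tolerance and the peak lists are assumptions made only for illustration.

```python
# Mass difference between 15N and 14N, in Da.
N15_N14_DELTA = 0.997035


def find_shifted_pairs(light_peaks, heavy_peaks, n_labels=2, charge=1, tol=0.01):
    """Pair peaks from the 14N and 15N spectra that differ by the label shift.

    Peaks that shift are candidate N-terminal (b-type) fragments, because
    only fragments carrying the N-terminal tag contain the label.
    n_labels, charge and tol are hypothetical defaults.
    """
    shift = n_labels * N15_N14_DELTA / charge
    pairs = []
    for light in light_peaks:
        for heavy in heavy_peaks:
            if abs((heavy - light) - shift) <= tol:
                pairs.append((light, heavy))
    return pairs


# Hypothetical peak lists: the first peak shifts, the second does not.
light = [300.10, 450.20]
heavy = [302.094, 450.20]
print(find_shifted_pairs(light, heavy))
```

    Peaks that appear at the same m/z in both spectra, such as the second peak above, would be treated as fragments lacking the tag.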

  20. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation

    PubMed Central

    McLysaght, Aoife; Guerzoni, Daniele

    2015-01-01

    The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. PMID:26323763

  1. in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding it...

  2. Transcriptome analysis of colored calla lily (Zantedeschia rehmannii Engl.) by Illumina sequencing: de novo assembly, annotation and EST-SSR marker development

    PubMed Central

    Cui, Binbin; Zhang, Qixiang; Xiong, Min; Wang, Xian

    2016-01-01

    Colored calla lily is the short name for the species or hybrids in section Aestivae of genus Zantedeschia. It is currently one of the most popular flower plants in the world due to its beautiful flower spathe and long postharvest life. However, little genomic information and few molecular markers are available for its genetic improvement. Here, de novo transcriptome sequencing was performed to produce large transcript sequences for Z. rehmannii cv. ‘Rehmannii’ using an Illumina HiSeq 2000 instrument. More than 59.9 million cDNA sequence reads were obtained and assembled into 39,298 unigenes with an average length of 1,038 bp. Among these, 21,077 unigenes showed significant similarity to protein sequences in the non-redundant protein database (Nr) and in the Swiss-Prot, Gene Ontology (GO), Cluster of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Moreover, a total of 117 unique transcripts were then defined that might regulate the flower spathe development of colored calla lily. Additionally, 9,933 simple sequence repeats (SSRs) and 7,162 single nucleotide polymorphisms (SNPs) were identified as putative molecular markers. High-quality primers for 200 SSR loci were designed and selected, of which 58 amplified reproducible amplicons were polymorphic among 21 accessions of colored calla lily. The sequence information and molecular markers in the present study will provide valuable resources for genetic diversity analysis, germplasm characterization and marker-assisted selection in the genus Zantedeschia. PMID:27635342

  3. Transcriptome analysis of colored calla lily (Zantedeschia rehmannii Engl.) by Illumina sequencing: de novo assembly, annotation and EST-SSR marker development.

    PubMed

    Wei, Zunzheng; Sun, Zhenzhen; Cui, Binbin; Zhang, Qixiang; Xiong, Min; Wang, Xian; Zhou, Di

    2016-01-01

    Colored calla lily is the short name for the species or hybrids in section Aestivae of genus Zantedeschia. It is currently one of the most popular flower plants in the world due to its beautiful flower spathe and long postharvest life. However, little genomic information and few molecular markers are available for its genetic improvement. Here, de novo transcriptome sequencing was performed to produce large transcript sequences for Z. rehmannii cv. 'Rehmannii' using an Illumina HiSeq 2000 instrument. More than 59.9 million cDNA sequence reads were obtained and assembled into 39,298 unigenes with an average length of 1,038 bp. Among these, 21,077 unigenes showed significant similarity to protein sequences in the non-redundant protein database (Nr) and in the Swiss-Prot, Gene Ontology (GO), Cluster of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Moreover, a total of 117 unique transcripts were then defined that might regulate the flower spathe development of colored calla lily. Additionally, 9,933 simple sequence repeats (SSRs) and 7,162 single nucleotide polymorphisms (SNPs) were identified as putative molecular markers. High-quality primers for 200 SSR loci were designed and selected, of which 58 amplified reproducible amplicons were polymorphic among 21 accessions of colored calla lily. The sequence information and molecular markers in the present study will provide valuable resources for genetic diversity analysis, germplasm characterization and marker-assisted selection in the genus Zantedeschia. PMID:27635342

  6. Identification of novel and useful EST-SSR markers from de novo transcriptome sequence of wheat (Triticum aestivum L.).

    PubMed

    Yang, Z J; Peng, Z S; Yang, H

    2016-01-01

    Simple sequence repeats (SSRs) are highly informative, polymorphic, and co-dominant Mendelian markers that provide an important genomic resource for genetic research. Recently, the use of large-scale transcriptome sequence has become a reliable and efficient approach for the identification and development of new EST-SSR markers. In this study, 8389 potential SSRs with a minimum of five repetitions for all motifs were identified from 121,210 unigenes. Gene ontology analysis indicated that the unigenes containing SSR loci participate in various biological processes of regulation, growth, development, metabolism, and apoptosis in wheat. As in many other plants, trinucleotide repeats were found to be the most abundant repeat units with a frequency of 62.33%. A subset of 300 EST-SSRs was randomly selected for the applicability of EST-SSRs to be evaluated. Of the 300 primer pairs tested, 177 (59%) yielded unambiguous PCR products among five wheat cultivars. Using the Chinese Spring nulli-tetrasomic line, 131 of the 177 EST-SSR primer pairs yielded products and 178 loci were found to be located on all the 21 wheat chromosomes. These findings suggest that the novel EST-SSR markers, as a basis for future genetic linkage and gene tagging analysis, are a valuable tool for genetic mapping, marker assisted selection, and comparative genome analysis. PMID:26909990

  7. Identification of novel and useful EST-SSR markers from de novo transcriptome sequence of wheat (Triticum aestivum L.).

    PubMed

    Yang, Z J; Peng, Z S; Yang, H

    2016-02-19

    Simple sequence repeats (SSRs) are highly informative, polymorphic, and co-dominant Mendelian markers that provide an important genomic resource for genetic research. Recently, the use of large-scale transcriptome sequence has become a reliable and efficient approach for the identification and development of new EST-SSR markers. In this study, 8389 potential SSRs with a minimum of five repetitions for all motifs were identified from 121,210 unigenes. Gene ontology analysis indicated that the unigenes containing SSR loci participate in various biological processes of regulation, growth, development, metabolism, and apoptosis in wheat. As in many other plants, trinucleotide repeats were found to be the most abundant repeat units with a frequency of 62.33%. A subset of 300 EST-SSRs was randomly selected for the applicability of EST-SSRs to be evaluated. Of the 300 primer pairs tested, 177 (59%) yielded unambiguous PCR products among five wheat cultivars. Using the Chinese Spring nulli-tetrasomic line, 131 of the 177 EST-SSR primer pairs yielded products and 178 loci were found to be located on all the 21 wheat chromosomes. These findings suggest that the novel EST-SSR markers, as a basis for future genetic linkage and gene tagging analysis, are a valuable tool for genetic mapping, marker assisted selection, and comparative genome analysis.
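
    The SSR criterion mentioned in this abstract (a minimum of five repetitions for every motif) can be sketched with a regular expression. This is an illustration only and not the tool used in the study. The motif length range of 1 to 6 nucleotides, the function name and the example sequence are assumptions.

```python
import re

# A motif of 1-6 nucleotides (assumed range) followed by at least four
# further copies of itself, i.e. five or more tandem repeats in total.
# The lazy quantifier makes the shortest repeating unit win.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each non-overlapping SSR."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical sequence: a dinucleotide repeat flanked by other bases.
print(find_ssrs("GGC" + "AT" * 5 + "CCG"))
```

    Because the scan runs left to right, a repeat may be reported under a rotated form of its motif when the flanking base happens to extend the pattern; dedicated SSR tools usually normalize motifs to handle this.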

  8. "De-novo" amino acid sequence elucidation of protein G'e by combined "Top-Down" and "Bottom-Up" mass spectrometry

    NASA Astrophysics Data System (ADS)

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F. M.; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L.; Glocker, Michael O.

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G' with great scientific and economic importance. Substantial deviations to the published amino acid sequence (Uniprot Q54181) were found by the presence of 46 additional amino acids at the N-terminus, including a so-called "His-tag" as well as an N-terminal partial α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to determine the expression vector pET-28b from Novagen with the Xho I restriction enzyme cleavage site as the best option that was used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (Kd) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.

  9. "De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry.

    PubMed

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G' with great scientific and economic importance. Substantial deviations to the published amino acid sequence (Uniprot Q54181) were found by the presence of 46 additional amino acids at the N-terminus, including a so-called "His-tag" as well as an N-terminal partial α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to determine the expression vector pET-28b from Novagen with the Xho I restriction enzyme cleavage site as the best option that was used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (Kd) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins. PMID:25560987

  10. Exome sequencing identifies de novo gain of function missense mutation in KCND2 in identical twins with autism and seizures that slows potassium channel inactivation.

    PubMed

    Lee, Hane; Lin, Meng-chin A; Kornblum, Harley I; Papazian, Diane M; Nelson, Stanley F

    2014-07-01

    Numerous studies and case reports show comorbidity of autism and epilepsy, suggesting some common molecular underpinnings of the two phenotypes. However, the relationship between the two, on the molecular level, remains unclear. Here, whole exome sequencing was performed on a family with identical twins affected with autism and severe, intractable seizures. A de novo variant was identified in the KCND2 gene, which encodes the Kv4.2 potassium channel. Kv4.2 is a major pore-forming subunit in somatodendritic subthreshold A-type potassium current (ISA) channels. The de novo mutation p.Val404Met is novel and occurs at a highly conserved residue within the C-terminal end of the transmembrane helix S6 region of the ion permeation pathway. Functional analysis revealed the likely pathogenicity of the variant in that the p.Val404Met mutant construct showed significantly slowed inactivation, either by itself or after equimolar coexpression with the wild-type Kv4.2 channel construct consistent with a dominant effect. Further, the effect of the mutation on closed-state inactivation was evident in the presence of auxiliary subunits that associate with Kv4 subunits to form ISA channels in vivo. Discovery of a functionally relevant novel de novo variant, coupled with physiological evidence that the mutant protein disrupts potassium current inactivation, strongly supports KCND2 as the causal gene for epilepsy in this family. Interaction of KCND2 with other genes implicated in autism and the role of KCND2 in synaptic plasticity provide suggestive evidence of an etiological role in autism. PMID:24501278

  11. An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis

    PubMed Central

    Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang

    2013-01-01

    Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having a similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacterium Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer

  12. De novo assembly and characterization of bark transcriptome using Illumina sequencing and development of EST-SSR markers in rubber tree (Hevea brasiliensis Muell. Arg.)

    PubMed Central

    2012-01-01

    Background: In rubber tree, bark is one of the important agricultural and biological organs. However, the molecular mechanism involved in bark formation and development in rubber tree remains largely unknown, which is at least partially due to the lack of bark transcriptomic and genomic information. Therefore, it is necessary to carry out high-throughput transcriptome sequencing of rubber tree bark to generate enormous transcript sequences for functional characterization and molecular marker development. Results: In this study, more than 30 million sequencing reads were generated using Illumina paired-end sequencing technology. In total, 22,756 unigenes with an average length of 485 bp were obtained with de novo assembly. The similarity search indicated that 16,520 and 12,558 unigenes showed significant similarities to known proteins from the NCBI non-redundant and Swissprot protein databases, respectively. Among these annotated unigenes, 6,867 and 5,559 unigenes were assigned to Gene Ontology (GO) and Clusters of Orthologous Group (COG), respectively. When the 22,756 unigenes were searched against the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database, 12,097 unigenes were assigned to 5 main categories including 123 KEGG pathways. Among the main KEGG categories, metabolism was the biggest category (9,043, 74.75%), suggesting active metabolic processes in rubber tree bark. In addition, a total of 39,257 EST-SSRs were identified from the 22,756 unigenes, and the characteristics of the EST-SSRs were further analyzed in rubber tree. 110 potential marker sites were randomly selected to validate the assembly quality and develop EST-SSR markers. Among 13 Hevea germplasms, the PCR success rate and polymorphism rate of the 110 markers were 96.36% and 55.45%, respectively, in this study. Conclusion: By assembling and analyzing de novo transcriptome sequencing data, we reported the comprehensive functional characterization of rubber tree bark. This research generated a substantial fraction

  13. Evaluation of Methods for de novo Genome assembly from High-throughput Sequencing Reads Reveals Dependencies that Affect the Quality of the Results

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole...

  14. Identification and de novo sequencing of housekeeping genes appropriate for gene expression analyses in farmed maraena whitefish (Coregonus maraena) during crowding stress.

    PubMed

    Altmann, Simone; Rebl, Alexander; Kühn, Carsten; Goldammer, Tom

    2015-04-01

    Maraena whitefish (Coregonus maraena; synonym Coregonus lavaretus f. balticus) is a high-quality food fish in the Southern Baltic Sea belonging to the group of salmonid fishes. Coregonus sp. is successfully kept in aquaculture throughout northern Europe (e.g. in Finland, Germany, Russia) and North America. In this regard, the molecular and immunological characterisation of stress response in maraena whitefish contributes to the development of robust and fast-growing maraena whitefish breeding strains for aquaculture. Thus, in the present study, the potential housekeeping genes beta actin (ACTB), elongation factor 1 alpha (EEF1A1), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), ribosomal protein L9 (RPL9), ribosomal protein L32 (RPL32) and ribosomal protein S20 (RPS20) were de novo sequenced and tested concerning their applicability as reference genes in quantitative real-time PCR (qPCR) in maraena whitefish under different stocking densities. For this purpose, tissue samples of liver, kidney, gills, head kidney, skin, adipose tissue, heart and dorsal fin were investigated. qPCR data were analysed with the NormFinder tool to determine gene expression stability. DNA sequencing exposed transcribed paralogous EEF1A1A and EEF1A1B genes differing in their putative protein structure. NormFinder analysis revealed RPL9 and RPL32 as most stable, GAPDH and ACTB as least stable genes for qPCR analyses, respectively. This is the first study that provides a subset of seven de novo sequenced housekeeping genes usable as reference genes in studies of stress response in maraena whitefish.

  15. Monosaccharide identification as a first step toward de novo carbohydrate sequencing: mass spectrometry strategy for the identification and differentiation of diastereomeric and enantiomeric pentose isomers.

    PubMed

    Nagy, Gabe; Pohl, Nicola L B

    2015-04-21

    De novo carbohydrate sequencing, including monosaccharide identification, largely remains a tremendous analytical challenge. A first step in the complete structural determination of any large polysaccharide is an accurate and robust method for analysis of the constituent monosaccharides. Herein, the first mass spectrometry-based method for the complete identification and absolute configuration determination of all 12 pentose isomers, including the d and l enantiomers for arabinose, lyxose, ribose, xylose, ribulose, and xylulose, is reported. As compared to earlier work to distinguish hexose isomers, the chiral separation of the pentose isomers was significantly more challenging. Specifically, the 12 pentoses are much more structurally similar to one another, with only the axial or equatorial orientation of two hydroxyl groups differentiating among these isomers in their five-membered ring furanose structure and smaller energetic differences between pentose conformations than between hexose conformations. Despite such inherently minimal energetic differences between the 12 pentoses, two unique fixed ligand kinetic method combinations were discovered to achieve chiral discrimination for this set of isomers. This assay can be readily applied to the identification of any isolated pentose monosaccharide using only microgram quantities and a commercial instrument and complements the method to distinguish hexose isomers. A workflow that incorporates this mass spectrometry-based method and thereby could achieve complete de novo identification of all monosaccharide building blocks in an oligo- or polysaccharide is proposed.

  16. De novo sequencing of Astyanax mexicanus surface fish and Pachón cavefish transcriptomes reveals enrichment of mutations in cavefish putative eye genes.

    PubMed

    Hinaux, Hélène; Poulain, Julie; Da Silva, Corinne; Noirot, Céline; Jeffery, William R; Casane, Didier; Rétaux, Sylvie

    2013-01-01

    Astyanax mexicanus, a teleost species with surface-dwelling (surface fish) and cave-adapted (cavefish) morphs, is an important model system in evolutionary developmental biology (evo-devo). Astyanax cavefish differ from surface fish in numerous traits, including the enhancement of non-visual sensory systems, and the loss of eyes and pigmentation. The genetic bases for these differences are not fully understood as genomic and transcriptomic data are lacking. We here present de novo transcriptome sequencing of embryonic and larval stages of a surface fish population and a cavefish population originating from the Pachón cave using the Sanger method. This effort represents the first large-scale sequence and clone resource for the Astyanax research community. The analysis of these sequences shows low levels of polymorphism in cavefish compared to surface fish, confirming previous studies on a small number of genes. A high proportion of the genes mutated in cavefish are known to be expressed in the zebrafish visual system. Such a high number of mutations in cavefish putative eye genes may be explained by relaxed selection for vision during evolution in the absence of light. Based on these sequence differences, we provide a list of 11 genes that are potential candidates for having a role in cavefish visual system degeneration. PMID:23326453

  17. Rational Structure-Based Rescaffolding Approach to De Novo Design of Interleukin 10 (IL-10) Receptor-1 Mimetics

    PubMed Central

    Philipp, Jenny; Künze, Georg; Wodtke, Robert; Löser, Reik; Fahmy, Karim; Pisabarro, M. Teresa

    2016-01-01

    Tackling protein interfaces with small molecules capable of modulating protein-protein interactions remains a challenge in structure-based ligand design. Particularly arduous are cases in which the epitopes involved in molecular recognition have a non-structured and discontinuous nature. Here, the basic strategy of translating continuous binding epitopes into mimetic scaffolds cannot be applied, and other innovative approaches are therefore required. We present a structure-based rational approach involving the use of a regular expression syntax inspired by the well-established PROSITE to define minimal descriptors of geometric and functional constraints signifying relevant functionalities for recognition in protein interfaces of non-continuous and unstructured nature. These descriptors feed a search engine that explores the currently available three-dimensional chemical space of the Protein Data Bank (PDB) in order to identify in a straightforward manner regular architectures containing the desired functionalities, which could be used as templates to guide the rational design of small natural-like scaffolds mimicking the targeted recognition site. The application of this rescaffolding strategy to the discovery of natural scaffolds incorporating a selection of functionalities of interleukin-10 receptor-1 (IL-10R1), which are relevant for its interaction with interleukin-10 (IL-10), has resulted in the de novo design of a new class of potent IL-10 peptidomimetic ligands. PMID:27123592

  18. De Novo sequencing of sunflower genome for SNP discovery using RAD (Restriction site Associated DNA) approach

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Application of Single Nucleotide Polymorphism (SNP) marker technology as a tool in sunflower breeding programs offers enormous potential to improve sunflower genetics, and facilitate faster release of sunflower hybrids to the market place. Through a National Sunflower Association (NSA) funded initia...

  19. De Novo Transcriptome Sequencing of the Octopus vulgaris Hemocytes Using Illumina RNA-Seq Technology: Response to the Infection by the Gastrointestinal Parasite Aggregata octopiana

    PubMed Central

    Castellanos-Martínez, Sheila; Arteta, David; Catarino, Susana; Gestal, Camino

    2014-01-01

    Background: Octopus vulgaris is a highly valuable species of great commercial interest and excellent candidate for aquaculture diversification; however, the octopus’ well-being is impaired by pathogens, of which the gastrointestinal coccidian parasite Aggregata octopiana is one of the most important. The knowledge of the molecular mechanisms of the immune response in cephalopods, especially in octopus is scarce. The transcriptome of the hemocytes of O. vulgaris was de novo sequenced using the high-throughput paired-end Illumina technology to identify genes involved in immune defense and to understand the molecular basis of octopus tolerance/resistance to coccidiosis. Results: A bi-directional mRNA library was constructed from hemocytes of two groups of octopus according to the infection by A. octopiana, sick octopus, suffering coccidiosis, and healthy octopus, and reads were de novo assembled together. The differential expression of transcripts was analysed using the general assembly as a reference for mapping the reads from each condition. After sequencing, a total of 75,571,280 high quality reads were obtained from the sick octopus group and 74,731,646 from the healthy group. The general transcriptome of the O. vulgaris hemocytes was assembled in 254,506 contigs. A total of 48,225 contigs were successfully identified, and 538 transcripts exhibited differential expression between groups of infection. The general transcriptome revealed genes involved in pathways like NF-kB, TLR and Complement. Differential expression of TLR-2, PGRP, C1q and PRDX genes due to infection was validated using RT-qPCR. In sick octopuses, only TLR-2 was up-regulated in hemocytes, but all of them were up-regulated in caecum and gills. Conclusion: The transcriptome reported here de novo establishes the first molecular clues to understand how the octopus immune system works and interacts with a highly pathogenic coccidian. The data provided here will contribute to identification of biomarkers...

  1. De Novo Assembly of Coding Sequences of the Mangrove Palm (Nypa fruticans) Using RNA-Seq and Discovery of Whole-Genome Duplications in the Ancestor of Palms.

    PubMed

    He, Ziwen; Zhang, Zhang; Guo, Wuxia; Zhang, Ying; Zhou, Renchao; Shi, Suhua

    2015-01-01

    Nypa fruticans (Arecaceae) is the only monocot species of true mangroves. This species represents the earliest mangrove fossil recorded. How N. fruticans adapts to the harsh and unstable intertidal zone is an interesting question. However, the 60 gene segments deposited in NCBI are insufficient for solving this question. In this study, we sequenced, assembled and annotated the transcriptome of N. fruticans using next-generation sequencing technology. A total of 19,918,800 clean paired-end reads were de novo assembled into 45,368 unigenes with a N50 length of 1,096 bp. A total of 41.35% unigenes were functionally annotated using Blast2GO. Many genes annotated to "response to stress" and 15 putative positively selected genes were identified. Simple sequence repeats were identified and compared with other palms. The divergence time between N. fruticans and other palms was estimated at 75 million years ago using the genomic data, which is consistent with the fossil record. After calculating the synonymous substitution rate between paralogs, we found that two whole-genome duplication events were shared by N. fruticans and other palms. These duplication events provided a large amount of raw material for the more than 2,000 later speciation events in Arecaceae. This study provides a high quality resource for further functional and evolutionary studies of N. fruticans and palms in general. PMID:26684618
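
    The N50 figure quoted in this record (1,096 bp) is a standard assembly-contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. A minimal sketch of that calculation follows; the function name and the example lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig/unigene lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length


# Made-up lengths for illustration only (total = 32, half = 16).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

    Because the statistic weights long sequences heavily, it is usually read together with the total number of unigenes, as the abstract does.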

  2. Transcriptome Profile of the Asian Giant Hornet (Vespa mandarinia) Using Illumina HiSeq 4000 Sequencing: De Novo Assembly, Functional Annotation, and Discovery of SSR Markers.

    PubMed

    Patnaik, Bharat Bhusan; Park, So Young; Kang, Se Won; Hwang, Hee-Ju; Wang, Tae Hun; Park, Eun Bi; Chung, Jong Min; Song, Dae Kwon; Kim, Changmu; Kim, Soonok; Lee, Jae Bong; Jeong, Heon Cheon; Park, Hong Seog; Han, Yeon Soo; Lee, Yong Seok

    2016-01-01

    Vespa mandarinia found in the forests of East Asia, including Korea, occupies the highest rank in the arthropod food web within its geographical range. It serves as a source of nutrition in the form of Vespa amino acid mixture and is listed as a threatened species, although no conservation measures have been implemented. Here, we performed de novo assembly of the V. mandarinia transcriptome by Illumina HiSeq 4000 sequencing. Over 60 million raw reads and 59,184,811 clean reads were obtained. After assembly, a total of 66,837 unigenes were clustered, 40,887, 44,455, and 22,390 of which showed homologous matches against the PANM, Unigene, and KOG databases, respectively. A total of 15,675 unigenes were assigned to Gene Ontology terms, and 5,132 unigenes were mapped to 115 KEGG pathways. The zinc finger domain (C2H2-like), serine/threonine/dual specificity protein kinase domain, and RNA recognition motif domain were among the top InterProScan domains predicted for V. mandarinia sequences. Among the unigenes, we identified 534,922 cDNA simple sequence repeats as potential markers. This is the first transcriptomic analysis of the wasp V. mandarinia using Illumina HiSeq 4000. The obtained datasets should promote the search for new genes to understand the physiological attributes of this wasp. PMID:26881195
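
    The abstract does not say how the simple sequence repeats (SSRs) were detected. As a rough illustration of the general idea only, a tandem repeat can be located with a regular-expression backreference. The function name, the 2-6 bp motif range and the minimum of four repeats below are all assumptions chosen for this sketch, not parameters reported by the authors.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for each tandem repeat found in seq."""
    # Group 1 is the candidate motif; \1{n,} requires n further tandem copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA").
        if any(
            unit == unit[:k] * (len(unit) // k)
            for k in range(1, len(unit))
            if len(unit) % k == 0
        ):
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence, not real data: five tandem copies of "CA" starting at index 2.
print(find_ssrs("GGCACACACACATT"))  # prints [(2, 'CA', 5)]
```

    Dedicated SSR-mining tools apply motif-specific repeat thresholds and handle compound repeats, which this sketch does not attempt.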

  3. The genetic landscape of paediatric de novo acute myeloid leukaemia as defined by single nucleotide polymorphism array and exon sequencing of 100 candidate genes.

    PubMed

    Olsson, Linda; Zettermark, Sofia; Biloglav, Andrea; Castor, Anders; Behrendtz, Mikael; Forestier, Erik; Paulsson, Kajsa; Johansson, Bertil

    2016-07-01

    Cytogenetic analyses of a consecutive series of 67 paediatric (median age 8 years; range 0-17) de novo acute myeloid leukaemia (AML) patients revealed aberrations in 55 (82%) cases. The most common subgroups were KMT2A rearrangement (29%), normal karyotype (15%), RUNX1-RUNX1T1 (10%), deletions of 5q, 7q and/or 17p (9%), myeloid leukaemia associated with Down syndrome (7%), PML-RARA (7%) and CBFB-MYH11 (5%). Single nucleotide polymorphism array (SNP-A) analysis and exon sequencing of 100 genes, performed in 52 and 40 cases, respectively (39 overlapping), revealed ≥1 aberration in 89%; when adding cytogenetic data, this frequency increased to 98%. Uniparental isodisomies (UPIDs) were detected in 13% and copy number aberrations (CNAs) in 63% (median 2/case); three UPIDs and 22 CNAs were recurrent. Twenty-two genes were targeted by focal CNAs, including AEBP2 and PHF6 deletions and genes involved in AML-associated gene fusions. Deep sequencing identified mutations in 65% of cases (median 1/case). In total, 60 mutations were found in 30 genes, primarily those encoding signalling proteins (47%), transcription factors (25%), or epigenetic modifiers (13%). Twelve genes (BCOR, CEBPA, FLT3, GATA1, KIT, KRAS, NOTCH1, NPM1, NRAS, PTPN11, SMC3 and TP53) were recurrently mutated. We conclude that SNP-A and deep sequencing analyses complement the cytogenetic diagnosis of paediatric AML. PMID:27022003

  4. OVNIp: an open source application facilitating the interpretation, the validation and the edition of proteomics data generated by MS analyses and de novo sequencing.

    PubMed

    Tessier, Dominique; Yclon, Pascal; Jacquemin, Ingrid; Larré, Colette; Rogniaux, Hélène

    2010-05-01

    Several academic software packages are available to help with the validation and reporting of proteomics data generated by MS analyses. However, to our knowledge, none of them has been conceived to meet the particular needs generated by the study of organisms whose genomes are not sequenced. In that context, we have developed OVNIp, an open-source application which facilitates the whole process of proteomics results interpretation. One of its unique attributes is its capacity to compile multiple results (from several search engines and/or several databank searches) with a resolution of conflicting interpretations. Moreover, OVNIp enables automated exploitation of de novo sequences generated from unassigned MS/MS spectra, leading to higher sequence coverage and enhancing confidence in the identified proteins. The exploitation of these additional spectra might also identify novel proteins through an MS-BLAST search, which can be easily run from the OVNIp interface. Beyond this primary scope, OVNIp can also benefit users who look for a simple standalone application to both visualize and confirm MS/MS result interpretations through a simple graphical interface and generate reports according to user-defined forms which may integrate the prerequisites for publication. Sources, documentation and a stable release for Windows are available at http://wwwappli.nantes.inra.fr:8180/OVNIp.

  5. De Novo Assembly of Coding Sequences of the Mangrove Palm (Nypa fruticans) Using RNA-Seq and Discovery of Whole-Genome Duplications in the Ancestor of Palms

    PubMed Central

    Guo, Wuxia; Zhang, Ying; Zhou, Renchao; Shi, Suhua

    2015-01-01

    Nypa fruticans (Arecaceae) is the only monocot species of true mangroves. This species represents the earliest mangrove fossil recorded. How N. fruticans adapts to the harsh and unstable intertidal zone is an interesting question. However, the 60 gene segments deposited in NCBI are insufficient for solving this question. In this study, we sequenced, assembled and annotated the transcriptome of N. fruticans using next-generation sequencing technology. A total of 19,918,800 clean paired-end reads were de novo assembled into 45,368 unigenes with a N50 length of 1,096 bp. A total of 41.35% unigenes were functionally annotated using Blast2GO. Many genes annotated to “response to stress” and 15 putative positively selected genes were identified. Simple sequence repeats were identified and compared with other palms. The divergence time between N. fruticans and other palms was estimated at 75 million years ago using the genomic data, which is consistent with the fossil record. After calculating the synonymous substitution rate between paralogs, we found that two whole-genome duplication events were shared by N. fruticans and other palms. These duplication events provided a large amount of raw material for the more than 2,000 later speciation events in Arecaceae. This study provides a high quality resource for further functional and evolutionary studies of N. fruticans and palms in general. PMID:26684618

  6. Transcriptome Profile of the Asian Giant Hornet (Vespa mandarinia) Using Illumina HiSeq 4000 Sequencing: De Novo Assembly, Functional Annotation, and Discovery of SSR Markers

    PubMed Central

    Park, So Young; Kang, Se Won; Hwang, Hee-Ju; Wang, Tae Hun; Park, Eun Bi; Chung, Jong Min; Song, Dae Kwon; Kim, Changmu; Kim, Soonok; Lee, Jae Bong; Jeong, Heon Cheon; Park, Hong Seog; Han, Yeon Soo; Lee, Yong Seok

    2016-01-01

    Vespa mandarinia found in the forests of East Asia, including Korea, occupies the highest rank in the arthropod food web within its geographical range. It serves as a source of nutrition in the form of Vespa amino acid mixture and is listed as a threatened species, although no conservation measures have been implemented. Here, we performed de novo assembly of the V. mandarinia transcriptome by Illumina HiSeq 4000 sequencing. Over 60 million raw reads and 59,184,811 clean reads were obtained. After assembly, a total of 66,837 unigenes were clustered, 40,887, 44,455, and 22,390 of which showed homologous matches against the PANM, Unigene, and KOG databases, respectively. A total of 15,675 unigenes were assigned to Gene Ontology terms, and 5,132 unigenes were mapped to 115 KEGG pathways. The zinc finger domain (C2H2-like), serine/threonine/dual specificity protein kinase domain, and RNA recognition motif domain were among the top InterProScan domains predicted for V. mandarinia sequences. Among the unigenes, we identified 534,922 cDNA simple sequence repeats as potential markers. This is the first transcriptomic analysis of the wasp V. mandarinia using Illumina HiSeq 4000. The obtained datasets should promote the search for new genes to understand the physiological attributes of this wasp. PMID:26881195

  8. De Novo Sequencing and Transcriptome Analysis of Pleurotus eryngii subsp. tuoliensis (Bailinggu) Mycelia in Response to Cold Stimulation.

    PubMed

    Fu, Yong-Ping; Liang, Yuan; Dai, Yue-Ting; Yang, Chen-Tao; Duan, Ming-Zheng; Zhang, Zhuo; Hu, Song-Nian; Zhang, Zhi-Wu; Li, Yu

    2016-05-17

    Cold stimulation of Bailinggu's mycelia is the main factor that triggers primordia initiation for successful production of fruiting bodies under commercial cultivation. Yet, the molecular-level mechanisms involved in mycelia response to cold stimulation are still unclear. Here, we performed comparative transcriptomic analysis using RNA-Seq technology to better understand the gene expression regulation during different temporal stages of cold stimulation in Bailinggu. A total of 21,558 Bailinggu mycelia unigenes were de novo assembled and annotated from four libraries (control at 25 °C, plus cold stimulation treatments at -3 °C for a duration of 1-2 days, 5-6 days, and 9-10 days). GO and KEGG pathway analysis indicated that functional groups of differentially expressed unigenes associated with cell wall and membrane stabilization, calcium signaling and mitogen-activated protein kinases (MAPK) pathways, and soluble sugars and protein biosynthesis and metabolism pathways play a vital role in Bailinggu's response to cold stimulation. Six hundred and seven potential EST-based SSR loci were identified in these unigenes, and 100 EST-SSR primers were randomly selected for validation. The overall polymorphism rate was 92% using 10 wild strains of Bailinggu. Therefore, these results can serve as a valuable resource for a better understanding of the molecular mechanisms associated with Bailinggu's response to cold stimulation.

  9. De novo sequencing of root transcriptome reveals complex cadmium-responsive regulatory networks in radish (Raphanus sativus L.).

    PubMed

    Xu, Liang; Wang, Yan; Liu, Wei; Wang, Jin; Zhu, Xianwen; Zhang, Keyun; Yu, Rugang; Wang, Ronghua; Xie, Yang; Zhang, Wei; Gong, Yiqin; Liu, Liwang

    2015-07-01

    Cadmium (Cd) is a nonessential metallic trace element that poses potential chronic toxicity to living organisms. To date, little is known about the Cd-responsive regulatory network in root vegetable crops including radish. In this study, 31,015 unigenes representing 66,552 assembled unique transcripts were isolated from radish root under Cd stress based on de novo transcriptome assembly. In all, 1496 differentially expressed genes (DEGs) consisting of 3579 transcripts were identified from Cd-free (CK) and Cd-treated (Cd200) libraries. Gene Ontology and pathway enrichment analysis indicated that the up- and down-regulated DEGs were predominantly involved in glucosinolate biosynthesis as well as cysteine and methionine-related pathways, respectively. RT-qPCR showed that the expression profiles of DEGs were consistent with results from RNA-Seq analysis. Several candidate genes encoding phytochelatin synthase (PCS), metallothioneins (MTs), glutathione (GSH), zinc iron permease (ZIPs) and ABC transporter were responsible for Cd uptake, accumulation, translocation and detoxification in radish. A schematic model of the DEGs and microRNAs involved in the Cd-responsive regulatory network was proposed. This study represents a first comprehensive transcriptome-based characterization of Cd-responsive DEGs in radish. These results could provide fundamental insight into complex Cd-responsive regulatory networks and facilitate further genetic manipulation of Cd accumulation in root vegetable crops. PMID:26025544

  11. De novo sequencing and transcriptome analysis of a low temperature tolerant Saccharum spontaneum clone IND 00-1037.

    PubMed

    Dharshini, S; Chakravarthi, M; J, Ashwin Narayan; Manoj, V M; Naveenarani, M; Kumar, Ravinder; Meena, Minturam; Ram, Bakshi; Appunu, C

    2016-08-10

    Saccharum spontaneum L., a wild relative of sugarcane, is known for its adaptability to environmental stresses, particularly cold stress. In the present study, an attempt was made for transcriptome profiling of the low temperature (10 °C) tolerant S. spontaneum clone IND 00-1037 collected from high altitude regions of Arunachal Pradesh, North Eastern India. The Illumina NextSeq 500 platform yielded a total of 47.63 and 48.18 million reads corresponding to 4.7 and 4.8 gigabase pairs (Gb) of processed reads for control and cold stressed (10 °C for 24 h) samples, respectively. These reads were de novo assembled into 214,611 unigenes with an average length of 801 bp. Further, all unigenes were aligned to GO, KEGG and COG databases in order to identify novel genes and pathways responsive to low temperature conditions. The differential gene expression analysis revealed that about 2583 genes were upregulated and 3302 genes were downregulated during the stress. This is perhaps the comprehensive transcriptome data of a low temperature tolerant clone of S. spontaneum. This study would aid in identifying novel genes and also in future genomic studies pertaining to sugarcane and its wild relatives. PMID:27269250

  12. Sequencing, De Novo Assembly, and Annotation of the Transcriptome of the Endangered Freshwater Pearl Bivalve, Cristaria plicata, Provides Novel Insights into Functional Genes and Marker Discovery

    PubMed Central

    Kang, Se Won; Hwang, Hee-Ju; Park, So Young; Park, Eun Bi; Chung, Jong Min; Song, Dae Kwon; Kim, Changmu; Kim, Soonok; Lee, Jun Sang; Han, Yeon Soo; Park, Hong Seog; Lee, Yong Seok

    2016-01-01

    Background: The freshwater mussel Cristaria plicata (Bivalvia: Eulamellibranchia: Unionidae) is an economically important species in molluscan aquaculture due to its use in pearl farming. The species has been listed as endangered in South Korea due to the loss of natural habitats caused by anthropogenic activities. The decreasing population and a lack of genomic information on the species is concerning for environmentalists and conservationists. In this study, we conducted a de novo transcriptome sequencing and annotation analysis of C. plicata using Illumina HiSeq 2500 next-generation sequencing (NGS) technology, the Trinity assembler, and bioinformatics databases to prepare a sustainable resource for the identification of candidate genes involved in immunity, defense, and reproduction. Results: The C. plicata transcriptome analysis included a total of 286,152,584 raw reads and 281,322,837 clean reads. The de novo assembly identified a total of 453,931 contigs and 374,794 non-redundant unigenes with average lengths of 731.2 and 737.1 bp, respectively. Furthermore, 100% coverage of C. plicata mitochondrial genes within two unigenes supported the quality of the assembler. In total, 84,274 unigenes showed homology to entries in at least one database, and 23,246 unigenes were allocated to one or more Gene Ontology (GO) terms. The most prominent GO biological process, cellular component, and molecular function categories (level 2) were cellular process, membrane, and binding, respectively. A total of 4,776 unigenes were mapped to 123 biological pathways in the KEGG database. Based on the GO terms and KEGG annotation, the unigenes were suggested to be involved in immunity, stress responses, sex-determination, and reproduction. A total of 17,251 cDNA simple sequence repeats (cSSRs) were identified from 61,141 unigenes (size of >1 kb) with the most abundant being dinucleotide repeats. Conclusions: This dataset represents the first transcriptome analysis of the endangered...

  13. Deciphering the human microbiome using next-generation sequencing data and bioinformatics approaches.

    PubMed

    Kim, Yihwan; Koh, InSong; Rho, Mina

    2015-06-01

    The human microbiome is one of the key factors affecting the host immune system and metabolic functions that are not encoded in the human genome. Culture-independent analysis of the human microbiome using a metagenomics approach allows us to investigate the composition and functions of the human microbiome. Computational methods analyze the microbial community by using specific marker genes or by using shotgun sequencing of the entire microbial community. Taxonomy profiling is conducted by using reference sequences or by de novo clustering of a specific region of the sequences. Functional profiling, which is mainly based on sequence similarity, is more challenging since about half of the ORFs predicted in metagenomic data show no homology with known protein families. This review examines computational methods that are valuable for the analysis of the human microbiome, and highlights the results of several large-scale human microbiome studies. It is becoming increasingly evident that dysbiosis of the gut microbiome is strongly associated with the development of immune disorder and metabolic dysfunction.

  14. High-Throughput Sequencing and De Novo Assembly of Brassica oleracea var. Capitata L. for Transcriptome Analysis

    PubMed Central

    Kim, Sangmi; Choe, Jun Kyoung; Jo, Sung-Hwan; Baek, Namkwon; Kwon, Suk-Yoon

    2014-01-01

    Background The cabbage, Brassica oleracea var. capitata L., has a distinguishable phenotype within the genus Brassica. Despite the economic and genetic importance of cabbage, there is little genomic data for cabbage, and most studies of Brassica are focused on other species or other B. oleracea subspecies. The lack of genomic data for cabbage, a non-model organism, hinders research on its molecular biology. Hence, the construction of reliable transcriptomic data based on high-throughput sequencing technologies is needed to enhance our understanding of cabbage and provide genomic information for future work. Methodology/Principal Findings We constructed cDNAs from total RNA isolated from the roots, leaves, flowers, seedlings, and calcium-limited seedling tissues of two cabbage genotypes: 102043 and 107140. We sequenced a total of six different samples using the Illumina HiSeq platform, producing 40.5 Gbp of sequence data comprising 401,454,986 short reads. We assembled 205,046 transcripts (≥ 200 bp) using the Velvet and Oases assembler and predicted 53,562 loci from the transcripts. We annotated 35,274 of the loci with 55,916 plant peptides in the Phytozome database. The average length of the annotated loci was 1,419 bp. We confirmed the reliability of the sequencing assembly using reverse-transcriptase PCR to identify tissue-specific gene candidates among the annotated loci. Conclusion Our study provides valuable transcriptome sequence data for B. oleracea var. capitata L., offering a new resource for studying B. oleracea and closely related species. Our transcriptomic sequences will enhance the quality of gene annotation and functional analysis of the cabbage genome and serve as a material basis for future genomic research on cabbage. The sequencing data from this study can be used to develop molecular markers and to identify the extreme differences among the phenotypes of different species in the genus Brassica. PMID:24682075

  15. Malan syndrome: Sotos-like overgrowth with de novo NFIX sequence variants and deletions in six new patients and a review of the literature

    PubMed Central

    Klaassens, Merel; Morrogh, Deborah; Rosser, Elisabeth M; Jaffer, Fatima; Vreeburg, Maaike; Bok, Levinus A; Segboer, Tim; van Belzen, Martine; Quinlivan, Ros M; Kumar, Ajith; Hurst, Jane A; Scott, Richard H

    2015-01-01

    De novo monoallelic variants in NFIX cause two distinct syndromes. Whole gene deletions, nonsense variants and missense variants affecting the DNA-binding domain have been seen in association with a Sotos-like phenotype that we propose be referred to as Malan syndrome. Frameshift and splice-site variants thought to avoid nonsense-mediated RNA decay have been seen in Marshall–Smith syndrome. We report six additional patients with Malan syndrome and de novo NFIX deletions or sequence variants and review the 20 patients now reported. The phenotype is characterised by moderate postnatal overgrowth and macrocephaly. Median height and head circumference in childhood are 2.0 and 2.3 standard deviations (SD) above the mean, respectively. There is overlap of the facial phenotype with NSD1-positive Sotos syndrome in some cases including a prominent forehead, high anterior hairline, downslanting palpebral fissures and prominent chin. Neonatal feeding difficulties and/or hypotonia have been reported in 30% of patients. Developmental delay/learning disability have been reported in all cases and are typically moderate. Ocular phenotypes are common, including strabismus (65%), nystagmus (25%) and optic disc pallor/hypoplasia (25%). Other recurrent features include pectus excavatum (40%) and scoliosis (25%). Eight reported patients have a deletion also encompassing CACNA1A, haploinsufficiency of which causes episodic ataxia type 2 or familial hemiplegic migraine. One previous case had episodic ataxia and one case we report has had cyclical vomiting responsive to pizotifen. In individuals with this contiguous gene deletion syndrome, awareness of possible later neurological manifestations is important, although their penetrance is not yet clear. PMID:25118028

  16. Capturing chloroplast variation for molecular ecology studies: a simple next generation sequencing approach applied to a rainforest tree

    PubMed Central

    2013-01-01

    Background With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silico chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. Results The chloroplast genome of Toona ciliata (Meliaceae), 159,514 base pairs long, was assembled from shotgun sequencing on the Illumina platform using de novo assembly of contigs. To evaluate its practicality, value and quality, we compared the short read assembly with an assembly completed using 454 data obtained after chloroplast DNA isolation. Sanger sequence verifications indicated that the Illumina dataset outperformed the longer read 454 data. Pooling of several individuals during preparation of the shotgun library enabled detection of informative chloroplast SNP markers. Following validation, we used the identified SNPs for a preliminary phylogeographic study of T. ciliata in Australia and to confirm low diversity across the distribution. Conclusions Our approach provides a simple method for construction of whole chloroplast genomes from shotgun sequencing of whole genomic DNA using short-read data and no available closely related reference genome (e.g. from the same species or genus). The high coverage of Illumina sequence data also renders this method appropriate for multiplexing and SNP discovery and therefore a useful approach for landscape level studies of evolutionary ecology. PMID:23497206

  17. BG7: A New Approach for Bacterial Genome Annotation Designed for Next Generation Sequencing Data

    PubMed Central

    Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Pareja, Eduardo; Tobes, Raquel

    2012-01-01

    BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even in the step of preliminary assembly of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally in any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak, in which BG7 was used to get the first annotations the day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future. PMID:23185310

  18. The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome.

    PubMed

    Sakai, Hiroaki; Naito, Ken; Ogiso-Tanaka, Eri; Takahashi, Yu; Iseki, Kohtaro; Muto, Chiaki; Satou, Kazuhito; Teruya, Kuniko; Shiroma, Akino; Shimoji, Makiko; Hirano, Takashi; Itoh, Takeshi; Kaga, Akito; Tomooka, Norihiko

    2015-11-30

    Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, a large portion of the genomes still remains unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced 100 times longer contigs with a 100 times smaller amount of gaps compared to the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on the high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.

  19. Large scale in-silico identification and characterization of simple sequence repeats (SSRs) from de novo assembled transcriptome of Catharanthus roseus (L.) G. Don.

    PubMed

    Kumar, Santosh; Shah, Niraj; Garg, Vanika; Bhatia, Sabhyata

    2014-06-01

    Transcriptomic data of C. roseus offering ample sequence resources for providing better insights into gene diversity: large resource of genic SSR markers to accelerate genomic studies and breeding in Catharanthus. Next-generation sequencing is an efficient system for generating high-throughput complete transcripts/genes and developing molecular markers. We present here the transcriptome sequencing of a 26-day-old Catharanthus roseus seedling tissue using Illumina GAIIX platform that resulted in a total of 3.37 Gb of nucleotide sequence data comprising 29,964,104 reads which were de novo assembled into 26,581 unigenes. Based on similarity searches 58 % of the unigenes were annotated of which 13,580 unique transcripts were assigned 5016 gene ontology terms. Further, 7,687 of the unigenes were found to have Cluster of Orthologous Group classifications, and 4,006 were assigned to 289 Kyoto Encyclopedia of Genes and Genome pathways. Also, 5,221 (19.64 %) of transcripts were distributed to 81 known transcription factor (TF) families. In-silico analysis of the transcriptome resulted in identification of 11,004 SSRs in 26.62 % transcripts from which 2,520 SSR markers were designed which exhibited a non-random pattern of distribution. The most abundant was the trinucleotide repeats (AAG/CTT) followed by the dinucleotide repeats (AG/CT). Location specific analysis of SSRs revealed that SSRs were preferentially associated with the 5'-UTRs with a predicted role in regulation of gene expression. A PCR validation of a set of 48 primers revealed 97.9 % successful amplification, and 76.6 % of them showed polymorphism across different Catharanthus species as well as accessions of C. roseus. In summary, this study will provide an insight into understanding the seedling development and resources for novel gene discovery and SSR development for utilization in marker-assisted selective breeding in C. roseus.
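
    The in-silico SSR identification summarized above can be illustrated with a small sketch. It is not the tool or the thresholds used in the study, which the abstract does not state; the function name find_ssrs and the example cut-off of six repeats are hypothetical choices made only for illustration. The idea is simply to scan a transcript for a short motif repeated in tandem at least a minimum number of times.

```python
import re

def find_ssrs(seq, motif_len, min_repeats):
    """Return (start, motif, repeat_count) for tandem repeats of a motif.

    Hypothetical helper: motif_len and min_repeats are illustrative
    parameters, not thresholds taken from the abstract.
    """
    # A motif of motif_len bases followed by at least min_repeats - 1 copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if motif_len > 1 and len(set(motif)) == 1:
            continue  # a homopolymer run is not a di-/trinucleotide repeat
        hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits

# Dinucleotide repeats with at least six copies in a toy sequence.
print(find_ssrs("TTAGAGAGAGAGAGCC", motif_len=2, min_repeats=6))
```

    A fuller implementation would also collapse motifs that are themselves repeats of a shorter unit (for example AGAG reported as a tetranucleotide) and treat a motif and its reverse complement (AG/CT) as one class.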

  20. Transcriptome de novo assembly from next-generation sequencing and comparative analyses in the hexaploid salt marsh species Spartina maritima and Spartina alterniflora (Poaceae)

    PubMed Central

    Ferreira de Carvalho, J; Poulain, J; Da Silva, C; Wincker, P; Michon-Coudouel, S; Dheilly, A; Naquin, D; Boutte, J; Salmon, A; Ainouche, M

    2013-01-01

    Spartina species have a critical ecological role in salt marshes and represent an excellent system to investigate recurrent polyploid speciation. Using the 454 GS-FLX pyrosequencer, we assembled and annotated the first reference transcriptome (from roots and leaves) for two related hexaploid Spartina species that hybridize in Western Europe, the East American invasive Spartina alterniflora and the Euro-African S. maritima. The de novo read assembly generated 38 478 consensus sequences and 99% found an annotation using Poaceae databases, representing a total of 16 753 non-redundant genes. Spartina expressed sequence tags were mapped onto the Sorghum bicolor genome, where they were distributed among the subtelomeric arms of the 10 S. bicolor chromosomes, with high gene density correlation. Normalization of the complementary DNA library improved the number of annotated genes. Ecologically relevant genes were identified among GO biological function categories in salt and heavy metal stress response, C4 photosynthesis and in lignin and cellulose metabolism. Expression of some of these genes had been found to be altered by hybridization and genome duplication in a previous microarray-based study in Spartina. As these species are hexaploid, up to three duplicated homoeologs may be expected per locus. When analyzing sequence polymorphism at four different loci in S. maritima and S. alterniflora, we found up to four haplotypes per locus, suggesting the presence of two expressed homoeologous sequences with one or two allelic variants each. This reference transcriptome will allow analysis of specific Spartina genes of ecological or evolutionary interest, estimation of homoeologous gene expression variation using RNA-seq and further gene expression evolution analyses in natural populations. PMID:23149455

  1. De novo sequencing-based transcriptome and digital gene expression analysis reveals insecticide resistance-relevant genes in Propylaea japonica (Thunberg) (Coleoptera: Coccinellidae).

    PubMed

    Tang, Liang-De; Wang, Xing-Min; Jin, Feng-Liang; Qiu, Bao-Li; Wu, Jian-Hui; Ren, Shun-Xiang

    2014-01-01

    The ladybird Propylaea japonica (Thunberg) is one of the most important natural enemies of aphids in China. This species is threatened by the extensive use of insecticides but genomics-based information on the molecular mechanisms underlying insecticide resistance is limited. Hence, we analyzed the transcriptome and expression profile data of P. japonica in order to gain a deeper understanding of insecticide resistance in ladybirds. We performed de novo assembly of a transcriptome using Illumina's Solexa sequencing technology and short reads. A total of 27,243,552 reads were generated. These were assembled into 81,458 contigs and 33,647 unigenes (6,862 clusters and 26,785 singletons). Of the unigenes, 23,965 (71.22%) have putative homologues in the non-redundant (nr) protein database from NCBI, using BLASTX, with a cut-off E-value of 10^-5. We examined COG, GO and KEGG annotations to better understand the functions of these unigenes. Digital gene expression (DGE) libraries showed differences in gene expression profiles between two insecticide resistant strains. When compared with an insecticide susceptible profile, a total of 4,692 genes were significantly up- or down-regulated in a moderately resistant strain. Among these genes, 125 putative insecticide resistance genes were identified. To confirm the DGE results, 16 selected genes were validated using quantitative real time PCR (qRT-PCR). This study is the first to report genetic information on P. japonica and has greatly enriched the sequence data for ladybirds. The large number of gene sequences produced from the transcriptome and DGE sequencing will greatly improve our understanding of this important insect at the molecular level, and could contribute to in-depth research into insecticide resistance mechanisms. PMID:24959827

  2. De novo Sequencing and Transcriptome Analysis of Pinellia ternata Identify the Candidate Genes Involved in the Biosynthesis of Benzoic Acid and Ephedrine

    PubMed Central

    Zhang, Guang-hui; Jiang, Ni-hao; Song, Wan-ling; Ma, Chun-hua; Yang, Sheng-chao; Chen, Jun-wen

    2016-01-01

    Background: The medicinal herb, Pinellia ternata, is purported to be an anti-emetic with analgesic and sedative effects. Alkaloids are the main biologically active compounds in P. ternata, especially ephedrine, a phenylpropylamino alkaloid specifically produced by Ephedra and Catha edulis. However, how ephedrine is synthesized in plants is uncertain. Only the phenylalanine ammonia lyase (PAL) and relevant genes in this pathway have been characterized. Genomic information of P. ternata is also unavailable. Results: We analyzed the transcriptome of the tuber of P. ternata with the Illumina HiSeq™ 2000 sequencing platform. 66,813,052 high-quality reads were generated, and these reads were assembled de novo into 89,068 unigenes. Most known genes involved in benzoic acid biosynthesis were identified in the unigene dataset of P. ternata, and the expression patterns of some ephedrine biosynthesis-related genes were analyzed by reverse transcription quantitative real-time PCR (RT-qPCR). Also, 14,468 simple sequence repeats (SSRs) were identified from 12,000 unigenes. Twenty primer pairs for SSRs were randomly selected for the validation of their amplification effect. Conclusion: RNA-seq data was used for the first time to provide comprehensive gene information on P. ternata at the transcriptional level. These data will advance molecular genetics in this valuable medicinal plant. PMID:27579029

  3. De Novo Genome Sequence of “Candidatus Liberibacter solanacearum” from a Single Potato Psyllid in California

    PubMed Central

    Wu, F.; Liang, G.; Wallis, C.; Trumble, J. T.; Prager, S.

    2015-01-01

    The draft genome sequence of “Candidatus Liberibacter solanacearum” strain RSTM from a potato psyllid (Bactericera cockerelli) in California is reported here. The RSTM strain has a genome size of 1,286,787 bp, a G+C content of 35.1%, 1,211 predicted open reading frames (ORFs), and 43 RNA genes. PMID:26679599

  4. Estimating evolution of temporal sequence changes: a practical approach to inferring ancestral developmental sequences and sequence heterochrony.

    PubMed

    Harrison, Luke B; Larsson, Hans C E

    2008-06-01

    Developmental biology often yields data in a temporal context. Temporal data in phylogenetic systematics has important uses in the field of evolutionary developmental biology and, in general, comparative biology. The evolution of temporal sequences, specifically developmental sequences, has proven difficult to examine due to the highly variable temporal progression of development. Issues concerning the analysis of temporal sequences and problems with current methods of analysis are discussed. We present here an algorithm to infer ancestral temporal sequences, quantify sequence heterochronies, and estimate pseudoreplicate consensus support for sequence changes using Parsimov-based genetic inference [PGi]. Real temporal developmental sequence data sets are used to compare PGi with currently used approaches, and PGi is shown to be the most efficient, accurate, and practical method to examine biological data and infer ancestral states on a phylogeny. The method is also expandable to address further issues in developmental evolution, namely modularity. PMID:18570033

  5. Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework

    PubMed Central

    2015-01-01

    Background Recent progress in next-generation sequencing technology has afforded several improvements such as ultra-high throughput at low cost, very high read quality, and substantially increased sequencing depth. State-of-the-art high-throughput sequencers, such as the Illumina MiSeq system, can generate ~15 Gbp sequencing data per run, with >80% bases above Q30 and a sequencing depth of up to several 1000x for small genomes. Illumina HiSeq 2500 is capable of generating up to 1 Tbp per run, with >80% bases above Q30 and often >100x sequencing depth for large genomes. To speed up otherwise time-consuming genome assembly and/or to obtain a skeleton of the assembly quickly for scaffolding or progressive assembly, methods for noise removal and reduction of redundancy in the original data, with almost equal or better assembly results, are worth studying. Results We developed two subset selection methods for single-end reads and a method for paired-end reads based on base quality scores and other read analytic tools using the MapReduce framework. We proposed two strategies to select reads: MinimalQ and ProductQ. MinimalQ selects reads with minimal base-quality above a threshold. ProductQ selects reads with probability of no incorrect base above a threshold. In the single-end experiments, we used Escherichia coli and Bacillus cereus datasets of MiSeq, Velvet assembler for genome assembly, and GAGE benchmark tools for result evaluation. In the paired-end experiments, we used the giant grouper (Epinephelus lanceolatus) dataset of HiSeq, ALLPATHS-LG genome assembler, and QUAST quality assessment tool for comparing genome assemblies of the original set and the subset. The results show that subset selection not only can speed up the genome assembly but also can produce substantially longer scaffolds. Availability: The software is freely available at https://github.com/moneycat/QReadSelector. PMID:26678408
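
    The two selection rules named in this abstract, MinimalQ and ProductQ, can be sketched in a few lines. This is a minimal single-process illustration, not the authors' MapReduce implementation: the function names, the Phred+33 quality encoding, and the independence of per-base errors are assumptions introduced here. Under the usual Phred convention, a base with quality Q has error probability 10^(-Q/10), so the probability that a read contains no incorrect base is the product of (1 - 10^(-Q/10)) over its bases.

```python
import math

def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    The Phred+33 offset is an assumption; older files may use +64.
    """
    return [ord(ch) - offset for ch in quality_string]

def passes_minimal_q(scores, min_quality):
    """MinimalQ-style rule: the lowest base quality must exceed the threshold."""
    return min(scores) > min_quality

def prob_all_correct(scores):
    """Probability that no base is wrong, assuming independent base errors."""
    return math.prod(1.0 - 10.0 ** (-q / 10.0) for q in scores)

def passes_product_q(scores, min_probability):
    """ProductQ-style rule: the probability of an error-free read must exceed the threshold."""
    return prob_all_correct(scores) > min_probability

# Toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
good = phred_scores("IIII")
bad = phred_scores("II#I")
print(passes_minimal_q(good, 30), passes_minimal_q(bad, 30))
print(passes_product_q(good, 0.99), passes_product_q(bad, 0.99))
```

    The two rules differ in how they treat read length: a minimum-quality filter ignores it, while the product shrinks as more bases are multiplied in, so long reads need uniformly high quality to pass.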

  6. De novo mutations in human genetic disease.

    PubMed

    Veltman, Joris A; Brunner, Han G

    2012-08-01

    New mutations have long been known to cause genetic disease, but their true contribution to the disease burden can only now be determined using family-based whole-genome or whole-exome sequencing approaches. In this Review we discuss recent findings suggesting that de novo mutations play a prominent part in rare and common forms of neurodevelopmental diseases, including intellectual disability, autism and schizophrenia. De novo mutations provide a mechanism by which early-onset reproductively lethal diseases remain frequent in the population. These mutations, although individually rare, may capture a significant part of the heritability for complex genetic diseases that is not detectable by genome-wide association studies. PMID:22805709

  7. De novo transcriptome sequencing and comparative analysis of differentially expressed genes in Gossypium aridum under salt stress.

    PubMed

    Xu, Peng; Liu, Zhangwei; Fan, Xinqi; Gao, Jin; Zhang, Xia; Zhang, Xianggui; Shen, Xinlian

    2013-08-01

    Salinity stress is one of the most serious factors that impede the growth and development of various crops. Wild Gossypium species, which are remarkably tolerant to salt water immersion, are valuable resources for understanding salt tolerance mechanisms of Gossypium and improving salinity resistance in upland cotton. To generate a broad survey of genes with altered expression during various stages of salt stress, a mixed RNA sample was prepared from the roots and leaves of Gossypium aridum plants subjected to salt stress. The transcripts were sequenced using the Illumina sequencing platform. After cleaning and quality checks, approximately 41.5 million clean reads were obtained. These reads were then assembled into 98,989 unigenes with a mean size of 452 bp. All unigenes were compared to known cluster of orthologous groups (COG) sequences to predict and classify the possible functions of these genes, which were classified into at least 25 molecular families. Variations in gene expression were then examined after exposing the plants to 200 mM NaCl for 3, 12, 72 or 144 h. Sequencing depths of approximately six million raw tags were achieved for each of the five stages of salt stress. There were 2634 (1513 up-regulated/1121 down-regulated), 2449 (1586 up-regulated/863 down-regulated), 2271 (946 up-regulated/1325 down-regulated) and 3352 (933 up-regulated/2419 down-regulated) genes that were differentially expressed after exposure to NaCl for 3, 12, 72 and 144 h, respectively. Digital gene expression analysis indicated that pathways involved in "transport", "response to hormone stimulus" and "signaling" play important roles during salt stress, while genes involved in "protein kinase activity" and "transporter activity" undergo major changes in expression during early and later stages of salt stress, respectively.

  8. The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome

    PubMed Central

    Sakai, Hiroaki; Naito, Ken; Ogiso-Tanaka, Eri; Takahashi, Yu; Iseki, Kohtaro; Muto, Chiaki; Satou, Kazuhito; Teruya, Kuniko; Shiroma, Akino; Shimoji, Makiko; Hirano, Takashi; Itoh, Takeshi; Kaga, Akito; Tomooka, Norihiko

    2015-01-01

    Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, a large portion of the genomes still remains unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced 100 times longer contigs with a 100 times smaller amount of gaps compared to the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on the high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome. PMID:26616024

  9. De Novo Genome Assembly of the Economically Important Weed Horseweed Using Integrated Data from Multiple Sequencing Platforms

    PubMed Central

    Peng, Yanhui; Lai, Zhao; Lane, Thomas; Nageswara-Rao, Madhugiri; Okada, Miki; Jasieniuk, Marie; O’Geen, Henriette; Kim, Ryan W.; Sammons, R. Douglas; Rieseberg, Loren H.; Stewart, C. Neal

    2014-01-01

    Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18), with the smallest genome of any known agricultural weed (335 Mb). Thus, it is an appropriate candidate to help us understand the genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of horseweed by combining data from multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) using various libraries with different insertion sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb) of a Tennessee-accessed, glyphosate-resistant horseweed biotype. From 116.3 Gb (approximately 350× coverage) of data, the genome was assembled into 13,966 scaffolds with 50% of the assembly = 33,561 bp. The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome is composed of 44,592 protein-coding genes. Genome resequencing of seven additional horseweed biotypes was performed. These sequence data were assembled and used to analyze genome variation. Simple sequence repeat and single-nucleotide polymorphisms were surveyed. Genomic patterns were detected that associated with glyphosate-resistant or -susceptible biotypes. The draft genome will be useful to better understand weediness and the evolution of herbicide resistance and to devise new management strategies. The genome will also be useful as another reference genome in the Compositae. To our knowledge, this article represents the first published draft genome of an agricultural weed. PMID:25209985
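
    The coverage figure quoted in this abstract follows from simple arithmetic: sequencing depth is the total number of sequenced bases divided by the genome size. The short sketch below reproduces it, assuming that "Gb" and "Mb" denote 10^9 and 10^6 bases; the function name is hypothetical.

```python
def sequencing_depth(total_bases, genome_size):
    """Average depth of coverage: sequenced bases per genome base."""
    return total_bases / genome_size

# 116.3 Gb of data over a 335 Mb genome (decimal units assumed).
depth = sequencing_depth(116.3e9, 335e6)
print(f"{depth:.0f}x")  # prints "347x"
```

    The result, about 347×, agrees with the "approximately 350× coverage" stated in the abstract.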

  10. An optimization approach and its application to compare DNA sequences

    NASA Astrophysics Data System (ADS)

    Liu, Liwei; Li, Chao; Bai, Fenglan; Zhao, Qi; Wang, Ying

    2015-02-01

    Studying the evolutionary relationship between biological sequences by comparing and analyzing gene sequences has become one of the main tasks in bioinformatics research. Many valid methods have been applied to DNA sequence alignment. In this paper, we propose a novel method based on the Lempel-Ziv (LZ) complexity to compare biological sequences. Moreover, we introduce a new distance measure and make use of the corresponding similarity matrix to construct a phylogenetic tree without multiple sequence alignment. Further, we construct phylogenetic trees for 24 species of Eutherian mammals and 48 countries of Hepatitis E virus (HEV) by an optimization approach. The results indicate that this new method improves the efficiency of sequence comparison and successfully constructs phylogenies.
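
    To make the idea of an alignment-free, LZ-based comparison concrete, here is a minimal sketch. The abstract does not define its new distance measure, so the normalized distance below is only one simple form chosen for illustration and should not be read as the authors' formula; the function names are hypothetical. The complexity counts the components of the Lempel-Ziv exhaustive history: each component is the shortest substring that has not already appeared in the text preceding its last character.

```python
def lz_complexity(seq):
    """Number of components in the Lempel-Ziv exhaustive history of seq."""
    i, count, n = 0, 0, len(seq)
    while i < n:
        length = 1
        # Extend the component while it can be copied from the earlier text.
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def lz_distance(s, q):
    """Illustrative normalized distance built from LZ complexities.

    Appending q to s adds few new components when q resembles s,
    so similar sequences receive a small distance.
    """
    if not s or not q:
        raise ValueError("sequences must be non-empty")
    c_s, c_q = lz_complexity(s), lz_complexity(q)
    extra = max(lz_complexity(s + q) - c_s, lz_complexity(q + s) - c_q)
    return extra / max(c_s, c_q)

print(lz_complexity("0001101001000101"))  # components 0|001|10|100|1000|101
print(lz_distance("ACGT", "ACGT"))
```

    A matrix of such pairwise distances can then be passed to any distance-based tree-building method, which is what allows a phylogeny to be constructed without multiple sequence alignment.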

  11. De novo Taproot Transcriptome Sequencing and Analysis of Major Genes Involved in Sucrose Metabolism in Radish (Raphanus sativus L.)

    PubMed Central

    Yu, Rugang; Xu, Liang; Zhang, Wei; Wang, Yan; Luo, Xiaobo; Wang, Ronghua; Zhu, Xianwen; Xie, Yang; Karanja, Benard; Liu, Liwang

    2016-01-01

    Radish (Raphanus sativus L.) is an important annual or biennial root vegetable crop. The fleshy taproot comprises the main edible portion of the plant with high nutrition and medical value. Molecular biology study of radish began rather late, and public databases lack sufficient transcriptomic and genomic data for understanding the molecular mechanism of radish taproot formation. To develop a comprehensive overview of the ‘NAU-YH’ root transcriptome, a cDNA library, prepared from three equally mixed RNA of taproots at different developmental stages including pre-cortex splitting stage, cortex splitting stage, and expanding stage was sequenced using high-throughput Illumina RNA sequencing. From approximately 51 million clean reads, a total of 70,168 unigenes with a total length of 50.28 Mb, an average length of 717 bp and an N50 of 994 bp were obtained. In total, 63,991 unigenes (about 91.20% of the assembled unigenes) were successfully annotated in five public databases including NR, GO, COG, KEGG, and Nt. GO analysis revealed that the majority of these unigenes were predominately involved in basic physiological and metabolic processes, catalytic, binding, and cellular process. In addition, a total of 103 unigenes encoding eight enzymes involved in the sucrose metabolism related pathways were also identified by KEGG pathway analysis. Sucrose synthase (29 unigenes), invertase (17 unigenes), sucrose-phosphate synthase (16 unigenes), fructokinase (17 unigenes), and hexokinase (11 unigenes) ranked as the top five among these eight key enzymes. Of these, two genes (RsSuSy1, RsSPS1) were validated by T-A cloning and sequencing, while the expression of six unigenes was profiled with RT-qPCR analysis. These results will serve as an important public reference platform to identify the related key genes during taproot thickening and facilitate the dissection of molecular mechanisms underlying taproot formation in radish. PMID:27242808
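
    The assembly statistics reported in this abstract (average length and N50) are straightforward to compute from a list of unigene lengths. The sketch below uses hypothetical function names and toy lengths, since the actual unigene lengths are not given here. N50 is the length L such that sequences of length L or longer together contain at least half of the total assembled bases.

```python
def n50(lengths):
    """Length at which the cumulative sum, longest first, reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")

def mean_length(total_bases, count):
    """Average sequence length from the total bases and the number of sequences."""
    return total_bases / count

# Toy example: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8

# Consistency check on the reported figures, assuming 1 Mb = 10^6 bases.
print(round(mean_length(50.28e6, 70168)))  # prints 717
```

    The second line reproduces the reported average of 717 bp from the total length of 50.28 Mb and the 70,168 unigenes. An N50 (994 bp) above the mean is typical, because N50 weights long sequences more heavily.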

  12. De novo Taproot Transcriptome Sequencing and Analysis of Major Genes Involved in Sucrose Metabolism in Radish (Raphanus sativus L.).

    PubMed

    Yu, Rugang; Xu, Liang; Zhang, Wei; Wang, Yan; Luo, Xiaobo; Wang, Ronghua; Zhu, Xianwen; Xie, Yang; Karanja, Benard; Liu, Liwang

    2016-01-01

    Radish (Raphanus sativus L.) is an important annual or biennial root vegetable crop. The fleshy taproot is the main edible portion of the plant and has high nutritional and medicinal value. Molecular biology studies of radish began rather late, and public databases lack sufficient transcriptomic and genomic data for understanding the molecular mechanisms of radish taproot formation. To develop a comprehensive overview of the 'NAU-YH' root transcriptome, a cDNA library prepared from an equal mixture of RNA from taproots at three developmental stages (pre-cortex splitting, cortex splitting, and expanding) was sequenced using high-throughput Illumina RNA sequencing. From approximately 51 million clean reads, a total of 70,168 unigenes with a total length of 50.28 Mb, an average length of 717 bp, and an N50 of 994 bp were obtained. In total, 63,991 unigenes (about 91.20% of the assembled unigenes) were successfully annotated in five public databases: NR, GO, COG, KEGG, and Nt. GO analysis revealed that the majority of these unigenes were predominantly involved in basic physiological and metabolic processes, catalytic activity, binding, and cellular processes. In addition, a total of 103 unigenes encoding eight enzymes involved in sucrose metabolism-related pathways were identified by KEGG pathway analysis. Sucrose synthase (29 unigenes), invertase (17 unigenes), sucrose-phosphate synthase (16 unigenes), fructokinase (17 unigenes), and hexokinase (11 unigenes) were the five most represented of these eight key enzymes. Of these, two genes (RsSuSy1, RsSPS1) were validated by T-A cloning and sequencing, while the expression of six unigenes was profiled by RT-qPCR analysis. These results serve as an important public reference platform for identifying key genes involved in taproot thickening and will facilitate the dissection of the molecular mechanisms underlying taproot formation in radish.

  13. De novo Taproot Transcriptome Sequencing and Analysis of Major Genes Involved in Sucrose Metabolism in Radish (Raphanus sativus L.).

    PubMed

    Yu, Rugang; Xu, Liang; Zhang, Wei; Wang, Yan; Luo, Xiaobo; Wang, Ronghua; Zhu, Xianwen; Xie, Yang; Karanja, Benard; Liu, Liwang

    2016-01-01

    Radish (Raphanus sativus L.) is an important annual or biennial root vegetable crop. The fleshy taproot is the main edible portion of the plant and has high nutritional and medicinal value. Molecular biology studies of radish began rather late, and public databases lack sufficient transcriptomic and genomic data for understanding the molecular mechanisms of radish taproot formation. To develop a comprehensive overview of the 'NAU-YH' root transcriptome, a cDNA library prepared from an equal mixture of RNA from taproots at three developmental stages (pre-cortex splitting, cortex splitting, and expanding) was sequenced using high-throughput Illumina RNA sequencing. From approximately 51 million clean reads, a total of 70,168 unigenes with a total length of 50.28 Mb, an average length of 717 bp, and an N50 of 994 bp were obtained. In total, 63,991 unigenes (about 91.20% of the assembled unigenes) were successfully annotated in five public databases: NR, GO, COG, KEGG, and Nt. GO analysis revealed that the majority of these unigenes were predominantly involved in basic physiological and metabolic processes, catalytic activity, binding, and cellular processes. In addition, a total of 103 unigenes encoding eight enzymes involved in sucrose metabolism-related pathways were identified by KEGG pathway analysis. Sucrose synthase (29 unigenes), invertase (17 unigenes), sucrose-phosphate synthase (16 unigenes), fructokinase (17 unigenes), and hexokinase (11 unigenes) were the five most represented of these eight key enzymes. Of these, two genes (RsSuSy1, RsSPS1) were validated by T-A cloning and sequencing, while the expression of six unigenes was profiled by RT-qPCR analysis. These results serve as an important public reference platform for identifying key genes involved in taproot thickening and will facilitate the dissection of the molecular mechanisms underlying taproot formation in radish. PMID:27242808
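
    The record above reports an N50 of 994 bp alongside the average unigene length. As a minimal sketch of how that statistic is commonly defined, the Python below sorts sequence lengths from longest to shortest and returns the length at which the running total first reaches half of the assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is taken here as the length L such that sequences of
    length >= L together cover at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2             # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in bp), not values from the record.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 500 + 400 = 900 >= 750, so the N50 is 400
```

    Because long sequences contribute more bases, N50 is usually larger than the mean length, which is consistent with the record's 994 bp N50 against a 717 bp average.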

  14. Specific versus non-specific immune responses in an invertebrate species evidenced by a comparative de novo sequencing study.

    PubMed

    Deleury, Emeline; Dubreuil, Géraldine; Elangovan, Namasivayam; Wajnberg, Eric; Reichhart, Jean-Marc; Gourbal, Benjamin; Duval, David; Baron, Olga Lucia; Gouzy, Jérôme; Coustau, Christine

    2012-01-01

    Our present understanding of the functioning and evolutionary history of invertebrate innate immunity derives mostly from studies on a few model species belonging to Ecdysozoa. In particular, the characterization of signaling pathways dedicated to specific responses towards fungi and Gram-positive or Gram-negative bacteria in Drosophila melanogaster challenged our original view of a non-specific immunity in invertebrates. However, much remains to be elucidated from lophotrochozoan species. To investigate the global specificity of the immune response in the freshwater snail Biomphalaria glabrata, we used massive Illumina sequencing of 5'-end cDNAs to compare expression profiles after challenge by Gram-positive or Gram-negative bacteria or after a yeast challenge. 5'-end cDNA sequencing of the libraries yielded over 12 million high-quality reads. To link these short reads to expressed genes, we prepared a reference transcriptomic database through automatic assembly and annotation of the 758,510 redundant sequences (ESTs, mRNAs) of B. glabrata available in public databases. Computational analysis of Illumina reads followed by multivariate analyses allowed identification of 1,685 candidate transcripts differentially expressed after an immune challenge, with a twofold ratio between transcripts showing a challenge-specific expression versus a lower or non-specific differential expression. Differential expression was validated using quantitative PCR for a subset of randomly selected candidates. Predicted functions of annotated candidates (approx. 700 unisequences) belonged to a large extent to similar functional categories or protein types. This work significantly expands upon previous gene discovery and expression studies on B. glabrata and suggests that responses to various pathogens may involve similar immune processes or signaling pathways but different genes belonging to multigenic families. These results raise the question of the importance of gene

  15. De Novo Assembly and Characterization of the Transcriptome of Seagrass Zostera marina Using Illumina Paired-End Sequencing

    PubMed Central

    Kong, Fanna; Li, Hong; Sun, Peipei; Zhou, Yang; Mao, Yunxiang

    2014-01-01

    Background: The seagrass Zostera marina is a monocotyledonous angiosperm belonging to a polyphyletic group of plants that can live submerged in marine habitats. Zostera marina L. is one of the most common seagrasses and is considered a cornerstone of marine plant molecular ecology research and comparative studies. However, the mechanisms underlying its adaptation to the marine environment remain poorly understood due to limited transcriptomic and genomic data. Principal Findings: Here we explored the transcriptome of Z. marina leaves under different environmental conditions using Illumina paired-end sequencing. Approximately 55 million sequencing reads were obtained, representing 58,457 transcripts that correspond to 24,216 unigenes. A total of 14,389 (59.41%) unigenes were annotated by BLAST searches against the NCBI non-redundant protein database. 45.18% and 46.91% of the unigenes had significant similarity with proteins in the Swiss-Prot database and Pfam database, respectively. Among these, 13,897 unigenes were assigned to 57 Gene Ontology (GO) terms, and 4,745 unigenes were identified and mapped to 233 pathways via functional annotation against the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. We compared the orthologous gene families of the Z. marina transcriptome with those of Oryza sativa and Pyropia yezoensis and found that 11,667 orthologous gene families are specific to Z. marina. Furthermore, we identified the photoreceptors sensing red/far-red light and blue light. We also identified a large number of genes encoding ion transporters and channels, including those involved in Na+ efflux, K+ uptake, Cl− channels, and H+ pumping. Conclusions: Our study contains an extensive sequencing and gene-annotation analysis of Z. marina. This information represents a genetic resource for the discovery of genes related to light sensing and salt tolerance in this species. Our transcriptome can be further utilized in future studies on molecular adaptation to abiotic stress in

  16. Solving the Water Jugs Problem by an Integer Sequence Approach

    ERIC Educational Resources Information Center

    Man, Yiu-Kwong

    2012-01-01

    In this article, we present an integer sequence approach to solve the classic water jugs problem. The solution steps can be obtained easily by additions and subtractions only, which is suitable for manual calculation or programming by computer. This approach can be introduced to secondary and undergraduate students, and also to teachers and…
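
    The abstract above does not spell out the integer sequence itself, so the following is only a sketch of one additions-and-subtractions scheme that fits its description, not necessarily the article's method. It assumes two jugs of capacities m <= n and a target d: keep adding m (fill the smaller jug and pour it into the larger one) and subtract n whenever the running total exceeds n (empty the full larger jug). The function name `jug_sequence` and its parameters are hypothetical.

```python
from math import gcd


def jug_sequence(m, n, d):
    """Return the running totals of water that lead to the target d.

    Assumed scheme: add m at each step, then subtract n whenever the
    total exceeds n. Only additions and subtractions are used.
    """
    if not 0 < m <= n:
        raise ValueError("expected jug capacities with 0 < m <= n")
    if not 0 < d <= n or d % gcd(m, n) != 0:
        raise ValueError("target is not measurable with these jugs")
    total = 0
    totals = []
    while total != d:
        total += m                 # fill the smaller jug
        totals.append(total)
        if total > n:
            total -= n             # empty the full larger jug
            totals.append(total)
    return totals


# Classic example: measure 4 units with a 3-unit and a 5-unit jug.
print(jug_sequence(3, 5, 4))  # [3, 6, 1, 4]
```

    The solvability check relies on the standard fact that a target is measurable only when it is a multiple of gcd(m, n) and no larger than the bigger jug.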

  17. Identifying Gene Disruptions in Novel Balanced de novo Constitutional Translocations in Childhood Cancer Patients by Whole Genome Sequencing

    PubMed Central

    Ritter, Deborah I.; Haines, Katherine; Cheung, Hannah; Davis, Caleb F.; Lau, Ching C.; Berg, Jonathan S.; Brown, Chester W.; Thompson, Patrick A.; Gibbs, Richard; Wheeler, David A.; Plon, Sharon E.

    2014-01-01

    Purpose: We applied whole genome sequencing to children diagnosed with neoplasms and found to carry apparently balanced constitutional translocations, to discover novel genic disruptions. Methods: We applied the SV calling programs CREST, BreakDancer, SV-STAT and CGAP-CNV, and developed an annotative filtering strategy to achieve nucleotide resolution at the translocations. Results: We identified the breakpoints for t(6;12)(p21.1;q24.31) disrupting HNF1A in a patient diagnosed with hepatic adenomas and Maturity Onset Diabetes of the Young (MODY). Translocation as the disruptive event of HNF1A, a gene known to be involved in MODY3, has not been previously reported. In a subject with Hodgkin’s lymphoma and subsequent low-grade glioma, we identified t(5;18)(q35.1;q21.2), disrupting both SLIT3 and DCC, genes previously implicated in both glioma and lymphoma. Conclusions: These examples suggest that implementing clinical whole genome sequencing in the diagnostic work-up of patients with novel but apparently balanced translocations may reveal unanticipated disruption of disease-associated genes and aid in prediction of the clinical phenotype. PMID:25569436

  18. De novo transcriptome sequence assembly from coconut leaves and seeds with a focus on factors involved in RNA-directed DNA methylation.

    PubMed

    Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L; Chang, Bill Chia-Han; Matzke, Antonius J M; Matzke, Marjori

    2014-11-01

    Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop.

  19. Multiplexed next-generation sequencing and de novo assembly to obtain near full-length HIV-1 genome from plasma virus.

    PubMed

    Aralaguppe, Shambhu G; Siddik, Abu Bakar; Manickam, Ashokkumar; Ambikan, Anoop T; Kumar, Milner M; Fernandes, Sunjay Jude; Amogne, Wondwossen; Bangaruswamy, Dhinoth K; Hanna, Luke Elizabeth; Sonnerborg, Anders; Neogi, Ujjwal

    2016-10-01

    Analysing the HIV-1 near full-length genome (HIV-NFLG) facilitates a better understanding of the diversity of virus population dynamics at the individual or population level. In this study we developed a simple but high-throughput next-generation sequencing (NGS) protocol for HIV-NFLG using clinical specimens and validated the method against an external quality control (EQC) panel. Clinical specimens (n=105) were obtained from three cohorts from two highly conserved HIV-1C epidemics (India and Ethiopia) and one diverse epidemic (Sweden). Additionally, an EQC panel (n=10) was used to validate the protocol. HIV-NFLG sequencing was performed by amplifying the HIV genome (Gag-to-nef) in two fragments. NGS was performed using the Illumina HiSeq2500 after multiplexing 24 samples, followed by de novo assembly in Iterative Virus Assembler or VICUNA. Subtyping was carried out using several bioinformatics tools. Amplification of HIV-NFLG had a 90% (95/105) success rate in clinical specimens. NGS was successful in all clinical specimens (n=45) and EQA samples (n=10) attempted. The mean error for mutations for the EQC panel viruses was <1%. Subtyping identified two as A1C recombinants. Our results demonstrate the feasibility of a simple NGS-based HIV-NFLG protocol that can potentially be used in molecular surveillance for effective identification of subtypes and transmission clusters for operational public health intervention.

  20. De Novo Transcriptome Sequence Assembly from Coconut Leaves and Seeds with a Focus on Factors Involved in RNA-Directed DNA Methylation

    PubMed Central

    Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L.; Chang, Bill Chia-Han; Matzke, Antonius J. M.; Matzke, Marjori

    2014-01-01

    Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop. PMID:25193496

  1. De Novo Transcriptome Sequencing of the Orange-Fleshed Sweet Potato and Analysis of Differentially Expressed Genes Related to Carotenoid Biosynthesis

    PubMed Central

    Li, Ruijie; Zhai, Hong; Kang, Chen; Liu, Degao; He, Shaozhen; Liu, Qingchang

    2015-01-01

    Sweet potato, Ipomoea batatas (L.) Lam., is an important food crop worldwide. The orange-fleshed sweet potato is considered to be an important source of beta-carotene. In this study, the transcriptome profiles of an orange-fleshed sweet potato cultivar “Weiduoli” and its mutant “HVB-3” with high carotenoid content were determined using high-throughput sequencing technology. A total of 13,767,387 and 9,837,090 high-quality reads were produced from Weiduoli and HVB-3, respectively. These reads were de novo assembled into 58,277 transcripts and 35,909 unigenes with an average length of 596 bp and 533 bp, respectively. In all, 874 differentially expressed genes (DEGs) were obtained between Weiduoli and HVB-3, 401 of which were upregulated and 473 downregulated in HVB-3 compared to Weiduoli. Of the 697 DEGs annotated, 316 DEGs had GO terms and 62 DEGs were mapped onto 50 pathways. Twenty-two DEGs and 31 transcription factors involved in carotenoid biosynthesis were identified between Weiduoli and HVB-3. In addition, 1,725 SSR markers were detected. This study provides genomic resources for discovering the genes involved in carotenoid biosynthesis in sweet potato and other plants. PMID:26649293

  2. De Novo Transcriptome Sequencing of the Orange-Fleshed Sweet Potato and Analysis of Differentially Expressed Genes Related to Carotenoid Biosynthesis.

    PubMed

    Li, Ruijie; Zhai, Hong; Kang, Chen; Liu, Degao; He, Shaozhen; Liu, Qingchang

    2015-01-01

    Sweet potato, Ipomoea batatas (L.) Lam., is an important food crop worldwide. The orange-fleshed sweet potato is considered to be an important source of beta-carotene. In this study, the transcriptome profiles of an orange-fleshed sweet potato cultivar "Weiduoli" and its mutant "HVB-3" with high carotenoid content were determined using high-throughput sequencing technology. A total of 13,767,387 and 9,837,090 high-quality reads were produced from Weiduoli and HVB-3, respectively. These reads were de novo assembled into 58,277 transcripts and 35,909 unigenes with an average length of 596 bp and 533 bp, respectively. In all, 874 differentially expressed genes (DEGs) were obtained between Weiduoli and HVB-3, 401 of which were upregulated and 473 downregulated in HVB-3 compared to Weiduoli. Of the 697 DEGs annotated, 316 DEGs had GO terms and 62 DEGs were mapped onto 50 pathways. Twenty-two DEGs and 31 transcription factors involved in carotenoid biosynthesis were identified between Weiduoli and HVB-3. In addition, 1,725 SSR markers were detected. This study provides genomic resources for discovering the genes involved in carotenoid biosynthesis in sweet potato and other plants. PMID:26649293

  3. De novo transcriptome sequence assembly from coconut leaves and seeds with a focus on factors involved in RNA-directed DNA methylation.

    PubMed

    Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L; Chang, Bill Chia-Han; Matzke, Antonius J M; Matzke, Marjori

    2014-11-01

    Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop. PMID:25193496

  4. De novo transcriptome sequencing of Agropyron cristatum to identify available gene resources for the enhancement of wheat.

    PubMed

    Zhang, Jinpeng; Liu, Weihua; Han, Haiming; Song, Liqiang; Bai, Li; Gao, Zhihui; Zhang, Yan; Yang, Xinming; Li, Xiuquan; Gao, Ainong; Li, Lihui

    2015-08-01

    Agropyron cristatum is a wild grass of the tribe Triticeae that is widely grown in harsh environments. As a wild relative of wheat, A. cristatum carries many resistance genes that could be used to broaden the genetic diversity of wheat. Here, we report the transcriptome sequencing of the flag leaf and young spike tissues of a representative tetraploid A. cristatum. More than 90 million reads from the two tissues were assembled into 73,664 unigenes. All unigenes were functionally annotated against the KEGG, COG, and Gene Ontology databases, and long non-coding RNAs were predicted. Pfam prediction demonstrates that A. cristatum carries an abundance of stress resistance genes. The extent of specific genes and rare alleles makes A. cristatum a vital genetic reservoir for the improvement of wheat. Altogether, the available gene resources in A. cristatum facilitate efforts to harness the genetic diversity of wild relatives to enhance wheat. PMID:25889708

  5. De novo transcriptome sequencing of Cryptotermes domesticus and comparative analysis of gene expression in response to different wood species.

    PubMed

    Wu, Wenjing; Huang, Zhenyou; Li, Zhiqiang; Zhang, Shijun; Liu, Xiaolin; Gu, Daifei

    2016-01-10

    The drywood termite Cryptotermes domesticus is an important worldwide pest that causes substantial damage to dry timber and structural lumber, yet genomic resources for it are limited. Here, we performed transcriptome sequencing of Cr. domesticus pseudergates using Illumina paired-end sequencing technology. A total of 108,745,470 clean reads were collected and assembled into 302,979 contigs with an average length of 648 bp and an N50 length of 893 bp. A total of 185,248 unigenes and 100,680 proteins were identified among the assembled contigs. Of the assembled contigs, 152,317 (50.27%) showed significant similarity to sequences in publicly available databases. To understand how the termites respond to phylogenetically diverse wood species, variations in gene expression were examined among pseudergates feeding on three wood species from different plant families, Casuarina equisetifolia (CE), Koompassia excelsa (KE) and Myristica sp. (MS). A total of 417 (118 up-regulated/299 down-regulated), 599 (148 up-regulated/451 down-regulated) and 505 (223 up-regulated/282 down-regulated) differentially expressed genes were detected in KE vs. CE, KE vs. MS and CE vs. MS, respectively. Digital gene expression analysis indicated that different wood species played an important role in the expression of termite genes, such as genes involved in carbohydrate metabolism and genes encoding proteins with catalytic activity and hydrolase activity. Additionally, the genes encoding cellulase were identified and analyzed. This study provides the first primary transcriptome of Cr. domesticus and lays a foundation for future functional genomics studies of feeding responses. PMID:26410413

  6. De novo assembly and characterization of the spleen transcriptome of common carp (Cyprinus carpio) using Illumina paired-end sequencing.

    PubMed

    Li, Guoxi; Zhao, Yinli; Liu, Zhonghu; Gao, Chunsheng; Yan, Fengbin; Liu, Bianzhi; Feng, Jianxin

    2015-06-01

    Common carp (Cyprinus carpio) is one of the most important aquacultured species of the family Cyprinidae, and breeding this species for disease resistance is becoming increasingly important. However, studies of the immunogenetics of disease resistance in the common carp at the genome or transcriptome level are lacking. In this study, 60,316,906 and 75,200,328 paired-end clean reads were obtained from two cDNA libraries of the common carp spleen by Illumina paired-end sequencing technology. In total, 130,293 unique transcript fragments (unigenes) were assembled, with an average length of 1400.57 bp. Approximately 105,612 (81.06%) unigenes could be annotated according to their homology with matches in the Nr, Nt, Swiss-Prot, COG, GO, or KEGG databases, and they were found to represent 46,747 non-redundant genes. Comparative analysis showed that 59.82% of the unigenes have significant similarity to zebrafish RefSeq proteins. Gene expression comparison revealed that 10,432 and 6,889 annotated unigenes were, respectively, up- and down-regulated with at least twofold changes between two developmental stages of the common carp spleen. Gene ontology and KEGG analyses were performed to classify all unigenes into functional categories for understanding gene functions and regulation pathways. In addition, 46,847 simple sequence repeats (SSRs) were detected from 35,618 unigenes, and a large number of single nucleotide polymorphism (SNP) and insertion/deletion (INDEL) sites were identified in the spleen transcriptome of common carp. This study has characterized the spleen transcriptome of the common carp for the first time, providing a valuable resource for a better understanding of the common carp immune system and defense mechanisms. This knowledge will also facilitate future functional studies on common carp immunogenetics that may eventually be applied in breeding programs.
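
    The record above counts unigenes as up- or down-regulated when expression changes at least twofold between two stages. A minimal sketch of such a threshold rule is given below in Python. The function name `classify_change`, the pseudocount (added so that zero expression values do not cause division by zero), and the example values are all assumptions for illustration; the study's actual normalization and statistical testing are not described in the abstract.

```python
import math


def classify_change(expr_a, expr_b, min_fold=2.0, pseudocount=1.0):
    """Label the change from condition A to condition B.

    Returns "up", "down", or "unchanged" based on whether the
    log2 fold change reaches +/- log2(min_fold).
    """
    ratio = (expr_b + pseudocount) / (expr_a + pseudocount)
    log2_fold_change = math.log2(ratio)
    threshold = math.log2(min_fold)
    if log2_fold_change >= threshold:
        return "up"
    if log2_fold_change <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical expression values, not data from the record.
print(classify_change(9, 19))   # (19 + 1) / (9 + 1) = 2.0 -> "up"
print(classify_change(19, 9))   # ratio 0.5 -> "down"
print(classify_change(10, 12))  # ratio about 1.18 -> "unchanged"
```

    In practice a fold-change cutoff is usually combined with a significance test, so this rule alone should be read as a simplification.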

  7. Transcriptome sequencing and de novo analysis of cytoplasmic male sterility and maintenance in JA-CMS cotton.

    PubMed

    Yang, Peng; Han, Jinfeng; Huang, Jinling

    2014-01-01

    Cytoplasmic male sterility (CMS) is the maternally inherited failure to produce functional pollen. Anther development is known to be modulated through complicated interactions between nuclear and mitochondrial genes in sporophytic and gametophytic tissues. However, an unbiased transcriptome sequencing analysis of CMS in cotton is currently lacking in the literature. This study compared differentially expressed (DE) genes of floral buds at the sporogenous cells stage (SS) and microsporocyte stage (MS) (the two most important stages for pollen abortion in JA-CMS) between JA-CMS and its fertile maintainer line JB cotton plants, using the Illumina HiSeq 2000 sequencing platform. A total of 709 (1.8%) DE genes, including 293 up-regulated and 416 down-regulated genes, were identified in the JA-CMS line compared with its maintainer line at the SS stage, and 644 (1.6%) DE genes, with 263 up-regulated and 381 down-regulated genes, were detected at the MS stage. Comparing the two stages in the same material, there were 8 up-regulated and 9 down-regulated DE genes in the JA-CMS line and 29 up-regulated and 9 down-regulated DE genes in the JB maintainer line at the MS stage. Quantitative RT-PCR was used to validate 7 randomly selected DE genes. Bioinformatics analysis revealed that genes involved in reduction-oxidation reactions and alpha-linolenic acid metabolism were down-regulated, while genes pertaining to photosynthesis and flavonoid biosynthesis were up-regulated in JA-CMS floral buds compared with their JB counterparts at the SS and/or MS stages. All four of these biological processes play important roles in reactive oxygen species (ROS) homeostasis, which may be an important factor contributing to the sterile trait of JA-CMS. Further experiments are warranted to elucidate the molecular mechanisms by which these genes lead to CMS.

  8. De novo transcriptome sequencing of Cryptotermes domesticus and comparative analysis of gene expression in response to different wood species.

    PubMed

    Wu, Wenjing; Huang, Zhenyou; Li, Zhiqiang; Zhang, Shijun; Liu, Xiaolin; Gu, Daifei

    2016-01-10

    The drywood termite Cryptotermes domesticus is an important worldwide pest that causes substantial damage to dry timber and structural lumber, yet genomic resources for it are limited. Here, we performed transcriptome sequencing of Cr. domesticus pseudergates using Illumina paired-end sequencing technology. A total of 108,745,470 clean reads were collected and assembled into 302,979 contigs with an average length of 648 bp and an N50 length of 893 bp. A total of 185,248 unigenes and 100,680 proteins were identified among the assembled contigs. Of the assembled contigs, 152,317 (50.27%) showed significant similarity to sequences in publicly available databases. To understand how the termites respond to phylogenetically diverse wood species, variations in gene expression were examined among pseudergates feeding on three wood species from different plant families, Casuarina equisetifolia (CE), Koompassia excelsa (KE) and Myristica sp. (MS). A total of 417 (118 up-regulated/299 down-regulated), 599 (148 up-regulated/451 down-regulated) and 505 (223 up-regulated/282 down-regulated) differentially expressed genes were detected in KE vs. CE, KE vs. MS and CE vs. MS, respectively. Digital gene expression analysis indicated that different wood species played an important role in the expression of termite genes, such as genes involved in carbohydrate metabolism and genes encoding proteins with catalytic activity and hydrolase activity. Additionally, the genes encoding cellulase were identified and analyzed. This study provides the first primary transcriptome of Cr. domesticus and lays a foundation for future functional genomics studies of feeding responses.

  9. Development of an expressed gene catalogue and molecular markers from the de novo assembly of short sequence reads of the lentil (Lens culinaris Medik.) transcriptome.

    PubMed

    Verma, Priyanka; Shah, Niraj; Bhatia, Sabhyata

    2013-09-01

    Genomic resources such as ESTs, molecular markers and linkage maps are essential for crop improvement. However, these resources are still limited in important legumes such as lentil (Lens culinaris Medik.), which is valued worldwide as a rich source of dietary protein. In this study, the de novo transcriptome assembly of 119,855,798 short reads, generated by Illumina paired-end sequencing, was performed using various assembly programs. This resulted in 42,196 nonredundant high-quality transcripts with an average length of 810 bases, an N50 value of 1,432 and an average expression per transcript of 26.21 reads per kilobase per million (RPKM). Similarity search with the unigenes and protein sequences of other plants resulted in maximum similarity with soybean. A total of 20,009 nonredundant transcripts showed similarity with the UniProtKB database and of these, 18,064 transcripts were grouped into three main GO categories, that is, biological process (15,126), molecular function (15,505) and cellular component (9,434). Annotated transcripts were mapped to 289 predicted Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and 8,893 transcripts were classified into 24 functional categories based on Clusters of Orthologous Groups (COG) of proteins. Mining the data set for the presence of SSRs resulted in 8,722 SSRs with a frequency of one SSR per 3.92 kb. From these, 5,673 SSR primer pairs were designed, and a subset of these was utilized for diversity analysis. This study, which provides a large data set of annotated transcripts and gene-based SSR markers, would serve as a foundation for various applications in lentil breeding and genetics.
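
    The SSR frequency in the record above can be checked from the other figures it reports. The sketch below, in Python, assumes the frequency was obtained by dividing the total assembled length (number of transcripts times average length) by the number of SSRs; the abstract does not state the calculation, and the function name `kb_per_ssr` is hypothetical. Since the average length is itself rounded, the result is approximate.

```python
def kb_per_ssr(transcript_count, mean_length_bases, ssr_count):
    """Estimate the average spacing between SSRs in kilobases."""
    total_kb = transcript_count * mean_length_bases / 1000
    return total_kb / ssr_count


# Figures reported in the record: 42,196 transcripts, mean length
# 810 bases, 8,722 SSRs.
spacing = kb_per_ssr(42_196, 810, 8_722)
print(round(spacing, 2))  # 3.92
```

    Under this assumption the estimate agrees with the reported value of one SSR per 3.92 kb.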

  11. De novo sequencing and comprehensive analysis of the mutant transcriptome from purple sweet potato (Ipomoea batatas L.).

    PubMed

    Ma, Peiyong; Bian, Xiaofeng; Jia, Zhaodong; Guo, Xiaoding; Xie, Yizhi

    2016-01-10

    Purple sweet potatoes, rich in anthocyanin, have been widely favored in light of increasing awareness of health and food safety. In this study, a mutant of purple sweet potato (white peel and flesh) was used to study anthocyanin metabolism by high-throughput RNA sequencing and comparative analysis of the mutant and wild type transcriptomes. A total of 88,509 unigenes ranging from 200nt to 14,986nt with an average length of 849nt were obtained. Unigenes were assigned to Gene Ontology (GO), Clusters of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Functional enrichment using GO and KEGG annotations showed that 3828 of the differently expressed genes probably influenced many important biological and metabolic pathways, including anthocyanin biosynthesis. Most importantly, the structural and transcription factor genes that contribute to anthocyanin biosynthesis were downregulated in the mutant. The unigene dataset that was used to discover the anthocyanin candidate genes can serve as a comprehensive resource for molecular research in sweet potato. PMID:26410411

  12. Characterization and de novo sequencing of snow crab tropomyosin enzymatic peptides by both electrospray ionization and matrix-assisted laser desorption ionization QqToF tandem mass spectrometry.

    PubMed

    Abdel Rahman, Anas M; Lopata, Andreas L; O'Hehir, Robyn E; Robinson, John J; Banoub, Joseph H; Helleur, Robert J

    2010-04-01

    The protein tropomyosin (TM) is a known major allergen present in shellfish causing frequent food allergies. TM is also an occupational allergen generated in the working environment of snow crab (Chionoecetes opilio) processing plants. The TM protein was purified from both claw and leg meats of snow crab and analyzed by electrospray ionization and matrix-assisted laser desorption/ionization (MALDI) using hybrid quadrupole time-of-flight tandem mass spectrometry (QqToF-MS). The native polypeptide molecular weight of TM was determined to be 32,733 Da. The protein was further characterized using the 'bottom-up' MS approach. A peptide mass fingerprint was obtained by two different enzymatic digestions, and de novo sequencing of the most abundant peptides was performed. Any post-translational modifications were identified by searching their calculated and predicted molecular weights in precursor ion spectra. The immunological reactivity of snow crab extract was evaluated using specific antibodies and allergenic reactivity assessed with serum of allergic patients. Subsequently, a signature peptide for TM was identified and evaluated in terms of identity and homology using the basic local alignment search tool (BLAST). The identification of a signature peptide for the allergen TM using MALDI-QqToF-MS will be critical for the sensitive and specific quantification of this highly allergenic protein in the workplace.

  13. De novo sequence assembly and characterisation of a partial transcriptome for an evolutionarily distinct reptile, the tuatara (Sphenodon punctatus)

    PubMed Central

    2012-01-01

    Background The tuatara (Sphenodon punctatus) is a species of extraordinary zoological interest, being the only surviving member of an entire order of reptiles which diverged early in amniote evolution. In addition to their unique phylogenetic placement, many aspects of tuatara biology, including temperature-dependent sex determination, cold adaptation and extreme longevity have the potential to inform studies of genome evolution and development. Despite increasing interest in the tuatara genome, genomic resources for the species are still very limited. We aimed to address this by assembling a transcriptome for tuatara from an early-stage embryo, which will provide a resource for genome annotation, molecular marker development and studies of development and adaptation in tuatara. Results We obtained 30 million paired-end 50 bp reads from an Illumina Genome Analyzer and assembled them with Velvet and Oases using a range of kmers. After removing redundancy and filtering out low quality transcripts, our transcriptome dataset contained 32911 transcripts, with an N50 of 675 and a mean length of 451 bp. Almost 50% (15965) of these transcripts could be annotated by comparison with the NCBI non-redundant (NR) protein database or the chicken, green anole and zebrafish UniGene sets. A scan of candidate genes and repetitive elements revealed genes involved in immune function, sex differentiation and temperature-sensitivity, as well as over 200 microsatellite markers. Conclusions This dataset represents a major increase in genomic resources for the tuatara, increasing the number of annotated gene sequences from just 60 to almost 16,000. This will facilitate future research in sex determination, genome evolution, local adaptation and population genetics of tuatara, as well as inform studies on amniote evolution. PMID:22938396

  14. SNP detection from de novo transcriptome sequencing in the bivalve Macoma balthica: marker development for evolutionary studies.

    PubMed

    Pante, Eric; Rohfritsch, Audrey; Becquet, Vanessa; Belkhir, Khalid; Bierne, Nicolas; Garcia, Pascale

    2012-01-01

    Hybrid zones are noteworthy systems for the study of environmental adaptation to fast-changing environments, as they constitute reservoirs of polymorphism and are key to the maintenance of biodiversity. They can move in relation to climate fluctuations, as temperature can affect both selection and migration, or remain trapped by environmental and physical barriers. There is therefore a very strong incentive to study the dynamics of hybrid zones subjected to climate variations. The infaunal bivalve Macoma balthica emerges as a noteworthy model species, as divergent lineages hybridize, and its native NE Atlantic range is currently contracting to the North. To investigate the dynamics and functioning of hybrid zones in M. balthica, we developed new molecular markers by sequencing the collective transcriptome of 30 individuals. Ten individuals were pooled for each of the three populations sampled at the margins of two hybrid zones. A single 454 run generated 277 Mb from which 17K SNPs were detected. SNP density averaged 1 polymorphic site every 14 to 19 bases, for mitochondrial and nuclear loci, respectively. An FST scan detected high genetic divergence among several hundred SNPs, some of them involved in energetic metabolism, cellular respiration and physiological stress. The high population differentiation, recorded for nuclear-encoded ATP synthase and NADH dehydrogenase as well as most mitochondrial loci, suggests cytonuclear genetic incompatibilities. Results from this study will help pave the way to a high-resolution study of hybrid zone dynamics in M. balthica, and the relative importance of endogenous and exogenous barriers to gene flow in this system. PMID:23300636
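
    The population differentiation discussed in this record is commonly summarized per SNP with a fixation index. The sketch below uses the textbook heterozygosity-based form, (H_T - H_S) / H_T, for a biallelic SNP with equally weighted populations. This is an assumption made for illustration: the estimator actually used in the study is not stated in the abstract, and the allele frequencies below are invented.

```python
def fixation_index(allele_freqs):
    """Heterozygosity-based differentiation for one biallelic SNP.

    allele_freqs: frequency of one allele in each population
    (populations are weighted equally, a simplifying assumption).
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n
    h_total = 2 * p_bar * (1 - p_bar)                        # pooled expected heterozygosity
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / n  # mean within-population value
    if h_total == 0:
        return 0.0                                           # monomorphic site
    return (h_total - h_within) / h_total


# Hypothetical frequencies for three pooled populations.
print(fixation_index([0.1, 0.5, 0.9]))   # strongly differentiated SNP
print(fixation_index([0.3, 0.3, 0.3]))   # no differentiation
```

    A scan of this kind computes the statistic for every SNP and flags those in the upper tail as candidates for divergent selection.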

  15. The role of melanin pathways in extremotolerance and virulence of Fonsecaea revealed by de novo assembly transcriptomics using Illumina paired-end sequencing.

    PubMed

    Li, X Q; Guo, B L; Cai, W Y; Zhang, J M; Huang, H Q; Zhan, P; Xi, L Y; Vicente, V A; Stielow, B; Sun, J F; de Hoog, G S

    2016-01-01

    Melanisation has been considered to be an important virulence factor of Fonsecaea monophora. However, the biosynthetic mechanisms of melanisation remain unknown. We therefore used next generation sequencing technology to investigate the transcriptome and digital gene expression data, which are valuable resources to better understand the molecular and biological mechanisms regulating melanisation in F. monophora. We performed de novo transcriptome assembly and digital gene expression (DGE) profiling analyses of parent (CBS 122845) and albino (CBS 125194) strains using the Illumina RNA-seq system. A total of 17 352 annotated unigenes were found by BLAST search of NR, Swiss-Prot, Gene Ontology, Clusters of Orthologous Groups and Kyoto Encyclopedia of Genes and Genomes (KEGG) (E-value <1e‒5). A total of 2 283 unigenes were judged to be differentially expressed between the two genotypes. We identified most of the genes coding for key enzymes involved in melanin biosynthesis pathways, including polyketide synthase (pks), multicopper oxidase (mco), laccase, tyrosinase and homogentisate 1,2-dioxygenase (hmgA). DEG analysis showed extensive down-regulation of key genes in the DHN pathway, while up-regulation was noted in the DOPA pathway of the albino mutant. The transcript levels of a subset of genes were confirmed by real time RT-PCR, while the crucial role of key enzymes was confirmed by either inhibitor or substrate tests in vitro. Meanwhile, a number of genes involved in light sensing, cell wall synthesis, morphology and environmental stress were identified in the transcriptome of F. monophora. In addition, 3 353 SSR (simple sequence repeat) markers were identified from 21 600 consensus sequences. Blocking of the DHN pathway is the most likely reason for melanin deficiency in the albino strain, while the production of pheomelanin and pyomelanin was probably regulated by unknown transcription factors upstream of both pathways. Most of genes involved in

  16. Imparting functionality to biocatalysts via embedding enzymes into nanoporous materials by a de novo approach: size-selective sheltering of catalase in metal-organic framework microcrystals.

    PubMed

    Shieh, Fa-Kuen; Wang, Shao-Chun; Yen, Chia-I; Wu, Chang-Cheng; Dutta, Saikat; Chou, Lien-Yang; Morabito, Joseph V; Hu, Pan; Hsu, Ming-Hua; Wu, Kevin C-W; Tsung, Chia-Kuang

    2015-04-01

    We develop a new concept to impart new functions to biocatalysts by combining enzymes and metal-organic frameworks (MOFs). The proof-of-concept design is demonstrated by embedding catalase molecules into uniformly sized ZIF-90 crystals via a de novo approach. We have carried out electron microscopy, X-ray diffraction, nitrogen sorption, electrophoresis, thermogravimetric analysis, and confocal microscopy to confirm that the ~10 nm catalase molecules are embedded in 2 μm single-crystalline ZIF-90 crystals with ~5 wt % loading. Because catalase is immobilized and sheltered by the ZIF-90 crystals, the composites show activity in hydrogen peroxide degradation even in the presence of protease proteinase K.

  17. De novo transcriptome sequencing in Bixa orellana to identify genes involved in methylerythritol phosphate, carotenoid and bixin biosynthesis

    SciTech Connect

    Cárdenas-Conejo, Yair; Carballo-Uicab, Víctor; Lieberman, Meric; Aguilar-Espinosa, Margarita; Comai, Luca; Rivera-Madrid, Renata

    2015-10-28

    Bixin or annatto is a commercially important natural orange-red pigment derived from lycopene that is produced and stored in seeds of Bixa orellana L. An enzymatic pathway for bixin biosynthesis was inferred from homology of putative proteins encoded by differentially expressed seed cDNAs. Some activities were later validated in a heterologous system. Nevertheless, much of the pathway remains to be clarified. For example, it is essential to identify the methylerythritol phosphate (MEP) and carotenoid pathway genes. In order to investigate the MEP, carotenoid, and bixin pathway genes, total RNA from young leaves and two different developmental stages of seeds from B. orellana were used for the construction of indexed mRNA libraries, sequenced on the Illumina HiSeq 2500 platform and assembled de novo using Velvet, CLC Genomics Workbench and CAP3 software. A total of 52,549 contigs were obtained with average length of 1,924 bp. Two phylogenetic analyses of inferred proteins, in one case encoded by thirteen general, single-copy cDNAs, in the other from carotenoid and MEP cDNAs, indicated that B. orellana is closely related to sister Malvales species cacao and cotton. Using homology, we identified 7 and 14 core gene products from the MEP and carotenoid pathways, respectively. Surprisingly, previously defined bixin pathway cDNAs were not present in our transcriptome. Here we propose a new set of gene products involved in the bixin pathway. In conclusion, the identification and qRT-PCR quantification of cDNAs involved in annatto production suggest a hypothetical model for bixin biosynthesis that involves coordinated activation of some MEP, carotenoid and bixin pathway genes. These findings provide a better understanding of the mechanisms regulating these pathways and will facilitate the genetic improvement of B. orellana.

  18. Cross-Curricular Sequence: An Approach for Teaching Business Communication.

    ERIC Educational Resources Information Center

    Clarke, Lillian W.; Franklin, Carl M.

    1985-01-01

    The Cross-Curricular Sequencing (CCS) approach to teaching business communications is explored. Its uses in word processing, principles of management, and business policy courses are discussed. Techniques for integrating materials from these courses into business communication classes are described. The implications of CCS for business…

  19. Tracing the evolutionary lineage of pattern recognition receptor homologues in vertebrates: An insight into reptilian immunity via de novo sequencing of the wall lizard splenic transcriptome.

    PubMed

    Priyam, Manisha; Tripathy, Mamta; Rai, Umesh; Ghorai, Soma Mondal

    2016-04-01

    Reptiles remain a deprived class in the area of genomic and molecular resources for the vertebrate classes. The transition of squamates from aquatic to terrestrial mode of life caused profound changes in their immune system to combat the altered variety of pathogens on land. The current study aims at delineating the evolution of defence mechanisms in wall lizard, Hemidactylus flaviviridis, by exploring its immunome. De novo sequencing of splenic transcriptome from wall lizard on the Illumina Hi-Seq platform generated 258,128 unique transcripts with an average GC content of 45%. Annotation of 555,557 and 6812 transcripts was carried out against NCBI (non-redundant database) and UniProt databases, respectively. The KEGG pathway annotation of transcripts classified them into 39 processes of six pathway function categories. A total of 3824 transcripts, involved in 23 immune-related pathways, were identified in the immune-relevant cluster built by harvesting the genes under KEGG pathways of immune system and immune diseases. Forty-two percent of the immune-relevant cluster was represented by pattern-recognition receptors (PRRs), of which the maximum number of transcripts was attributed to the Toll-like receptor (TLR) signalling pathway. Nine PRRs with potential full-length coding sequences were sorted for phylogenetic analysis and comparative domain analysis across the vertebrate lineage. They included DEC205/lymphocyte antigen 75 (ly75), nucleotide-binding oligomerisation domain-containing protein 1 (NOD1), NOD-like receptor family CARD domain-containing 3 (NLRC3), nucleotide-binding oligomerisation domain, leucine-rich repeat-containing X1 (NLRX1), DDX58/retinoic acid-inducible gene 1 (RIG-1), Toll-like receptor 3 (TLR3), TLR4, TLR5 and TLR7. From selection studies of these genes, we inferred positive selection for ly75, NOD1, RIG-1, TLR3 and TLR4. Apart from contributing to the scarce genomic resources available for reptiles and giving a broad scope for the immune
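
    The average GC content reported in this record is the share of G and C bases among the called bases of each transcript. A minimal sketch of that calculation follows; the function name and the example sequences are illustrative assumptions, and ambiguous bases such as N are simply skipped here.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence has no unambiguous bases")
    gc = sum(base in "GC" for base in called)
    return gc / len(called)


# Toy sequences (hypothetical), not transcripts from the study.
print(gc_fraction("ATGCGCATTA"))
print(gc_fraction("atgcNN"))
```

    Averaging this value over all assembled transcripts gives a figure comparable to the 45% quoted in the abstract.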

  20. De novo sequencing and analysis of the Ulva linza transcriptome to discover putative mechanisms associated with its successful colonization of coastal ecosystems

    PubMed Central

    2012-01-01

    Background The green algal genus Ulva Linnaeus (Ulvaceae, Ulvales, Chlorophyta) is well known for its wide distribution in marine, freshwater, and brackish environments throughout the world. The Ulva species are also highly tolerant of variations in salinity, temperature, and irradiance and are the main cause of green tides, which can have deleterious ecological effects. However, limited genomic information is currently available in this non-model and ecologically important species. Ulva linza is a species that inhabits bedrock in the mid to low intertidal zone, and it is a major contributor to biofouling. Here, we presented the global characterization of the U. linza transcriptome using the Roche GS FLX Titanium platform, with the aim of uncovering the genomic mechanisms underlying rapid and successful colonization of the coastal ecosystems. Results De novo assembly of 382,884 reads generated 13,426 contigs with an average length of 1,000 bases. Contiguous sequences were further assembled into 10,784 isotigs with an average length of 1,515 bases. A total of 304,101 reads were nominally identified by BLAST; 4,368 isotigs were functionally annotated with 13,550 GO terms, and 2,404 isotigs having enzyme commission (EC) numbers were assigned to 262 KEGG pathways. When compared with four other full sequenced green algae, 3,457 unique isotigs were found in U. linza and 18 conserved in land plants. In addition, a specific photoprotective mechanism based on both LhcSR and PsbS proteins and a C4-like carbon-concentrating mechanism were found, which may help U. linza survive stress conditions. At least 19 transporters for essential inorganic nutrients (i.e., nitrogen, phosphorus, and sulphur) were responsible for its ability to take up inorganic nutrients, and at least 25 eukaryotic cytochrome P450s, which is a higher number than that found in other algae, may be related to their strong allelopathy. Multi-origination of the stress related proteins, such as glutamate

  1. Are iridoids in leaf beetle larvae synthesized de novo or derived from plant precursors? A methodological approach.

    PubMed

    Søe, Astrid R B; Bartram, Stefan; Gatto, Nathalie; Boland, Wilhelm

    2004-09-01

    Iridoids, belonging to a group of cyclopentanoid monoterpenoids, are secreted by many species of leaf beetles as a defense against predators. Using chemically modified precursors of iridoid biosynthesis, it has been shown that some leaf beetle larvae can synthesize these iridoids de novo as well as sequester plant-produced molecules. Stable isotope techniques can provide useful methods for studying terpenoid biosynthesis without disturbing the natural conditions much. Two terpenoid biosynthesis pathways (mevalonic acid (MVA) pathway and methylerythritol-4-phosphate (MEP) pathway) may lead to different delta13C signatures of the products. Our results from natural abundance 13C and 13C-labelled iridoid precursors in Gastrophysa viridula and Phaedon cochleariae suggested that the two leaf beetle species use only de novo synthesis of their defensive iridoids. We observed that the isotope signature of the leaf-beetle-produced iridoids (via the MVA pathway) resembled that of the MEP-derived monoterpenoids from plants. Owing to this close similarity in the natural 13C abundances in the plant and insect compounds, a determination of iridoid-origin in leaf beetle secretion may only be possible by use of isotopically labelled compounds.

  2. Identification of potential inhibitors for AIRS from de novo purine biosynthesis pathway through molecular modeling studies - a computational approach.

    PubMed

    Rao, R Guru Raj; Biswal, Jayashree; Dhamodharan, Prabhu; Kanagarajan, Surekha; Jeyaraman, Jeyakanthan

    2016-10-01

    In cancer, the de novo pathway plays an important role in cell proliferation by meeting the high demand for purine nucleotides. Aminoimidazole ribonucleotide synthetase (AIRS) catalyzes the fifth step of de novo purine biosynthesis, facilitating the conversion of formylglycinamidine ribonucleotide to aminoimidazole ribonucleotide. Hence, inhibiting AIRS is crucial due to its involvement in the regulation of uncontrollable cancer cell proliferation. In this study, the three-dimensional structure of AIRS from P. horikoshii OT3 was constructed based on the crystal structure from E. coli and the modeled protein was verified for stability using molecular dynamics for a time frame of 100 ns. Virtual screening and induced fit docking were performed to identify the best antagonists based on their binding mode and affinity. Through mutational studies, the residues necessary for catalytic activity of AIRS were identified, among which Lys35, Asp103, Glu137, and Thr138 are important in determining AIRS function. The mutational studies help to understand the structural and energetic characteristics of the specified residues. In addition to molecular dynamics, ADME properties, binding free-energy, and density functional theory calculations of the compounds were carried out to find the best lead molecule. Based on these analyses, the compound NCI_121957 from the NCI database was adjudged the best molecule and could be suggested as a suitable inhibitor of AIRS. In future studies, experimental validation of these ligands as AIRS inhibitors will be carried out.

  3. Molecular Characterization of Transgenic Events Using Next Generation Sequencing Approach.

    PubMed

    Guttikonda, Satish K; Marri, Pradeep; Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P

    2016-01-01

    Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at DNA level which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterize the transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided a highly sensitive and cost- and labor-effective alternative for molecular characterization compared to traditional Southern blot analysis. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with the traditional method with respect to key criteria required for regulatory submissions. PMID:26908260

  4. Molecular Characterization of Transgenic Events Using Next Generation Sequencing Approach

    PubMed Central

    Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P.

    2016-01-01

    Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at DNA level which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterize the transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided a highly sensitive and cost- and labor-effective alternative for molecular characterization compared to traditional Southern blot analysis. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with the traditional method with respect to key criteria required for regulatory submissions. PMID:26908260

  6. De novo design, synthesis, and pharmacology of alpha-melanocyte stimulating hormone analogues derived from somatostatin by a hybrid approach.

    PubMed

    Han, Guoxia; Haskell-Luevano, Carrie; Kendall, Laura; Bonner, Gregg; Hadley, Mac E; Cone, Roger D; Hruby, Victor J

    2004-03-11

    A number of alpha-melanotropin (alpha-MSH) analogues have been designed de novo, synthesized, and bioassayed at different melanocortin receptors from frog skin (fMC1R) and mouse/rat (mMC1R, rMC3R, mMC4R, and mMC5R). These ligands were designed from somatostatin by a hybrid approach, which utilizes a modified cyclic structure (H-d-Phe-c[Cys---Cys]-Thr-NH(2)) related to somatostatin analogues (e.g. sandostatin) acting at somatostatin receptors, CTAP which binds specifically to μ opioid receptors, and the core pharmacophore of alpha-MSH (His-Phe-Arg-Trp). Ligands designed were H-d-Phe-c[XXX-YYY-ZZZ-Arg-Trp-AAA]-Thr-NH(2) [XXX and AAA = Cys, d-Cys, Hcy, Pen, d-Pen; YYY = His, His(1'-Me), His(3'-Me); ZZZ = Phe and side chain halogen substituted Phe, d-Phe, d-Nal(1'), and d-Nal(2')]. The compounds showed a wide range of bioactivities at the frog skin MC1R; e.g. H-d-Phe-c[Hcy-His-d-Phe-Arg-Trp-Cys]-Thr-NH(2) (6, EC(50) = 0.30 nM) and H-d-Phe-c[Cys-His-d-Phe-Arg-Trp-d-Cys]-Thr-NH(2) (8, EC(50) = 0.10 nM). In addition, when a lactam bridge was used as in H-d-Phe-c[Asp-His-d-Phe-Arg-Trp-Lys]-Thr-NH(2) (7, EC(50) = 0.10 nM), the analogue obtained is as potent as alpha-MSH in the frog skin MC1R assay. Interestingly, switching the bridge of 6 to give H-d-Phe-c[Cys-His-d-Phe-Arg-Trp-Hcy]-Thr-NH(2) (5, EC(50) = 1000 nM) led to a 3000-fold decrease in agonist activity. An increase in steric size in the side chain of d-Phe(7) reduced the bioactivity significantly. For example, H-d-Phe-c[Cys-His-d-Nal(1')-Arg-Trp-d-Cys]-Thr-NH(2) (24) is 2000-fold less active than 9. On the other hand, H-d-Phe-c[Cys-His-d-Phe(p-I)-Arg-Trp-d-Cys]-Thr-NH(2) (23) lost all agonist activity and became a weak antagonist (IC(50) = 1 x 10(-5) M). Furthermore, the modified CTAP analogues with a d-Trp at position 7 all showed weak antagonist activities (EC(50) = 10(-6) to 10(-7) M). Compounds bioassayed at mouse/rat MCRs displayed intriguing results. Most of them are potent at all four receptors tested (m

  7. De novo sequencing, assembly, and analysis of the root transcriptome of Persea americana (Mill.) in response to Phytophthora cinnamomi and flooding.

    PubMed

    Reeksting, Bianca J; Coetzer, Nanette; Mahomed, Waheed; Engelbrecht, Juanita; van den Berg, Noëlani

    2014-01-01

    Avocado is a diploid angiosperm containing 24 chromosomes with a genome estimated to be around 920 Mb. It is an important fruit crop worldwide but is susceptible to a root rot caused by the ubiquitous oomycete Phytophthora cinnamomi. Phytophthora root rot (PRR) causes damage to the feeder roots of trees, causing necrosis. This leads to branch-dieback and eventual tree death, resulting in severe losses in production. Control strategies are limited and at present an integrated approach involving the use of phosphite, tolerant rootstocks, and proper nursery management has shown the best results. Disease progression of PRR is accelerated under high soil moisture or flooding conditions. In addition, avocado is highly susceptible to flooding, with even short periods of flooding causing significant losses. Despite the commercial importance of avocado, limited genomic resources are available. Next generation sequencing has provided the means to generate sequence data at a relatively low cost, making this an attractive option for non-model organisms such as avocado. The aims of this study were to generate sequence data for the avocado root transcriptome and identify stress-related genes. Tissue was isolated from avocado infected with P. cinnamomi, avocado exposed to flooding and avocado exposed to a combination of these two stresses. Three separate sequencing runs were performed on the Roche 454 platform and produced approximately 124 Mb of data. This was assembled into 7685 contigs, with 106 448 sequences remaining as singletons. Genes involved in defence pathways such as the salicylic acid and jasmonic acid pathways as well as genes associated with the response to low oxygen caused by flooding, were identified. This is the most comprehensive study of transcripts derived from root tissue of avocado to date and will provide a useful resource for future studies. PMID:24563685

  8. De Novo Sequencing, Assembly, and Analysis of the Root Transcriptome of Persea americana (Mill.) in Response to Phytophthora cinnamomi and Flooding

    PubMed Central

    Reeksting, Bianca J.; Coetzer, Nanette; Mahomed, Waheed; Engelbrecht, Juanita; van den Berg, Noëlani

    2014-01-01

    Avocado is a diploid angiosperm containing 24 chromosomes with a genome estimated to be around 920 Mb. It is an important fruit crop worldwide but is susceptible to a root rot caused by the ubiquitous oomycete Phytophthora cinnamomi. Phytophthora root rot (PRR) causes damage to the feeder roots of trees, causing necrosis. This leads to branch-dieback and eventual tree death, resulting in severe losses in production. Control strategies are limited and at present an integrated approach involving the use of phosphite, tolerant rootstocks, and proper nursery management has shown the best results. Disease progression of PRR is accelerated under high soil moisture or flooding conditions. In addition, avocado is highly susceptible to flooding, with even short periods of flooding causing significant losses. Despite the commercial importance of avocado, limited genomic resources are available. Next generation sequencing has provided the means to generate sequence data at a relatively low cost, making this an attractive option for non-model organisms such as avocado. The aims of this study were to generate sequence data for the avocado root transcriptome and identify stress-related genes. Tissue was isolated from avocado infected with P. cinnamomi, avocado exposed to flooding and avocado exposed to a combination of these two stresses. Three separate sequencing runs were performed on the Roche 454 platform and produced approximately 124 Mb of data. This was assembled into 7685 contigs, with 106 448 sequences remaining as singletons. Genes involved in defence pathways such as the salicylic acid and jasmonic acid pathways as well as genes associated with the response to low oxygen caused by flooding, were identified. This is the most comprehensive study of transcripts derived from root tissue of avocado to date and will provide a useful resource for future studies. PMID:24563685

  9. De novo sequencing, assembly, and analysis of the root transcriptome of Persea americana (Mill.) in response to Phytophthora cinnamomi and flooding.

    PubMed

    Reeksting, Bianca J; Coetzer, Nanette; Mahomed, Waheed; Engelbrecht, Juanita; van den Berg, Noëlani

    2014-01-01

    Avocado is a diploid angiosperm containing 24 chromosomes with a genome estimated to be around 920 Mb. It is an important fruit crop worldwide but is susceptible to a root rot caused by the ubiquitous oomycete Phytophthora cinnamomi. Phytophthora root rot (PRR) causes damage to the feeder roots of trees, causing necrosis. This leads to branch-dieback and eventual tree death, resulting in severe losses in production. Control strategies are limited and at present an integrated approach involving the use of phosphite, tolerant rootstocks, and proper nursery management has shown the best results. Disease progression of PRR is accelerated under high soil moisture or flooding conditions. In addition, avocado is highly susceptible to flooding, with even short periods of flooding causing significant losses. Despite the commercial importance of avocado, limited genomic resources are available. Next generation sequencing has provided the means to generate sequence data at a relatively low cost, making this an attractive option for non-model organisms such as avocado. The aims of this study were to generate sequence data for the avocado root transcriptome and identify stress-related genes. Tissue was isolated from avocado infected with P. cinnamomi, avocado exposed to flooding and avocado exposed to a combination of these two stresses. Three separate sequencing runs were performed on the Roche 454 platform and produced approximately 124 Mb of data. This was assembled into 7685 contigs, with 106 448 sequences remaining as singletons. Genes involved in defence pathways such as the salicylic acid and jasmonic acid pathways as well as genes associated with the response to low oxygen caused by flooding, were identified. This is the most comprehensive study of transcripts derived from root tissue of avocado to date and will provide a useful resource for future studies.

  10. Artificial intelligence approach in analysis of DNA sequences.

    PubMed

    Brézillon, P J; Zaraté, P; Saci, F

    1993-01-01

    We present an approach for designing a knowledge-based system, called Sequence Acquisition In Context (SAIC), that will be able to cooperate with a biologist in the analysis of DNA sequences. The main task of the system is the acquisition of the expert knowledge that the biologist uses for solving ambiguities from gel autoradiograms, with the aim of re-using it later for solving similar ambiguities. The various types of expert knowledge constitute what we call the contextual knowledge of the sequence analysis. Contextual knowledge deals with the unavoidable problems that are common in the study of the living material (eg noise on data, difficulties of observations). Indeed, the analysis of DNA sequences from autoradiograms belongs to an emerging and promising area of investigation, namely reasoning with images. The SAIC project is developed in a theoretical framework that is shared with other applications. Not all tasks have the same importance in each application. We use this observation for designing an intelligent assistant system with three applications. In the SAIC project, we focus on knowledge acquisition, human-computer interaction and explanation. The project will benefit research in the two other applications. We also discuss our SAIC project in the context of large international projects that aim to re-use and share knowledge in a repository.

  12. Deep sequencing approach for investigating infectious agents causing fever.

    PubMed

    Susilawati, T N; Jex, A R; Cantacessi, C; Pearson, M; Navarro, S; Susianto, A; Loukas, A C; McBride, W J H

    2016-07-01

    Acute undifferentiated fever (AUF) poses a diagnostic challenge due to the variety of possible aetiologies. While the majority of AUFs resolve spontaneously, some cases become prolonged and cause significant morbidity and mortality, necessitating improved diagnostic methods. This study evaluated the utility of deep sequencing in fever investigation. DNA and RNA were isolated from plasma/sera of AUF cases being investigated at Cairns Hospital in northern Australia, including eight control samples from patients with a confirmed diagnosis. Following isolation, DNA and RNA were bulk amplified and RNA was reverse transcribed to cDNA. The resulting DNA and cDNA amplicons were subjected to deep sequencing on an Illumina HiSeq 2000 platform. Bioinformatics analysis was performed using the program Kraken and the CLC assembly-alignment pipeline. The results were compared with the outcomes of clinical tests. We generated between 4 and 20 million reads per sample. The results of Kraken and CLC analyses concurred with diagnoses obtained by other means in 87.5 % (7/8) and 25 % (2/8) of control samples, respectively. Some plausible causes of fever were identified in ten patients who remained undiagnosed following routine hospital investigations, including Escherichia coli bacteraemia and scrub typhus that eluded conventional tests. Achromobacter xylosoxidans, Alteromonas macleodii and Enterobacteria phage were prevalent in all samples. A deep sequencing approach of patient plasma/serum samples led to the identification of aetiological agents putatively implicated in AUFs and enabled the study of microbial diversity in human blood. The application of this approach in hospital practice is currently limited by sequencing input requirements and complicated data analysis. PMID:27180244

  13. PRO_LIGAND: an approach to de novo molecular design. 6. Flexible fitting in the design of peptides.

    PubMed

    Murray, C W; Clark, D E; Byrne, D G

    1995-10-01

    This paper describes the further development of the functionality of our in-house de novo design program, PRO_LIGAND. In particular, attention is focused on the implementation and validation of the 'directed tweak' method for the construction of conformationally flexible molecules, such as peptides, from molecular fragments. This flexible fitting method is compared to the original method based on libraries of prestored conformations for each fragment. It is shown that the directed tweak method produces results of comparable quality, with significant time savings. By removing the need to generate a set of representative conformers for any new library fragment, the flexible fitting method increases the speed and simplicity with which new fragments can be included in a fragment library and also reduces the disk space required for library storage. A further improvement to the molecular construction process within PRO_LIGAND is the inclusion of a constrained minimisation procedure which relaxes fragments onto the design model and can be used to reject highly strained structures during the structure generation phase. This relaxation is shown to be very useful in simple test cases, but restricts diversity for more realistic examples. The advantages and disadvantages of these additions to the PRO_LIGAND methodology are illustrated by three examples: similar design to an alpha helix region of dihydrofolate reductase, complementary design to the active site of HIV-1 protease and similar design to an epitope region of lysozyme. PMID:8594156

  14. An approach to pediatric exome and genome sequencing

    PubMed Central

    Biesecker, Leslie G.; Biesecker, Barbara B.

    2014-01-01

    Purpose of review: Exome and genome sequencing have recently emerged as clinical tools to resolve undiagnosed genetic conditions. Protocols are critically needed to identify proper patients for testing; select a test and laboratory; engage parents in shared decision making; and return results. Recent findings: Among well-selected patients, the likelihood of identifying the causative gene change may be as high as 30%. It is key for pediatricians to consider whether sequencing should be the primary line of pursuit of a molecular diagnosis. Parents should understand the uncertainties inherent in this sequencing and the preference-based nature of testing. Pediatricians can engage in shared decision making for this process and work to help parents make decisions consistent with their priorities and values. Upon receipt of a pathogenic mutation, discussion of the likelihood of future treatment is paramount to parents, as are the implications for recurrence within the family. Uncertainties inherent to genomic results need to be explained in the context of the likelihood of future research and discoveries. Summary: Pediatricians should make a deliberate decision with each patient whether to manage genomic testing on their own, refer the patient for such testing, or initiate the process and refer simultaneously. Regardless of which approach is taken, understanding the basics of this testing will allow the pediatrician to support the parents through the diagnostic process. PMID:25304963

  15. Breakpoint mapping by whole genome sequencing identifies PTH2R gene disruption in a patient with midline craniosynostosis and a de novo balanced chromosomal rearrangement

    PubMed Central

    Kim, Juwon; Won, Hong-Hee; Kim, Yoonjung; Choi, Jong Rak; Yu, Nae; Lee, Kyung-A

    2015-01-01

    Background: Craniosynostosis (CRS) is a premature closure of calvarial sutures caused by gene mutation, environmental factors, or interaction between the two. Only a small proportion of non-syndromic CRS (NSC) patients have a known genetic cause, and thus, it would be meaningful to search for a causative gene disruption for the development of NSC. We applied a whole genome sequencing approach on a 15-month-old boy with sagittal and metopic synostosis to identify a gene responsible for the development of the disease. Methods and results: Conventional chromosome study revealed a complex paracentric inversion involving 2q14.3 and 2q34. Array comparative genomic hybridisation did not show any copy number variation. Multicolour banding analysis was carried out and the breakpoints were refined to 2q14 and 2q34. An intronic break of the PTH2R gene was detected by whole genome sequencing and fluorescence in situ hybridisation analysis confirmed disruption of PTH2R. Conclusions: We report PTH2R as a gene that is disrupted in NSC. The disruption of the PTH2R gene may cause uncontrolled proliferation and differentiation of chondrocytes, which in turn results in premature closure of sutures. This addition of PTH2R to the list of genes associated with NSC expands our understanding of the development of NSC. PMID:26044810

  16. Development of an Electrochemistry Teaching Sequence using a Phenomenographic Approach

    NASA Astrophysics Data System (ADS)

    Rodriguez-Velazquez, Sorangel

    the core concepts from discipline-specific models and theories serve as visual tools to describe reversible redox half-reactions at equilibrium, predict the spontaneity of the electrochemical process and explain interfacial equilibrium between redox species and electrodes in solution. The integration of physics concepts into electrochemistry instruction facilitated describing the interactions between the chemical system (e.g., redox species) and the external circuit (e.g., voltmeter). The "Two worlds" theoretical framework was chosen to anchor a robust educational design where the world of objects and events is deliberately connected to the world of theories and models. The core concepts in Marcus theory and density of states (DOS) provided the scientific foundations to connect both worlds. The design of this teaching sequence involved three phases; the selection of the content to be taught, the determination of a coherent and explicit connection among concepts and the development of educational activities to engage students in the learning process. The reduction-oxidation and electrochemistry chapters of three of the most popular general chemistry textbooks were revised in order to identify potential gaps during instruction, taking into consideration learning and teaching difficulties. The electrochemistry curriculum was decomposed into manageable sections contained in modules. Thirteen modules were developed and each module addresses specific conceptions with regard to terminology, redox reactions in electrochemical cells, and the function of the external circuit in electrochemical process. The electrochemistry teaching sequence was evaluated using a phenomenographic approach. This approach allows describing the qualitative variation in instructors' consciousness about the teaching of electrochemistry. A phenomenographic analysis revealed that the most relevant aspect of variation came from instructors' expertise. 
    Participant A expertise (electrochemist) promoted in

  17. Impact significance determination--Basic considerations and a sequenced approach

    SciTech Connect

    Canter, L.W.; Canty, G.A. (Environmental and Ground Water Inst.)

    1993-09-01

    Determination of the significance of anticipated impacts of proposed projects is a key component in the overall environmental impact assessment (EIA) process. Definitions of significance and/or significant impacts are now included in the EIA guidelines or regulations of many countries and international organizations. Where possible in an EIA study, it is desirable to identify and/or establish the significance determination criteria prior to actual study conduction. This paper summarizes some findings of a survey of such definitions resulting from American, European, and other international experiences; both generic definitions and substantive area definitions are highlighted. Traditional perspectives on significance determination have involved institutional, technical, and public interest considerations. A sequenced approach for impact significance determination is described, with this approach organized around ten groups of issues or questions. Finally, the uses of significance criteria can be noted; included in such uses are: (1) determining if an environmental impact statement (EIS) will be required, or if an environmental assessment/finding of no significant impact (EA/FONSI) will suffice; (2) identifying the impacts that should be mitigated; (3) planning a baseline and/or post-EIS environmental monitoring program; and (4) documenting the interpretive rationale used in the conduction of the environmental impact study.

  18. A Likelihood-Based Framework for Variant Calling and De Novo Mutation Detection in Families

    PubMed Central

    Li, Bingshan; Chen, Wei; Zhan, Xiaowei; Busonero, Fabio; Sanna, Serena; Sidore, Carlo; Cucca, Francesco; Kang, Hyun M.; Abecasis, Gonçalo R.

    2012-01-01

    Family samples, which can be enriched for rare causal variants by focusing on families with multiple extreme individuals and which facilitate detection of de novo mutation events, provide an attractive resource for next-generation sequencing studies. Here, we describe, implement, and evaluate a likelihood-based framework for analysis of next generation sequence data in family samples. Our framework is able to identify variant sites accurately and to assign individual genotypes, and can handle de novo mutation events, increasing the sensitivity and specificity of variant calling and de novo mutation detection. Through simulations we show explicit modeling of family relationships is especially useful for analyses of low-frequency variants and that genotype accuracy increases with the number of individuals sequenced per family. Compared with the standard approach of ignoring relatedness, our methods identify and accurately genotype more variants, and have high specificity for detecting de novo mutation events. The improvement in accuracy using our methods over the standard approach is particularly pronounced for low-frequency variants. Furthermore the family-aware calling framework dramatically reduces Mendelian inconsistencies and is beneficial for family-based analysis. We hope our framework and software will facilitate continuing efforts to identify genetic factors underlying human diseases. PMID:23055937
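    The idea of modeling family relationships explicitly can be illustrated with a minimal sketch for a single parent-offspring trio. This is not the authors' software or their exact model. It assumes a biallelic site, a Hardy-Weinberg prior on the parents, a symmetric per-allele mutation rate `mu`, and per-sample genotype likelihoods that are supplied by hand; every function name and number here is hypothetical.

```python
from itertools import product


def hwe_prior(alt_freq):
    """Hardy-Weinberg prior over genotypes 0, 1, 2 (count of alt alleles)."""
    q = alt_freq
    return [(1 - q) ** 2, 2 * q * (1 - q), q * q]


def transmit_alt(parent_gt, mu):
    """Probability that a parent passes on an alt allele, allowing mutation."""
    p = parent_gt / 2  # Mendelian chance of picking an alt allele
    return p * (1 - mu) + (1 - p) * mu


def child_given_parents(gf, gm, mu):
    """P(child genotype | father genotype, mother genotype)."""
    pf, pm = transmit_alt(gf, mu), transmit_alt(gm, mu)
    return [(1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm]


def trio_posterior(lik_f, lik_m, lik_c, alt_freq=0.01, mu=1e-8):
    """Posterior over joint (father, mother, child) genotypes.

    Each lik_* is a list of P(reads | genotype) for genotypes 0, 1, 2.
    """
    prior = hwe_prior(alt_freq)
    joint = {}
    for gf, gm, gc in product(range(3), repeat=3):
        joint[(gf, gm, gc)] = (
            prior[gf] * lik_f[gf]
            * prior[gm] * lik_m[gm]
            * child_given_parents(gf, gm, mu)[gc] * lik_c[gc]
        )
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


# Hypothetical data: both parents look homozygous reference, while the
# child's low-coverage reads slightly favour a heterozygous call.
parent_lik = [1.0, 1e-6, 1e-12]
child_lik = [0.4, 0.6, 1e-6]
posterior = trio_posterior(parent_lik, parent_lik, child_lik)
best = max(posterior, key=posterior.get)
print("most probable (father, mother, child):", best)
print("posterior of a de novo het child:", posterior[(0, 0, 1)])
```

    Calling the child alone from `child_lik` would favour the heterozygous genotype, whereas the joint model weighs that call against the small prior probability of a de novo event, which is the kind of Mendelian inconsistency that family-aware calling is meant to reduce.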

  19. De novo mutations in epileptic encephalopathies.

    PubMed

    Allen, Andrew S; Berkovic, Samuel F; Cossette, Patrick; Delanty, Norman; Dlugos, Dennis; Eichler, Evan E; Epstein, Michael P; Glauser, Tracy; Goldstein, David B; Han, Yujun; Heinzen, Erin L; Hitomi, Yuki; Howell, Katherine B; Johnson, Michael R; Kuzniecky, Ruben; Lowenstein, Daniel H; Lu, Yi-Fan; Madou, Maura R Z; Marson, Anthony G; Mefford, Heather C; Esmaeeli Nieh, Sahar; O'Brien, Terence J; Ottman, Ruth; Petrovski, Slavé; Poduri, Annapurna; Ruzzo, Elizabeth K; Scheffer, Ingrid E; Sherr, Elliott H; Yuskaitis, Christopher J; Abou-Khalil, Bassel; Alldredge, Brian K; Bautista, Jocelyn F; Berkovic, Samuel F; Boro, Alex; Cascino, Gregory D; Consalvo, Damian; Crumrine, Patricia; Devinsky, Orrin; Dlugos, Dennis; Epstein, Michael P; Fiol, Miguel; Fountain, Nathan B; French, Jacqueline; Friedman, Daniel; Geller, Eric B; Glauser, Tracy; Glynn, Simon; Haut, Sheryl R; Hayward, Jean; Helmers, Sandra L; Joshi, Sucheta; Kanner, Andres; Kirsch, Heidi E; Knowlton, Robert C; Kossoff, Eric H; Kuperman, Rachel; Kuzniecky, Ruben; Lowenstein, Daniel H; McGuire, Shannon M; Motika, Paul V; Novotny, Edward J; Ottman, Ruth; Paolicchi, Juliann M; Parent, Jack M; Park, Kristen; Poduri, Annapurna; Scheffer, Ingrid E; Shellhaas, Renée A; Sherr, Elliott H; Shih, Jerry J; Singh, Rani; Sirven, Joseph; Smith, Michael C; Sullivan, Joseph; Lin Thio, Liu; Venkat, Anu; Vining, Eileen P G; Von Allmen, Gretchen K; Weisenberg, Judith L; Widdess-Walsh, Peter; Winawer, Melodie R

    2013-09-12

    Epileptic encephalopathies are a devastating group of severe childhood epilepsy disorders for which the cause is often unknown. Here we report a screen for de novo mutations in patients with two classical epileptic encephalopathies: infantile spasms (n = 149) and Lennox-Gastaut syndrome (n = 115). We sequenced the exomes of 264 probands, and their parents, and confirmed 329 de novo mutations. A likelihood analysis showed a significant excess of de novo mutations in the ∼4,000 genes that are the most intolerant to functional genetic variation in the human population (P = 2.9 × 10^-3). Among these are GABRB3, with de novo mutations in four patients, and ALG13, with the same de novo mutation in two patients; both genes show clear statistical evidence of association with epileptic encephalopathy. Given the relevant site-specific mutation rates, the probabilities of these outcomes occurring by chance are P = 4.1 × 10^-10 and P = 7.8 × 10^-12, respectively. Other genes with de novo mutations in this cohort include CACNA1A, CHD2, FLNA, GABRA1, GRIN1, GRIN2B, HNRNPU, IQSEC2, MTOR and NEDD4L. Finally, we show that the de novo mutations observed are enriched in specific gene sets including genes regulated by the fragile X protein (P < 10^-8), as has been reported previously for autism spectrum disorders.
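    The reasoning behind a chance-recurrence probability can be sketched with a simple Poisson model. This is a minimal illustration and not the authors' likelihood analysis, which uses site-specific mutation rates. The per-gene rate `MU_GENE` below is a hypothetical placeholder; only the cohort size of 264 probands and the count of four patients with GABRB3 mutations come from the abstract.

```python
import math


def expected_de_novo(n_probands, mu_gene):
    """Expected de novo count in one gene: two chromosomes per proband."""
    return 2 * n_probands * mu_gene


def poisson_tail(k, lam, n_terms=200):
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail directly.

    Direct summation avoids the cancellation in 1 - CDF when the tail is
    tiny. A fixed number of terms is adequate when lam is not large.
    """
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First term exp(-lam) * lam**k / k!, computed in log space.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    for i in range(k, k + n_terms):
        total += term
        term *= lam / (i + 1)
    return total


MU_GENE = 1e-5  # hypothetical per-gene, per-chromosome de novo rate
lam = expected_de_novo(264, MU_GENE)
p = poisson_tail(4, lam)
print(f"expected count = {lam:.5f}, P(at least 4 by chance) = {p:.2e}")
```

    Because the expected count is far below one, even a handful of independent de novo hits in the same gene is very unlikely under the null model, which is why recurrence is treated as evidence of association.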

  1. Evolution of Viral Proteins Originated De Novo by Overprinting

    PubMed Central

    Sabath, Niv; Wagner, Andreas; Karlin, David

    2012-01-01

    New protein-coding genes can originate either through modification of existing genes or de novo. Recently, the importance of de novo origination has been recognized in eukaryotes, although eukaryotic genes originated de novo are relatively rare and difficult to identify. In contrast, viruses contain many de novo genes, namely those in which an existing gene has been “overprinted” by a new open reading frame, a process that generates a new protein-coding gene overlapping the ancestral gene. We analyzed the evolution of 12 experimentally validated viral genes that originated de novo and estimated their relative ages. We found that young de novo genes have a different codon usage from the rest of the genome. They evolve rapidly and are under positive or weak purifying selection. Thus, young de novo genes might have strain-specific functions, or no function, and would be difficult to detect using current genome annotation methods that rely on the sequence signature of purifying selection. In contrast to young de novo genes, older de novo genes have a codon usage that is similar to the rest of the genome. They evolve slowly and are under stronger purifying selection. Some of the oldest de novo genes evolve under stronger selection pressure than the ancestral gene they overlap, suggesting an evolutionary tug of war between the ancestral and the de novo gene. PMID:22821011

  2. Suggested Involvement of PP1/PP2A Activity and De Novo Gene Expression in Anhydrobiotic Survival in a Tardigrade, Hypsibius dujardini, by Chemical Genetic Approach.

    PubMed

    Kondo, Koyuki; Kubo, Takeo; Kunieda, Takekazu

    2015-01-01

    Upon desiccation, some tardigrades enter an ametabolic dehydrated state called anhydrobiosis and can survive a desiccated environment in this state. For successful transition to anhydrobiosis, some anhydrobiotic tardigrades require pre-incubation under high humidity conditions, a process called preconditioning, prior to exposure to severe desiccation. Although tardigrades are thought to prepare for transition to anhydrobiosis during preconditioning, the molecular mechanisms governing such processes remain unknown. In this study, we used chemical genetic approaches to elucidate the regulatory mechanisms of anhydrobiosis in the anhydrobiotic tardigrade, Hypsibius dujardini. We first demonstrated that inhibition of transcription or translation drastically impaired anhydrobiotic survival, suggesting that de novo gene expression is required for successful transition to anhydrobiosis in this tardigrade. We then screened 81 chemicals and identified 5 chemicals that significantly impaired anhydrobiotic survival after severe desiccation, in contrast to little or no effect on survival after high humidity exposure only. In particular, cantharidic acid, a selective inhibitor of protein phosphatase (PP) 1 and PP2A, exhibited the most profound inhibitory effects. Another PP1/PP2A inhibitor, okadaic acid, also significantly and specifically impaired anhydrobiotic survival, suggesting that PP1/PP2A activity plays an important role for anhydrobiosis in this species. This is, to our knowledge, the first report of the required activities of signaling molecules for desiccation tolerance in tardigrades. The identified inhibitory chemicals could provide novel clues to elucidate the regulatory mechanisms underlying anhydrobiosis in tardigrades.

  3. Suggested Involvement of PP1/PP2A Activity and De Novo Gene Expression in Anhydrobiotic Survival in a Tardigrade, Hypsibius dujardini, by Chemical Genetic Approach

    PubMed Central

    Kondo, Koyuki; Kubo, Takeo; Kunieda, Takekazu

    2015-01-01

    Upon desiccation, some tardigrades enter an ametabolic dehydrated state called anhydrobiosis and can survive a desiccated environment in this state. For successful transition to anhydrobiosis, some anhydrobiotic tardigrades require pre-incubation under high humidity conditions, a process called preconditioning, prior to exposure to severe desiccation. Although tardigrades are thought to prepare for transition to anhydrobiosis during preconditioning, the molecular mechanisms governing such processes remain unknown. In this study, we used chemical genetic approaches to elucidate the regulatory mechanisms of anhydrobiosis in the anhydrobiotic tardigrade, Hypsibius dujardini. We first demonstrated that inhibition of transcription or translation drastically impaired anhydrobiotic survival, suggesting that de novo gene expression is required for successful transition to anhydrobiosis in this tardigrade. We then screened 81 chemicals and identified 5 chemicals that significantly impaired anhydrobiotic survival after severe desiccation, in contrast to little or no effect on survival after high humidity exposure only. In particular, cantharidic acid, a selective inhibitor of protein phosphatase (PP) 1 and PP2A, exhibited the most profound inhibitory effects. Another PP1/PP2A inhibitor, okadaic acid, also significantly and specifically impaired anhydrobiotic survival, suggesting that PP1/PP2A activity plays an important role for anhydrobiosis in this species. This is, to our knowledge, the first report of the required activities of signaling molecules for desiccation tolerance in tardigrades. The identified inhibitory chemicals could provide novel clues to elucidate the regulatory mechanisms underlying anhydrobiosis in tardigrades. PMID:26690982

  4. Data compression of discrete sequence: A tree based approach using dynamic programming

    NASA Technical Reports Server (NTRS)

    Shivaram, Gurusrasad; Seetharaman, Guna; Rao, T. R. N.

    1994-01-01

    A dynamic programming based approach for data compression of a 1D sequence is presented. The compression of an input sequence of size N to that of a smaller size k is achieved by dividing the input sequence into k subsequences and replacing the subsequences by their respective average values. The partitioning of the input sequence is carried out with the intention of reducing the mean squared error in the reconstructed sequence. The complexity involved in finding the partitions which would result in such an optimal compressed sequence is reduced by using the dynamic programming approach, which is presented.
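    The partitioning described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's own algorithm: it assumes the k subsequences are contiguous, and it minimizes the total squared error, which is equivalent to minimizing the mean squared error for a fixed N. The function name `compress` and the sample data are hypothetical.

```python
from itertools import accumulate


def compress(seq, k):
    """Split seq into k contiguous segments, each replaced by its mean,
    so that the total squared reconstruction error is minimal.

    Returns (bounds, means, error) where bounds are half-open (start, end)
    index pairs. Runs in O(k * N^2) time using prefix sums.
    """
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(seq)")

    # Prefix sums of values and squared values give O(1) segment errors.
    s1 = [0.0] + list(accumulate(seq))
    s2 = [0.0] + list(accumulate(x * x for x in seq))

    def sse(i, j):
        """Squared error of seq[i:j] around its own mean (requires j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best error for the first j items split into m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    cut[m][j] = i

    # Walk the stored cut points backwards to recover the segments.
    bounds = []
    j = n
    for m in range(k, 0, -1):
        i = cut[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    means = [(s1[j] - s1[i]) / (j - i) for i, j in bounds]
    return bounds, means, cost[k][n]


bounds, means, error = compress([1, 1, 1, 5, 5, 9, 9, 9], 3)
print(bounds, means, error)
```

    An exhaustive search over all ways to place k - 1 cut points grows combinatorially with N, whereas the table above reuses the best solution for each prefix, which is the saving that dynamic programming provides.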

  5. A time- and cost-effective strategy to sequence mammalian Y Chromosomes: an application to the de novo assembly of gorilla Y.

    PubMed

    Tomaszkiewicz, Marta; Rangavittal, Samarth; Cechova, Monika; Campos Sanchez, Rebeca; Fescemyer, Howard W; Harris, Robert; Ye, Danling; O'Brien, Patricia C M; Chikhi, Rayan; Ryder, Oliver A; Ferguson-Smith, Malcolm A; Medvedev, Paul; Makova, Kateryna D

    2016-04-01

    The mammalian Y Chromosome sequence, critical for studying male fertility and dispersal, is enriched in repeats and palindromes, and thus, is the most difficult component of the genome to assemble. Previously, expensive and labor-intensive BAC-based techniques were used to sequence the Y for a handful of mammalian species. Here, we present a much faster and more affordable strategy for sequencing and assembling mammalian Y Chromosomes of sufficient quality for most comparative genomics analyses and for conservation genetics applications. The strategy combines flow sorting, short- and long-read genome and transcriptome sequencing, and droplet digital PCR with novel and existing computational methods. It can be used to reconstruct sex chromosomes in a heterogametic sex of any species. We applied our strategy to produce a draft of the gorilla Y sequence. The resulting assembly allowed us to refine gene content, evaluate copy number of ampliconic gene families, locate species-specific palindromes, examine the repetitive element content, and produce sequence alignments with human and chimpanzee Y Chromosomes. Our results inform the evolution of the hominine (human, chimpanzee, and gorilla) Y Chromosomes. Surprisingly, we found the gorilla Y Chromosome to be similar to the human Y Chromosome, but not to the chimpanzee Y Chromosome. Moreover, we have utilized the assembled gorilla Y Chromosome sequence to design genetic markers for studying the male-specific dispersal of this endangered species.

  6. A time- and cost-effective strategy to sequence mammalian Y Chromosomes: an application to the de novo assembly of gorilla Y

    PubMed Central

    Tomaszkiewicz, Marta; Rangavittal, Samarth; Cechova, Monika; Sanchez, Rebeca Campos; Fescemyer, Howard W.; Harris, Robert; Ye, Danling; O'Brien, Patricia C.M.; Chikhi, Rayan; Ryder, Oliver A.; Ferguson-Smith, Malcolm A.; Medvedev, Paul; Makova, Kateryna D.

    2016-01-01

    The mammalian Y Chromosome sequence, critical for studying male fertility and dispersal, is enriched in repeats and palindromes, and thus, is the most difficult component of the genome to assemble. Previously, expensive and labor-intensive BAC-based techniques were used to sequence the Y for a handful of mammalian species. Here, we present a much faster and more affordable strategy for sequencing and assembling mammalian Y Chromosomes of sufficient quality for most comparative genomics analyses and for conservation genetics applications. The strategy combines flow sorting, short- and long-read genome and transcriptome sequencing, and droplet digital PCR with novel and existing computational methods. It can be used to reconstruct sex chromosomes in a heterogametic sex of any species. We applied our strategy to produce a draft of the gorilla Y sequence. The resulting assembly allowed us to refine gene content, evaluate copy number of ampliconic gene families, locate species-specific palindromes, examine the repetitive element content, and produce sequence alignments with human and chimpanzee Y Chromosomes. Our results inform the evolution of the hominine (human, chimpanzee, and gorilla) Y Chromosomes. Surprisingly, we found the gorilla Y Chromosome to be similar to the human Y Chromosome, but not to the chimpanzee Y Chromosome. Moreover, we have utilized the assembled gorilla Y Chromosome sequence to design genetic markers for studying the male-specific dispersal of this endangered species. PMID:26934921

  7. A time- and cost-effective strategy to sequence mammalian Y Chromosomes: an application to the de novo assembly of gorilla Y.

    PubMed

    Tomaszkiewicz, Marta; Rangavittal, Samarth; Cechova, Monika; Campos Sanchez, Rebeca; Fescemyer, Howard W; Harris, Robert; Ye, Danling; O'Brien, Patricia C M; Chikhi, Rayan; Ryder, Oliver A; Ferguson-Smith, Malcolm A; Medvedev, Paul; Makova, Kateryna D

    2016-04-01

    The mammalian Y Chromosome sequence, critical for studying male fertility and dispersal, is enriched in repeats and palindromes, and thus, is the most difficult component of the genome to assemble. Previously, expensive and labor-intensive BAC-based techniques were used to sequence the Y for a handful of mammalian species. Here, we present a much faster and more affordable strategy for sequencing and assembling mammalian Y Chromosomes of sufficient quality for most comparative genomics analyses and for conservation genetics applications. The strategy combines flow sorting, short- and long-read genome and transcriptome sequencing, and droplet digital PCR with novel and existing computational methods. It can be used to reconstruct sex chromosomes in a heterogametic sex of any species. We applied our strategy to produce a draft of the gorilla Y sequence. The resulting assembly allowed us to refine gene content, evaluate copy number of ampliconic gene families, locate species-specific palindromes, examine the repetitive element content, and produce sequence alignments with human and chimpanzee Y Chromosomes. Our results inform the evolution of the hominine (human, chimpanzee, and gorilla) Y Chromosomes. Surprisingly, we found the gorilla Y Chromosome to be similar to the human Y Chromosome, but not to the chimpanzee Y Chromosome. Moreover, we have utilized the assembled gorilla Y Chromosome sequence to design genetic markers for studying the male-specific dispersal of this endangered species. PMID:26934921

  8. Solving the Curriculum Sequencing Problem with DNA Computing Approach

    ERIC Educational Resources Information Center

    Debbah, Amina; Ben Ali, Yamina Mohamed

    2014-01-01

    In e-learning systems, a learning path is a sequence of learning materials linked to each other to help learners achieve their learning goals. As it is impossible to have the same learning path that suits different learners, the Curriculum Sequencing problem (CS) consists of the generation of a personalized learning path for each…

  9. Social and behavioral research in genomic sequencing: approaches from the Clinical Sequencing Exploratory Research Consortium Outcomes and Measures Working Group.

    PubMed

    Gray, Stacy W; Martins, Yolanda; Feuerman, Lindsay Z; Bernhardt, Barbara A; Biesecker, Barbara B; Christensen, Kurt D; Joffe, Steven; Rini, Christine; Veenstra, David; McGuire, Amy L

    2014-10-01

    The routine use of genomic sequencing in clinical medicine has the potential to dramatically alter patient care and medical outcomes. To fully understand the psychosocial and behavioral impact of sequencing integration into clinical practice, it is imperative that we identify the factors that influence sequencing-related decision making and patient outcomes. In an effort to develop a collaborative and conceptually grounded approach to studying sequencing adoption, members of the National Human Genome Research Institute's Clinical Sequencing Exploratory Research Consortium formed the Outcomes and Measures Working Group. Here we highlight the priority areas of investigation and psychosocial and behavioral outcomes identified by the Working Group. We also review some of the anticipated challenges to measurement in social and behavioral research related to genomic sequencing; opportunities for instrument development; and the importance of qualitative, quantitative, and mixed-method approaches. This work represents the early, shared efforts of multiple research teams as we strive to understand individuals' experiences with genomic sequencing. The resulting body of knowledge will guide recommendations for the optimal use of sequencing in clinical practice.

  10. De novo assembly of a haplotype-resolved human genome.

    PubMed

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

    2015-06-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine. PMID:26006006

  11. De novo assembly of a haplotype-resolved human genome.

    PubMed

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

    2015-06-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.

  12. SLAF-seq: an efficient method of large-scale de novo SNP discovery and genotyping using high-throughput sequencing.

    PubMed

    Sun, Xiaowen; Liu, Dongyuan; Zhang, Xiaofeng; Li, Wenbin; Liu, Hui; Hong, Weiguo; Jiang, Chuanbei; Guan, Ning; Ma, Chouxian; Zeng, Huaping; Xu, Chunhua; Song, Jun; Huang, Long; Wang, Chunmei; Shi, Junjie; Wang, Rui; Zheng, Xianhu; Lu, Cuiyun; Wang, Xiaowu; Zheng, Hongkun

    2013-01-01

    Large-scale genotyping plays an important role in genetic association studies. It has provided new opportunities for gene discovery, especially when combined with high-throughput sequencing technologies. Here, we report an efficient solution for large-scale genotyping. We call it specific-locus amplified fragment sequencing (SLAF-seq). SLAF-seq technology has several distinguishing characteristics: i) deep sequencing to ensure genotyping accuracy; ii) reduced representation strategy to reduce sequencing costs; iii) pre-designed reduced representation scheme to optimize marker efficiency; and iv) double barcode system for large populations. In this study, we tested the efficiency of SLAF-seq on rice and soybean data. Both sets of results showed strong consistency between predicted and practical SLAFs and considerable genotyping accuracy. We also report the highest density genetic map yet created for any organism without a reference genome sequence, common carp in this case, using SLAF-seq data. We detected 50,530 high-quality SLAFs with 13,291 SNPs genotyped in 211 individual carp. The genetic map contained 5,885 markers with 0.68 cM intervals on average. A comparative genomics study between common carp genetic map and zebrafish genome sequence map showed high-quality SLAF-seq genotyping results. SLAF-seq provides a high-resolution strategy for large-scale genotyping and can be generally applicable to various species and populations.

  13. De novo assembly and characterization of global transcriptome of coconut palm (Cocos nucifera L.) embryogenic calli using Illumina paired-end sequencing.

    PubMed

    Rajesh, M K; Fayas, T P; Naganeeswaran, S; Rachana, K E; Bhavyashree, U; Sajini, K K; Karun, Anitha

    2016-05-01

    Production and supply of quality planting material is significant to coconut cultivation but is one of the major constraints in coconut productivity. Rapid multiplication of coconut through in vitro techniques, therefore, is of paramount importance. Although somatic embryogenesis in coconut is a promising technique that will allow for the mass production of high quality palms, coconut is highly recalcitrant to in vitro culture. In order to overcome the bottlenecks in coconut somatic embryogenesis and to develop a repeatable protocol, it is imperative to understand, identify, and characterize molecular events involved in coconut somatic embryogenesis pathway. Transcriptome analysis (RNA-Seq) of coconut embryogenic calli, derived from plumular explants of West Coast Tall cultivar, was undertaken on an Illumina HiSeq 2000 platform. After de novo transcriptome assembly and functional annotation, we have obtained 40,367 transcripts which showed significant BLASTx matches with similarity greater than 40 % and E value of ≤10⁻⁵. Fourteen genes known to be involved in somatic embryogenesis were identified. Quantitative real-time PCR (qRT-PCR) analyses of these 14 genes were carried out in six developmental stages. The results showed that CLV was upregulated in the initial stage of callogenesis. Transcripts GLP, GST, PKL, WUS, and WRKY were expressed more in somatic embryo stage. The expression of SERK, MAPK, AP2, SAUR, ECP, AGP, LEA, and ANT was higher in the embryogenic callus stage compared to initial culture and somatic embryo stages. This study provides the first insights into the gene expression patterns during somatic embryogenesis in coconut.

  14. De novo assembly and characterization of global transcriptome of coconut palm (Cocos nucifera L.) embryogenic calli using Illumina paired-end sequencing.

    PubMed

    Rajesh, M K; Fayas, T P; Naganeeswaran, S; Rachana, K E; Bhavyashree, U; Sajini, K K; Karun, Anitha

    2016-05-01

    Production and supply of quality planting material is significant to coconut cultivation but is one of the major constraints in coconut productivity. Rapid multiplication of coconut through in vitro techniques, therefore, is of paramount importance. Although somatic embryogenesis in coconut is a promising technique that will allow for the mass production of high quality palms, coconut is highly recalcitrant to in vitro culture. In order to overcome the bottlenecks in coconut somatic embryogenesis and to develop a repeatable protocol, it is imperative to understand, identify, and characterize molecular events involved in coconut somatic embryogenesis pathway. Transcriptome analysis (RNA-Seq) of coconut embryogenic calli, derived from plumular explants of West Coast Tall cultivar, was undertaken on an Illumina HiSeq 2000 platform. After de novo transcriptome assembly and functional annotation, we have obtained 40,367 transcripts which showed significant BLASTx matches with similarity greater than 40 % and E value of ≤10⁻⁵. Fourteen genes known to be involved in somatic embryogenesis were identified. Quantitative real-time PCR (qRT-PCR) analyses of these 14 genes were carried out in six developmental stages. The results showed that CLV was upregulated in the initial stage of callogenesis. Transcripts GLP, GST, PKL, WUS, and WRKY were expressed more in somatic embryo stage. The expression of SERK, MAPK, AP2, SAUR, ECP, AGP, LEA, and ANT was higher in the embryogenic callus stage compared to initial culture and somatic embryo stages. This study provides the first insights into the gene expression patterns during somatic embryogenesis in coconut. PMID:26210639

  15. BAC-pool 454-sequencing: A rapid and efficient approach to sequence complex tetraploid cotton genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    New and emerging next generation sequencing technologies have been promising in reducing sequencing costs, but not significantly for complex polyploid plant genomes such as cotton. Large and highly repetitive genome of G. hirsutum (~2.5GB) is less amenable and cost-intensive with traditional BAC-by...

  16. De Novo Transcriptome Sequencing Analysis of cDNA Library and Large-Scale Unigene Assembly in Japanese Red Pine (Pinus densiflora)

    PubMed Central

    Liu, Le; Zhang, Shijie; Lian, Chunlan

    2015-01-01

    Japanese red pine (Pinus densiflora) is extensively cultivated in Japan, Korea, China, and Russia and is harvested for timber, pulpwood, garden, and paper markets. However, genetic information and molecular markers were very scarce for this species. In this study, over 51 million sequencing clean reads from P. densiflora mRNA were produced using Illumina paired-end sequencing technology. It yielded 83,913 unigenes with a mean length of 751 bp, of which 54,530 (64.98%) unigenes showed similarity to sequences in the NCBI database. Among these, the best matches in the NCBI Nr database were to Picea sitchensis (41.60%), Amborella trichopoda (9.83%), and Pinus taeda (4.15%). A total of 1953 putative microsatellites were identified in 1784 unigenes using MISA (MicroSAtellite) software, of which the tri-nucleotide repeats were most abundant (50.18%) and 629 EST-SSR (expressed sequence tag-simple sequence repeats) primer pairs were successfully designed. Among 20 EST-SSR primer pairs randomly chosen, 17 markers yielded amplification products of the expected size in P. densiflora. Our results will provide a valuable resource for gene-function analysis, germplasm identification, molecular marker-assisted breeding and resistance-related gene(s) mapping in P. densiflora. PMID:26690126

  17. De novo assembly and characterization of the leaf, bud, and fruit transcriptome from the vulnerable tree Juglans mandshurica for the development of 20 new microsatellite markers using Illumina sequencing.

    PubMed

    Hu, Zhuang; Zhang, Tian; Gao, Xiao-Xiao; Wang, Yang; Zhang, Qiang; Zhou, Hui-Juan; Zhao, Gui-Fang; Wang, Ma-Li; Woeste, Keith E; Zhao, Peng

    2016-04-01

    Manchurian walnut (Juglans mandshurica Maxim.) is a vulnerable, temperate deciduous tree valued for its wood and nut, but transcriptomic and genomic data for the species are very limited. Next generation sequencing (NGS) has made it possible to develop molecular markers for this species rapidly and efficiently. Our goal is to use transcriptome information from RNA-Seq to understand development in J. mandshurica and develop polymorphic simple sequence repeats (SSRs, microsatellites) to understand the species' population genetics. In this study, more than 47.7 million clean reads were generated using Illumina sequencing technology. De novo assembly yielded 99,869 unigenes with an average length of 747 bp. Based on sequence similarity search with known proteins, a total of 39,708 (42.32 %) genes were identified. Searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) identified 15,903 (16.9 %) unigenes. Further, we identified and characterized 63 new transcriptome-derived microsatellite markers. By testing the markers on 4 to 14 individuals from four populations, we found that 20 were polymorphic and easily amplified. The number of alleles per locus ranged from 2 to 8. The observed and expected heterozygosity per locus ranged from 0.209 to 0.813 and 0.335 to 0.842, respectively. These twenty microsatellite markers will be useful for studies of population genetics, diversity, and genetic structure, and they will undoubtedly benefit future breeding studies of this walnut species. Moreover, the information uncovered in this research will also serve as a useful genetic resource for understanding the transcriptome and development of J. mandshurica and other Juglans species.

  18. Towards a molecular characterization of autism spectrum disorders: an exome sequencing and systems approach.

    PubMed

    An, J Y; Cristino, A S; Zhao, Q; Edson, J; Williams, S M; Ravine, D; Wray, J; Marshall, V M; Hunt, A; Whitehouse, A J O; Claudianos, C

    2014-01-01

    The hypothetical 'AXAS' gene network model that profiles functional patterns of heterogeneous DNA variants overrepresented in autism spectrum disorder (ASD), X-linked intellectual disability, attention deficit and hyperactivity disorder and schizophrenia was used in this current study to analyze whole exome sequencing data from an Australian ASD cohort. An optimized DNA variant filtering pipeline was used to identify loss-of-function DNA variations. Inherited variants from parents with a broader autism phenotype and de novo variants were found to be significantly associated with ASD. Gene ontology analysis revealed that putative rare causal variants cluster in key neurobiological processes and are overrepresented in functions involving neuronal development, signal transduction and synapse development including the neurexin trans-synaptic complex. We also show how a complex gene network model can be used to fine map combinations of inherited and de novo variations in families with ASD that converge in the L1CAM pathway. Our results provide an important step forward in the molecular characterization of ASD with potential for developing a tool to analyze the pathogenesis of individual affected families. PMID:24893065

  19. Novel Approach to Analyzing MFE of Noncoding RNA Sequences

    PubMed Central

    George, Tina P.; Thomas, Tessamma

    2016-01-01

    Genomic studies have become noncoding RNA (ncRNA) centric after the study of different genomes provided enormous information on ncRNA over the past decades. The function of ncRNA is decided by its secondary structure, and across organisms, the secondary structure is more conserved than the sequence itself. In this study, the optimal secondary structure or the minimum free energy (MFE) structure of ncRNA was found based on the thermodynamic nearest neighbor model. MFE of over 2600 ncRNA sequences was analyzed in view of its signal properties. Mathematical models linking MFE to the signal properties were found for each of the four classes of ncRNA analyzed. MFE values computed with the proposed models were in concordance with those obtained with the standard web servers. A total of 95% of the sequences analyzed had deviation of MFE values within ±15% relative to those obtained from standard web servers. PMID:27695341

  20. Novel Approach to Analyzing MFE of Noncoding RNA Sequences

    PubMed Central

    George, Tina P.; Thomas, Tessamma

    2016-01-01

    Genomic studies have become noncoding RNA (ncRNA) centric after the study of different genomes provided enormous information on ncRNA over the past decades. The function of ncRNA is decided by its secondary structure, and across organisms, the secondary structure is more conserved than the sequence itself. In this study, the optimal secondary structure or the minimum free energy (MFE) structure of ncRNA was found based on the thermodynamic nearest neighbor model. MFE of over 2600 ncRNA sequences was analyzed in view of its signal properties. Mathematical models linking MFE to the signal properties were found for each of the four classes of ncRNA analyzed. MFE values computed with the proposed models were in concordance with those obtained with the standard web servers. A total of 95% of the sequences analyzed had deviation of MFE values within ±15% relative to those obtained from standard web servers.

  1. A glance at quality score: implication for de novo transcriptome reconstruction of Illumina reads.

    PubMed

    Mbandi, Stanley Kimbung; Hesse, Uljana; Rees, D Jasper G; Christoffels, Alan

    2014-01-01

    Downstream analyses of short reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on their associated base quality scores to retain the read or a portion of it when the score is above a predefined threshold. It is difficult to differentiate sequencing error from biological variation without a reference using quality scores. The effects of quality score based trimming have not been systematically studied in de novo transcriptome assembly. Using RNA-Seq data produced from Illumina, we teased out the effects of quality score based filtering or trimming on de novo transcriptome reconstruction. We showed that assemblies produced from reads subjected to different quality score thresholds contain truncated and missing transfrags when compared to those from untrimmed reads. Our data support the fact that de novo assembling of untrimmed data is challenging for de Bruijn graph assemblers. However, our results indicate that comparing the assemblies from untrimmed and trimmed read subsets can suggest appropriate filtering parameters and enable selection of the optimum de novo transcriptome assembly in non-model organisms.
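
    The threshold-based pre-processing step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: it assumes Phred+33 quality encoding, trims low-scoring bases from the 3' end only, and discards reads that become shorter than a minimum length. The names phred_scores and trim_read and the default values are hypothetical.

```python
from typing import List, Optional, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_read(
    seq: str,
    quality: str,
    threshold: int = 20,   # hypothetical default, not from the study
    min_length: int = 25,  # hypothetical default, not from the study
    offset: int = 33,
) -> Optional[Tuple[str, str]]:
    """Remove trailing bases whose score is below the threshold.

    Returns the trimmed (sequence, quality) pair, or None when the read
    ends up shorter than min_length and is therefore filtered out.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality, offset)
    end = len(scores)
    # Walk in from the 3' end until a base meets the threshold.
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_length:
        return None
    return seq[:end], quality[:end]
```

    With this scheme, raising the threshold shortens or removes more reads, which is the kind of trade-off the abstract reports as producing truncated or missing transfrags.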

  2. A glance at quality score: implication for de novo transcriptome reconstruction of Illumina reads.

    PubMed

    Mbandi, Stanley Kimbung; Hesse, Uljana; Rees, D Jasper G; Christoffels, Alan

    2014-01-01

    Downstream analyses of short reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on their associated base quality scores to retain the read or a portion of it when the score is above a predefined threshold. It is difficult to differentiate sequencing error from biological variation without a reference using quality scores. The effects of quality score based trimming have not been systematically studied in de novo transcriptome assembly. Using RNA-Seq data produced from Illumina, we teased out the effects of quality score based filtering or trimming on de novo transcriptome reconstruction. We showed that assemblies produced from reads subjected to different quality score thresholds contain truncated and missing transfrags when compared to those from untrimmed reads. Our data support the fact that de novo assembling of untrimmed data is challenging for de Bruijn graph assemblers. However, our results indicate that comparing the assemblies from untrimmed and trimmed read subsets can suggest appropriate filtering parameters and enable selection of the optimum de novo transcriptome assembly in non-model organisms. PMID:24575122

  3. Replicating Potato spindle tuber viroid mediates de novo methylation of an intronic viroid sequence but no cleavage of the corresponding pre-mRNA.

    PubMed

    Dalakouras, Athanasios; Dadami, Elena; Bassler, Alexandra; Zwiebel, Michele; Krczal, Gabi; Wassenegger, Michael

    2015-01-01

    In plants, Potato spindle tuber viroid (PSTVd) replication triggers post-transcriptional gene silencing (PTGS) and RNA-directed DNA methylation (RdDM) of homologous RNA and DNA sequences, respectively. PTGS predominantly occurs in the cytoplasm, but nuclear PTGS has been also reported. In this study, we investigated whether the nuclear replicating PSTVd is able to trigger nuclear PTGS. Transgenic tobacco plants carrying cytoplasmic and nuclear PTGS sensor constructs were PSTVd-infected resulting in the generation of abundant PSTVd-derived small interfering RNAs (vd-siRNAs). Northern blot analysis revealed that, in contrast to the cytoplasmic sensor, the nuclear sensor transcript was not targeted for RNA degradation. Bisulfite sequencing analysis showed that the nuclear PTGS sensor transgene was efficiently targeted for RdDM. Our data suggest that PSTVd fails to trigger nuclear PTGS, and that RdDM and nuclear PTGS are not necessarily coupled. PMID:25826660

  4. de novo analysis and functional classification of the transcriptome of the root lesion nematode, Pratylenchus thornei, after 454 GS FLX sequencing.

    PubMed

    Nicol, Paul; Gill, Reetinder; Fosu-Nyarko, John; Jones, Michael G K

    2012-01-01

    The migratory endoparasitic root lesion nematode Pratylenchus thornei is a major pest of the cereals wheat and barley. In what we believe to be the first global transcriptome analysis for P. thornei, using Roche GS FLX sequencing, 787,275 reads were assembled into 34,312 contigs using two assembly programs, to yield 6,989 contigs common to both. These contigs were annotated, resulting in functional assignments for 3,048. Specific transcripts studied in more detail included carbohydrate active enzymes potentially involved in cell wall degradation, neuropeptides, putative plant nematode parasitism genes, and transcripts that could be secreted by the nematode. Transcripts for cell wall degrading enzymes were similar to bacterial genes, suggesting that they were acquired by horizontal gene transfer. Contigs matching 14 parasitism genes found in sedentary endoparasitic nematodes were identified. These genes are thought to function in suppression of host defenses and in feeding site development, but their function in P. thornei may differ. Comparison of the common contigs from P. thornei with other nematodes showed that 2,039 were common to sequences of the Heteroderidae, 1,947 to the Meloidogynidae, 1,218 to Radopholus similis, 1,209 matched expressed sequence tags (ESTs) of Pratylenchus penetrans and Pratylenchus vulnus, and 2,940 to contigs of Pratylenchus coffeae. There were 2,014 contigs common to Caenorhabditis elegans, with 15.9% being common to all three groups. Twelve percent of contigs with matches to the Heteroderidae and the Meloidogynidae had no homology to any C. elegans protein. Fifty-seven percent of the contigs did not match known sequences and some could be unique to P. thornei. These data provide substantial new information on the transcriptome of P. thornei, those genes common to migratory and sedentary endoparasitic nematodes, and provide additional understanding of genes required for different forms of parasitism. The data can also be used to

  5. Strategic Cognitive Sequencing: A Computational Cognitive Neuroscience Approach

    PubMed Central

    Herd, Seth A.; Krueger, Kai A.; Kriete, Trenton E.; Huang, Tsung-Ren; Hazy, Thomas E.; O'Reilly, Randall C.

    2013-01-01

    We address strategic cognitive sequencing, the “outer loop” of human cognition: how the brain decides what cognitive process to apply at a given moment to solve complex, multistep cognitive tasks. We argue that this topic has been neglected relative to its importance for systematic reasons but that recent work on how individual brain systems accomplish their computations has set the stage for productively addressing how brain regions coordinate over time to accomplish our most impressive thinking. We present four preliminary neural network models. The first addresses how the prefrontal cortex (PFC) and basal ganglia (BG) cooperate to perform trial-and-error learning of short sequences; the next, how several areas of PFC learn to make predictions of likely reward, and how this contributes to the BG making decisions at the level of strategies. The third model addresses how PFC, BG, parietal cortex, and hippocampus can work together to memorize sequences of cognitive actions from instruction (or “self-instruction”). The last shows how a constraint satisfaction process can find useful plans. The PFC maintains current and goal states and associates from both of these to find a “bridging” state, an abstract plan. We discuss how these processes could work together to produce strategic cognitive sequencing and discuss future directions in this area. PMID:23935605

  6. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
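
    The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to measure correlations in a nucleotide sequence: a "DNA walk" and the scaling of its root-mean-square fluctuation F(l) with window length l. The purine/pyrimidine step rule, the window lengths, and all function names are assumptions made for illustration, not the authors' implementation. An exponent near 0.5 indicates no long-range correlation; larger values suggest long-range correlation.

```python
import math
import random

def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for anything else (assumed rule)."""
    pos, walk = 0, [0]
    for base in seq.upper():
        pos += 1 if base in "CT" else -1
        walk.append(pos)
    return walk

def fluctuation(walk, l):
    """Root-mean-square fluctuation of the walk displacement over windows of length l."""
    deltas = [walk[i + l] - walk[i] for i in range(len(walk) - l)]
    mean = sum(deltas) / len(deltas)
    return math.sqrt(sum((d - mean) ** 2 for d in deltas) / len(deltas))

def scaling_exponent(seq, lengths=(4, 8, 16, 32, 64)):
    """Least-squares slope of log F(l) against log l."""
    walk = dna_walk(seq)
    xs = [math.log(l) for l in lengths]
    ys = [math.log(fluctuation(walk, l)) for l in lengths]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Toy input: an uncorrelated random sequence (not real genomic data).
rng = random.Random(0)
random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
alpha = scaling_exponent(random_seq)
```

    A scan for coding regions could, under these assumptions, slide a window along a chromosome and flag stretches whose local exponent stays close to 0.5.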

  7. Stage Sequences of Adolescent Substance Use: A Prospective Longitudinal Approach.

    ERIC Educational Resources Information Center

    Collins, Linda M; And Others

    Past research, based primarily on cross-sectional data, has suggested a Guttman scale of substance use onset where marijuana use is preceded by alcohol use, and where sometimes tobacco is placed between alcohol and marijuana. Prospective longitudinal data is needed to determine whether the stages form a temporal sequence. A new statistical…

  8. De novo transcriptome sequence assembly and identification of AP2/ERF transcription factor related to abiotic stress in parsley (Petroselinum crispum).

    PubMed

    Li, Meng-Yao; Tan, Hua-Wei; Wang, Feng; Jiang, Qian; Xu, Zhi-Sheng; Tian, Chang; Xiong, Ai-Sheng

    2014-01-01

    Parsley is an important biennial Apiaceae species that is widely cultivated as herb, spice, and vegetable. Previous studies on parsley principally focused on its physiological and biochemical properties, including phenolic compound and volatile oil contents. However, little is known about the molecular and genetic properties of parsley. In this study, 23,686,707 high-quality reads were obtained and assembled into 81,852 transcripts and 50,161 unigenes for the first time. Functional annotation showed that 30,516 unigenes had sequence similarity to known genes. In addition, 3,244 putative simple sequence repeats were detected in curly parsley. Finally, 1,569 of the identified unigenes belonged to 58 transcription factor families. Various abiotic stresses have a strong detrimental effect on the yield and quality of parsley. AP2/ERF transcription factors have important functions in plant development, hormonal regulation, and abiotic response. A total of 88 putative AP2/ERF factors were identified from the transcriptome sequence of parsley. Seven AP2/ERF transcription factors were selected in this study to analyze the expression profiles of parsley under different abiotic stresses. Our data provide a potentially valuable resource that can be used for intensive parsley research.

  9. Mining biomass-degrading genes through Illumina-based de novo sequencing and metagenomic analysis of free-living bacteria in the gut of the lower termite Coptotermes gestroi harvested in Vietnam.

    PubMed

    Do, Thi Huyen; Nguyen, Thi Thao; Nguyen, Thanh Ngoc; Le, Quynh Giang; Nguyen, Cuong; Kimura, Keitarou; Truong, Nam Hai

    2014-12-01

    The 5.6 Gb metagenome of free-living microbial flora in the gut of the lower termite Coptotermes gestroi, harvested in Vietnam, was sequenced using Illumina technology. Genes related to biomass degradation were mined for a better understanding of biomass digestion in the termite gut and to identify lignocellulolytic enzymes applicable to biofuel production. The sequencing generated 5.4 Gb of useful reads, containing 125,431 ORFs spanning 78,271,365 bp, 80% of which was derived from bacteria. The 12 most abundant bacterial orders were Spirochaetales, Lactobacillales, Bacteroidales, Clostridiales, Enterobacteriales, Pseudomonades, Synergistales, Desulfovibrionales, Xanthomonadales, Burkholderiales, Bacillales, and Actinomycetales, and 1460 species were estimated. Of more than 12,000 ORFs with predicted functions related to carbohydrate metabolism, 587 encoding hydrolytic enzymes for cellulose, hemicellulose, and pectin were identified. Among them, 316 ORFs were related to cellulose degradation, and included β-glucosidases, 6-phospho-β-glucosidases, licheninases, glucan endo-1,3-β-D-glucosidases, endoglucanases, cellulose 1,4-β-cellobiosidases, glucan 1,3-β-glucosidases, and cellobiose phosphorylases. In addition, 259 ORFs were related to hemicellulose degradation, encoding endo-1,4-β-xylanases, α-galactosidases, α-N-arabinofuranosidases, xylan 1,4-β-xylosidases, arabinan endo-1,5-α-L-arabinosidases, endo-1,4-β-mannanases, and α-glucuronidases. Twelve ORFs encoding pectinesterases and pectate lyases were also obtained. To our knowledge, this is the first successful application of Illumina-based de novo sequencing for the analysis of a free-living bacterial community in the gut of a lower termite C. gestroi and for mining genes related to lignocellulose degradation from the gut bacteria.

  10. An observational approach to convection in main sequence stars

    NASA Astrophysics Data System (ADS)

    Régulo, C.; Vázquez Ramió, H.; Roca Cortés, T.

    2005-12-01

    Observational results concerning possible changes in the granulation of Main Sequence stars were found by analysing their seismic power spectra obtained from photometric microvariability. We analysed as many as 178 stars with spectral types F, G, K, and M observed for 54 days. We present evidence of changes in the lifetime and contrast of the granulation, which both increase from F to M stars, although within the limit of resolution.

  11. Genomic DNA enrichment using sequence capture microarrays: a novel approach to discover sequence nucleotide polymorphisms (SNP) in Brassica napus L.

    PubMed

    Clarke, Wayne E; Parkin, Isobel A; Gajardo, Humberto A; Gerhardt, Daniel J; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G; Snowdon, Rod J; Federico, Maria L; Iniguez-Luy, Federico L

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci -QTL- analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species.
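
    The transition/transversion split reported above follows from a simple rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. The sketch below illustrates that rule only; the function names and the toy SNP list are hypothetical and are not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def transition_fraction(snps):
    """Fraction of (ref, alt) pairs that are transitions."""
    calls = [classify_snp(ref, alt) for ref, alt in snps]
    return calls.count("transition") / len(calls)

# Toy data for illustration only.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "T"), ("G", "C")]
fraction = transition_fraction(toy_snps)
```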

  12. De novo Transcriptome Generation and Annotation for Two Korean Endemic Land Snails, Aegista chejuensis and Aegista quelpartensis, Using Illumina Paired-End Sequencing Technology

    PubMed Central

    Kang, Se Won; Patnaik, Bharat Bhusan; Hwang, Hee-Ju; Park, So Young; Wang, Tae Hun; Park, Eun Bi; Chung, Jong Min; Song, Dae Kwon; Patnaik, Hongray Howrelia; Lee, Jae Bong; Kim, Changmu; Kim, Soonok; Park, Hong Seog; Lee, Jun Sang; Han, Yeon Soo; Lee, Yong Seok

    2016-01-01

    Aegista chejuensis and Aegista quelpartensis (Family-Bradybaenidae) are endemic to Korea, and are considered vulnerable due to declines in their population. The limited genetic resources for these species restrict the ability to prioritize conservation efforts. We sequenced the transcriptomes of these species using Illumina paired-end technology. Approximately 257 and 240 million reads were obtained and assembled into 198,531 and 230,497 unigenes for A. chejuensis and A. quelpartensis, respectively. The average and N50 unigene lengths were 735.4 and 1073 bp, respectively, for A. chejuensis, and 705.6 and 1001 bp, respectively, for A. quelpartensis. In total, 68,484 (34.5%) and 77,745 (33.73%) unigenes for A. chejuensis and A. quelpartensis, respectively, were annotated to databases. Gene Ontology terms were assigned to 23,778 (11.98%) and 26,396 (11.45%) unigenes, for A. chejuensis and A. quelpartensis, respectively, while 5050 and 5838 unigenes were mapped to 117 and 124 pathways in the Kyoto Encyclopedia of Genes and Genomes database. In addition, we identified and annotated 9542 and 10,395 putative simple sequence repeats (SSRs) in unigenes from A. chejuensis and A. quelpartensis, respectively. We designed a list of PCR primers flanking the putative SSR regions. These microsatellites may be utilized for future phylogenetics and conservation initiatives. PMID:26999110
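
    Average and N50 lengths are the summary statistics quoted for assemblies throughout these records. As a minimal sketch, assuming the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases), N50 can be computed as follows. The function names and toy lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")

def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)

# Toy unigene lengths in bp, for illustration only.
toy_lengths = [500, 700, 1000, 1500, 2000]
toy_n50 = n50(toy_lengths)
toy_mean = mean_length(toy_lengths)
```

    Because N50 weights long sequences more heavily, it typically exceeds the mean length, as it does in the figures reported for both snail species.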

  13. De-novo RNA sequencing and metabolite profiling to identify genes involved in anthocyanin biosynthesis in Korean black raspberry (Rubus coreanus Miquel).

    PubMed

    Hyun, Tae Kyung; Lee, Sarah; Rim, Yeonggil; Kumar, Ritesh; Han, Xiao; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean

    2014-01-01

    The Korean black raspberry (Rubus coreanus Miquel, KB) on ripening is usually consumed as fresh fruit, whereas the unripe KB has been widely used as a source of traditional herbal medicine. Such a stage specific utilization of KB has been assumed due to the changing metabolite profile during fruit ripening process, but so far molecular and biochemical changes during its fruit maturation are poorly understood. To analyze biochemical changes during fruit ripening process at molecular level, firstly, we have sequenced, assembled, and annotated the transcriptome of KB fruits. Over 4.86 Gb of normalized cDNA prepared from fruits was sequenced using Illumina HiSeq™ 2000, and assembled into 43,723 unigenes. Secondly, we have reported that alterations in anthocyanins and proanthocyanidins are the major factors facilitating variations in these stages of fruits. In addition, up-regulation of F3'H1, DFR4 and LDOX1 resulted in the accumulation of cyanidin derivatives during the ripening process of KB, indicating the positive relationship between the expression of anthocyanin biosynthetic genes and the anthocyanin accumulation. Furthermore, the ability of RcMCHI2 (R. coreanus Miquel chalcone flavanone isomerase 2) gene to complement Arabidopsis transparent testa 5 mutant supported the feasibility of our transcriptome library to provide the gene resources for improving plant nutrition and pigmentation. Taken together, these datasets obtained from transcriptome library and metabolic profiling would be helpful to define the gene-metabolite relationships in this non-model plant.

  14. De-novo RNA Sequencing and Metabolite Profiling to Identify Genes Involved in Anthocyanin Biosynthesis in Korean Black Raspberry (Rubus coreanus Miquel)

    PubMed Central

    Rim, Yeonggil; Kumar, Ritesh; Han, Xiao; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean

    2014-01-01

    The Korean black raspberry (Rubus coreanus Miquel, KB) on ripening is usually consumed as fresh fruit, whereas the unripe KB has been widely used as a source of traditional herbal medicine. Such a stage specific utilization of KB has been assumed due to the changing metabolite profile during fruit ripening process, but so far molecular and biochemical changes during its fruit maturation are poorly understood. To analyze biochemical changes during fruit ripening process at molecular level, firstly, we have sequenced, assembled, and annotated the transcriptome of KB fruits. Over 4.86 Gb of normalized cDNA prepared from fruits was sequenced using Illumina HiSeq™ 2000, and assembled into 43,723 unigenes. Secondly, we have reported that alterations in anthocyanins and proanthocyanidins are the major factors facilitating variations in these stages of fruits. In addition, up-regulation of F3′H1, DFR4 and LDOX1 resulted in the accumulation of cyanidin derivatives during the ripening process of KB, indicating the positive relationship between the expression of anthocyanin biosynthetic genes and the anthocyanin accumulation. Furthermore, the ability of RcMCHI2 (R. coreanus Miquel chalcone flavanone isomerase 2) gene to complement Arabidopsis transparent testa 5 mutant supported the feasibility of our transcriptome library to provide the gene resources for improving plant nutrition and pigmentation. Taken together, these datasets obtained from transcriptome library and metabolic profiling would be helpful to define the gene-metabolite relationships in this non-model plant. PMID:24505466

  15. Sequencing and Computational Approaches to Identification and Characterization of Microbial Organisms

    PubMed Central

    Yadav, Brijesh Singh; Ronda, Venkateswarlu; Vashista, Dinesh P; Sharma, Bhaskar

    2013-01-01

    The recent advances in sequencing technologies and computational approaches are propelling scientists ever closer towards complete understanding of human-microbial interactions. The powerful sequencing platforms are rapidly producing huge amounts of nucleotide sequence data which are compiled into huge databases. This sequence data can be retrieved, assembled, and analyzed for identification of microbial pathogens and diagnosis of diseases. In this article, we present a commentary on how the metagenomics incorporated with microarray and new sequencing techniques are helping microbial detection and characterization. PMID:25288901

  16. Next generation sequencing and de novo transcriptome analysis of Costus pictus D. Don, a non-model plant with potent anti-diabetic properties

    PubMed Central

    2012-01-01

    Background: Phyto-remedies for diabetic control are popular among patients with Type II Diabetes mellitus (DM), in addition to other diabetic control measures. A number of plant species are known to possess diabetic control properties. Costus pictus D. Don is popularly known as “Insulin Plant” in Southern India whose leaves have been reported to increase insulin pools in blood plasma. Next Generation Sequencing is employed as a powerful tool for identifying molecular signatures in the transcriptome related to physiological functions of plant tissues. We sequenced the leaf transcriptome of C. pictus using Illumina reversible dye terminator sequencing technology and used combination of bioinformatics tools for identifying transcripts related to anti-diabetic properties of C. pictus. Results: A total of 55,006 transcripts were identified, of which 69.15% transcripts could be annotated. We identified transcripts related to pathways of bixin biosynthesis and geraniol and geranial biosynthesis as major transcripts from the class of isoprenoid secondary metabolites and validated the presence of putative norbixin methyltransferase, a precursor of Bixin. The transcripts encoding these terpenoids are known to be Peroxisome Proliferator-Activated Receptor (PPAR) agonists and anti-glycation agents. Sequential extraction and High Performance Liquid Chromatography (HPLC) confirmed the presence of bixin in C. pictus methanolic extracts. Another significant transcript identified in relation to anti-diabetic, anti-obesity and immuno-modulation is of Abscisic Acid biosynthetic pathway. We also report many other transcripts for the biosynthesis of antitumor, anti-oxidant and antimicrobial metabolites of C. pictus leaves. Conclusion: Solid molecular signatures (transcripts related to bixin, abscisic acid, and geranial and geraniol biosynthesis) for the anti-diabetic properties of C. pictus leaves and vital clues related to the other phytochemical functions like antitumor, anti

  17. Informed kmer selection for de novo transcriptome assembly

    PubMed Central

    Durai, Dilip A.; Schulz, Marcel H.

    2016-01-01

    Motivation: De novo transcriptome assembly is an integral part for many RNA-seq workflows. Common applications include sequencing of non-model organisms, cancer or meta transcriptomes. Most de novo transcriptome assemblers use the de Bruijn graph (DBG) as the underlying data structure. The quality of the assemblies produced by such assemblers is highly influenced by the exact word length k. As such no single kmer value leads to optimal results. Instead, DBGs over different kmer values are built and the assemblies are merged to improve sensitivity. However, no studies have investigated thoroughly the problem of automatically learning at which kmer value to stop the assembly. Instead a suboptimal selection of kmer values is often used in practice. Results: Here we investigate the contribution of a single kmer value in a multi-kmer based assembly approach. We find that a comparative clustering of related assemblies can be used to estimate the importance of an additional kmer assembly. Using a model fit based algorithm we predict the kmer value at which no further assemblies are necessary. Our approach is tested with different de novo assemblers for datasets with different coverage values and read lengths. Further, we suggest a simple post processing step that significantly improves the quality of multi-kmer assemblies. Conclusion: We provide an automatic method for limiting the number of kmer values without a significant loss in assembly quality but with savings in assembly time. This is a step forward to making multi-kmer methods more reliable and easier to use. Availability and Implementation: A general implementation of our approach can be found under: https://github.com/SchulzLab/KREATION. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: mschulz@mmci.uni-saarland.de PMID:27153653
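
    To make the role of the word length k concrete, the sketch below builds a toy de Bruijn graph from a few reads for several k values. It illustrates only the underlying data structure; it does not reproduce the clustering or model-fit stopping criterion described in the abstract, and it is not KREATION's code. The function names and the toy reads are hypothetical.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def edge_count(graph):
    """Number of edges, which equals the number of distinct k-mers observed."""
    return sum(len(successors) for successors in graph.values())

# Toy reads for illustration only.
toy_reads = ["ACGTACGA", "GTACGATT"]
summary = {}
for k in (3, 4, 5):
    g = de_bruijn_graph(toy_reads, k)
    summary[k] = (len(g), edge_count(g))  # (nodes with outgoing edges, edges)
```

    Small k tends to merge repeated words into shared nodes, while large k resolves repeats but breaks the graph where coverage is low, which is why assemblers often combine several k values.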

  18. De Novo Sequencing and Analysis of the Safflower Transcriptome to Discover Putative Genes Associated with Safflor Yellow in Carthamus tinctorius L.

    PubMed Central

    Liu, Xiuming; Dong, Yuanyuan; Yao, Na; Zhang, Yu; Wang, Nan; Cui, Xiyan; Li, Xiaowei; Wang, Yanfang; Wang, Fawei; Yang, Jing; Guan, Lili; Du, Linna; Li, Haiyan; Li, Xiaokun

    2015-01-01

    Safflower (Carthamus tinctorius L.), an important traditional Chinese medicine, is cultured widely for its pharmacological effects, but little is known regarding the genes related to the metabolic regulation of the safflower’s yellow pigment. To investigate genes related to safflor yellow biosynthesis, 454 pyrosequencing of flower RNA at different developmental stages was performed, generating large databases. In this study, we analyzed 454 sequencing data from different flowering stages in safflower. In total, 1,151,324 raw reads and 1,140,594 clean reads were produced, which were assembled into 51,591 unigenes with an average length of 679 bp and a maximum length of 5109 bp. Among the unigenes, 40,139 were in the early group, 39,768 were obtained from the full group and 28,316 were detected in both samples. With the threshold of “log2 ratio ≥ 1”, there were 34,464 differentially expressed genes, of which 18,043 were up-regulated and 16,421 were down-regulated in the early flower library. Based on the annotations of the unigenes, 281 pathways were predicted. We selected 12 putative genes and analyzed their expression levels using quantitative real time-PCR. The results were consistent with the 454 sequencing results. In addition, the expression of chalcone synthase, chalcone isomerase and anthocyanidin synthase, which are involved in safflor yellow biosynthesis and safflower yellow pigment (SYP) content, were analyzed in different flowering periods, indicating that their expression levels were related to SYP synthesis. Moreover, to further confirm the results of the 454 pyrosequencing, full-length cDNA of chalcone isomerase (CHI) and anthocyanidin synthase (ANS) were cloned from safflower petal by RACE (Rapid-amplification of cDNA ends) method according to fragment of the transcriptome. PMID:26516840
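
    A "log2 ratio ≥ 1" threshold corresponds to at least a twofold difference in expression between two libraries. The sketch below shows the arithmetic under two assumptions that the abstract does not state: that the threshold is applied to the absolute value of the ratio, and that a pseudocount of 1 is added to avoid division by zero. Real pipelines normalize read counts (for library size and transcript length) before this step; the function names here are hypothetical.

```python
import math

def log2_ratio(count_a, count_b, pseudocount=1.0):
    """log2 of the expression ratio a/b, with an assumed pseudocount."""
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))

def classify(count_early, count_full, threshold=1.0):
    """Label a gene as up, down, or unchanged in the early library."""
    ratio = log2_ratio(count_early, count_full)
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"

# Toy normalized counts (early library, full library), for illustration only.
toy_counts = {"geneA": (30, 10), "geneB": (3, 15), "geneC": (10, 10)}
calls = {gene: classify(early, full) for gene, (early, full) in toy_counts.items()}
```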

  19. [De novo sequencing and analysis of root transcriptome to reveal regulation of gene expression by moderate drought stress in Glycyrrhiza uralensis].

    PubMed

    Zhang, Chun-rong; Sang, Xue-yu; Qu, Meng; Tang, Xiao-min; Cheng, Xuan-xuan; Pan, Li-ming; Yang, Quan

    2015-12-01

    Moderate drought stress has been found to promote the accumulation of active ingredients in Glycyrrhiza uralensis root and hence improve the medicinal quality. In this study, the transcriptomes of 6-month-old moderate drought stressed and control G. uralensis root (the relative water content in soil was 40%-45% and 70%-75%, respectively) were sequenced using Illumina HiSeq 2000. A total of 80,490,490 and 82,588,278 clean reads, and 94,828 and 305,100 unigenes with N50 values of 1,007 and 1,125 nt, were obtained in the drought-treated and control transcriptomes, respectively. Differentially expressed genes analysis revealed that the genes of some cell wall enzymes such as β-xylosidase, legumain and GDP-L-fucose synthase were down-regulated, indicating that moderate drought stress might inhibit the primary cell wall degradation and programmed cell death in root cells. The up-regulation by moderate drought stress of the genes of some key enzymes involved in terpenoid and flavonoid biosynthesis might be the reason for the enhanced accumulation of active ingredients in G. uralensis root. The promotion of the biosynthesis and signal transduction of auxin, ethylene and cytokinins by moderate drought stress might enhance the root formation and cell proliferation. The promotion of the biosynthesis and signal transduction of abscisic acid and jasmonic acid by moderate drought stress might enhance the drought stress tolerance in G. uralensis. The inhibition of the biosynthesis and signal transduction of gibberellin and brassinolide by moderate drought stress might retard the shoot growth in G. uralensis. PMID:27245028

  20. De Novo Sequencing and Analysis of the Safflower Transcriptome to Discover Putative Genes Associated with Safflor Yellow in Carthamus tinctorius L.

    PubMed

    Liu, Xiuming; Dong, Yuanyuan; Yao, Na; Zhang, Yu; Wang, Nan; Cui, Xiyan; Li, Xiaowei; Wang, Yanfang; Wang, Fawei; Yang, Jing; Guan, Lili; Du, Linna; Li, Haiyan; Li, Xiaokun

    2015-10-26

    Safflower (Carthamus tinctorius L.), an important traditional Chinese medicine, is cultured widely for its pharmacological effects, but little is known regarding the genes related to the metabolic regulation of the safflower's yellow pigment. To investigate genes related to safflor yellow biosynthesis, 454 pyrosequencing of flower RNA at different developmental stages was performed, generating large databases. In this study, we analyzed 454 sequencing data from different flowering stages in safflower. In total, 1,151,324 raw reads and 1,140,594 clean reads were produced, which were assembled into 51,591 unigenes with an average length of 679 bp and a maximum length of 5109 bp. Among the unigenes, 40,139 were in the early group, 39,768 were obtained from the full group and 28,316 were detected in both samples. With the threshold of "log2 ratio ≥ 1", there were 34,464 differentially expressed genes, of which 18,043 were up-regulated and 16,421 were down-regulated in the early flower library. Based on the annotations of the unigenes, 281 pathways were predicted. We selected 12 putative genes and analyzed their expression levels using quantitative real time-PCR. The results were consistent with the 454 sequencing results. In addition, the expression of chalcone synthase, chalcone isomerase and anthocyanidin synthase, which are involved in safflor yellow biosynthesis and safflower yellow pigment (SYP) content, were analyzed in different flowering periods, indicating that their expression levels were related to SYP synthesis. Moreover, to further confirm the results of the 454 pyrosequencing, full-length cDNA of chalcone isomerase (CHI) and anthocyanidin synthase (ANS) were cloned from safflower petal by RACE (Rapid-amplification of cDNA ends) method according to fragment of the transcriptome.

  1. De novo transcriptome sequencing of the snail Echinolittorina malaccana: identification of genes responsive to thermal stress and development of genetic markers for population studies.

    PubMed

    Wang, Wei; Hui, Jerome H L; Chan, Ting Fung; Chu, Ka Hou

    2014-10-01

    Echinolittorina snails inhabit the upper intertidal rocky shore and face strong selection pressures from thermal extremes and fluctuations. Revealing the molecular processes of adaptive significance is greatly obstructed by the scarcity of genomic resources for these taxa. Here, we report the first comprehensive transcriptome dataset for the genus Echinolittorina. Using the Illumina HiSeq 2000 platform, about 52 M and 54 M paired-end clean reads were generated for the control and heat-stressed libraries, respectively. In total, 115,211 unique transcript fragments (unigenes) were assembled, with an average length of 453 bp and an N50 size of 492 bp. Approximately one third of the unigenes could be annotated according to their homology matches against the Nr, Swiss-Prot, COG, or KEGG databases, and they were found to represent 23,098 non-redundant genes. Gene expression comparison revealed that 1,267 and 6,663 annotated genes were, respectively, up- and downregulated with at least twofold changes upon heat stress. Gene Ontology and KEGG pathway analyses indicated that genes were overrepresented in a broad spectrum of biological processes and pathways, including those associated with cytoskeleton organization, developmental regulation, signaling transduction, infection, and cardiac function. In addition, a transcriptome-wide search for polymorphic loci yielded a total of 11,228 simple sequence repeats (SSRs) from 9,938 unigenes and 138,631 single nucleotide polymorphism (SNP) and insertion/deletion (INDEL) sites among 22,770 unigenes. The large number of transcript sequences acquired, the biological pathways identified, and the candidate microsatellite and SNP/INDEL loci discovered in the study will serve as valuable resources for further investigations of genetic differentiation and thermal adaptation among populations.
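
    A transcriptome-wide search for simple sequence repeats amounts to scanning each unigene for short motifs repeated in tandem. The abstract does not name the tool or thresholds used, so the following is only a sketch with assumed settings: motifs of 2 to 6 bases repeated at least five times. The function name and test sequence are hypothetical.

```python
import re

def find_ssrs(seq, min_repeats=5):
    """Return (start, motif, copies) for tandem repeats of 2-6 base motifs.

    The lazy quantifier prefers the shortest motif, so a homopolymer run
    such as AAAAAAAAAA is reported as a repeat of 'AA'; dedicated SSR
    tools handle mononucleotide runs and compound repeats separately.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# Toy sequence for illustration only: six copies of AC flanked by other bases.
toy_hits = find_ssrs("TTG" + "AC" * 6 + "GGT")
```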

  2. Whole Genome Duplication and Enrichment of Metal Cation Transporters Revealed by De Novo Genome Sequencing of Extremely Halotolerant Black Yeast Hortaea werneckii

    PubMed Central

    Jackman, Shaun; Turk, Martina; Sadowski, Ivan; Nislow, Corey; Jones, Steven; Birol, Inanc; Cimerman, Nina Gunde; Plemenitaš, Ana

    2013-01-01

    Hortaea werneckii, ascomycetous yeast from the order Capnodiales, shows an exceptional adaptability to osmotically stressful conditions. To investigate this unusual phenotype we obtained a draft genomic sequence of a H. werneckii strain isolated from hypersaline water of solar saltern. Two of its most striking characteristics that may be associated with a halotolerant lifestyle are the large genetic redundancy and the expansion of genes encoding metal cation transporters. Although no sexual state of H. werneckii has yet been described, a mating locus with characteristics of heterothallic fungi was found. The total assembly size of the genome is 51.6 Mb, larger than most phylogenetically related fungi, coding for almost twice the usual number of predicted genes (23333). The genome appears to have experienced a relatively recent whole genome duplication, and contains two highly identical gene copies of almost every protein. This is consistent with some previous studies that reported increases in genomic DNA content triggered by exposure to salt stress. In hypersaline conditions transmembrane ion transport is of utmost importance. The analysis of predicted metal cation transporters showed that most types of transporters experienced several gene duplications at various points during their evolution. Consequently they are present in much higher numbers than expected. The resulting diversity of transporters presents interesting biotechnological opportunities for improvement of halotolerance of salt-sensitive species. The involvement of plasma P-type H+ ATPases in adaptation to different concentrations of salt was indicated by their salt dependent transcription. This was not the case with vacuolar H+ ATPases, which were transcribed constitutively. The availability of this genomic sequence is expected to promote the research of H. werneckii. Studying its extreme halotolerance will not only contribute to our understanding of life in hypersaline environments, but should also

  3. De Novo Transcriptome Analysis of Oncomelania hupensis after Molluscicide Treatment by Next-Generation Sequencing: Implications for Biology and Future Snail Interventions

    PubMed Central

    Zhao, Qin Ping; Xiong, Tao; Xu, Xing Jian; Jiang, Ming Sen; Dong, Hui Fen

    2015-01-01

    The freshwater snail Oncomelania hupensis is the only intermediate host of Schistosoma japonicum, which causes schistosomiasis. This disease is endemic in the Far East, especially in mainland China. Because niclosamide is the only molluscicide recommended by the World Health Organization, 50% wettable powder of niclosamide ethanolamine salt (WPN), the only chemical molluscicide available in China, has been widely used as the main snail control method for over two decades. Recently, a novel molluscicide derived from niclosamide, the salt of quinoid-2',5-dichloro-4'-nitro-salicylanilide (Liu Dai Shui Yang An, LDS), has been developed and proven to have the same molluscicidal effect as WPN, with lower cost and significantly lower toxicity to fish than WPN. The mechanism by which these molluscicides cause snail death is not known. Here, we report the next-generation transcriptome sequencing of O. hupensis; 145,008,667 clean reads were generated and assembled into 254,286 unigenes. Using GO and KEGG databases, 14,860 unigenes were assigned GO annotations and 4,686 unigenes were mapped to 250 KEGG pathways. Many sequences involved in key processes associated with biological regulation and innate immunity have been identified. After the snails were exposed to LDS and WPN, 254 unigenes showed significant differential expression. These genes were shown to be involved in cell structure defects and the inhibition of neurohumoral transmission and energy metabolism, which may cause snail death. Gene expression patterns differed after exposure to LDS and WPN, and these differences must be elucidated by the identification and annotation of these unknown unigenes. We believe that this first large-scale transcriptome dataset for O. hupensis will provide an opportunity for the in-depth analysis of this biomedically important freshwater snail at the molecular level and accelerate studies of the O. hupensis genome. The data elucidating the molluscicidal mechanism will be of great

  4. De Novo Sequencing and Analysis of the Safflower Transcriptome to Discover Putative Genes Associated with Safflor Yellow in Carthamus tinctorius L.

    PubMed

    Liu, Xiuming; Dong, Yuanyuan; Yao, Na; Zhang, Yu; Wang, Nan; Cui, Xiyan; Li, Xiaowei; Wang, Yanfang; Wang, Fawei; Yang, Jing; Guan, Lili; Du, Linna; Li, Haiyan; Li, Xiaokun

    2015-01-01

    Safflower (Carthamus tinctorius L.), an important traditional Chinese medicine, is cultured widely for its pharmacological effects, but little is known regarding the genes related to the metabolic regulation of the safflower's yellow pigment. To investigate genes related to safflor yellow biosynthesis, 454 pyrosequencing of flower RNA at different developmental stages was performed, generating large databases. In this study, we analyzed 454 sequencing data from different flowering stages in safflower. In total, 1,151,324 raw reads and 1,140,594 clean reads were produced, which were assembled into 51,591 unigenes with an average length of 679 bp and a maximum length of 5109 bp. Among the unigenes, 40,139 were in the early group, 39,768 were obtained from the full group and 28,316 were detected in both samples. With the threshold of "log2 ratio ≥ 1", there were 34,464 differentially expressed genes, of which 18,043 were up-regulated and 16,421 were down-regulated in the early flower library. Based on the annotations of the unigenes, 281 pathways were predicted. We selected 12 putative genes and analyzed their expression levels using quantitative real-time PCR. The results were consistent with the 454 sequencing results. In addition, the expression of chalcone synthase, chalcone isomerase and anthocyanidin synthase, which are involved in safflor yellow biosynthesis and safflower yellow pigment (SYP) content, was analyzed in different flowering periods, indicating that their expression levels were related to SYP synthesis. Moreover, to further confirm the results of the 454 pyrosequencing, full-length cDNAs of chalcone isomerase (CHI) and anthocyanidin synthase (ANS) were cloned from safflower petals by the RACE (rapid amplification of cDNA ends) method, based on fragments from the transcriptome. PMID:26516840

  5. [De novo sequencing and analysis of root transcriptome to reveal regulation of gene expression by moderate drought stress in Glycyrrhiza uralensis].

    PubMed

    Zhang, Chun-rong; Sang, Xue-yu; Qu, Meng; Tang, Xiao-min; Cheng, Xuan-xuan; Pan, Li-ming; Yang, Quan

    2015-12-01

    Moderate drought stress has been found to promote the accumulation of active ingredients in Glycyrrhiza uralensis root and hence improve the medicinal quality. In this study, the transcriptomes of 6-month-old moderate drought stressed and control G. uralensis root (the relative water content in soil was 40%-45% and 70%-75%, respectively) were sequenced using Illumina HiSeq 2000. A total of 80,490,490 and 82,588,278 clean reads, and 94,828 and 305,100 unigenes with N50 lengths of 1,007 and 1,125 nt, were obtained in the drought-treated and control transcriptomes, respectively. Differentially expressed gene analysis revealed that the genes of some cell wall enzymes such as β-xylosidase, legumain and GDP-L-fucose synthase were down-regulated, indicating that moderate drought stress might inhibit primary cell wall degradation and programmed cell death in root cells. The genes of some key enzymes involved in terpenoid and flavonoid biosynthesis were up-regulated by moderate drought stress, which might be the reason for the enhanced accumulation of active ingredients in G. uralensis root. The promotion of the biosynthesis and signal transduction of auxin, ethylene and cytokinins by moderate drought stress might enhance root formation and cell proliferation. The promotion of the biosynthesis and signal transduction of abscisic acid and jasmonic acid by moderate drought stress might enhance drought stress tolerance in G. uralensis. The inhibition of the biosynthesis and signal transduction of gibberellin and brassinolide by moderate drought stress might retard shoot growth in G. uralensis.
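
    The N50 statistic quoted for the unigene sets above is a standard assembly summary. As a minimal sketch of how it is usually computed (the function name and the toy lengths are illustrative assumptions, not data from the study):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy unigene lengths (hypothetical, not taken from the study).
toy_lengths = [2000, 1500, 1000, 800, 400, 300]
print(n50(toy_lengths))
```

    Because N50 weights long sequences heavily, two assemblies with very different unigene counts can still report similar N50 values.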

  6. New approaches for computer analysis of nucleic acid sequences.

    PubMed

    Karlin, S; Ghandour, G; Ost, F; Tavare, S; Korn, L J

    1983-09-01

    A new high-speed computer algorithm is outlined that ascertains within and between nucleic acid and protein sequences all direct repeats, dyad symmetries, and other structural relationships. Large repeats, repeats of high frequency, dyad symmetries of specified stem length and loop distance, and their distributions are determined. Significance of homologies is assessed by a hierarchy of permutation procedures. Applications are made to papovaviruses, the human papillomavirus HPV, lambda phage, the human and mouse mitochondrial genomes, and the human and mouse immunoglobulin kappa-chain genes. PMID:6577449
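
    To make the two structural relationships named above concrete, the following is a naive sketch that finds direct repeats and dyad symmetries in a DNA string. It is not the high-speed algorithm of the paper; the function names, the fixed word length, and the example sequences are assumptions chosen for illustration.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def direct_repeats(seq, k):
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def dyad_symmetries(seq, stem, max_loop):
    """Find stems whose reverse complement occurs downstream.

    Returns (left_start, right_start, loop_length) tuples for every stem of
    length `stem` followed, after a loop of at most `max_loop` bases, by its
    reverse complement.
    """
    hits = []
    for i in range(len(seq) - 2 * stem + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, j, loop))
    return hits


repeat_example = "GGATCCTTTGGATCC"   # hypothetical sequence
hairpin_example = "ACGGTTTTCCGT"     # hypothetical sequence
print(direct_repeats(repeat_example, 6))
print(dyad_symmetries(hairpin_example, stem=4, max_loop=6))
```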

  7. Whale song analyses using bioinformatics sequence analysis approaches

    NASA Astrophysics Data System (ADS)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools, namely DNA/protein sequence alignment and alignment-free methods, are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, master's thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, and the standardized Euclidean distance and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
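
    The two families of metrics mentioned above can be sketched as follows: a Smith-Waterman local alignment score and an alignment-free Euclidean distance between word-frequency vectors. The scoring parameters, the word length, and the plain (unstandardized) distance are assumptions for illustration; the study's "repeated procedure" and standardization are not reproduced here.

```python
import math
from collections import Counter


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two unit sequences (linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def word_frequencies(seq, k):
    """Relative frequencies of all overlapping k-unit words in seq."""
    if len(seq) < k:
        raise ValueError("sequence shorter than word length")
    counts = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_frequency_distance(a, b, k=2):
    """Euclidean distance between the k-word frequency vectors of a and b."""
    fa, fb = word_frequencies(a, k), word_frequencies(b, k)
    words = set(fa) | set(fb)
    return math.sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))


# Hypothetical themes written as strings of unit labels.
theme_a = "XXABCYY"
theme_b = "ZZABCWW"
print(smith_waterman_score(theme_a, theme_b))
print(word_frequency_distance(theme_a, theme_b))
```

    A matrix of such pairwise values is the kind of input a cluster analysis would need to draw a dendrogram.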

  8. De novo design of the hydrophobic cores of proteins.

    PubMed Central

    Desjarlais, J. R.; Handel, T. M.

    1995-01-01

    We have developed and experimentally tested a novel computational approach for the de novo design of hydrophobic cores. A pair of computer programs has been written, the first of which creates a "custom" rotamer library for potential hydrophobic residues, based on the backbone structure of the protein of interest. The second program uses a genetic algorithm to globally optimize for a low energy core sequence and structure, using the custom rotamer library as input. Success of the programs in predicting the sequences of native proteins indicates that they should be effective tools for protein design. Using these programs, we have designed and engineered several variants of the phage 434 cro protein, containing five, seven, or eight sequence changes in the hydrophobic core. As controls, we have produced a variant consisting of a randomly generated core with six sequence changes but equal volume relative to the native core and a variant with a "minimalist" core containing predominantly leucine residues. Two of the designs, including one with eight core sequence changes, have thermal stabilities comparable to the native protein, whereas the third design and the minimalist protein are significantly destabilized. The randomly designed control is completely unfolded under equivalent conditions. These results suggest that rational de novo design of hydrophobic cores is feasible, and stress the importance of specific packing interactions for the stability of proteins. A surprising aspect of the results is that all of the variants display highly cooperative thermal denaturation curves and reasonably dispersed NMR spectra. This suggests that the non-core residues of a protein play a significant role in determining the uniqueness of the folded structure. PMID:8535237

  10. A knowledge engineering approach to recognizing and extracting sequences of nucleic acids from scientific literature.

    PubMed

    García-Remesal, Miguel; Maojo, Victor; Crespo, José

    2010-01-01

    In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. The latter are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences. For such set, we achieved 87.76% precision and 97.70% recall respectively. This method can facilitate different research tasks. These include text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences. PMID:21096556
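
    The first stage described above, a finite-state recognizer that proposes candidate sequences, can be approximated with a regular expression, which is equivalent to a finite state machine. This is only a sketch under stated assumptions: the alphabet, the minimum length of 10, and the function name are illustrative choices, and the knowledge-based filtering stage is not modeled.

```python
import re

# Assumption: a candidate is an unbroken run of at least MIN_LEN uppercase
# nucleotide letters. Real articles also contain spaces, line breaks and
# numbering inside sequences, which this sketch does not handle.
MIN_LEN = 10
CANDIDATE = re.compile(r"\b[ACGTU]{%d,}\b" % MIN_LEN)


def candidate_sequences(text):
    """Return all candidate DNA/RNA sequences found in text, in order."""
    return CANDIDATE.findall(text)


sample = ("The forward primer 5'-ATGGCCTTAGGCTA-3' was used together "
          "with GATTACA and ACGUACGUACGU.")
print(candidate_sequences(sample))
```

    Short runs such as GATTACA are dropped by the length threshold; a later filtering stage would still be needed to discard longer false positives.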

  11. Qualitative De Novo Analysis of Full Length cDNA and Quantitative Analysis of Gene Expression for Common Marmoset (Callithrix jacchus) Transcriptomes Using Parallel Long-Read Technology and Short-Read Sequencing

    PubMed Central

    Uno, Yasuhiro; Uehara, Shotaro; Inoue, Takashi; Murayama, Norie; Onodera, Jun; Sasaki, Erika; Yamazaki, Hiroshi

    2014-01-01

    The common marmoset (Callithrix jacchus) is a non-human primate that could prove useful as a model for human pharmacokinetic and biomedical research. The cytochromes P450 (P450s) are a superfamily of enzymes that have critical roles in drug metabolism and disposition via monooxygenation of a broad range of xenobiotics; however, information on some marmoset P450s is currently limited. Therefore, identification and quantitative analysis of tissue-specific mRNA transcripts, including those of P450s and flavin-containing monooxygenases (FMO, another monooxygenase family), need to be carried out in detail before the marmoset can be used as an animal model in drug development. De novo assembly and expression analysis of marmoset transcripts were conducted with pooled liver, intestine, kidney, and brain samples from three male and three female marmosets. After unique sequences were automatically aligned by assembly software, the mean contig length was 718 bp (with a standard deviation of 457 bp) among a total of 47,883 transcripts. Approximately 30% of the total transcripts were matched to known marmoset sequences. Gene expression of 18 marmoset P450- and 4 FMO-like genes displayed some tissue-specific patterns. Of these, the three most highly expressed in marmoset liver were P450 2D-, 2E-, and 3A-like genes. In extrahepatic tissues, including brain, gene expressions of these monooxygenases were lower than those in liver, although P450 3A4 (previously P450 3A21) in intestine and P450 4A11- and FMO1-like genes in kidney were relatively highly expressed. By means of massively parallel long-read sequencing and short-read technology applied to marmoset liver, intestine, kidney, and brain, the combined next-generation sequencing analyses reported here were able to identify novel marmoset drug-metabolizing P450 transcripts that have until now been little reported. These results provide a foundation for mechanistic studies and pave the way for the use of marmosets as model animals

  12. A General Approach to Sequence-Controlled Polymers Using Macrocyclic Ring Opening Metathesis Polymerization

    PubMed Central

    2015-01-01

    A new and general strategy for the synthesis of sequence-defined polymers is described that employs relay metathesis to promote the ring opening polymerization of unstrained macrocyclic structures. Central to this approach is the development of a small-molecule “polymerization trigger” which, when coupled with a diverse range of sequence-defined units, allows for the controlled, directional synthesis of sequence-controlled polymers. PMID:26053158

  13. PCR amplification of repetitive sequences as a possible approach in relative species quantification.

    PubMed

    Ballin, N Z; Vogensen, F K; Karlsson, A H

    2012-02-01

    Both relative and absolute quantifications are possible in species quantification when single-copy genomic DNA is used. However, amplification of single-copy genomic DNA does not allow a limit of detection as low as that obtained from amplification of repetitive sequences. Amplification of repetitive sequences is therefore frequently used in absolute quantification, but problems occur in relative quantification because the number of repetitive sequences is unknown. A promising approach was developed in which data from amplification of repetitive sequences were used in relative quantification of species in binary mixtures. PCR LUX primers were designed that amplify repetitive and single-copy sequences to establish the species-dependent constants (SDCs), that is, the number of amplified repetitive sequences per genome. The SDCs and data from amplification of repetitive sequences were tested for their applicability to the relative quantification of chicken DNA in a binary mixture of chicken DNA and pig DNA. However, the designed PCR primers lack the specificity required for regulatory species control.

  14. De novo transcriptome sequencing of Acer palmatum and comprehensive analysis of differentially expressed genes under salt stress in two contrasting genotypes.

    PubMed

    Rong, Liping; Li, Qianzhong; Li, Shushun; Tang, Ling; Wen, Jing

    2016-04-01

    Maple (Acer palmatum) is an important species for landscape planting worldwide. Salt stress affects the normal growth of the Maple leaf directly, leading to loss of esthetic value. However, the limited availability of Maple genomic information has hindered research on the mechanisms underlying this tolerance. In this study, we performed comprehensive analyses of the salt tolerance in two genotypes of Maple using RNA-seq. Approximately 146.4 million paired-end reads, representing 181,769 unigenes, were obtained. The N50 length of the unigenes was 738 bp, and their total length was over 102.66 Mb. In total, 14,090 simple sequence repeats and over 500,000 single nucleotide polymorphisms were identified, which represent useful resources for marker development. Importantly, 181,769 genes were detected in at least one library, and 303 differentially expressed genes (DEGs) were identified between salt-sensitive and salt-tolerant genotypes. Among these DEGs, 125 were upregulated and 178 were downregulated. Two MYB-related proteins and one LEA protein were detected among the 10 most downregulated genes. Moreover, a methyltransferase-related gene was detected among the 10 most upregulated genes. The three most significantly enriched pathways were plant hormone signal transduction, arginine and proline metabolism, and photosynthesis. The transcriptome analysis provided a rich genetic resource for gene discovery related to salt tolerance in Maple, and in closely related species. The data will serve as an important public information platform to further our understanding of the molecular mechanisms involved in salt tolerance in Maple.

  15. A Simple Approach to the Reconstruction of a Set of Points from the Multiset of n(2) Pairwise Distances in n(2) Steps for the Sequencing Problem: I. Theory.

    PubMed

    Fomin, Eduard

    2016-09-01

    The problem of reconstructing the order of sequence elements in de novo sequencing of linear and cyclic peptides is reduced to the known turnpike and beltway problems, the latter of which has no polynomial time algorithm in the general case. A new simple approach is proposed to solve both problems. It is based on sequential removal of redundancy from the inputs. For the error-free inputs that simulate mass spectra with accuracy to [Formula: see text] Da, the size of inputs decreases from [Formula: see text] to [Formula: see text]. In this way, exhaustive search can be almost completely removed from the algorithms, and the number of steps to reconstruct a sequence is in direct ratio to the input size, [Formula: see text].
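
    For readers unfamiliar with the turnpike problem, the sketch below shows the classic backtracking solution: place the largest remaining distance at one end or the other and undo the choice if it fails. This is the textbook exhaustive-search method, not the redundancy-removal approach proposed in the paper; the function name and example distances are illustrative assumptions.

```python
from collections import Counter


def turnpike(distances):
    """Reconstruct points on a line from their multiset of pairwise distances.

    Returns a sorted list of positions starting at 0, or None when no point
    set is consistent with the input.
    """
    if not distances:
        raise ValueError("need at least one distance")
    remaining = Counter(distances)
    width = max(remaining)
    remaining[width] -= 1
    points = [0, width]

    def try_place(candidate):
        """Consume the distances from candidate to every placed point."""
        needed = Counter(abs(candidate - p) for p in points)
        if all(remaining[d] >= c for d, c in needed.items()):
            remaining.subtract(needed)
            return needed
        return None

    def solve():
        if sum(remaining.values()) == 0:
            return sorted(points)
        largest = max(d for d, c in remaining.items() if c > 0)
        # The largest unexplained distance must touch one of the two ends.
        for candidate in (largest, width - largest):
            used = try_place(candidate)
            if used is not None:
                points.append(candidate)
                result = solve()
                if result is not None:
                    return result
                points.pop()
                remaining.update(used)  # backtrack
        return None

    return solve()


# Pairwise distances of the hypothetical point set {0, 2, 4, 7, 10}.
example = [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
print(turnpike(example))
```

    The answer is only defined up to mirror reflection, so the returned positions may be the reversed image of the points that generated the distances.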

  16. A NGS approach to the encrusting Mediterranean sponge Crella elegans (Porifera, Demospongiae, Poecilosclerida): transcriptome sequencing, characterization and overview of the gene expression along three life cycle stages.

    PubMed

    Pérez-Porro, A R; Navarro-Gómez, D; Uriz, M J; Giribet, G

    2013-05-01

    Sponges can be dominant organisms in many marine and freshwater habitats where they play essential ecological roles. They also represent a key group to address important questions in early metazoan evolution. Recent approaches for improving knowledge on sponge biological and ecological functions as well as on animal evolution have focused on the genetic toolkits involved in ecological responses to environmental changes (biotic and abiotic), development and reproduction. These approaches are possible thanks to newly available, massive sequencing technologies, such as the Illumina platform, which facilitate genome and transcriptome sequencing in a cost-effective manner. Here we present the first NGS (next-generation sequencing) approach to understanding the life cycle of an encrusting marine sponge. For this we sequenced libraries of three different life cycle stages of the Mediterranean sponge Crella elegans and generated de novo transcriptome assemblies. Three assemblies were based on sponge tissue of a particular life cycle stage, including non-reproductive tissue, tissue with sperm cysts and tissue with larvae. The fourth assembly pooled the data from all three stages. By aggregating data from all the different life cycle stages we obtained a higher total number of contigs, contigs with BLAST hits and annotated contigs than from single-stage assemblies. In that multi-stage assembly we obtained a larger number of the developmental regulatory genes known for metazoans than in any other assembly. We also advance the differential expression of selected genes in the three life cycle stages to explore the potential of RNA-seq for improving knowledge on functional processes along the sponge life cycle.

  17. SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genome projects routinely produce draft sequences for species from diverse evolutionary clades, but generally do not create single nucleotide polymorphism (SNP) resources. We present an approach for de novo SNP discovery based on short-read sequencing of reduced representation libraries (RRL) to ge...

  18. Next-Generation Phylogeography: A Targeted Approach for Multilocus Sequencing of Non-Model Organisms

    PubMed Central

    Puritz, Jonathan B.; Addison, Jason A.; Toonen, Robert J.

    2012-01-01

    The field of phylogeography has long since realized the need and utility of incorporating nuclear DNA (nDNA) sequences into analyses. However, the use of nDNA sequence data, at the population level, has been hindered by technical laboratory difficulty, sequencing costs, and problematic analytical methods dealing with genotypic sequence data, especially in non-model organisms. Here, we present a method utilizing the 454 GS-FLX Titanium pyrosequencing platform with the capacity to simultaneously sequence two species of sea star (Meridiastra calcar and Parvulastra exigua) at five different nDNA loci across 16 different populations of 20 individuals each per species. We compare results from 3 populations with traditional Sanger sequencing based methods, and demonstrate that this next-generation sequencing platform is more time and cost effective and more sensitive to rare variants than Sanger based sequencing. A crucial advantage is that the high coverage of clonally amplified sequences simplifies haplotype determination, even in highly polymorphic species. This targeted next-generation approach can greatly increase the use of nDNA sequence loci in phylogeographic and population genetic studies by mitigating many of the time, cost, and analytical issues associated with highly polymorphic, diploid sequence markers. PMID:22470543

  19. Whole genome sequencing and integrative genomic analysis approach on two 22q11.2 deletion syndrome family trios for genotype to phenotype correlations

    PubMed Central

    Chung, Jonathan H.; Cai, Jinlu; Suskin, Barrie G.; Zhang, Zhengdong; Coleman, Karlene

    2015-01-01

    The 22q11.2 deletion syndrome (22q11DS) affects 1:4000 live births and presents with highly variable phenotype expressivity. In this study, we developed an analytical approach utilizing whole genome sequencing and integrative analysis to discover genetic modifiers. Our pipeline combined available tools in order to prioritize rare, predicted deleterious, coding and non-coding single nucleotide variants (SNVs) and insertion/deletions (INDELs) from whole genome sequencing (WGS). We sequenced two unrelated probands with 22q11DS, with contrasting clinical findings, and their unaffected parents. Proband P1 had cognitive impairment, psychotic episodes, anxiety, and tetralogy of Fallot (TOF); while proband P2 had juvenile rheumatoid arthritis but no other major clinical findings. In P1, we identified common variants in COMT and PRODH on 22q11.2 as well as rare potentially deleterious DNA variants in other behavioral/neurocognitive genes. We also identified a de novo SNV in ADNP2 (NM_014913.3:c.2243G>C), encoding a neuroprotective protein that may be involved in behavioral disorders. In P2, we identified a novel non-synonymous SNV in ZFPM2 (NM_012082.3:c.1576C>T), a known causative gene for TOF, which may act as a protective variant downstream of TBX1, haploinsufficiency of which is responsible for congenital heart disease in individuals with 22q11DS. PMID:25981510

  20. Sequenced Integration and the Identification of a Problem-Solving Approach through a Learning Process

    ERIC Educational Resources Information Center

    Cormas, Peter C.

    2016-01-01

    Preservice teachers (N = 27) in two sections of a sequenced, methodological and process integrated mathematics/science course solved a levers problem with three similar learning processes and a problem-solving approach, and identified a problem-solving approach through one different learning process. Similar learning processes used included:…

  1. De novo assembly and characterization of skin transcriptome using RNAseq in sheep (Ovis aries).

    PubMed

    Yue, Y J; Liu, J B; Yang, M; Han, J L; Guo, T T; Guo, J; Feng, R L; Yang, B H

    2015-01-01

    Wool is produced via synthetic processes of wool follicles, which are embedded in the skin of sheep. The development of new-generation sequencing and RNA sequencing provides new approaches that may elucidate the molecular regulation mechanism of wool follicle development and facilitate enhanced selection for wool traits through gene-assisted selection or targeted gene manipulation. We performed de novo transcriptome sequencing of skin using the Illumina Hiseq 2000 sequencing system in sheep (Ovis aries). Transcriptome de novo assembly was carried out via short-read assembly programs, including SOAPdenovo and ESTScan. The protein function, clusters of orthologous group function, gene ontology function, metabolic pathway analysis, and protein coding region prediction of unigenes were annotated by BLASTx, BLAST2GO, and ESTScan. More than 26,266,670 clean reads were collected and assembled into 79,741 unigene sequences, with a final assembly length of 35,447,962 nucleotides. A total of 22,164 unigenes were annotated, accounting for 36.27% of the total number of unigenes, which were divided into 25 classes belonging to 218 signaling pathways. Among them, there were 17 signal paths related to hair follicle development. Based on mass sequencing data of sheepskin obtained by RNA-Seq, many unigenes were identified and annotated, which provides an excellent platform for future sheep genetic and functional genomic research. The data could be used for improving wool quality and as a model for human hair follicle development or disease prevention.

  2. Spectral-Statistical Approach for Revealing Latent Regular Structures in DNA Sequence.

    PubMed

    Chaley, Maria; Kutyrkin, Vladimir

    2016-01-01

    Methods of the spectral-statistical approach (2S-approach) for revealing latent periodicity in DNA sequences are described. Results are presented from analysis of the data in the HeteroGenome database, which collects sequences similar to approximate tandem repeats in the genomes of model organisms. As a further development of the spectral-statistical approach, techniques for recognizing latent profile periodicity are considered. These techniques are based on an extension of the notion of approximate tandem repeat. Examples are given of the correlation between latent profile periodicity revealed in CDSs and the structural-functional properties of the proteins.

  3. [Recent progress in gene mapping through high-throughput sequencing technology and forward genetic approaches].

    PubMed

    Lu, Cairui; Zou, Changsong; Song, Guoli

    2015-08-01

    Traditional gene mapping using forward genetic approaches is conducted primarily through construction of a genetic linkage map, the process of which is tedious and time-consuming, and often results in low accuracy of mapping and large mapping intervals. With the rapid development of high-throughput sequencing technology and decreasing cost of sequencing, a variety of simple and quick methods of gene mapping through sequencing have been developed, including direct sequencing of the mutant genome, sequencing of selective mutant DNA pooling, genetic map construction through sequencing of individuals in a population, as well as sequencing of the transcriptome and partial genome. These methods can be used to identify mutations at the nucleotide level and have been applied in complex genetic backgrounds. Recent reports have shown that mapping by sequencing can even be done without a reference genome sequence, hybridization, or genetic linkage information, which makes it possible to perform forward genetic studies in many non-model species. In this review, we summarize these new technologies and their application in gene mapping.

  4. Design of nucleic acid sequences for DNA computing based on a thermodynamic approach.

    PubMed

    Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma

    2005-01-01

    We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (DeltaG (min)). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate DeltaG (min). This effectively excludes inappropriate sequences before DeltaG (min) is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the Hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (DeltaG (exp)) of 126 sequences correlated better with DeltaG (min) (|R| = 0.90) than with the Hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphic user interface-based program written in Java.
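
    The generate-and-test loop described above can be sketched as follows. Note the substitutions: this sketch uses the traditional Hamming-distance test that the abstract compares against, not the authors' greedy DeltaG (min) filter, and it uses GC fraction as a crude stand-in for a uniform melting temperature. All names, thresholds and defaults are assumptions for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_filters(cand, accepted, min_distance, gc_range):
    """Test step: GC window, then distance to self and accepted sequences."""
    low, high = gc_range
    if not low <= gc_fraction(cand) <= high:
        return False
    if hamming(cand, reverse_complement(cand)) < min_distance:
        return False
    for other in accepted:
        if hamming(cand, other) < min_distance:
            return False
        if hamming(cand, reverse_complement(other)) < min_distance:
            return False
    return True


def design_sequences(count, length, min_distance,
                     gc_range=(0.4, 0.6), max_tries=100_000, seed=0):
    """Generate step: draw random candidates until enough pass the test."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == count:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if passes_filters(cand, accepted, min_distance, gc_range):
            accepted.append(cand)
    return accepted


designed = design_sequences(count=5, length=20, min_distance=8)
for seq in designed:
    print(seq)
```

    Swapping `passes_filters` for a thermodynamic check would change only the test step, leaving the generate loop as it is.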

  5. Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach

    NASA Astrophysics Data System (ADS)

    Hofmann, Hansjörg; Sakti, Sakriani; Hori, Chiori; Kashioka, Hideki; Nakamura, Satoshi; Minker, Wolfgang

    The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme to phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. Hereby, the whole word sequence and its effect on the alteration of the phonemes will be taken into consideration. Moreover, the system not only learns the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, first the phonemes will be recognized with the present recognition system and afterwards the pronunciation variation model based on the noisy channel approach will map from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both of them are applied and various experiments are conducted using microphone and telephone recordings of spontaneous speech.
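
    The noisy channel idea referred to above is usually written as a Bayes decision rule. The symbols here are assumptions introduced for illustration, with S the recognized ("noisy") phoneme sequence and W a candidate ("clean") word sequence; the abstract itself does not give a formula.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid S)
        \;=\; \arg\max_{W} \frac{P(S \mid W)\, P(W)}{P(S)}
        \;=\; \arg\max_{W} P(S \mid W)\, P(W)
```

    The last step holds because P(S) does not depend on W. In this reading, P(S | W) plays the role of the pronunciation variation (channel) model and P(W) that of a language model.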

  6. De novo proteins from designed combinatorial libraries

    PubMed Central

    Hecht, Michael H.; Das, Aditi; Go, Abigail; Bradley, Luke H.; Wei, Yinan

    2004-01-01

    Combinatorial libraries of de novo amino acid sequences can provide a rich source of diversity for the discovery of novel proteins with interesting and important activities. Randomly generated sequences, however, rarely fold into well-ordered proteinlike structures. To enhance the quality of a library, features of rational design must be used to focus sequence diversity into those regions of sequence space that are most likely to yield folded structures. This review describes how focused libraries can be constructed by designing the binary pattern of polar and nonpolar amino acids to favor proteins that contain abundant secondary structure, while simultaneously burying hydrophobic side chains and exposing hydrophilic side chains to solvent. The “binary code” for protein design was used to construct several libraries of de novo proteins, including both α-helical and β-sheet structures. The recently determined solution structure of a binary patterned four-helix bundle is well ordered, thereby demonstrating that sequences that have neither been selected by evolution (in vivo or in vitro) nor designed by computer can form nativelike proteins. Examples are presented demonstrating how binary patterned libraries have successfully produced well-ordered structures, cofactor binding, catalytic activity, self-assembled monolayers, amyloid-like nanofibrils, and protein-based biomaterials. PMID:15215517

  7. A Simple Multiplex PCR Approach for Target Enrichment in Next-Gen Sequencing

    PubMed Central

    Zhu, Qi; Hebel, Chris; Zhou, Xiaochun

    2014-01-01

    Multiplexing PCR is a simple way to extract genomic regions of interest for various medical and genetic tests. Somatic mutations lead to various diseases including cancer. These mutations are unlikely to be best detected using regular whole genome sequencing. Clinical samples often consist of disease cells, e.g. cancer cells, surrounded by normal cells. Thus, deep sequencing at hundreds- to thousands-fold coverage is required to detect the mutations. In clinical research many doctors are interested in specific genes or genomic regions and they want to extract the regions from genomic DNA or RNA before sequencing. Many current clinical, forensic, and hereditary genetic test workflows start with multiplexing PCR to extract genetic marker-carrying regions from whole genomes before running hybridization, sequencing, or electrophoresis tests to identify the markers. Personal medicine and prognosis mostly involve examining sequence variations of a number of targeted genes and metabolic pathway genes so as to predict drug efficacies and drug toxicities. We have developed a new multiplexing PCR approach with a significantly simplified workflow and significantly improved robustness. When applied to sequencing target enrichment, the workflow for producing amplified targets involves only one hands-on step and one PCR run. The approach is designed to require low sample input and to produce superior amplicon uniformity and sequence specificity. The approach involves a novel primer design and a proprietary reaction composition. A PCR run consists of two functionally separated reaction phases, namely target capture and library amplification, without any hands-on step in between. The performance of the new approach will be demonstrated by cancer panel data.

  8. A flexible and economical barcoding approach for highly multiplexed amplicon sequencing of diverse target genes.

    PubMed

    Herbold, Craig W; Pelikan, Claus; Kuzyk, Orest; Hausmann, Bela; Angel, Roey; Berry, David; Loy, Alexander

    2015-01-01

    High throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target gene approach is more economical because it requires a lower number of primers overall and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse, and high quality sets of amplicon sequence data for modern studies in microbial ecology. PMID:26236305

  9. De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies.

    PubMed

    Karamitros, Timokratis; Harrison, Ian; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo

    2016-01-01

    Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from <1% to 53% of amino acids in each gene exhibiting at least one substitution within the pool of samples. The UL23 gene had one of the highest genetic variabilities at 35.2% in keeping with its role in development of drug resistance. The assembly of accurate, full-length HHV-1 genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal. PMID:27309375
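
    The contiguity statistics mentioned in this abstract can be illustrated with a short sketch. It assumes the conventional definitions: N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches 50% of the total assembly length, and NG50 uses an expected genome size instead of the assembly length. The function name `nx` and the contig lengths below are hypothetical and are not taken from the study.

```python
def nx(contig_lengths, fraction=0.5, genome_size=None):
    """Return the N-statistic (e.g. N50, N75) for a list of contig lengths.

    If genome_size is given, the threshold is computed against it (NG50-style);
    otherwise the total assembly length is used (N50-style).
    Returns 0 when the contigs never reach the threshold.
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    threshold = total * fraction
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return 0


# Hypothetical contig lengths (in kbp) for illustration only.
contigs = [80, 70, 50, 40, 30, 20]
print(nx(contigs))                    # N50 of the assembly
print(nx(contigs, fraction=0.75))     # N75 of the assembly
print(nx(contigs, genome_size=400))   # NG50 against an assumed genome size
```

    A higher value from fewer contigs indicates a more contiguous assembly, which is the sense in which the abstract reports an improvement.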

  10. De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies

    PubMed Central

    Karamitros, Timokratis; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo

    2016-01-01

    Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from <1% to 53% of amino acids in each gene exhibiting at least one substitution within the pool of samples. The UL23 gene had one of the highest genetic variabilities at 35.2% in keeping with its role in development of drug resistance. The assembly of accurate, full-length HHV-1 genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal. PMID:27309375

  11. The potential cost-effectiveness of the Diamondback 360® Coronary Orbital Atherectomy System for treating de novo, severely calcified coronary lesions: an economic modeling approach

    PubMed Central

    Chambers, Jeffrey; Généreux, Philippe; Lee, Arthur; Lewin, Jack; Young, Christopher; Crittendon, Janna; Mann, Marita; Garrison, Louis P.

    2015-01-01

    Background: Patients who undergo percutaneous coronary intervention (PCI) for severely calcified coronary lesions have long been known to have worse clinical and economic outcomes than patients with no or mildly calcified lesions. We sought to assess the likely cost-effectiveness of using the Diamondback 360® Orbital Atherectomy System (OAS) in the treatment of de novo, severely calcified lesions from a health-system perspective. Methods and results: In the absence of a head-to-head trial and long-term follow up, cost-effectiveness was based on a modeled synthesis of clinical and economic data. A cost-effectiveness model was used to project the likely economic impact. To estimate the net cost impact, the cost of using the OAS technology in elderly (⩾ 65 years) Medicare patients with de novo severely calcified lesions was compared with cost offsets. Elderly OAS patients from the ORBIT II trial (Evaluate the Safety and Efficacy of OAS in Treating Severely Calcified Coronary Lesions) [ClinicalTrials.gov identifier: NCT01092426] were indirectly compared with similar patients using observational data. For the index procedure, the comparison was with Medicare data, and for both revascularization and cardiac death in the following year, the comparison was with a pooled analysis of the Harmonizing Outcomes with Revascularization and Stents in Acute Myocardial Infarction (HORIZONS-AMI)/Acute Catheterization and Urgent Intervention Triage Strategy (ACUITY) trials. After adjusting for differences in age, gender, and comorbidities, the ORBIT II mean index procedure costs were 17% (p < 0.001) lower, approximately US$2700. Estimated mean revascularization costs were lower by US$1240 in the base case. These cost offsets in the first year, on average, fully cover the cost of the device with an additional 1.2% cost savings. Even in the low-value scenario, the use of the OAS is cost-effective with a cost per life-year gained of US$11,895. Conclusions: Based on economic modeling

  12. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    PubMed

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
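
    The L-word profile and distance steps described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it counts words with a plain sliding window rather than a generalized suffix tree, it assumes a DNA alphabet, and the function names (`lword_profile`, `euclidean`, `distance_matrix`) and example sequences are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible L-word over `alphabet` in `seq`."""
    # Sliding-window count (a simple stand-in for the suffix-tree step).
    counts = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    words = ["".join(p) for p in product(alphabet, repeat=L)]
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def euclidean(p, q):
    """Standard Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise distances between L-word profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    n = len(profiles)
    return [[euclidean(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy sequences; real input would be whole mitochondrial genomes.
matrix = distance_matrix(["ACGTACGT", "ACGTACGA", "TTTTTTTT"], L=2)
for row in matrix:
    print(["%.3f" % d for d in row])
```

    The resulting matrix could then be passed to any neighbor-joining or multidimensional scaling routine, as the abstract describes.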

  13. GeneSV - an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences.

    PubMed

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W C; Cardosa, Jane; Weaver, Scott C; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. 32 novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. The predictive capabilities of the developed system were illustrated on dengue RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism. PMID:24453480

  14. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    Computational tools to assist genomic analyses are increasingly necessary because the amount of available data is growing rapidly. Given the high computational cost of deterministic algorithms for sequence alignment, many works concentrate on developing heuristic approaches to multiple sequence alignment. However, selecting an approach that offers solutions with good biological significance in a feasible execution time is a great challenge. This work therefore presents the parallelization of the processing steps of the MSA-GA tool, using the multithread paradigm in the execution of the COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sets of sequences with low similarity are aligned. In previous studies we therefore implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of the COFFEE objective function increases execution time, the approach contains steps that can be executed in parallel. With the improvements implemented in this work, the execution time of the new approach is 24% faster than the sequential approach with COFFEE. Moreover, the multithreaded COFFEE approach is more efficient than WSP because, besides being slightly faster, its biological results are better.
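
    The idea of evaluating an alignment objective in parallel can be sketched as below. This is only an illustration under stated assumptions: it uses a plain, unweighted sum-of-pairs score as a stand-in (it is neither the WSP nor the COFFEE function of MSA-GA), the scoring constants are hypothetical, and the per-pair scores are distributed over a thread pool because each sequence pair can be scored independently. In CPython the global interpreter lock limits the speed-up of pure-Python work, so the sketch shows the decomposition rather than a performance result.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical scoring constants, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score two aligned rows column by column; gap-gap columns are ignored."""
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            score += GAP
        elif x == y:
            score += MATCH
        else:
            score += MISMATCH
    return score


def sum_of_pairs(alignment, workers=4):
    """Sum pair_score over all row pairs, scoring the pairs in a thread pool."""
    pairs = list(combinations(alignment, 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda p: pair_score(*p), pairs))


# Toy alignment of three rows of equal length.
print(sum_of_pairs(["AC-T", "ACGT", "A--T"]))
```

    Because the total is a sum over independent pairs, the result does not depend on the number of workers; only the wall-clock time can change.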

  15. Next-generation sequencing approach for connecting secondary metabolites to biosynthetic gene clusters in fungi

    PubMed Central

    Cacho, Ralph A.; Tang, Yi; Chooi, Yit-Heng

    2015-01-01

    Genomics has revolutionized the research on fungal secondary metabolite (SM) biosynthesis. To elucidate the molecular and enzymatic mechanisms underlying the biosynthesis of a specific SM compound, the important first step is often to find the genes that are responsible for its synthesis. The accessibility of fungal genome sequences allows the bypass of the cumbersome traditional library construction and screening approach. Advances in next-generation sequencing (NGS) technologies have further improved the speed and reduced the cost of microbial genome sequencing in the past few years, which has accelerated the research in this field. Here, we will present an example workflow for identifying the gene cluster encoding the biosynthesis of SMs of interest using an NGS approach. We will also review the different strategies that can be employed to pinpoint the targeted gene clusters rapidly by giving several examples stemming from our work. PMID:25642215

  16. Identification of molecular motors in the Woods Hole squid, Loligo pealei: an expressed sequence tag approach.

    PubMed

    DeGiorgis, Joseph A; Cavaliere, Kimberly R; Burbach, J Peter H

    2011-10-01

    The squid giant axon and synapse are unique systems for studying neuronal function. While a few nucleotide and amino acid sequences have been obtained from squid, large scale genetic and proteomic information is lacking. We have been particularly interested in motors present in axons and their roles in transport processes. Here, to obtain genetic data and to identify motors expressed in squid, we initiated an expressed sequence tag project by single-pass sequencing mRNAs isolated from the stellate ganglia of the Woods Hole Squid, Loligo pealei. A total of 22,689 high quality expressed sequence tag (EST) sequences were obtained and subjected to basic local alignment search tool analysis. Seventy six percent of these sequences matched genes in the National Center for Bioinformatics databases. By CAP3 analysis this library contained 2459 contigs and 7568 singletons. Mining for motors successfully identified six kinesins, six myosins, a single dynein heavy chain, as well as components of the dynactin complex, and motor light chains and accessory proteins. This initiative demonstrates that EST projects represent an effective approach to obtain sequences of interest.

  17. Magnetism Teaching Sequences Based on an Inductive Approach for First-Year Thai University Science Students

    ERIC Educational Resources Information Center

    Narjaikaew, Pattawan; Emarat, Narumon; Arayathanitkul, Kwan; Cowie, Bronwen

    2010-01-01

    The study investigated the impact on student motivation and understanding of magnetism of teaching sequences based on an inductive approach. The study was conducted in large lecture classes. A pre- and post-Conceptual Survey of Electricity and Magnetism was conducted with just fewer than 700 Thai undergraduate science students, before and after…

  18. Developing Scope and Sequence for the Gifted Learner: A Comprehensive Approach.

    ERIC Educational Resources Information Center

    VanTassel-Baska, Joyce; Campbell, Myrtle

    1988-01-01

    A comprehensive curriculum-development program which covers grades K-12 can ensure a meaningful scope and sequence of experiences for gifted learners. The experience of the Gary Community School Corporation and other Indiana communities with such an approach is described. Eight steps from needs assessment to implementing the model are presented.…

  19. A Time Sequence-Oriented Concept Map Approach to Developing Educational Computer Games for History Courses

    ERIC Educational Resources Information Center

    Chu, Hui-Chun; Yang, Kai-Hsiang; Chen, Jing-Hong

    2015-01-01

    Concept maps have been recognized as an effective tool for students to organize their knowledge; however, in history courses, it is important for students to learn and organize historical events according to the time of their occurrence. Therefore, in this study, a time sequence-oriented concept map approach is proposed for developing a game-based…

  20. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, known as NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods such as Expectation Maximization (EM), Gibbs Sampling or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than those standard tools widely-used in Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  1. A De novo Transcriptomic Approach to Identify Flavonoids and Anthocyanins "Switch-Off" in Olive (Olea europaea L.) Drupes at Different Stages of Maturation.

    PubMed

    Iaria, Domenico L; Chiappetta, Adriana; Muzzalupo, Innocenzo

    2015-01-01

    Highlights: A de novo transcriptome reconstruction of olive drupes was performed in two genotypes. Gene expression was monitored during drupe development in two olive cultivars. Transcripts involved in flavonoid and anthocyanin pathways were analyzed in Cassanese and Leucocarpa cultivars. Both cultivar and developmental stage impact gene expression in Olea europaea fruits. During ripening, the fruits of the olive tree (Olea europaea L.) undergo a progressive chromatic change characterized by the formation of a red-brown "spot" which gradually extends on the epidermis and in the innermost part of the mesocarp. This event finds an exception in the Leucocarpa cultivar, in which we observe a destabilized equilibrium between the metabolisms of chlorophyll and other pigments, particularly the anthocyanins whose switch-off during maturation promotes the white coloration of fruits. Despite its importance, genomic information on the olive tree is still lacking. Different RNA-seq libraries were generated from drupes of "Leucocarpa" and "Cassanese" olive genotypes, sampled at 100 and 130 days after flowering (DAF), and were used in order to identify transcripts involved in the main phenotypic changes of fruits during maturation and their corresponding expression patterns. A total of 103,359 transcripts were obtained and 3792 and 3064 were differentially expressed in "Leucocarpa" and "Cassanese" genotypes, respectively, during 100-130 DAF transition. Among them flavonoid and anthocyanin related transcripts such as phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumarate-CoA ligase (4CL), chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavonol 3'-hydrogenase (F3'H), flavonol 3'5 '-hydrogenase (F3'5'H), flavonol synthase (FLS), dihydroflavonol 4-reductase (DFR), anthocyanidin synthase (ANS), UDP-glucose:anthocianidin: flavonoid glucosyltransferase (UFGT) were identified. These results contribute to reducing the current gap in

  3. Characterization of GM events by insert knowledge adapted re-sequencing approaches.

    PubMed

    Yang, Litao; Wang, Congmao; Holst-Jensen, Arne; Morisset, Dany; Lin, Yongjun; Zhang, Dabing

    2013-01-01

    Detection methods and data from molecular characterization of genetically modified (GM) events are needed by stakeholders such as public risk assessors and regulators. Generally, the molecular characteristics of GM events are incompletely revealed by current approaches, which are biased towards detecting transformation vector-derived sequences. GM events are classified based on available knowledge of the sequences of vectors and inserts (insert knowledge). Herein we present three insert knowledge-adapted approaches for characterization of GM events (TT51-1 and T1c-19 rice as examples) based on paired-end re-sequencing with the advantages of comprehensiveness, accuracy, and automation. The comprehensive molecular characteristics of two rice events were revealed, with additional unintended insertions compared with the results from PCR and Southern blotting. Comprehensive transgene characterization of TT51-1 and T1c-19 is shown to be independent of a priori knowledge of the insert and vector sequences employing the developed approaches. This also provides an opportunity to identify and characterize unknown GM events. PMID:24088728

  4. Characterization of GM events by insert knowledge adapted re-sequencing approaches

    PubMed Central

    Yang, Litao; Wang, Congmao; Holst-Jensen, Arne; Morisset, Dany; Lin, Yongjun; Zhang, Dabing

    2013-01-01

    Detection methods and data from molecular characterization of genetically modified (GM) events are needed by stakeholders such as public risk assessors and regulators. Generally, the molecular characteristics of GM events are incompletely revealed by current approaches, which are biased towards detecting transformation vector-derived sequences. GM events are classified based on available knowledge of the sequences of vectors and inserts (insert knowledge). Herein we present three insert knowledge-adapted approaches for characterization of GM events (TT51-1 and T1c-19 rice as examples) based on paired-end re-sequencing with the advantages of comprehensiveness, accuracy, and automation. The comprehensive molecular characteristics of two rice events were revealed, with additional unintended insertions compared with the results from PCR and Southern blotting. Comprehensive transgene characterization of TT51-1 and T1c-19 is shown to be independent of a priori knowledge of the insert and vector sequences employing the developed approaches. This also provides an opportunity to identify and characterize unknown GM events. PMID:24088728

  5. Deciphering the microbiota of Tuwa hot spring, India using shotgun metagenomic sequencing approach.

    PubMed

    Mangrola, Amitsinh; Dudhagara, Pravin; Koringa, Prakash; Joshi, C G; Parmar, Mansi; Patel, Rajesh

    2015-06-01

    Here, we report the metagenome of the Tuwa hot spring, India, obtained using a shotgun sequencing approach. The metagenome consisted of 541,379 sequences totaling 98.7 Mbp with 46% G + C content. Metagenomic sequence reads were deposited into the EMBL database under accession number ERP009321. Community analysis showed that 99.1% of sequences belonged to bacteria, 0.3% were of eukaryotic origin, 0.2% were virus derived and 0.05% were from archaea. Unclassified and unidentified sequences were 0.4% and 0.07% respectively. A total of 22 bacterial phyla, including 90 families and 201 species, were observed in the hot spring metagenome. Firmicutes (97.0%), Proteobacteria (1.3%) and Actinobacteria (0.4%) were reported as dominant bacterial phyla. In functional analysis using Clusters of Orthologous Groups (COG), 21.5% fell into the poorly characterized group. Using subsystem-based annotation, 4.0% of genes were assigned to stress responses and 3% of genes were assigned to the metabolism of aromatic compounds. The hot spring metagenome is very rich in novel sequences affiliated to unclassified and unidentified lineages, suggesting a potential source of novel microbial species and their products. PMID:26484204

  6. Deciphering the microbiota of Tuwa hot spring, India using shotgun metagenomic sequencing approach

    PubMed Central

    Mangrola, Amitsinh; Dudhagara, Pravin; Koringa, Prakash; Joshi, C.G.; Parmar, Mansi; Patel, Rajesh

    2015-01-01

    Here, we report the metagenome of the Tuwa hot spring, India, obtained using a shotgun sequencing approach. The metagenome consisted of 541,379 sequences totaling 98.7 Mbp with 46% G + C content. Metagenomic sequence reads were deposited into the EMBL database under accession number ERP009321. Community analysis showed that 99.1% of sequences belonged to bacteria, 0.3% were of eukaryotic origin, 0.2% were virus derived and 0.05% were from archaea. Unclassified and unidentified sequences were 0.4% and 0.07% respectively. A total of 22 bacterial phyla, including 90 families and 201 species, were observed in the hot spring metagenome. Firmicutes (97.0%), Proteobacteria (1.3%) and Actinobacteria (0.4%) were reported as dominant bacterial phyla. In functional analysis using Clusters of Orthologous Groups (COG), 21.5% fell into the poorly characterized group. Using subsystem-based annotation, 4.0% of genes were assigned to stress responses and 3% of genes were assigned to the metabolism of aromatic compounds. The hot spring metagenome is very rich in novel sequences affiliated to unclassified and unidentified lineages, suggesting a potential source of novel microbial species and their products. PMID:26484204

  7. A long PCR–based approach for DNA enrichment prior to next-generation sequencing for systematic studies

    PubMed Central

    Uribe-Convers, Simon; Duke, Justin R.; Moore, Michael J.; Tank, David C.

    2014-01-01

    • Premise of the study: We present an alternative approach for molecular systematic studies that combines long PCR and next-generation sequencing. Our approach can be used to generate templates from any DNA source for next-generation sequencing. Here we test our approach by amplifying complete chloroplast genomes, and we present a set of 58 potentially universal primers for angiosperms to do so. Additionally, this approach is likely to be particularly useful for nuclear and mitochondrial regions. • Methods and Results: Chloroplast genomes of 30 species across angiosperms were amplified to test our approach. Amplification success varied depending on whether PCR conditions were optimized for a given taxon. To further test our approach, some amplicons were sequenced on an Illumina HiSeq 2000. • Conclusions: Although here we tested this approach by sequencing plastomes, long PCR amplicons could be generated using DNA from any genome, expanding the possibilities of this approach for molecular systematic studies. PMID:25202592

  8. Open questions in the study of de novo genes: what, how and why.

    PubMed

    McLysaght, Aoife; Hurst, Laurence D

    2016-09-01

    The study of de novo protein-coding genes is maturing from the ad hoc reporting of individual cases to the systematic analysis of extensive genomic data from several species. We identify three key challenges for this emerging field: understanding how best to identify de novo genes, how they arise and why they spread. We highlight the intellectual challenges of understanding how a de novo gene becomes integrated into pre-existing functions and becomes essential. We suggest that, as with protein sequence evolution, antagonistic co-evolution may be key to de novo gene evolution, particularly for new essential genes and new cancer-associated genes. PMID:27452112

  9. Interpreting de novo variation in human disease using denovolyzeR

    PubMed Central

    Samocha, Kaitlin E.; Homsy, Jason; Daly, Mark J.

    2015-01-01

    Spontaneously arising (de novo) genetic variants are important in human disease, yet every individual carries many such variants, with a median of 1 de novo variant affecting the protein-coding portion of the genome. A recently described mutational model (Samocha et al., 2014) provides a powerful framework for the robust statistical evaluation of such coding variants, enabling the interpretation of de novo variation in human disease. Here we describe a new open-source software package, denovolyzeR, that implements this model and provides tools for the analysis of de novo coding sequence variants. PMID:26439716
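
    The kind of evaluation this abstract describes can be illustrated as a comparison of an observed de novo count against the count expected under a mutational model. The sketch below assumes a one-sided Poisson test; the function name and the numbers are hypothetical and are not taken from denovolyzeR or from the paper.

```python
import math

def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    # Accumulate P(X = k) for k = 1 .. observed - 1.
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)

# Hypothetical inputs: 12 observed variants in a gene set where the
# mutational model predicts 5.0 (both values are made up for illustration).
observed_count = 12
expected_count = 5.0
enrichment = observed_count / expected_count
p_value = poisson_upper_tail(observed_count, expected_count)
print(f"enrichment = {enrichment:.2f}, one-sided p = {p_value:.4f}")
```

    A small p-value here only says the count exceeds the model's expectation; how the expectation itself is derived is the part the cited mutational model supplies.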

  10. A De novo Transcriptomic Approach to Identify Flavonoids and Anthocyanins “Switch-Off” in Olive (Olea europaea L.) Drupes at Different Stages of Maturation

    PubMed Central

    Iaria, Domenico L.; Chiappetta, Adriana; Muzzalupo, Innocenzo

    2016-01-01

    Highlights: A de novo transcriptome reconstruction of olive drupes was performed in two genotypes. Gene expression was monitored during drupe development in two olive cultivars. Transcripts involved in flavonoid and anthocyanin pathways were analyzed in Cassanese and Leucocarpa cultivars. Both cultivar and developmental stage impact gene expression in Olea europaea fruits. During ripening, the fruits of the olive tree (Olea europaea L.) undergo a progressive chromatic change characterized by the formation of a red-brown “spot” which gradually extends on the epidermis and in the innermost part of the mesocarp. This event finds an exception in the Leucocarpa cultivar, in which we observe a destabilized equilibrium between the metabolisms of chlorophyll and other pigments, particularly the anthocyanins whose switch-off during maturation promotes the white coloration of fruits. Despite its importance, genomic information on the olive tree is still lacking. Different RNA-seq libraries were generated from drupes of “Leucocarpa” and “Cassanese” olive genotypes, sampled at 100 and 130 days after flowering (DAF), and were used in order to identify transcripts involved in the main phenotypic changes of fruits during maturation and their corresponding expression patterns. A total of 103,359 transcripts were obtained and 3792 and 3064 were differentially expressed in “Leucocarpa” and “Cassanese” genotypes, respectively, during 100–130 DAF transition. Among them flavonoid and anthocyanin related transcripts such as phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumarate-CoA ligase (4CL), chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavonol 3′-hydrogenase (F3′H), flavonol 3′5′-hydrogenase (F3′5′H), flavonol synthase (FLS), dihydroflavonol 4-reductase (DFR), anthocyanidin synthase (ANS), UDP-glucose:anthocyanidin:flavonoid glucosyltransferase (UFGT) were identified. These results contribute

  11. "Patient counselling" by pharmacists: four approaches to the delivery of counselling sequences and their interactional reception.

    PubMed

    Pilnick, Alison

    2003-02-01

    'Patient counselling' by pharmacists is a diverse and ill-defined activity. It is also an activity which is achieving more prominence as part of the 'extended role' which is seen as the way forward for the profession. This paper uses data from a hospital paediatric outpatient clinic in the United Kingdom to examine the process of patient counselling from a conversation analytic standpoint, with a particular focus on the varying ways in which these sequences are set up and the ways in which patients or carers respond. Four types of interactional approach to negotiating entry into a broadly defined 'patient counselling' sequence are identified. These approaches are considered within the broader frameworks of delicacy, morality and competence which impact upon the giving and receiving of advice and information more generally, as well as in this setting, and in the light of the continued development of the 'extended role'. PMID:12560016

  12. A Mixed-Integer Optimization Framework for De Novo Peptide Identification

    PubMed Central

    DiMaggio, Peter A.

    2009-01-01

    A novel methodology for the de novo identification of peptides by mixed-integer optimization and tandem mass spectrometry is presented in this article. The various features of the mathematical model are presented and examples are used to illustrate the key concepts of the proposed approach. Several problems are examined to illustrate the proposed method's ability to address (1) residue-dependent fragmentation properties and (2) the variability of resolution in different mass analyzers. A preprocessing algorithm is used to identify important m/z values in the tandem mass spectrum. Missing peaks, resulting from residue-dependent fragmentation characteristics, are dealt with using a two-stage algorithmic framework. A cross-correlation approach is used to resolve missing amino acid assignments and to identify the most probable peptide by comparing the theoretical spectra of the candidate sequences that were generated from the MILP sequencing stages with the experimental tandem mass spectrum. PMID:19412358

  13. Critical review of NGS analyses for de novo genotyping multigene families.

    PubMed

    Lighten, Jackie; van Oosterhout, Cock; Bentzen, Paul

    2014-08-01

    The genotyping of highly polymorphic multigene families across many individuals used to be a particularly challenging task because of methodological limitations associated with traditional approaches. Next-generation sequencing (NGS) can overcome most of these limitations, and it is increasingly being applied in population genetic studies of multigene families. Here, we critically review NGS bioinformatic approaches that have been used to genotype the major histocompatibility complex (MHC) immune genes, and we discuss how the significant advances made in this field are applicable to population genetic studies of gene families. Increasingly, approaches are introduced that apply thresholds of sequencing depth and sequence similarity to separate alleles from methodological artefacts. We explain why these approaches are particularly sensitive to methodological biases by violating fundamental genotyping assumptions. An alternative strategy that utilizes ultra-deep sequencing (hundreds to thousands of sequences per amplicon) to reconstruct genotypes and applies statistical methods on the sequencing depth to separate alleles from artefacts appears to be more robust. Importantly, the 'degree of change' (DOC) method avoids using arbitrary cut-off thresholds by looking for statistical boundaries between the sequencing depth for alleles and artefacts, and hence, it is entirely repeatable across studies. Although the advances made in generating NGS data are still far ahead of our ability to perform reliable processing, analysis and interpretation, the community is developing statistically rigorous protocols that will allow us to address novel questions in evolution, ecology and genetics of multigene families. Future developments in third-generation single molecule sequencing may potentially help overcome problems that still persist in de novo multigene amplicon genotyping when using current second-generation sequencing approaches.
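
    The idea of locating a boundary in sequencing depth instead of applying a fixed cut-off can be sketched as follows. This is a simplified reading of the 'degree of change' idea, not the published procedure: it assumes that variants within one amplicon are ranked by depth, that the rate of change of the cumulative depth curve at each variant equals that variant's depth, and that the largest relative drop between neighbouring variants marks the allele/artefact boundary. The function name and the example depths are hypothetical.

```python
def estimate_allele_count(depths):
    """Return (number of putative alleles, percentage change at each step)."""
    ordered = sorted(depths, reverse=True)
    if len(ordered) < 2 or min(ordered) <= 0:
        raise ValueError("need at least two variants with positive depth")
    # Ratio of consecutive depths: how sharply the cumulative curve flattens.
    ratios = [ordered[i] / ordered[i + 1] for i in range(len(ordered) - 1)]
    total = sum(ratios)
    # Express each ratio as a percentage of the total change across the amplicon.
    doc = [100.0 * r / total for r in ratios]
    # Variants ranked before the largest change are treated as alleles.
    return doc.index(max(doc)) + 1, doc

# Hypothetical read depths for seven variants in one individual.
example_depths = [30, 500, 5, 450, 20, 480, 10]
n_alleles, doc_values = estimate_allele_count(example_depths)
print(n_alleles, [round(v, 1) for v in doc_values])
```

    Because the boundary is derived from the shape of each individual's depth profile, no global depth or similarity threshold has to be chosen in advance, which is the property the abstract emphasizes.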

  14. A novel conceptual approach to read-filtering in high-throughput amplicon sequencing studies.

    PubMed

    Puente-Sánchez, Fernando; Aguirre, Jacobo; Parro, Víctor

    2016-02-29

    Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational Taxonomic Units (OTUs) and therefore leading to the overestimation of microbial diversity. Sequencing errors will also result in OTUs that are not accurate reconstructions of the original biological sequences. Herein we present the Poisson binomial filtering algorithm (PBF), which minimizes both problems by calculating the error-probability distribution of a sequence from its quality scores. In order to validate our method, we quality-filtered 37 publicly available datasets obtained by sequencing mock and environmental microbial communities with the Roche 454, Illumina MiSeq and IonTorrent PGM platforms, and compared our results to those obtained with previous approaches such as the ones included in mothur, QIIME and USEARCH. Our algorithm retained substantially more reads than its predecessors, while resulting in fewer and more accurate OTUs. This improved sensitiveness produced more faithful representations, both quantitatively and qualitatively, of the true microbial diversity present in the studied samples. Furthermore, the method introduced in this work is computationally inexpensive and can be readily applied in conjunction with any existent analysis pipeline.
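
    Computing the error-probability distribution of a read from its quality scores can be sketched with a standard Poisson binomial recurrence. The sketch assumes Phred-scaled qualities (error probability 10^(-Q/10)) and independent per-base errors; the acceptance rule in `keep_read` (a maximum error count and a confidence level) is a hypothetical stand-in, since the abstract does not state the decision rule PBF actually uses.

```python
def phred_to_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)

def error_count_distribution(quals, max_k):
    """Return [P(exactly k errors) for k = 0..max_k] for one read."""
    dist = [1.0] + [0.0] * max_k
    for q in quals:
        p = phred_to_prob(q)
        # Update from high k to low k so dist[k - 1] is still the old value.
        for k in range(max_k, 0, -1):
            dist[k] = dist[k] * (1.0 - p) + dist[k - 1] * p
        dist[0] *= 1.0 - p
    return dist

def keep_read(quals, max_errors, confidence):
    """Keep the read if P(errors <= max_errors) reaches the confidence level."""
    return sum(error_count_distribution(quals, max_errors)) >= confidence

high_quality = [40] * 100   # hypothetical read, Q40 at every base
low_quality = [10] * 100    # hypothetical read, Q10 at every base
print(keep_read(high_quality, 1, 0.99), keep_read(low_quality, 1, 0.99))
```

    The recurrence costs O(read length × max_errors) per read, which is consistent with the abstract's remark that the approach is computationally inexpensive.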

  15. A novel conceptual approach to read-filtering in high-throughput amplicon sequencing studies

    PubMed Central

    Puente-Sánchez, Fernando; Aguirre, Jacobo; Parro, Víctor

    2016-01-01

    Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational Taxonomic Units (OTUs) and therefore leading to the overestimation of microbial diversity. Sequencing errors will also result in OTUs that are not accurate reconstructions of the original biological sequences. Herein we present the Poisson binomial filtering algorithm (PBF), which minimizes both problems by calculating the error-probability distribution of a sequence from its quality scores. In order to validate our method, we quality-filtered 37 publicly available datasets obtained by sequencing mock and environmental microbial communities with the Roche 454, Illumina MiSeq and IonTorrent PGM platforms, and compared our results to those obtained with previous approaches such as the ones included in mothur, QIIME and USEARCH. Our algorithm retained substantially more reads than its predecessors, while resulting in fewer and more accurate OTUs. This improved sensitiveness produced more faithful representations, both quantitatively and qualitatively, of the true microbial diversity present in the studied samples. Furthermore, the method introduced in this work is computationally inexpensive and can be readily applied in conjunction with any existent analysis pipeline. PMID:26553806

  16. A work stealing based approach for enabling scalable optimal sequence homology detection

    SciTech Connect

    Daily, Jeffrey A.; Kalyanaraman, Anantharaman; Krishnamoorthy, Sriram; Vishnu, Abhinav

    2015-05-01

    Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or “homologous”) on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, their deployment at large scale is currently not feasible; instead, heuristic methods are used at the expense of quality. Here, we present the design and evaluation of a parallel implementation for conducting optimal homology detection on distributed memory supercomputers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale. Results for 2.56M sequences on up to 8K cores show parallel efficiencies of ~75-100%, a time-to-solution of 33s, and a rate of ~2.0M alignments per second.
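
    The combination of an exact-matching filter with optimal pairwise alignment can be sketched serially as below. This is an illustration under stated assumptions, not the authors' implementation: a shared k-mer test stands in for whatever exact-matching index the paper uses, Smith-Waterman with linear gap penalties stands in for the optimal alignment step, and the scoring parameters, function names, and sequences are hypothetical. The work-stealing load balancing across nodes is not shown.

```python
from itertools import combinations

def shares_kmer(a, b, k):
    """Exact-match filter: True if the two sequences share any k-mer."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[j:j + k] in kmers for j in range(len(b) - k + 1))

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score with linear gap penalties."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def homologous_pairs(seqs, k, min_score):
    """Align only pairs that pass the filter; return (i, j, score) hits."""
    hits = []
    for i, j in combinations(range(len(seqs)), 2):
        if shares_kmer(seqs[i], seqs[j], k):
            score = smith_waterman_score(seqs[i], seqs[j])
            if score >= min_score:
                hits.append((i, j, score))
    return hits

print(homologous_pairs(["ACGTACGT", "ACGTACGA", "TTTTTTTT"], k=4, min_score=10))
```

    Each surviving pair is an independent task of unpredictable cost, which is why dynamic schemes such as work stealing suit this workload better than a static partition of pairs.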

  17. DECIPHER, a search-based approach to chimera identification for 16S rRNA sequences.

    PubMed

    Wright, Erik S; Yilmaz, L Safak; Noguera, Daniel R

    2012-02-01

    DECIPHER is a new method for finding 16S rRNA chimeric sequences by the use of a search-based approach. The method is based upon detecting short fragments that are uncommon in the phylogenetic group where a query sequence is classified but frequently found in another phylogenetic group. The algorithm was calibrated for full sequences (fs_DECIPHER) and short sequences (ss_DECIPHER) and benchmarked against WigeoN (Pintail), ChimeraSlayer, and Uchime using artificially generated chimeras. Overall, ss_DECIPHER and Uchime provided the highest chimera detection for sequences 100 to 600 nucleotides long (79% and 81%, respectively), but Uchime's performance deteriorated for longer sequences, while ss_DECIPHER maintained a high detection rate (89%). Both methods had low false-positive rates (1.3% and 1.6%). The more conservative fs_DECIPHER, benchmarked only for sequences longer than 600 nucleotides, had an overall detection rate lower than that of ss_DECIPHER (75%) but higher than those of the other programs. In addition, fs_DECIPHER had the lowest false-positive rate among all the benchmarked programs (<0.20%). DECIPHER was outperformed only by ChimeraSlayer and Uchime when chimeras were formed from closely related parents (less than 10% divergence). Given the differences in the programs, it was possible to detect over 89% of all chimeras with just the combination of ss_DECIPHER and Uchime. Using fs_DECIPHER, we detected between 1% and 2% additional chimeras in the RDP, SILVA, and Greengenes databases from which chimeras had already been removed with Pintail or Bellerophon. DECIPHER was implemented in the R programming language and is directly accessible through a webpage or by downloading the program as an R package (http://DECIPHER.cee.wisc.edu).

  18. Identification of purple sea urchin telomerase RNA using a next-generation sequencing based approach.

    PubMed

    Li, Yang; Podlevsky, Joshua D; Marz, Manja; Qi, Xiaodong; Hoffmann, Steve; Stadler, Peter F; Chen, Julian J-L

    2013-06-01

    Telomerase is a ribonucleoprotein (RNP) enzyme essential for telomere maintenance and chromosome stability. While the catalytic telomerase reverse transcriptase (TERT) protein is well conserved across eukaryotes, telomerase RNA (TR) is extensively divergent in size, sequence, and structure. This diversity prohibits TR identification from many important organisms. Here we report a novel approach for TR discovery that combines in vitro TR enrichment from total RNA, next-generation sequencing, and a computational screening pipeline. With this approach, we have successfully identified TR from Strongylocentrotus purpuratus (purple sea urchin) from the phylum Echinodermata. Reconstitution of activity in vitro confirmed that this RNA is an integral component of sea urchin telomerase. Comparative phylogenetic analysis against vertebrate TR sequences revealed that the purple sea urchin TR contains vertebrate-like template-pseudoknot and H/ACA domains. While lacking a vertebrate-like CR4/5 domain, sea urchin TR has a unique central domain critical for telomerase activity. This is the first TR identified from the previously unexplored invertebrate clade and provides the first glimpse of TR evolution in the deuterostome lineage. Moreover, our TR discovery approach is a significant step toward the comprehensive understanding of telomerase RNP evolution.

  19. De novo variants in sporadic cases of childhood onset schizophrenia

    PubMed Central

    Ambalavanan, Amirthagowri; Girard, Simon L; Ahn, Kwangmi; Zhou, Sirui; Dionne-Laporte, Alexandre; Spiegelman, Dan; Bourassa, Cynthia V; Gauthier, Julie; Hamdan, Fadi F; Xiong, Lan; Dion, Patrick A; Joober, Ridha; Rapoport, Judith; Rouleau, Guy A

    2016-01-01

    Childhood-onset schizophrenia (COS), defined by the onset of illness before age 13 years, is a rare severe neurodevelopmental disorder of unknown etiology. Recently, sequencing studies have identified rare, potentially causative de novo variants in sporadic cases of adult-onset schizophrenia and autism. In this study, we performed exome sequencing of 17 COS trios in order to test whether de novo variants could contribute to this disease. We identified 20 de novo variants in 17 COS probands, which is consistent with the de novo mutation rate reported in the adult form of the disease. Interestingly, the missense de novo variants in COS have a high likelihood for pathogenicity and were enriched for genes that are less tolerant to variants. Among the genes found disrupted in our study, SEZ6, RYR2, GPR153, GTF2IRD1, TTBK1 and ITGA6 have been previously linked to neuronal function or to psychiatric disorders, and thus may be considered as COS candidate genes. PMID:26508570

  20. De novo variants in sporadic cases of childhood onset schizophrenia.

    PubMed

    Ambalavanan, Amirthagowri; Girard, Simon L; Ahn, Kwangmi; Zhou, Sirui; Dionne-Laporte, Alexandre; Spiegelman, Dan; Bourassa, Cynthia V; Gauthier, Julie; Hamdan, Fadi F; Xiong, Lan; Dion, Patrick A; Joober, Ridha; Rapoport, Judith; Rouleau, Guy A

    2016-06-01

    Childhood-onset schizophrenia (COS), defined by the onset of illness before age 13 years, is a rare severe neurodevelopmental disorder of unknown etiology. Recently, sequencing studies have identified rare, potentially causative de novo variants in sporadic cases of adult-onset schizophrenia and autism. In this study, we performed exome sequencing of 17 COS trios in order to test whether de novo variants could contribute to this disease. We identified 20 de novo variants in 17 COS probands, which is consistent with the de novo mutation rate reported in the adult form of the disease. Interestingly, the missense de novo variants in COS have a high likelihood for pathogenicity and were enriched for genes that are less tolerant to variants. Among the genes found disrupted in our study, SEZ6, RYR2, GPR153, GTF2IRD1, TTBK1 and ITGA6 have been previously linked to neuronal function or to psychiatric disorders, and thus may be considered as COS candidate genes. PMID:26508570

  1. Interpreting the role of de novo protein-coding mutations in neuropsychiatric disease.

    PubMed

    Gratten, Jacob; Visscher, Peter M; Mowry, Bryan J; Wray, Naomi R

    2013-03-01

    Pedigree, linkage and association studies are consistent with heritable variation for complex disease due to the segregation of genetic factors in families and in the population. In contrast, de novo mutations make only minor contributions to heritability estimates for complex traits. Nonetheless, some de novo variants are known to be important in disease etiology. The identification of risk-conferring de novo variants will contribute to the discovery of etiologically relevant genes and pathways and may help in genetic counseling. There is considerable interest in the role of such mutations in complex neuropsychiatric disease, largely driven by new genotyping and sequencing technologies. An important role for large de novo copy number variations has been established. Recently, whole-exome sequencing has been used to extend the investigation of de novo variation to point mutations in protein-coding regions. Here, we consider several challenges for the interpretation of such mutations in the context of their role in neuropsychiatric disease. PMID:23438595

  2. De novo gene mutations highlight patterns of genetic and neural complexity in schizophrenia.

    PubMed

    Xu, Bin; Ionita-Laza, Iuliana; Roos, J Louw; Boone, Braden; Woodrick, Scarlet; Sun, Yan; Levy, Shawn; Gogos, Joseph A; Karayiorgou, Maria

    2012-12-01

    To evaluate evidence for de novo etiologies in schizophrenia, we sequenced at high coverage the exomes of families recruited from two populations with distinct demographic structures and history. We sequenced a total of 795 exomes from 231 parent-proband trios enriched for sporadic schizophrenia cases, as well as 34 unaffected trios. We observed in cases an excess of de novo nonsynonymous single-nucleotide variants as well as a higher prevalence of gene-disruptive de novo mutations relative to controls. We found four genes (LAMA2, DPYD, TRRAP and VPS39) affected by recurrent de novo events within or across the two populations, which is unlikely to have occurred by chance. We show that de novo mutations affect genes with diverse functions and developmental profiles, but we also find a substantial contribution of mutations in genes with higher expression in early fetal life. Our results help define the genomic and neural architecture of schizophrenia. PMID:23042115

  3. MBMC: An Effective Markov Chain Approach for Binning Metagenomic Reads from Environmental Shotgun Sequencing Projects.

    PubMed

    Wang, Ying; Hu, Haiyan; Li, Xiaoman

    2016-08-01

    Metagenomics is a next-generation omics field currently impacting postgenomic life sciences and medicine. Binning metagenomic reads is essential for the understanding of microbial function, compositions, and interactions in given environments. Despite the existence of dozens of computational methods for metagenomic read binning, it is still very challenging to bin reads. This is especially true for reads from unknown species, from species with similar abundance, and/or from low-abundance species in environmental samples. In this study, we developed a novel taxonomy-dependent and alignment-free approach called MBMC (Metagenomic Binning by Markov Chains). Different from all existing methods, MBMC bins reads by measuring the similarity of reads to the trained Markov chains for different taxa instead of directly comparing reads with known genomic sequences. By testing on more than 24 simulated and experimental datasets with species of similar abundance, species of low abundance, and/or unknown species, we report here that MBMC reliably grouped reads from different species into separate bins. Compared with four existing approaches, we demonstrated that the performance of MBMC was comparable with existing approaches when binning reads from sequenced species, and superior to existing approaches when binning reads from unknown species. MBMC is a pivotal tool for binning metagenomic reads in the current era of Big Data and postgenomic integrative biology. The MBMC software can be freely downloaded at http://hulab.ucf.edu/research/projects/metagenomics/MBMC.html . PMID:27447888
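
    Binning a read by its similarity to per-taxon Markov chains, without aligning it to any genome, can be sketched as follows. The sketch assumes fixed-order chains with add-one smoothing and assignment by maximum log-likelihood; the order, the smoothing, the function names, and the toy training sequences are all hypothetical, and MBMC's actual similarity measure may differ.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_chain(sequences, order):
    """Count how often each base follows each context of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    return counts

def log_likelihood(read, counts, order, pseudocount=1.0):
    """Log-probability of the read under one taxon's smoothed chain."""
    total = 0.0
    for i in range(order, len(read)):
        base = read[i]
        if base not in ALPHABET:
            continue  # skip ambiguous bases such as N
        context_counts = counts.get(read[i - order:i], {})
        numerator = context_counts.get(base, 0) + pseudocount
        denominator = sum(context_counts.values()) + pseudocount * len(ALPHABET)
        total += math.log(numerator / denominator)
    return total

def assign_bin(read, models, order):
    """Assign the read to the taxon whose chain gives the highest likelihood."""
    return max(models, key=lambda taxon: log_likelihood(read, models[taxon], order))

# Toy training data standing in for reference sequences of two taxa.
models = {
    "taxon_A": train_chain(["ACACACACACACACAC"], 2),
    "taxon_B": train_chain(["GGTTGGTTGGTTGGTT"], 2),
}
print(assign_bin("CACACACA", models, 2), assign_bin("TTGGTTGG", models, 2))
```

    Because the read is scored against a compositional model of a taxon and not against specific genomes, a read from an unsequenced species can still land in the bin of its closest modelled relative.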

  4. Comparison of two approaches for the classification of 16S rRNA gene sequences.

    PubMed

    Chatellier, Sonia; Mugnier, Nathalie; Allard, Françoise; Bonnaud, Bertrand; Collin, Valérie; van Belkum, Alex; Veyrieras, Jean-Baptiste; Emler, Stefan

    2014-10-01

    The use of 16S rRNA gene sequences for microbial identification in clinical microbiology is accepted widely, and requires databases and algorithms. We compared a new research database containing curated 16S rRNA gene sequences in combination with the lca (lowest common ancestor) algorithm (RDB-LCA) to a commercially available 16S rDNA Centroid approach. We used 1025 bacterial isolates characterized by biochemistry, matrix-assisted laser desorption/ionization time-of-flight MS and 16S rDNA sequencing. Nearly 80 % of isolates were identified unambiguously at the species level by both classification platforms used. The remaining isolates were mostly identified correctly at the genus level due to the limited resolution of 16S rDNA sequencing. Discrepancies between both 16S rDNA platforms were due to differences in database content and the algorithm used, and could amount to up to 10.5 %. Up to 1.4 % of the analyses were found to be inconclusive. It is important to realize that despite the overall good performance of the pipelines for analysis, some inconclusive results remain that require additional in-depth analysis performed using supplementary methods.

  5. An unsupervised approach for measuring myocardial perfusion in MR image sequences

    NASA Astrophysics Data System (ADS)

    Discher, Antoine; Rougon, Nicolas; Preteux, Francoise

    2005-08-01

    Quantitatively assessing myocardial perfusion is a key issue for the diagnosis, therapeutic planning and patient follow-up of cardio-vascular diseases. To this end, perfusion MRI (p-MRI) has emerged as a valuable clinical investigation tool thanks to its ability of dynamically imaging the first pass of a contrast bolus in the framework of stress/rest exams. However, reliable techniques for automatically computing regional first pass curves from 2D short-axis cardiac p-MRI sequences remain to be elaborated. We address this problem and develop an unsupervised four-step approach comprising: (i) a coarse spatio-temporal segmentation step, allowing to automatically detect a region of interest for the heart over the whole sequence, and to select a reference frame with maximal myocardium contrast; (ii) a model-based variational segmentation step of the reference frame, yielding a bi-ventricular partition of the heart into left ventricle, right ventricle and myocardium components; (iii) a respiratory/cardiac motion artifacts compensation step using a novel region-driven intensity-based non rigid registration technique, allowing to elastically propagate the reference bi-ventricular segmentation over the whole sequence; (iv) a measurement step, delivering first-pass curves over each region of a segmental model of the myocardium. The performance of this approach is assessed over a database of 15 normal and pathological subjects, and compared with perfusion measurements delivered by a MRI manufacturer software package based on manual delineations by a medical expert.

  6. Comparative analysis of de novo transcriptome assembly.

    PubMed

    Clarke, Kaitlin; Yang, Yi; Marsh, Ronald; Xie, Linglin; Zhang, Ke K

    2013-02-01

    The fast development of next-generation sequencing technology presents a major computational challenge for data processing and analysis. A fast algorithm, the de Bruijn graph, has been successfully used for genome DNA de novo assembly; nevertheless, its performance for transcriptome assembly is unclear. In this study, we used both simulated and real RNA-Seq data, from either artificial RNA templates or human transcripts, to evaluate five de novo assemblers: ABySS, Mira, Trinity, Velvet and Oases. Of these assemblers, ABySS, Trinity, Velvet and Oases are all based on the de Bruijn graph, and Mira uses an overlap graph algorithm. Various numbers of RNA short reads were selected from the External RNA Control Consortium (ERCC) data and human chromosome 22. A number of statistics were then calculated for the resulting contigs from each assembler. Each experiment was repeated multiple times to obtain the mean statistics and standard error estimate. Trinity had relatively good performance for both ERCC and human data, but it may not consistently generate full-length transcripts. ABySS was the fastest method but its assembly quality was low. Mira gave a good rate for mapping its contigs onto human chromosome 22, but its computational speed is not satisfactory. Our results suggest that transcript assembly remains a challenging problem for the bioinformatics community. Therefore, a novel assembler is needed for transcriptome data generated by next-generation sequencing techniques.

  7. Comparative analysis of de novo transcriptome assembly.

    PubMed

    Clarke, Kaitlin; Yang, Yi; Marsh, Ronald; Xie, Linglin; Zhang, Ke K

    2013-02-01

    The fast development of next-generation sequencing technology presents a major computational challenge for data processing and analysis. A fast algorithm, the de Bruijn graph, has been successfully used for genome DNA de novo assembly; nevertheless, its performance for transcriptome assembly is unclear. In this study, we used both simulated and real RNA-Seq data, from either artificial RNA templates or human transcripts, to evaluate five de novo assemblers: ABySS, Mira, Trinity, Velvet and Oases. Of these assemblers, ABySS, Trinity, Velvet and Oases are all based on the de Bruijn graph, and Mira uses an overlap graph algorithm. Various numbers of RNA short reads were selected from the External RNA Control Consortium (ERCC) data and human chromosome 22. A number of statistics were then calculated for the resulting contigs from each assembler. Each experiment was repeated multiple times to obtain the mean statistics and standard error estimate. Trinity had relatively good performance for both ERCC and human data, but it may not consistently generate full-length transcripts. ABySS was the fastest method but its assembly quality was low. Mira gave a good rate for mapping its contigs onto human chromosome 22, but its computational speed is not satisfactory. Our results suggest that transcript assembly remains a challenging problem for the bioinformatics community. Therefore, a novel assembler is needed for transcriptome data generated by next-generation sequencing techniques. PMID:23393031
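
    The de Bruijn graph that four of the five assemblers build on can be sketched in a few lines. This is a textbook construction, not the code of any assembler named in the abstract: nodes are (k-1)-mers, each k-mer in a read contributes one edge, and a contig is read off by following nodes with exactly one successor. The reads and function names are hypothetical, and real assemblers add error correction, coverage tracking, and, for transcriptomes, handling of isoforms.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the path has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop at cycles instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

# Two hypothetical overlapping reads.
graph = build_de_bruijn(["ACGTC", "GTCAG"], k=3)
print(walk_unambiguous(graph, "AC"))
```

    In transcriptome data a node often has several successors because of alternative splicing and uneven expression, which is one reason the walk above would stop early and full-length transcripts are harder to recover than genomic contigs.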

  8. Prospects for de novo phasing with de novo protein models

    SciTech Connect

    Das, Rhiju; Baker, David

    2009-02-01

    The prospect of phasing diffraction data sets ‘de novo’ for proteins with previously unseen folds is appealing but largely untested. In a first systematic exploration of phasing with Rosetta de novo models, it is shown that all-atom refinement of coarse-grained models significantly improves both the model quality and performance in molecular replacement with the Phaser software. 15 new cases of diffraction data sets that are unambiguously phased with de novo models are presented. These diffraction data sets represent nine space groups and span a large range of solvent contents (33–79%) and asymmetric unit copy numbers (1–4). No correlation is observed between the ease of phasing and the solvent content or asymmetric unit copy number. Instead, a weak correlation is found with the length of the modeled protein: larger proteins required somewhat less accurate models to give successful molecular replacement. Overall, the results of this survey suggest that de novo models can phase diffraction data for approximately one sixth of proteins with sizes of 100 residues or less. However, for many of these cases, ‘de novo phasing with de novo models’ requires significant investment of computational power, much greater than 10³ CPU days per target. Improvements in conformational search methods will be necessary if molecular replacement with de novo models is to become a practical tool for targets without homology to previously solved protein structures.

  9. Using next generation sequencing approaches for the isolation of simple sequence repeats (SSR) in the plant sciences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The application of next-generation sequencing (NGS) technologies for the development of simple sequence repeat (SSR) or microsatellite loci for genetic research in the botanical sciences is described. The major advantage of using NGS methods to isolate SSR loci is their ability to quickly and cost-e...

  10. Medical target prediction from genome sequence: combining different sequence analysis algorithms with expert knowledge and input from artificial intelligence approaches.

    PubMed

    Dandekar, T; Du, F; Schirmer, R H; Schmidt, S

    2001-12-01

    By exploiting the rapid increase in available sequence data, the definition of medically relevant protein targets has been improved by a combination of: (i) differential genome analysis (target list); and (ii) analysis of individual proteins (target analysis). Fast sequence comparisons, data mining, and genetic algorithms further promote these procedures. Mycobacterium tuberculosis proteins were chosen as applied examples.

  11. Genovo: De Novo Assembly for Metagenomes

    NASA Astrophysics Data System (ADS)

    Laserson, Jonathan; Jojic, Vladimir; Koller, Daphne

    Next-generation sequencing technologies produce a large number of noisy reads from the DNA in a sample. Metagenomics and population sequencing aim to recover the genomic sequences of the species in the sample, which could be of high diversity. Methods geared towards single sequence reconstruction are not sensitive enough when applied in this setting. We introduce a generative probabilistic model of read generation from environmental samples and present Genovo, a novel de novo sequence assembler that discovers likely sequence reconstructions under the model. A Chinese restaurant process prior accounts for the unknown number of genomes in the sample. Inference is made by applying a series of hill-climbing steps iteratively until convergence. We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo's reconstructions cover more bases and recover more genes than the other methods, and yield a higher assembly score.
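
    The Chinese restaurant process prior mentioned above can be illustrated with a minimal sketch. This is not Genovo's implementation: it shows only the prior over an unknown number of clusters (candidate genomes), with no read likelihood, alignment, or hill-climbing. The function name crp_assign and the concentration parameter alpha are hypothetical names chosen for illustration.

```python
import random


def crp_assign(num_reads, alpha, seed=0):
    """Assign reads to clusters one at a time under a Chinese restaurant process.

    Hypothetical illustration only: each read joins an existing cluster with
    probability proportional to that cluster's current size, or opens a new
    cluster with probability proportional to alpha (assumed > 0).
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of reads currently in cluster k
    assignments = []   # assignments[i] = cluster index chosen for read i
    for _ in range(num_reads):
        # Existing clusters are weighted by size; the last slot is a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster for this read
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts
```

    Because new clusters open with weight alpha regardless of how many reads have been seen, the number of clusters is not fixed in advance, which is the property the abstract relies on to handle an unknown number of genomes.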

  12. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches

    PubMed Central

    Marmesat, Elena; Soriano, Laura; Mazzoni, Camila J.; Sommer, Simone

    2016-01-01

    The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele’s amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications. PMID:27294261

  13. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches.

    PubMed

    Marmesat, Elena; Soriano, Laura; Mazzoni, Camila J; Sommer, Simone; Godoy, José A

    2016-01-01

    The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele's amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications. PMID:27294261

  14. Installing hydrolytic activity into a completely de novo protein framework

    NASA Astrophysics Data System (ADS)

    Burton, Antony J.; Thomson, Andrew R.; Dawson, William M.; Brady, R. Leo; Woolfson, Derek N.

    2016-09-01

    The design of enzyme-like catalysts tests our understanding of sequence-to-structure/function relationships in proteins. Here we install hydrolytic activity predictably into a completely de novo and thermostable α-helical barrel, which comprises seven helices arranged around an accessible channel. We show that the lumen of the barrel accepts 21 mutations to functional polar residues. The resulting variant, which has cysteine-histidine-glutamic acid triads on each helix, hydrolyses p-nitrophenyl acetate with catalytic efficiencies that match the most-efficient redesigned hydrolases based on natural protein scaffolds. This is the first report of a functional catalytic triad engineered into a de novo protein framework. The flexibility of our system also allows the facile incorporation of unnatural side chains to improve activity and probe the catalytic mechanism. Such a predictable and robust construction of truly de novo biocatalysts holds promise for applications in chemical and biochemical synthesis.

  16. Installing hydrolytic activity into a completely de novo protein framework.

    PubMed

    Burton, Antony J; Thomson, Andrew R; Dawson, William M; Brady, R Leo; Woolfson, Derek N

    2016-09-01

    The design of enzyme-like catalysts tests our understanding of sequence-to-structure/function relationships in proteins. Here we install hydrolytic activity predictably into a completely de novo and thermostable α-helical barrel, which comprises seven helices arranged around an accessible channel. We show that the lumen of the barrel accepts 21 mutations to functional polar residues. The resulting variant, which has cysteine-histidine-glutamic acid triads on each helix, hydrolyses p-nitrophenyl acetate with catalytic efficiencies that match the most-efficient redesigned hydrolases based on natural protein scaffolds. This is the first report of a functional catalytic triad engineered into a de novo protein framework. The flexibility of our system also allows the facile incorporation of unnatural side chains to improve activity and probe the catalytic mechanism. Such a predictable and robust construction of truly de novo biocatalysts holds promise for applications in chemical and biochemical synthesis. PMID:27554410

  17. From the double-helix to novel approaches to the sequencing of large genomes.

    PubMed

    Szybalski, W

    1993-12-15

    Elucidation of the structure of DNA by Watson and Crick [Nature 171 (1953) 737-738] has led to many crucial molecular experiments, including studies on DNA replication, transcription, physical mapping, and most recently to serious attempts directed toward the sequencing of large genomes [Watson, Science 248 (1990) 44-49]. I am totally convinced of the great importance of the Human Genome Project, and toward achieving this goal I strongly favor 'top-down' approaches consisting of the physical mapping and preparation of contiguous 50-100-kb fragments directly from the genome, followed by their automated sequencing based on the rapid assembly of primers by hexamer ligation together with primer walking. Our 'top-down' procedures totally avoid conventional cloning, subcloning and random sequencing, which are the elements of the present 'bottom-up' procedures. Fragments of 50-100 kb are prepared in sufficient quantities either by in vitro excision with rare-cutting restriction systems (including Achilles' heel cleavage [AC] or the RecA-AC procedures of Koob et al. [Nucleic Acids Res. 20 (1992) 5831-5836]) or by in vivo excision and amplification using the yeast FRT/Flp system or the phage lambda att/Int system. Such fragments, when derived directly from the Escherichia coli genome, are arranged in consecutive order, so that 50 specially constructed strains of E. coli would supply 50 end-to-end arranged approx. 100-kb fragments, which will cover the entire approx. 5-Mb E. coli genome. For the 150-Mb Drosophila melanogaster genome, 1500 of such consecutive 100-kb fragments (supplied by 1500 strains) are required to cover the entire genome. The fragments will be sequenced by the SPEL-6 method involving hexamer ligation [Szybalski, Gene 90 (1990) 177-178; Fresenius J. Anal. Chem. 4 (1992) 343] and primer walking. The 18-mer primers are synthesized in only a few minutes from three contiguous hexamers annealed to the DNA strand to be sequenced when using an over 100-fold

  18. Computational Approach to Annotating Variants of Unknown Significance in Clinical Next Generation Sequencing.

    PubMed

    Schulz, Wade L; Tormey, Christopher A; Torres, Richard

    2015-01-01

    Next generation sequencing (NGS) has become a common technology in the clinical laboratory, particularly for the analysis of malignant neoplasms. However, most mutations identified by NGS are variants of unknown clinical significance (VOUS). Although the approach to define these variants differs by institution, software algorithms that predict variant effect on protein function may be used. However, these algorithms commonly generate conflicting results, potentially adding uncertainty to interpretation. In this review, we examine several computational tools used to predict whether a variant has clinical significance. In addition to describing the role of these tools in clinical diagnostics, we assess their efficacy in analyzing known pathogenic and benign variants in hematologic malignancies.

  19. De Novo Lipogenesis Products and Endogenous Lipokines.

    PubMed

    Yilmaz, Mustafa; Claiborn, Kathryn C; Hotamisligil, Gökhan S

    2016-07-01

    Recent studies have shown that in addition to their traditionally recognized functions as building blocks, energy stores, or hazardous intermediates, lipids also have the ability to act as signaling molecules with potent effects on systemic metabolism and metabolic diseases. This Perspective highlights this somewhat less apparent biology of lipids, especially focusing on de novo lipogenesis as a process that gives rise to key messenger molecules mediating interorgan communication. Elucidating the mechanisms of lipid-dependent coordination of metabolism promises invaluable insights into the understanding of metabolic diseases and may contribute to the development of a new generation of preventative and therapeutic approaches. PMID:27288005

  20. An RNA-based approach to sequence the mitogenome of Hypoptopoma incognitum (Siluriformes: Loricariidae).

    PubMed

    Moreira, Daniel Andrade; Magalhães, Maithê G P; de Andrade, Paula C C; Furtado, Carolina; Val, Adalberto L; Parente, Thiago Estevam

    2016-09-01

    Hypoptopoma incognitum is a fish of the fifth most species-rich family of vertebrates and is abundant in rivers of the Brazilian Amazon. Only two species of Loricariidae have complete mitogenome sequences deposited in GenBank. An innovative RNA-based approach was used to assemble the complete mitogenome of H. incognitum with an average coverage depth of 5292×. The typical vertebrate mitochondrial features were found: 22 tRNA genes, two rRNA genes, 13 protein-coding genes, and a non-coding control region. Moreover, the use of this approach allowed the measurement of mtRNA expression levels, the punctuation pattern of editing, and the detection of heteroplasmies. PMID:26370305

  1. A 454 sequencing approach for large scale phylogenomic analysis of the common emperor scorpion (Pandinus imperator).

    PubMed

    Roeding, Falko; Borner, Janus; Kube, Michael; Klages, Sven; Reinhardt, Richard; Burmester, Thorsten

    2009-12-01

    In recent years, phylogenetic tree reconstructions that rely on multiple gene alignments that had been deduced from expressed sequence tags (ESTs) have become a popular method in molecular systematics. Here, we present a 454 pyrosequencing approach to infer the transcriptome of the Emperor scorpion Pandinus imperator. We obtained 428,844 high-quality reads (mean length = 223 ± 50 b) from total cDNA, which were assembled into 8334 contigs (mean length 422 ± 313 bp) and 26,147 singletons. About 1200 contigs were successfully annotated by BLAST and orthology search. Specific analyses of eight distinct hemocyanin sequences provided further proof for the quality of the 454 reads and the assembly process. The P. imperator sequences were included in a concatenated alignment of 149 orthologous genes of 67 metazoan taxa that covers 39,842 amino acids. After removal of low-quality regions, 11,168 positions were employed for phylogenetic reconstructions. Using Bayesian and maximum likelihood methods, we obtained strongly supported monophyletic Ecdysozoa, Arthropoda (excluding Tardigrada), Euarthropoda, Pancrustacea and Hexapoda. We also recovered the Myriochelata (Chelicerata+Myriapoda). Within the chelicerates, Pycnogonida form the sister group of Euchelicerata. However, Arachnida were found paraphyletic because the Acari (mites and ticks) were recovered as sister group of a clade comprising Xiphosura, Scorpiones and Araneae. In summary, we have shown that 454 pyrosequencing is a cost-effective method that provides sufficient data and coverage depth for gene detection and multigene-based phylogenetic analyses. PMID:19695333

  2. Statistical Approaches to Detecting and Analyzing Tandem Repeats in Genomic Sequences

    PubMed Central

    Anisimova, Maria; Pečerska, Julija; Schaper, Elke

    2015-01-01

    Tandem repeats (TRs) are frequently observed in genomes across all domains of life. Evidence suggests that some TRs are crucial for proteins with fundamental biological functions and can be associated with virulence, resistance, and infectious/neurodegenerative diseases. Genome-scale systematic studies of TRs have the potential to unveil core mechanisms governing TR evolution and TR roles in shaping genomes. However, TR-related studies are often non-trivial due to heterogeneous and sometimes fast evolving TR regions. In this review, we discuss these intricacies and their consequences. We present our recent contributions to computational and statistical approaches for TR significance testing, sequence profile-based TR annotation, TR-aware sequence alignment, phylogenetic analyses of TR unit number and order, and TR benchmarks. Importantly, all these methods explicitly rely on the evolutionary definition of a tandem repeat as a sequence of adjacent repeat units stemming from a common ancestor. The discussed work has a focus on protein TRs, yet is generally applicable to nucleic acid TRs, sharing similar features. PMID:25853125

  3. PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach.

    PubMed

    Chatterjee, Piyali; Basu, Subhadip; Zubek, Julian; Kundu, Mahantapas; Nasipuri, Mita; Plewczynski, Dariusz

    2016-04-01

    The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers (decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron) were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.
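
    The consensus step and the F-measure reported above can be sketched as follows. The abstract does not define "n-star" precisely, so this sketch assumes it means that a residue is labeled as linker when at least n of the classifiers agree; that reading, and the function names n_star_consensus and f_measure, are assumptions for illustration and are not taken from the PDP-CON source code.

```python
def n_star_consensus(predictions, n):
    """Combine per-classifier residue labels by an 'at least n agree' rule.

    predictions: list of label lists, one per classifier, where 1 marks a
    linker residue and 0 a domain residue (an assumed encoding).
    Returns one consensus label per residue.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all classifiers must label the same residues")
    # Count the linker votes each residue received across classifiers.
    votes = [sum(p[i] for p in predictions) for i in range(length)]
    return [1 if v >= n else 0 for v in votes]


def f_measure(truth, predicted):
    """Harmonic mean of precision and recall for the positive (1) class."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Three hypothetical classifiers labeling four residues.
votes = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
consensus = n_star_consensus(votes, n=2)  # -> [1, 0, 0, 0]
```

    Raising n makes the consensus stricter (fewer residues called linker, typically higher precision and lower recall), which is the trade-off a threshold-style consensus exposes.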

  4. Enzyme-like replication de novo in a microcontroller environment.

    PubMed

    Tangen, Uwe

    2010-01-01

    The desire to start evolution from scratch inside a computer memory is as old as computing. Here we demonstrate how viable computer programs can be established de novo in a Precambrian environment without supplying any specific instantiation, just starting with random bit sequences. These programs are not self-replicators, but act much more like catalysts. The microcontrollers used in the end are the result of a long series of simplifications. The objective of this simplification process was to produce universal machines with a human-readable interface, allowing software and/or hardware evolution to be studied. The power of the instruction set can be modified by introducing a secondary structure-folding mechanism, which is a state machine, allowing nontrivial replication to emerge with an instruction width of only a few bits. This state-machine approach not only attenuates the problems of brittleness and encoding functionality (too few bits available for coding, and too many instructions needed); it also enables the study of hardware evolution as such. Furthermore, the instruction set is sufficiently powerful to permit external signals to be processed. This information-theoretic approach forms one vertex of a triangle alongside artificial cell research and experimental research on the creation of life. Hopefully this work helps develop an understanding of how information—in a similar sense to the account of functional information described by Hazen et al.—is created by evolution and how this information interacts with or is embedded in its physico-chemical environment. PMID:20712511

  5. De Novo Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

    An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

  6. Computational Approaches for the Analysis of ncRNA through Deep Sequencing Techniques

    PubMed Central

    Veneziano, Dario; Nigita, Giovanni; Ferro, Alfredo

    2015-01-01

    The majority of the human transcriptome is defined as non-coding RNA (ncRNA), since only a small fraction of human DNA encodes for proteins, as reported by the ENCODE project. Several distinct classes of ncRNAs, such as transfer RNA, microRNA, and long non-coding RNA, have been classified, each with its own three-dimensional folding and specific function. As ncRNAs are highly abundant in living organisms and have been discovered to play important roles in many biological processes, there has been an ever increasing need to investigate the entire ncRNAome in further unbiased detail. Recently, the advent of next-generation sequencing (NGS) technologies has substantially increased the throughput of transcriptome studies, allowing an unprecedented investigation of ncRNAs, as regulatory pathways and novel functions involving ncRNAs are now also emerging. The huge amount of transcript data produced by NGS has progressively required the development and implementation of suitable bioinformatics workflows, complemented by knowledge-based approaches, to identify, classify, and evaluate the expression of hundreds of ncRNAs in normal and pathological conditions, such as cancer. In this mini-review, we present and discuss current bioinformatics advances in the development of such computational approaches to analyze and classify the ncRNA component of human transcriptome sequence data obtained from NGS technologies. PMID:26090362

  7. Computational Approaches for the Analysis of ncRNA through Deep Sequencing Techniques.

    PubMed

    Veneziano, Dario; Nigita, Giovanni; Ferro, Alfredo

    2015-01-01

    The majority of the human transcriptome is defined as non-coding RNA (ncRNA), since only a small fraction of human DNA encodes for proteins, as reported by the ENCODE project. Several distinct classes of ncRNAs, such as transfer RNA, microRNA, and long non-coding RNA, have been classified, each with its own three-dimensional folding and specific function. As ncRNAs are highly abundant in living organisms and have been discovered to play important roles in many biological processes, there has been an ever increasing need to investigate the entire ncRNAome in further unbiased detail. Recently, the advent of next-generation sequencing (NGS) technologies has substantially increased the throughput of transcriptome studies, allowing an unprecedented investigation of ncRNAs, as regulatory pathways and novel functions involving ncRNAs are now also emerging. The huge amount of transcript data produced by NGS has progressively required the development and implementation of suitable bioinformatics workflows, complemented by knowledge-based approaches, to identify, classify, and evaluate the expression of hundreds of ncRNAs in normal and pathological conditions, such as cancer. In this mini-review, we present and discuss current bioinformatics advances in the development of such computational approaches to analyze and classify the ncRNA component of human transcriptome sequence data obtained from NGS technologies. PMID:26090362

  8. Approaches to the detection of recessive effects using next generation sequencing data from outbred populations.

    PubMed

    Curtis, David

    2013-01-01

    Conventional methods to analyze genome-wide association studies and whole exome or whole genome sequencing studies would be prone to overlook variants which might exert a recessive effect on risk of disease, either as homozygotes or compound heterozygotes. It is plausible that such effects may be common even in outbred populations. An approach is described which is based on identifying a set of variants in a gene as being potentially of interest and then testing whether there is an excess of cases who are either homozygotes or compound heterozygotes for these variants. Methods based on departure from Hardy-Weinberg equilibrium are more powerful than those which compare cases to controls. However, linkage disequilibrium between variants can be difficult to deal with if phase is unknown. A simple approach for discarding variants apparently in strong linkage disequilibrium with others is proposed. The procedure is simple and quick to apply, so it can be used in the context of whole genome or exome sequencing studies, and is implemented in the SCOREASSOC program.
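
    The idea of testing for departure from Hardy-Weinberg equilibrium among cases can be sketched as follows. This is a simplified illustration and not the SCOREASSOC procedure: it assumes the variants of interest in a gene are pooled into a single "variant allele" with frequency q, that under Hardy-Weinberg equilibrium a subject carries two such alleles with probability q^2, and that a one-sided exact binomial test is an acceptable way to assess the excess. It ignores phase and linkage disequilibrium, which the abstract identifies as the hard part. The function names are hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def recessive_excess(n_zero, n_one, n_two):
    """Test for an excess of cases carrying two variant alleles in a gene.

    n_zero, n_one, n_two: numbers of cases carrying 0, 1, or 2 variant
    alleles (homozygotes and compound heterozygotes both count as 2).
    Returns (q, expected number of two-allele carriers, one-sided p-value).
    """
    n = n_zero + n_one + n_two
    q = (n_one + 2 * n_two) / (2 * n)   # pooled variant-allele frequency in cases
    expected_two = n * q * q            # Hardy-Weinberg expectation
    p_value = binom_upper_tail(n_two, n, q * q)
    return q, expected_two, p_value
```

    For example, 100 cases with counts (81, 18, 1) match the Hardy-Weinberg expectation for q = 0.1 and give no evidence of excess, whereas counts (90, 0, 10) have the same allele frequency but ten times the expected number of two-allele carriers.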

  9. New massive parallel sequencing approach improves the genetic characterization of congenital myopathies.

    PubMed

    Oliveira, Jorge; Gonçalves, Ana; Taipa, Ricardo; Melo-Pires, Manuel; Oliveira, Márcia E; Costa, José Luís; Machado, José Carlos; Medeiros, Elmira; Coelho, Teresa; Santos, Manuela; Santos, Rosário; Sousa, Mário

    2016-06-01

    Congenital myopathies (CMs) are a heterogeneous group of muscle diseases characterized by hypotonia, delayed motor skills and muscle weakness with onset during the first years of life. The diagnostic workup of CM is highly dependent on the interpretation of the muscle histology, where typical pathognomonic findings are suggestive of a CM but are not necessarily gene specific. Over 20 loci have been linked to these myopathies, including three exceptionally large genes (TTN, NEB and RYR1), which are a challenge for molecular diagnosis. We developed a new approach using massive parallel sequencing (MPS) technology to simultaneously analyze 20 genes linked to CMs. Assay design was based on the Ion AmpliSeq strategy and sequencing runs were performed on an Ion PGM system. A total of 12 patients were analyzed in this study. Among the 2534 variants detected, 14 pathogenic mutations were successfully identified in the DNM2, NEB, RYR1, SEPN1 and TTN genes. Most of these had not been documented and/or fully characterized, thereby helping to expand the CM mutational spectrum. The utility of this approach was demonstrated by the identification of mutations in 70% of the patients included in this study, which is relevant for CMs especially considering their wide phenotypic and genetic heterogeneity. PMID:26841830

  10. "Polymeromics": Mass spectrometry based strategies in polymer science toward complete sequencing approaches: a review.

    PubMed

    Altuntaş, Esra; Schubert, Ulrich S

    2014-01-15

    Mass spectrometry (MS) is the most versatile and comprehensive method in "OMICS" sciences (i.e. in proteomics, genomics, metabolomics and lipidomics). The applications of MS and tandem MS (MS/MS or MS(n)) provide sequence information of the full complement of biological samples in order to understand the importance of the sequences on their precise and specific functions. Nowadays, the control of polymer sequences and their accurate characterization is one of the significant challenges of current polymer science. Therefore, a similar approach can be very beneficial for characterizing and understanding the complex structures of synthetic macromolecules. MS-based strategies allow a relatively precise examination of polymeric structures (e.g. their molar mass distributions, monomer units, side chain substituents, end-group functionalities, and copolymer compositions). Moreover, tandem MS offers accurate structural information from intricate macromolecular structures; however, it produces a vast amount of data to interpret. In "OMICS" sciences, software for interpreting the obtained data has developed satisfactorily (e.g. in proteomics), because it is not possible to handle the amount of data acquired via (tandem) MS studies on the biological samples manually. It can be expected that special software tools will improve the interpretation of (tandem) MS output from the investigations of synthetic polymers as well. Eventually, the MS/MS field will also open up for polymer scientists who are not MS-specialists. In this review, we dissect the overall framework of the MS and MS/MS analysis of synthetic polymers into its key components. We discuss the fundamentals of polymer analyses as well as recent advances in the areas of tandem mass spectrometry, software developments, and the overall future perspectives on the way to polymer sequencing, one of the last Holy Grails in polymer science.

  11. Assessing protein kinase target similarity: Comparing sequence, structure, and cheminformatics approaches.

    PubMed

    Gani, Osman A; Thakkar, Balmukund; Narayanan, Dilip; Alam, Kazi A; Kyomuhendo, Peter; Rothweiler, Ulli; Tello-Franco, Veronica; Engh, Richard A

    2015-10-01

    In just over two decades, structure based protein kinase inhibitor discovery has grown from trial and error approaches, using individual target structures, to structure and data driven approaches that may aim to optimize inhibition properties across several targets. This is increasingly enabled by the growing availability of potent compounds and kinome-wide binding data. Assessing the prospects for adapting known compounds to new therapeutic uses is thus a key priority for current drug discovery efforts. Tools that can successfully link the diverse information regarding target sequence, structure, and ligand binding properties now accompany a transformation of protein kinase inhibitor research, away from single, block-buster drug models, and toward "personalized medicine" with niche applications and highly specialized research groups. Major hurdles for the transformation to data driven drug discovery include mismatches in data types, and disparities of methods and molecules used; at the core remains the problem that ligand binding energies cannot be predicted precisely from individual structures. However, there is a growing body of experimental data for increasingly successful focussing of efforts: focussed chemical libraries, drug repurposing, polypharmacological design, to name a few. Protein kinase target similarity is easily quantified by sequence, and its relevance to ligand design includes broad classification by key binding sites, evaluation of resistance mutations, and the use of surrogate proteins. Although structural evaluation offers more information, the flexibility of protein kinases, and differences between the crystal and physiological environments may make the use of crystal structures misleading when structures are considered individually. Cheminformatics may enable the "calibration" of sequence and crystal structure information, with statistical methods able to identify key correlates to activity but also here, "the devil is in the details

  12. De Novo Proteins with Life-Sustaining Functions Are Structurally Dynamic.

    PubMed

    Murphy, Grant S; Greisman, Jack B; Hecht, Michael H

    2016-01-29

    Designing and producing novel proteins that fold into stable structures and provide essential biological functions are key goals in synthetic biology. In initial steps toward achieving these goals, we constructed a combinatorial library of de novo proteins designed to fold into 4-helix bundles. As described previously, screening this library for sequences that function in vivo to rescue conditionally lethal mutants of Escherichia coli (auxotrophs) yielded several de novo sequences, termed SynRescue proteins, which rescued four different E. coli auxotrophs. In an effort to understand the structural requirements necessary for auxotroph rescue, we investigated the biophysical properties of the SynRescue proteins, using both computational and experimental approaches. Results from circular dichroism, size-exclusion chromatography, and NMR demonstrate that the SynRescue proteins are α-helical and relatively stable. Surprisingly, however, they do not form well-ordered structures. Instead, they form dynamic structures tha