average sequence length: Topics by Science.gov

Sample records for average sequence length

Final progress report, Construction of a genome-wide highly characterized clone resource for genome sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nierman, William C.

At TIGR, human Bacterial Artificial Chromosome (BAC) end sequencing and trimming were carried out with an overall sequencing success rate of 65%. CalTech human BAC libraries A, B, C and D as well as Roswell Park Cancer Institute's library RPCI-11 were used. To date, we have generated >300,000 end sequences from >186,000 human BAC clones with an average read length of ~460 bp, for a total of 141 Mb covering ~4.7% of the genome. Over sixty percent of the clones have BAC end sequences (BESs) from both ends, representing over five-fold coverage of the genome by the paired-end clones. The average phred Q20 length is ~400 bp. This high accuracy makes our BESs match the human finished sequences with an average identity of 99% and a match length of 450 bp, and a frequency of one match per 12.8 kb of contig sequence. Our sample tracking has ensured a clone tracking accuracy of >90%, which gives researchers high confidence in (1) retrieving the right clone from the BAC libraries based on the sequence matches; and (2) building a minimum tiling path of sequence-ready clones across the genome and genome assembly scaffolds.
The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes.

PubMed

Angly, Florent E; Willner, Dana; Prieto-Davó, Alejandra; Edwards, Robert A; Schmieder, Robert; Vega-Thurber, Rebecca; Antonopoulos, Dionysios A; Barott, Katie; Cottrell, Matthew T; Desnues, Christelle; Dinsdale, Elizabeth A; Furlan, Mike; Haynes, Matthew; Henn, Matthew R; Hu, Yongfei; Kirchman, David L; McDole, Tracey; McPherson, John D; Meyer, Folker; Miller, R Michael; Mundt, Egbert; Naviaux, Robert K; Rodriguez-Mueller, Beltran; Stevens, Rick; Wegley, Linda; Zhang, Lixin; Zhu, Baoli; Rohwer, Forest

2009-12-01

Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites) suggested that average genome length is a fundamental property of environments driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other, but indicative of community responses to stressors and environmental conditions.
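As a rough illustration of the length normalization and similarity weighting described in this abstract, the sketch below converts weighted similarity counts into relative genome abundances and an abundance-weighted average genome length. The function name, the input layout, and the toy numbers are assumptions made for illustration; this is not the GAAS implementation.

```python
def length_normalized_abundance(weighted_hits, genome_lengths):
    """Return (relative_abundance, average_genome_length).

    weighted_hits:  dict genome -> summed similarity weights of the reads
                    assigned to that genome (hypothetical input format)
    genome_lengths: dict genome -> genome length in bp
    """
    # Longer genomes attract more reads, so divide each weighted count by length.
    normalized = {g: weighted_hits[g] / genome_lengths[g] for g in weighted_hits}
    total = sum(normalized.values())
    abundance = {g: value / total for g, value in normalized.items()}
    # Average genome length is the abundance-weighted mean of the genome lengths.
    average_length = sum(abundance[g] * genome_lengths[g] for g in abundance)
    return abundance, average_length


# Toy data: the large genome receives ten times more weighted hits.
hits = {"small_virus": 10.0, "large_virus": 100.0}
lengths = {"small_virus": 5_000, "large_virus": 50_000}
abundance, average_length = length_normalized_abundance(hits, lengths)
```

In this toy case the small genome accounts for about 9% of the weighted hits but half of the genomes after length normalization, which is the kind of underestimate of small genomes that the abstract attributes to standard methods.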
Genome Survey Sequencing for the Characterization of the Genetic Background of Rosa roxburghii Tratt and Leaf Ascorbate Metabolism Genes.

PubMed

Lu, Min; An, Huaming; Li, Liangliang

2016-01-01

Rosa roxburghii Tratt is an important commercial horticultural crop in China that is recognized for its nutritional and medicinal values. In spite of its economic significance, genomic information on this rose species is currently unavailable. In the present research, a genome survey of R. roxburghii was carried out using next-generation sequencing (NGS) technologies. A total of 30.29 Gb of sequence data was obtained by HiSeq 2500 sequencing, and the estimated genome size of R. roxburghii was 480.97 Mb, with a guanine plus cytosine (GC) content calculated to be 38.63%. All of these reads were assembled, yielding a total of 627,554 contigs with an N50 length of 1.484 kb and 335,902 scaffolds with a total length of 409.36 Mb. Transposable element (TE) sequences totaling 90.84 Mb, which comprised 29.20% of the genome, and 167,859 simple sequence repeats (SSRs) were identified from the scaffolds. Among these, the mono- (66.30%), di- (25.67%), and tri- (6.64%) nucleotide repeats contributed nearly 99% of the SSRs, and the sequence motifs AG/CT (28.81%) and GAA/TTC (14.76%) were the most abundant among the dinucleotide and trinucleotide repeat motifs, respectively. Genome analysis predicted a total of 22,721 genes with an average length of 2311.52 bp, an average exon length of 228.15 bp, and an average intron length of 401.18 bp. Eleven genes putatively involved in ascorbate metabolism were identified, and their expression in R. roxburghii leaves was validated by quantitative real-time PCR (qRT-PCR). This is the first report of genome-wide characterization of this rose species.
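The contig N50 quoted in this abstract is a standard assembly statistic: the length L such that contigs of length L or longer contain at least half of the assembled bases. A minimal sketch of that definition follows; the contig lengths are toy values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


contigs = [100, 200, 300, 400, 500]  # toy contig lengths in bp
contig_n50 = n50(contigs)
```

Unlike the mean contig length, the N50 is weighted toward long contigs, which is why assemblies usually report both.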
Using specific length amplified fragment sequencing to construct the high-density genetic map for Vitis (Vitis vinifera L. × Vitis amurensis Rupr.).

PubMed

Guo, Yinshan; Shi, Guangli; Liu, Zhendong; Zhao, Yuhui; Yang, Xiaoxu; Zhu, Junchi; Li, Kun; Guo, Xiuwu

2015-01-01

In this study, 149 F1 plants from the interspecific cross between 'Red Globe' (Vitis vinifera L.) and 'Shuangyou' (Vitis amurensis Rupr.) and the two parents were used to construct a molecular genetic linkage map by using the specific length amplified fragment sequencing technique. DNA sequencing generated 41.282 Gb of data consisting of 206,411,693 paired-end reads. The average sequencing depths were 68.35 for 'Red Globe,' 63.65 for 'Shuangyou,' and 8.01 for each progeny. In all, 115,629 high-quality specific length amplified fragments were detected, of which 42,279 were polymorphic. The genetic map was constructed using 7,199 of these polymorphic markers. These polymorphic markers were assigned to 19 linkage groups; the total length of the map was 1929.13 cM, with an average distance of 0.28 cM between adjacent markers. To our knowledge, the genetic maps constructed in this study contain the largest number of molecular markers. These high-density genetic maps might form the basis for the fine quantitative trait loci mapping and molecular-assisted breeding of grape.
Using specific length amplified fragment sequencing to construct the high-density genetic map for Vitis (Vitis vinifera L. × Vitis amurensis Rupr.)

PubMed Central

Guo, Yinshan; Shi, Guangli; Liu, Zhendong; Zhao, Yuhui; Yang, Xiaoxu; Zhu, Junchi; Li, Kun; Guo, Xiuwu

2015-01-01

In this study, 149 F1 plants from the interspecific cross between ‘Red Globe’ (Vitis vinifera L.) and ‘Shuangyou’ (Vitis amurensis Rupr.) and the two parents were used to construct a molecular genetic linkage map by using the specific length amplified fragment sequencing technique. DNA sequencing generated 41.282 Gb of data consisting of 206,411,693 paired-end reads. The average sequencing depths were 68.35 for ‘Red Globe,’ 63.65 for ‘Shuangyou,’ and 8.01 for each progeny. In all, 115,629 high-quality specific length amplified fragments were detected, of which 42,279 were polymorphic. The genetic map was constructed using 7,199 of these polymorphic markers. These polymorphic markers were assigned to 19 linkage groups; the total length of the map was 1929.13 cM, with an average distance of 0.28 cM between adjacent markers. To our knowledge, the genetic maps constructed in this study contain the largest number of molecular markers. These high-density genetic maps might form the basis for the fine quantitative trait loci mapping and molecular-assisted breeding of grape. PMID:26089826
[Intestinal fungal diversity of sub-adult giant panda].

PubMed

Ai, Shengquan; Zhong, Zhijun; Peng, Guangneng; Wang, Chengdong; Luo, Yongjiu; He, Tingmei; Gu, Wuyang; Li, Caiwu; Li, Gangshi; Wu, Honglin; Liu, Xuehan; Xia, Yu; Liu, Yanhong; Zhou, Xiaoxiao

2014-11-04

The fungal diversity in the guts of five sub-adult giant pandas was analyzed. We analyzed the fungal internal transcribed spacer (ITS) sequences using restriction fragment length polymorphism (RFLP). ITS regions were amplified with fungal universal primers to construct ITS clone libraries. The fingerprints were analyzed by RFLP using the Hha I and Hae III enzymes. The cloned PCR products were analyzed by sequencing, and diversity was shown in a phylogenetic tree. The gut fungi of the five sub-adult giant pandas were mainly composed of Ascomycota (average of 46.24%), Basidiomycota (average of 15.79%), unclassified (average of 29.14%), and uncultured fungus (average of 8.83%). Ascomycota was mainly composed of Saccharomycetes (average of 63.74%) and Dothideomycetes (average of 35.91%); Basidiomycota was mainly composed of Tremellomycetes (average of 65.80%) and Microbotryomycetes (average of 33.15%). The four classes were mainly composed of Candida and Debaryomyces; Pleosporales and Myriangium; Cystofilobasidium and Trichosporon; and Leucosporidium and Leucosporidiella, respectively, although the proportions differed for each sample. Characterizing the fungal flora in the intestines of sub-adult giant pandas expands our knowledge of the structure of the giant panda gut microbes and will help us to further study whether fungal flora can help giant pandas digest high-fiber foods.
SMRT sequencing of the Vitis vinifera cv. ‘Flame seedless’ genome using a SMRTbell-free library preparation from Swift Biosciences

USDA-ARS's Scientific Manuscript database

Single Molecule Real-Time (SMRT) sequencing provides advantages to the sequencing of complex genomes. The long reads generated are superior for resolving complex genomic regions and provide highly contiguous de novo assemblies. Current SMRTbell libraries generate average read lengths of 10-15kb. How...
Modeling participation duration, with application to the North American Breeding Bird Survey

USGS Publications Warehouse

Link, William; Sauer, John

2014-01-01

We consider “participation histories,” binary sequences consisting of alternating finite sequences of 1s and 0s, ending with an infinite sequence of 0s. Our work is motivated by a study of observer tenure in the North American Breeding Bird Survey (BBS). In our analysis, j indexes an observer’s years of service and Xj is an indicator of participation in the survey; 0s interspersed among 1s correspond to years when observers did not participate, but subsequently returned to service. Of interest is the observer’s duration D = max {j: Xj = 1}. Because observed records X = (X1, X2, ..., Xn)′ are of finite length, all that we can directly infer about duration is that D ⩾ max {j ⩽ n: Xj = 1}; model-based analysis is required for inference about D. We propose models in which lengths of 0s and 1s sequences have distributions determined by the index j at which they begin; 0s sequences are infinite with positive probability, an estimable parameter. We found that BBS observers’ lengths of service vary greatly, with 25.3% participating for only a single year, 49.5% serving for 4 or fewer years, and an average duration of 8.7 years, producing an average of 7.7 counts.
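The lower bound on duration stated in this abstract, D ⩾ max {j ⩽ n: Xj = 1}, can be read straight off an observed record. The sketch below computes that bound and the number of counts for one made-up participation history; the function name and the example record are illustrative assumptions, and the model-based inference the authors describe is not attempted here.

```python
def observed_duration_bound(record):
    """Largest 1-based year index j with record[j - 1] == 1, or 0 if none."""
    last_active_year = 0
    for year, participated in enumerate(record, start=1):
        if participated == 1:
            last_active_year = year
    return last_active_year


# Hypothetical observer: active in years 1, 2 and 4 of a 6-year record.
history = [1, 1, 0, 1, 0, 0]
duration_bound = observed_duration_bound(history)  # lower bound on D
counts = sum(history)                              # years with a survey count
```

The trailing zeros cannot distinguish an observer who has stopped from one who will return, which is why the true duration D can exceed this bound.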
End-to-end distance and contour length distribution functions of DNA helices

NASA Astrophysics Data System (ADS)

Zoli, Marco

2018-06-01

I present a computational method to evaluate the end-to-end and the contour length distribution functions of short DNA molecules described by a mesoscopic Hamiltonian. The method generates a large statistical ensemble of possible configurations for each dimer in the sequence, selects the global equilibrium twist conformation for the molecule, and determines the average base pair distances along the molecule backbone. Integrating over the base pair radial and angular fluctuations, I derive the room temperature distribution functions as a function of the sequence length. The obtained values for the most probable end-to-end distance and contour length distance, providing a measure of the global molecule size, are used to examine the DNA flexibility at short length scales. It is found that, even in molecules with fewer than ~60 base pairs, coiled configurations maintain a large statistical weight and, consistently, the persistence lengths may be much smaller than in kilo-base DNA.
On the normalization of the minimum free energy of RNAs by sequence length.

PubMed

Trotta, Edoardo

2014-01-01

The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size.
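One way to see how dividing by length can introduce a hyperbolic length dependence: suppose, purely for illustration, that MFE follows a straight line in length with a non-zero intercept. Then MFE/L equals the slope plus intercept/L, which drifts with L even though the underlying trend is a straight line. The slope and intercept below are made-up values, not figures from the paper, and the paper's corrected formula is not reproduced here.

```python
def normalized_mfe(length, slope=-0.3, intercept=6.0):
    """MFE per nucleotide under an assumed linear model MFE = slope * L + intercept.

    slope and intercept are hypothetical illustration values (kcal/mol units).
    """
    mfe = slope * length + intercept
    return mfe / length


# The per-nucleotide index approaches the slope only for long sequences.
per_nt = {length: normalized_mfe(length) for length in (50, 500, 5000)}
```

Under these assumed values the index falls from about -0.18 at 50 nt toward -0.30 at 5000 nt, so short and long RNAs with the same underlying trend would look different after naive normalization.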
On the Normalization of the Minimum Free Energy of RNAs by Sequence Length

PubMed Central

Trotta, Edoardo

2014-01-01

The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size. PMID:25405875
On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

PubMed Central

Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

2013-01-01

The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. PMID:24146608
Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

PubMed Central

Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

2012-01-01

To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from multiple Agave sisalana tissues, using the SMART™ method and duplex-specific nuclease (DSN), to increase the efficiency of normalization and maximize the number of independent genes. This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones from the libraries revealed 3320 unigenes with an average insert length of about 1.2 kb, indicating that the non-redundancy of the libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and a knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
Transcriptome of the Caribbean stony coral Porites astreoides from three developmental stages.

PubMed

Mansour, Tamer A; Rosenthal, Joshua J C; Brown, C Titus; Roberson, Loretta M

2016-08-02

Porites astreoides is a ubiquitous species of coral on modern Caribbean reefs that is resistant to increasing temperatures, overfishing, and other anthropogenic impacts that have threatened most other coral species. We assembled and annotated a transcriptome from this coral using Illumina sequences from three different developmental stages collected over several years: free-swimming larvae, newly settled larvae, and adults (>10 cm in diameter). This resource will aid understanding of coral calcification, larval settlement, and host-symbiont interactions. A de novo transcriptome for the P. astreoides holobiont (coral plus algal symbiont) was assembled using 594 Mbp of raw Illumina sequencing data generated from five age-specific cDNA libraries. The new transcriptome consists of 867 255 transcript elements with an average length of 685 bases. The isolated P. astreoides assembly consists of 129 718 transcript elements with an average length of 811 bases, and the isolated Symbiodinium sp. assembly had 186 177 transcript elements with an average length of 1105 bases. This contribution to coral transcriptome data provides a valuable resource for researchers studying the ontogeny of gene expression patterns within both the coral and its dinoflagellate symbiont.
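Figures such as "867 255 transcript elements with an average length of 685 bases" are typically obtained by tallying sequence lengths from a FASTA file. The sketch below does this with the standard library only; the parser is a simplified assumption (headers start with ">", sequences may wrap across lines) and the two records are toy data, not transcripts from this assembly.

```python
def fasta_lengths(text):
    """Return the length of each sequence in FASTA-formatted text."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


example = """>t1
ACGTAC
GT
>t2
ACGT
"""
lengths = fasta_lengths(example)
average_length = sum(lengths) / len(lengths)
```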
Formation of rings from segments of HeLa-cell nuclear deoxyribonucleic acid

PubMed Central

Hardman, Norman

1974-01-01

Duplex segments of HeLa-cell nuclear DNA were generated by cleavage with DNA restriction endonuclease from Haemophilus influenzae. About 20–25% of the DNA segments produced, when partly degraded with exonuclease III and annealed, were found to form rings visible in the electron microscope. A further 5% of the DNA segments formed structures that were branched in configuration. Similar structures were generated from HeLa-cell DNA, without prior treatment with restriction endonuclease, when the complementary polynucleotide chains were exposed by exonuclease III action at single-chain nicks. After exposure of an average single-chain length of 1400 nucleotides per terminus at nicks in HeLa-cell DNA by exonuclease III, followed by annealing, the physical length of ring closures was estimated and found to be 0.02–0.1 μm, or 50–300 base pairs. An almost identical distribution of lengths was recorded for the regions of complementary base sequence responsible for branch formation. It is proposed that most of the rings and branches are formed from classes of reiterated base sequence with an average length of 180 base pairs arranged intermittently in HeLa-cell DNA. From the rate of formation of branched structures when HeLa-cell DNA segments were heat-denatured and annealed, it is estimated that the reiterated sequences are in families containing approximately 2400–24000 copies. PMID:4462738
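The conversion from measured length to base pairs in this abstract can be checked with a short calculation. The sketch assumes the commonly quoted B-form DNA rise of about 0.34 nm per base pair; the abstract does not state which conversion factor the author used, so the constant is an assumption.

```python
RISE_NM_PER_BP = 0.34  # assumed B-form DNA rise per base pair, in nanometres


def micrometres_to_base_pairs(length_um):
    """Convert a duplex DNA contour length in micrometres to base pairs."""
    length_nm = length_um * 1000.0
    return length_nm / RISE_NM_PER_BP


shortest_bp = micrometres_to_base_pairs(0.02)
longest_bp = micrometres_to_base_pairs(0.1)
```

Under this assumption the measured range corresponds to roughly 59 to 294 base pairs, in line with the rounded 50–300 base pairs reported.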
Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes.

PubMed

Seal, B S; Neill, J D; Ridpath, J F

1994-07-01

Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.
Asymptotically optimum multialternative sequential procedures for discernment of processes minimizing average length of observations

NASA Astrophysics Data System (ADS)

Fishman, M. M.

1985-01-01

The problem of multialternative sequential discernment of processes is formulated in terms of conditionally optimum procedures minimizing the average length of observations, without any probabilistic assumptions about any one occurring process, rather than in terms of Bayes procedures minimizing the average risk. The problem is to find the procedure that will transform inequalities into equalities. The problem is formulated for various models of signal observation and data processing: (1) discernment of signals from background interference by a multichannel system; (2) discernment of pulse sequences with unknown time delay; (3) discernment of harmonic signals with unknown frequency. An asymptotically optimum sequential procedure is constructed which compares the statistics of the likelihood ratio with the mean-weighted likelihood ratio and estimates the upper bound for conditional average lengths of observations. This procedure is shown to remain valid as the upper bound for the probability of erroneous partial solutions decreases approaching zero and the number of hypotheses increases approaching infinity. It also remains valid under certain special constraints on the probability such as a threshold. A comparison with a fixed-length procedure reveals that this sequential procedure decreases the length of observations to one quarter, on the average, when the probability of erroneous partial solutions is low.
Bioinformatic analysis of phage AB3, a phiKMV-like virus infecting Acinetobacter baumannii.

PubMed

Zhang, J; Liu, X; Li, X-J

2015-01-16

The phages of Acinetobacter baumannii have drawn increasing attention because of the multi-drug resistance of A. baumannii. The aim of this study was to sequence Acinetobacter baumannii phage AB3 and conduct bioinformatic analysis to lay a foundation for genome remodeling and phage therapy. We isolated and sequenced A. baumannii phage AB3 and attempted to annotate and analyze its genome. The results showed that the genome is a double-stranded DNA with a total length of 31,185 base pairs (bp) and 97 open reading frames greater than 100 bp. The genome includes 28 predicted genes, of which 24 are homologous to phage AB1. The entire coding sequence is located on the negative strand, representing 90.8% of the total length. The G+C mol% was 39.18%, without areas of high G+C content over 200 bp in length. No GC island, tRNA gene, or repeated sequence was identified. Gene lengths were 120-3099 bp, with an average of 1011 bp. Six genes were found to be greater than 2000 bp in length. Genomic alignment and phylogenetic analysis of the RNA polymerase gene showed that, similar to phage AB1, phage AB3 is a phiKMV-like virus in the T7 phage family.
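The coding fraction in this abstract can be cross-checked from the other reported figures. The sketch assumes that the 28 predicted genes do not overlap and that the stated average gene length is exact; both are simplifying assumptions, not claims made by the authors.

```python
GENOME_LENGTH_BP = 31_185   # reported genome length
GENE_COUNT = 28             # reported number of predicted genes
MEAN_GENE_LENGTH_BP = 1011  # reported average gene length

# Total coding length if genes do not overlap (assumption).
coding_bp = GENE_COUNT * MEAN_GENE_LENGTH_BP
coding_percent = 100 * coding_bp / GENOME_LENGTH_BP
```

This gives 28,308 coding bp, or about 90.8% of the genome, consistent with the reported coding fraction.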
Protein contact prediction using patterns of correlation.

PubMed

Hamilton, Nicholas; Burrage, Kevin; Ragan, Mark A; Huber, Thomas

2004-09-01

We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. Copyright 2004 Wiley-Liss, Inc.
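The accuracy measure used in this abstract, the fraction of true contacts among the best L/2 or L/10 predictions, can be sketched as follows. The function name, the dictionary layout for scores, and the toy protein are illustrative assumptions; the neural network itself is not modelled.

```python
def top_fraction_accuracy(scores, true_contacts, seq_len, divisor):
    """Accuracy over the best seq_len // divisor predicted residue pairs.

    scores:        dict (i, j) -> predicted contact score (higher is better)
    true_contacts: set of (i, j) pairs that are real contacts
    """
    n_pick = max(1, seq_len // divisor)
    # Rank residue pairs by score and keep the top n_pick of them.
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_pick]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy protein of length 8 with five scored residue pairs.
scores = {(1, 5): 0.9, (2, 6): 0.8, (1, 8): 0.7, (3, 7): 0.4, (2, 4): 0.1}
true_contacts = {(1, 5), (3, 7)}
acc_half = top_fraction_accuracy(scores, true_contacts, seq_len=8, divisor=2)
acc_eighth = top_fraction_accuracy(scores, true_contacts, seq_len=8, divisor=8)
```

Tying the number of evaluated predictions to L keeps the measure comparable across proteins of different lengths, and a stricter cutoff usually scores higher, as in the 30.7% versus 21.7% reported above.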
On the error probability of general tree and trellis codes with applications to sequential decoding

NASA Technical Reports Server (NTRS)

Johannesson, R.

1973-01-01

An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random binary tree codes is derived and shown to be independent of the length of the tree. An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random L-branch binary trellis codes of rate R = 1/n is derived which separates the effects of the tail length T and the memory length M of the code. It is shown that the bound is independent of the length L of the information sequence. This implication is investigated by computer simulations of sequential decoding utilizing the stack algorithm. These simulations confirm the implication and further suggest an empirical formula for the true undetected decoding error probability with sequential decoding.

Importance of length and sequence order on magnesium binding to surface-bound oligonucleotides studied by second harmonic generation and atomic force microscopy.

PubMed

Holland, Joseph G; Geiger, Franz M

2012-06-07

The binding of magnesium ions to surface-bound single-stranded oligonucleotides was studied under aqueous conditions using second harmonic generation (SHG) and atomic force microscopy (AFM). The effect of strand length on the number of Mg(II) ions bound and their free binding energy was examined for 5-, 10-, 15-, and 20-mers of adenine and guanine at pH 7, 298 K, and 10 mM NaCl. The binding free energies for adenine and guanine sequences were calculated to be -32.1(4) and -35.6(2) kJ/mol, respectively, and invariant with strand length. Furthermore, the ion density for adenine oligonucleotides did not change as strand length increased, with an average value of 2(1) ions/strand. In sharp contrast, guanine oligonucleotides displayed a linear relationship between strand length and ion density, suggesting that cooperativity is important. These data give predictive capabilities for mixed strands of various lengths, which we exploit for 20-mers of adenines and guanines. In addition, the role sequence order plays in strands of hetero-oligonucleotides was examined for 5'-A(10)G(10)-3', 5'-(AG)(10)-3', and 5'-G(10)A(10)-3' (here the -3' end is chemically modified to bind to the surface). Although the free energy of binding is the same for these three strands (averaged to be -33.3(4) kJ/mol), the total ion density increases when several guanine residues are close to the 3' end (and thus close to the solid support substrate). To further understand these results, we analyzed the height profiles of the functionalized surfaces with tapping-mode atomic force microscopy (AFM). When comparing the average surface height profiles of the oligonucleotide surfaces pre- and post-Mg(II) binding, a positive correlation was found between ion density and the subsequent height decrease following Mg(II) binding, which we attribute to reductions in Coulomb repulsion and strand collapse once a critical number of Mg(II) ions are bound to the strand.
US Medical Student Performance on the NBME Subject Examination in Internal Medicine: Do Clerkship Sequence and Clerkship Length Matter?

PubMed

Ouyang, Wenli; Cuddy, Monica M; Swanson, David B

2015-09-01

Prior to graduation, US medical students are required to complete clinical clerkship rotations, most commonly in the specialty areas of family medicine, internal medicine, obstetrics and gynecology (ob/gyn), pediatrics, psychiatry, and surgery. Within a school, the sequence in which students complete these clerkships varies. In addition, the length of these rotations varies, both within a school for different clerkships and between schools for the same clerkship. The present study investigated the effects of clerkship sequence and length on performance on the National Board of Medical Examiners' subject examination in internal medicine. The study sample included 16,091 students from 67 US Liaison Committee on Medical Education (LCME)-accredited medical schools who graduated in 2012 or 2013. Student-level measures included first-attempt internal medicine subject examination scores, first-attempt USMLE Step 1 scores, and five dichotomous variables capturing whether or not students completed rotations in family medicine, ob/gyn, pediatrics, psychiatry, and surgery prior to taking the internal medicine rotation. School-level measures included clerkship length and average Step 1 score. Multilevel models with students nested in schools were estimated with internal medicine subject examination scores as the dependent measure. Step 1 scores and the five dichotomous variables were treated as student-level predictors. Internal medicine clerkship length and average Step 1 score were used to predict school-to-school variation in average internal medicine subject examination scores. Completion of rotations in surgery, pediatrics and family medicine prior to taking the internal medicine examination significantly improved scores, with the largest benefit observed for surgery (coefficient = 1.58 points; p value < 0.01); completion of rotations in ob/gyn and psychiatry was unrelated to internal medicine subject examination performance. 
At the school level, longer internal medicine clerkships were associated with higher scores on the internal medicine examination (coefficient = 0.23 points/week; p value < 0.01). The order in which students complete clinical clerkships and the length of the internal medicine clerkship are associated with their internal medicine subject examination scores. Findings may have implications for curriculum re-design.
BAC end sequencing of Pacific white shrimp Litopenaeus vannamei: a glimpse into the genome of Penaeid shrimp

NASA Astrophysics Data System (ADS)

Zhao, Cui; Zhang, Xiaojun; Liu, Chengzhang; Huan, Pin; Li, Fuhua; Xiang, Jianhai; Huang, Chao

2012-05-01

Little is known about the genome of Pacific white shrimp (Litopenaeus vannamei). To address this, we conducted BAC (bacterial artificial chromosome) end sequencing of L. vannamei. We selected and sequenced 7 812 BAC clones from the BAC library LvHE from the two ends of the inserts by Sanger sequencing. After trimming and quality filtering, 11 279 BAC end sequences (BESs) including 4 609 paired-end BESs were obtained. The total length of the BESs was 4 340 753 bp, representing 0.18% of the L. vannamei haploid genome. The lengths of the BESs ranged from 100 bp to 660 bp with an average length of 385 bp. Analysis of the BESs indicated that the L. vannamei genome is AT-rich and that the primary repeat patterns were simple sequence repeats (SSRs) and low complexity sequences. Dinucleotide and hexanucleotide repeats were the most common SSR types in the BESs. The most abundant transposable element was gypsy, which may contribute to the generation of the large genome size of L. vannamei. We successfully annotated 4 519 BESs by BLAST searching, including genes involved in immunity and sex determination. Our results provide an important resource for functional gene studies, map construction and integration, and complete genome assembly for this species.
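The summary figures in the record above can be cross-checked with simple arithmetic. The following is a minimal Python sketch, not the authors' pipeline: the function name `summarize_lengths` and the toy input are illustrative assumptions, and only the totals quoted in the abstract are taken from the source.

```python
def summarize_lengths(lengths):
    """Return count, total, mean, minimum and maximum of sequence lengths (bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    return {
        "count": len(lengths),
        "total_bp": total,
        "mean_bp": total / len(lengths),
        "min_bp": min(lengths),
        "max_bp": max(lengths),
    }


# Toy input (hypothetical values), not the real BES data.
toy = summarize_lengths([100, 385, 660])

# Cross-check using only the totals reported in the abstract.
reported_total_bp = 4_340_753
reported_count = 11_279
mean_from_totals = reported_total_bp / reported_count

# If the BESs represent 0.18% of the haploid genome, the implied genome size is:
implied_genome_bp = reported_total_bp / 0.0018
```

The mean derived from the reported totals rounds to the stated 385 bp, and the 0.18% figure implies a haploid genome on the order of 2.4 Gb.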
Power law tails in phylogenetic systems.

PubMed

Qin, Chongli; Colwell, Lucy J

2018-01-23

Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
Development of the expressed Ig CDR-H3 repertoire is marked by focusing of constraints in length, amino acid use, and charge that are first established in early B cell progenitors.

PubMed

Ivanov, Ivaylo I; Schelonka, Robert L; Zhuang, Yingxin; Gartland, G Larry; Zemlin, Michael; Schroeder, Harry W

2005-06-15

To gain insight into the mechanisms that regulate the development of the H chain CDR3 (CDR-H3), we used the scheme of Hardy to sort mouse bone marrow B lineage cells into progenitor, immature, and mature B cell fractions, and then performed sequence analysis on V(H)7183-containing Cmu transcripts. The essential architecture of the CDR-H3 repertoire observed in the mature B cell fraction F was already established in the early pre-B cell fraction C. These architectural features include V(H) gene segment use preference, D(H) family usage, J(H) rank order, predicted structures of the CDR-H3 base and loop, and the amino acid composition and average hydrophobicity of the CDR-H3 loop. With development, the repertoire was focused by eliminating outliers to what appears to be a preferred repertoire in terms of length, amino acid composition, and average hydrophobicity. Unlike humans, the average length of CDR-H3 increased during development. The majority of this increase came from enhanced preservation of J(H) sequence. This was associated with an increase in the prevalence of tyrosine. With an accompanying increase in glycine, a shift in hydrophobicity was observed in the CDR-H3 loop from near neutral in fraction C (-0.08 +/- 0.03) to mild hydrophilic in fraction F (-0.17 +/- 0.02). Fundamental constraints on the sequence and structure of CDR-H3 are thus established before surface IgM expression.
Minimap2: pairwise alignment for nucleotide sequences.

PubMed

Li, Heng

2018-05-10

Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable to process such data at scale, or do so inefficiently, which presses for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. https://github.com/lh3/minimap2. hengli@broadinstitute.org.
CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units

PubMed Central

Liu, Yongchao; Maskell, Douglas L; Schmidt, Bertil

2009-01-01

Background The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. Findings Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K and for query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS with a lowest performance of 9.039 GCUPS and a highest performance of 9.660 GCUPS, and the dual-GPU version achieves an average performance of 14.484 GCUPS with a lowest performance of 10.660 GCUPS and a highest performance of 16.087 GCUPS. Conclusion CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs. PMID:19416548
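For readers unfamiliar with the quantities in the record above, the sketch below shows the Smith-Waterman recurrence and how a GCUPS figure (billions of dynamic-programming cell updates per second) is derived. It is a plain CPU illustration under stated assumptions: the match/mismatch scores and the linear gap penalty are arbitrary choices made here, not the scoring scheme or implementation used by CUDASW++, and the function names are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b using a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Billions of cell updates per second for one query against a database."""
    return query_len * db_residues / seconds / 1e9
```

Each query residue is compared against every database residue, so the number of cell updates is the product of the two lengths; dividing by the run time gives the throughput figure quoted in benchmarks of this kind.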
Reducing elective general surgery cancellations at a Canadian hospital

PubMed Central

Azari-Rad, Solmaz; Yontef, Alanna L.; Aleman, Dionne M.; Urbach, David R.

2013-01-01

Background In Canadian hospitals, which are typically financed by global annual budgets, overuse of operating rooms is a financial risk that is frequently managed by cancelling elective surgical procedures. It is uncertain how different scheduling rules affect the rate of elective surgery cancellations. Methods We used discrete event simulation modelling to represent perioperative processes at a hospital in Toronto, Canada. We tested the effects of the following 3 scenarios on the number of surgical cancellations: scheduling surgeons’ operating days based on their patients’ average length of stay in hospital, sequencing surgical procedures by average duration and variance, and increasing the number of post-surgical ward beds. Results The number of elective cancellations was reduced by scheduling surgeons whose patients had shorter average lengths of stay in hospital earlier in the week, sequencing shorter surgeries and those with less variance in duration earlier in the day, and by adding up to 2 additional beds to the postsurgical ward. Conclusion Discrete event simulation modelling can be used to develop strategies for improving efficiency in operating rooms. PMID:23351498
Construction and EST sequencing of full-length, drought stress cDNA libraries for common beans (Phaseolus vulgaris L.)

PubMed Central

2011-01-01

Background Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. Results Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. Conclusions The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for both functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole-genome sequences. 
In addition the library has a large number of transcription factors and will be interesting for discovery and validation of drought or abiotic stress related genes in common bean. PMID:22118559
Optimizing de novo transcriptome assembly and extending genomic resources for striped catfish (Pangasianodon hypophthalmus).

PubMed

Thanh, Nguyen Minh; Jung, Hyungtaek; Lyons, Russell E; Njaci, Isaac; Yoon, Byoung-Ha; Chand, Vincent; Tuan, Nguyen Viet; Thu, Vo Thi Minh; Mather, Peter

2015-10-01

Striped catfish (Pangasianodon hypophthalmus) is a commercially important freshwater fish used in inland aquaculture in the Mekong Delta, Vietnam. The culture industry is facing a significant challenge however from saltwater intrusion into many low topographical coastal provinces across the Mekong Delta as a result of predicted climate change impacts. Developing genomic resources for this species can facilitate the production of improved culture lines that can withstand raised salinity conditions, and so we have applied high-throughput Ion Torrent sequencing of transcriptome libraries from six target osmoregulatory organs from striped catfish as a genomic resource for use in future selection strategies. We obtained 12,177,770 reads after trimming and processing with an average length of 97bp. De novo assemblies were generated using CLC Genomic Workbench, Trinity and Velvet/Oases with the best overall contig performance resulting from the CLC assembly. De novo assembly using CLC yielded 66,451 contigs with an average length of 478bp and N50 length of 506bp. A total of 37,969 contigs (57%) possessed significant similarity with proteins in the non-redundant database. Comparative analyses revealed that a significant number of contigs matched sequences reported in other teleost fishes, ranging in similarity from 45.2% with Atlantic cod to 52% with zebrafish. In addition, 28,879 simple sequence repeats (SSRs) and 55,721 single nucleotide polymorphisms (SNPs) were detected in the striped catfish transcriptome. The sequence collection generated in the current study represents the most comprehensive genomic resource for P. hypophthalmus available to date. Our results illustrate the utility of next-generation sequencing as an efficient tool for constructing a large genomic database for marker development in non-model species.
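The record above reports both an average contig length and an N50, which measure different things. A minimal Python sketch of the usual N50 definition follows; the function names and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def mean_length(lengths):
    """Arithmetic mean contig length."""
    return sum(lengths) / len(lengths)


contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical toy contig lengths
toy_n50 = n50(contigs)
toy_mean = mean_length(contigs)
```

Because N50 weights each contig by the bases it contributes, it usually exceeds the arithmetic mean, as in the 506 bp versus 478 bp figures reported in the abstract.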
Construction of an SNP-based high-density linkage map for flax (Linum usitatissimum L.) using specific length amplified fragment sequencing (SLAF-seq) technology.

PubMed

Yi, Liuxi; Gao, Fengyun; Siqin, Bateer; Zhou, Yu; Li, Qiang; Zhao, Xiaoqing; Jia, Xiaoyun; Zhang, Hui

2017-01-01

Flax is an important crop for oil and fiber; however, no high-density genetic maps have been reported for this species. Specific length amplified fragment sequencing (SLAF-seq) is a high-resolution strategy for large scale de novo discovery and genotyping of single nucleotide polymorphisms. In this study, SLAF-seq was employed to develop SNP markers in an F2 population to construct a high-density genetic map for flax. In total, 196.29 million paired-end reads were obtained. The average sequencing depth was 25.08 in the male parent, 32.17 in the female parent, and 9.64 in each F2 progeny. In total, 389,288 polymorphic SLAFs were detected, from which 260,380 polymorphic SNPs were developed. After filtering, 4,638 SNPs were found suitable for genetic map construction. The final genetic map included 4,145 SNP markers on 15 linkage groups and was 2,632.94 cM in length, with an average distance of 0.64 cM between adjacent markers. To our knowledge, this map is the densest SNP-based genetic map for flax. The SNP markers and genetic map reported here will serve as a foundation for the fine mapping of quantitative trait loci (QTLs), map-based gene cloning and marker assisted selection (MAS) for flax.
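The average marker spacing quoted above can be reproduced from the map length and marker count. The abstract does not say which convention the authors used, so the sketch below (with the hypothetical helper `mean_marker_spacing`) shows two common ones: dividing by the number of markers, and dividing by the number of intervals within linkage groups.

```python
def mean_marker_spacing(map_length_cm, n_markers, n_groups=0):
    """Average distance in cM between adjacent markers.

    With n_groups > 0, only intervals within linkage groups are counted
    (each group of m markers has m - 1 intervals).
    """
    intervals = n_markers - n_groups
    if intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return map_length_cm / intervals


# Values reported in the abstract.
per_marker = mean_marker_spacing(2632.94, 4145)
per_interval = mean_marker_spacing(2632.94, 4145, n_groups=15)
```

Both conventions round to the reported 0.64 cM, so the figures in the abstract are internally consistent either way.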
Sequencing, annotation and comparative analysis of nine BACs of giant panda (Ailuropoda melanoleuca).

PubMed

Zheng, Yang; Cai, Jing; Li, JianWen; Li, Bo; Lin, Runmao; Tian, Feng; Wang, XiaoLing; Wang, Jun

2010-01-01

A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average exon number of 6 per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species, including giant panda, human, dog, cat and mouse, which reconfirms dog as the species most closely related to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.
MUSCLE: multiple sequence alignment with high accuracy and high throughput.

PubMed

Edgar, Robert C

2004-01-01

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
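The idea of estimating distances from k-mer counts, mentioned in the record above, can be illustrated without computing any alignment. The sketch below is a generic shared-k-mer distance written for illustration; the function names, the default k = 3 and the normalization are assumptions made here and should not be read as MUSCLE's exact distance measure.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """1 minus the fraction of k-mers shared by a and b (0 = identical k-mer content)."""
    if min(len(a), len(b)) < k:
        raise ValueError("sequences must be at least k residues long")
    # Counter intersection keeps the minimum count of each k-mer.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    possible = min(len(a), len(b)) - k + 1
    return 1.0 - shared / possible
```

Counting k-mers takes time linear in sequence length, which is why measures of this kind are attractive for building a guide tree over thousands of sequences before any pairwise alignment is attempted.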
Characterization of species-specific repeated DNA sequences from B. nigra.

PubMed

Gupta, V; Lakshmisita, G; Shaila, M S; Jagannathan, V; Lakshmikumaran, M S

1992-07-01

The construction and characterization of two genome-specific recombinant DNA clones from B. nigra are described. Southern analysis showed that the two clones belong to a dispersed repeat family. They differ from each other in their length, distribution and sequence, though the average GC content is nearly the same (45%). These B genome-specific repeats have been used to analyse the phylogenetic relationships between cultivated and wild species of the family Brassicaceae.
Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa.

PubMed

Shahin, Arwa; van Kaauwen, Martijn; Esselink, Danny; Bargsten, Joachim W; van Tuyl, Jaap M; Visser, Richard G F; Arens, Paul

2012-11-20

Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, there are hardly any genetic studies performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Successfully, we developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice more abundant in UTRs (UnTranslated Regions) compared to coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. 
Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies.
Method and apparatus for biological sequence comparison

DOEpatents

Marr, T.G.; Chang, W.I.

1997-12-23

A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.
Method and apparatus for biological sequence comparison

DOEpatents

Marr, Thomas G.; Chang, William I-Wei

1997-01-01

A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.
A genome-wide BAC-end sequence survey provides first insights into sweetpotato (Ipomoea batatas (L.) Lam.) genome composition.

PubMed

Si, Zengzhi; Du, Bing; Huo, Jinxi; He, Shaozhen; Liu, Qingchang; Zhai, Hong

2016-11-21

Sweetpotato, Ipomoea batatas (L.) Lam., is an important food crop widely grown in the world. However, little is known about the genome of this species because it is a highly heterozygous hexaploid. Gaining a more in-depth knowledge of the sweetpotato genome is therefore necessary and imperative. In this study, the first bacterial artificial chromosome (BAC) library of sweetpotato was constructed. Clones from the BAC library were end-sequenced and analyzed to provide genome-wide information about this species. The BAC library contained 240,384 clones with an average insert size of 101 kb and had a 7.93-10.82 × coverage of the genome, and the probability of isolating any single-copy DNA sequence from the library was more than 99%. Both ends of 8310 BAC clones randomly selected from the library were sequenced to generate 11,542 high-quality BAC-end sequences (BESs), with an accumulative length of 7,595,261 bp and an average length of 658 bp. Analysis of the BESs revealed that 12.17% of the sweetpotato genome was known repetitive DNA, including 7.37% long terminal repeat (LTR) retrotransposons, 1.15% non-LTR retrotransposons and 1.42% Class II DNA transposons, among others; 18.31% of the genome was identified as sweetpotato-unique repetitive DNA and 10.00% of the genome was predicted to be coding regions. In total, 3,846 simple sequence repeats (SSRs) were identified, with a density of one SSR per 1.93 kb, from which 288 SSR primers were designed and tested for length polymorphism using 20 sweetpotato accessions; 173 (60.07%) of them produced polymorphic bands. Sweetpotato BESs had significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum than those of Vitis vinifera, Theobroma cacao and Arabidopsis thaliana. The first BAC library for sweetpotato has been successfully constructed. 
The high quality BESs provide first insights into sweetpotato genome composition, and have significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum. These resources as a robust platform will be used in high-resolution mapping, gene cloning, assembly of genome sequences, comparative genomics and evolution for sweetpotato.
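The coverage and recovery-probability figures in the record above are linked by a standard calculation. The sketch below uses the widely used Clarke and Carbon formula, P = 1 - (1 - I/G)^N; whether the authors used exactly this formula is an assumption, the function names are hypothetical, and the genome size is back-calculated here from the lower coverage bound rather than taken from the abstract.

```python
import math


def library_coverage(n_clones, insert_bp, genome_bp):
    """Genome equivalents represented by the library."""
    return n_clones * insert_bp / genome_bp


def recovery_probability(n_clones, insert_bp, genome_bp):
    """Clarke and Carbon: chance that a given single-copy sequence is in the library."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


def clones_needed(p, insert_bp, genome_bp):
    """Number of clones required to reach recovery probability p."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - insert_bp / genome_bp))


n_clones, insert_bp = 240_384, 101_000  # values reported in the abstract
genome_bp = n_clones * insert_bp / 7.93  # genome size implied by 7.93x coverage (derived)
p = recovery_probability(n_clones, insert_bp, genome_bp)
needed_for_99 = clones_needed(0.99, insert_bp, genome_bp)
```

Even at the lower coverage bound the computed probability is well above 0.99, in line with the abstract's statement, and far fewer clones than the library contains would suffice for a 99% chance.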
Abundance and Characterization of Perfect Microsatellites on the Cattle Y Chromosome.

PubMed

Ma, Zhi-Jie

2017-07-03

Microsatellites or simple sequence repeats (SSRs) are found in most organisms and play an important role in genomic organization and function. To characterize the abundance of SSRs (1-6 base-pairs [bp]) on the cattle Y chromosome, the relative frequency and density of perfect or uninterrupted SSRs based on the published Y chromosome sequence were examined. A total of 17,273 perfect SSRs were found, with total length of 324.78 kb, indicating that approximately 0.75% of the cattle Y chromosome sequence (43.30 Mb) comprises perfect SSRs, with an average length of 18.80 bp. The relative frequency and density were 398.92 loci/Mb and 7500.62 bp/Mb, respectively. The proportions of the six classes of perfect SSRs were highly variable on the cattle Y chromosome. Mononucleotide repeats had a total number of 8073 (46.74%) and an average length of 15.45 bp, and were the most abundant SSRs class, while the percentages of di-, tetra-, tri-, penta-, and hexa-nucleotide repeats were 22.86%, 11.98%, 11.58%, 6.65%, and 0.19%, respectively. Different classes of SSRs varied in their repeat number, with the highest being 42 for dinucleotides. Results reveal that repeat categories A, AC, AT, AAC, AGC, GTTT, CTTT, ATTT, and AACTG predominate on the Y chromosome. This study provides insight into the organization of cattle Y chromosome repetitive DNA, as well as information useful for developing more polymorphic cattle Y-chromosome-specific SSRs.
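A scan for perfect (uninterrupted) SSRs of the kind described above can be sketched with regular expressions. This is an illustration only: the abstract does not state the minimum repeat numbers or the software used, so the thresholds passed in `min_repeats`, the function names and the toy sequence are all assumptions.

```python
import re


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_perfect_ssrs(seq, min_repeats):
    """Return sorted (start, motif, repeat_count) tuples.

    min_repeats maps motif length to the minimum number of tandem units (>= 2).
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            if is_reducible(m.group(1)):
                pos = m.start() + 1  # e.g. 'AA' repeats belong to the mononucleotide class
                continue
            hits.append((m.start(), m.group(1), len(m.group(0)) // k))
            pos = m.end()
    return hits if not hits else sorted(hits)


def density_per_mb(value, sequence_bp):
    """Loci (or bp) per megabase of sequence."""
    return value / (sequence_bp / 1e6)


toy_seq = "GG" + "A" * 10 + "TT" + "AC" * 5 + "GG"  # hypothetical test sequence
toy_hits = find_perfect_ssrs(toy_seq, {1: 10, 2: 5})
loci_per_mb = density_per_mb(17_273, 43_300_000)  # counts reported in the abstract
```

With the rounded chromosome length of 43.30 Mb the frequency comes out at about 398.9 loci/Mb, matching the reported 398.92 to within the rounding of the inputs.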
Hidden Markov models of biological primary sequence information.

PubMed Central

Baldi, P; Chauvin, Y; Hunkapiller, T; McClure, M A

1994-01-01

Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences. PMID:8302831

DNA sequencing up to 1300 bases in two hours by capillary electrophoresis with mixed replaceable linear polyacrylamide solutions.

PubMed

Zhou, H; Miller, A W; Sosic, Z; Buchholz, B; Barron, A E; Kotler, L; Karger, B L

2000-03-01

This paper presents results on ultralong read DNA sequencing with relatively short separation times using capillary electrophoresis with replaceable polymer matrixes. In previous work, the effectiveness of mixed replaceable solutions of linear polyacrylamide (LPA) was demonstrated, and 1000 bases were routinely obtained in less than 1 h. Substantially longer read lengths have now been achieved by a combination of improved formulation of LPA mixtures, optimization of temperature and electric field, adjustment of the sequencing reaction, and refinement of the base-caller. The average molar masses of LPA used as DNA separation matrixes were measured by gel permeation chromatography and multiangle laser light scattering. Newly formulated matrixes comprising 0.5% (w/w) 270 kDa and 2% (w/w) 10 or 17 MDa LPA raised the optimum column temperature from 60 to 70 degrees C, increasing the selectivity for large DNA fragments, while maintaining high selectivity for small fragments as well. This improved resolution was further enhanced by reducing the electric field strength from 200 to 125 V/cm. In addition, because sequencing accuracy beyond 1000 bases was diminished by the low signal from G-terminated fragments when the standard reaction protocol for a commercial dye primer kit was used, the amount of these fragments was doubled. Augmenting the base-calling expert system with rules specific for low peak resolution also had a significant effect, contributing slightly less than half of the total increase in read length. With full optimization, this read length reached up to 1300 bases (average 1250) with 98.5% accuracy in 2 h for a single-stranded M13 template.
[cDNA library construction from panicle meristem of finger millet].

PubMed

Radchuk, V; Pirko, Ia V; Isaenkov, S V; Emets, A I; Blium, Ia B

2014-01-01

The protocol for production of full-length cDNA using the SuperScript Full-Length cDNA Library Construction Kit II (Invitrogen) was tested, and a high-quality cDNA library was created from meristematic tissue of finger millet panicle (Eleusine coracana (L.) Gaertn). The titer of the resulting cDNA library averaged 3.01 x 10^5 CFU/ml. The average cDNA insert length was about 1070 base pairs, and the efficiency of cDNA fragment insertion was 99.5%. Selected cDNA clones from the library were sequenced, and their sequences were identified by BLAST search. The library analysis and selective sequencing indicate good functionality and the full-length character of the inserted cDNA clones. The cDNA library from meristematic tissue of finger millet panicle is a valuable source for isolating and identifying key genes that regulate metabolism and meristematic development, and for mining new molecular markers for high-quality genetic investigations and molecular breeding.
P. berghei Telomerase Subunit TERT is Essential for Parasite Survival

PubMed Central

Religa, Agnieszka A.; Ramesar, Jai; Janse, Chris J.; Scherf, Artur; Waters, Andrew P.

2014-01-01

Telomeres define the ends of chromosomes protecting eukaryotic cells from chromosome instability and eventual cell death. The complex regulation of telomeres involves various proteins including telomerase, which is a specialized ribonucleoprotein responsible for telomere maintenance. Telomeres of chromosomes of malaria parasites are kept at a constant length during blood stage proliferation. The 7-bp telomere repeat sequence is universal across different Plasmodium species (GGGTTT/CA), though the average telomere length varies. The catalytic subunit of telomerase, telomerase reverse transcriptase (TERT), is present in all sequenced Plasmodium species and is approximately three times larger than other eukaryotic TERTs. The Plasmodium RNA component of TERT has recently been identified in silico. A strategy to delete the gene encoding TERT via double cross-over (DXO) homologous recombination was undertaken to study the telomerase function in P. berghei. Expression of both TERT and the RNA component (TR) in P. berghei blood stages was analysed by Western blotting and Northern analysis. Average telomere length was measured in several Plasmodium species using Telomere Restriction Fragment (TRF) analysis. TERT and TR were detected in blood stages and an average telomere length of ∼950 bp established. Deletion of the tert gene was performed using standard transfection methodologies and we show the presence of tert− mutants in the transfected parasite populations. Cloning of tert− mutants has been attempted multiple times without success. Thorough analysis of the transfected parasite populations and the parasites obtained from extensive parasite cloning from these populations provide evidence for a so-called delayed death phenotype as observed in different organisms lacking TERT. The findings indicate that TERT is essential for P. berghei cell survival. The study extends our current knowledge on telomere biology in malaria parasites and validates further investigations to identify telomerase inhibitors to induce parasite cell death. PMID:25275500
Complete mitochondrial genome sequences of the northern spotted owl (Strix occidentalis caurina) and the barred owl (Strix varia; Aves: Strigiformes: Strigidae) confirm the presence of a duplicated control region

PubMed Central

Henderson, James B.; Sellas, Anna B.; Fuchs, Jérôme; Bowie, Rauri C.K.; Dumbacher, John P.

2017-01-01

We report here the successful assembly of the complete mitochondrial genomes of the northern spotted owl (Strix occidentalis caurina) and the barred owl (S. varia). We utilized sequence data from two sequencing methodologies, Illumina paired-end sequence data with insert lengths ranging from approximately 250 nucleotides (nt) to 9,600 nt and read lengths from 100–375 nt and Sanger-derived sequences. We employed multiple assemblers and alignment methods to generate the final assemblies. The circular genomes of S. o. caurina and S. varia are comprised of 19,948 nt and 18,975 nt, respectively. Both code for two rRNAs, twenty-two tRNAs, and thirteen polypeptides. They both have duplicated control region sequences with complex repeat structures. We were not able to assemble the control regions solely using Illumina paired-end sequence data. By fully spanning the control regions, Sanger-derived sequences enabled accurate and complete assembly of these mitochondrial genomes. These are the first complete mitochondrial genome sequences of owls (Aves: Strigiformes) possessing duplicated control regions. We searched the nuclear genome of S. o. caurina for copies of mitochondrial genes and found at least nine separate stretches of nuclear copies of gene sequences originating in the mitochondrial genome (Numts). The Numts ranged from 226–19,522 nt in length and included copies of all mitochondrial genes except tRNAPro, ND6, and tRNAGlu. Strix occidentalis caurina and S. varia exhibited an average of 10.74% (8.68% uncorrected p-distance) divergence across the non-tRNA mitochondrial genes. PMID:29038757
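The uncorrected p-distance quoted in this abstract is the proportion of compared alignment sites at which two sequences differ. A minimal sketch in Python, assuming two pre-aligned sequences of equal length and assuming that columns containing gaps or ambiguous bases are skipped (the function name and this column-handling rule are illustrative choices, not taken from the study):

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of comparable aligned sites that differ (uncorrected p-distance)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: skip columns with gaps or ambiguity codes in either sequence.
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return differing / compared
```

For example, `p_distance("ACGTACGT", "ACGTACGA")` compares eight sites, one of which differs. A corrected divergence estimate would additionally apply a substitution model, which this sketch does not attempt.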
Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts

PubMed Central

Cheng, Bing; Furtado, Agnelo

2017-01-01

Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam left 1213 sequences without hits, which are potential novel genes in coffee. Longer UTRs were captured, especially 5′ UTRs, facilitating the identification of upstream open reading frames. The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kilobases) were poorly annotated. LRS technology shows the limitation of previous studies. It provides an important tool to produce a reference transcriptome including more of the diversity of full-length transcripts to help understand the biology and support the genetic improvement of polyploid species such as coffee. PMID:29048540
Pervasive sequence patents cover the entire human genome.

PubMed

Rosenfeld, Jeffrey A; Mason, Christopher E

2013-01-01

The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association of Molecular Pathologists v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as small as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays.
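The cross-gene 15mer matching described in this abstract can be illustrated with a k-mer index. The sketch below is a simplified illustration, not the authors' pipeline: it assumes gene sequences are supplied as a plain dictionary, considers only the forward strand, and requires exact matches. The function names are hypothetical.

```python
from collections import defaultdict

def kmers(seq: str, k: int = 15) -> set:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def genes_sharing_kmers(genes: dict, k: int = 15) -> dict:
    """Map each gene name to the set of other genes that share at least one k-mer with it."""
    index = defaultdict(set)  # k-mer -> names of genes that contain it
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    shared = {name: set() for name in genes}
    for names in index.values():
        if len(names) > 1:
            for name in names:
                shared[name].update(names - {name})
    return shared
```

With such a mapping, the average number of matching genes per gene is `sum(len(v) for v in shared.values()) / len(shared)`. A genome-scale analysis would also need to handle the reverse strand and memory use, which are left out here.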
Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences.

PubMed

Bergman, C M; Kreitman, M

2001-08-01

Comparative genomic approaches to gene and cis-regulatory prediction are based on the principle that differential DNA sequence conservation reflects variation in functional constraint. Using this principle, we analyze noncoding sequence conservation in Drosophila for 40 loci with known or suspected cis-regulatory function encompassing >100 kb of DNA. We estimate the fraction of noncoding DNA conserved in both intergenic and intronic regions and describe the length distribution of ungapped conserved noncoding blocks. On average, 22%-26% of noncoding sequences surveyed are conserved in Drosophila, with median block length approximately 19 bp. We show that point substitution in conserved noncoding blocks exhibits transition bias as well as lineage effects in base composition, and occurs more than an order of magnitude more frequently than insertion/deletion (indel) substitution. Overall, patterns of noncoding DNA structure and evolution differ remarkably little between intergenic and intronic conserved blocks, suggesting that the effects of transcription per se contribute minimally to the constraints operating on these sequences. The results of this study have implications for the development of alignment and prediction algorithms specific to noncoding DNA, as well as for models of cis-regulatory DNA sequence evolution.
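The block-length statistics in this abstract can be sketched from a pairwise alignment. The example below assumes, purely for illustration, that an ungapped conserved block is a maximal run of identical, gap-free alignment columns and that the conserved fraction is measured against the alignment length; the study's actual block definition and thresholds may differ.

```python
from statistics import median

def conserved_block_lengths(aln_a: str, aln_b: str, min_len: int = 1) -> list:
    """Lengths of maximal runs of identical, gap-free columns in a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    lengths = []
    run = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == b and a != "-":
            run += 1
        else:
            # A mismatch or gap closes the current block.
            if run >= min_len:
                lengths.append(run)
            run = 0
    if run >= min_len:
        lengths.append(run)
    return lengths

def conserved_fraction(aln_a: str, aln_b: str, min_len: int = 1) -> float:
    """Fraction of alignment columns that fall inside conserved blocks."""
    if not aln_a:
        return 0.0
    return sum(conserved_block_lengths(aln_a, aln_b, min_len)) / len(aln_a)
```

The median block length then follows from `median(conserved_block_lengths(a, b))` for any alignment that yields at least one block.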
Construction of a high-density genetic map for grape using specific length amplified fragment (SLAF) sequencing

PubMed Central

Guo, Yinshan; Xing, Huiyang; Zhao, Yuhui; Liu, Zhendong; Li, Kun; Guo, Xiuwu

2017-01-01

Genetic maps are important tools in plant genomics and breeding. We report a large-scale discovery of single nucleotide polymorphisms (SNPs) using the specific length amplified fragment sequencing (SLAF-seq) technique for the construction of high-density genetic maps for two elite wine grape cultivars, ‘Chardonnay’ and ‘Beibinghong’, and their 130 F1 plants. A total of 372.53 M paired-end reads were obtained after preprocessing. The average sequencing depth was 33.81 for ‘Chardonnay’ (the female parent), 48.20 for ‘Beibinghong’ (the male parent), and 12.66 for the F1 offspring. We detected 202,349 high-quality SLAFs of which 144,972 were polymorphic; 10,042 SNPs were used to construct a genetic map that spanned 1,969.95 cM, with an average genetic distance of 0.23 cM between adjacent markers. This genetic map contains the largest molecular marker number of the grape maps so far reported. We thus demonstrate that SLAF-seq is a promising strategy for the construction of high-density genetic maps; the map that we report here is a good potential resource for QTL mapping of genes linked to major economic and agronomic traits, map-based cloning, and marker-assisted selection of grape. PMID:28746364
RESTRICTION FRAGMENT LENGTH POLYMORPHISM ANALYSIS OF PCR-AMPLIFIED NIFH SEQUENCES FROM WETLAND PLANT RHIZOSPHERE COMMUNITIES

EPA Science Inventory

We describe a method to assess the community structure of N2-fixing bacteria in the rhizosphere. Total DNA was extracted from Spartina alterniflora and Sesbania macrocarpa root zones by bead-beating and purified by CsCl-EtBr gradient centrifugation. The average DNA yield was 5.5 ...
Virtual Northern analysis of the human genome.

PubMed

Hurowitz, Evan H; Drori, Iddo; Stodden, Victoria C; Donoho, David L; Brown, Patrick O

2007-05-23

We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale. We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90%) confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in both human and yeast. Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes.
Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies

PubMed Central

2014-01-01

Background The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied. PMID:24647006
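The N50 scaffold size reported in this abstract is conventionally the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch, assuming scaffold lengths are already available as a list of integers (the function name is illustrative):

```python
def n50(lengths: list) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

Because N50 weights scaffolds by the bases they contain, a few long scaffolds can raise it well above the mean scaffold length.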
Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa

PubMed Central

2012-01-01

Background Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, hardly any genetic studies have been performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Results We developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice as abundant in UTRs (UnTranslated Regions) as in coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSRs. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Conclusions Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies. PMID:23167289
The Development of a High Density Linkage Map for Black Tiger Shrimp (Penaeus monodon) Based on cSNPs

PubMed Central

Baranski, Matthew; Gopikrishna, Gopalapillay; Robinson, Nicholas A.; Katneni, Vinaya Kumar; Shekhar, Mudagandur S.; Shanmugakarthik, Jayakani; Jothivel, Sarangapani; Gopal, Chavali; Ravichandran, Pitchaiyappan; Kent, Matthew; Arnyasi, Mariann; Ponniah, Alphis G.

2014-01-01

Transcriptome sequencing using Illumina RNA-seq was performed on populations of black tiger shrimp from India. Samples were collected from (i) four landing centres around the east coastline (EC) of India, (ii) survivors of a severe WSSV infection during pond culture (SUR) and (iii) the Andaman Islands (AI) in the Bay of Bengal. Equal quantities of purified total RNA from homogenates of hepatopancreas, muscle, nervous tissue, intestinal tract, heart, gonad, gills, pleopod and lymphoid organs were combined to create AI, EC and SUR pools for RNA sequencing. De novo transcriptome assembly resulted in 136,223 contigs (minimum size 100 base pairs, bp) with a total length 61 Mb, an average length of 446 bp and an average coverage of 163× across all pools. Approximately 16% of contigs were annotated with BLAST hit information and gene ontology annotations. A total of 473,620 putative SNPs/indels were identified. An Illumina iSelect genotyping array containing 6,000 SNPs was developed and used to genotype 1024 offspring belonging to seven full-sibling families. A total of 3959 SNPs were mapped to 44 linkage groups. The linkage groups consisted of between 16–129 and 13–130 markers, of length between 139–10.8 and 109.1–10.5 cM and with intervals averaging between 1.2 and 0.9 cM for the female and male maps respectively. The female map was 28% longer than the male map (4060 and 2917 cM respectively) with a 1.6 higher recombination rate observed for female compared to male meioses. This approach has substantially increased expressed sequence and DNA marker resources for tiger shrimp and is a useful resource for QTL mapping and association studies for evolutionarily and commercially important traits. PMID:24465553
Identification of Simple Sequence Repeats in Chloroplast Genomes of Magnoliids Through Bioinformatics Approach.

PubMed

Srivastava, Deepika; Shanker, Asheesh

2016-12-01

Magnoliids, a group of basal angiosperms, form an important clade of commercially important plants, mainly spices and edible fruits. In this study, 17 chloroplast genome sequences belonging to the Magnoliid clade were screened for the identification of chloroplast simple sequence repeats (cpSSRs). Simple sequence repeats, or microsatellites, are short stretches of DNA built from repeat units 1-6 base pairs in length. These repeats are ubiquitous and play an important role in the development of molecular markers and in mapping traits of economic, medical or ecological interest. A total of 479 SSRs were detected, showing an average density of 1 SSR/6.91 kb. Depending on the repeat units, the length of SSRs ranged from 12 to 24 bp for mono-, 12 to 18 bp for di-, 12 to 26 bp for tri-, 12 to 24 bp for tetra-, 15 bp for penta- and 18 bp for hexanucleotide repeats. Mononucleotide repeats were the most frequent (207, 43.21%) followed by tetranucleotide repeats (130, 27.13%). Penta- and hexanucleotide repeats were least frequent or absent in these chloroplast genomes.
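SSR screening of the kind summarized in this abstract can be sketched as a scan for tandemly repeated motifs of 1-6 bp. The minimum repeat counts below (12, 6, 4, 3, 3 and 3 for mono- to hexanucleotide motifs) are an assumption chosen to agree with the minimum SSR lengths reported in the abstract; the study's actual tool and thresholds are not stated in it, and the function names are hypothetical.

```python
# Assumed minimum repeat counts per motif size (not taken from the study).
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}

def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))

def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, reps in min_repeats.items():
        i = 0
        while i + size <= len(seq):
            motif = seq[i:i + size]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Count how many times the motif repeats in tandem from position i.
                count = 1
                while seq[i + count * size:i + (count + 1) * size] == motif:
                    count += 1
                if count >= reps:
                    hits.append((i, motif, count))
                    i += count * size
                    continue
            i += 1
    return sorted(hits)
```

SSR density in the abstract's units then follows as `(len(seq) / 1000) / len(hits)` kb per SSR. This sketch reports perfect repeats only and does not merge adjacent repeats into compound SSRs.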
Gene and translation initiation site prediction in metagenomic sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John

2012-01-01

Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
The complete mitochondrial genome sequence of the Tibetan red fox (Vulpes vulpes montana).

PubMed

Zhang, Jin; Zhang, Honghai; Zhao, Chao; Chen, Lei; Sha, Weilai; Liu, Guangshuai

2015-01-01

In this study, the complete mitochondrial genome of the Tibetan red fox (Vulpes vulpes montana) was sequenced for the first time using blood samples obtained from a wild female red fox captured in Lhasa, Tibet, China. The Qinghai-Tibet Plateau is the highest plateau in the world, with an average elevation above 3500 m. Sequence analysis showed that the genome contains a 12S rRNA gene, a 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes and 1 control region (CR). Variable tandem repeats in the CR are the main cause of mitochondrial genome length variability among canids.
Sequences characterization of microsatellite DNA sequences in Pacific abalone (Haliotis discus hannai)

NASA Astrophysics Data System (ADS)

Li, Qi; Akihiro, Kijima

2007-01-01

A microsatellite-enriched library was constructed using the magnetic bead hybridization selection method, and the microsatellite DNA sequences were analyzed in Pacific abalone Haliotis discus hannai. Three hundred and fifty white colonies were screened using a PCR-based technique, and 84 clones were identified as potentially containing a microsatellite repeat motif. The 84 clones were sequenced, and 42 microsatellites and 4 minisatellites with a minimum of five repeats were found (13.1% of white colonies screened). Besides the CA motif contained in the oligoprobe, we also found 16 other types of microsatellite repeats including a dinucleotide repeat, two tetranucleotide repeats, twelve pentanucleotide repeats and a hexanucleotide repeat. According to Weber (1990), the microsatellite sequences obtained could be categorized structurally into perfect repeats (73.3%), imperfect repeats (13.3%), and compound repeats (13.4%). Among the microsatellite repeats, relatively short arrays (<20 repeats) were most abundant, accounting for 75.0%. The longest microsatellite had 48 repeats, and the average number of repeats was 13.4. The data on the composition and length distribution of microsatellites obtained in the present study can be useful for choosing the repeat motifs for microsatellite isolation in other abalone species.
Identification of genes associated with reproduction in the Mud Crab (Scylla olivacea) and their differential expression following serotonin stimulation.

PubMed

Kornthong, Napamanee; Cummins, Scott F; Chotwiwatthanakun, Charoonroj; Khornchatri, Kanjana; Engsusophon, Attakorn; Hanna, Peter J; Sobhon, Prasert

2014-01-01

The central nervous system (CNS) is often intimately involved in reproduction control and is therefore a target organ for transcriptomic investigations to identify reproduction-associated genes. In this study, 454 transcriptome sequencing was performed on pooled brain and ventral nerve cord of the female mud crab (Scylla olivacea) following serotonin injection (5 µg/g BW). A total of 197,468 sequence reads was obtained with an average length of 828 bp. Approximately 38.7% of 2,183 isotigs matched with significant similarity (E value < 1e-4) to sequences within the Genbank non-redundant (nr) database, with most significant matches being to crustacean and insect sequences. Approximately 32 putative neuropeptide genes were identified from nonmatching blast sequences. In addition, we identified full-length transcripts for crustacean reproductive-related genes, namely farnesoic acid o-methyltransferase (FAMeT), estrogen sulfotransferase (ESULT) and prostaglandin F synthase (PGFS). Following serotonin injection, which would normally initiate reproductive processes, we found up-regulation of FAMeT, ESULT and PGFS expression in the female CNS and ovary. Our data here provides an invaluable new resource for understanding the molecular role of the CNS on reproduction in S. olivacea.
Polymer Characterization by 13C Nuclear Magnetic Resonance (NMR) spectroscopy: Acryloid K125-EA and High Molecular Weight PIBM

DTIC Science & Technology

1980-02-01

Keywords: average sequence length; butyl acrylate; methyl methacrylate; PIBM; carbon-13; monomer distribution; PMMA.
Information theory analysis of Australian humpback whale song.

PubMed

Miksis-Olds, Jennifer L; Buck, John R; Noad, Michael J; Cato, Douglas H; Stokes, M Dale

2008-10-01

Songs produced by migrating whales were recorded off the coast of Queensland, Australia, over six consecutive weeks in 2003. Forty-eight independent song sessions were analyzed using information theory techniques. The average length of the songs estimated by correlation analysis was approximately 100 units, with song sessions lasting from 300 to over 3100 units. Song entropy, a measure of structural constraints, was estimated using three different methodologies: (1) the independently identically distributed model, (2) a first-order Markov model, and (3) the nonparametric sliding window match length (SWML) method, as described by Suzuki et al. [(2006). "Information entropy of humpback whale song," J. Acoust. Soc. Am. 119, 1849-1866]. The analysis finds that the song sequences of migrating Australian whales are consistent with the hierarchical structure proposed by Payne and McVay [(1971). "Songs of humpback whales," Science 173, 587-597], and recently supported mathematically by Suzuki et al. (2006) for singers on the Hawaiian breeding grounds. Both the SWML entropy estimates and the song lengths for the Australian singers in 2003 were lower than that reported by Suzuki et al. (2006) for Hawaiian whales in 1976-1978; however, song redundancy did not differ between these two populations separated spatially and temporally. The average total information in the sequence of units in Australian song was approximately 35 bits/song. Aberrant songs (8%) yielded entropies similar to the typical songs.
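The first two entropy models named in this abstract can be illustrated with plug-in estimators over a sequence of song-unit labels. The sketch below is an assumption-laden illustration rather than the authors' method: it uses raw frequency estimates without bias correction, does not implement the SWML estimator, and uses hypothetical function names.

```python
from collections import Counter
from math import log2

def iid_entropy(units) -> float:
    """Shannon entropy (bits/unit) assuming independent, identically distributed units."""
    if len(units) == 0:
        raise ValueError("sequence must not be empty")
    total = len(units)
    counts = Counter(units)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def markov1_entropy(units) -> float:
    """Conditional entropy H(X_t | X_{t-1}) in bits/unit, estimated from bigram counts."""
    if len(units) < 2:
        raise ValueError("at least two units are required")
    pairs = Counter(zip(units, units[1:]))  # counts of consecutive unit pairs
    previous = Counter(units[:-1])          # counts of units that have a successor
    total = len(units) - 1
    entropy = 0.0
    for (prev_unit, _next_unit), c in pairs.items():
        entropy -= (c / total) * log2(c / previous[prev_unit])
    return entropy
```

A strictly alternating sequence such as `"ABABABAB"` has an i.i.d. entropy near 1 bit/unit but a first-order Markov entropy of 0, which is the kind of structural constraint the entropy comparison is meant to expose. Both functions accept strings or lists of unit labels.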

Evolutionary diversity and potential recombinogenic role of integration targets of non-LTR retrotransposons

PubMed Central

Gentles, Andrew J.; Kohany, Oleksiy; Jurka, Jerzy

2005-01-01

Short interspersed elements (SINEs) make up a significant fraction of total DNA in mammalian genomes, providing a rich substrate for chromosomal rearrangements by SINE-SINE recombinations. Proliferation of mammalian SINEs is mediated primarily by LINE1 (L1) non-LTR retrotransposons that preferentially integrate at DNA sequence targets with average length ~15 bp and containing conserved endonucleolytic nicking signals at both ends. We report that sequence variations in the first of the two nicking signals, represented by a 5′TT-AAAA consensus sequence, affect the position of the second signal thus leading to target site duplications (TSDs) of different lengths. The length distribution of TSDs appears to be affected also by L1-encoded enzyme variants, since targets with the same 5′ nicking site can be of different average length in different mammalian species. Taking this into account, we re-analyzed the second nicking site and found that it is larger and includes more conserved sites than previously appreciated, with a consensus of 5′ANTNTN-AA. We also studied potential involvement of the nicking sites in stimulating recombinations between SINE elements. We determined that SINE elements retaining TSDs with perfect 5′TT-AAAA nicking sites appear to be lost relatively rapidly from the human and rat genomes, and less rapidly from dog. We speculate that the introduction of single-strand DNA breaks induced by recurring endonucleolytic attacks at these sites, combined with the ubiquitousness of SINEs, may significantly promote recombination between repetitive elements, leading to the observed losses. At the same time new L1 subfamilies may be selected for “incompatibility” with pre-existing targets. This provides a possible driving force for the continual emergence of new L1 subfamilies which, in turn, may affect selection of L1-dependent SINE subfamilies. PMID:15944437
3D morphometry using automated aortic segmentation in native MR angiography: an alternative to contrast enhanced MRA?

PubMed

Müller-Eschner, Matthias; Müller, Tobias; Biesdorf, Andreas; Wörz, Stefan; Rengier, Fabian; Böckler, Dittmar; Kauczor, Hans-Ulrich; Rohr, Karl; von Tengg-Kobligk, Hendrik

2014-04-01

Native-MR angiography (N-MRA) is considered an imaging alternative to contrast enhanced MR angiography (CE-MRA) for patients with renal insufficiency. Lower intraluminal contrast in N-MRA often leads to failure of the segmentation process in commercial algorithms. This study introduces an in-house 3D model-based segmentation approach used to compare both sequences by automatic 3D lumen segmentation, allowing for evaluation of differences of aortic lumen diameters as well as differences in length comparing both acquisition techniques at every possible location. Sixteen healthy volunteers underwent 1.5-T-MR Angiography (MRA). For each volunteer, two different MR sequences were performed, CE-MRA: gradient echo Turbo FLASH sequence and N-MRA: respiratory-and-cardiac-gated, T2-weighted 3D SSFP. Datasets were segmented using a 3D model-based ellipse-fitting approach with a single seed point placed manually above the celiac trunk. The segmented volumes were manually cropped from left subclavian artery to celiac trunk to avoid error due to side branches. Diameters, volumes and centerline length were computed for intraindividual comparison. For statistical analysis the Wilcoxon signed-rank test was used. Average centerline length obtained based on N-MRA was 239.0±23.4 mm compared to 238.6±23.5 mm for CE-MRA without significant difference (P=0.877). Average maximum diameter obtained based on N-MRA was 25.7±3.3 mm compared to 24.1±3.2 mm for CE-MRA (P<0.001). In agreement with the difference in diameters, volumes obtained based on N-MRA (100.1±35.4 cm^3) were consistently and significantly larger compared to CE-MRA (89.2±30.0 cm^3) (P<0.001). 3D morphometry shows highly similar centerline lengths for N-MRA and CE-MRA, but systematically higher diameters and volumes for N-MRA.
3D morphometry using automated aortic segmentation in native MR angiography: an alternative to contrast enhanced MRA?

PubMed Central

Müller-Eschner, Matthias; Müller, Tobias; Biesdorf, Andreas; Wörz, Stefan; Rengier, Fabian; Böckler, Dittmar; Kauczor, Hans-Ulrich; Rohr, Karl

2014-01-01

Introduction Native-MR angiography (N-MRA) is considered an imaging alternative to contrast enhanced MR angiography (CE-MRA) for patients with renal insufficiency. Lower intraluminal contrast in N-MRA often leads to failure of the segmentation process in commercial algorithms. This study introduces an in-house 3D model-based segmentation approach used to compare both sequences by automatic 3D lumen segmentation, allowing for evaluation of differences of aortic lumen diameters as well as differences in length comparing both acquisition techniques at every possible location. Methods and materials Sixteen healthy volunteers underwent 1.5-T MR angiography (MRA). For each volunteer, two different MR sequences were performed, CE-MRA: gradient echo Turbo FLASH sequence and N-MRA: respiratory-and-cardiac-gated, T2-weighted 3D SSFP. Datasets were segmented using a 3D model-based ellipse-fitting approach with a single seed point placed manually above the celiac trunk. The segmented volumes were manually cropped from left subclavian artery to celiac trunk to avoid error due to side branches. Diameters, volumes and centerline length were computed for intraindividual comparison. For statistical analysis the Wilcoxon signed-rank test was used. Results Average centerline length obtained based on N-MRA was 239.0±23.4 mm compared to 238.6±23.5 mm for CE-MRA without significant difference (P=0.877). Average maximum diameter obtained based on N-MRA was 25.7±3.3 mm compared to 24.1±3.2 mm for CE-MRA (P<0.001). In agreement with the difference in diameters, volumes obtained based on N-MRA (100.1±35.4 cm³) were consistently and significantly larger compared to CE-MRA (89.2±30.0 cm³) (P<0.001). Conclusions 3D morphometry shows highly similar centerline lengths for N-MRA and CE-MRA, but systematically higher diameters and volumes for N-MRA. PMID:24834406
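The centerline length reported in this record is, in essence, the length of a curve through the segmented lumen. As a rough illustration only (the abstract does not describe how the authors computed it), the sketch below assumes the centerline is available as an ordered list of 3D points in millimetres and sums the distances between consecutive points. The function name and the sample points are hypothetical.

```python
import math

def centerline_length(points):
    """Sum of straight-line distances between consecutive 3D points.

    `points` is an ordered list of (x, y, z) tuples; the result is in
    the same unit as the coordinates (assumed here to be mm).
    """
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

# Hypothetical centerline samples in mm (illustrative, not study data).
example_points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
example_length = centerline_length(example_points)
print(example_length)
```

A polyline sum like this slightly underestimates the length of a smooth curve, so denser sampling gives a closer approximation.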
The protein structure prediction problem could be solved using the current PDB library

PubMed Central

Zhang, Yang; Skolnick, Jeffrey

2005-01-01

For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find similar folds to native with an average rms deviation (RMSD) from native of 2.5 Å with ≈82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The tasser algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guide of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results are highly suggestive that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments. PMID:15653774
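The RMSD values quoted in this record measure the average distance between matched atoms of a model and the native structure. The sketch below shows the basic formula for two coordinate sets that are assumed to be already matched atom by atom and already superimposed; it does not perform the optimal superposition or the alignment that a structure comparison would normally include. The function name and toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists.

    Assumes the i-th point of coords_a corresponds to the i-th point of
    coords_b and that both sets are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy coordinates in angstroms (illustrative only): every atom is shifted by 2 A.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(model, native))
```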
Remarkable sequence conservation of the last intron in the PKD1 gene.

PubMed

Rodova, Marianna; Islam, M Rafiq; Peterson, Kenneth R; Calvet, James P

2003-10-01

The last intron of the PKD1 gene (intron 45) was found to have exceptionally high sequence conservation across four mammalian species: human, mouse, rat, and dog. This conservation did not extend to the comparable intron in pufferfish. Pairwise comparisons for intron 45 showed 91% identity (human vs. dog) to 100% identity (mouse vs. rat) for an average for all four species of 94% identity. In contrast, introns 43 and 44 of the PKD1 gene had average pairwise identities of 57% and 54%, and exons 43, 44, and 45 and the coding region of exon 46 had average pairwise identities of 80%, 84%, 82%, and 80%. Intron 45 is 90 to 95 bp in length, with the major region of sequence divergence being in a central 4-bp to 9-bp variable region. RNA secondary structure analysis of intron 45 predicts a branching stem-loop structure in which the central variable region lies in one loop and the putative branch point sequence lies in another loop, suggesting that the intron adopts a specific stem-loop structure that may be important for its removal. Although intron 45 appears to conform to the class of small, G-triplet-containing introns that are spliced by a mechanism utilizing intron definition, its high sequence conservation may be a reflection of constraints imposed by a unique mechanism that coordinates splicing of this last PKD1 intron with polyadenylation.
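An average pairwise identity such as the 94% quoted for intron 45 can be thought of as the mean of the identities over all species pairs. The sketch below is a minimal illustration under stated assumptions: sequences are already aligned to equal length, gap columns (`-`) are skipped, and identity is matches divided by compared columns. Other definitions of identity exist, and the abstract does not say which one was used. The sequences are toy data, not PKD1 intron sequences.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identity of two aligned sequences, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable columns")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)

def mean_pairwise_identity(aligned):
    """Average percent identity over every pair of sequences in a dict."""
    pairs = list(combinations(aligned.values(), 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)

# Toy alignment (hypothetical sequences).
toy_alignment = {
    "species1": "ACGTACGTAC",
    "species2": "ACGTACGTAC",
    "species3": "ACGTTCGTAC",
}
print(mean_pairwise_identity(toy_alignment))
```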
Virtual Northern Analysis of the Human Genome

PubMed Central

Hurowitz, Evan H.; Drori, Iddo; Stodden, Victoria C.; Donoho, David L.; Brown, Patrick O.

2007-01-01

Background We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale. Methodology/Principal Findings We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90%) confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in both human and yeast. Conclusions/Significance Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes. PMID:17520019
A Primary Assembly of a Bovine Haplotype Block Map Based on a 15,036-Single-Nucleotide Polymorphism Panel Genotyped in Holstein–Friesian Cattle

PubMed Central

Khatkar, Mehar S.; Zenger, Kyall R.; Hobbs, Matthew; Hawken, Rachel J.; Cavanagh, Julie A. L.; Barris, Wes; McClintock, Alexander E.; McClintock, Sara; Thomson, Peter C.; Tier, Bruce; Nicholas, Frank W.; Raadsma, Herman W.

2007-01-01

Analysis of data on 1000 Holstein–Friesian bulls genotyped for 15,036 single-nucleotide polymorphisms (SNPs) has enabled genomewide identification of haplotype blocks and tag SNPs. A final subset of 9195 SNPs in Hardy–Weinberg equilibrium and mapped on autosomes on the bovine sequence assembly (release Btau 3.1) was used in this study. The average intermarker spacing was 251.8 kb. The average minor allele frequency (MAF) was 0.29 (0.05–0.5). Following recent precedents in human HapMap studies, a haplotype block was defined where 95% of combinations of SNPs within a region are in very high linkage disequilibrium. A total of 727 haplotype blocks consisting of ≥3 SNPs were identified. The average block length was 69.7 ± 7.7 kb, which is ∼5–10 times larger than in humans. These blocks comprised a total of 2964 SNPs and covered 50,638 kb of the sequence map, which constitutes 2.18% of the length of all autosomes. A set of tag SNPs, which will be useful for further fine-mapping studies, has been identified. Overall, the results suggest that as many as 75,000–100,000 tag SNPs would be needed to track all important haplotype blocks in the bovine genome. This would require ∼250,000 SNPs in the discovery phase. PMID:17435229
Transcriptome profiling to discover putative genes associated with paraquat resistance in goosegrass (Eleusine indica L.).

PubMed

An, Jing; Shen, Xuefeng; Ma, Qibin; Yang, Cunyi; Liu, Simin; Chen, Yong

2014-01-01

Goosegrass (Eleusine indica L.), a serious annual weed in the world, has evolved resistance to several herbicides including paraquat, a non-selective herbicide. The mechanism of paraquat resistance in weeds is only partially understood. To further study the molecular mechanism underlying paraquat resistance in goosegrass, we performed transcriptome analysis of susceptible and resistant biotypes of goosegrass with or without paraquat treatment. The RNA-seq libraries generated 194,716,560 valid reads with an average length of 91.29 bp. De novo assembly analysis produced 158,461 transcripts with an average length of 1153.74 bp and 100,742 unigenes with an average length of 712.79 bp. Among these, 25,926 unigenes were assigned to 65 GO terms that contained three main categories. A total of 13,809 unigenes with 1,208 enzyme commission numbers were assigned to 314 predicted KEGG metabolic pathways, and 12,719 unigenes were categorized into 25 KOG classifications. Furthermore, our results revealed that 53 genes related to reactive oxygen species scavenging, 10 genes related to polyamines and 18 genes related to transport were differentially expressed in paraquat treatment experiments. The genes related to polyamines and transport are likely potential candidate genes that could be further investigated to confirm their roles in paraquat resistance of goosegrass. This is the first large-scale transcriptome sequencing of E. indica using the Illumina platform. Potential genes involved in paraquat resistance were identified from the assembled sequences. The transcriptome data may serve as a reference for further analysis of gene expression and functional genomics studies, and will facilitate the study of paraquat resistance at the molecular level in goosegrass.
Transcriptome Profiling to Discover Putative Genes Associated with Paraquat Resistance in Goosegrass (Eleusine indica L.)

PubMed Central

An, Jing; Shen, Xuefeng; Ma, Qibin; Yang, Cunyi; Liu, Simin; Chen, Yong

2014-01-01

Background Goosegrass (Eleusine indica L.), a serious annual weed in the world, has evolved resistance to several herbicides including paraquat, a non-selective herbicide. The mechanism of paraquat resistance in weeds is only partially understood. To further study the molecular mechanism underlying paraquat resistance in goosegrass, we performed transcriptome analysis of susceptible and resistant biotypes of goosegrass with or without paraquat treatment. Results The RNA-seq libraries generated 194,716,560 valid reads with an average length of 91.29 bp. De novo assembly analysis produced 158,461 transcripts with an average length of 1153.74 bp and 100,742 unigenes with an average length of 712.79 bp. Among these, 25,926 unigenes were assigned to 65 GO terms that contained three main categories. A total of 13,809 unigenes with 1,208 enzyme commission numbers were assigned to 314 predicted KEGG metabolic pathways, and 12,719 unigenes were categorized into 25 KOG classifications. Furthermore, our results revealed that 53 genes related to reactive oxygen species scavenging, 10 genes related to polyamines and 18 genes related to transport were differentially expressed in paraquat treatment experiments. The genes related to polyamines and transport are likely potential candidate genes that could be further investigated to confirm their roles in paraquat resistance of goosegrass. Conclusion This is the first large-scale transcriptome sequencing of E. indica using the Illumina platform. Potential genes involved in paraquat resistance were identified from the assembled sequences. The transcriptome data may serve as a reference for further analysis of gene expression and functional genomics studies, and will facilitate the study of paraquat resistance at the molecular level in goosegrass. PMID:24927422
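Average lengths of reads, transcripts and unigenes like those reported in this record are simple means over sequence lengths. As a minimal sketch, assuming the sequences are stored in FASTA format, the code below collects the length of each record and averages them. The function names and the two-record toy input are hypothetical, and real pipelines would typically use an established parser.

```python
import io

def fasta_lengths(handle):
    """Return the length of each record in a FASTA-formatted text stream."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def average_length(lengths):
    """Arithmetic mean of a non-empty list of lengths."""
    return sum(lengths) / len(lengths)

# Toy FASTA text (hypothetical sequences, one of them wrapped over two lines).
toy_fasta = ">t1\nACGTAC\nGT\n>t2\nACGT\n"
toy_lengths = fasta_lengths(io.StringIO(toy_fasta))
print(toy_lengths, average_length(toy_lengths))
```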
A state-based probabilistic model for tumor respiratory motion prediction

NASA Astrophysics Data System (ADS)

Kalet, Alan; Sandison, George; Wu, Huanmei; Schmitz, Ruth

2010-12-01

This work proposes a new probabilistic mathematical model for predicting tumor motion and position based on a finite state representation using the natural breathing states of exhale, inhale and end of exhale. Tumor motion was broken down into linear breathing states and sequences of states. Breathing state sequences and the observables representing those sequences were analyzed using a hidden Markov model (HMM) to predict the future sequences and new observables. Velocities and other parameters were clustered using a k-means clustering algorithm to associate each state with a set of observables such that a prediction of state also enables a prediction of tumor velocity. A time average model with predictions based on average past state lengths was also computed. State sequences which are known a priori to fit the data were fed into the HMM algorithm to set a theoretical limit of the predictive power of the model. The effectiveness of the presented probabilistic model has been evaluated for gated radiation therapy based on previously tracked tumor motion in four lung cancer patients. Positional prediction accuracy is compared with actual position in terms of the overall RMS errors. Various system delays, ranging from 33 to 1000 ms, were tested. Previous studies have shown duty cycles for latencies of 33 and 200 ms at around 90% and 80%, respectively, for linear, no prediction, Kalman filter and ANN methods as averaged over multiple patients. At 1000 ms, the previously reported duty cycles range from approximately 62% (ANN) down to 34% (no prediction). Average duty cycle for the HMM method was found to be 100% and 91 ± 3% for 33 and 200 ms latency and around 40% for 1000 ms latency in three out of four breathing motion traces. RMS errors were found to be lower than linear and no prediction methods at latencies of 1000 ms. 
The results show that for system latencies longer than 400 ms, the time average HMM prediction outperforms linear, no prediction, and the more general HMM-type predictive models. RMS errors for the time average model approach the theoretical limit of the HMM, and predicted state sequences are well correlated with sequences known to fit the data.
Radiation hybrid maps of the D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes.

PubMed

Kumar, Ajay; Seetan, Raed; Mergoum, Mohamed; Tiwari, Vijay K; Iqbal, Muhammad J; Wang, Yi; Al-Azzam, Omar; Šimková, Hana; Luo, Ming-Cheng; Dvorak, Jan; Gu, Yong Q; Denton, Anne; Kilian, Andrzej; Lazo, Gerard R; Kianian, Shahryar F

2015-10-16

The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high resolution genome maps with saturated marker scaffolds to anchor and orient BAC contigs/ sequence scaffolds for whole genome assembly. Radiation hybrid (RH) mapping has proven to be an excellent tool for the development of such maps for it offers much higher and more uniform marker resolution across the length of the chromosome compared to genetic mapping and does not require marker polymorphism per se, as it is based on presence (retention) vs. absence (deletion) marker assay. In this study, a 178 line RH panel was genotyped with SSRs and DArT markers to develop the first high resolution RH maps of the entire D-genome of Ae. tauschii accession AL8/78. To confirm map order accuracy, the AL8/78-RH maps were compared with:1) a DArT consensus genetic map constructed using more than 100 bi-parental populations, 2) a RH map of the D-genome of reference hexaploid wheat 'Chinese Spring', and 3) two SNP-based genetic maps, one with anchored D-genome BAC contigs and another with anchored D-genome sequence scaffolds. Using marker sequences, the RH maps were also anchored with a BAC contig based physical map and draft sequence of the D-genome of Ae. tauschii. A total of 609 markers were mapped to 503 unique positions on the seven D-genome chromosomes, with a total map length of 14,706.7 cR. The average distance between any two marker loci was 29.2 cR which corresponds to 2.1 cM or 9.8 Mb. The average mapping resolution across the D-genome was estimated to be 0.34 Mb (Mb/cR) or 0.07 cM (cM/cR). The RH maps showed almost perfect agreement with several published maps with regard to chromosome assignments of markers. The mean rank correlations between the position of markers on AL8/78 maps and the four published maps, ranged from 0.75 to 0.92, suggesting a good agreement in marker order. 
With 609 mapped markers, a total of 2481 deletions for the whole D-genome were detected with an average deletion size of 42.0 Mb. A total of 520 markers were anchored to 216 Ae. tauschii sequence scaffolds, 116 of which were not anchored earlier to the D-genome. This study reports the development of first high resolution RH maps for the D-genome of Ae. tauschii accession AL8/78, which were then used for the anchoring of unassigned sequence scaffolds. This study demonstrates how RH mapping, which offered high and uniform resolution across the length of the chromosome, can facilitate the complete sequence assembly of the large and complex plant genomes.
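The spacing and resolution figures in this record can be cross-checked with simple arithmetic. The sketch below assumes that the average distance between loci is the total map length divided by the number of unique positions, and that the resolution factors are the ratios of the reported Mb and cM distances to the cR distance. These are assumptions about how the figures relate, not the authors' stated procedure; only numbers quoted in the abstract are used.

```python
# Values quoted in the abstract.
total_map_length_cr = 14706.7   # total RH map length in cR
unique_positions = 503          # unique marker positions
reported_spacing_cr = 29.2      # reported average distance between loci, cR
reported_spacing_mb = 9.8       # the same distance expressed in Mb
reported_spacing_cm = 2.1       # the same distance expressed in cM

# Assumed relation: average spacing = total length / unique positions.
mean_spacing_cr = total_map_length_cr / unique_positions

# Assumed relation: resolution = physical (or genetic) distance per cR.
mb_per_cr = reported_spacing_mb / reported_spacing_cr
cm_per_cr = reported_spacing_cm / reported_spacing_cr

print(round(mean_spacing_cr, 1))  # 29.2
print(round(mb_per_cr, 2))        # 0.34
print(round(cm_per_cr, 2))        # 0.07
```

Under these assumptions the recomputed values agree with the reported 29.2 cR, 0.34 Mb/cR and 0.07 cM/cR.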
Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

PubMed Central

Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

2013-01-01

Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

NASA Astrophysics Data System (ADS)

Shi, Jinming

In this study, cytosine methylation at CCGG sites and genomic DNA sequence changes in adult leaves of rice after seed space flight were detected by methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after a 4-day space flight on China's Shenzhou-6 spaceship. Adult leaves of space-treated rice, including 8 plants chosen randomly and 2 plants with phenotypic mutation, were used for AFLP and MSAP analysis. Polymorphism of both DNA sequence and cytosine methylation was detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants and mutants were 1.3%, 3.1% and 11%, respectively. For AFLP analysis, the average polymorphic frequencies were 1.4%, 2.9% and 8%, respectively. In total, 27 and 22 polymorphic fragments were cloned and sequenced from the MSAP and AFLP analyses, respectively. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. Of the 22 polymorphic fragments from AFLP analysis, none shows homology to mRNA sequence and eight show homology to repeat regions or retrotransposon sequence. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the genomic DNA and cytosine methylation analyses were different.
Bioresorbable distraction device for the treatment of airway problems for infants with Robin sequence.

PubMed

Breugem, Corstiaan; Paes, Emma; Kon, Moshe; Mink van der Molen, Aebele B; van der Molen, Aebele B Mink

2012-08-01

Pierre Robin sequence is a well known craniofacial entity. There are numerous ways to treat the respiratory insufficiency, but sometimes surgical intervention is needed. Tracheotomy could be associated with morbidity, and distraction osteogenesis has been established as a stable method to obtain a safe airway. Distraction osteogenesis has traditionally been performed with an external device. In this manuscript we describe the feasibility of an internal bioresorbable device. Retrospective descriptive study was performed in a tertiary academic children's hospital. After multidisciplinary team consultation, 12 consecutive patients with Robin sequence were treated with this internal distraction device. The mean age at surgery was 32 days, and the average amount of mandibular distraction was 18 mm. All patients were extubated after an average of 7.5 days after the surgery. The average length of stay in the hospital was 17 days after surgery. There were no major surgical complications. A tracheotomy was prevented in all our patients, and complications were limited. Long-term studies are needed to evaluate the influence that internal distraction has on the growth of the mandible and teeth. The internal distraction system seems safe for infants with micrognathia and has certain benefits when compared to the external distractor.
Isolation and in silico analysis of a novel H+-pyrophosphatase gene orthologue from the halophytic grass Leptochloa fusca

NASA Astrophysics Data System (ADS)

Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid

2017-02-01

Structure prediction can provide information about the function and active sites of a protein, which helps in designing new functional proteins. H+-pyrophosphatase is a transmembrane protein involved in establishing the proton motive force for active transport of Na+ across the membrane by Na+/H+ antiporters. A full-length novel H+-pyrophosphatase gene was isolated from the halophytic grass Leptochloa fusca using RT-PCR and RACE methods. The full-length LfVP1 gene sequence of 2292 nucleotides encodes a protein of 764 amino acids. DNA and protein sequences were characterized using bioinformatics tools. Various important potential sites were predicted by the PROSITE webserver. Primary structural analysis showed LfVP1 to be a stable protein, and the grand average of hydropathy (GRAVY) indicated that the LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that the LfVP1 protein sequence contains a significant proportion of alpha helix and random coil. Protein membrane topology suggested the presence of 14 transmembrane domains and of a catalytic domain in TM3. The three-dimensional structure predicted from the LfVP1 protein sequence also indicated the presence of 14 transmembrane domains, and a hydrophobicity surface model showed amino acid hydrophobicity. A Ramachandran plot showed that 98% of amino acid residues were predicted in the favored region.
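The GRAVY score mentioned in this record is conventionally the sum of Kyte-Doolittle hydropathy values over all residues divided by the sequence length, with positive values indicating a more hydrophobic protein and negative values a more hydrophilic one. The sketch below illustrates that calculation; it is not the tool the authors used, and the short peptides are toy inputs rather than the LfVP1 sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

# Toy peptides (hypothetical): one hydrophobic, one charged.
print(gravy("AILV"))   # positive score, hydrophobic residues
print(gravy("DEKR"))   # negative score, charged residues
```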
Spatial methods for deriving crop rotation history

NASA Astrophysics Data System (ADS)

Mueller-Warrant, George W.; Trippe, Kristin M.; Whittaker, Gerald W.; Anderson, Nicole P.; Sullivan, Clare S.

2017-08-01

Benefits of converting 11 years of remote sensing classification data into cropping history of agricultural fields included measuring lengths of rotation cycles and identifying specific sequences of intervening crops grown between final years of old grass seed stands and establishment of new ones. Spatial and non-spatial methods were complementary. Individual-year classification errors were often correctable in spreadsheet-based non-spatial analysis, whereas their presence in spatial data generally led to exclusion of fields from further analysis. Markov-model testing of non-spatial data revealed that year-to-year cropping sequences did not match average frequencies for transitions among crops grown in western Oregon, implying that rotations into new grass seed stands were influenced by growers' desires to achieve specific objectives. Moran's I spatial analysis of length of time between consecutive grass seed stands revealed that clustering of fields was relatively uncommon, with high and low value clusters only accounting for 7.1 and 6.2% of fields.
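Markov-model testing of cropping sequences, as described in this record, starts from year-to-year transition frequencies. The sketch below is a minimal illustration of that first step only: it counts transitions in per-field crop histories and converts them to first-order transition probabilities. It does not reproduce the authors' tests, and the crop labels and field histories are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate first-order transition probabilities from yearly crop sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every (this year, next year) pair within a field's history.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }

# Hypothetical field histories (one list per field, one entry per year).
toy_fields = [
    ["grass", "grass", "wheat", "grass"],
    ["grass", "wheat", "wheat", "grass"],
]
toy_probs = transition_probabilities(toy_fields)
print(toy_probs)
```

Observed multi-year sequences can then be compared against what these average transition probabilities alone would predict.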
All gene-sized DNA molecules in four species of hypotrichs have the same terminal sequence and an unusual 3' terminus.

PubMed Central

Klobutcher, L A; Swanton, M T; Donini, P; Prescott, D M

1981-01-01

In hypotrichous ciliates, all of the macronuclear DNA is in the form of low molecular weight molecules with an average size of approximately 2200 base pairs. Total macronuclear DNA from four hypotrichs has been shown to have inverted terminal repeats by direct sequence analysis. In Oxytricha nova, Oxytricha sp., and Stylonychia pustulata, this terminal sequence may be written as 5'-C4A4C4A4C4 ... 3'-G4T4G4T4G4T4G4T4G4 ... In Euplotes aediculatus, the sequence is similar but differs in the lengths of the duplex region (28 base pairs) and of the putative 3' extension (14 base pairs). Also in Euplotes, a second common sequence of 5 base pairs (A-A-C-T-T-T-T-G-A-A) occurs internal to the terminal repeat and a 17-base-pair heterogeneous region: 5'-C4A4C4A4C4A4C4(X)17T-T-G-A-A ... 3'-G2T4G4T4G4T4G4T4G4T4G4(X)17A-A-C-T-T ... The length of the terminal repeat sequence for O. nova was confirmed in cloned macronuclear DNA molecules. PMID:6265931
Position-dependent effects of locked nucleic acid (LNA) on DNA sequencing and PCR primers

PubMed Central

Levin, Joshua D.; Fiala, Dean; Samala, Meinrado F.; Kahn, Jason D.; Peterson, Raymond J.

2006-01-01

Genomes are becoming heavily annotated with important features. Analysis of these features often employs oligonucleotides that hybridize at defined locations. When the defined location lies in a poor sequence context, traditional design strategies may fail. Locked Nucleic Acid (LNA) can enhance oligonucleotide affinity and specificity. Though LNA has been used in many applications, formal design rules are still being defined. To further this effort we have investigated the effect of LNA on the performance of sequencing and PCR primers in AT-rich regions, where short primers yield poor sequencing reads or PCR yields. LNA was used in three positional patterns: near the 5′ end (LNA-5′), near the 3′ end (LNA-3′) and distributed throughout (LNA-Even). Quantitative measures of sequencing read length (Phred Q30 count) and real-time PCR signal (cycle threshold, CT) were characterized using two-way ANOVA. LNA-5′ increased the average Phred Q30 score by 60% and it was never observed to decrease performance. LNA-5′ generated cycle thresholds in quantitative PCR that were comparable to high-yielding conventional primers. In contrast, LNA-3′ and LNA-Even did not improve read lengths or CT. ANOVA demonstrated the statistical significance of these results and identified significant interaction between the positional design rule and primer sequence. PMID:17071964
Development and utilization of novel intron length polymorphic markers in foxtail millet (Setaria italica (L.) P. Beauv.).

PubMed

Gupta, Sarika; Kumari, Kajal; Das, Jyotirmoy; Lata, Charu; Puranik, Swati; Prasad, Manoj

2011-07-01

Introns are noncoding sequences in a gene that are transcribed to precursor mRNA but spliced out during mRNA maturation and are abundant in eukaryotic genomes. The availability of codominant molecular markers and saturated genetic linkage maps has been limited in foxtail millet (Setaria italica (L.) P. Beauv.). Here, we describe the development of 98 novel intron length polymorphic (ILP) markers in foxtail millet using sequence information of the model plant rice. A total of 575 nonredundant expressed sequence tag (EST) sequences were obtained, of which 327 and 248 unique sequences were from dehydration- and salinity-stressed suppression subtractive hybridization libraries, respectively. The BLAST analysis of 98 EST sequences suggests a nearly defined function for about 64% of them, and they were grouped into 11 different functional categories. All 98 ILP primer pairs showed a high level of cross-species amplification in two millet and two nonmillet species, ranging from 90% to 100%, with a mean of ∼97%. The mean observed heterozygosity and Nei's average gene diversity (0.016 and 0.171, respectively) established the efficiency of the ILP markers for distinguishing the foxtail millet accessions. Based on 26 ILP markers, a reasonable dendrogram of 45 foxtail millet accessions was constructed, demonstrating the utility of ILP markers in germplasm characterization and in studying genomic relationships in millet and nonmillet species.
Dynamic sample size detection in learning command line sequence for continuous authentication.

PubMed

Traore, Issa; Woungang, Isaac; Nakkabi, Youssef; Obaidat, Mohammad S; Ahmed, Ahmed Awad E; Khalilian, Bijan

2012-10-01

Continuous authentication (CA) consists of authenticating the user repetitively throughout a session with the goal of detecting and protecting against session hijacking attacks. While the accuracy of the detector is central to the success of CA, the detection delay or length of an individual authentication period is important as well since it is a measure of the window of vulnerability of the system. However, high accuracy and small detection delay are conflicting requirements that need to be balanced for optimum detection. In this paper, we propose the use of sequential sampling technique to achieve optimum detection by trading off adequately between detection delay and accuracy in the CA process. We illustrate our approach through CA based on user command line sequence and naïve Bayes classification scheme. Experimental evaluation using the Greenberg data set yields encouraging results consisting of a false acceptance rate (FAR) of 11.78% and a false rejection rate (FRR) of 1.33%, with an average command sequence length (i.e., detection delay) of 37 commands. When using the Schonlau (SEA) data set, we obtain FAR = 4.28% and FRR = 12%.
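The FAR and FRR figures in this record are ratios of error counts to attempt counts. As a small illustration of the standard definitions (not of the authors' sequential sampling scheme), the sketch below computes both rates from counts. The function name and the counts are hypothetical and are not taken from the Greenberg or Schonlau data sets.

```python
def far_frr(false_accepts, impostor_attempts, false_rejects, genuine_attempts):
    """Return (FAR, FRR) as fractions.

    FAR = impostor attempts wrongly accepted / all impostor attempts.
    FRR = genuine attempts wrongly rejected / all genuine attempts.
    """
    if impostor_attempts <= 0 or genuine_attempts <= 0:
        raise ValueError("attempt counts must be positive")
    return false_accepts / impostor_attempts, false_rejects / genuine_attempts

# Hypothetical counts for illustration only.
example_far, example_frr = far_frr(
    false_accepts=5, impostor_attempts=100,
    false_rejects=2, genuine_attempts=50,
)
print(f"FAR = {example_far:.2%}, FRR = {example_frr:.2%}")
```

Lengthening the authentication window generally gives the detector more evidence per decision, which is the delay-versus-accuracy trade-off the abstract describes.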

A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

PubMed

Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

2017-01-01

This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea.

PubMed

Han, Joon-Hee; Chon, Jae-Kyung; Ahn, Jong-Hwa; Choi, Ik-Young; Lee, Yong-Hwan; Kim, Kyoung Su

2016-06-01

Colletotrichum acutatum is a destructive fungal pathogen which causes anthracnose in a wide range of crops. Here we report the whole genome sequence and annotation of C. acutatum strain KC05, isolated from an infected pepper in Kangwon, South Korea. Genomic DNA from the KC05 strain was used for the whole genome sequencing using a PacBio sequencer and the MiSeq system. The KC05 genome was determined to be 52,190,760 bp in size with a G + C content of 51.73% in 27 scaffolds and to contain 13,559 genes with an average length of 1516 bp. Gene prediction and annotation were performed by incorporating RNA-Seq data. The genome sequence of the KC05 was deposited at DDBJ/ENA/GenBank under the accession number LUXP00000000.
A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

PubMed

Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

2016-09-02

Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Steinernema balochiense n. sp. (Rhabditida: Steinernematidae) a new entomopathogenic nematode from Pakistan.

PubMed

Fayyaz, Shahina; Khanum, Tabassum Ara; Ali, Shaukat; Solangi, Ghulam Sarwar; Gulsher, Mehreen; Javed, Salma

2015-01-07

A new species of entomopathogenic nematode (EPN) named Steinernema balochiense n. sp., belonging to the family Steinernematidae, was isolated from Psidium guajava L. rhizosphere soil samples of Balochistan, Pakistan. This new species belongs to the carpocapsae group. The new species can be separated from other described species by morphological and morphometric characteristics as well as DNA sequence polymorphisms. This new nematode species is morphologically characterized by features of infective juveniles (IJ) and males. For the IJ, average body length was (455; 415-528) µm, distance from anterior end to excretory pore (35; 32-38) µm, pharynx length (90; 85-98) µm, tail length (44.3; 40-51) µm, D% and E% values (39; 36-44) and (80; 70-92), respectively. For male specimens, the diagnostic characters included total body length (1330; 1135-1632) µm, gubernaculum length (44.4; 40-47) µm, D% (43.8; 40-51) and ratio of GS (63.8; 53-75). Morphological diagnostic traits for the new species include the presence of a funnel shaped gubernaculum at the proximal end. S. balochiense n. sp. differs from infective stage juveniles of the closest species, S. nepalense, by having 6 ridges vs 7 ridges in the lateral field. Molecular phylogenetic trees based on sequences of the ITS-rDNA and D2D3 regions and the mitochondrial 12S rRNA gene support the description of this nematode isolate as a new species.
Genomic Sequencing and Characterization of Cynomolgus Macaque Cytomegalovirus

PubMed Central

Marsh, Angie K.; Willer, David O.; Ambagala, Aruna P. N.; Dzamba, Misko; Chan, Jacqueline K.; Pilon, Richard; Fournier, Jocelyn; Sandstrom, Paul; Brudno, Michael; MacDonald, Kelly S.

2011-01-01

Cytomegalovirus (CMV) infection is the most common opportunistic infection in immunosuppressed individuals, such as transplant recipients or people living with HIV/AIDS, and congenital CMV is the leading viral cause of developmental disabilities in infants. Due to the highly species-specific nature of CMV, animal models that closely recapitulate human CMV (HCMV) are of growing importance for vaccine development. Here we present the genomic sequence of a novel nonhuman primate CMV from cynomolgus macaques (Macaca fascicularis; CyCMV). CyCMV (Ottawa strain) was isolated from the urine of a healthy, captive-bred, 4-year-old cynomolgus macaque of Philippine origin, and the viral genome was sequenced using next-generation Illumina sequencing to an average of 516-fold coverage. The CyCMV genome is 218,041 bp in length, with 49.5% G+C content and 84% protein-coding density. We have identified 262 putative open reading frames (ORFs) with an average coding length of 789 bp. The genomic organization of CyCMV is largely colinear with that of rhesus macaque CMV (RhCMV). Of the 262 CyCMV ORFs, 137 are homologous to HCMV genes, 243 are homologous to RhCMV 68.1, and 200 are homologous to RhCMV 180.92. CyCMV encodes four ORFs that are not present in RhCMV strain 68.1 or 180.92 but have homologies with HCMV (UL30, UL74A, UL126, and UL146). Similar to HCMV, CyCMV does not produce the RhCMV-specific viral homologue of cyclooxygenase-2. This newly characterized CMV may provide a novel model in which to study CMV biology and HCMV vaccine development. PMID:21994460
Transcriptome Analysis of Dendrobium officinale and its Application to the Identification of Genes Associated with Polysaccharide Synthesis

PubMed Central

Zhang, Jianxia; He, Chunmei; Wu, Kunlin; Teixeira da Silva, Jaime A.; Zeng, Songjun; Zhang, Xinhua; Yu, Zhenming; Xia, Haoqiang; Duan, Jun

2016-01-01

Dendrobium officinale is one of the most important Chinese medicinal herbs. Polysaccharides are one of the main active ingredients of D. officinale. To identify the genes that may be related to polysaccharide synthesis, two cDNA libraries were prepared from juvenile and adult D. officinale, and were named Dendrobium-1 and Dendrobium-2, respectively. Illumina sequencing for Dendrobium-1 generated 102 million high quality reads that were assembled into 93,881 unigenes with an average sequence length of 790 base pairs. The sequencing for Dendrobium-2 generated 86 million reads that were assembled into 114,098 unigenes with an average sequence length of 695 base pairs. Two transcriptome databases were integrated and assembled into a total of 145,791 unigenes. Among them, 17,281 unigenes were assigned to 126 KEGG pathways while 135 unigenes were involved in fructose and mannose metabolism. Gene Ontology analysis revealed that the majority of genes were associated with metabolic and cellular processes. Furthermore, 430 glycosyltransferase and 89 cellulose synthase genes were identified. Comparative analysis of both transcriptome databases revealed a total of 32,794 differentially expressed genes (DEGs), including 22,051 up-regulated and 10,743 down-regulated genes in Dendrobium-2 compared to Dendrobium-1. Furthermore, a total of 1142 and 7918 unigenes showed unique expression in Dendrobium-1 and Dendrobium-2, respectively. These DEGs were mainly correlated with metabolic pathways and the biosynthesis of secondary metabolites. In addition, 170 DEGs belonged to glycosyltransferase genes, 37 DEGs were related to cellulose synthase genes and 627 DEGs encoded transcription factors. This study substantially expands the transcriptome information for D. officinale and provides valuable clues for identifying candidate genes involved in polysaccharide biosynthesis and elucidating the mechanism of polysaccharide biosynthesis. PMID:26904032
Characterization of DTI Indices in the Cervical, Thoracic, and Lumbar Spinal Cord in Healthy Humans

PubMed Central

Bosma, Rachael L.; Stroman, Patrick W.

2012-01-01

The aim of this study was to characterize in vivo measurements of diffusion along the length of the entire healthy spinal cord and to compare DTI indices, including fractional anisotropy (FA) and mean diffusivity (MD), between cord regions. The objective is to determine whether or not there are significant differences in DTI indices along the cord that must be considered for future applications of characterizing the effects of injury or disease. A cardiac gated, single-shot EPI sequence was used to acquire diffusion-weighted images of the cervical, thoracic, and lumbar regions of the spinal cord in nine neurologically intact subjects (19 to 22 years). For each cord section, FA versus MD values were plotted, and a k-means clustering method was applied to partition the data according to tissue properties. FA and MD values from both white matter (average FA = 0.69, average MD = 0.93 × 10⁻³ mm²/s) and grey matter (average FA = 0.44, average MD = 1.8 × 10⁻³ mm²/s) were relatively consistent along the length of the cord. PMID:22295179
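
As a rough illustration of the clustering step, the sketch below runs plain Lloyd's k-means on (FA, MD) pairs. The voxel values are made up and merely scattered around the averages reported above; the farthest-point initialisation and the function name are assumptions, not the authors' procedure.

```python
def kmeans(points, k, iters=100):
    """Lloyd's algorithm on 2-D points with farthest-point initialisation."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Deterministic start: the first point, then repeatedly the point that is
    # farthest from every centre chosen so far (an illustrative choice).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(d2(p, c) for c in centres)))

    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: d2(p, centres[j])) for p in points]
        if new_labels == labels:          # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                   # keep the old centre if a cluster empties
                centres[j] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centres, labels

# Made-up voxels as (FA, MD in units of 1e-3 mm^2/s).
voxels = [(0.70, 0.90), (0.68, 0.95), (0.69, 0.94),   # white-matter-like
          (0.45, 1.75), (0.43, 1.85), (0.44, 1.80)]   # grey-matter-like
centres, labels = kmeans(voxels, k=2)
```

Because FA and MD are on different numerical scales, a real analysis would probably standardise both axes before clustering; that step is left out here to keep the sketch short.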
Structure-related statistical singularities along protein sequences: a correlation study.

PubMed

Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro

2005-01-01

A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledgebase was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) protein sequences embed specific information that is present neither in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses

PubMed Central

Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A.; Janke, Axel

2015-01-01

The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identified 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. PMID:26019166
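
The average read depth approach can be sketched as a comparison of per-scaffold depth between the female and the male sample. The version below is only one plausible reading of the abstract: the median normalisation, the female:male ratio, the cut-off values, and the function name are all assumptions made for illustration.

```python
from statistics import median

def classify_scaffolds(male_depth, female_depth, y_max=0.3, x_min=1.5):
    """Label scaffolds from the female:male ratio of normalised read depth.

    Expected ratios: autosomes ~1, X ~2 (two copies in females, one in males),
    Y ~0 (absent in females). y_max and x_min are illustrative cut-offs.
    Assumes the median depth of each sample is positive.
    """
    m_norm = median(male_depth.values())
    f_norm = median(female_depth.values())
    labels = {}
    for name, m in male_depth.items():
        if m == 0:                        # no male reads: nothing to compare
            labels[name] = "unmapped"
            continue
        ratio = (female_depth.get(name, 0.0) / f_norm) / (m / m_norm)
        if ratio <= y_max:
            labels[name] = "Y-candidate"
        elif ratio >= x_min:
            labels[name] = "X-candidate"
        else:
            labels[name] = "autosomal"
    return labels
```

Candidates flagged this way would still need independent confirmation, which the study obtained through male-specific in vitro amplification.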
Enriching Genomic Resources and Marker Development from Transcript Sequences of Jatropha curcas for Microgravity Studies

PubMed Central

Tian, Wenlan; Paudel, Dev

2017-01-01

Jatropha (Jatropha curcas L.) is an economically important species with great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the transcriptome of jatropha and developed SSR and SNP markers from the transcriptome sequences. In total, 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembly and clustering resulted in 115,611 uniquely assembled sequences (UASs), including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole set of UASs was annotated: 59,903 (51.81%) were assigned gene ontology (GO) terms, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database; the set also contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered, with AG/CT as the most frequent (45.8%) SSR motif type. Further, 38,693 SNPs were detected and 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822
Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing

PubMed Central

Vembar, Shruthi Sridhar; Seetin, Matthew; Lambert, Christine; Nattestad, Maria; Schatz, Michael C.; Baybayan, Primo; Scherf, Artur; Smith, Melissa Laird

2016-01-01

The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (Ex: var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90–99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission. PMID:27345719
Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer

DOE PAGES

Zhang, Qian; Jun, Se-Ran; Leuze, Michael; ...

2017-01-19

The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
Complete chloroplast genome sequence of a major allogamous forage species, perennial ryegrass (Lolium perenne L.).

PubMed

Diekmann, Kerstin; Hodkinson, Trevor R; Wolfe, Kenneth H; van den Bekerom, Rob; Dix, Philip J; Barth, Susanne

2009-06-01

Lolium perenne L. (perennial ryegrass) is globally one of the most important forage and grassland crops. We sequenced the chloroplast (cp) genome of Lolium perenne cultivar Cashel. The L. perenne cp genome is 135 282 bp with a typical quadripartite structure. It contains genes for 76 unique proteins, 30 tRNAs and four rRNAs. As in other grasses, the genes accD, ycf1 and ycf2 are absent. The genome is of average size within its subfamily Pooideae and of medium size within the Poaceae. Genome size differences are mainly due to length variations in non-coding regions. However, considerable length differences of 1-27 codons in comparison of L. perenne to other Poaceae and 1-68 codons among all Poaceae were also detected. Within the cp genome of this outcrossing cultivar, 10 insertion/deletion polymorphisms and 40 single nucleotide polymorphisms were detected. Two of the polymorphisms involve tiny inversions within hairpin structures. By comparing the genome sequence with RT-PCR products of transcripts for 33 genes, 31 mRNA editing sites were identified, five of them unique to Lolium. The cp genome sequence of L. perenne is available under Accession number AM777385 at the European Molecular Biology Laboratory, National Center for Biotechnology Information and DNA DataBank of Japan.
Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer

PubMed Central

Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat

2017-01-01

The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses. PMID:28102365
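
To make the ingredients of the k-selection strategy concrete, the sketch below counts k-mers, computes a Shannon diversity index over the k-mer frequencies, and counts the k-mers two sequences share. It illustrates the quantities named in the abstract only; the paper's exact definitions (for example of cumulative relative entropy) are not given here, and the function names are hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over observed k-mer frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def common_features(seq_a, seq_b, k):
    """Number of distinct k-mers present in both sequences."""
    return len(kmer_counts(seq_a, k).keys() & kmer_counts(seq_b, k).keys())
```

Scanning k over a range and watching how these quantities change is one simple way to see the tension the abstract describes: short k-mers are shared by almost all genomes and carry little signal, while very long k-mers are shared by almost none.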
De novo assembly of the transcriptome of Aegiceras corniculatum, a mangrove species in the Indo-West Pacific region.

PubMed

Fang, Lu; Yang, Yuchen; Guo, Wuxia; Li, Jianfang; Zhong, Cairong; Huang, Yelin; Zhou, Renchao; Shi, Suhua

2016-08-01

Aegiceras corniculatum (L.) Blanco is one of the most salt tolerant mangrove species and can thrive in 3% salinity at the seaward edge of mangrove forests. Here we sequenced the transcriptome of A. corniculatum using the Illumina GA platform to develop genomic resources for ecological and evolutionary studies. We obtained about 50 million high-quality paired-end reads of 75 bp in length. Using the short read assembler Velvet, we obtained 49,437 contigs with an average length of 625 bp. A total of 32,744 (66.23%) contigs showed significant similarity to the GenBank non-redundant (NR) protein database. Of these sequences, 30,911 and 18,004 were assigned to Gene Ontology and eukaryotic orthologous groups of proteins (KOG), respectively. A total of 4942 transcripts from our assemblies had significant similarity with KEGG Orthologs and were involved in 144 KEGG pathways, while 9899 unigenes had enzyme commission (EC) numbers. In addition, 9792 transcriptome-derived SSRs were identified from 7342 sequences. With our strict criteria, 4165 candidate SNPs were also identified from 2058 contigs. Some of these SNPs were further validated by Sanger sequencing. Genomic resources generated in this study should be valuable in ecological, evolutionary, and functional genomics studies for this mangrove species. Copyright © 2016 Elsevier B.V. All rights reserved.
Comparative Analysis of Vertebrate Dystrophin Loci Indicate Intron Gigantism as a Common Feature

PubMed Central

Pozzoli, Uberto; Elgar, Greg; Cagliani, Rachele; Riva, Laura; Comi, Giacomo P.; Bresolin, Nereo; Bardoni, Alessandra; Sironi, Manuela

2003-01-01

The human DMD gene is the largest known to date, spanning > 2000 kb on the X chromosome. The gene size is mainly accounted for by huge intronic regions. We sequenced 190 kb of Fugu rubripes (pufferfish) genomic DNA corresponding to the complete dystrophin gene (FrDMD) and provide the first report of gene structure and sequence comparison among dystrophin genomic sequences from different vertebrate organisms. Almost all intron positions and phases are conserved between FrDMD and its mammalian counterparts, and the predicted protein product of the Fugu gene displays 55% identity and 71% similarity to human dystrophin. In analogy to the human gene, FrDMD presents several-fold longer than average intronic regions. Analysis of intron sequences of the human and murine genes revealed that they are extremely conserved in size and that a similar fraction of total intron length is represented by repetitive elements; moreover, our data indicate that intron expansion through repeat accumulation in the two orthologs is the result of independent insertional events. The hypothesis that intron length might be functionally relevant to the DMD gene regulation is proposed and substantiated by the finding that dystrophin intron gigantism is common to the three vertebrate genes. [Supplemental material is available online at www.genome.org.] PMID:12727896
A Deep-Coverage Tomato BAC Library and Prospects Toward Development of an STC Framework for Genome Sequencing

PubMed Central

Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.

2000-01-01

Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10⁻⁶, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957
Predictors of Failure in Infant Mandibular Distraction Osteogenesis.

PubMed

Hammoudeh, Jeffrey A; Fahradyan, Artur; Brady, Colin; Tsuha, Michaela; Azadgoli, Beina; Ward, Sally; Urata, Mark M

2018-03-15

Mandibular distraction osteogenesis (MDO) has been shown to be successful in treating upper airway obstruction caused by micrognathia in pediatric patients. The purpose of this study was to assess the success rate of MDO and possible predictors of failure. The records of all neonates and infants who underwent MDO from 2008 to 2015 were retrospectively reviewed. Procedural failure was defined as patient death or the need for tracheostomy postoperatively. Details of distraction, length of stay, and failures were captured and elucidated. Of the 82 patients, 47 (57.3%) were male; 46 (56.1%) had sporadic Pierre Robin sequence; 33 (40.3%) had syndromic Pierre Robin sequence; and 3 (3.7%) had micrognathia, not otherwise specified. The average distraction length was 27.5 mm (range, 15 to 30 mm; SD, 4.4 mm), the average age at operation was 63.3 days (range, 3 to 342 days; SD, 71.4 days), and the average length of post-MDO hospital stay was 43 days (range, 9 to 219 days; SD, 35 days) with an average follow-up period of 4.3 years (range, 1.1 to 9.6 years; SD, 2.6 years). There were 7 failures (8.5%) (5 tracheostomies and 2 deaths) resulting in a 91.5% success rate. Regression analysis showed that the predicted probability of the need for tracheostomy was 45% (P = .02) when the patient had a central nervous system (CNS) anomaly. The predicted probability of the need for tracheostomy and death combined was 99.6% when the patient had laryngomalacia and a CNS anomaly and was preoperatively intubated (P < .05). This review confirms that MDO is an effective method of treating the upper airway obstruction caused by micrognathia with a high success rate. In our sample the presence of CNS abnormalities, laryngomalacia, and preoperative intubation had a significant impact on the failure rate. Copyright © 2018 American Association of Oral and Maxillofacial Surgeons. Published by Elsevier Inc. All rights reserved.
Vibration transfer mobility measurements using maximum length sequences

NASA Astrophysics Data System (ADS)

Singleton, Herbert L.

2005-09-01

Vibration transfer mobility measurements are required under Federal Transit Administration guidelines when developing detailed predictions of ground-borne vibration for rail transit systems. These measurements typically use a large instrumented hammer to generate impulses in the soil. These impulses are measured by an array of accelerometers to characterize the transfer mobility of the ground in a localized area. While effective, these measurements often make use of heavy, custom-engineered equipment to produce the impulse signal. To obtain satisfactory signal-to-noise ratios, it is necessary to generate multiple impulses to generate an average value, but this process involves considerable physical labor in the field. To address these shortcomings, a transfer mobility measurement system utilizing a tactile transducer and maximum length sequences (MLS) was developed. This system uses lightweight off-the-shelf components to significantly reduce the weight and cost of the system. The use of MLS allows for adequate signal-to-noise ratio from the tactile transducer, while minimizing the length of the measurement. Tests of the MLS system show good agreement with the impulse-based method. The combination of the cost savings and reduced weight of this new system facilitates transfer mobility measurements that are less physically demanding, and more economical when compared with current methods.
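As a rough illustration of the signal named in this abstract (not the author's measurement system), a maximum length sequence can be generated with a linear-feedback shift register. The sketch below assumes a 4-bit register with feedback taps at positions 4 and 3, chosen here purely for illustration; the function names `mls` and `circular_autocorrelation` and their parameters are hypothetical. The check at the end shows the property that makes MLS attractive as an excitation signal: its circular autocorrelation is a single peak with a flat, small floor, so cross-correlating the response with the sequence approximates an impulse measurement.

```python
def mls(nbits, taps, seed=1):
    """Generate one period (2**nbits - 1 samples) of a binary MLS.

    taps are 1-based register positions of a primitive feedback
    polynomial, e.g. (4, 3) for x^4 + x^3 + 1 (illustrative choice).
    """
    if seed == 0:
        raise ValueError("seed must be nonzero")
    state = seed
    out = []
    for _ in range(2**nbits - 1):
        out.append(state & 1)          # emit the lowest register bit
        feedback = 0
        for t in taps:                 # XOR the tapped bits together
            feedback ^= (state >> (nbits - t)) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
    return out


def circular_autocorrelation(bits):
    """Circular autocorrelation of the sequence mapped to +1/-1."""
    x = [1 if b else -1 for b in bits]
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]


sequence = mls(4, (4, 3))
acf = circular_autocorrelation(sequence)
print(len(sequence), acf[0], set(acf[1:]))
```

A real measurement would use a much longer register so that one period exceeds the decay time of the ground response; the register length here is kept small only so the result is easy to inspect.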

Genome structure of Rosa multiflora, a wild ancestor of cultivated roses

PubMed Central

Nakamura, Noriko; Hirakawa, Hideki; Sato, Shusei; Otagaki, Shungo; Matsumoto, Shogo; Tabata, Satoshi; Tanaka, Yoshikazu

2018-01-01

The draft genome sequence of a wild rose (Rosa multiflora Thunb.) was determined using Illumina MiSeq and HiSeq platforms. The total length of the scaffolds was 739,637,845 bp, consisting of 83,189 scaffolds, which was close to the 711 Mbp length estimated by k-mer analysis. N50 length of the scaffolds was 90,830 bp, and extent of the longest was 1,133,259 bp. The average GC content of the scaffolds was 38.9%. After gene prediction, 67,380 candidates exhibiting sequence homology to known genes and domains were extracted, which included complete and partial gene structures. This large number of genes for a diploid plant may reflect heterogeneity of the genome originating from self-incompatibility in R. multiflora. According to CEGMA analysis, 91.9% and 98.0% of the core eukaryotic genes were completely and partially conserved in the scaffolds, respectively. Genes presumably involved in flower color, scent and flowering are assigned. The results of this study will serve as a valuable resource for fundamental and applied research in the rose, including breeding and phylogenetic study of cultivated roses. PMID:29045613
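For readers unfamiliar with the N50 statistic reported in this abstract, the following is a minimal sketch of how it is commonly computed from a list of scaffold lengths. The function name `n50` and the toy lengths are hypothetical and are not taken from the study; only the total length and scaffold count used for the mean come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy data, invented for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))

# Mean scaffold length implied by the figures quoted in the abstract.
mean_scaffold_bp = 739_637_845 / 83_189
print(round(mean_scaffold_bp))
```

The gap between the mean scaffold length and the much larger N50 reported above is typical of draft assemblies, where many short scaffolds pull the mean down while most bases sit in a smaller number of long scaffolds.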
Regional grassland productivity responses to precipitation during multiyear above- and below-average rainfall periods.

PubMed

Petrie, Matthew D; Peters, Debra P C; Yao, Jin; Blair, John M; Burruss, Nathan D; Collins, Scott L; Derner, Justin D; Gherardi, Laureano A; Hendrickson, John R; Sala, Osvaldo E; Starks, Patrick J; Steiner, Jean L

2018-05-01

There is considerable uncertainty in the magnitude and direction of changes in precipitation associated with climate change, and ecosystem responses are also uncertain. Multiyear periods of above- and below-average rainfall may foretell consequences of changes in rainfall regime. We compiled long-term aboveground net primary productivity (ANPP) and precipitation (PPT) data for eight North American grasslands, and quantified relationships between ANPP and PPT at each site, and in 1-3 year periods of above- and below-average rainfall for mesic, semiarid cool, and semiarid warm grassland types. Our objective was to improve understanding of ANPP dynamics associated with changing climatic conditions by contrasting PPT-ANPP relationships in above- and below-average PPT years to those that occurred during sequences of multiple above- and below-average years. We found differences in PPT-ANPP relationships in above- and below-average years compared to long-term site averages, and variation in ANPP not explained by PPT totals that likely are attributed to legacy effects. The correlation between ANPP and current- and prior-year conditions changed from year to year throughout multiyear periods, with some legacy effects declining, and new responses emerging. Thus, ANPP in a given year was influenced by sequences of conditions that varied across grassland types and climates. Most importantly, the influence of prior-year ANPP often increased with the length of multiyear periods, whereas the influence of the amount of current-year PPT declined. Although the mechanisms by which a directional change in the frequency of above- and below-average years imposes a persistent change in grassland ANPP require further investigation, our results emphasize the importance of legacy effects on productivity for sequences of above- vs. below-average years, and illustrate the utility of long-term data to examine these patterns. © 2018 John Wiley & Sons Ltd.
Characterization of the Antarctic sea urchin (Sterechinus neumayeri) transcriptome and mitogenome: a molecular resource for phylogenetics, ecophysiology and global change biology.

PubMed

Dilly, G F; Gaitán-Espitia, J D; Hofmann, G E

2015-03-01

This is the first de novo transcriptome and complete mitochondrial genome of an Antarctic sea urchin species sequenced to date. Sterechinus neumayeri is an Antarctic sea urchin and a model species for ecology, development, physiology and global change biology. To identify transcripts important to ocean acidification (OA) and thermal stress, this transcriptome was created by pooling 13 larval samples representing developmental stages on day 11 (late gastrula), 19 (early pluteus) and 30 (mid pluteus) maintained at three CO2 levels (421, 652, and 1071 μatm) as well as four additional heat-shocked samples. The normalized cDNA pool was sequenced using emulsion PCR (pyrosequencing) resulting in 1.34M reads with an average read length of 492 base pairs. 40,994 isotigs were identified, averaging 1188 bp with a median coverage of 11×. Additional primer design and gap sequencing were required to complete the mitochondrial genome. The mitogenome of S. neumayeri is a circular DNA molecule with a length of 15,684 bp that contains all 37 genes normally found in metazoans. We detail the main features of the transcriptome and the mitogenome architecture and investigate the phylogenetic relationships of S. neumayeri within Echinoidea. In addition, we provide comparative analyses of S. neumayeri with its closest relative, Strongylocentrotus purpuratus, including a list of potential OA gene targets. The resources described here will support a variety of quantitative (genomic, proteomic, multistress and comparative) studies to interrogate physiological responses to OA and other stressors in this important Antarctic calcifier. © 2014 John Wiley & Sons Ltd.
Transcriptome analysis of sika deer in China.

PubMed

Jia, Bo-Yin; Ba, Heng-Xing; Wang, Gui-Wu; Yang, Ying; Cui, Xue-Zhe; Peng, Ying-Hua; Zheng, Jun-Jun; Xing, Xiu-Mei; Yang, Fu-He

2016-10-01

Sika deer is of great commercial value because their antlers are used in tonics and alternative medicine and their meat is healthy and delicious. The goal of this study was to generate transcript sequences from sika deer for functional genomic analyses and to identify the transcripts that demonstrate tissue-specific, age-dependent differential expression patterns. These sequences could enhance our understanding of the molecular mechanisms underlying sika deer growth and development. In the present study, we performed de novo transcriptome assembly and profiling analysis across ten tissue types and four developmental stages (juvenile, adolescent, adult, and aged) of sika deer, using Illumina paired-end tag (PET) sequencing technology. A total of 1,752,253 contigs with an average length of 799 bp were generated, from which 1,348,618 unigenes with an average length of 590 bp were defined. Approximately 33.2 % of these (447,931 unigenes) were then annotated in public protein databases. Many sika deer tissue-specific, age-dependent unigenes were identified. The testes have the largest number of tissue-enriched unigenes, and some of them were prone to develop new functions for other tissues. Additionally, our transcriptome revealed that the juvenile-adolescent transition was the most complex and important stage of the sika deer life cycle. The present work represents the first multiple tissue transcriptome analysis of sika deer across four developmental stages. The generated data not only provide a functional genomics resource for future biological research on sika deer but also guide the selection and manipulation of genes controlling growth and development.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.

This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

DOE PAGES

Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...

2017-07-18

This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

PubMed Central

Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.

2017-01-01

This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
Mitochondrial DNA typing from human axillary, pubic and head hair shafts - success rates and sequence comparisons.

PubMed

Pfeiffer, H; Hühne, J; Ortmann, C; Waterkamp, K; Brinkmann, B

1999-01-01

The analysis of mitochondrial DNA (mtDNA) from shed hairs has gained high importance in forensic casework since telogen hairs are one of the most common types of evidence left at the crime scene. In this systematic study of hair shafts from 20 individuals, the correlation of mtDNA recovery with hair morphology (length, diameter, volume, colour), with sex, and with body localisation (head, armpit, pubis) was investigated. The highest average success rate of hypervariable region 1 (HV 1) sequencing was found in head hair shafts (75%) followed by pubic (66%) and axillary hair shafts (52%). No statistically significant correlation between morphological parameters or sex and the success rate of sequencing was found. MtDNA sequences of buccal cells, head, pubic and axillary hair shafts did not show intraindividual differences. Heteroplasmic base positions were observed neither in the hair shafts nor in control samples of buccal cells.
Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses.

PubMed

Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A; Janke, Axel

2015-05-27

The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) Similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identify 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
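The second in silico approach described in this abstract, comparing average read depth between a male and a female sample, can be sketched as follows. This is a simplified illustration and not the authors' pipeline: the function name `y_candidates`, the thresholds `max_ratio` and `min_male_depth`, and the depth values are all hypothetical, and the sketch assumes depths have already been normalized for sequencing effort. The underlying expectation is that a Y-linked scaffold has close to zero depth in the female, so its female-to-male depth ratio is near 0, whereas autosomal scaffolds sit near 1 and X-linked scaffolds near 2.

```python
def y_candidates(male_depth, female_depth, max_ratio=0.3, min_male_depth=1.0):
    """Return scaffold names whose female/male average-depth ratio is near zero.

    male_depth, female_depth: dicts mapping scaffold name -> average read depth.
    max_ratio and min_male_depth are illustrative thresholds (assumptions).
    """
    candidates = []
    for name, m in male_depth.items():
        if m < min_male_depth:
            continue  # too little male coverage to judge this scaffold
        f = female_depth.get(name, 0.0)
        if f / m <= max_ratio:
            candidates.append(name)
    return sorted(candidates)


# Toy depths, invented for illustration only.
male = {"scaf_auto": 30.0, "scaf_x": 15.0, "scaf_y": 14.0, "scaf_low": 0.2}
female = {"scaf_auto": 31.0, "scaf_x": 30.0, "scaf_y": 0.5, "scaf_low": 0.0}
print(y_candidates(male, female))
```

In practice candidates from a depth screen would still need independent confirmation, which the abstract reports doing by male-specific in vitro amplification.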
Length-independent structural similarities enrich the antibody CDR canonical class model.

PubMed

Nowak, Jaroslaw; Baker, Terry; Georges, Guy; Kelm, Sebastian; Klostermann, Stefan; Shi, Jiye; Sridharan, Sudharsan; Deane, Charlotte M

2016-01-01

Complementarity-determining regions (CDRs) are antibody loops that make up the antigen binding site. Here, we show that all CDR types have structurally similar loops of different lengths. Based on these findings, we created length-independent canonical classes for the non-H3 CDRs. Our length variable structural clusters show strong sequence patterns suggesting either that they evolved from the same original structure or result from some form of convergence. We find that our length-independent method not only clusters a larger number of CDRs, but also predicts canonical class from sequence better than the standard length-dependent approach. To demonstrate the usefulness of our findings, we predicted cluster membership of CDR-L3 sequences from 3 next-generation sequencing datasets of the antibody repertoire (over 1,000,000 sequences). Using the length-independent clusters, we can structurally classify an additional 135,000 sequences, which represents a ∼20% improvement over the standard approach. This suggests that our length-independent canonical classes might be a highly prevalent feature of antibody space, and could substantially improve our ability to accurately predict the structure of novel CDRs identified by next-generation sequencing.
Genome features of moderately halophilic polyhydroxyalkanoate-producing Yangia sp. CCB-MM3.

PubMed

Lau, Nyok-Sean; Sam, Ka-Kei; Amirul, Abdullah Al-Ashraf

2017-01-01

Yangia sp. CCB-MM3 was one of several halophilic bacteria isolated from soil sediment in the estuarine Matang Mangrove, Malaysia. So far, no member from the genus Yangia, a member of the Rhodobacteraceae family, has been reported sequenced. In the current study, we present the first complete genome sequence of Yangia sp. strain CCB-MM3. The genome includes two chromosomes and five plasmids with a total length of 5,522,061 bp and an average GC content of 65%. Since a different strain of Yangia sp. (ND199) was reported to produce a polyhydroxyalkanoate copolymer, the ability for this production was tested in vitro and confirmed for strain CCB-MM3. Analysis of its genome sequence confirmed presence of a pathway for production of propionyl-CoA and gene cluster for PHA production in the sequenced strain. The genome sequence described will be a useful resource for understanding the physiology and metabolic potential of Yangia as well as for comparative genomic analysis with other Rhodobacteraceae.
Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

PubMed

Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

2013-01-01

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
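To make the notion of read depth in this abstract concrete, the sketch below applies the usual relation depth = (number of reads × read length) / genome size to the three genome sizes quoted above (4.6, 11.18 and 100 megabases). The read length of 100 bp is an assumption made here for illustration, since the abstract does not state it, and the function name `reads_needed` is hypothetical.

```python
def reads_needed(genome_size_bp, depth, read_length_bp):
    """Smallest read count that gives at least `depth`-fold average coverage."""
    total_bases = genome_size_bp * depth
    return -(-total_bases // read_length_bp)  # ceiling division on integers


# Genome sizes as quoted in the abstract, in base pairs.
GENOMES_BP = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}
READ_LENGTH_BP = 100  # assumption: not given in the abstract

for name, size in GENOMES_BP.items():
    print(name, reads_needed(size, 50, READ_LENGTH_BP))
```

Because the required read count scales linearly with genome size at a fixed depth, a calculation like this is one way to estimate how many small genomes could share a sequencing run, which is the multiplexing use the authors mention.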
Association between sequence variants in panicle development genes and the number of spikelets per panicle in rice.

PubMed

Jang, Su; Lee, Yunjoo; Lee, Gileung; Seo, Jeonghwan; Lee, Dongryung; Yu, Yoye; Chin, Joong Hyoun; Koh, Hee-Jong

2018-01-15

Balancing panicle-related traits, such as panicle length and the numbers of primary and secondary branches per panicle, is key to improving the number of spikelets per panicle in rice. Identifying genetic information contributes to a broader understanding of the roles of genes and provides candidate alleles for use as DNA markers. Discovering relations between panicle-related traits and sequence variants creates opportunities for molecular application in rice breeding to improve the number of spikelets per panicle. In total, 142 polymorphic sites, which constructed 58 haplotypes, were detected in coding regions of ten panicle development genes, and 35 sequence variants in six genes were significantly associated with panicle-related traits. Rice cultivars were clustered according to their sequence variant profiles. One of the four resultant clusters, which contained only indica and tong-il varieties, exhibited the largest average number of favorable alleles and highest average number of spikelets per panicle, suggesting that the favorable allele combination found in this cluster was beneficial in increasing the number of spikelets per panicle. Favorable alleles identified in this study can be used to develop functional markers for rice breeding programs. Furthermore, stacking several favorable alleles has the potential to substantially improve the number of spikelets per panicle in rice.
The effects of rest interval length manipulation of the first upper-body resistance exercise in sequence on acute performance of subsequent exercises in men and women.

PubMed

Ratamess, Nicholas A; Chiarello, Christina M; Sacco, Anthony J; Hoffman, Jay R; Faigenbaum, Avery D; Ross, Ryan E; Kang, Jie

2012-11-01

The purpose of the present study was to investigate the effects of manipulating rest interval (RI) length of the first upper-body exercise in sequence on subsequent resistance exercise performance. Twenty-two men and women with at least 1 year of resistance training experience performed resistance exercise protocols on 3 occasions in random order. Each protocol consisted of performing 4 barbell upper-body exercises in the same sequence (bench press, incline bench press, shoulder press, and bent-over row) for 3 sets of up to 10 repetitions with 75% of 1 repetition maximum. Bench press RIs were 1, 2, or 3 minutes, whereas other exercises were performed with a standard 2-minute rest interval. The number of repetitions completed, average power, and velocity for each set of each exercise were recorded. Gender differences were observed during the bench press and incline press as women performed significantly (p ≤ 0.05) more repetitions than men during all RIs. The magnitude of decline in velocity and power over 3 sets of the bench press and incline press was significantly higher in men than women. Manipulation of RI length during the bench press did not affect performance of the remaining exercises in men. However, significantly more repetitions were performed by women during the first set of the incline press using 3-minute rest interval than 1-minute rest interval. In men and women, performance of the incline press and shoulder press was compromised compared with baseline performances. Manipulation of RI length of the first exercise affected performance of only the first set of 1 subsequent exercise in women. All RIs led to comparable levels of fatigue in men, indicating that reductions in load are necessary for subsequent exercises performed in sequence that stress similar agonist muscle groups when 10 repetitions are desired.
Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance

PubMed Central

2011-01-01

Background Until recently, read lengths on the Solexa/Illumina system were too short to reliably assemble transcriptomes without a reference sequence, especially for non-model organisms. However, with read lengths up to 100 nucleotides available in the current version, an assembly without reference genome should be possible. For this study we created an EST data set for the common pond snail Radix balthica by Illumina sequencing of a normalized transcriptome. Performance of three different short read assemblers was compared with respect to: the number of contigs, their length, depth of coverage, their quality in various BLAST searches and the alignment to mitochondrial genes. Results A single sequencing run of a normalized RNA pool resulted in 16,923,850 paired end reads with median read length of 61 bases. The assemblies generated by VELVET, OASES, and SeqMan NGEN differed in the total number of contigs, contig length, the number and quality of gene hits obtained by BLAST searches against various databases, and contig performance in the mt genome comparison. While VELVET produced the highest overall number of contigs, a large fraction of these were of small size (< 200bp), and gave redundant hits in BLAST searches and the mt genome alignment. The best overall contig performance resulted from the NGEN assembly. It produced the second largest number of contigs, which on average were comparable to the OASES contigs but gave the highest number of gene hits in two out of four BLAST searches against different reference databases. A subsequent meta-assembly of the four contig sets resulted in larger contigs, less redundancy and a higher number of BLAST hits. Conclusion Our results document the first de novo transcriptome assembly of a non-model species using Illumina sequencing data. We show that de novo transcriptome assembly using this approach yields results useful for downstream applications, in particular if a meta-assembly of contig sets is used to increase contig quality. These results highlight the ongoing need for improvements in assembly methodology. PMID:21679424
Development of 5123 Intron-Length Polymorphic Markers for Large-Scale Genotyping Applications in Foxtail Millet

PubMed Central

Muthamilarasan, Mehanathan; Venkata Suresh, B.; Pandey, Garima; Kumari, Kajal; Parida, Swarup Kumar; Prasad, Manoj

2014-01-01

Generating genomic resources in terms of molecular markers is imperative in molecular breeding for crop improvement. Though development and application of microsatellite markers in large-scale was reported in the model crop foxtail millet, no such large-scale study was conducted for intron-length polymorphic (ILP) markers. Considering this, we developed 5123 ILP markers, of which 4049 were physically mapped onto 9 chromosomes of foxtail millet. BLAST analysis of 5123 expressed sequence tags (ESTs) suggested the function for ∼71.5% ESTs and grouped them into 5 different functional categories. About 440 selected primer pairs representing the foxtail millet genome and the different functional groups showed high-level of cross-genera amplification at an average of ∼85% in eight millets and five non-millet species. The efficacy of the ILP markers for distinguishing the foxtail millet is demonstrated by observed heterozygosity (0.20) and Nei's average gene diversity (0.22). In silico comparative mapping of physically mapped ILP markers demonstrated substantial percentage of sequence-based orthology and syntenic relationship between foxtail millet chromosomes and sorghum (∼50%), maize (∼46%), rice (∼21%) and Brachypodium (∼21%) chromosomes. Hence, for the first time, we developed large-scale ILP markers in foxtail millet and demonstrated their utility in germplasm characterization, transferability, phylogenetics and comparative mapping studies in millets and bioenergy grass species. PMID:24086082
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

PubMed

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

2015-05-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
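The sliding window analysis mentioned in this abstract partitions a genome alignment into overlapping fixed-width segments that are then analyzed separately. A minimal sketch of generating such window coordinates is given below. The abstract reports the window widths (1,000 bp and 2,000 bp) and the window counts but not the step size, so the `step` parameter, the function name `sliding_windows`, and the toy values are assumptions made here for illustration.

```python
def sliding_windows(sequence_length, window, step):
    """Return half-open (start, end) coordinates of all full-width windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = sequence_length - window
    return [(start, start + window) for start in range(0, last_start + 1, step)]


# Toy example: a 10-position sequence, width 4, step 2.
print(sliding_windows(10, 4, 2))
```

Each returned pair could be used to slice every sequence in an alignment before building a tree for that window, which is how window width can be related to the extent of clustering.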
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2015-01-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Telomere Length, Current Perceived Stress, and Urinary Stress Hormones in Women

PubMed Central

Parks, Christine G.; Miller, Diane B.; McCanlies, Erin C.; Cawthon, Richard M.; Andrew, Michael E.; DeRoo, Lisa A.; Sandler, Dale P.

2009-01-01

Telomeres are repetitive DNA sequences that cap and protect the ends of chromosomes; critically short telomeres may lead to cellular senescence or carcinogenic transformation. Previous findings suggest a link between psychosocial stress, shorter telomeres, and chronic disease risk. This cross-sectional study examined relative telomere length in relation to perceived stress and urinary stress hormones in a sample of participants (n = 647) in the National Institute of Environmental Health Sciences Sister Study, a cohort of women ages 35 to 74 years who have a sister with breast cancer. Average leukocyte telomere length was determined by quantitative PCR. Current stress was assessed using the Perceived Stress Scale and creatinine-adjusted neuroendocrine hormones in first morning urines. Linear regression models estimated differences in telomere length base pairs (bp) associated with stress measures adjusted for age, race, smoking, and obesity. Women with higher perceived stress had somewhat shorter telomeres [adjusted difference of −129bp for being at or above moderate stress levels; 95% confidence interval (CI), −292 to 33], but telomere length did not decrease monotonically with higher stress levels. Shorter telomeres were independently associated with increasing age (−27bp/year), obesity, and current smoking. Significant stress-related differences in telomere length were seen in women ages 55 years and older (−289bp; 95% CI, −519 to −59), those with recent major losses (−420bp; 95% CI, −814 to −27), and those with above-average urinary catecholamines (e.g., epinephrine: −484bp; 95% CI, −709 to −259). Although current perceived stress was only modestly associated with shorter telomeres in this broad sample of women, our findings suggest the effect of stress on telomere length may vary depending on neuroendocrine responsiveness, external stressors, and age. PMID:19190150
Standardized quantitative measurements of wrist cartilage in healthy humans using 3T magnetic resonance imaging

PubMed Central

Zink, Jean-Vincent; Souteyrand, Philippe; Guis, Sandrine; Chagnaud, Christophe; Fur, Yann Le; Militianu, Daniela; Mattei, Jean-Pierre; Rozenbaum, Michael; Rosner, Itzhak; Guye, Maxime; Bernard, Monique; Bendahan, David

2015-01-01

AIM: To quantify the wrist cartilage cross-sectional area in humans from a 3D magnetic resonance imaging (MRI) dataset and to assess the corresponding reproducibility. METHODS: The study was conducted in 14 healthy volunteers (6 females and 8 males) between 30 and 58 years old and devoid of articular pain. Subjects were asked to lie down in the supine position with the right hand positioned above the pelvic region on top of a home-built rigid platform attached to the scanner bed. The wrist was wrapped with a flexible surface coil. MRI investigations were performed at 3T (Verio-Siemens) using volume interpolated breath hold examination (VIBE) and dual echo steady state (DESS) MRI sequences. Cartilage cross-sectional area (CSA) was measured on a slice of interest selected from a 3D dataset of the entire carpus and metacarpal-phalangeal areas on the basis of anatomical criteria using conventional image processing radiology software. Cartilage cross-sectional areas between opposite bones in the carpal region were manually selected and quantified using a thresholding method. RESULTS: Cartilage CSA measurements performed on a selected predefined slice were 292.4 ± 39 mm² using the VIBE sequence and slightly lower, 270.4 ± 50.6 mm², with the DESS sequence. The inter (14.1%) and intra (2.4%) subject variability was similar for both MRI methods. The coefficients of variation computed for the repeated measurements were also comparable for the VIBE (2.4%) and the DESS (4.8%) sequences. The carpus length averaged over the group was 37.5 ± 2.8 mm with a 7.45% between-subjects coefficient of variation. Of note, wrist cartilage CSA measured with either the VIBE or the DESS sequences was linearly related to the carpal bone length. The variability between subjects was significantly reduced to 8.4% when the CSA was normalized with respect to the carpal bone length. CONCLUSION: The ratio between wrist cartilage CSA and carpal bone length is a highly reproducible standardized measurement which normalizes the natural diversity between individuals. PMID:26396941
Telling apart Felidae and Ursidae from the distribution of nucleotides in mitochondrial DNA

NASA Astrophysics Data System (ADS)

Rovenchak, Andrij

2018-02-01

Rank-frequency distributions of nucleotide sequences in mitochondrial DNA are defined in a way analogous to the linguistic approach, with the most frequent nucleobase serving as a whitespace. For such sequences, entropy and mean length are calculated. These parameters are shown to discriminate the species of the Felidae (cats) and Ursidae (bears) families. From purely numerical values we are able to see in particular that giant pandas are bears while koalas are not. The observed linear relation between the parameters is explained using a simple probabilistic model. The approach based on the non-additive generalization of the Bose distribution is used to analyze the frequency spectra of the nucleotide sequences. In this case, the separation of families is not very sharp. Nevertheless, the distributions for Felidae have on average longer tails compared to Ursidae.
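A rough sketch of the "whitespace" construction described in this abstract follows. It splits a nucleotide string on its most frequent base, builds a rank-frequency list of the resulting "words", and computes their Shannon entropy and mean length. The function names are hypothetical, and two choices are assumptions rather than the author's stated method: empty tokens between adjacent whitespace bases are dropped, and entropy is measured in bits.

```python
from collections import Counter
from math import log2


def words_from_sequence(seq):
    """Split a nucleotide string on its most frequent base (the 'whitespace')."""
    space = Counter(seq).most_common(1)[0][0]
    # Assumption: empty tokens produced by adjacent 'whitespace' bases are dropped.
    return space, [w for w in seq.split(space) if w]


def rank_frequency(words):
    """Return (word, count) pairs ordered from most to least frequent."""
    return Counter(words).most_common()


def entropy_and_mean_length(words):
    """Shannon entropy (bits) of the word distribution and mean word length."""
    counts = Counter(words)
    total = sum(counts.values())
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    mean_length = sum(len(w) for w in words) / len(words)
    return entropy, mean_length


# Toy sequence, not real mitochondrial DNA.
space, words = words_from_sequence("ACAAGTACA")
print(space, rank_frequency(words), entropy_and_mean_length(words))
```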
DNA Sequencing Using Capillary Electrophoresis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dr. Barry Karger

2011-05-09

The overall goal of this program was to develop capillary electrophoresis as the tool to be used to sequence the human genome for the first time. Our program was part of the Human Genome Project. In this work, we were highly successful and the replaceable polymer we developed, linear polyacrylamide, was used by the DOE sequencing lab in California to sequence a significant portion of the human genome using the MegaBACE multiple capillary array electrophoresis instrument. In this final report, we summarize our efforts and success. We began our work by separating double-stranded oligonucleotides by capillary electrophoresis using cross-linked polyacrylamide gels in fused silica capillaries. This work showed the potential of the methodology. However, preparation of such cross-linked gel capillaries was difficult, with poor reproducibility, and even more important, the columns were not very stable. We improved stability by using non-cross-linked linear polyacrylamide. Here, the entangled linear chains could move when osmotic pressure (e.g. sample injection) was imposed on the polymer matrix. This relaxation of the polymer dissipated the stress in the column. Our next advance was to use significantly lower concentrations of the linear polyacrylamide so that the polymer could be automatically blown out after each run and replaced with fresh linear polymer solution. In this way, a new column was available for each analytical run. Finally, while testing many linear polymers, we selected linear polyacrylamide as the best matrix as it was the most hydrophilic polymer available. Under our DOE program, we demonstrated initially the success of the linear polyacrylamide to separate double-stranded DNA. We note that the method is used even today to assay purity of double-stranded DNA fragments. Our focus, of course, was on the separation of single-stranded DNA for sequencing purposes. In one paper, we demonstrated the success of our approach in sequencing up to 500 bases.
Other application papers of sequencing up to this level were also published in the mid 1990s. A major interest of the sequencing community has always been read length. The longer the sequence read per run, the more efficient the process as well as the ability to read repeat sequences. We therefore devoted a great deal of time to studying the factors influencing read length in capillary electrophoresis, including polymer type and molecular weight, capillary column temperature, applied electric field, etc. In our initial optimization, we were able to demonstrate, for the first time, the sequencing of over 1000 bases with 90% accuracy. The run required 80 minutes for separation. Sequencing of 1000 bases per column was next demonstrated on a multiple capillary instrument. Our studies revealed that linear polyacrylamide produced the longest read lengths because the hydrophilic single-stranded DNA had minimal interaction with the very hydrophilic linear polyacrylamide. Any interaction of the DNA with the polymer would lead to broader peaks and lower read length. Another important parameter was the molecular weight of the linear chains. High molecular weight (> 1 MDa) was important to allow the long single-stranded DNA to reptate through the entangled polymer matrix. In an important paper, we showed an inverse emulsion method to reproducibly prepare linear polyacrylamide with an average molecular weight of 9 MDa. This approach was used in the polymer for sequencing the human genome. Another critical factor in the successful use of capillary electrophoresis for sequencing was the sample preparation method. In the Sanger sequencing reaction, high concentrations of salts and dideoxynucleotides remained. Since the sample was introduced to the capillary column by electrokinetic injection, these salt ions would be favorably injected into the column over the sequencing fragments, thus reducing the signal for longer fragments and hence reducing read length.
In two papers, we examined the role of individual components from the sequencing reaction and then developed a protocol to reduce the deleterious salts. We demonstrated a robust method for achieving long read length DNA sequencing. Continuing our advances, we next demonstrated the achievement of over 1000 bases in less than one hour with a base calling accuracy of between 98 and 99%. In this work, we implemented energy transfer dyes which allowed for cleaner differentiation of the 4 dye-labeled terminal nucleotides. In addition, we developed improved base calling software to help read sequences when the separation was only minimal, as occurs at long read lengths. Another critical parameter we studied was column temperature. We demonstrated that read lengths improved as the column temperature was increased from room temperature to 60 °C or 70 °C. The higher temperature relaxed the DNA chains under the influence of the high electric field.
Genetic Mapping and QTL Analysis of Growth-Related Traits in Pinctada fucata Using Restriction-Site Associated DNA Sequencing

PubMed Central

Li, Yaoguo; He, Maoxian

2014-01-01

The pearl oyster, Pinctada fucata (P. fucata), is one of the marine bivalves that is predominantly cultured for pearl production. To obtain more genetic information for breeding purposes, we constructed a high-density linkage map of P. fucata and identified quantitative trait loci (QTL) for growth-related traits. One F1 family, which included the two parents, 48 largest progeny and 50 smallest progeny, was sampled to construct a linkage map using restriction site-associated DNA sequencing (RAD-Seq). With low coverage data, 1956.53 million clean reads and 86,342 candidate RAD loci were generated. A total of 1373 segregating SNPs were used to construct a sex-average linkage map. This spanned 1091.81 centimorgans (cM), with 14 linkage groups and an average marker interval of 1.41 cM. The genetic linkage map coverage, Coa, was 97.24%. Thirty-nine QTL-peak loci, for seven growth-related traits, were identified using the single-marker analysis, nonparametric mapping Kruskal-Wallis (KW) test. Parameters included three for shell height, six for shell length, five for shell width, four for hinge length, 11 for total weight, eight for soft tissue weight and two for shell weight. The QTL peak loci for shell height, shell length and shell weight were all located in linkage group 6. The genotype frequencies of most QTL peak loci showed significant differences between the large subpopulation and the small subpopulation (P<0.05). These results highlight the effectiveness of RAD-Seq as a tool for generation of QTL-targeted and genome-wide marker data in the non-model animal, P. fucata, and its possible utility in marker-assisted selection (MAS). PMID:25369421
Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: effects of sequence length and practice

PubMed Central

Reilly, Kevin J.; Spencer, Kristie A.

2013-01-01

The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involves processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

DOEpatents

Gardner, Shea N [San Leandro, CA; Mariella, Jr., Raymond P.; Christian, Allen T [Tracy, CA; Young, Jennifer A [Berkeley, CA; Clague, David S [Livermore, CA

2011-01-18

A method of fabricating a DNA molecule of user-defined sequence. The method comprises the steps of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an even or odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths. In one embodiment starting sequence fragments are of different lengths, n, n+1, n+2, etc.
Partial Gene Sequencing of CYP1A, Vitellogenin, and Metallothionein in Mosquitofish Gambusia yucatana and Gambusia sexradiata.

PubMed

Vázquez-Euán, Roberto; Escalante-Herrera, Karla S; Rodríguez-Fuentes, Gabriela

2017-01-01

Ground characteristics in the Yucatan Peninsula make recovery and treatment of wastewater very expensive. This situation has contributed to an increase of pollutants in the aquifer. Unfortunately, studies related to the effects of those pollutants in native organisms are scarce. The aim of this work was to obtain partial sequences of widely known genes used as biomarkers of pollutant effect in Gambusia yucatana and Gambusia sexradiata. The studied genes were: cytochrome P450 1A (CYP1A); vitellogenin (VTG); metallothionein (MT), and two housekeeping genes, 18S and β-actin. From reported sequences of Gambusia affinis, primers were designed and amplification was done in the local Gambusia species exposed for 48 h to gasoline (100 µL/L, stirred for 24 h pre-exposure). Preliminary results revealed partial sequences of all genes with an approximate average length of 200 bp. BLAST analysis of found sequences indicated a minimum of 97% identity with reported sequences for G. affinis or Gambusia holbrooki showing great similarity.
Deep RNA-Seq to Unlock the Gene Bank of Floral Development in Sinapis arvensis

PubMed Central

Liu, Jia; Mei, Desheng; Li, Yunchang; Huang, Shunmou; Hu, Qiong

2014-01-01

Sinapis arvensis is a weed with strong biological activity. Despite being a problematic annual weed that contaminates agricultural crop yield, it is a valuable alien germplasm resource. It can be utilized for broadening the genetic background of Brassica crops with desirable agricultural traits like resistance to blackleg (Leptosphaeria maculans), stem rot (Sclerotinia sclerotium) and pod shatter (caused by FRUITFULL gene). However, few genetic studies of S. arvensis were reported because of the lack of genomic resources. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive dataset for S. arvensis for the first time. We used Illumina paired-end sequencing technology to sequence the S. arvensis flower transcriptome and generated 40,981,443 reads that were assembled into 131,278 transcripts. We de novo assembled 96,562 high quality unigenes with an average length of 832 bp. A total of 33,662 full-length ORF complete sequences were identified, and 41,415 unigenes were mapped onto 128 pathways using the KEGG Pathway database. The annotated unigenes were compared against Brassica rapa, B. oleracea, B. napus and Arabidopsis thaliana. Among these unigenes, 76,324 were identified as putative homologs of annotated sequences in the public protein databases, of which 1194 were associated with plant hormone signal transduction and 113 were related to gibberellin homeostasis/signaling. Unigenes that did not match any of those sequence datasets were considered to be unique to S. arvensis. Furthermore, 21,321 simple sequence repeats were found. Our study will enhance the currently available resources for Brassicaceae and will provide a platform for future genomic studies for genetic improvement of Brassica crops. PMID:25192023
Network of listed companies based on common shareholders and the prediction of market volatility

NASA Astrophysics Data System (ADS)

Li, Jie; Ren, Da; Feng, Xu; Zhang, Yongjie

2016-11-01

In this paper, we build a network of listed companies in the Chinese stock market based on common shareholding data from 2003 to 2013. We analyze the evolution of topological characteristics of the network (e.g., average degree, diameter, average path length and clustering coefficient) with respect to the time sequence. Additionally, we consider the economic implications of topological characteristic changes on market volatility and use them to make future predictions. Our study finds that the network diameter significantly predicts volatility. After adding control variables used in traditional financial studies (volume, turnover and previous volatility), network topology still significantly influences volatility and improves the predictive ability of the model.
Novel methodologies for spectral classification of exon and intron sequences

NASA Astrophysics Data System (ADS)

Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.

2012-12-01

Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence in which the choice of nucleotide to numeric mapping affects how well its biological properties can be preserved and reflected from nucleotide domain to numerical domain. Digital spectral analysis of nucleotide sequences unfolds a period-3 power spectral value which is more prominent in an exon sequence as compared to that of an intron sequence. The success of a period-3 based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them to existing codes to determine appropriate representation, and to introduce novel thresholding methods for more accurate period-3 based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows: Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers an attractive performance. A windowed 1-sequence numerical representation (with window length of 9, 15, and 24 bases) offers a possible speed gain over non-windowed 4-sequence Voss representation which increases as sequence length increases. A winner threshold value (chosen from the best among two defined threshold values and one other threshold value) offers a top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value applicable to an unknown and arbitrary length sequence can be estimated from the winner threshold values of fixed length sequences with a comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences to better reveal embedded properties, and has potential applications in improved genome annotation.
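To make the period-3 measurement concrete, here is a minimal sketch based on the 4-sequence Voss representation that the abstract names as the non-windowed baseline: one binary indicator sequence per base, with the spectral power of each evaluated at frequency 1/3 and summed. It does not reproduce the authors' 1-sequence codes or their threshold rules. The function name and the normalization by sequence length are assumptions made for illustration.

```python
import cmath


def period3_power(seq):
    """Summed power at frequency 1/3 over the four Voss indicator sequences."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # Voss indicator: 1 where the base occurs, 0 elsewhere.
        indicator = [1.0 if s == base else 0.0 for s in seq]
        # Single DFT coefficient at frequency 1/3 (index k = N/3 when 3 divides N).
        coeff = sum(v * cmath.exp(-2j * cmath.pi * i / 3)
                    for i, v in enumerate(indicator))
        total += abs(coeff) ** 2
    # Assumption: divide by length so values are comparable across lengths.
    return total / n


# Toy inputs: a perfectly codon-periodic string versus a homopolymer.
print(period3_power("ATG" * 10))
print(period3_power("A" * 30))
```

A classifier in the spirit of the abstract would compare this value against a threshold, and choosing that threshold is the subject of the study itself.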
Molecular basis of length polymorphism in the human zeta-globin gene complex.

PubMed Central

Goodbourn, S E; Higgs, D R; Clegg, J B; Weatherall, D J

1983-01-01

The length polymorphism between the human zeta-globin gene and its pseudogene is caused by an allele-specific variation in the copy number of a tandemly repeating 36-base-pair sequence. This sequence is related to a tandemly repeated 14-base-pair sequence in the 5' flanking region of the human insulin gene, which is known to cause length polymorphism, and to a repetitive sequence in intervening sequence (IVS) 1 of the pseudo-zeta-globin gene. Evidence is presented that the latter is also of variable length, probably because of differences in the copy number of the tandem repeat. The homology between the three length polymorphisms may be an indication of the presence of a more widespread group of related sequences in the human genome, which might be useful for generalized linkage studies. PMID:6308667
Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

NASA Technical Reports Server (NTRS)

Wallace, G. R.; Weathers, G. D.; Graf, E. R.

1973-01-01

The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
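The construction of a hybrid-sum sequence can be sketched in a few lines: generate maximum-length sequences from linear feedback recurrences and add them modulo two. The characteristic polynomials chosen here (x^3 + x + 1 and x^4 + x + 1, both primitive) and all function names are illustrative assumptions. The report's own generators, filtering step, and statistical quality factor are not reproduced.

```python
def lfsr_sequence(lags, seed, length):
    """Bits of the recurrence s[n] = XOR of s[n - lag] over the given lags."""
    bits = list(seed)
    while len(bits) < length:
        bit = 0
        for lag in lags:
            bit ^= bits[-lag]
        bits.append(bit)
    return bits[:length]


def hybrid_sum(*sequences):
    """Modulo-two (XOR) sum of equally long bit sequences."""
    return [sum(column) % 2 for column in zip(*sequences)]


def smallest_period(bits):
    """Smallest shift p such that the list agrees with itself shifted by p."""
    for p in range(1, len(bits)):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return len(bits)


n_bits = 315  # three times 7 * 15, enough to observe the combined period
m3 = lfsr_sequence((2, 3), (1, 0, 0), n_bits)     # x^3 + x + 1, period 7
m4 = lfsr_sequence((3, 4), (1, 0, 0, 0), n_bits)  # x^4 + x + 1, period 15
combined = hybrid_sum(m3, m4)
print(smallest_period(m3), smallest_period(m4), smallest_period(combined))
```

Because the two component periods are coprime, the summed sequence repeats only after their product, which is one reason such sums are attractive as long pseudorandom sources.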
Predicting nuclear gene coalescence from mitochondrial data: the three-times rule.

PubMed

Palumbi, S R; Cipriano, F; Hare, M P

2001-05-01

Coalescence theory predicts when genetic drift at nuclear loci will result in fixation of sequence differences to produce monophyletic gene trees. However, the theory is difficult to apply to particular taxa because it hinges on genetically effective population size, which is generally unknown. Neutral theory also predicts that evolution of monophyly will be four times slower in nuclear than in mitochondrial genes primarily because genetic drift is slower at nuclear loci. Variation in mitochondrial DNA (mtDNA) within and between species has been studied extensively, but can these mtDNA data be used to predict coalescence in nuclear loci? Comparison of neutral theories of coalescence of mitochondrial and nuclear loci suggests a simple rule of thumb. The "three-times rule" states that, on average, most nuclear loci will be monophyletic when the branch length leading to the mtDNA sequences of a species is three times longer than the average mtDNA sequence diversity observed within that species. A test using mitochondrial and nuclear intron data from seven species of whales and dolphins suggests general agreement with predictions of the three-times rule. We define the coalescence ratio as the mitochondrial branch length for a species divided by intraspecific mtDNA diversity. We show that species with high coalescence ratios show nuclear monophyly, whereas species with low ratios have polyphyletic nuclear gene trees. As expected, species with intermediate coalescence ratios show a variety of patterns. Especially at very high or low coalescence ratios, the three-times rule predicts nuclear gene patterns that can help detect the action of selection. The three-times rule may be useful as an empirical benchmark for evaluating evolutionary processes occurring at multiple loci.
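The coalescence ratio defined in this abstract is a simple quotient, shown here as a small sketch. The function names are hypothetical, and treating a ratio of exactly three as meeting the rule is an assumption. The input values are made-up numbers, not data from the whale and dolphin study.

```python
def coalescence_ratio(branch_length, within_species_diversity):
    """Mitochondrial branch length divided by intraspecific mtDNA diversity."""
    if within_species_diversity <= 0:
        raise ValueError("within-species diversity must be positive")
    return branch_length / within_species_diversity


def meets_three_times_rule(branch_length, within_species_diversity, factor=3.0):
    """True when the ratio reaches the rule-of-thumb factor.

    Assumption: a ratio exactly equal to `factor` counts as meeting the rule.
    """
    return coalescence_ratio(branch_length, within_species_diversity) >= factor


# Hypothetical values in the same units (e.g. substitutions per site).
print(coalescence_ratio(6.0, 2.0), meets_three_times_rule(6.0, 2.0))
print(coalescence_ratio(0.5, 1.0), meets_three_times_rule(0.5, 1.0))
```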
Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

PubMed Central

2007-01-01

Background: Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. Results: The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction, yielding better accuracy, on average, and requiring significantly fewer computational resources. Conclusion: Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer sequences. The revised Dynalign code is freely available for download. PMID:17445273
Variable length adjacent partitioning for PTS based PAPR reduction of OFDM signal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ibraheem, Zeyid T.; Rahman, Md. Mijanur; Yaakob, S. N.

2015-05-15

Peak-to-average power ratio (PAPR) is a major drawback in OFDM communication. It drives the power amplifier into its nonlinear operating region, resulting in loss of data integrity. As such, there is a strong motivation to find techniques to reduce PAPR. Partial Transmit Sequence (PTS) is an attractive scheme for this purpose. Judicious partitioning of the OFDM data frame into disjoint subsets is a pivotal component of any PTS scheme. Out of the existing partitioning techniques, adjacent partitioning is characterized by an attractive trade-off between cost and performance. With the aim of determining the effects of length variability of adjacent partitions, we investigated the performance of variable length adjacent partitioning (VL-AP) and fixed length adjacent partitioning in comparison with other partitioning schemes such as pseudorandom partitioning. Simulation results with different modulation and partitioning scenarios showed that fixed length adjacent partitioning performed better than variable length adjacent partitioning. As expected, simulation results showed a slightly better performance of the pseudorandom partitioning technique compared to fixed and variable adjacent partitioning schemes. However, as the pseudorandom technique incurs high computational complexity, adjacent partitioning schemes were still seen as favorable candidates for PAPR reduction.
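For readers unfamiliar with PTS, the following minimal sketch shows PAPR computation and fixed-length adjacent partitioning with an exhaustive search over phase factors. It is not the authors' simulation: the phase set {+1, -1}, the block count, the plain inverse DFT without oversampling, and all function names are assumptions chosen to keep the example small.

```python
import cmath
from itertools import product


def idft(freq):
    """Inverse discrete Fourier transform of a list of subcarrier symbols."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def papr(time_signal):
    """Peak power divided by mean power (a linear ratio, not dB)."""
    powers = [abs(x) ** 2 for x in time_signal]
    return max(powers) / (sum(powers) / len(powers))


def adjacent_partition(freq, num_blocks):
    """Split subcarriers into equal contiguous blocks, zero-padded to full length.

    Assumes len(freq) is divisible by num_blocks.
    """
    n = len(freq)
    size = n // num_blocks
    blocks = []
    for b in range(num_blocks):
        block = [0j] * n
        block[b * size:(b + 1) * size] = freq[b * size:(b + 1) * size]
        blocks.append(block)
    return blocks


def pts_best_papr(freq, num_blocks, phases=(1, -1)):
    """Try every phase combination and return (lowest PAPR, phase tuple)."""
    partial = [idft(block) for block in adjacent_partition(freq, num_blocks)]
    best = None
    for combo in product(phases, repeat=num_blocks):
        combined = [sum(p * sig[t] for p, sig in zip(combo, partial))
                    for t in range(len(freq))]
        value = papr(combined)
        if best is None or value < best[0]:
            best = (value, combo)
    return best


symbols = [1] * 8  # worst case: all subcarriers in phase
print(papr(idft(symbols)), pts_best_papr(symbols, num_blocks=4))
```

The exhaustive search grows exponentially with the number of blocks, which is the cost side of the cost-performance trade-off the abstract refers to.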
Long-term impacts of human harvesting on shellfish: North Iberian top shells and limpets from the Upper Palaeolithic to the present

NASA Astrophysics Data System (ADS)

Turrero, Pablo; Muñoz-Colmenero, A. Marta; Prado, Andrea; Garcia-Vazquez, Eva

2014-11-01

Humans have contributed to phenotypic and demographic changes in their prey from very early on in the colonization of Europe, including the harvesting of shellfish in coastal ecosystems. We estimated trends in population growth (variation in the number of individuals) from DNA sequences of modern specimens in two North Iberian molluscs, top shells (Osilinus lineatus, from 24 sequences and 14 haplotypes) and limpets (Patella vulgata, taken from the bibliography), which were subjected to very different levels of harvesting pressure during the Upper Palaeolithic (~ 20000 to ~ 6000 years ago). The less harvested Osilinus top shells experienced fluctuations in population numbers coincident with climatic oscillations. Patella limpets, which were harvested in greater numbers, suffered clear and uninterrupted decreases in their numbers during the Upper Palaeolithic. These trends coincided with morphological changes in shell size (length or width) in the same direction (i.e., shell size decreased when population size decreased and vice versa). The differing trends seen in taxa subjected to different intensities of harvesting pressure suggest that climate effects were overcome by anthropogenic selection (leading to a smaller average length) in limpets. We suggest that intense fishing pressure may have induced irreversible shell length decreases in the most exploited species.
A High-Quality Reference Genome for the Invasive Mosquitofish Gambusia affinis Using a Chicago Library.

PubMed

Hoffberg, Sandra L; Troendle, Nicholas J; Glenn, Travis C; Mahmud, Ousman; Louha, Swarnali; Chalopin, Domitille; Bennetzen, Jeffrey L; Mauricio, Rodney

2018-04-27

The western mosquitofish, Gambusia affinis, is a freshwater poecilid fish native to the southeastern United States but with a global distribution due to widespread human introduction. Gambusia affinis has been used as a model species for a broad range of evolutionary and ecological studies. We sequenced the genome of a male G. affinis to facilitate genetic studies in diverse fields including invasion biology and comparative genetics. We generated Illumina short read data from paired-end libraries and in vitro proximity-ligation libraries. We obtained 54.9× coverage, N50 contig length of 17.6 kb, and N50 scaffold length of 6.65 Mb. Compared to two other species in the Poeciliidae family, G. affinis has slightly fewer genes that have shorter total, exon, and intron length on average. Using a set of universal single-copy orthologs in fish genomes, we found 95.5% of these genes were complete in the G. affinis assembly. The number of transposable elements in the G. affinis assembly is similar to those of closely related species. The high-quality genome sequence and annotations we report will be valuable resources for scientists to map the genetic architecture of traits of interest in this species. Copyright © 2018, G3: Genes, Genomes, Genetics.
Systematic Characteristic Exploration of the Chimeras Generated in Multiple Displacement Amplification through Next Generation Sequencing Data Reanalysis

PubMed Central

Gao, Shen; Yao, Bei; Lu, Zuhong

2015-01-01

Background The chimeric sequences produced by phi29 DNA polymerase, known as chimeras, influence the performance of multiple displacement amplification (MDA) and also increase the difficulty of sequence data processing. Although several articles have reported the existence of chimeric sequences, only one study has focused on the structure and generation mechanism of chimeras, and it was based on merely hundreds of chimeras found in the sequence data of the E. coli genome. Method We mined a series of Next Generation Sequencing (NGS) reads that were used for whole-genome haplotype assembly in a primary study. We established a bioinformatics pipeline based on a subsection alignment strategy to discover all the chimeras in the data and visualize their structure. Then, we defined two statistical indexes (the chimeric distance and the overlap length), and their regular abundance distributions helped illustrate the structural characteristics of the chimeras. Finally, we analyzed the relationship between the chimera type and the average insertion size, thereby illustrating a method to decrease the proportion of wasted data in DNA library construction. Results/Conclusion A total of 131.4 Gb of paired-end (PE) sequence data was reanalyzed for chimeras. In total, 40,259,438 read pairs (6.19%) with chimerism were discovered among 650,430,811 read pairs. The chimeric sequences consist of two or more parts that are located non-consecutively but adjacently on the chromosome. The chimeric distance between the locations of adjacent parts on the chromosome followed an approximately bimodal distribution ranging from 0 to over 5,000 nt, whose peak was at about 250 to 300 nt. The overlap length of adjacent parts followed an approximately Poisson distribution and revealed a peak at 6 nt.
Moreover, unmapped chimeras, which were classified as the wasted data, could be reduced by properly increasing the length of the insertion segment size through a linear correlation analysis. Significance This study exhibited the profile of the phi29MDA chimeras by tens of millions of chimeric sequences, and helped understand the amplification mechanism of the phi29 DNA polymerase. Our work also illustrated the importance of NGS data reanalysis, not only for the improvement of data utilization efficiency, but also for more potential genomic information. PMID:26440104
High-density genetic map construction and QTLs identification for plant height in white jute (Corchorus capsularis L.) using specific locus amplified fragment (SLAF) sequencing.

PubMed

Tao, Aifen; Huang, Long; Wu, Guifen; Afshar, Reza Keshavarz; Qi, Jianmin; Xu, Jiantang; Fang, Pingping; Lin, Lihui; Zhang, Liwu; Lin, Peiqing

2017-05-08

Genetic mapping and quantitative trait locus (QTL) detection are powerful methodologies in plant improvement and breeding. White jute (Corchorus capsularis L.) is an important industrial raw material fiber crop because of its elite characteristics. However, construction of a high-density genetic map and identification of QTLs has been limited in white jute due to a lack of sufficient molecular markers. The specific locus amplified fragment sequencing (SLAF-seq) strategy combines locus-specific amplification and high-throughput sequencing to carry out de novo single nucleotide polymorphism (SNP) discovery and large-scale genotyping. In this study, SLAF-seq was employed to obtain sufficient markers to construct a high-density genetic map for white jute. Moreover, with the development of abundant markers, genetic dissection of fiber yield traits such as plant height was also possible. Here, we present QTLs associated with plant height that were identified using our newly constructed genetic linkage groups. An F8 population consisting of 100 lines was developed. In total, 69,446 high-quality SLAFs were detected of which 5,074 SLAFs were polymorphic; 913 polymorphic markers were used for the construction of a genetic map. The average coverage for each SLAF marker was 43-fold in the parents, and 9.8-fold in each F8 individual. A linkage map was constructed that contained 913 SLAFs on 11 linkage groups (LGs) covering 1621.4 cM with an average density of 1.61 cM per locus. Among the 11 LGs, LG1 was the largest with 210 markers, a length of 406.34 cM, and an average distance of 1.93 cM between adjacent markers. LG11 was the smallest with only 25 markers, a length of 29.66 cM, and an average distance of 1.19 cM between adjacent markers. 'SNP_only' markers accounted for 85.54% and were the predominant markers on the map.
QTL mapping based on the F8 phenotypes detected 11 plant height QTLs including one major effect QTL across two cultivation locations, with each QTL accounting for 4.14-15.63% of the phenotypic variance. To our knowledge, the linkage map constructed here is the densest one available to date for white jute. This analysis also identified the first QTL in white jute. The results will provide an important platform for gene/QTL mapping, sequence assembly, genome comparisons, and marker-assisted selection breeding for white jute.
[Cloning and sequence analysis of full-length cDNA of secoisolariciresinol dehydrogenase of Dysosma versipellis].

PubMed

Xu, Li; Ding, Zhi-Shan; Zhou, Yun-Kai; Tao, Xue-Fen

2009-06-01

To obtain the full-length cDNA sequence of the Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis by RACE PCR, and then investigate the character of the Secoisolariciresinol Dehydrogenase gene. The full-length cDNA sequence of the Secoisolariciresinol Dehydrogenase gene was obtained by 3'-RACE and 5'-RACE from Dysosma versipellis. We report for the first time the full cDNA sequence of Secoisolariciresinol Dehydrogenase in Dysosma versipellis. The acquired gene was 991 bp in full length, including a 5' untranslated region of 42 bp and a 3' untranslated region of 112 bp with Poly (A). The open reading frame (ORF) encodes 278 amino acids with a molecular weight of 29253.3 Daltons and an isoelectric point of 6.328. The nucleotide sequence accession number in GenBank was EU573789. Semi-quantitative RT-PCR analysis revealed that the Secoisolariciresinol Dehydrogenase gene was highly expressed in stem. Alignment of the amino acid sequence of Secoisolariciresinol Dehydrogenase indicated there may be some significant amino acid sequence differences among different species. The full-length cDNA sequence of the Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis was obtained.

Morphological characters and DNA barcoding of Syngnathus schlegeli in the coastal waters of China

NASA Astrophysics Data System (ADS)

Chen, Zhi; Zhang, Yan; Han, Zhiqiang; Song, Na; Gao, Tianxiang

2018-03-01

A Syngnathus species widely distributed in Chinese seas has long been identified as Syngnathus acus by native ichthyologists, but the taxonomic description of this species was inadequate and lacked conclusive molecular evidence. To identify this species, 357 individuals from the coastal waters of Dandong, Yantai, Qingdao and Zhoushan were collected and measured. Morphological results showed that these slender specimens were mainly brownish, usually mottled with pale. Standard length ranged from 117 mm to 213 mm with an average length of 180.3 mm. The above characters were consistent with S. schlegeli distributed in Japan but colored differently from and much smaller than typical S. acus reported in Europe. Thus, morphological studies revealed that this species was previously misidentified as S. acus and might be S. schlegeli in reality. In addition, a fragment of the cytochrome oxidase subunit I (COI) gene of mitochondrial DNA was also sequenced for species identification, and 15 COI sequences belonging to different Syngnathus species were also used for the molecular identification. COI sequences of our specimens had the minimum genetic distance from recognized S. schlegeli from Japan and clustered with it first. The phylogenetic analysis similarly suggested that the species previously identified as S. acus in the coastal waters of China is actually S. schlegeli.
Magnetic bead purification of labeled DNA fragments for high-throughput capillary electrophoresis sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Elkin, Christopher; Kapur, Hitesh; Smith, Troy

2001-09-15

We have developed an automated purification method for terminator sequencing products based on a magnetic bead technology. This 384-well protocol generates labeled DNA fragments that are essentially free of contaminants for less than $0.005 per reaction. In comparison to laborious ethanol precipitation protocols, this method increases the phred20 read length by forty bases with various DNA templates such as PCR fragments, plasmids, cosmids and RCA products. Our method eliminates centrifugation and is compatible with both the MegaBACE 1000 and ABI Prism 3700 capillary instruments. As of September 2001, this method has produced over 1.6 million samples with 93 percent averaging 620 phred20 bases as part of the Joint Genome Institute's Production Process.
Isoform Sequencing Provides a More Comprehensive View of the Panax ginseng Transcriptome.

PubMed

Jo, Ick-Hyun; Lee, Jinsu; Hong, Chi Eun; Lee, Dong Jin; Bae, Wonsil; Park, Sin-Gi; Ahn, Yong Ju; Kim, Young Chang; Kim, Jang Uk; Lee, Jung Woo; Hyun, Dong Yun; Rhee, Sung-Keun; Hong, Chang Pyo; Bang, Kyong Hwan; Ryu, Hojin

2017-09-15

Korean ginseng (Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offers a more comprehensive view of functional genomics in P. ginseng, we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and plant hormonal signaling pathways, including auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings have confirmed the rapid upregulation of negative feed-back loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the auxin and cytokinin canonical signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana. Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng. In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable resources as transcriptomic references in P. ginseng and can be used for comparative analyses in closely related medicinal plants.
The de novo Transcriptome and Its Analysis in the Worldwide Vegetable Pest, Delia antiqua (Diptera: Anthomyiidae)

PubMed Central

Zhang, Yu-Juan; Hao, Youjin; Si, Fengling; Ren, Shuang; Hu, Ganyu; Shen, Li; Chen, Bin

2014-01-01

The onion maggot Delia antiqua is a major insect pest of cultivated vegetables, especially the onion, and a good model to investigate the molecular mechanisms of diapause. To better understand the biology and diapause mechanism of the insect pest species, D. antiqua, the transcriptome was sequenced using Illumina paired-end sequencing technology. Approximately 54 million reads were obtained, trimmed, and assembled into 29,659 unigenes, with an average length of 607 bp and an N50 of 818 bp. Among these unigenes, 21,605 (72.8%) were annotated in the public databases. All unigenes were then compared against Drosophila melanogaster and Anopheles gambiae. Codon usage bias was analyzed and 332 simple sequence repeats (SSRs) were detected in this organism. These data represent the most comprehensive transcriptomic resource currently available for D. antiqua and will facilitate the study of genetics, genomics, diapause, and further pest control of D. antiqua. PMID:24615268
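Assembly summaries such as the one above report both an average length and an N50. A minimal sketch of how these two statistics are commonly computed from a list of sequence lengths follows; the lengths are toy values, not data from the study, and the function names are illustrative.

```python
def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


def n50(lengths):
    """Length L such that sequences of length >= L together contain
    at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy unigene lengths in bp (hypothetical values for illustration only).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(mean_length(toy_lengths), n50(toy_lengths))
```

The N50 is weighted toward long sequences, which is why it is typically larger than the average length in a transcriptome assembly.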
The presence of the ancestral insect telomeric motif in kissing bugs (Triatominae) rules out the hypothesis of its loss in evolutionarily advanced Heteroptera (Cimicomorpha)

PubMed Central

Pita, Sebastián; Panzera, Francisco; Mora, Pablo; Vela, Jesús; Palomeque, Teresa; Lorite, Pedro

2016-01-01

Abstract Next-generation sequencing data analysis on Triatoma infestans Klug, 1834 (Heteroptera, Cimicomorpha, Reduviidae) revealed the presence of the ancestral insect (TTAGG)n telomeric motif in its genome. Fluorescence in situ hybridization confirms that chromosomes bear this telomeric sequence in their chromosomal ends. Furthermore, motif amount estimation was about 0.03% of the total genome, so that the average telomere length in each chromosomal end is almost 18 kb long. We also detected the presence of (TTAGG)n telomeric repeat in mitotic and meiotic chromosomes in other three species of Triatominae: Triatoma dimidiata Latreille, 1811, Dipetalogaster maxima Uhler, 1894, and Rhodnius prolixus Ståhl, 1859. This is the first report of the (TTAGG)n telomeric repeat in the infraorder Cimicomorpha, contradicting the currently accepted hypothesis that evolutionarily recent heteropterans lack this ancestral insect telomeric sequence. PMID:27830050
Developmental Transcriptome Analysis and Identification of Genes Involved in Larval Metamorphosis of the Razor Clam, Sinonovacula constricta.

PubMed

Niu, Donghong; Wang, Fei; Xie, Shumei; Sun, Fanyue; Wang, Ze; Peng, Maoxiao; Li, Jiale

2016-04-01

The razor clam Sinonovacula constricta is an important commercial species. The lack of developmental transcriptomic data has become a bottleneck for further research on the mechanisms underlying settlement and metamorphosis in early development. In this study, de novo transcriptome sequencing was performed for S. constricta at different early developmental stages by using Illumina HiSeq 2000 paired-end (PE) sequencing technology. A total of 112,209,077 PE clean reads were generated. De novo assembly generated 249,795 contigs with an average length of 585 bp. Gene annotation resulted in the identification of 22,870 unigene hits against the NCBI database. Eight unique sequences related to metamorphosis were identified and analyzed using real-time PCR. The razor clam reference transcriptome would provide useful information on early developmental and metamorphosis mechanisms and could be used in the genetic breeding of shellfish.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Stillman, A.E.; Wilke, N.; Li, D.

Our goal was to determine the feasibility of using an intravascular MR contrast agent to improve 3D MRA. Three-dimensional TOF MRA was performed in nine patients both prior to and following the administration of an ultrasmall particle superparamagnetic iron oxide contrast agent (AMI 227). The lengths of both renal arteries were measured from the maximum intensity projection (MIP) images as well as the individual partitions. Seven of these patients also were studied by a 3D coronary artery MRA sequence. Signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) measurements of the right coronary artery were determined both prior to and following the administration of AMI 227. Statistical analysis of both renal artery lengths and right coronary SNR and CNR was performed using a one-tailed paired t test comparing pre- and postcontrast images. The renal artery lengths significantly increased (right renal artery: 30%, p = 0.001; left renal artery: 25%, p < 0.008) when measured from the individual axial slice partitions. No significant increase in length was observed on the MIP images following contrast. In the right coronary artery, the SNR increased by an average of 80% (p = 0.008) and CNR increased by an average of 109% (p = 0.007). Increased background signal and superimposed venous structures reduced the measurable lengths of the renal arteries from the MIP images. These studies support the hypothesis that 3D MRA in the body will benefit from the use of intravascular contrast agents. Nevertheless, conventional MIP processing is unable to reveal the full advantage of the contrast improvement. 14 refs., 6 figs., 2 tabs.
Sequence-Dependent Persistence Length of Long DNA

NASA Astrophysics Data System (ADS)

Chuang, Hui-Min; Reifenberger, Jeffrey G.; Cao, Han; Dorfman, Kevin D.

2017-12-01

Using a high-throughput genome-mapping approach, we obtained circa 50 million measurements of the extension of internal human DNA segments in a 41 nm ×41 nm nanochannel. The underlying DNA sequences, obtained by mapping to the reference human genome, are 2.5-393 kilobase pairs long and contain percent GC contents between 32.5% and 60%. Using Odijk's theory for a channel-confined wormlike chain, these data reveal that the DNA persistence length increases by almost 20% as the percent GC content increases. The increased persistence length is rationalized by a model, containing no adjustable parameters, that treats the DNA as a statistical terpolymer with a sequence-dependent intrinsic persistence length and a sequence-independent electrostatic persistence length.
Microsatellite Development for an Endangered Bream Megalobrama pellegrini (Teleostei, Cyprinidae) Using 454 Sequencing

PubMed Central

Wang, Jinjin; Yu, Xiaomu; Zhao, Kai; Zhang, Yaoguang; Tong, Jingou; Peng, Zuogang

2012-01-01

Megalobrama pellegrini is an endemic fish species found in the upper Yangtze River basin in China. This species has become endangered due to the construction of the Three Gorges Dam and overfishing. However, the available genetic data for this species is limited. Here, we developed 26 polymorphic microsatellite markers from the M. pellegrini genome using next-generation sequencing techniques. A total of 257,497 raw reads were obtained from a quarter-plate run on 454 GS-FLX titanium platforms and 49,811 unique sequences were generated with an average length of 404 bp; 24,522 (49.2%) sequences contained microsatellite repeats. Of the 53 loci screened, 33 were amplified successfully and 26 were polymorphic. The genetic diversity in M. pellegrini was moderate, with an average of 3.08 alleles per locus, and the mean observed and expected heterozygosity were 0.47 and 0.51, respectively. In addition, we tested cross-species amplification for all 33 loci in four additional breams: M. amblycephala, M. skolkovii, M. terminalis, and Sinibrama wui. The cross-species amplification showed a significant high level of transferability (79%–97%), which might be due to their dramatically close genetic relationships. The polymorphic microsatellites developed in the current study will not only contribute to further conservation genetic studies and parentage analyses of this endangered species, but also facilitate future work on the other closely related species. PMID:22489139
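Screening reads for microsatellite repeats, as described above, can be sketched with a regular expression. The abstract does not state the detection criteria used, so the motif lengths (2 to 4 bp), the minimum of five tandem copies, and the function name `find_ssrs` are assumptions for illustration only.

```python
import re


def find_ssrs(seq, min_repeats=5, motif_lengths=(2, 3, 4)):
    """Return (start, motif, copies) for perfect tandem repeats.

    Motifs made of a single base (e.g. 'AA') are skipped so that
    homopolymer runs are not reported as di- or trinucleotide repeats.
    """
    hits = []
    for k in motif_lengths:
        # A motif of length k followed by at least (min_repeats - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return hits


# Hypothetical read containing an (AC)6 and a (GAT)5 repeat.
read = "GGG" + "AC" * 6 + "CCC" + "GAT" * 5 + "C"
print(find_ssrs(read))
```

The fraction of sequences containing repeats then follows directly; for the counts quoted in the abstract, 24,522 / 49,811 is about 49.2%.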
Investigation of modulation parameters in multiplexing gas chromatography.

PubMed

Trapp, Oliver

2010-10-22

Combination of information technology and separation sciences opens a new avenue to achieve high sample throughputs and therefore is of great interest to bypass bottlenecks in catalyst screening of parallelized reactors or using multitier well plates in reaction optimization. Multiplexing gas chromatography utilizes pseudo-random injection sequences derived from Hadamard matrices to perform rapid sample injections which gives a convoluted chromatogram containing the information of a single sample or of several samples with similar analyte composition. The conventional chromatogram is obtained by application of the Hadamard transform using the known injection sequence or in case of several samples an averaged transformed chromatogram is obtained which can be used in a Gauss-Jordan deconvolution procedure to obtain all single chromatograms of the individual samples. The performance of such a system depends on the modulation precision and on the parameters, e.g. the sequence length and modulation interval. Here we demonstrate the effects of the sequence length and modulation interval on the deconvoluted chromatogram, peak shapes and peak integration for sequences between 9-bit (511 elements) and 13-bit (8191 elements) and modulation intervals Δt between 5 s and 500 ms using a mixture of five components. It could be demonstrated that even for high-speed modulation at time intervals of 500 ms the chromatographic information is very well preserved and that the separation efficiency can be improved by very narrow sample injections. Furthermore this study shows that the relative peak areas in multiplexed chromatograms do not deviate from conventionally recorded chromatograms. Copyright © 2010 Elsevier B.V. All rights reserved.
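The multiplexing and deconvolution steps described above can be sketched for a toy case. This is an illustration under stated assumptions, not the author's procedure: the injection sequence is generated here as a maximal-length sequence from a linear feedback shift register (one common way to obtain the rows of a cyclic S-matrix), the detector signal is modeled as a noise-free circular superposition of shifted chromatograms, and all function names and tap choices are hypothetical. An n-bit sequence has 2^n - 1 elements, which matches the 511 (9-bit) and 8191 (13-bit) lengths quoted in the abstract.

```python
def m_sequence(n_bits, taps):
    """Binary maximal-length sequence from a Fibonacci LFSR.

    With recurrence a[k] = XOR of a[k - t] for t in taps; the taps must
    correspond to a primitive polynomial for the period to be 2**n_bits - 1.
    """
    state = [1] * n_bits
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return out


def multiplex(seq, chrom):
    """Detector signal: every injection (seq[j] == 1) adds a copy of the
    single-injection chromatogram, circularly shifted by j intervals."""
    n = len(seq)
    return [sum(seq[j] * chrom[(t - j) % n] for j in range(n)) for t in range(n)]


def demultiplex(seq, measured):
    """Invert the cyclic S-matrix using S^-1 = 2/(N+1) * (2*S^T - J)."""
    n = len(seq)
    scale = 2.0 / (n + 1)
    return [
        scale * sum((2 * seq[(t - k) % n] - 1) * measured[t] for t in range(n))
        for k in range(n)
    ]


seq7 = m_sequence(3, taps=(3, 2))            # 3-bit sequence, 7 elements
chrom = [0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 0.0]  # hypothetical two-peak chromatogram
recovered = demultiplex(seq7, multiplex(seq7, chrom))
print(seq7, [round(v, 6) for v in recovered])
```

In a real measurement the modulation interval, peak broadening, and noise all affect how well this inversion recovers the conventional chromatogram, which is the dependence the study investigates.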
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

DOEpatents

Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S

2013-06-25

A method of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.
Genome-Wide Single-Nucleotide Polymorphisms Discovery and High-Density Genetic Map Construction in Cauliflower Using Specific-Locus Amplified Fragment Sequencing

PubMed Central

Zhao, Zhenqing; Gu, Honghui; Sheng, Xiaoguang; Yu, Huifang; Wang, Jiansheng; Huang, Long; Wang, Dan

2016-01-01

Molecular markers and genetic maps play an important role in plant genomics and breeding studies. Cauliflower is an important and distinctive vegetable; however, very few molecular resources have been reported for this species. In this study, a novel, specific-locus amplified fragment (SLAF) sequencing strategy was employed for large-scale single nucleotide polymorphism (SNP) discovery and high-density genetic map construction in a double-haploid, segregating population of cauliflower. A total of 12.47 Gb raw data containing 77.92 M pair-end reads were obtained after processing and 6815 polymorphic SLAFs between the two parents were detected. The average sequencing depths reached 52.66-fold for the female parent and 49.35-fold for the male parent. Subsequently, these polymorphic SLAFs were used to genotype the population and further filtered based on several criteria to construct a genetic linkage map of cauliflower. Finally, 1776 high-quality SLAF markers, including 2741 SNPs, constituted the linkage map with average data integrity of 95.68%. The final map spanned a total genetic length of 890.01 cM with an average marker interval of 0.50 cM, and covered 364.9 Mb of the reference genome. The markers and genetic map developed in this study could provide an important foundation not only for comparative genomics studies within Brassica oleracea species but also for quantitative trait loci identification and molecular breeding of cauliflower. PMID:27047515
An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile.

PubMed

Prakash, Celine; Haeseler, Arndt Von

2017-03-01

RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
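One possible reading of the enumeration described above can be sketched as follows. The abstract does not give the model's details, so this sketch assumes that a "maximal placement" means non-overlapping fragments of length F laid on a transcript of length T such that no remaining gap could hold another fragment, and that every such configuration is equally likely. The function names are hypothetical.

```python
from fractions import Fraction


def maximal_placements(transcript_len, fragment_len):
    """Yield tuples of fragment start positions for every maximal,
    non-overlapping placement (all gaps shorter than fragment_len)."""
    def extend(pos, starts):
        # Try every admissible gap before the next fragment.
        for gap in range(fragment_len):
            start = pos + gap
            if start + fragment_len <= transcript_len:
                yield from extend(start + fragment_len, starts + [start])
        # The placement is complete once the tail cannot hold a fragment.
        if transcript_len - pos < fragment_len:
            yield tuple(starts)

    yield from extend(0, [])


def coverage_profile(transcript_len, fragment_len):
    """Expected per-position coverage, averaging over all configurations."""
    configs = list(maximal_placements(transcript_len, fragment_len))
    covered = [0] * transcript_len
    for starts in configs:
        for s in starts:
            for p in range(s, s + fragment_len):
                covered[p] += 1
    return [Fraction(c, len(configs)) for c in covered]


print(list(maximal_placements(5, 2)))
print(coverage_profile(5, 2))
```

Even in this toy case (T = 5, F = 2) the expected coverage differs between positions, which is consistent with the abstract's statement that the expected profiles are not uniform.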
An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile

PubMed Central

Haeseler, Arndt Von

2017-01-01

Abstract RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment. PMID:27661099
Toward an Accurate Theoretical Framework for Describing Ensembles for Proteins under Strongly Denaturing Conditions

PubMed Central

Tran, Hoang T.; Pappu, Rohit V.

2006-01-01

Our focus is on an appropriate theoretical framework for describing highly denatured proteins. In high concentrations of denaturants, proteins behave like polymers in a good solvent and ensembles for denatured proteins can be modeled by ignoring all interactions except excluded volume (EV) effects. To assay conformational preferences of highly denatured proteins, we quantify a variety of properties for EV-limit ensembles of 23 two-state proteins. We find that modeled denatured proteins can be best described as follows. Average shapes are consistent with prolate ellipsoids. Ensembles are characterized by large correlated fluctuations. Sequence-specific conformational preferences are restricted to local length scales that span five to nine residues. Beyond local length scales, chain properties follow well-defined power laws that are expected for generic polymers in the EV limit. The average available volume is filled inefficiently, and cavities of all sizes are found within the interiors of denatured proteins. All properties characterized from simulated ensembles match predictions from rigorous field theories. We use our results to resolve between conflicting proposals for structure in ensembles for highly denatured states. PMID:16766618
Phylogenetic tree construction using trinucleotide usage profile (TUP).

PubMed

Chen, Si; Deng, Lih-Yuan; Bowman, Dale; Shiau, Jyh-Jen Horng; Wong, Tit-Yee; Madahian, Behrouz; Lu, Henry Horng-Shing

2016-10-06

It has been a challenging task to build a genome-wide phylogenetic tree for a large group of species containing a large number of genes with long nucleotide sequences. The most popular method, called feature frequency profile (FFP-k), finds the frequency distribution for all words of a certain length k over the whole genome sequence using (overlapping) windows of the same length. For a satisfactory result, the recommended word length (k) ranges from 6 to 15 and it may not be a multiple of 3 (codon length). The total number of possible words needed for FFP-k can range from 4^6 = 4096 to 4^15. We propose a simple improvement over the popular FFP method using only a typical word length of 3. A new method, called Trinucleotide Usage Profile (TUP), is proposed based only on the (relative) frequency distribution using non-overlapping windows of length 3. The total number of possible words needed for TUP is 4^3 = 64, which is much less than the total count for the recommended optimal "resolution" for FFP. To build a phylogenetic tree, we propose first representing each of the species by a TUP vector and then using an appropriate distance measure between pairs of the TUP vectors for the tree construction. In particular, we propose summarizing a DNA sequence by a matrix of three rows corresponding to three reading frames, recording the frequency distribution of the non-overlapping words of length 3 in each reading frame. We also provide a numerical measure for comparing trees constructed with various methods. Compared to the FFP method, our empirical study showed that the proposed TUP method is more capable of building phylogenetic trees with stronger biological support. We further provide some justification for this from the information theory viewpoint. Unlike the FFP method, the TUP method takes advantage of the fact that the start of the first reading frame is (usually) known. 
Without this information, the FFP method could only rely on the frequency distribution of overlapping words, which is the average (or mixture) of the frequency distributions of three possible reading frames. Consequently, we show (from the entropy viewpoint) that the FFP procedure could dilute important gene information and therefore provides less accurate classification.
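As a rough illustration of the representation described above, the following sketch builds a 3 x 64 matrix of relative trinucleotide frequencies, one row per reading frame, from non-overlapping windows of length 3. It is a minimal sketch and not the authors' code: the function names are hypothetical, words containing characters other than A, C, G, T are simply skipped, and the Euclidean distance stands in for the "appropriate distance measure", which the abstract does not specify.

```python
from collections import Counter
from itertools import product
import math

# All 4^3 = 64 trinucleotides in a fixed (lexicographic) order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
VALID = set(TRINUCLEOTIDES)


def tup_vector(seq, frame=0):
    """Relative frequencies of non-overlapping 3-mers starting at `frame`."""
    seq = seq.upper()
    words = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    counts = Counter(w for w in words if w in VALID)
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in TRINUCLEOTIDES]


def tup_matrix(seq):
    """Three rows, one per reading frame, as described in the abstract."""
    return [tup_vector(seq, frame) for frame in range(3)]


def tup_distance(a, b):
    """Euclidean distance between two TUP matrices (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


if __name__ == "__main__":
    m1 = tup_matrix("ATGAAAATGCCC")
    m2 = tup_matrix("ATGAAACCCGGG")
    print(round(tup_distance(m1, m2), 3))
```

A pairwise distance matrix computed this way could then be passed to any standard distance-based tree-building method; that step is outside the scope of this sketch.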
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach

PubMed Central

Brown, Bonnie L.; Watson, Mick; Minot, Samuel S.; Rivera, Maria C.; Franklin, Rima B.

2017-01-01

Background: Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10^4 bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level. 
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10^3 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis limit. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads. 
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. PMID:28327976
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

PubMed

Brown, Bonnie L; Watson, Mick; Minot, Samuel S; Rivera, Maria C; Franklin, Rima B

2017-03-01

Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10^4 bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level. 
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10^3 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis limit. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads. 
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. © The Author 2017. Published by Oxford University Press.
Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications.

PubMed

Kim, Young-Kyu; Park, Chong-wook; Kim, Ki-Joong

2009-03-31

The chloroplast DNA sequences of Megaleranthis saniculifolia, an endemic and monotypic endangered plant species, were completed in this study (GenBank FJ597983). The genome is 159,924 bp in length. It harbors a pair of IR regions consisting of 26,608 bp each. The lengths of the LSC and SSC regions are 88,326 bp and 18,382 bp, respectively. The structural organizations, gene and intron contents, gene orders, AT contents, codon usages, and transcription units of the Megaleranthis chloroplast genome are similar to those of typical land plant cp DNAs. However, the detailed features of Megaleranthis chloroplast genomes are substantially different from that of Ranunculus, which belongs to the same family, the Ranunculaceae. First, the Megaleranthis cp DNA was 4,797 bp longer than that of Ranunculus due to an expanded IR region into the SSC region and duplicated sequence elements in several spacer regions of the Megaleranthis cp genome. Second, the chloroplast genomes of Megaleranthis and Ranunculus evidence 5.6% sequence divergence in the coding regions, 8.9% sequence divergence in the intron regions, and 18.7% sequence divergence in the intergenic spacer regions, respectively. In both the coding and noncoding regions, average nucleotide substitution rates differed markedly, depending on the genome position. Our data strongly implicate the positional effects of the evolutionary modes of chloroplast genes. The genes evidencing higher levels of base substitutions also have higher incidences of indel mutations and low Ka/Ks ratios. A total of 54 simple sequence repeat loci were identified from the Megaleranthis cp genome. The existence of rich cp SSR loci in the Megaleranthis cp genome provides a rare opportunity to study the population genetic structures of this endangered species. Our phylogenetic trees based on the two independent markers, the nuclear ITS and chloroplast matK sequences, strongly support the inclusion of the Megaleranthis to the Trollius. 
Therefore, our molecular trees support Ohwi's original treatment of Megaleranthis saniculifolia as Trollius chosenensis Ohwi.
Targeted genomic enrichment and sequencing of CyHV-3 from carp tissues confirms low nucleotide diversity and mixed genotype infections.

PubMed

Hammoumi, Saliha; Vallaeys, Tatiana; Santika, Ayi; Leleux, Philippe; Borzym, Ewa; Klopp, Christophe; Avarre, Jean-Christophe

2016-01-01

Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus is relatively abundant in the literature, still little is known about its genomic diversity and about the molecular mechanisms that lead to such a high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched with CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated to the estimated number of viral copies contained in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2×10^7. The average sequencing depth was >200 for all samples, thus allowing the search for variants with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% of sequence identity). By enabling full genome comparisons directly from infected fish tissues, this new method will be valuable to trace outbreaks rapidly and at a reasonable cost, and in turn to understand the transmission routes of CyHV-3.

Targeted genomic enrichment and sequencing of CyHV-3 from carp tissues confirms low nucleotide diversity and mixed genotype infections

PubMed Central

Hammoumi, Saliha; Vallaeys, Tatiana; Santika, Ayi; Leleux, Philippe; Borzym, Ewa; Klopp, Christophe; Avarre, Jean-Christophe

2016-01-01

Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus is relatively abundant in the literature, still little is known about its genomic diversity and about the molecular mechanisms that lead to such a high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched with CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated to the estimated number of viral copies contained in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2×10^7. The average sequencing depth was >200 for all samples, thus allowing the search for variants with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% of sequence identity). By enabling full genome comparisons directly from infected fish tissues, this new method will be valuable to trace outbreaks rapidly and at a reasonable cost, and in turn to understand the transmission routes of CyHV-3. PMID:27703859
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.

PubMed

Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio

2017-10-06

Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output, affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Novel dicistrovirus from bat guano.

PubMed

Reuter, Gábor; Pankovics, Péter; Gyöngyi, Zoltán; Delwart, Eric; Boros, Akos

2014-12-01

A novel dicistrovirus (strain NB-1/2011/HUN, KJ802403) genome was detected from guano collected from an insectivorous bat (species Pipistrellus pipistrellus) in Hungary, using viral metagenomics. The complete genome of NB-1 is 9136 nt in length, excluding the poly(A) tail. NB-1 has a genome organization typical of a dicistrovirus with multiple 3B(VPg) and a cripavirus-like intergenic region (IGR)-IRES. NB-1 shares only 41% average amino acid sequence identity with capsid proteins of Himetobi P virus, indicating a potential novel species in the genus Cripavirus, family Dicistroviridae.
Analysis of copy number variations in Holstein-Friesian cow genomes based on whole-genome sequence data.

PubMed

Mielczarek, M; Frąszczak, M; Giannico, R; Minozzi, G; Williams, John L; Wojdak-Maksymiec, K; Szyda, J

2017-07-01

Thirty-two whole genome DNA sequences of cows were analyzed to evaluate inter-individual variability in the distribution and length of copy number variations (CNV) and to functionally annotate CNV breakpoints. The total number of deletions per individual varied between 9,731 and 15,051, whereas the number of duplications was between 1,694 and 5,187. Most of the deletions (81%) and duplications (86%) were unique to a single cow. No relation between the pattern of variant sharing and a family relationship or disease status was found. The animal-averaged length of deletions was from 5,234 to 9,145 bp and the average length of duplications was between 7,254 and 8,843 bp. Highly significant inter-individual variation in length and number of CNV was detected for both deletions and duplications. The majority of deletion and duplication breakpoints were located in intergenic regions and introns, whereas fewer were identified in noncoding transcripts and splice regions. Only 1.35 and 0.79% of the deletion and duplication breakpoints were observed within coding regions. A gene with the highest number of deletion breakpoints codes for protein kinase cGMP-dependent type I, whereas the T-cell receptor α constant gene had the most duplication breakpoints. The functional annotation of genes with the largest incidence of deletion/duplication breakpoints identified 87/112 Kyoto Encyclopedia of Genes and Genomes pathways, but none of the pathways were significantly enriched or depleted with breakpoints. The analysis of Gene Ontology (GO) terms revealed that a cluster with the highest enrichment score among genes with many deletion breakpoints was represented by GO terms related to ion transport, whereas the GO term cluster mostly enriched among the genes with many duplication breakpoints was related to binding of macromolecules. 
Furthermore, when considering the number of deletion breakpoints per gene functional category, no significant differences were observed between the "housekeeping" and "strong selection" categories, but genes representing the "low selection pressure" group showed a significantly higher number of breakpoints. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
The randomized benchmarking number is not what you think it is

NASA Astrophysics Data System (ADS)

Proctor, Timothy; Rudinger, Kenneth; Blume-Kohout, Robin; Sarovar, Mohan; Young, Kevin

Randomized benchmarking (RB) is a widely used technique for characterizing a gate set, whereby random sequences of gates are used to probe the average behavior of the gate set. The gates are chosen to ideally compose to the identity, and the rate of decay in the survival probability of an initial state with increasing length sequences is extracted from a set of experiments; this is the 'RB number'. For reasonably well-behaved noise and particular gate sets, it has been claimed that the RB number is a reliable estimate of the average gate fidelity (AGF) of each noisy gate to the ideal target unitary, averaged over all gates in the set. Contrary to this widely held view, we show that this is not the case. We show that there are physically relevant situations, in which RB was thought to be provably reliable, where the RB number is many orders of magnitude away from the AGF. These results have important implications for interpreting the RB protocol, and immediate consequences for many advanced RB techniques. Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

PubMed Central

Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

2013-01-01

In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolism, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105
Construction of a full-length enriched cDNA library and preliminary analysis of expressed sequence tags from Bengal Tiger Panthera tigris tigris.

PubMed

Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

2013-05-24

In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolism, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.
High-resolution picture of a venom gland transcriptome: case study with the marine snail Conus consors.

PubMed

Terrat, Yves; Biass, Daniel; Dutertre, Sébastien; Favreau, Philippe; Remm, Maido; Stöcklin, Reto; Piquemal, David; Ducancel, Frédéric

2012-01-01

Although cone snail venoms have been intensively investigated in the past few decades, little is known about the whole conopeptide and protein content in venom ducts, especially at the transcriptomic level. Although most of the previous studies, which focused on a limited number of sequences, have contributed to a better understanding of conopeptide superfamilies, they did not give access to a complete panorama of a whole venom duct. Additionally, rare transcripts were usually not identified due to sampling effects. This work presents the data and analysis of a large number of sequences obtained from high throughput 454 sequencing technology using venom ducts of Conus consors, an Indo-Pacific living piscivorous cone snail. A total of 213,561 Expressed Sequence Tags (ESTs) with an average read length of 218 base pairs (bp) have been obtained. These reads were assembled into 65,536 contiguous DNA sequences (contigs) then into 5039 clusters. The data revealed 11 conopeptide superfamilies representing a total of 53 new isoforms (full length or nearly full-length sequences). Considerable isoform diversity and major differences in transcription level could be noted between superfamilies. A, O and M superfamilies are the most diverse. The A family isoforms account for more than 70% of the conopeptide cocktail (considering all ESTs before the clustering step). In addition to traditional superfamilies and families, minor transcripts including both cysteine-free and cysteine-rich peptides could be detected, some of them representing new clades of conopeptides. Finally, several sets of transcripts corresponding to proteins commonly recruited in venom function could be identified for the first time in cone snail venom duct. This work provides one of the first large-scale EST projects for a cone snail venom duct using next-generation sequencing, allowing a detailed overview of the venom duct transcripts. 
This leads to an expanded definition of the overall cone snail venom duct transcriptomic activity, which goes beyond the cysteine-rich conopeptides. For instance, this study enabled the detection of proteins involved in common post-translational maturation and folding, and revealed compounds classically involved in hemolysis and mechanical penetration of the venom into the prey. Further comparison with proteomic and genomic data will lead to a better understanding of conopeptide diversity and the underlying mechanisms involved in conopeptide evolution. Copyright © 2011 Elsevier Ltd. All rights reserved.
First full-length genome sequence of the polerovirus luffa aphid-borne yellows virus (LABYV) reveals the presence of at least two consensus sequences in an isolate from Thailand.

PubMed

Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf

2015-10-01

Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.
Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations.

PubMed

Oikonomopoulos, Spyros; Wang, Yu Chang; Djambazian, Haig; Badescu, Dunarel; Ragoussis, Jiannis

2016-08-24

To assess the performance of the Oxford Nanopore Technologies MinION sequencing platform, cDNAs from the External RNA Controls Consortium (ERCC) RNA Spike-In mix were sequenced. This mix mimics mammalian mRNA species and consists of 92 polyadenylated transcripts with known concentration. cDNA libraries were generated using a template switching protocol to facilitate the direct comparison between different sequencing platforms. The MinION performance was assessed for its ability to sequence the cDNAs directly with good accuracy in terms of abundance and full length. The abundance of the ERCC cDNA molecules sequenced by MinION agreed with their expected concentration. No length or GC content bias was observed. The majority of cDNAs were sequenced as full length. Additionally, a complex cDNA population derived from a human HEK-293 cell line was sequenced on Illumina HiSeq 2500, PacBio RS II and ONT MinION platforms. We observed that there was a good agreement in the measured cDNA abundance between PacBio RS II and ONT MinION (Pearson r = 0.82, isoforms with length more than 700 bp) and between Illumina HiSeq 2500 and ONT MinION (Pearson r = 0.75). This indicates that the ONT MinION can sequence quantitatively both long and short full length cDNA molecules.
De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing

PubMed Central

2011-01-01

Background Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. Results From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. Conclusion The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. 
Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition. PMID:21492485
De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing.

PubMed

Natarajan, Purushothaman; Parani, Madasamy

2011-04-15

Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. 
Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition.
Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

PubMed Central

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-01-01

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
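The N50 values quoted above summarize assembly contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that standard definition follows; the function name and the toy lengths are illustrative, and the study's own pipeline is not described in the abstract.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Toy example: total is 20, and 6 + 5 = 11 is the first running sum >= 10.
print(n50([2, 3, 4, 5, 6]))  # 5

# Genome coverage reported in the abstract: assembled bases / k-mer estimate.
print(round(568_887_315 / 622_000_000 * 100))  # 91
```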
Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

PubMed

Pessôa, Rodrigo; Watanabe, Jaqueline Tomoko; Nukui, Youko; Pereira, Juliana; Casseb, Jorge; Kasseb, Jorge; de Oliveira, Augusto César Penalva; Segurado, Aluisio Cotrim; Sanabani, Sabri Saeed

2014-01-01

Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. 
It is hoped that this valuable genomic data will add to our current understanding of the evolutionary history of this medically important virus.
Effective de novo assembly of fish genome using haploid larvae.

PubMed

Iwasaki, Yuki; Nishiki, Issei; Nakamura, Yoji; Yasuike, Motoshige; Kai, Wataru; Nomura, Kazuharu; Yoshida, Kazunori; Nomura, Yousuke; Fujiwara, Atushi; Kobayashi, Takanori; Ototake, Mitsuru

2016-02-01

Recent improvements in next-generation sequencing technology have made whole-genome sequencing possible even for non-model eukaryotic species with no available reference genome. However, de novo assembly of diploid genomes remains a major challenge because of allelic variation. The aim of this study was to determine the feasibility of utilizing the genome of haploid fish larvae for de novo assembly of whole-genome sequences. We compared the efficiency of assembly using the haploid genome of yellowtail (Seriola quinqueradiata) with that using the diploid genome obtained from the dam. De novo assembly from the haploid and the diploid sequence reads (100 million reads per dataset) generated by the Ion Proton sequencer (200 bp) was performed with two different assembly algorithms, namely overlap-layout-consensus (OLC) and de Bruijn graph (DBG). Assembly of the haploid genome significantly reduced the total number of contigs (by approximately 22% for OLC and 9% for DBG), with longer average and N50 contig lengths, compared with the diploid genome assembly. The haploid assembly also improved the quality of the scaffolds by reducing the number of regions with unassigned nucleotides (Ns) (total length of Ns: 45,331,916 bp for haploids and 67,724,360 bp for diploids) in OLC-based assemblies. The haploid genome assembly appears to be better because allelic variation in the diploid genome disrupts the extension of contigs during the assembly process. Our results indicate that utilizing the genome of haploid larvae leads to a significant improvement in the de novo assembly process, thus providing a novel strategy for the construction of reference genomes from non-model diploid organisms such as fish.
RNA Sequencing Analysis of the Gametophyte Transcriptome from the Liverwort, Marchantia polymorpha

PubMed Central

Sharma, Niharika; Jung, Chol-Hee; Bhalla, Prem L.; Singh, Mohan B.

2014-01-01

The liverwort Marchantia polymorpha is a member of the most basal lineage of land plants (embryophytes) and likely retains many ancestral morphological, physiological and molecular characteristics. Despite its phylogenetic importance and the availability of previous EST studies, M. polymorpha’s lack of economic importance limits accessible genomic resources for this species. We employed Illumina RNA-Seq technology to sequence the gametophyte transcriptome of M. polymorpha. cDNA libraries from 6 different male and female developmental tissues were sequenced to delineate a global view of the M. polymorpha transcriptome. Approximately 80 million short reads were obtained and assembled into a non-redundant set of 46,533 transcripts (≥ 200 bp) from 46,070 loci. The average length and the N50 length of the transcripts were 757 bp and 471 bp, respectively. Sequence comparison of assembled transcripts with non-redundant proteins from embryophytes resulted in the annotation of 43% of the transcripts. The transcripts were also compared with M. polymorpha expressed sequence tags (ESTs), and approximately 69.5% of the transcripts appeared to be novel. Twenty-one percent of the transcripts were assigned GO terms to improve annotation. In addition, 6,112 simple sequence repeats (SSRs) were identified as potential molecular markers, which may be useful in studies of genetic diversity. A comparative genomics approach revealed that a substantial proportion of the genes (35.5%) expressed in M. polymorpha were conserved across phylogenetically related species, such as Selaginella and Physcomitrella, and identified 580 genes that are potentially unique to liverworts. Our study presents an extensive amount of novel sequence information for M. polymorpha.
This information will serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the isolation and characterization of functional genes that are involved in sex differentiation and sexual reproduction in this liverwort. PMID:24841988
A High-Density Genetic Map of Tetraploid Salix matsudana Using Specific Length Amplified Fragment Sequencing (SLAF-seq)

PubMed Central

Li, Min; Li, Yujuan; Wang, Ying; Ma, Xiangjian; Zhang, Yuan; Tan, Feng; Wu, Rongling

2016-01-01

As a salt-tolerant arbor tree species, Salix matsudana plays an important role in afforestation and greening in the coastal areas of China. To select superior Salix varieties that adapt to wide saline areas, it is of paramount importance to understand and identify the mechanisms of salt-tolerance at the level of the whole genome. Here, we describe a high-density genetic linkage map of S. matsudana that represents a good coverage of the Salix genome. An intraspecific F1 hybrid population was established by crossing the salt-sensitive “Yanjiang” variety as the female parent with the salt-tolerant “9901” variety as the male parent. This population, along with its parents, was genotyped by specific length amplified fragment sequencing (SLAF-seq), leading to 277,333 high-quality SLAF markers. By marker analysis, we found that both the parents and offspring were tetraploid. The mean sequencing depth was 53.20-fold for “Yanjiang”, 47.41-fold for “9901”, and 11.02-fold for the offspring. Of the SLAF markers detected, 42,321 are polymorphic with sufficient quality for map construction. The final genetic map was constructed using 6,737 SLAF markers, covering 38 linkage groups (LGs). The genetic map spanned 5,497.45 cM in length, with an average distance of 0.82 cM. As a first high-density genetic map of S. matsudana constructed from salt tolerance-varying varieties, this study will provide a foundation for mapping quantitative trait loci that modulate salt tolerance and resistance in Salix and provide important references for molecular breeding of this important forest tree. PMID:27327501
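The reported average marker distance can be checked from the map length and marker count given above. The sketch below is an assumption about how such a figure is typically derived, since the abstract does not state the formula: it divides total map length by the number of marker intervals, taking one fewer interval than markers within each linkage group. The function name is illustrative.

```python
def mean_marker_spacing(map_length_cm, n_markers, n_linkage_groups):
    """Average distance between adjacent markers, in cM.

    Each linkage group with k markers has k - 1 intervals, so the
    total number of intervals is n_markers - n_linkage_groups.
    """
    intervals = n_markers - n_linkage_groups
    if intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return map_length_cm / intervals

# Values from the abstract: 5,497.45 cM, 6,737 markers, 38 linkage groups.
print(round(mean_marker_spacing(5497.45, 6737, 38), 2))  # 0.82
```

Dividing by the raw marker count instead (5,497.45 / 6,737) also rounds to 0.82 cM, so the reported figure is consistent with either convention.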
Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

PubMed Central

Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren

2015-01-01

There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098
In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

PubMed Central

Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

2011-01-01

To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533
Understanding Adherence and Prescription Patterns Using Large-Scale Claims Data.

PubMed

Bjarnadóttir, Margrét V; Malik, Sana; Onukwugha, Eberechukwu; Gooden, Tanisha; Plaisant, Catherine

2016-02-01

Advanced computing capabilities and novel visual analytics tools now allow us to move beyond the traditional cross-sectional summaries to analyze longitudinal prescription patterns and the impact of study design decisions. For example, design decisions regarding gaps and overlaps in prescription fill data are necessary for measuring adherence using prescription claims data. However, little is known regarding the impact of these decisions on measures of medication possession (e.g., medication possession ratio). The goal of the study was to demonstrate the use of visualization tools for pattern discovery, hypothesis generation, and study design. We utilized EventFlow, a novel discrete event sequence visualization software, to investigate patterns of prescription fills, including gaps and overlaps, utilizing large-scale healthcare claims data. The study analyzes data from individuals who had at least two prescriptions for one of five hypertension medication classes: ACE inhibitors, angiotensin II receptor blockers, beta blockers, calcium channel blockers, and diuretics. We focused on those members initiating therapy with diuretics (19.2%) who may have concurrently or subsequently taken drugs in other classes as well. We identified longitudinal patterns in prescription fills for antihypertensive medications, investigated the implications of decisions regarding gap length and overlaps, and examined the impact on the average cost and adherence of the initial treatment episode. A total of 790,609 individuals are included in the study sample, 19.2% (N = 151,566) of whom started on diuretics first during the study period. The average age was 52.4 years and 53.1% of the population was female. When the allowable gap was zero, 34% of the population had continuous coverage and the average length of continuous coverage was 2 months.
In contrast, when the allowable gap was 30 days, 69% of the population showed a single continuous prescription period with an average length of 5 months. The average prescription cost of the period of continuous coverage ranged from US$3.44 (when the maximum gap was 0 day) to US$9.08 (when the maximum gap was 30 days). Results were less impactful when considering overlaps. This proof-of-concept study illustrates the use of visual analytics tools in characterizing longitudinal medication possession. We find that prescription patterns and associated prescription costs are more influenced by allowable gap lengths than by definitions and treatment of overlap. Research using medication gaps and overlaps to define medication possession in prescription claims data should pay particular attention to the definition and use of gap lengths.
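The effect of the allowable gap can be illustrated with a small sketch that merges prescription fills into continuous coverage episodes. Everything here is an assumption for illustration (the function name, the `(fill_date, days_supply)` input format, and the choice to ignore carry-over of overlapping supply); it is not the EventFlow implementation or the study's exact rule.

```python
from datetime import date, timedelta

def coverage_episodes(fills, allowable_gap_days):
    """Merge (fill_date, days_supply) records into continuous episodes.

    A fill extends the current episode when it occurs no more than
    allowable_gap_days after the previous supply ran out. Overlapping
    supply is not carried forward (a simplifying assumption).
    """
    episodes = []
    for fill_date, days_supply in sorted(fills):
        run_out = fill_date + timedelta(days=days_supply)
        if episodes and (fill_date - episodes[-1][1]).days <= allowable_gap_days:
            start, end = episodes[-1]
            episodes[-1] = (start, max(end, run_out))
        else:
            episodes.append((fill_date, run_out))
    return episodes

# Hypothetical patient: three 30-day fills with a 10-day gap after the first.
fills = [(date(2020, 1, 1), 30), (date(2020, 2, 10), 30), (date(2020, 3, 11), 30)]
print(len(coverage_episodes(fills, 0)))   # 2 episodes when no gap is allowed
print(len(coverage_episodes(fills, 30)))  # 1 episode with a 30-day allowable gap
```

The same fill history yields shorter or longer "continuous" periods depending only on the gap parameter, which is the sensitivity the abstract highlights.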

Predicting protein crystallization propensity from protein sequence

PubMed Central

2011-01-01

The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from a primary sequence correlate with the protein’s propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties were computed for over 1,300 proteins that expressed well but were insoluble, and for ~720 unique proteins that resulted in X-ray structures. The correlation of the protein’s iso-electric point and grand average hydropathy (GRAVY) with crystallizability was analyzed for full length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses we have identified a set of the attributes correlating with a protein’s propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on these. We have created applications to analyze and provide optimal boundary information for query sequences and to visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor. PMID:20177794
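GRAVY, one of the sequence-derived properties mentioned above, is conventionally computed as the mean Kyte-Doolittle hydropathy over all residues. The sketch below assumes that scale, which the abstract does not name explicitly; the function name and the example sequence are illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathy: mean hydropathy per residue."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    # A KeyError here signals a non-standard residue code.
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

# Hypothetical short peptide; positive values indicate a more hydrophobic protein.
print(round(gravy("MKTAYIAKQR"), 3))
```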
[Identification of medicinal plant Dendrobium based on the chloroplast psbK-psbI intergenic spacer].

PubMed

Yao, Hui; Yang, Pei; Zhou, Hong; Ma, Shuang-jiao; Song, Jing-yuan; Chen, Shi-lin

2015-06-01

In this paper, the chloroplast psbK-psbI intergenic spacers of 18 species of Dendrobium and their adulterants were amplified and sequenced, and then the sequence characteristics were analyzed. The sequence lengths of chloroplast psbK-psbI regions of Dendrobium ranged from 474 to 513 bp and the GC contents were 25.4%-27.6%. There were 71 variable sites and 46 informative sites. The inter-specific genetic distances of Dendrobium calculated by the Kimura 2-parameter (K2P) model were 0.0061-0.0581, with an average of 0.0284. The K2P genetic distances between Dendrobium species and Bulbophyllum odoratissimum were 0.0932-0.1204. The NJ tree showed that the Dendrobium species can be easily differentiated from each other and 6 samples of the inspected Dendrobium species were identified successfully through sequencing the psbK-psbI intergenic spacer. Therefore, the chloroplast psbK-psbI intergenic spacer can be used as a candidate marker to identify Dendrobium species and their adulterants.
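The K2P distances above come from the Kimura 2-parameter model, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. A minimal sketch follows; the function name is illustrative, and skipping gapped or ambiguous sites pairwise is an assumption, since the abstract does not describe gap handling.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguity codes (assumed pairwise deletion).
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError for saturated pairs where the model breaks down.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# One transition in ten sites: P = 0.1, Q = 0.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```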
Geologic Inheritance and Earthquake Rupture Processes: The 1905 M ≥ 8 Tsetserleg-Bulnay Strike-Slip Earthquake Sequence, Mongolia

NASA Astrophysics Data System (ADS)

Choi, Jin-Hyuck; Klinger, Yann; Ferry, Matthieu; Ritz, Jean-François; Kurtz, Robin; Rizza, Magali; Bollinger, Laurent; Davaasambuu, Battogtokh; Tsend-Ayush, Nyambayar; Demberel, Sodnomsambuu

2018-02-01

In 1905, 14 days apart, two M 8 continental strike-slip earthquakes, the Tsetserleg and Bulnay earthquakes, occurred on the Bulnay fault system, in Mongolia. Together, they ruptured four individual faults, with a total length of 676 km. Using submetric optical satellite images "Pleiades" with ground resolution of 0.5 m, complemented by field observation, we mapped in detail the entire surface rupture associated with this earthquake sequence. Surface rupture along the main Bulnay fault is 388 km in length, striking nearly E-W. The rupture is formed by a series of fault segments that are 29 km long on average, separated by geometric discontinuities. Although there is a difference of about 2 m in the average slip between the western and eastern parts of the Bulnay rupture, along-fault slip variations are overall limited, resulting in a smooth slip distribution, except for local slip deficit at segment boundaries. We show that damage, including short branches and secondary faulting, associated with the rupture propagation, occurred significantly more often along the western part of the Bulnay rupture, while the eastern part of the rupture appears more localized and thus possibly structurally simpler. Eventually, the difference of slip between the western and eastern parts of the rupture is attributed to this difference of rupture localization, associated at first order with a lateral change in the local geology. Damage associated with rupture branching appears to be located asymmetrically along the extensional side of the strike-slip rupture and shows a strong dependence on structural geologic inheritance.
A Consensus Genetic Map for Pinus taeda and Pinus elliottii and Extent of Linkage Disequilibrium in Two Genotype-Phenotype Discovery Populations of Pinus taeda

PubMed Central

Westbrook, Jared W.; Chhatre, Vikram E.; Wu, Le-Shin; Chamala, Srikar; Neves, Leandro Gomide; Muñoz, Patricio; Martínez-García, Pedro J.; Neale, David B.; Kirst, Matias; Mockaitis, Keithanne; Nelson, C. Dana; Peter, Gary F.; Echt, Craig S.

2015-01-01

A consensus genetic map for Pinus taeda (loblolly pine) and Pinus elliottii (slash pine) was constructed by merging three previously published P. taeda maps with a map from a pseudo-backcross between P. elliottii and P. taeda. The consensus map positioned 3856 markers via genotyping of 1251 individuals from four pedigrees. It is the densest linkage map for a conifer to date. Average marker spacing was 0.6 cM and total map length was 2305 cM. Functional predictions of mapped genes were improved by aligning expressed sequence tags used for marker discovery to full-length P. taeda transcripts. Alignments to the P. taeda genome mapped 3305 scaffold sequences onto 12 linkage groups. The consensus genetic map was used to compare the genome-wide linkage disequilibrium in a population of distantly related P. taeda individuals (ADEPT2) used for association genetic studies and a multiple-family pedigree used for genomic selection (CCLONES). The prevalence and extent of LD was greater in CCLONES as compared to ADEPT2; however, extended LD with LGs or between LGs was rare in both populations. The average squared correlations, r², between SNP alleles less than 1 cM apart were less than 0.05 in both populations and r² did not decay substantially with genetic distance. The consensus map and analysis of linkage disequilibrium establish a foundation for comparative association mapping and genomic selection in P. taeda and P. elliottii. PMID:26068575
A meiotic linkage map of the silver fox, aligned and compared to the canine genome.

PubMed

Kukekova, Anna V; Trut, Lyudmila N; Oskina, Irina N; Johnson, Jennifer L; Temnykh, Svetlana V; Kharlamova, Anastasiya V; Shepeleva, Darya V; Gulievich, Rimma G; Shikhevich, Svetlana G; Graphodatsky, Alexander S; Aguirre, Gustavo D; Acland, Gregory M

2007-03-01

A meiotic linkage map is essential for mapping traits of interest and is often the first step toward understanding a cryptic genome. Specific strains of silver fox (a variant of the red fox, Vulpes vulpes), which segregate behavioral and morphological phenotypes, create a need for such a map. One such strain, selected for docility, exhibits friendly dog-like responses to humans, in contrast to another strain selected for aggression. Development of a fox map is facilitated by the known cytogenetic homologies between the dog and fox, and by the availability of high resolution canine genome maps and sequence data. Furthermore, the high genomic sequence identity between dog and fox allows adaptation of canine microsatellites for genotyping and meiotic mapping in foxes. Using 320 such markers, we have constructed the first meiotic linkage map of the fox genome. The resulting sex-averaged map covers 16 fox autosomes and the X chromosome with an average inter-marker distance of 7.5 cM. The total map length corresponds to 1480.2 cM. From comparison of sex-averaged meiotic linkage maps of the fox and dog genomes, suppression of recombination in pericentromeric regions of the metacentric fox chromosomes was apparent, relative to the corresponding segments of acrocentric dog chromosomes. Alignment of the fox meiotic map against the 7.6x canine genome sequence revealed high conservation of marker order between homologous regions of the two species. The fox meiotic map provides a critical tool for genetic studies in foxes and identification of genetic loci and genes implicated in fox domestication.
Coiled-coil length: Size does matter.

PubMed

Surkont, Jaroslaw; Diekmann, Yoan; Ryder, Pearl V; Pereira-Leal, Jose B

2015-12-01

Protein evolution is governed by processes that alter primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage that decreases metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to extensively vary their length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif present in all domains of life, that frequently plays a structural role in the cell. We observed that, despite the repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation, and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints.
Transcriptome analysis of Capsicum annuum varieties Mandarin and Blackcluster: assembly, annotation and molecular marker discovery.

PubMed

Ahn, Yul-Kyun; Tripathi, Swati; Kim, Jeong-Ho; Cho, Young-Il; Lee, Hye-Eun; Kim, Do-Sun; Woo, Jong-Gyu; Cho, Myeong-Cheoul

2014-01-10

Next generation sequencing technologies have proven to be a rapid and cost-effective means to assemble and characterize gene content and identify molecular markers in various organisms. Pepper (Capsicum annuum L., Solanaceae) is a major staple vegetable crop, which is economically important and has worldwide distribution. High-throughput transcriptome profiling of two pepper cultivars, Mandarin and Blackcluster, using 454 GS-FLX pyrosequencing yielded 279,221 and 316,357 sequenced reads with a total of 120.44 and 142.54 Mb of sequence data (average read length of 431 and 450 nucleotides). These reads were grouped into 17,525 and 16,341 'isogroups' and were assembled into 19,388 and 18,057 isotigs, with 22,217 and 13,153 singletons, for the two cultivars, respectively. Assembled sequences were annotated functionally based on homology to genes in multiple public databases. Detailed sequence variant analysis identified a total of 9701 and 12,741 potential SNPs, which, after examining the SNP frequency distribution for each mapped unigene, resulted in 1025 and 1059 genotype-specific SNPs for the two varieties, respectively. These markers for pepper will be highly valuable for marker-assisted breeding and other genetic studies.
Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

PubMed Central

Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

2016-01-01

Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed sequence tags (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies. PMID:26960153
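To illustrate what mining SSRs from unigene sequences involves, here is a minimal regular-expression sketch. It is not MISA and does not reproduce the settings used in the study. The function name and the repeat thresholds (at least 6 dinucleotide or 5 trinucleotide repeats) are assumptions chosen for the example.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    min_repeats maps motif length -> minimum number of tandem copies.
    The default thresholds are illustrative assumptions.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


example = "TTT" + "AG" * 7 + "CCC" + "GAT" * 5 + "A"
print(find_ssrs(example))
```

On the toy sequence this reports one dinucleotide and one trinucleotide repeat. A real pipeline would also handle compound repeats and design flanking primers, which this sketch leaves out.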
Metataxonomics reveal vultures as a reservoir for Clostridium perfringens.

PubMed

Meng, Xiangli; Lu, Shan; Yang, Jing; Jin, Dong; Wang, Xiaohong; Bai, Xiangning; Wen, Yumeng; Wang, Yiting; Niu, Lina; Ye, Changyun; Rosselló-Móra, Ramon; Xu, Jianguo

2017-02-22

The Old World vulture may carry and spread pathogens for emerging infections since they feed on the carcasses of dead animals and participate in the sky burials of humans, some of whom have died from communicable diseases. Therefore, we studied the precise fecal microbiome of the Old World vulture with metataxonomics, integrating the high-throughput sequencing of almost full-length small subunit ribosomal RNA (16S rRNA) gene amplicons in tandem with the operational phylogenetic unit (OPU) analysis strategy. Nine vultures of three species were sampled using rectal swabs on the Qinghai-Tibet Plateau, China. Using the Pacific Biosciences sequencing platform, we obtained 54 135 high-quality reads of 16S rRNA amplicons with an average of 1442±6.9 bp in length and 6015±1058 reads per vulture. Those sequences were classified into 314 OPUs, including 102 known species, 50 yet to be described species and 161 unknown new lineages of uncultured representatives. Forty-five species have been reported to be responsible for human outbreaks or infections, and 23 yet to be described species belong to genera that include pathogenic species. Only six species were common to all vultures. Clostridium perfringens was the most abundant and present in all vultures, accounting for 30.8% of the total reads. Therefore, using the new technology, we found that vultures are an important reservoir for C. perfringens as evidenced by the isolation of 107 strains encoding for virulence genes, representing 45 sequence types. Our study suggests that the soil-related C. perfringens and other pathogens could have a reservoir in vultures and other animals.
Metataxonomics reveal vultures as a reservoir for Clostridium perfringens

PubMed Central

Meng, Xiangli; Lu, Shan; Yang, Jing; Jin, Dong; Wang, Xiaohong; Bai, Xiangning; Wen, Yumeng; Wang, Yiting; Niu, Lina; Ye, Changyun; Rosselló-Móra, Ramon; Xu, Jianguo

2017-01-01

The Old World vulture may carry and spread pathogens for emerging infections since they feed on the carcasses of dead animals and participate in the sky burials of humans, some of whom have died from communicable diseases. Therefore, we studied the precise fecal microbiome of the Old World vulture with metataxonomics, integrating the high-throughput sequencing of almost full-length small subunit ribosomal RNA (16S rRNA) gene amplicons in tandem with the operational phylogenetic unit (OPU) analysis strategy. Nine vultures of three species were sampled using rectal swabs on the Qinghai-Tibet Plateau, China. Using the Pacific Biosciences sequencing platform, we obtained 54 135 high-quality reads of 16S rRNA amplicons with an average of 1442±6.9 bp in length and 6015±1058 reads per vulture. Those sequences were classified into 314 OPUs, including 102 known species, 50 yet to be described species and 161 unknown new lineages of uncultured representatives. Forty-five species have been reported to be responsible for human outbreaks or infections, and 23 yet to be described species belong to genera that include pathogenic species. Only six species were common to all vultures. Clostridium perfringens was the most abundant and present in all vultures, accounting for 30.8% of the total reads. Therefore, using the new technology, we found that vultures are an important reservoir for C. perfringens as evidenced by the isolation of 107 strains encoding for virulence genes, representing 45 sequence types. Our study suggests that the soil-related C. perfringens and other pathogens could have a reservoir in vultures and other animals. PMID:28223683
The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction

PubMed Central

2012-01-01

Background: Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results: We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions: We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S. lycopersicum, S. tuberosum, Capsicum spp, S. melongena and Petunia spp. PMID:22533342
The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction.

PubMed

Garzón-Martínez, Gina A; Zhu, Z Iris; Landsman, David; Barrero, Luz S; Mariño-Ramírez, Leonardo

2012-04-25

Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI's BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S. lycopersicum, S. tuberosum, Capsicum spp, S. melongena and Petunia spp.
Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

PubMed

Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

2017-06-01

Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Next-generation sequencing of the yellowfin tuna mitochondrial genome reveals novel phylogenetic relationships within the genus Thunnus.

PubMed

Guo, Liang; Li, Mingming; Zhang, Heng; Yang, Sen; Chen, Xinghan; Meng, Zining; Lin, Haoran

2016-05-01

Recently, next-generation sequencing (NGS) technology has become a powerful tool for sequencing the teleost mitochondrial genome (mitogenome). Here, we used this technology to determine the mitogenome of the yellowfin tuna (Thunnus albacares). A total of 41,378 reads were generated on the Illumina platform with an average depth of 250×. The mitogenome (16,528 bp in length) contained 37 mitochondrial genes with a gene order similar to that of other typical teleosts. These mitochondrial genes were encoded on the heavy strand except for ND6 and eight tRNA genes. The result of the phylogenetic analysis supported two distinct clades dividing the genus Thunnus, but the tuna species in these two genetic clades differed from those of the two recognized subgenera based on anatomical characters and geographical distribution. Our results might help to understand the structure, function, and evolutionary history of the yellowfin tuna mitogenome and also provide valuable new insights into the phylogenetic affinity of tuna species.
The de novo transcriptome and its analysis in the worldwide vegetable pest, Delia antiqua (Diptera: Anthomyiidae).

PubMed

Zhang, Yu-Juan; Hao, Youjin; Si, Fengling; Ren, Shuang; Hu, Ganyu; Shen, Li; Chen, Bin

2014-03-10

The onion maggot Delia antiqua is a major insect pest of cultivated vegetables, especially the onion, and a good model to investigate the molecular mechanisms of diapause. To better understand the biology and diapause mechanism of the insect pest species, D. antiqua, the transcriptome was sequenced using Illumina paired-end sequencing technology. Approximately 54 million reads were obtained, trimmed, and assembled into 29,659 unigenes, with an average length of 607 bp and an N50 of 818 bp. Among these unigenes, 21,605 (72.8%) were annotated in the public databases. All unigenes were then compared against Drosophila melanogaster and Anopheles gambiae. Codon usage bias was analyzed and 332 simple sequence repeats (SSRs) were detected in this organism. These data represent the most comprehensive transcriptomic resource currently available for D. antiqua and will facilitate the study of genetics, genomics, diapause, and further pest control of D. antiqua. Copyright © 2014 Zhang et al.
An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.

PubMed

Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W; Puiu, Daniela; Wegrzyn, Jill L; Yorke, James A; Langley, Charles H; Neale, David B; Salzberg, Steven L

2017-01-01

The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821, 61% larger than the previous assembly. © The Author 2017. Published by Oxford University Press.
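The abstract describes N50 as a weighted average contig size. A minimal sketch of the usual definition follows: N50 is the length L such that contigs of length at least L together contain half or more of the assembled bases. The function name and the toy contig lengths are illustrative, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


toy_contigs = [8, 8, 2, 2]          # hypothetical lengths
print(n50(toy_contigs))             # N50 of the toy assembly
print(sum(toy_contigs) / len(toy_contigs))  # plain mean, for comparison
```

The comparison with the plain mean shows why N50 is preferred for fragmented assemblies: many tiny contigs pull the mean down but barely move N50.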
The complete mitochondrial genome of the desert darkling beetle Asbolus verrucosus (Coleoptera, Tenebrionidae).

PubMed

Rider, Stanley Dean

2016-07-01

The complete mitochondrial genome of the desert darkling beetle Asbolus verrucosus (LeConte, 1851) was sequenced using paired-end technology to an average depth of 42,111× and assembled using De Bruijn graph-based methods. The genome is 15,828 bp in length and conforms to the basal arthropod mitochondrial gene composition with the same gene orders and orientations as other darkling beetle mitochondria. This arrangement includes a control region, 22 tRNA genes, 2 rRNA genes and 13 protein-coding genes. The main coding strand is probably replicated as the lagging strand (GC skew of -0.36 and AT skew of +0.19). Phylogenomics analyses are consistent with taxonomic classifications and indicate that Tenebrio molitor is the closest relative that has a completely sequenced mitochondrial genome available for analysis. This is the first fully assembled mitogenome sequence for a darkling beetle in the subfamily Pimeliinae and will be useful for population studies on members of this ecologically important group of beetles.
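The GC and AT skew values quoted above are strand-composition statistics. The abstract does not give the formulas, so the sketch below assumes the conventional definitions, GC skew = (G − C)/(G + C) and AT skew = (A − T)/(A + T), computed on one strand. The function names and the test string are illustrative.

```python
def _skew(seq: str, plus: str, minus: str) -> float:
    """(count(plus) - count(minus)) / (count(plus) + count(minus))."""
    p, m = seq.count(plus), seq.count(minus)
    if p + m == 0:
        raise ValueError(f"sequence contains no {plus} or {minus}")
    return (p - m) / (p + m)


def gc_skew(seq: str) -> float:
    return _skew(seq.upper(), "G", "C")


def at_skew(seq: str) -> float:
    return _skew(seq.upper(), "A", "T")


toy = "AAATGGC"  # hypothetical sequence: A=3, T=1, G=2, C=1
print(gc_skew(toy), at_skew(toy))
```

A negative GC skew together with a positive AT skew, as reported for this mitogenome, means the analysed strand carries more C than G and more A than T.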
Chloroplast microsatellite markers for Artocarpus (Moraceae) developed from transcriptome sequences

PubMed Central

Gardner, Elliot M.; Laricchia, Kristen M.; Murphy, Matthew; Ragone, Diane; Scheffler, Brian E.; Simpson, Sheron; Williams, Evelyn W.; Zerega, Nyree J. C.

2015-01-01

Premise of the study: Chloroplast microsatellite loci were characterized from transcriptomes of Artocarpus altilis (breadfruit) and A. camansi (breadnut). They were tested in A. odoratissimus (terap) and A. altilis and evaluated in silico for two congeners. Methods and Results: Fifteen simple sequence repeats (SSRs) were identified in chloroplast sequences from four Artocarpus transcriptome assemblies. The markers were evaluated using capillary electrophoresis in A. odoratissimus (105 accessions) and A. altilis (73). They were also evaluated in silico in A. altilis (10), A. camansi (6), and A. altilis × A. mariannensis (7) transcriptomes. All loci were polymorphic in at least one species, with all 15 polymorphic in A. camansi. Per species, average alleles per locus ranged between 2.2 and 2.5. Three loci had evidence of fragment-length homoplasy. Conclusions: These markers will complement existing nuclear markers by enabling confident identification of maternal and clone lines, which are often important in vegetatively propagated crops such as breadfruit. PMID:26421253
Erratum to: An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.

PubMed

Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W; Puiu, Daniela; Wegrzyn, Jill L; Yorke, James A; Langley, Charles H; Neale, David B; Salzberg, Steven L

2017-10-01

The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821, 61% larger than the previous assembly. © The Authors 2017. Published by Oxford University Press.
The association between childhood obesity and tooth eruption.

PubMed

Must, Aviva; Phillips, Sarah M; Tybor, David J; Lividini, Keith; Hayes, Catherine

2012-10-01

Obesity is a growth-promoting process as evidenced by its effect on the timing of puberty. Although studies are limited, obesity has been shown to affect the timing of tooth eruption. Both the timing and sequence of tooth eruption are important to overall oral health. The purpose of this study was to examine the association between obesity and tooth eruption. Data were combined from three consecutive cycles (2001-2006) of the National Health and Nutrition Examination Survey (NHANES) and analyzed to examine associations between the number of teeth erupted (NET) and obesity status (BMI z-score >95th percentile BMI relative to the Centers for Disease Control and Prevention (CDC) growth reference) among children 5 up to 14 years of age, controlling for potential confounding by age, gender, race, and socioeconomic status (SES). Obesity is significantly associated with having a higher average NET during the mixed dentition period. On average, teeth of obese children erupted earlier than nonobese children with obese children having on average 1.44 more teeth erupted than nonobese children, after adjusting for age, gender, and race/ethnicity (P < 0.0001). SES was not a confounder of the observed associations. Obese children, on average, have significantly more teeth erupted than nonobese children after adjusting for gender, age, and race. These findings may have clinical importance in the area of dental and orthodontic medicine both in terms of risk for dental caries due to extended length of time exposed in the oral cavity and sequencing which may increase the likelihood of malocclusions.

High-Resolution Sequence-Function Mapping of Full-Length Proteins

PubMed Central

Kowalsky, Caitlin A.; Klesmith, Justin R.; Stapleton, James A.; Kelly, Vince; Reichkitzer, Nolan; Whitehead, Timothy A.

2015-01-01

Comprehensive sequence-function mapping involves detailing the fitness contribution of every possible single mutation to a gene by comparing the abundance of each library variant before and after selection for the phenotype of interest. Deep sequencing of library DNA allows frequency reconstruction for tens of thousands of variants in a single experiment, yet short read lengths of current sequencers makes it challenging to probe genes encoding full-length proteins. Here we extend the scope of sequence-function maps to entire protein sequences with a modular, universal sequence tiling method. We demonstrate the approach with both growth-based selections and FACS screening, offer parameters and best practices that simplify design of experiments, and present analytical solutions to normalize data across independent selections. Using this protocol, sequence-function maps covering full sequences can be obtained in four to six weeks. Best practices introduced in this manuscript are fully compatible with, and complementary to, other recently published sequence-function mapping protocols. PMID:25790064
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

PubMed

Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

2010-05-07

Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
Gene identification and analysis of transcripts differentially regulated in fracture healing by EST sequencing in the domestic sheep.

PubMed

Hecht, Jochen; Kuhl, Heiner; Haas, Stefan A; Bauer, Sebastian; Poustka, Albert J; Lienau, Jasmin; Schell, Hanna; Stiege, Asita C; Seitz, Volkhard; Reinhardt, Richard; Duda, Georg N; Mundlos, Stefan; Robinson, Peter N

2006-07-05

The sheep is an important model animal for testing novel fracture treatments and other medical applications. Despite these medical uses and the well known economic and cultural importance of the sheep, relatively little research has been performed into sheep genetics, and DNA sequences are available for only a small number of sheep genes. In this work we have sequenced over 47 thousand expressed sequence tags (ESTs) from libraries developed from healing bone in a sheep model of fracture healing. These ESTs were clustered with the previously available 10 thousand sheep ESTs to a total of 19087 contigs with an average length of 603 nucleotides. We used the newly identified sequences to develop RT-PCR assays for 78 sheep genes and measured differential expression during the course of fracture healing between days 7 and 42 postfracture. All genes showed significant shifts at one or more time points. 23 of the genes were differentially expressed between postfracture days 7 and 10, which could reflect an important role for these genes for the initiation of osteogenesis. The sequences we have identified in this work are a valuable resource for future studies on musculoskeletal healing and regeneration using sheep and represent an important head-start for genomic sequencing projects for Ovis aries, with partial or complete sequences being made available for over 5,800 previously unsequenced sheep genes.
Influence of time and length size feature selections for human activity sequences recognition.

PubMed

Fang, Hongqing; Chen, Long; Srinivasan, Raghavendiran

2014-01-01

In this paper, the Viterbi algorithm based on a hidden Markov model is applied to recognize activity sequences from observed sensor events. Alternative selections of the time feature values of sensor events and of the activity length size feature values are tested, and the resulting activity sequence recognition performance of the Viterbi algorithm is evaluated. The results show that selecting larger time feature values of sensor events and/or smaller activity length size feature values yields relatively better activity sequence recognition performance. © 2013 ISA. Published by ISA. All rights reserved.
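For readers unfamiliar with the method named in this record, the following is a minimal sketch of Viterbi decoding on a hidden Markov model. The two activities, three sensor events, and all probabilities are hypothetical toy values, not the model or features used in the paper.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a sequence of observations.

    Works in log space to avoid underflow on long event sequences.
    Assumes every probability used is strictly positive.
    """
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this step.
            best_prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = (
                score[best_prev]
                + math.log(trans_p[best_prev][s])
                + math.log(emit_p[s][obs])
            )
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two activities, three sensor events.
states = ("sleeping", "cooking")
start_p = {"sleeping": 0.6, "cooking": 0.4}
trans_p = {
    "sleeping": {"sleeping": 0.7, "cooking": 0.3},
    "cooking": {"sleeping": 0.4, "cooking": 0.6},
}
emit_p = {
    "sleeping": {"bedroom_motion": 0.5, "hall_motion": 0.4, "kitchen_motion": 0.1},
    "cooking": {"bedroom_motion": 0.1, "hall_motion": 0.3, "kitchen_motion": 0.6},
}
events = ["bedroom_motion", "hall_motion", "kitchen_motion"]
decoded = viterbi(events, states, start_p, trans_p, emit_p)
print(decoded)
```

In the paper's setting the observations would carry time and activity-length features as well; this sketch uses bare event labels only.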
Longer biopsy cores do not increase prostate cancer detection rate: A large-scale cohort study refuting cut-off values indicated in the literature

PubMed Central

Yılmaz, Hasan; Yavuz, Ufuk; Üstüner, Murat; Çiftçi, Seyfettin; Yaşar, Hikmet; Müezzinoğlu, Bahar; Uslubaş, Ali Kemal; Dillioğlugil, Özdal

2017-01-01

Objective: Only a few papers in the literature aimed to evaluate biopsy core lengths. Additionally, studies evaluated the core length with different approaches. We aimed to determine whether prostate cancer (PCa) detection is affected by core lengths according to three different approaches in a large standard cohort and to compare our cut-off values with the published cut-offs. Material and methods: We retrospectively analyzed 1,523 initial consecutive transrectal ultrasound-guided 12-core prostate biopsies. Biopsies were evaluated with respect to total core length (total length of each patient's cores), average core length (total core length divided by total number of cores in each patient), and mean core length (mean length of all cores pooled), and our cut-off values were compared with the published cut-offs. The prostate volumes were categorized into four groups (<30, 30–59.99, 60–119.99, ≥120 cm³) and PCa detection rates in these categories were examined. Results: PCa was found in 41.5% of patients. There was no difference between benign and malignant mean core lengths of the pooled cores (p>0.05). Total core length and average core length were not significantly associated with PCa in multivariate logistic regression analyses (p>0.05). The core lengths (mean, average and total core lengths) increased (p<0.001) and PCa rates decreased (p<0.001) steadily with increasing prostate volume categories. PCa percentages decreased in all categories above the utilized cut-offs for mean (p>0.05), average (p<0.05), and total core lengths (p>0.05). Conclusion: There was no difference between mean core lengths of benign and malignant cores. Total core length and average core length were not significantly associated with PCa. Contrary to the cut-offs used for mean and average core lengths in the published studies, PCa rates decrease as these core lengths increase. Larger studies are necessary for the determination and acceptance of accurate cut-offs. PMID:28861301
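The three core-length measures defined in this abstract are easy to confuse, so a small sketch may help. The data layout (one list of core lengths per patient), the millimetre unit, and the function name are assumptions made for illustration.

```python
from statistics import mean


def core_length_summaries(patients):
    """patients: list of per-patient lists of core lengths (assumed mm).

    Returns (total per patient, average per patient, mean of all cores pooled).
    """
    total = [sum(cores) for cores in patients]
    average = [sum(cores) / len(cores) for cores in patients]
    pooled_mean = mean(length for cores in patients for length in cores)
    return total, average, pooled_mean


# Hypothetical toy data: two patients with 2 and 3 cores.
toy_patients = [[10, 12], [14, 16, 18]]
total, average, pooled = core_length_summaries(toy_patients)
print(total, average, pooled)
```

Total and average core length are per-patient quantities, whereas the mean core length pools every core regardless of patient, so the three need not agree.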
Short Communication: Genetic linkage map of Cucurbita maxima with molecular and morphological markers.

PubMed

Ge, Y; Li, X; Yang, X X; Cui, C S; Qu, S P

2015-05-22

Cucurbita maxima is one of the most widely cultivated vegetables in China and exhibits distinct morphological characteristics. In this study, genetic linkage analysis with 57 simple-sequence repeats, 21 amplified fragment length polymorphisms, 3 random-amplified polymorphic DNA, and one morphological marker revealed 20 genetic linkage groups of C. maxima covering a genetic distance of 991.5 cM with an average of 12.1 cM between adjacent markers. Genetic linkage analysis identified the simple-sequence repeat marker 'PU078072' 5.9 cM away from the locus 'Rc', which controls rind color. The genetic map in the present study will be useful for better mapping, tagging, and cloning of quantitative trait loci/gene(s) affecting economically important traits and for breeding new varieties of C. maxima through marker-assisted selection.
Transcriptomic analysis of Siberian ginseng (Eleutherococcus senticosus) to discover genes involved in saponin biosynthesis.

PubMed

Hwang, Hwan-Su; Lee, Hyoshin; Choi, Yong Eui

2015-03-14

Eleutherococcus senticosus, Siberian ginseng, is a highly valued woody medicinal plant belonging to the family Araliaceae. E. senticosus produces a rich variety of saponins such as oleanane-type, noroleanane-type, 29-hydroxyoleanan-type, and lupane-type saponins. Genomic or transcriptomic approaches have not been used to investigate the saponin biosynthetic pathway in this plant. In this study, de novo sequencing was performed to select candidate genes involved in the saponin biosynthetic pathway. A half-plate 454 pyrosequencing run produced 627,923 high-quality reads with an average sequence length of 422 bases. De novo assembly generated 72,811 unique sequences, including 15,217 contigs and 57,594 singletons. Approximately 48,300 (66.3%) unique sequences were annotated using BLAST similarity searches. All of the mevalonate pathway genes for saponin biosynthesis starting from acetyl-CoA were isolated. Moreover, 206 reads of cytochrome P450 (CYP) and 145 reads of uridine diphosphate glycosyltransferase (UGT) sequences were isolated. Based on methyl jasmonate (MeJA) treatment and real-time PCR (qPCR) analysis, 3 CYPs and 3 UGTs were finally selected as candidate genes involved in the saponin biosynthetic pathway. The identified sequences associated with saponin biosynthesis will facilitate the study of the functional genomics of saponin biosynthesis and genetic engineering of E. senticosus.
Principal sequence pattern analysis of episodes of excess mortality due to heat in the Barcelona metropolitan area.

PubMed

Peña, Juan Carlos; Aran, Montserrat; Raso, José Miguel; Pérez-Zanón, Nuria

2015-04-01

The aim of the study is to classify the synoptic sequences associated with excess mortality during the warm season in the Barcelona metropolitan area. To achieve this purpose, we undertook a principal sequence pattern analysis that incorporates different atmospheric levels, in an attempt at identifying the main features that account for dynamic and thermodynamic atmospheric processes. The sequence length was determined by the short-term displacement between temperature and mortality. To detect this lag, we applied the cross-correlation function to the residuals obtained from the modelling of the daily temperature and mortality series of summer. These residuals were estimated by means of an autoregressive integrated moving average (ARIMA) model. A 7-day sequence emerged as the basic temporal unit for evaluating the synoptic background that triggers the temperature related to excess mortality in the Barcelona metropolitan area. The principal sequence pattern analysis distinguished three main synoptic patterns: two dynamic configurations produced by southern fluxes related to an Atlantic low, which can be associated with heat waves recorded in southern Europe, and a third pattern identified by a stagnation situation associated with the persistence of a blocking anticyclone over Europe, related to heat waves recorded in northern and central western Europe.
Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance.

PubMed

Bashir, Ali; Bansal, Vikas; Bafna, Vineet

2010-06-18

Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with its own characteristics, poses a number of design challenges regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing.
We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.
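To give a feel for the "depth required to detect a transcript with greater than 95% probability" question, here is a deliberately simplified sketch. It assumes each read independently hits the transcript with a known probability p, so the chance of at least one hit in N reads is 1 - (1 - p)^N. This is a textbook approximation, not the correction-based method the authors derive, and the function names are hypothetical.

```python
import math

def detection_probability(p, n_reads):
    """P(at least one read from a transcript sampled with per-read probability p)."""
    # Computed as 1 - (1 - p)**n_reads in a numerically stable form.
    return -math.expm1(n_reads * math.log1p(-p))

def reads_needed(p, target=0.95):
    """Smallest read count whose detection probability reaches the target."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log1p(-p))

# Example: a transcript that accounts for one read in a million (assumed value).
depth = reads_needed(1e-6, target=0.95)
print(depth, detection_probability(1e-6, depth))
```

Under this model a transcript at one part per million needs roughly three million reads, which is why pilot-based estimates of the expression distribution matter for planning.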
Species identification of mutans streptococci by groESL gene sequence.

PubMed

Hung, Wei-Chung; Tsai, Jui-Chang; Hsueh, Po-Ren; Chia, Jean-San; Teng, Lee-Jene

2005-09-01

The near full-length sequences of the groESL genes were determined and analysed among eight reference strains (serotypes a to h) representing five species of mutans group streptococci. The groES sequences from these reference strains revealed that there are two lengths (285 and 288 bp) in the five species. The intergenic spacer between groES and groEL appears to be a unique marker for species, with a variable size (ranging from 111 to 310 bp) and sequence. Phylogenetic analysis of groES and groEL separated the eight serotypes into two major clusters. Strains of serotypes b, c, e and f were highly related and had groES gene sequences of the same length, 288 bp, while strains of serotypes a, d, g and h were also closely related and their groES gene sequence lengths were 285 bp. The groESL sequences in clinical isolates of three serotypes of S. mutans were analysed for intraspecies polymorphism. The results showed that the groESL sequences could provide information for differentiation among species, but were unable to distinguish serotypes of the same species. Based on the determined sequences, a PCR assay was developed that could differentiate members of the mutans streptococci by amplicon size and provide an alternative way for distinguishing mutans streptococci from other viridans streptococci.
Thermoelectric effect and its dependence on molecular length and sequence in single DNA molecules.

PubMed

Li, Yueqi; Xiang, Limin; Palma, Julio L; Asai, Yoshihiro; Tao, Nongjian

2016-04-15

Studying the thermoelectric effect in DNA is important for unravelling charge transport mechanisms and for developing relevant applications of DNA molecules. Here we report a study of the thermoelectric effect in single DNA molecules. By varying the molecular length and sequence, we tune the charge transport in DNA to either a hopping- or a tunnelling-dominated regime. The thermoelectric effect is small and insensitive to the molecular length in the hopping regime. In contrast, the thermoelectric effect is large and sensitive to the length in the tunnelling regime. These findings indicate that one may control the thermoelectric effect in DNA by varying its sequence and length. We describe the experimental results in terms of hopping and tunnelling charge transport models.
Thermoelectric effect and its dependence on molecular length and sequence in single DNA molecules

PubMed Central

Li, Yueqi; Xiang, Limin; Palma, Julio L.; Asai, Yoshihiro; Tao, Nongjian

2016-01-01

Studying the thermoelectric effect in DNA is important for unravelling charge transport mechanisms and for developing relevant applications of DNA molecules. Here we report a study of the thermoelectric effect in single DNA molecules. By varying the molecular length and sequence, we tune the charge transport in DNA to either a hopping- or a tunnelling-dominated regime. The thermoelectric effect is small and insensitive to the molecular length in the hopping regime. In contrast, the thermoelectric effect is large and sensitive to the length in the tunnelling regime. These findings indicate that one may control the thermoelectric effect in DNA by varying its sequence and length. We describe the experimental results in terms of hopping and tunnelling charge transport models. PMID:27079152
Mapping of Micro-Tom BAC-End Sequences to the Reference Tomato Genome Reveals Possible Genome Rearrangements and Polymorphisms

PubMed Central

Asamizu, Erika; Shirasawa, Kenta; Hirakawa, Hideki; Sato, Shusei; Tabata, Satoshi; Yano, Kentaro; Ariizumi, Tohru; Shibata, Daisuke; Ezura, Hiroshi

2012-01-01

A total of 93,682 BAC-end sequences (BESs) were generated from a dwarf model tomato, cv. Micro-Tom. After removing repetitive sequences, the BESs were similarity searched against the reference tomato genome of a standard cultivar, “Heinz 1706.” By referring to the “Heinz 1706” physical map and by eliminating redundant or nonsignificant hits, 28,804 “unique pair ends” and 8,263 “unique ends” were selected to construct hypothetical BAC contigs. The total physical length of the BAC contigs was 495,833,423 bp, covering 65.3% of the entire genome. The average coverage of euchromatin and heterochromatin was 58.9% and 67.3%, respectively. From this analysis, two possible genome rearrangements were identified: one in chromosome 2 (inversion) and the other in chromosome 3 (inversion and translocation). Polymorphisms (SNPs and Indels) between the two cultivars were identified from the BLAST alignments. As a result, 171,792 polymorphisms were mapped on 12 chromosomes. Among these, 30,930 polymorphisms were found in euchromatin (1 per 3,565 bp) and 140,862 were found in heterochromatin (1 per 2,737 bp). The average polymorphism density in the genome was 1 polymorphism per 2,886 bp. To facilitate the use of these data in Micro-Tom research, the BAC contig and polymorphism information are available in the TOMATOMICS database. PMID:23227037
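The density figures in this abstract follow from simple division, which the short sketch below reproduces. It uses only numbers quoted in the abstract (a total contig length of 495,833,423 bp and 171,792 mapped polymorphisms); the function name is a hypothetical helper, not part of any tool mentioned in the record.

```python
def bp_per_polymorphism(total_bp, polymorphism_count):
    """Average number of base pairs per polymorphism."""
    return total_bp / polymorphism_count

TOTAL_CONTIG_BP = 495_833_423   # total physical length of the BAC contigs
EUCHROMATIN_POLYMORPHISMS = 30_930
HETEROCHROMATIN_POLYMORPHISMS = 140_862

# The two compartments add up to the 171,792 polymorphisms reported genome-wide.
total_polymorphisms = EUCHROMATIN_POLYMORPHISMS + HETEROCHROMATIN_POLYMORPHISMS

density = bp_per_polymorphism(TOTAL_CONTIG_BP, total_polymorphisms)
print(f"1 polymorphism per {density:,.0f} bp")
```

The result rounds to the reported genome-wide average of 1 polymorphism per 2,886 bp.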
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly

PubMed Central

Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka

2010-01-01

Background: Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore a fast and efficient sequencing approach was needed. Methodology: We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by conventional primer walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that it also performs well for non-human species. Conclusions: The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Identification of exotic genetic components and DNA methylation pattern analysis of three cotton introgression lines from Gossypium bickii.

PubMed

He, Shou-Pu; Sun, Jun-Ling; Zhang, Chao; Du, Xiong-Ming

2011-01-01

The impact of alien DNA fragments on the plant genome has been studied in many species. However, little is known about the introgression lines of Gossypium. To study the consequences of introgression in Gossypium, we investigated 2000 genomic and 800 epigenetic sites in three typical cotton introgression lines, as well as their cultivar (Gossypium hirsutum) and wild parents (Gossypium bickii), by amplified fragment length polymorphism (AFLP) and methylation-sensitive amplified polymorphism (MSAP). The results demonstrate that an average of 0.5% of exotic DNA segments from wild cotton is transmitted into the genome of each introgression line, with the addition of other forms of genetic variation. In total, an average of 0.7% of genetic variation sites is identified in introgression lines. Simultaneously, the overall cytosine methylation level in each introgression line is very close to that of the upland cotton parent (an average of 22.6%). Further dividing patterns reveal that both hypomethylation and hypermethylation occurred in introgression lines in comparison with the upland cotton parent. Sequencing of nine methylation polymorphism fragments showed that most (7 of 9) of the methylation alterations occurred in the noncoding sequences. The molecular evidence of introgression from wild cotton into introgression lines in our study is identified by AFLP. Moreover, the causes of petal variation in introgression lines are discussed.
Differences in duration of eye fixation for conditions in a numerical stroop-effect experiment.

PubMed

Crespo, Antonio; Cabestrero, Raúl; Quirós, Pilar

2009-02-01

Durations of eye fixation were recorded for a numerical Stroop effect experiment. Participants (6 men, 19 women; M age=22 yr.) reported the number of characters present in sequences of variable length (2 to 5 characters) while attempting to ignore the identity of the character. Three conditions were included: congruent (the number of characters and the numeral were matched, e.g., responding "two" to 22), incongruent (the number of characters and the numeral were mismatched, e.g., responding "two" to 55), and control (baseline of stimuli made up of "X"s, e.g., responding "two" to XX). Comparisons among the three conditions produced the longest response times and average durations of fixation for the incongruent condition. The shortest response times and average durations of fixation were obtained for the congruent condition.
Length Variation, Heteroplasmy and Sequence Divergence in the Mitochondrial DNA of Four Species of Sturgeon (Acipenser)

PubMed Central

Brown, J. R.; Beckenbach, K.; Beckenbach, A. T.; Smith, M. J.

1996-01-01

The extent of mtDNA length variation and heteroplasmy as well as DNA sequences of the control region and two tRNA genes were determined for four North American sturgeon species: Acipenser transmontanus, A. medirostris, A. fulvescens and A. oxyrhynchus. Across the Continental Divide, a division in the occurrence of length variation and heteroplasmy was observed that was concordant with species biogeography as well as with phylogenies inferred from restriction fragment length polymorphisms (RFLP) of whole mtDNA and pairwise comparisons of unique sequences of the control region. In all species, mtDNA length variation was due to repeated arrays of 78-82-bp sequences each containing a D-loop strand synthesis termination associated sequence (TAS). Individual repeats showed greater sequence conservation within individuals and species rather than between species, which is suggestive of concerted evolution. Differences in the frequencies of multiple copy genomes and heteroplasmy among the four species may be ascribed to differences in the rates of recurrent mutation. A mechanism that may offset the high rate of mutation for increased copy number is suggested on the basis that an increase in the number of functional TAS motifs might reduce the frequency of successfully initiated H-strand replications. PMID:8852850
New powerful statistics for alignment-free sequence comparison under a pattern transfer model.

PubMed

Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S; Sun, Fengzhu

2011-09-07

Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D*2 and D(s)2 showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D*2 and D(s)2 by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. Copyright © 2011 Elsevier Ltd. All rights reserved.
New Powerful Statistics for Alignment-free Sequence Comparison Under a Pattern Transfer Model

PubMed Central

Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S.; Sun, Fengzhu

2011-01-01

Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. PMID:21723298
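For readers unfamiliar with the D2 statistic discussed in the two records above, a minimal sketch follows. It uses the standard definition of D2 as the sum, over all words of length k, of the product of that word's counts in the two sequences. The normalized variants and the local-pair statistics proposed in the paper are not implemented, and the function names and example sequences are illustrative assumptions.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def d2(seq_a, seq_b, k):
    """D2 statistic: sum over shared k-words of count_in_a * count_in_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items() if word in counts_b)

# Toy sequences (assumption, for illustration only).
print(d2("ACGTACGT", "TTACGTTT", k=3))
```

Because D2 simply counts matching word pairs, it needs no alignment, which is what makes it attractive for regulatory regions and horizontally transferred genes.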
Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

PubMed

Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

2015-03-31

With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs, Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore, we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

PubMed

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-06-01

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
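The N50 values quoted in this abstract summarize assembly contiguity: N50 is the length L such that contigs (or scaffolds) of length L or longer together make up at least half of the total assembled length. The sketch below computes it for a list of lengths; the function name and the toy lengths are illustrative assumptions, not data from the carnation assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Toy contig lengths in bp (assumption, for illustration only).
print(n50([2, 3, 4, 5, 6]))
```

A higher N50 means that more of the assembly sits in long pieces, which is why the scaffold N50 (60,737 bp) exceeds the contig N50 (16,644 bp) here.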
Microbes in the neonatal intensive care unit resemble those found in the gut of premature infants

PubMed Central

2014-01-01

Background The source inoculum of gastrointestinal tract (GIT) microbes is largely influenced by delivery mode in full-term infants, but these influences may be decoupled in very low birth weight (VLBW, <1,500 g) neonates via conventional broad-spectrum antibiotic treatment. We hypothesize the built environment (BE), specifically room surfaces frequently touched by humans, is a predominant source of colonizing microbes in the gut of premature VLBW infants. Here, we present the first matched fecal-BE time series analysis of two preterm VLBW neonates housed in a neonatal intensive care unit (NICU) over the first month of life. Results Fresh fecal samples were collected every 3 days and metagenomes sequenced on an Illumina HiSeq2000 device. For each fecal sample, approximately 33 swabs were collected from each NICU room from 6 specified areas: sink, feeding and intubation tubing, hands of healthcare providers and parents, general surfaces, and nurse station electronics (keyboard, mouse, and cell phone). Swabs were processed using a recently developed ‘expectation maximization iterative reconstruction of genes from the environment’ (EMIRGE) amplicon pipeline in which full-length 16S rRNA amplicons were sheared and sequenced using an Illumina platform, and short reads reassembled into full-length genes. Over 24,000 full-length 16S rRNA sequences were produced, generating an average of approximately 12,000 operational taxonomic units (OTUs) (clustered at 97% nucleotide identity) per room-infant pair. Dominant gut taxa, including Staphylococcus epidermidis, Klebsiella pneumoniae, Bacteroides fragilis, and Escherichia coli, were widely distributed throughout the room environment with many gut colonizers detected in more than half of samples. 
Reconstructed genomes from infant gut colonizers revealed a suite of genes that confer resistance to antibiotics (for example, tetracycline, fluoroquinolone, and aminoglycoside) and sterilizing agents, which likely offer a competitive advantage in the NICU environment. Conclusions We have developed a high-throughput culture-independent approach that integrates room surveys based on full-length 16S rRNA gene sequences with metagenomic analysis of fecal samples collected from infants in the room. The approach enabled identification of discrete ICU reservoirs of microbes that also colonized the infant gut and provided evidence for the presence of certain organisms in the room prior to their detection in the gut. PMID:24468033
An improved and validated RNA HLA class I SBT approach for obtaining full length coding sequences.

PubMed

Gerritsen, K E H; Olieslagers, T I; Groeneweg, M; Voorter, C E M; Tilanus, M G J

2014-11-01

The functional relevance of human leukocyte antigen (HLA) class I allele polymorphism beyond exons 2 and 3 is difficult to address because more than 70% of the HLA class I alleles are defined by exons 2 and 3 sequences only. For routine application on clinical samples we improved and validated the HLA sequence-based typing (SBT) approach based on RNA templates, using either a single locus-specific or two overlapping group-specific polymerase chain reaction (PCR) amplifications, with three forward and three reverse sequencing reactions for full length sequencing. Locus-specific HLA typing with RNA SBT of a reference panel, representing the major antigen groups, showed identical results compared to DNA SBT typing. Alleles encountered with unknown exons in the IMGT/HLA database and three samples, two with Null and one with a Low expressed allele, have been addressed by the group-specific RNA SBT approach to obtain full length coding sequences. This RNA SBT approach has proven its value in our routine full length definition of alleles. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Memory for tonal pitches: a music-length effect hypothesis.

PubMed

Akiva-Kabiri, Lilach; Vecchi, Tomaso; Granot, Roni; Basso, Demis; Schön, Daniele

2009-07-01

One of the most studied effects of verbal working memory (WM) is the influence of the length of the words that compose the list to be remembered. This work aims to investigate the nature of musical WM by replicating the word length effect in the musical domain. Length and rate of presentation were manipulated in a recognition task of tone sequences. Results showed significant effects for both factors (length and presentation rate) as well as their interaction, suggesting the existence of different strategies (e.g., chunking and rehearsal) for the immediate memory of musical information, depending upon the length of the sequences.
Transcriptome-wide identification of Rauvolfia serpentina microRNAs and prediction of their potential targets.

PubMed

Prakash, Pravin; Rajakani, Raja; Gupta, Vikrant

2016-04-01

MicroRNAs (miRNAs) are small non-coding RNAs of ∼19-24 nucleotides (nt) in length and are considered potent regulators of gene expression at transcriptional and post-transcriptional levels. Here we report the identification and characterization of 15 conserved miRNAs belonging to 13 families from Rauvolfia serpentina through in silico analysis of the available nucleotide dataset. The identified mature R. serpentina miRNAs (rse-miRNAs) ranged between 20 and 22 nt in length, and the average minimal folding free energy index (MFEI) value of rse-miRNA precursor sequences was found to be -0.815 kcal/mol. Using the identified rse-miRNAs as query, their potential targets were predicted in R. serpentina and other plant species. Gene Ontology (GO) annotation showed that predicted targets of rse-miRNAs include transcription factors as well as genes involved in diverse biological processes such as primary and secondary metabolism, stress response, disease resistance, growth, and development. A few rse-miRNAs were predicted to target genes of pharmaceutically important secondary metabolic pathways such as alkaloid and anthocyanin biosynthesis. Phylogenetic analysis showed the evolutionary relationship of rse-miRNAs and their precursor sequences to homologous pre-miRNA sequences from other plant species. The findings of the present study, besides giving first-hand information about R. serpentina miRNAs and their targets, also contribute towards a better understanding of miRNA-mediated gene regulatory processes in plants. Copyright © 2015 Elsevier Ltd. All rights reserved.
Transcriptome Analysis of Leaves, Flowers and Fruits Perisperm of Coffea arabica L. Reveals the Differential Expression of Genes Involved in Raffinose Biosynthesis

PubMed Central

dos Santos, Tiago Benedito; de Oliveira, Fernanda Freitas; Pot, David; Leroy, Thierry; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães

2017-01-01

Coffea arabica L. is an important crop in several developing countries. Despite its economic importance, minimal transcriptome data are available for fruit tissues, especially during fruit development where several compounds related to coffee quality are produced. To understand the molecular aspects related to coffee fruit and grain development, we report a large-scale transcriptome analysis of leaf, flower and perisperm fruit tissue development. Illumina sequencing yielded 41,881,572 high-quality filtered reads. De novo assembly generated 65,364 unigenes with an average length of 1,264 bp. A total of 24,548 unigenes were annotated as protein coding genes, including 12,560 full-length sequences. In the annotation process, we identified nine candidate genes related to the biosynthesis of raffinose family oligosaccharides (RFOs). These sugars confer osmoprotection and are accumulated during initial fruit development. Four genes from this pathway had their transcriptional pattern validated by quantitative reverse transcription polymerase chain reaction (qRT-PCR). Furthermore, we identified ~24,000 putative target sites for microRNAs (miRNAs) and 134 putative transcriptionally active transposable elements (TE) sequences in our dataset. This C. arabica transcriptomic atlas provides an important step for identifying candidate genes related to several coffee metabolic pathways, especially those related to fruit chemical composition and therefore beverage quality. Our results are the starting point for enhancing our knowledge about the coffee genes that are transcribed during the flowering and initial fruit development stages. PMID:28068432
Transcriptome Analysis of Leaves, Flowers and Fruits Perisperm of Coffea arabica L. Reveals the Differential Expression of Genes Involved in Raffinose Biosynthesis.

PubMed

Ivamoto, Suzana Tiemi; Reis, Osvaldo; Domingues, Douglas Silva; Dos Santos, Tiago Benedito; de Oliveira, Fernanda Freitas; Pot, David; Leroy, Thierry; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães; Pereira, Luiz Filipe Protasio

2017-01-01

Coffea arabica L. is an important crop in several developing countries. Despite its economic importance, minimal transcriptome data are available for fruit tissues, especially during fruit development where several compounds related to coffee quality are produced. To understand the molecular aspects related to coffee fruit and grain development, we report a large-scale transcriptome analysis of leaf, flower and perisperm fruit tissue development. Illumina sequencing yielded 41,881,572 high-quality filtered reads. De novo assembly generated 65,364 unigenes with an average length of 1,264 bp. A total of 24,548 unigenes were annotated as protein coding genes, including 12,560 full-length sequences. In the annotation process, we identified nine candidate genes related to the biosynthesis of raffinose family oligosaccharides (RFOs). These sugars confer osmoprotection and are accumulated during initial fruit development. Four genes from this pathway had their transcriptional pattern validated by quantitative reverse transcription polymerase chain reaction (qRT-PCR). Furthermore, we identified ~24,000 putative target sites for microRNAs (miRNAs) and 134 putative transcriptionally active transposable elements (TE) sequences in our dataset. This C. arabica transcriptomic atlas provides an important step for identifying candidate genes related to several coffee metabolic pathways, especially those related to fruit chemical composition and therefore beverage quality. Our results are the starting point for enhancing our knowledge about the coffee genes that are transcribed during the flowering and initial fruit development stages.
First insights into the giant panda (Ailuropoda melanoleuca) blood transcriptome: a resource for novel gene loci and immunogenetics.

PubMed

Du, Lianming; Li, Wujiao; Fan, Zhenxin; Shen, Fujun; Yang, Mingyu; Wang, Zili; Jian, Zuoyi; Hou, Rong; Yue, Bisong; Zhang, Xiuyue

2015-07-01

The giant panda (Ailuropoda melanoleuca) is one of the most famous flagship species for conservation, and its draft genome has recently been assembled. However, the transcriptome is not yet available. In this study, the blood transcriptomes of three pandas were characterized and about 160 million sequencing reads were generated using Illumina HiSeq 2000 paired-end sequencing technology. The assembly yielded 92 598 transcripts with an average length of 1626 bp and N50 length of 2842 bp. Based on a sequence similarity search against nonredundant (nr) protein database, a total of 38 522 (41.6%) transcripts were annotated. Of these annotated transcripts, 25 142 and 8272 transcripts were assigned to gene ontology terms and clusters of orthologous group, respectively. A search against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) indicated that 9098 (9.83%) transcripts mapped to 324 KEGG pathways, and the best represented functional categories of pathways were signal transduction and immune system. We have also identified 23 460 microsatellites, 43 560 SNPs as well as 21 456 alternative splicing events in the assembly. Additionally, a total of 24 341 complete open reading frames (ORFs) were detected from the assembly where 1492 ORFs were found to be novel gene loci as these have not been annotated so far in any public database. © 2014 John Wiley & Sons Ltd.
Comparative oncology DNA sequencing of canine T cell lymphoma via human hotspot panel

PubMed Central

Beheshti, Afshin; Pilichowska, Monika; Burgess, Kristine; Ricks-Santi, Luisel; McNiel, Elizabeth; London, Cheryl B.; Ravi, Dashnamoorthy; Evens, Andrew M.

2018-01-01

T-cell lymphoma (TCL) is an uncommon and aggressive form of human cancer. Lymphoma is the most common hematopoietic tumor in canines (companion animals), with TCL representing approximately 30% of diagnoses. Collectively, the canine is an appealing model for cancer research given the spontaneous occurrence of cancer, intact immune system, and phylogenetic proximity to humans. We sought to establish mutational congruence of the canine with known human TCL mutations in order to identify potential actionable oncogenic pathways. Following pathologic confirmation, DNA was sequenced in 16 canine TCL (cTCL) cases using a custom Human Cancer Hotspot Panel of 68 genes commonly mutated in human TCL. Sequencing identified 4,527,638 total reads with an average length of 229 bases containing 346 unique variants and 1,474 total variants; each sample had an average of 92 variants. Among these, there were 258 germline and 32 somatic variants. Among the 32 somatic variants there were 8 missense variants and 1 splice junction variant; the remainder were intron or synonymous variants. A frequency of 4 somatic mutations per sample was noted, with >7 mutations detected in MET, KDR, STK11 and BRAF. Expression of the associated proteins was also detected via Western blot analyses. In addition, Sanger sequencing confirmed three variants of high quality (MYC, MET, and TP53 missense mutation). Taken together, the mutational spectrum and protein analyses showed mutations in signaling pathways similar to human TCL and also identified novel mutations that may serve as drug targets as well as potential biomarkers. PMID:29854308
Evaluation of MR issues for the latest standard brands of orthopedic metal implants: plates and screws.

PubMed

Zou, Yue-Fen; Chu, Bin; Wang, Chuan-Bing; Hu, Zhi-Yi

2015-03-01

The study was performed to evaluate magnetic resonance (MR) issues for the latest standard brands of plates and screws used in orthopedic surgery at a 1.5-T MR system, including the safety and metallic artifacts. The plates and screws (made of titanium alloy and stainless steel materials, according to the latest standard brands) were assessed for displacement in degrees, MRI-related heating and artifacts at a 1.5-T MR system. The displacement in degrees of the plates and screws was evaluated on an angle-measurement instrument at the entrance of the MR scanner. The MRI-related heating was assessed on a swine leg fixed with a plate by using a "worst-case" pulse sequence. A rectangular water phantom was designed to evaluate metallic artifacts of a screw on different sequences (T1/T2-weighted FSE, STIR, T2-FSE fat saturation, GRE, DWI) and then artifacts were evaluated on T2-weighted FSE sequence by modifying the scanning parameters including field of view (FOV), echo train length (ETL) and bandwidth to identify the influence of parameters on metallic artifacts. Fifteen volunteers with internal vertebral fixation (titanium alloy materials) were scanned with MR using axial and sagittal T2-FSE, sagittal T2-FSE fat suppression and STIR with conventional and optimized parameters, respectively. All images were then graded by two radiologists, each with more than 7 years of experience, under double-blind conditions; that is, neither knew which images belonged to the conventional-parameter group and which to the optimized-parameter group. The average deflection angles of the titanium alloy and stainless steel implants were 4.3° and 7.7°, respectively (less than 45°), which indicated that the magnetically induced force was less than the weight of the object. The deflection angle of the titanium alloy implants was less than that of the stainless steel ones (t=9.69, P<0.001). 
The average temperature change before and after the scan was 0.48°C for the titanium alloy implants and 0.74°C for the stainless steel implants, with a background temperature change of 0.24°C. The water phantom test indicated that the DWI sequence produced the largest artifacts, while the FSE pulse sequence produced the smallest artifacts. The T2-weighted FSE fat saturation sequence produced larger artifacts than the STIR sequence. The parameter tests verified that metallic artifacts increased with longer echo train length and larger FOV, and decreased with larger bandwidth. The interreader agreement was good or excellent for each set of images graded with Cohen's Kappa statistic. Image grades of axial and sagittal T2-FSE with optimized parameters were significantly superior to those with conventional parameters (grade, 3.3±0.5 vs 2.7±0.6, P=0.003; 3.2±0.4 vs 1.9±0.7, P=0.001) and images from the STIR sequence received a better grade than those from the T2-FSE FS sequence (grade, 3.4±0.5 vs 1.7±0.6, P<0.001). The latest standard plates and screws used in orthopedic surgery do not pose an additional hazard or risk to patients undergoing MR imaging at 1.5-T or less. Although the artifacts they cause cannot be ignored because of their relatively large size, they can be minimized by choosing appropriate pulse sequences and optimizing scanning parameters, such as FSE and STIR sequences with large bandwidth, small FOV and appropriate echo train length. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
A note on chaotic unimodal maps and applications.

PubMed

Zhou, C T; He, X T; Yu, M Y; Chew, L Y; Wang, X G

2006-09-01

Based on the word-lift technique of symbolic dynamics of one-dimensional unimodal maps, we investigate the relation between chaotic kneading sequences and linear maximum-length shift-register sequences. Theoretical and numerical evidence that the set of the maximum-length shift-register sequences is a subset of the set of the universal sequence of one-dimensional chaotic unimodal maps is given. By stabilizing unstable periodic orbits on superstable periodic orbits, we also develop techniques to control the generation of long binary sequences.
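For readers unfamiliar with the term, a linear maximum-length shift-register sequence (m-sequence) is the output of a k-bit linear feedback shift register whose period reaches the maximum 2^k - 1. The sketch below is only an illustration of that object, not the word-lift construction of the paper. It assumes the 4-bit recurrence s[n+4] = s[n] XOR s[n+1], which corresponds to the primitive polynomial x^4 + x + 1; the function name and seed are hypothetical.

```python
def lfsr_sequence(seed, taps, count):
    """Emit `count` bits of the recurrence s[n+k] = XOR of s[n+t] for t in taps.

    `seed` holds the first k bits; `taps` are offsets into the current window.
    """
    state = list(seed)
    out = []
    for _ in range(count):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        # Shift the window by one position and append the feedback bit.
        state = state[1:] + [feedback]
    return out


# Assumed example: 4-bit register, taps (0, 1), i.e. x^4 + x + 1.
bits = lfsr_sequence([1, 0, 0, 0], (0, 1), 30)
print(bits[:15])               # [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
print(bits[:15] == bits[15:])  # True: the sequence repeats with period 15
```

One period contains eight ones and seven zeros, the balance property expected of an m-sequence of length 2^4 - 1.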
What is a melody? On the relationship between pitch and brightness of timbre.

PubMed

Cousineau, Marion; Carcagno, Samuele; Demany, Laurent; Pressnitzer, Daniel

2013-01-01

Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners' task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities.
Poly A tail length analysis of in vitro transcribed mRNA by LC-MS.

PubMed

Beverly, Michael; Hagen, Caitlin; Slack, Olga

2018-02-01

The 3'-polyadenosine (poly A) tail of in vitro transcribed (IVT) mRNA was studied using liquid chromatography coupled to mass spectrometry (LC-MS). Poly A tails were cleaved from the mRNA using ribonuclease T1 followed by isolation with dT magnetic beads. Extracted tails were then analyzed by LC-MS which provided tail length information at single-nucleotide resolution. A 2100-nt mRNA with plasmid-encoded poly A tail lengths of either 27, 64, 100, or 117 nucleotides was used for these studies as enzymatically added poly A tails showed significant length heterogeneity. The number of As observed in the tails closely matched Sanger sequencing results of the DNA template, and even minor plasmid populations with sequence variations were detected. When the plasmid sequence contained a discrete number of poly As in the tail, analysis revealed a distribution that included tails longer than the encoded tail lengths. These observations were consistent with transcriptional slippage of T7 RNAP taking place within a poly A sequence. The type of RNAP did not alter the observed tail distribution, and comparison of T3, T7, and SP6 showed all three RNAPs produced equivalent tail length distributions. The addition of a sequence at the 3' end of the poly A tail did, however, produce narrower tail length distributions which supports a previously described model of slippage where the 3' end can be locked in place by having a G or C after the poly nucleotide region. Graphical abstract: Determination of mRNA poly A tail length using magnetic beads and LC-MS.
Identification of full-length proviral DNA of porcine endogenous retrovirus from Chinese Wuzhishan miniature pigs inbred.

PubMed

Ma, Yuyuan; Lv, Maomin; Xu, Shu; Wu, Jianmin; Tian, Kegong; Zhang, Jingang

2010-07-01

The existence of porcine endogenous retrovirus (PERV) hinders the use of pigs in clinical xenotransplantation to alleviate the shortage of human transplants. Chinese miniature pigs are potential organ donors for xenotransplantation in China. However, so far, an adequate level of information on the molecular characteristics of PERV from Chinese miniature pigs has not been available. We describe here the cloning and characterization of full-length proviral DNA of PERV from inbred Chinese Wuzhishan miniature pigs (WZSP). Full-length nucleotide sequences of PERV-WZSP and other PERVs were aligned and a phylogenetic tree was constructed from deduced amino-acid sequences of env. The results demonstrated that the full-length proviral DNA of PERV-WZSP belongs to gammaretrovirus and shares high similarity with other PERVs. Sequence analysis also suggested that different patterns of LTR existed in the same porcine germ line and that partial PERV-C sequence may recombine with PERV-A sequence in LTR. (c) 2008 Elsevier Ltd. All rights reserved.
Average focal length and power of a section of any defined surface.

PubMed

Kaye, Stephen B

2010-04-01

To provide a method to allow calculation of the average focal length and power of a lens through a specified meridian of any defined surface, not limited to the paraxial approximations. University of Liverpool, Liverpool, United Kingdom. Functions were derived to model back-vertex focal length and representative power through a meridian containing any defined surface. Average back-vertex focal length was based on the definition of the average of a function, using the angle of incidence as an independent variable. Univariate functions allowed determination of average focal length and power through a section of any defined or topographically measured surface of a known refractive index. These functions incorporated aberrations confined to the section. The proposed method closely approximates the average focal length, and by inference power, of a section (meridian) of a surface to a single or scalar value. It is not dependent on the paraxial and other nonconstant approximations and includes aberrations confined to that meridian. A generalization of this method to include all orthogonal and oblique meridians is needed before a comparison with measured wavefront values can be made. Copyright (c) 2010 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.
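The "definition of the average of a function" referred to above is the standard mean value of a function over an interval. As a hedged sketch, writing the back-vertex focal length along the meridian as f(θ), with the angle of incidence θ ranging over [θ_min, θ_max], the average focal length would take the following form. The symbols f, θ_min and θ_max are notation assumed here for illustration; the abstract does not give the explicit functions.

```latex
\bar{f} \;=\; \frac{1}{\theta_{\max} - \theta_{\min}}
\int_{\theta_{\min}}^{\theta_{\max}} f(\theta)\, \mathrm{d}\theta
```

Because f(θ) is evaluated at every angle in the interval rather than only near the axis, such an average includes the aberrations confined to that meridian instead of relying on the paraxial approximation.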
Transcriptome Analysis of the Differentially Expressed Genes in the Male and Female Shrub Willows (Salix suchowensis)

PubMed Central

Liu, Jingjing; Yin, Tongming; Ye, Ning; Chen, Yingnan; Yin, Tingting; Liu, Min; Hassani, Danial

2013-01-01

Background The dioecious system is relatively rare in plants. Shrub willow is an annual flowering dioecious woody plant, and possesses many characteristics that make it a good model for tracking the missing pieces of sex determination evolution. To gain a global view of the genes differentially expressed in the male and female shrub willows and to develop a database for further studies, we performed large-scale transcriptome sequencing of flower buds collected separately from the two sexes. Results In total, 1,201,931 high quality reads were obtained, with an average length of 389 bp and a total length of 467.96 Mb. The ESTs were assembled into 29,048 contigs and 132,709 singletons. These unigenes were further functionally annotated by comparing their sequences to different protein and functional domain databases and assigned Gene Ontology (GO) terms. A biochemical pathway database containing 291 predicted pathways was also created based on the annotations of the unigenes. Digital expression analysis identified 806 differentially expressed genes between the male and female flower buds. Of these, 33 were located on the incipient sex chromosome of Salicaceae, among which 12 genes might be involved in plant sex determination. These genes are worthy of special attention in future studies. Conclusions In this study, a large number of EST sequences were generated from the flower buds of a male and a female shrub willow. We also reported the differentially expressed genes between the two sex-type flowers. This work provides valuable information and sequence resources for uncovering the sex determining genes and for future functional genomics analysis of Salicaceae spp. PMID:23560075
Characterization of mtDNA variation in a cohort of South African paediatric patients with mitochondrial disease.

PubMed

van der Walt, Elizna M; Smuts, Izelle; Taylor, Robert W; Elson, Joanna L; Turnbull, Douglass M; Louw, Roan; van der Westhuizen, Francois H

2012-06-01

Mitochondrial disease can be attributed to both mitochondrial and nuclear gene mutations. It has a heterogeneous clinical and biochemical profile, which is compounded by the diversity of the genetic background. Disease-based epidemiological information has expanded significantly in recent decades, but little is known that clarifies the aetiology in African patients. The aim of this study was to investigate mitochondrial DNA variation and pathogenic mutations in the muscle of diagnosed paediatric patients from South Africa. A cohort of 71 South African paediatric patients was included and a high-throughput nucleotide sequencing approach was used to sequence full-length muscle mtDNA. The average coverage of the mtDNA genome was 81±26 per position. After assigning haplogroups, it was determined that although the nature of non-haplogroup-defining variants was similar in African and non-African haplogroup patients, the number of substitutions was significantly higher in African patients. We describe previously reported disease-associated and novel variants in this cohort. We observed a general lack of commonly reported syndrome-associated mutations, which supports clinical observations and confirms general observations in African patients when using single mutation screening strategies based on (predominantly non-African) mtDNA disease-based information. It is finally concluded that this first extensive report on muscle mtDNA sequences in African paediatric patients highlights the need for a full-length mtDNA sequencing strategy, which applies to all populations where specific mutations are not present. This, in addition to nuclear DNA gene mutation and pathogenicity evaluations, will be required to better unravel the aetiology of these disorders in African patients.
Length and sequence variability in mitochondrial control region of the milkfish, Chanos chanos.

PubMed

Ravago, Rachel G; Monje, Virginia D; Juinio-Meñez, Marie Antonette

2002-01-01

Extensive length variability was observed in the mitochondrial control region of the milkfish, Chanos chanos. The nucleotide sequence of the control region and flanking regions was determined. Length variability and heteroplasmy were due to the presence of varying numbers of a 41-bp tandemly repeated sequence and a 48-bp insertion/deletion (indel). The structure and organization of the milkfish control region are similar to those of other teleost fish and vertebrates. However, extensive variation in the copy number of tandem repeats (4-20 copies) and the presence of a relatively large (48-bp) indel are apparently uncommon in teleost fish control region sequences reported to date. High sequence variability of control region peripheral domains indicates the potential utility of selected regions as markers for population-level studies.
Transcriptomic Profiling of Differential Responses to Drought in Two Freshwater Mussel Species, the Giant Floater Pyganodon grandis and the Pondhorn Uniomerus tetralasmus

PubMed Central

Landis, Andrew Gascho; Wang, Guiling; Stoeckel, James; Peatman, Eric

2014-01-01

The southeastern US has experienced recurrent drought during recent decades. Increasing demand for water, as precipitation decreases, exacerbates stress on the aquatic biota of the Southeast: a global hotspot for freshwater mussel, crayfish, and fish diversity. Freshwater unionid mussels are ideal candidates to study linkages between ecophysiological and behavioral responses to drought. Previous work on co-occurring mussel species suggests a coupling of physiology and behavior along a gradient ranging from intolerant species such as Pyganodon grandis (giant floater) that track receding waters and rarely burrow in the substrates to tolerant species such as Uniomerus tetralasmus (pondhorn) that rarely track receding waters, but readily burrow into the drying sediments. We utilized a next-generation sequencing-based RNA-Seq approach to examine heat/desiccation-induced transcriptomic profiles of these two species in order to identify linkages between patterns of gene expression, physiology and behavior. Sequencing produced over 425 million 100 bp reads. Using the de novo assembly package Trinity, we assembled the short reads into 321,250 contigs from giant floater (average length 835 bp) and 385,735 contigs from pondhorn (average length 929 bp). BLAST-based annotation and gene expression analysis revealed 2,832 differentially expressed genes in giant floater and 2,758 differentially expressed genes in pondhorn. Transcriptomic responses included changes in molecular chaperones, oxidative stress profiles, cell cycling, energy metabolism, immunity, and cytoskeletal rearrangements. Comparative analyses between species indicated significantly higher induction of molecular chaperones and cytoskeletal elements in the intolerant P. grandis as well as important differences in genes regulating apoptosis and immunity. PMID:24586812
Yellow lupin (Lupinus luteus L.) transcriptome sequencing: molecular marker development and comparative studies

PubMed Central

2012-01-01

Background Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although grown in several temperate countries, its orphan condition has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we report the construction of 454-expressed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility on diversity studies and transferability to related species. Results Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for L1 (young leaves, buds and flowers) and L2 (immature seeds) EST libraries. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. 38,200 isotigs translated into proteins and 8,741 of them were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected the existence of several clusters among L. luteus accessions, strongly suggesting the existence of population subdivisions. However, no clear clustering patterns followed the accession’s origin. Conclusion L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection. PMID:22920992

Yellow lupin (Lupinus luteus L.) transcriptome sequencing: molecular marker development and comparative studies.

PubMed

Parra-González, Lorena B; Aravena-Abarzúa, Gabriela A; Navarro-Navarro, Cristell S; Udall, Joshua; Maughan, Jeff; Peterson, Louis M; Salvo-Garrido, Haroldo E; Maureira-Butler, Iván J

2012-08-24

Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although grown in several temperate countries, its orphan condition has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we report the construction of 454-expressed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility on diversity studies and transferability to related species. Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for L1 (young leaves, buds and flowers) and L2 (immature seeds) EST libraries. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. 38,200 isotigs translated into proteins and 8,741 of them were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected the existence of several clusters among L. luteus accessions, strongly suggesting the existence of population subdivisions. However, no clear clustering patterns followed the accession's origin. L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection.
Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data.

PubMed

Farmery, James H R; Smith, Mike L; Lynch, Andy G

2018-01-22

Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole-genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have been dependent on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited for the purpose of estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, repeated measurements, and cross-species comparisons. We also apply the method to a cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.
[Sequencing and analysis of complete genome of rabies viruses isolated from Chinese Ferret-Badger and dog in Zhejiang province].

PubMed

Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing

2010-01-01

By sequencing the full-length genomes of four rabies viruses isolated from Chinese ferret-badgers and dogs, we analyzed the genetic variation of rabies viruses at the molecular level, obtained information on rabies virus prevalence and variation in Zhejiang, and enriched the genome database of rabies virus street strains isolated in China. Rabies viruses were isolated in suckling mice, overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled. Nucleotide and deduced protein similarities were then analyzed, and phylogenetic analyses were performed with strains from Chinese ferret-badger, dog, sika deer, and vole and with the vaccine strain in use. The four full-length genomes were sequenced completely and had the same genetic structure, with a length of 11,923 nt or 11,925 nt, including a 58-nt leader, 1353-nt NP, 894-nt PP, 609-nt MP, 1575-nt GP, 6386-nt LP, intergenic regions (IGRs) of 2, 5, and 5 nt, a 423-nt pseudogene-like sequence (psi), and a 70-nt trailer. BLAST and multiple sequence alignment showed that the four full-length genomes were consistent with the properties of Lyssavirus in the family Rhabdoviridae. The nucleotide and amino acid sequences of the Chinese strains had the highest similarity to one another, especially among strains from animals of the same species. Among the four full-length genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, so most nucleotide mutations in these genomes were synonymous. Compared with the reference rabies viruses, the five protein-coding regions showed no change in length and no recombination, only a few point mutations. The five proteins therefore appeared to be stable. The variation sites and types of the four genomes were similar to those of the reference vaccine or street strains. According to the multiple sequence and phylogenetic analyses, the four strains belonged to genotype 1 and showed distinct regional characteristics of China. 
Therefore, these four rabies viruses are likely to be street viruses already existing in the natural world.
The Association Between Childhood Obesity and Tooth Eruption

PubMed Central

Must, Aviva; Phillips, Sarah M.; Tybor, David J.; Lividini, Keith; Hayes, Catherine

2013-01-01

Obesity is a growth-promoting process as evidenced by its effect on the timing of puberty. Although studies are limited, obesity has been shown to affect the timing of tooth eruption. Both the timing and sequence of tooth eruption are important to overall oral health. The purpose of this study was to examine the association between obesity and tooth eruption. Data were combined from three consecutive cycles (2001–2006) of the National Health and Nutrition Examination Survey (NHANES) and analyzed to examine associations between the number of teeth erupted (NET) and obesity status (BMI z-score >95th percentile BMI relative to the Centers for Disease Control and Prevention (CDC) growth reference) among children 5 up to 14 years of age, controlling for potential confounding by age, gender, race, and socioeconomic status (SES). Obesity is significantly associated with having a higher average NET during the mixed dentition period. On average, teeth of obese children erupted earlier than nonobese children with obese children having on average 1.44 more teeth erupted than nonobese children, after adjusting for age, gender, and race/ethnicity (P < 0.0001). SES was not a confounder of the observed associations. Obese children, on average, have significantly more teeth erupted than nonobese children after adjusting for gender, age, and race. These findings may have clinical importance in the area of dental and orthodontic medicine both in terms of risk for dental caries due to extended length of time exposed in the oral cavity and sequencing which may increase the likelihood of malocclusions. PMID:22310231
Novel Detection of Insecticide Resistance Related P450 Genes and Transcriptome Analysis of the Hemimetabolous Pest Erthesina fullo (Thunberg) (Hemiptera: Heteroptera).

PubMed

Liu, Yang; Wu, Haoyang; Xie, Qiang; Bu, Wenjun

2015-01-01

Erthesina fullo (Thunberg, 1783) is an economically important heteropteran species in China. Since only three nucleotide sequences of this species (COI, 16S rRNA, and 18S rRNA) appear in the GenBank database so far, no analysis of the molecular mechanisms underlying E. fullo's resistance to insecticide and environmental stress has been accomplished. We reported a de novo assembled and annotated transcriptome for adult E. fullo using the Illumina sequence system. A total of 53,359,458 clean reads of 4.8 billion nucleotides (nt) were assembled into 27,488 unigenes with an average length of 750 bp, of which 17,743 (64.55%) were annotated. In the present study, we identified 88 putative cytochrome P450 sequences and analyzed the evolution of cytochrome P450 superfamilies, genes of the CYP3 clan related to metabolizing xenobiotics and plant natural compounds, in E. fullo, increasing the candidate genes for the molecular mechanisms of insecticide resistance in P450. The sequenced transcriptome greatly expands the available genomic information and could allow a better understanding of the mechanisms of insecticide resistance at the systems biology level.
Compartmentalization of the yeast meiotic nucleus revealed by analysis of ectopic recombination.

PubMed

Schlecht, Hélène B; Lichten, Michael; Goldman, Alastair S H

2004-11-01

As yeast cells enter meiosis, chromosomes move from a centromere-clustered (Rabl) to a telomere-clustered (bouquet) configuration and then to states of progressive homolog pairing where telomeres are more dispersed. It is uncertain at which stage of this process sequences commit to recombine with each other. Previous analyses using recombination between dispersed homologous sequences (ectopic recombination) support the view that, on average, homologs are aligned end to end by the time of commitment to recombination. We have undertaken further analyses incorporating new inserts, chromosome rearrangements, an alternate mode of recombination initiation, and mutants that disrupt nuclear structure or telomere metabolism. Our findings support previous conclusions and reveal that distance from the nearest telomere is an important parameter influencing recombination between dispersed sequences. In general, the farther dispersed sequences are from their nearest telomere, the less likely they are to engage in ectopic recombination. Neither the mode of initiating recombination nor the formation of the bouquet appears to affect this relationship. We suggest that aspects of telomere localization and behavior influence the organization and mobility of chromosomes along their entire length, during a critical period of meiosis I prophase that encompasses the homology search.
RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

PubMed Central

Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

2000-01-01

The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. 
One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be revealed by seven RISA systems within one month. PMID:11076861
Pathogenic bacteria in sewage treatment plants as revealed by 454 pyrosequencing.

PubMed

Ye, Lin; Zhang, Tong

2011-09-01

This study applied 454 high-throughput pyrosequencing to analyze potentially pathogenic bacteria in activated sludge from 14 municipal wastewater treatment plants (WWTPs) across four countries (China, U.S., Canada, and Singapore), plus the influent and effluent of one of the 14 WWTPs. A total of 370,870 16S rRNA gene sequences with average length of 207 bps were obtained and all of them were assigned to corresponding taxonomic ranks by using RDP classifier and MEGAN. It was found that the most abundant potentially pathogenic bacteria in the WWTPs were affiliated with the genera of Aeromonas and Clostridium. Aeromonas veronii, Aeromonas hydrophila, and Clostridium perfringens were species most similar to the potentially pathogenic bacteria found in this study. Some sequences highly similar (>99%) to Corynebacterium diphtheriae were found in the influent and activated sludge samples from a saline WWTP. Overall, the percentage of the sequences closely related (>99%) to known pathogenic bacteria sequences was about 0.16% of the total sequences. Additionally, a platform-independent Java application (BAND) was developed for graphical visualization of the data of microbial abundance generated by high-throughput pyrosequencing. The approach demonstrated in this study could examine most of the potentially pathogenic bacteria simultaneously instead of one-by-one detection by other methods.
Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.

PubMed

Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth

2016-06-01

Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.
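To make the exponential decay in result (i) concrete, here is a minimal Python sketch under a deliberately simplified assumption that is not taken from the paper: each base independently suffers an indel with a fixed probability p, so a single read of length n is free of indels with probability (1 - p)^n. The function names and the error rate used below are hypothetical.

```python
import math


def prob_error_free(n: int, p: float) -> float:
    """Probability that a read of n bases contains no indel, assuming an
    independent per-base indel probability p (a simplifying assumption)."""
    return (1.0 - p) ** n


def length_for_probability(target: float, p: float) -> int:
    """Largest read length n whose error-free probability is still >= target."""
    return math.floor(math.log(target) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-base indel rate, chosen only for illustration
    for n in (10, 100, 1000, 10000):
        print(n, prob_error_free(n, p))
```

Under this toy model the error-free probability falls geometrically with read length, which is the qualitative behaviour the abstract describes for single-sample reconstruction.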
Operating characteristics of the implicit learning system supporting serial interception sequence learning.

PubMed

Sanchez, Daniel J; Reber, Paul J

2012-04-01

The memory system that supports implicit perceptual-motor sequence learning relies on brain regions that operate separately from the explicit, medial temporal lobe memory system. The implicit learning system therefore likely has distinct operating characteristics and information processing constraints. To attempt to identify the limits of the implicit sequence learning mechanism, participants performed the serial interception sequence learning (SISL) task with covertly embedded repeating sequences that were much longer than most previous studies: ranging from 30 to 60 (Experiment 1) and 60 to 90 (Experiment 2) items in length. Robust sequence-specific learning was observed for sequences up to 80 items in length, extending the known capacity of implicit sequence learning. In Experiment 3, 12-item repeating sequences were embedded among increasing amounts of irrelevant nonrepeating sequences (from 20 to 80% of training trials). Despite high levels of irrelevant trials, learning occurred across conditions. A comparison of learning rates across all three experiments found a surprising degree of constancy in the rate of learning regardless of sequence length or embedded noise. Sequence learning appears to be constant with the logarithm of the number of sequence repetitions practiced during training. The consistency in learning rate across experiments and conditions implies that the mechanisms supporting implicit sequence learning are not capacity-constrained by very long sequences nor adversely affected by high rates of irrelevant sequences during training.
On avoided words, absent words, and their application to biological sequence analysis.

PubMed

Almirantis, Yannis; Charalampopoulos, Panagiotis; Gao, Jia; Iliopoulos, Costas S; Mohamed, Manal; Pissis, Solon P; Polychronopoulos, Dimitris

2017-01-01

The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of w, denoted by [Formula: see text], effectively characterises the extent of a word by its edge contrast in the context in which it occurs. A word w of length [Formula: see text] is a [Formula: see text]-avoided word in x if [Formula: see text], for a given threshold [Formula: see text]. Notice that such a word may be completely absent from x. Hence, computing all such words naïvely can be a very time-consuming procedure, in particular for large k. In this article, we propose an [Formula: see text]-time and [Formula: see text]-space algorithm to compute all [Formula: see text]-avoided words of length k in a given sequence of length n over a fixed-sized alphabet. We also present a time-optimal [Formula: see text]-time algorithm to compute all [Formula: see text]-avoided words (of any length) in a sequence of length n over an integer alphabet of size [Formula: see text]. In addition, we provide a tight asymptotic upper bound for the number of [Formula: see text]-avoided words over an integer alphabet and the expected length of the longest one. We make available an implementation of our algorithm. Experimental results, using both real and synthetic data, show the efficiency and applicability of our implementation in biological sequence analysis. The systematic search for avoided words is particularly useful for biological sequence analysis. We present a linear-time and linear-space algorithm for the computation of avoided words of length k in a given sequence x. We suggest a modification to this algorithm so that it computes all avoided words of x, irrespective of their length, within the same time complexity. We also present combinatorial results with regard to avoided words and absent words.
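The formulas in this abstract are elided, so the following is only a naive Python sketch under an assumed scoring rule, not the paper's definition. The assumed rule estimates the expected count of w from the counts of its longest proper prefix, its longest proper suffix, and its infix, and it scores w as (observed - expected) / max(sqrt(expected), 1). The function names, the threshold parameter rho, and the example alphabet are hypothetical. The code enumerates every word of length k, so it is the slow baseline that the abstract contrasts with, not the linear-time algorithm the authors describe.

```python
from collections import Counter
from itertools import product
from math import sqrt


def count_kmers(x: str, k: int) -> Counter:
    """Count every length-k substring of x (overlapping occurrences included)."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))


def deviation(w: str, f_k: Counter, f_k1: Counter, f_k2: Counter) -> float:
    """Assumed score: (observed - expected) / max(sqrt(expected), 1)."""
    infix = f_k2[w[1:-1]]
    if infix == 0:
        return 0.0  # no basis for an expectation if the infix never occurs
    expected = f_k1[w[:-1]] * f_k1[w[1:]] / infix
    return (f_k[w] - expected) / max(sqrt(expected), 1.0)


def avoided_words(x: str, k: int, rho: float, alphabet: str = "ACGT") -> dict:
    """Return {word: score} for all length-k words whose score is <= rho."""
    if k < 3:
        raise ValueError("this sketch needs k >= 3 so that the infix is non-empty")
    f_k, f_k1, f_k2 = count_kmers(x, k), count_kmers(x, k - 1), count_kmers(x, k - 2)
    result = {}
    for letters in product(alphabet, repeat=k):  # exponential in k: naive on purpose
        w = "".join(letters)
        score = deviation(w, f_k, f_k1, f_k2)
        if score <= rho:
            result[w] = score
    return result
```

Note that a word can receive a negative score even when it never occurs in x, which matches the abstract's remark that an avoided word may be completely absent.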
DNA confinement in nanochannels: physics and biological applications

NASA Astrophysics Data System (ADS)

Reisner, Walter; Pedersen, Jonas N.; Austin, Robert H.

2012-10-01

DNA is the central storage molecule of genetic information in the cell, and reading that information is a central problem in biology. While sequencing technology has made enormous advances over the past decade, there is growing interest in platforms that can read out genetic information directly from long single DNA molecules, with the ultimate goal of single-cell, single-genome analysis. Such a capability would obviate the need for ensemble averaging over heterogeneous cellular populations and eliminate uncertainties introduced by cloning and molecular amplification steps (thus enabling direct assessment of the genome in its native state). In this review, we will discuss how the information contained in genomic-length single DNA molecules can be accessed via physical confinement in nanochannels. Due to self-avoidance interactions, DNA molecules will stretch out when confined in nanochannels, creating a linear unscrolling of the genome along the channel for analysis. We will first review the fundamental physics of DNA nanochannel confinement, including the effect of varying ionic strength, and then discuss recent applications of these systems to genomic mapping. Apart from the intense biological interest in extracting linear sequence information from elongated DNA molecules, from a physics view these systems are fascinating as they enable probing of single-molecule conformation in environments with dimensions that intersect key physical length-scales in the 1 nm to 100 µm range.
Departure Queue Prediction for Strategic and Tactical Surface Scheduler Integration

NASA Technical Reports Server (NTRS)

Zelinski, Shannon; Windhorst, Robert

2016-01-01

A departure metering concept to be demonstrated at Charlotte Douglas International Airport (CLT) will integrate strategic and tactical surface scheduling components to enable the respective collaborative decision making and improved efficiency benefits these two methods of scheduling provide. This study analyzes the effect of tactical scheduling on strategic scheduler predictability. Strategic queue predictions and target gate pushback times to achieve a desired queue length are compared between fast time simulations of CLT surface operations with and without tactical scheduling. The use of variable departure rates as a strategic scheduler input was shown to substantially improve queue predictions over static departure rates. With target queue length calibration, the strategic scheduler can be tuned to produce average delays within one minute of the tactical scheduler. However, root mean square differences between strategic and tactical delays were between 12 and 15 minutes due to the different methods the strategic and tactical schedulers use to predict takeoff times and generate gate pushback clearances. This demonstrates how difficult it is for the strategic scheduler to predict tactical scheduler assigned gate delays on an individual flight basis as the tactical scheduler adjusts departure sequence to accommodate arrival interactions. Strategic/tactical scheduler compatibility may be improved by providing more arrival information to the strategic scheduler and stabilizing tactical scheduler changes to runway sequence in response to arrivals.
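The contrast between a sub-minute difference in average delay and a 12 to 15 minute root mean square difference can be illustrated with a few lines of Python. The per-flight delay values below are made up for illustration and are not data from the study. The point is only that positive and negative per-flight differences cancel in the mean but not in the RMS.

```python
from math import sqrt


def mean_difference(a, b):
    """Average of the signed per-flight differences a[i] - b[i]."""
    return sum(x - y for x, y in zip(a, b)) / len(a)


def rms_difference(a, b):
    """Root mean square of the per-flight differences a[i] - b[i]."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical gate delays in minutes for four flights (illustrative only).
strategic = [10.0, 25.0, 5.0, 30.0]
tactical = [22.0, 12.0, 18.0, 17.0]

if __name__ == "__main__":
    print("mean difference:", mean_difference(strategic, tactical))
    print("RMS difference:", rms_difference(strategic, tactical))
```

With these invented numbers the two schedulers agree closely on average while still disagreeing by more than ten minutes on a typical individual flight.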
DNA confinement in nanochannels: physics and biological applications.

PubMed

Reisner, Walter; Pedersen, Jonas N; Austin, Robert H

2012-10-01

DNA is the central storage molecule of genetic information in the cell, and reading that information is a central problem in biology. While sequencing technology has made enormous advances over the past decade, there is growing interest in platforms that can read out genetic information directly from long single DNA molecules, with the ultimate goal of single-cell, single-genome analysis. Such a capability would obviate the need for ensemble averaging over heterogeneous cellular populations and eliminate uncertainties introduced by cloning and molecular amplification steps (thus enabling direct assessment of the genome in its native state). In this review, we will discuss how the information contained in genomic-length single DNA molecules can be accessed via physical confinement in nanochannels. Due to self-avoidance interactions, DNA molecules will stretch out when confined in nanochannels, creating a linear unscrolling of the genome along the channel for analysis. We will first review the fundamental physics of DNA nanochannel confinement, including the effect of varying ionic strength, and then discuss recent applications of these systems to genomic mapping. Apart from the intense biological interest in extracting linear sequence information from elongated DNA molecules, from a physics view these systems are fascinating as they enable probing of single-molecule conformation in environments with dimensions that intersect key physical length-scales in the 1 nm to 100 µm range.
SNP Discovery by Illumina-Based Transcriptome Sequencing of the Olive and the Genetic Characterization of Turkish Olive Genotypes Revealed by AFLP, SSR and SNP Markers

PubMed Central

Kaya, Hilal Betul; Cetin, Oznur; Kaya, Hulya; Sahin, Mustafa; Sefer, Filiz; Kahraman, Abdullah; Tanyolac, Bahattin

2013-01-01

Background: The olive tree (Olea europaea L.) is a diploid (2n = 2x = 46) outcrossing species mainly grown in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in the olive to characterize and elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of its transcriptome are unavailable, leading to the importance of an efficient large-scale single nucleotide polymorphism (SNP) discovery in olive. The objectives of this study were (1) to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2) to characterize 96 olive genotypes originating from different regions of Turkey. Methodology/Principal Findings: Next-generation sequencing technology was used with five distinct olive genotypes and generated cDNA, producing 126,542,413 reads using an Illumina Genome Analyzer IIx. Following quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases and 45 singletons. The SNPs were filtered and 2,987 high-quality putative SNP primers were identified. The assembled sequences and singletons were subjected to BLAST similarity searches and annotated with a Gene Ontology identifier. To identify the 96 olive genotypes, these SNP primers were applied to the genotypes in combination with amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSR) markers. Conclusions/Significance: This study marks the highest number of SNP markers discovered to date from olive genotypes using transcriptome sequencing. The developed SNP markers will provide a useful source for molecular genetic studies, such as genetic diversity and characterization, high density quantitative trait locus (QTL) analysis, association mapping and map-based gene cloning in the olive. 
High levels of genetic variation among Turkish olive genotypes revealed by SNPs, AFLPs and SSRs allowed us to characterize the Turkish olive genotype. PMID:24058483
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples.

PubMed

Laird Smith, Melissa; Murrell, Ben; Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E; Kosakovsky Pond, Sergei L; Smith, Davey M

2016-07-01

The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences' Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data.
Second generation genetic linkage map for the gilthead sea bream Sparus aurata L.

PubMed

Tsigenopoulos, Costas S; Louro, Bruno; Chatziplis, Dimitrios; Lagnel, Jacques; Vogiatzi, Emmanouella; Loukovitis, Dimitrios; Franch, Rafaella; Sarropoulou, Elena; Power, Deborah M; Patarnello, Tomaso; Mylonas, Constantinos C; Magoulas, Antonios; Bargelloni, Luca; Canario, Adelino; Kotoulas, Georgios

2014-12-01

An updated second linkage map was constructed for the gilthead sea bream, Sparus aurata L., a fish species of great economic importance for the Mediterranean aquaculture industry. In contrast to the first linkage map which mainly consisted of genomic microsatellites (SSRs), the new linkage map is highly enriched with SSRs found in Expressed Sequence Tags (EST-SSRs), which greatly facilitates comparative mapping with other teleosts. The new map consists of 321 genetic markers in 27 linkage groups (LGs): 232 genomic microsatellites, 85 EST-SSRs and 4 SNPs; of those, 13 markers were linked to LGs but were not ordered. Eleven markers (5 SSRs, 5 EST-SSRs and 1 SNP) are not assigned to any LG. The total length of the sex-averaged map is 1769.7 cM, 42% longer than the previously published one, and the number of markers in each LG ranges from 2 to 30. The inter-marker distance varies from 0 to 75.6 cM, with an average of 5.75 cM. The male and female maps have a length of 1349.2 and 2172.1 cM, respectively, and the average distance between markers is 4.38 and 7.05 cM, respectively. Comparative mapping with the three-spined stickleback (Gasterosteus aculeatus) chromosomes and scaffolds showed conserved synteny with 132 S. aurata markers (42.9% of those mapped) having a hit on the stickleback genome. Copyright © 2014 Elsevier B.V. All rights reserved.
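The reported averages appear to be consistent with dividing each map length by the number of ordered markers (321 mapped markers minus the 13 that were linked but not ordered, giving 308). That reading is an inference from the numbers in the abstract, not a method stated by the authors. The short Python check below reproduces the figures under that assumption; the variable and function names are illustrative.

```python
MAPPED_MARKERS = 321     # markers placed in linkage groups (from the abstract)
UNORDERED_MARKERS = 13   # linked to a group but not ordered (from the abstract)
ordered = MAPPED_MARKERS - UNORDERED_MARKERS  # 308, the assumed denominator

map_lengths_cm = {"sex-averaged": 1769.7, "male": 1349.2, "female": 2172.1}


def average_spacing(total_cm: float, n_markers: int) -> float:
    """Map length divided by marker count (assumed definition of the average)."""
    return total_cm / n_markers


for name, length in map_lengths_cm.items():
    print(f"{name}: {average_spacing(length, ordered):.2f} cM")

# Share of ordered markers with a hit on the stickleback genome.
print(f"synteny share: {100 * 132 / ordered:.1f}%")
```

Under this assumption the script prints 5.75, 4.38 and 7.05 cM and a synteny share of 42.9%, matching the values quoted in the abstract.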
A new molecular evolution model for limited insertion independent of substitution.

PubMed

Lèbre, Sophie; Michel, Christian J

2013-10-01

We recently introduced a new molecular evolution model called the IDIS model for Insertion Deletion Independent of Substitution [13,14]. In the IDIS model, the three independent processes of substitution, insertion and deletion of residues have constant rates. In order to control the genome expansion during evolution, we generalize here the IDIS model by introducing an insertion rate which decreases when the sequence grows and tends to 0 for a maximum sequence length nmax. This new model, called LIIS for Limited Insertion Independent of Substitution, defines a matrix differential equation satisfied by a vector P(t) describing the sequence content in each residue at evolution time t. An analytical solution is obtained for any diagonalizable substitution matrix M. Thus, the LIIS model gives an expression of the sequence content vector P(t) in each residue under evolution time t as a function of the eigenvalues and the eigenvectors of matrix M, the residue insertion rate vector R, the total insertion rate r, the initial and maximum sequence lengths n0 and nmax, respectively, and the sequence content vector P(t0) at initial time t0. The derivation of the analytical solution is much more technical, compared to the IDIS model, as it involves Gauss hypergeometric functions. Several propositions of the LIIS model are derived: proof that the IDIS model is a particular case of the LIIS model when the maximum sequence length nmax tends to infinity, fixed point, time scale, time step and time inversion. Using a relation between the sequence length l and the evolution time t, an expression of the LIIS model as a function of the sequence length l=n(t) is obtained. Formulas for 'insertion only', i.e. when the substitution rates are all equal to 0, are derived at evolution time t and sequence length l. 
Analytical solutions of the LIIS model are explicitly derived, as a function of either evolution time t or sequence length l, for two classical substitution matrices: the 3-parameter symmetric substitution matrix [12] (LIIS-SYM3) and the HKY asymmetric substitution matrix [9] (LIIS-HKY). An evaluation of the LIIS model (precisely, LIIS-HKY) based on four statistical analyses of the GC content in complete genomes of four prokaryotic taxonomic groups, namely Chlamydiae, Crenarchaeota, Spirochaetes and Thermotogae, shows the expected improvement from the theory of the LIIS model compared to the IDIS model. Copyright © 2013 Elsevier Inc. All rights reserved.
What is a melody? On the relationship between pitch and brightness of timbre

PubMed Central

Cousineau, Marion; Carcagno, Samuele; Demany, Laurent; Pressnitzer, Daniel

2014-01-01

Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners’ task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities. PMID:24478638
Length and sequence dependence in the association of Huntingtin protein with lipid membranes

NASA Astrophysics Data System (ADS)

Jawahery, Sudi; Nagarajan, Anu; Matysiak, Silvina

2013-03-01

There is a fundamental gap in our understanding of how aggregates of mutant Huntingtin protein (htt) with overextended polyglutamine (polyQ) sequences gain the toxic properties that cause Huntington's disease (HD). Experimental studies have shown that the most important step associated with toxicity is the binding of mutant htt aggregates to lipid membranes. Studies have also shown that flanking amino acid sequences around the polyQ sequence directly affect interactions with the lipid bilayer, and that polyQ sequences of greater than 35 glutamine repeats in htt are a characteristic of HD. The key steps that determine how flanking sequences and polyQ length affect the structure of lipid bilayers remain unknown. In this study, we use atomistic molecular dynamics simulations to study the interactions between lipid membranes of varying compositions and polyQ peptides of varying lengths and flanking sequences. We find that overextended polyQ interactions do cause deformation in model membranes, and that the flanking sequences do play a role in intensifying this deformation by altering the shape of the affected regions.

The Relationship Between X-Ray Radiance and Magnetic Flux

NASA Astrophysics Data System (ADS)

Pevtsov, Alexei A.; Fisher, George H.; Acton, Loren W.; Longcope, Dana W.; Johns-Krull, Christopher M.; Kankelborg, Charles C.; Metcalf, Thomas R.

2003-12-01

We use soft X-ray and magnetic field observations of the Sun (quiet Sun, X-ray bright points, active regions, and integrated solar disk) and active stars (dwarf and pre-main-sequence) to study the relationship between total unsigned magnetic flux, Φ, and X-ray spectral radiance, LX. We find that Φ and LX exhibit a very nearly linear relationship over 12 orders of magnitude, albeit with significant levels of scatter. This suggests a universal relationship between magnetic flux and the power dissipated through coronal heating. If the relationship can be assumed linear, it is consistent with an average volumetric heating rate Q~B/L, where B is the average field strength along a closed field line and L is its length between footpoints. The Φ-LX relationship also indicates that X-rays provide a useful proxy for the magnetic flux on stars when magnetic measurements are unavailable.
Exon–intron organization of genes in the slime mold Physarum polycephalum

PubMed Central

Trzcinska-Danielewicz, Joanna; Fronk, Jan

2000-01-01

The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon–intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon–intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon–intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P. polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3′-ends. PMID:10982858
SURVEY AND SUMMARY: exon-intron organization of genes in the slime mold Physarum polycephalum.

PubMed

Trzcinska-Danielewicz, J; Fronk, J

2000-09-15

The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon-intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon-intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon-intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P. polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3'-ends.
TypeLoader: A fast and efficient automated workflow for the annotation and submission of novel full-length HLA alleles.

PubMed

Surendranath, V; Albrecht, V; Hayhurst, J D; Schöne, B; Robinson, J; Marsh, S G E; Schmidt, A H; Lange, V

2017-07-01

Recent years have seen a rapid increase in the discovery of novel allelic variants of the human leukocyte antigen (HLA) genes. Commonly, only the exons encoding the peptide binding domains of novel HLA alleles are submitted. As a result, the IPD-IMGT/HLA Database lacks sequence information outside those regions for the majority of known alleles. This has implications for the application of the new sequencing technologies, which deliver sequence data often covering the complete gene. As these technologies simplify the characterization of the complete gene regions, it is desirable for novel alleles to be submitted as full-length sequences to the database. However, the manual annotation of full-length alleles and the generation of specific formats required by the sequence repositories are error-prone and time-consuming. We have developed TypeLoader to address both these facets. With only the full-length sequence as a starting point, TypeLoader performs automatic sequence annotation and subsequently handles all steps involved in preparing the specific formats for submission with very little manual intervention. TypeLoader is routinely used at the DKMS Life Science Lab and has aided in the successful submission of more than 900 novel HLA alleles as full-length sequences to the European Nucleotide Archive repository and the IPD-IMGT/HLA Database with a 95% reduction in the time spent on annotation and submission when compared with handling these processes manually. TypeLoader is implemented as a web application and can be easily installed and used on a standalone Linux desktop system or within a Linux client/server architecture. TypeLoader is downloadable from http://www.github.com/DKMS-LSL/typeloader. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Digital Gene Expression Analysis Based on De Novo Transcriptome Assembly Reveals New Genes Associated with Floral Organ Differentiation of the Orchid Plant Cymbidium ensifolium

PubMed Central

Yang, Fengxi; Zhu, Genfa

2015-01-01

Cymbidium ensifolium belongs to the genus Cymbidium of the orchid family. Owing to its spectacular flower morphology, C. ensifolium has considerable ecological and cultural value. However, limited genetic data are available for this non-model plant, and the molecular mechanism underlying floral organ identity is still poorly understood. In this study, we characterize the floral transcriptome of C. ensifolium and present, for the first time, extensive sequence and transcript abundance data of individual floral organs. After sequencing, over 10 Gb of clean sequence data were generated and assembled into 111,892 unigenes with an average length of 932.03 base pairs, including 1,227 clusters and 110,665 singletons. Assembled sequences were annotated with gene descriptions, gene ontology, clusters of orthologous group terms, the Kyoto Encyclopedia of Genes and Genomes, and the plant transcription factor database. From these annotations, 131 flowering-associated unigenes, 61 CONSTANS-LIKE (COL) unigenes and 90 floral homeotic genes were identified. In addition, four digital gene expression libraries were constructed for the sepal, petal, labellum and gynostemium, and 1,058 genes corresponding to individual floral organ development were identified. Among them, eight MADS-box genes were further investigated by full-length cDNA sequence analysis and expression validation, which revealed two APETALA1/AGL9-like MADS-box genes preferentially expressed in the sepal and petal, two AGAMOUS-like genes particularly restricted to the gynostemium, and four DEF-like genes distinctively expressed in different floral organs. The spatial expression of these genes varied distinctly among floral mutants with different floral morphologies, which validated their specialized roles in floral patterning and further supported the effectiveness of our in silico analysis. 
The dataset generated in this study provides new insights into the molecular mechanisms underlying floral patterning of Cymbidium and offers a valuable resource for molecular breeding of the orchid plant. PMID:26580566
MuffinInfo: HTML5-Based Statistics Extractor from Next-Generation Sequencing Data.

PubMed

Alic, Andy S; Blanquer, Ignacio

2016-09-01

Usually, the information known a priori about a newly sequenced organism is limited. Even resequencing the same organism can generate unpredictable output. We introduce MuffinInfo, a FastQ/Fasta/SAM information extractor implemented in HTML5 capable of offering insights into next-generation sequencing (NGS) data. Our new tool can run on any software or hardware environment, in command line or graphically, and in browser or standalone. It presents information such as average length, base distribution, quality scores distribution, k-mer histogram, and homopolymers analysis. MuffinInfo improves upon the existing extractors by adding the ability to save and then reload the results obtained after a run as a navigable file (also supporting saving pictures of the charts), by supporting custom statistics implemented by the user, and by offering user-adjustable parameters involved in the processing, all in one software. At the moment, the extractor works with all base space technologies such as Illumina, Roche, Ion Torrent, Pacific Biosciences, and Oxford Nanopore. Owing to HTML5, our software demonstrates the readiness of web technologies for mild intensive tasks encountered in bioinformatics.
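As a rough illustration of the statistics listed above (average length, base distribution, k-mer histogram, homopolymer runs), here is a minimal Python sketch over an in-memory FASTQ string with four lines per record. It is not MuffinInfo's code, which is an HTML5 application. The function names, the dictionary keys, and the two sample reads are hypothetical.

```python
from collections import Counter
from itertools import groupby

# Two made-up reads in four-line FASTQ layout (header, sequence, '+', qualities).
SAMPLE_FASTQ = """@r1
ACGTAA
+
IIIIII
@r2
GGGT
+
IIII
"""


def parse_fastq(text: str) -> list:
    """Return the sequence line of every four-line FASTQ record."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    return [lines[i + 1] for i in range(0, len(lines), 4)]


def read_stats(reads: list, k: int = 2) -> dict:
    """Compute a few summary statistics over a list of read sequences."""
    total = sum(len(r) for r in reads)
    return {
        "average_length": total / len(reads) if reads else 0.0,
        "base_counts": Counter("".join(reads)),
        "kmer_counts": Counter(
            r[i:i + k] for r in reads for i in range(len(r) - k + 1)
        ),
        # Length of the longest run of one repeated base in any read.
        "longest_homopolymer": max(
            (len(list(run)) for r in reads for _, run in groupby(r)), default=0
        ),
    }


if __name__ == "__main__":
    print(read_stats(parse_fastq(SAMPLE_FASTQ)))
```

Parsing by fixed four-line records rather than by a leading `@` avoids misreading quality strings that happen to start with `@`. Multi-line FASTQ records are not handled by this sketch.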
High-throughput pyrosequencing used for the discovery of a novel cellulase from a thermophilic cellulose-degrading microbial consortium.

PubMed

Zhao, Chao; Chu, Yanan; Li, Yanhong; Yang, Chengfeng; Chen, Yuqing; Wang, Xumin; Liu, Bin

2017-01-01

To analyze the microbial diversity and gene content of a thermophilic cellulose-degrading consortium from hot springs in Xiamen, China using 454 pyrosequencing for discovering cellulolytic enzyme resources. A thermophilic cellulose-degrading consortium, XM70 that was isolated from a hot spring, used sugarcane bagasse as sole carbon and energy source. DNA sequencing of the XM70 sample resulted in 349,978 reads with an average read length of 380 bases, accounting for 133,896,867 bases of sequence information. The characterization of sequencing reads and assembled contigs revealed that most microbes were derived from four phyla: Geobacillus (Firmicutes), Thermus, Bacillus, and Anoxybacillus. Twenty-eight homologous genes belonging to 15 glycoside hydrolase families were detected, including several cellulase genes. A novel hot spring metagenome-derived thermophilic cellulase was expressed and characterized. The application value of thermostable sugarcane bagasse-degrading enzymes is shown for production of cellulosic biofuel. The practical power of using a short-read-based metagenomic approach for harvesting novel microbial genes is also demonstrated.
Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs.

PubMed

Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi

2018-02-12

Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
Increased fMRI Sensitivity at Equal Data Burden Using Averaged Shifted Echo Acquisition

PubMed Central

Witt, Suzanne T.; Warntjes, Marcel; Engström, Maria

2016-01-01

There is growing evidence as to the benefits of collecting BOLD fMRI data with increased sampling rates. However, many of the newly developed acquisition techniques developed to collect BOLD data with ultra-short TRs require hardware, software, and non-standard analytic pipelines that may not be accessible to all researchers. We propose to incorporate the method of shifted echo into a standard multi-slice, gradient echo EPI sequence to achieve a higher sampling rate with a TR of <1 s with acceptable spatial resolution. We further propose to incorporate temporal averaging of consecutively acquired EPI volumes to both ameliorate the reduced temporal signal-to-noise inherent in ultra-fast EPI sequences and reduce the data burden. BOLD data were collected from 11 healthy subjects performing a simple, event-related visual-motor task with four different EPI sequences: (1) reference EPI sequence with TR = 1440 ms, (2) shifted echo EPI sequence with TR = 700 ms, (3) shifted echo EPI sequence with every two consecutively acquired EPI volumes averaged and effective TR = 1400 ms, and (4) shifted echo EPI sequence with every four consecutively acquired EPI volumes averaged and effective TR = 2800 ms. Both the temporally averaged sequences exhibited increased temporal signal-to-noise over the shifted echo EPI sequence. The shifted echo sequence with every two EPI volumes averaged also had significantly increased BOLD signal change compared with the other three sequences, while the shifted echo sequence with every four EPI volumes averaged had significantly decreased BOLD signal change compared with the other three sequences. The results indicated that incorporating the method of shifted echo into a standard multi-slice EPI sequence is a viable method for achieving increased sampling rate for collecting event-related BOLD data. 
Further, averaging every two consecutively acquired EPI volumes significantly increased the measured BOLD signal change and the subsequently calculated activation map statistics. PMID:27932947
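The temporal averaging described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each volume is represented as a flat list of voxel intensities, groups are non-overlapping, and the function names average_consecutive and effective_tr are hypothetical rather than taken from the study.

```python
def average_consecutive(volumes, group_size):
    """Average non-overlapping groups of consecutively acquired volumes.

    Each volume is a flat list of voxel intensities (an assumption made
    for this sketch). Trailing volumes that do not fill a complete group
    are dropped.
    """
    averaged = []
    for start in range(0, len(volumes) - group_size + 1, group_size):
        group = volumes[start:start + group_size]
        # zip(*group) walks the same voxel across all volumes in the group.
        averaged.append([sum(voxel) / group_size for voxel in zip(*group)])
    return averaged


def effective_tr(tr_ms, group_size):
    """Effective repetition time after averaging group_size volumes."""
    return tr_ms * group_size


# Five made-up two-voxel volumes acquired at TR = 700 ms.
volumes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(average_consecutive(volumes, 2), effective_tr(700, 2))
print(average_consecutive(volumes, 4), effective_tr(700, 4))
```

Averaging two or four volumes at TR = 700 ms gives the effective TRs of 1400 ms and 2800 ms quoted in the abstract, and reduces the number of stored volumes by the same factor.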
Evaluation of the Terminal Sequencing and Spacing System for Performance Based Navigation Arrivals

NASA Technical Reports Server (NTRS)

Thipphavong, Jane; Jung, Jaewoo; Swenson, Harry N.; Martin, Lynne; Lin, Melody; Nguyen, Jimmy

2013-01-01

NASA has developed the Terminal Sequencing and Spacing (TSS) system, a suite of advanced arrival management technologies combining time-based scheduling and controller precision spacing tools. TSS is a ground-based controller automation tool that facilitates sequencing and merging arrivals that have both current standard ATC routes and terminal Performance-Based Navigation (PBN) routes, especially during highly congested demand periods. In collaboration with the FAA and MITRE's Center for Advanced Aviation System Development (CAASD), TSS system performance was evaluated in human-in-the-loop (HITL) simulations with currently active controllers as participants. Traffic scenarios had mixed Area Navigation (RNAV) and Required Navigation Performance (RNP) equipage, where the more advanced RNP-equipped aircraft had preferential treatment with a shorter approach option. Simulation results indicate the TSS system achieved benefits by enabling PBN, while maintaining high throughput rates 10% above baseline demand levels. Flight path predictability improved, where path deviation was reduced by 2 NM on average and variance in the downwind leg length was 75% less. Arrivals flew more fuel-efficient descents for longer, spending an average of 39 seconds less in step-down level altitude segments. Self-reported controller workload was reduced, with statistically significant differences at the p < 0.01 level. The RNP-equipped arrivals were also able to more frequently capitalize on the benefits of being "Best-Equipped, Best-Served" (BEBS), where less vectoring was needed and nearly all RNP approaches were conducted without interruption.
Sequencing of real-world samples using a microfabricated hybrid device having unconstrained straight separation channels.

PubMed

Liu, Shaorong; Elkin, Christopher; Kapur, Hitesh

2003-11-01

We describe a microfabricated hybrid device that consists of a microfabricated chip containing multiple twin-T injectors attached to an array of capillaries that serve as the separation channels. A new fabrication process was employed to create two differently sized round channels in a chip. Twin-T injectors were formed by the smaller round channels that match the bore of the separation capillaries and separation capillaries were incorporated to the injectors through the larger round channels that match the outer diameter of the capillaries. This allows for a minimum dead volume and provides a robust chip/capillary interface. This hybrid design takes full advantage, such as sample stacking and purification and uniform signal intensity profile, of the unique chip injection scheme for DNA sequencing while employing long straight capillaries for the separations. In essence, the separation channel length is optimized for both speed and resolution since it is unconstrained by chip size. To demonstrate the reliability and practicality of this hybrid device, we sequenced over 1000 real-world samples from Human Chromosome 5 and Ciona intestinalis, prepared at Joint Genome Institute. We achieved average Phred20 read of 675 bases in about 70 min with a success rate of 91%. For the similar type of samples on MegaBACE 1000, the average Phred20 read is about 550-600 bases in 120 min separation time with a success rate of about 80-90%.
Cloning, characterization, and expression of Cytochrome b ( Cytb)—a key mitochondrial gene from Prorocentrum donghaiense

NASA Astrophysics Data System (ADS)

Zhao, Liyuan; Mi, Tiezhu; Zhen, Yu; Yu, Zhigang

2012-05-01

Mitochondrial cytochrome b (Cytb), one of the few proteins encoded by the mitochondrial DNA, plays an important role in transferring electrons. As a mitochondrial gene, it has been widely used for phylogenetic analysis. Previously, a 949-bp fragment of the coding gene and mRNA editing were characterized from Prorocentrum donghaiense, which might prove useful for resolving P. donghaiense from closely related species. However, the full-length coding region has not been characterized. In this study, we used rapid amplification of cDNA ends (RACE) to obtain the full-length, 1,124 bp cDNA. The Cytb transcript contained a standard initiation codon ATG, but did not have a recognizable stop codon. Homology comparison showed that the P. donghaiense Cytb had a high sequence identity to Cytb sequences from other dinoflagellate species. Phylogenetic analysis placed Cytb from P. donghaiense in the clade of dinoflagellates and it clustered together strongly with that from P. minimum. Based on the full-length sequence, we inferred 32 editing events at different positions, accounting for 2.93% of the Cytb gene. 34.4% (11) of the changes were A to G, 25% (8) were T to C, and 25% (8) were C to U, with smaller proportions of G to C and G to A edits (9.4% (3) and 6.2% (2), respectively). The expression level of the Cytb transcript was quantified by real-time PCR with a TaqMan probe at different times during the whole growth phase. On average, the Cytb transcript was present at 39.27±7.46 copies of cDNA per cell during the whole growth cycle, and the expression of Cytb was relatively stable over the different phases. These results deepen our understanding of the structure and characteristics of Cytb in P. donghaiense, and confirm that Cytb in P. donghaiense is a candidate reference gene for studying the expression of other genes.
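The editing-type percentages quoted in this abstract follow directly from the reported counts. The short Python sketch below recomputes them; the dictionary keys are shorthand labels chosen for this sketch, using the base names as the abstract gives them.

```python
# Counts of each editing type as reported in the abstract.
edit_counts = {"A>G": 11, "T>C": 8, "C>U": 8, "G>C": 3, "G>A": 2}

total_edits = sum(edit_counts.values())
shares = {kind: 100 * n / total_edits for kind, n in edit_counts.items()}

for kind, share in shares.items():
    print(f"{kind}: {edit_counts[kind]} of {total_edits} edits = {share:.2f}%")
```

The counts sum to the 32 editing events stated in the abstract, and each share agrees with the quoted percentage to within rounding.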
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples

PubMed Central

Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E.; Kosakovsky Pond, Sergei L.

2016-01-01

The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences’ Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data. PMID:29492273
Skeletal sequelae of radiation therapy for malignant childhood tumors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Butler, M.S.; Robertson, W.W. Jr.; Rate, W.

1990-02-01

One hundred forty-three patients who received radiation therapy for childhood tumors, and survived to the age of skeletal maturity, were studied by retrospective review of oncology records and roentgenograms. Diagnoses for the patients were the following: Hodgkin's lymphoma (44), Wilms's tumor (30), acute lymphocytic leukemia (26), non-Hodgkin's lymphoma (18), Ewing's sarcoma (nine), rhabdomyosarcoma (six), neuroblastoma (six), and others (four). Age at the follow-up examination averaged 18 years (range, 14-28 years). Average length of follow-up study was 9.9 years (range, two to 18 years). Asymmetry of the chest and ribs was seen in 51 (36%) of these children. Fifty (35%) had scoliosis; 14 had kyphosis. In two children, the scoliosis was treated with a brace, while one developed significant kyphosing scoliosis after laminectomy and had spinal fusion. Twenty-three (16%) patients complained of significant pain at the radiation sites. Twelve of the patients developed leg-length inequality; eight of those were symptomatic. Three patients developed second primary tumors. Currently, the incidence of significant skeletal sequelae is lower and the manifestations are less severe than reported in the years from 1940 to 1970. The reduction in skeletal complications may be attributed to shielding of growth centers, symmetric field selection, decreased total radiation doses, and sequence changes in chemotherapy.
Arc Length Coding by Interference of Theta Frequency Oscillations May Underlie Context-Dependent Hippocampal Unit Data and Episodic Memory Function

ERIC Educational Resources Information Center

Hasselmo, Michael E.

2007-01-01

Many memory models focus on encoding of sequences by excitatory recurrent synapses in region CA3 of the hippocampus. However, data and modeling suggest an alternate mechanism for encoding of sequences in which interference between theta frequency oscillations encodes the position within a sequence based on spatial arc length or time. Arc length…
A space-efficient algorithm for local similarities.

PubMed

Huang, X Q; Hardison, R C; Miller, W

1990-10-01

Existing dynamic-programming algorithms for identifying similar regions of two sequences require time and space proportional to the product of the sequence lengths. Often this space requirement is more limiting than the time requirement. We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. To illustrate the algorithm's potential, we discuss comparison of a 73,360 nucleotide sequence containing the human beta-like globin gene cluster and a corresponding 44,594 nucleotide sequence for rabbit, a problem well beyond the capabilities of other dynamic-programming software.
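The space saving described above can be illustrated with a simplified Python sketch. It is not the authors' algorithm: it computes only the best local-similarity score (Smith-Waterman style) while keeping two rows of the dynamic-programming table, and it does not recover the aligned regions, which is what the published method achieves in linear space. The scoring values and the function name local_similarity_score are assumptions for this example, and gaps are scored linearly.

```python
def local_similarity_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b using O(min(len)) memory.

    Only the previous and current rows of the dynamic-programming table
    are kept, so memory does not grow with the product of the lengths.
    """
    if len(b) > len(a):
        a, b = b, a                 # rows are as long as the shorter sequence
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


print(local_similarity_score("TTACGTTT", "GGACGGG"))
```

Running time is still proportional to the product of the sequence lengths; only the memory requirement shrinks.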
LenVarDB: database of length-variant protein domains.

PubMed

Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan

2014-01-01

Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2,730,625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.
Indexing a sequence for mapping reads with a single mismatch.

PubMed

Crochemore, Maxime; Langiu, Alessio; Rahman, M Sohel

2014-05-28

Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of next-generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in O(n log^(1+ε) n) time and space and can answer subsequent queries in O(m log log n + K) time. Here, n is the length of the sequence, m is the length of the read, 0 < ε < 1, and K is the optimal output size.
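To make the problem statement concrete, here is a naive Python baseline that reports every position where a read matches the sequence with at most one mismatch (the usual reading of the single-mismatch problem, which includes exact matches). It scans the sequence directly in O(nm) time and builds no index, so it is not the data structure from the paper; the function name map_read_one_mismatch is hypothetical.

```python
def map_read_one_mismatch(text, read):
    """Return start positions where read occurs in text with <= 1 mismatch.

    Naive scan: every window of len(read) characters is compared against
    the read, stopping early once a second mismatch is seen.
    """
    m = len(read)
    hits = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for x, y in zip(text[start:start + m], read):
            if x != y:
                mismatches += 1
                if mismatches > 1:
                    break
        if mismatches <= 1:
            hits.append(start)
    return hits


print(map_read_one_mismatch("ACGTACGA", "ACGT"))
```

The number of reported positions corresponds to the output size K in the query bound quoted above; an indexed solution aims to report the same positions without scanning the whole sequence.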
Design of the hairpin ribozyme for targeting specific RNA sequences.

PubMed

Hampel, A; DeYoung, M B; Galasinski, S; Siwkowski, A

1997-01-01

The following steps should be taken when designing the hairpin ribozyme to cleave a specific target sequence:
1. Select a target sequence containing BN*GUC, where B is C, G, or U.
2. Select the target sequence in areas least likely to have extensive interfering structure.
3. Design the conventional hairpin ribozyme as shown in Fig. 1, such that it can form a 4 bp helix 2 and helix 1 lengths up to 10 bp.
4. Synthesize this ribozyme from single-stranded DNA templates with a double-stranded T7 promoter.
5. Prepare a series of short substrates capable of forming a range of helix 1 lengths of 5-10 bp.
6. Identify these by direct RNA sequencing.
7. Assay the extent of cleavage of each substrate to identify the optimal length of helix 1.
8. Prepare the hairpin tetraloop ribozyme to determine if catalytic efficiency can be improved.
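The first step, locating candidate BN*GUC sites, lends itself to a short Python sketch. Assumptions made here and not stated in the abstract: N stands for any of A, C, G, or U, the asterisk marks the cleavage point between N and G, and DNA-style input is converted to RNA by replacing T with U. The function name find_target_sites is hypothetical. Structural accessibility (the second step) is not assessed.

```python
import re

# B = C, G or U; N = any nucleotide (assumed); followed by GUC.
# The lookahead lets overlapping candidate sites be reported as well.
SITE = re.compile(r"(?=([CGU][ACGU]GUC))")


def find_target_sites(rna):
    """Return (start, site, cleavage_index) for each BN*GUC motif.

    cleavage_index is the 0-based index of the G that follows the
    assumed cleavage point (between N and G).
    """
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group(1), m.start() + 2) for m in SITE.finditer(rna)]


print(find_target_sites("AACAGUCAA"))
```

A sequence such as AAAGUC yields no site, because the position two bases before GUC is A, which the B position excludes.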
The Impact of Mutation and Gene Conversion on the Local Diversification of Antigen Genes in African Trypanosomes

PubMed Central

Gjini, Erida; Haydon, Daniel T.; Barry, J. David; Cobbold, Christina A.

2012-01-01

Patterns of genetic diversity in parasite antigen gene families hold important information about their potential to generate antigenic variation within and between hosts. The evolution of such gene families is typically driven by gene duplication, followed by point mutation and gene conversion. There is great interest in estimating the rates of these processes from molecular sequences for understanding the evolution of the pathogen and its significance for infection processes. In this study, a series of models are constructed to investigate hypotheses about the nucleotide diversity patterns between closely related gene sequences from the antigen gene archive of the African trypanosome, the protozoan parasite causative of human sleeping sickness in Equatorial Africa. We use a hidden Markov model approach to identify two scales of diversification: clustering of sequence mismatches, a putative indicator of gene conversion events with other lower-identity donor genes in the archive, and at a sparser scale, isolated mismatches, likely arising from independent point mutations. In addition to quantifying the respective probabilities of occurrence of these two processes, our approach yields estimates for the gene conversion tract length distribution and the average diversity contributed locally by conversion events. Model fitting is conducted using a Bayesian framework. We find that diversifying gene conversion events with lower-identity partners occur at least five times less frequently than point mutations on variant surface glycoprotein (VSG) pairs, and the average imported conversion tract is between 14 and 25 nucleotides long. However, because of the high diversity introduced by gene conversion, the two processes have almost equal impact on the per-nucleotide rate of sequence diversification between VSG subfamily members. We are able to disentangle the most likely locations of point mutations and conversions on each aligned gene pair. PMID:22735079

Characterization of full-length MHC class II sequences in Indonesian and Vietnamese cynomolgus macaques.

PubMed

Creager, Hannah M; Becker, Ericka A; Sandman, Kelly K; Karl, Julie A; Lank, Simon M; Bimber, Benjamin N; Wiseman, Roger W; Hughes, Austin L; O'Connor, Shelby L; O'Connor, David H

2011-09-01

In recent years, the use of cynomolgus macaques in biomedical research has increased greatly. However, with the exception of the Mauritian population, knowledge of the MHC class II genetics of the species remains limited. Here, using cDNA cloning and Sanger sequencing, we identified 127 full-length MHC class II alleles in a group of 12 Indonesian and 12 Vietnamese cynomolgus macaques. Forty-two of these were completely novel to cynomolgus macaques, while 61 extended the sequence of previously identified alleles from partial to full length. This more than doubles the number of full-length cynomolgus macaque MHC class II alleles available in GenBank, significantly expanding the allele library for the species and laying the groundwork for future evolutionary and functional studies.
An EST dataset for Metasequoia glyptostroboides buds: the first EST resource for molecular genomics studies in Metasequoia.

PubMed

Zhao, Ying; Thammannagowda, Shivegowda; Staton, Margaret; Tang, Sha; Xia, Xinli; Yin, Weilun; Liang, Haiying

2013-03-01

The "living fossil" Metasequoia glyptostroboides Hu et Cheng, commonly known as dawn redwood or Chinese redwood, is the only living species in the genus and is valued for its essential oil and crude extracts that have great potential for anti-fungal activity. Despite its paleontological significance and economical value as a rare relict species, genomic resources of Metasequoia are very limited. In order to gain insight into the molecular mechanisms behind the formation of reproductive buds and the transition from vegetative phase to reproductive phase in Metasequoia, we performed sequencing of expressed sequence tags from Metasequoia vegetative buds and female buds. By using the 454 pyrosequencing technology, a total of 1,571,764 high-quality reads were generated, among which 733,128 were from vegetative buds and 775,636 were from female buds. These EST reads were clustered and assembled into 114,124 putative unique transcripts (PUTs) with an average length of 536 bp. The 97,565 PUTs that were at least 100 bp in length were functionally annotated by a similarity search against public databases and assigned with Gene Ontology (GO) terms. A total of 59 known floral gene families and 190 isotigs involved in hormone regulation were captured in the dataset. Furthermore, a set of PUTs differentially expressed in vegetative and reproductive buds, as well as SSR motifs and high confidence SNPs, were identified. This is the first large-scale expressed sequence tags ever generated in Metasequoia and the first evidence for floral genes in this critically endangered deciduous conifer species.
Optimization of De Novo Short Read Assembly of Seabuckthorn (Hippophae rhamnoides L.) Transcriptome

PubMed Central

Ghangal, Rajesh; Chaudhary, Saurabh; Jain, Mukesh; Purty, Ram Singh; Chand Sharma, Prakash

2013-01-01

Seabuckthorn (Hippophae rhamnoides L.) has been known for its medicinal, nutritional and environmental importance since ancient times. However, very limited efforts have been made to characterize the genome and transcriptome of this wonder plant. Here, we report the use of next-generation massively parallel sequencing technology (Illumina platform) and de novo assembly to gain a comprehensive view of the seabuckthorn transcriptome. We assembled 86,253,874 high quality short reads using six assembly tools. In our hands, assembly of non-redundant short reads following a two-step procedure was found to be the best considering various assembly quality parameters. Initially, the ABySS tool was used following an additive k-mer approach. The assembled transcripts were subsequently subjected to the TGICL suite. Finally, de novo short read assembly yielded 88,297 transcripts (> 100 bp), representing about 53 Mb of seabuckthorn transcriptome. The average length of transcripts was 610 bp, the N50 length was 1198 bp, and 91% of the short reads uniquely mapped back to the seabuckthorn transcriptome. A total of 41,340 (46.8%) transcripts showed significant similarity with sequences present in nr protein databases of NCBI (E-value < 1E-06). We also screened the assembled transcripts for the presence of transcription factors and simple sequence repeats. Our strategy involving the use of a short read assembler (ABySS) followed by TGICL will be useful for researchers working with a non-model organism’s transcriptome in terms of saving time and reducing complexity in data management. The seabuckthorn transcriptome data generated here provide a valuable resource for gene discovery and development of functional molecular markers. PMID:23991119
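The two assembly summary statistics quoted in this abstract, average transcript length and N50, can be computed as in the following Python sketch. It uses the common definition of N50 (the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the assembled bases); the abstract does not say which tool or convention the authors used, and the lengths below are made up.

```python
def mean_length(lengths):
    """Arithmetic mean of transcript lengths (0.0 for an empty list)."""
    return sum(lengths) / len(lengths) if lengths else 0.0


def n50(lengths):
    """N50: walk from the longest transcript downwards and return the
    length at which the running total first reaches half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Made-up transcript lengths, for illustration only.
lengths = [200, 300, 400, 500, 600]
print(mean_length(lengths), n50(lengths))
```

Because N50 is weighted by length, it is typically larger than the mean when an assembly contains many short transcripts, consistent with the 1198 bp N50 and 610 bp average reported above.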
Activation of the ALT pathway for telomere maintenance can affect other sequences in the human genome.

PubMed

Jeyapalan, Jennie N; Varley, Helen; Foxon, Jenny L; Pollock, Raphael E; Jeffreys, Alec J; Henson, Jeremy D; Reddel, Roger R; Royle, Nicola J

2005-07-01

Immortal human cells maintain telomere length by the expression of telomerase or through the alternative lengthening of telomeres (ALT). The ALT mechanism involves a recombination-like process that allows the rapid elongation of shortened telomeres. However, it is not known whether activation of the ALT pathway affects other sequences in the genome. To address this we have investigated, in ALT-expressing cell lines and tumours, the stability of tandem repeat sequences known to mutate via homologous recombination in the human germline. We have shown extraordinary somatic instability in the human minisatellite MS32 (D1S8) in ALT-expressing (ALT+) but not in normal or telomerase-expressing cell lines. The MS32 mutation frequency varied across 15 ALT+ cell lines and was on average 55-fold greater than in ALT- cell lines. The MS32 minisatellite was also highly unstable in three of eight ALT+ soft tissue sarcomas, indicating that somatic destabilization occurs in vivo. The MS32 mutation rates estimated for two ALT+ cell lines were similar to that seen in the germline. However, the internal structures of ALT and germline mutant alleles are very different, indicating differences in the underlying mutation mechanisms. Five other hypervariable minisatellites did not show elevated instability in ALT-expressing cell lines, indicating that minisatellite destabilization is not universal. The elevation of MS32 instability upon activation of the ALT pathway and telomere length maintenance suggests there is overlap between the underlying processes that may be tractable through analysis of the D1S8 locus.
Assessing the performance of the Oxford Nanopore Technologies MinION

PubMed Central

Laver, T.; Harrison, J.; O’Neill, P.A.; Moore, K.; Farbos, A.; Paszkiewicz, K.; Studholme, D.J.

2015-01-01

The Oxford Nanopore Technologies (ONT) MinION is a new sequencing technology that potentially offers read lengths of tens of kilobases (kb) limited only by the length of DNA molecules presented to it. The device has a low capital cost, is by far the most portable DNA sequencer available, and can produce data in real-time. It has numerous prospective applications including improving genome sequence assemblies and resolution of repeat-rich regions. Before such a technology is widely adopted, it is important to assess its performance and limitations in respect of throughput and accuracy. In this study we assessed the performance of the MinION by re-sequencing three bacterial genomes, with very different nucleotide compositions ranging from 28.6% to 70.7%; the high G + C strain was underrepresented in the sequencing reads. We estimate the error rate of the MinION (after base calling) to be 38.2%. Mean and median read lengths were 2 kb and 1 kb respectively, while the longest single read was 98 kb. The whole length of a 5 kb rRNA operon was covered by a single read. As the first nanopore-based single molecule sequencer available to researchers, the MinION is an exciting prospect; however, the current error rate limits its ability to compete with existing sequencing technologies, though we do show that MinION sequence reads can enhance contiguity of de novo assembly when used in conjunction with Illumina MiSeq data. PMID:26753127
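The abstract reports a mean read length twice the median, which is what a right-skewed length distribution produces. The Python sketch below shows that effect along with a simple G + C content calculation; the read lengths and the sequence are made-up values, and the function name gc_content is hypothetical.

```python
from statistics import mean, median


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up read lengths in bases: a few long reads pull the mean
# well above the median.
read_lengths = [1000, 1000, 4000]
print(mean(read_lengths), median(read_lengths))
print(gc_content("GGCCAT"))
```

Reporting both the mean and the median, as the authors do, gives a more informative picture of long-read data than either figure alone.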
Optimal choice of word length when comparing two Markov sequences using a χ²-statistic.

PubMed

Bai, Xin; Tang, Kujin; Ren, Jie; Waterman, Michael; Sun, Fengzhu

2017-10-03

Alignment-free sequence comparison using counts of word patterns (grams, k-tuples) has become an active research topic due to the large amount of sequence data from the new sequencing technologies. Genome sequences are frequently modelled by Markov chains and the likelihood ratio test or the corresponding approximate χ²-statistic has been suggested to compare two sequences. However, it is not known how to best choose the word length k in such studies. We develop an optimal strategy to choose k by maximizing the statistical power of detecting differences between two sequences. Let the orders of the Markov chains for the two sequences be r1 and r2, respectively. We show through both simulations and theoretical studies that the optimal k = max(r1, r2) + 1 for both long sequences and next generation sequencing (NGS) read data. The orders of the Markov chains may be unknown and several methods have been developed to estimate the orders of Markov chains based on both long sequences and NGS reads. We study the power loss of the statistics when the estimated orders are used. It is shown that the power loss is minimal for some of the estimators of the orders of Markov chains. Our studies provide guidelines on choosing the optimal word length for the comparison of Markov sequences.
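The word-length rule stated in this abstract, and the word counts it applies to, can be written down in a few lines of Python. This sketch covers only the choice of k and the counting of overlapping k-tuples; it does not implement the χ²-statistic or the estimation of the Markov orders, and both function names are hypothetical.

```python
from collections import Counter


def optimal_word_length(r1, r2):
    """Word length suggested by the abstract: k = max(r1, r2) + 1,
    where r1 and r2 are the Markov orders of the two sequences."""
    return max(r1, r2) + 1


def word_counts(seq, k):
    """Counts of all overlapping words (k-tuples) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Example: a first-order and a second-order chain give k = 3.
k = optimal_word_length(1, 2)
print(k, dict(word_counts("ACGACG", k)))
```

The resulting count vectors for the two sequences are the inputs to whichever comparison statistic is used afterwards.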
Rate-determining Step of Flap Endonuclease 1 (FEN1) Reflects a Kinetic Bias against Long Flaps and Trinucleotide Repeat Sequences.

PubMed

Tarantino, Mary E; Bilotti, Katharina; Huang, Ji; Delaney, Sarah

2015-08-21

Flap endonuclease 1 (FEN1) is a structure-specific nuclease responsible for removing 5'-flaps formed during Okazaki fragment maturation and long patch base excision repair. In this work, we use rapid quench flow techniques to examine the rates of 5'-flap removal on DNA substrates of varying length and sequence. Of particular interest are flaps containing trinucleotide repeats (TNR), which have been proposed to affect FEN1 activity and cause genetic instability. We report that FEN1 processes substrates containing flaps of 30 nucleotides or fewer at comparable single-turnover rates. However, for flaps longer than 30 nucleotides, FEN1 kinetically discriminates substrates based on flap length and flap sequence. In particular, FEN1 removes flaps containing TNR sequences at a rate slower than mixed sequence flaps of the same length. Furthermore, multiple-turnover kinetic analysis reveals that the rate-determining step of FEN1 switches as a function of flap length from product release to chemistry (or a step prior to chemistry). These results provide a kinetic perspective on the role of FEN1 in DNA replication and repair and contribute to our understanding of FEN1 in mediating genetic instability of TNR sequences. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Integrating De Novo Transcriptome Assembly and Cloning to Obtain Chicken Ovocleidin-17 Full-Length cDNA

PubMed Central

Ning, ZhongHua; Hincke, Maxwell T.; Yang, Ning; Hou, ZhuoCheng

2014-01-01

Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not ‘finished’. Many functionally important genes are located in these gapped regions. It can be difficult to obtain full-length cDNA for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA, and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. OC-17 protein has been purified, sequenced, and has had its three-dimensional structure solved. However, researchers still cannot conduct OC-17 mRNA related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus, and then conducted de novo transcriptome assembling with bioinformatics analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses were also performed to demonstrate that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and barely at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshell, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first one to report the full-length OC-17 cDNA sequence, and builds a foundation for OC-17 mRNA related studies. 
We provide a general method for biologists experiencing difficulty in obtaining candidate gene full-length cDNA sequences. PMID:24676480
Integrating de novo transcriptome assembly and cloning to obtain chicken Ovocleidin-17 full-length cDNA.

PubMed

Zhang, Quan; Liu, Long; Zhu, Feng; Ning, ZhongHua; Hincke, Maxwell T; Yang, Ning; Hou, ZhuoCheng

2014-01-01

Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not 'finished'. Many functionally important genes are located in these gapped regions. It can be difficult to obtain full-length cDNA for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA, and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. OC-17 protein has been purified, sequenced, and has had its three-dimensional structure solved. However, researchers still cannot conduct OC-17 mRNA related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus, and then conducted de novo transcriptome assembling with bioinformatics analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses were also performed to demonstrate that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and barely at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshell, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first one to report the full-length OC-17 cDNA sequence, and builds a foundation for OC-17 mRNA related studies. 
We provide a general method for biologists experiencing difficulty in obtaining candidate gene full-length cDNA sequences.
Recent Advances in Insect Olfaction, Specifically Regarding the Morphology and Sensory Physiology of Antennal Sensilla of the Female Sphinx Moth Manduca sexta

PubMed Central

SHIELDS, VONNIE D.C.; HILDEBRAND, JOHN G.

2008-01-01

The antennal flagellum of female Manduca sexta bears eight sensillum types: two trichoid, two basiconic, one auriculate, two coeloconic, and one styliform complex sensilla. The first type of trichoid sensillum averages 34 μm in length and is innervated by two sensory cells. The second type averages 26 μm in length and is innervated by either one or three sensory cells. The first type of basiconic sensillum averages 22 μm in length, while the second type averages 15 μm in length. Both types are innervated by three bipolar sensory cells. The auriculate sensillum averages 4 μm in length and is innervated by two bipolar sensory cells. The coeloconic type-A and type-B both average 2 μm in length. The former type is innervated by five bipolar sensory cells, while the latter type, by three bipolar sensory cells. The styliform complex sensillum occurs singly on each annulus and averages 38-40 μm in length. It is formed by several contiguous sensilla. Each unit is innervated by three bipolar sensory cells. A total of 2,216 sensilla were found on a single annulus (annulus 21) of the flagellum. Electrophysiological responses from type-A trichoid sensilla to a large panel of volatile odorants revealed three different subsets of olfactory receptor cells (ORCs). Two subsets responded strongly to only a narrow range of odorants, while the third responded strongly to a broad range of odorants. Anterograde labeling of ORCs from type-A trichoid sensilla revealed that their axons projected mainly to two large female glomeruli of the antennal lobe. PMID:11754510
Long-range correlations and charge transport properties of DNA sequences

NASA Astrophysics Data System (ADS)

Liu, Xiao-liang; Ren, Yi; Xie, Qiong-tao; Deng, Chao-sheng; Xu, Hui

2010-04-01

By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that λ-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5
Universal sequence map (USM) of arbitrary discrete sequences

PubMed Central

2002-01-01

Background: For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results: We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions: USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules. PMID:11895567
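The Chaos Game Representation that this abstract names as the basis of USM can be sketched in a few lines of Python. This is the commonly described CGR iteration (move halfway toward the corner assigned to each base), not the USM procedure itself; the corner assignment and function name below are assumptions chosen for illustration.

```python
# Assumed corner assignment for the four DNA units (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq):
    """Return the CGR point reached after each base, starting from the centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


# Two sequences sharing the suffix "AG" end at nearby coordinates.
end_a = cgr_coordinates("CCAG")[-1]
end_b = cgr_coordinates("TTAG")[-1]
```

Because each step halves the distance between any two trajectories, sequences that share a suffix of length n end within 2^-n of each other in each coordinate, which is the sense in which map distance reflects local sequence similarity.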
The Distribution of Lightning Channel Lengths in Northern Alabama Thunderstorms

NASA Technical Reports Server (NTRS)

Peterson, H. S.; Koshak, W. J.

2010-01-01

Lightning is well known to be a major source of tropospheric NOx, and in most cases is the dominant natural source (Huntrieser et al 1998, Jourdain and Hauglustaine 2001). Production of NOx by a segment of a lightning channel is a function of channel segment energy density and channel segment altitude. A first estimate of NOx production by a lightning flash can be found by multiplying production per segment [typically 10^4 J/m; Hill (1979)] by the total length of the flash's channel. The purpose of this study is to determine average channel length for lightning flashes near NALMA in 2008, and to compare average channel length of ground flashes to the average channel length of cloud flashes.
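The arithmetic behind the first estimate described above can be sketched as follows. The sketch assumes, hypothetically, that each flash is already available as an ordered list of (x, y, z) points in metres. How channels are actually reconstructed from NALMA data is not stated in the abstract, and the per-metre factor is left as a caller-supplied parameter.

```python
import math
from statistics import mean


def channel_length(points):
    """Sum of straight-line distances between consecutive (x, y, z) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def first_estimate(total_length_m, per_metre):
    """First estimate: per-metre factor multiplied by total channel length."""
    return per_metre * total_length_m


def average_length_by_type(flashes):
    """flashes: iterable of (flash_type, points); returns {flash_type: mean length}."""
    by_type = {}
    for flash_type, points in flashes:
        by_type.setdefault(flash_type, []).append(channel_length(points))
    return {t: mean(lengths) for t, lengths in by_type.items()}


# Toy data (hypothetical coordinates, not measurements).
toy_flashes = [
    ("ground", [(0, 0, 0), (0, 0, 10)]),
    ("cloud", [(0, 0, 0), (3, 4, 0)]),
    ("cloud", [(0, 0, 0), (6, 8, 0)]),
]
averages = average_length_by_type(toy_flashes)
```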
Self-assembly assisted polymerization (SAAP): approaching long multi-block copolymers with an ordered chain sequence and controllable block length.

PubMed

Wu, Chi; Xie, Zuowei; Zhang, Guangzhao; Zi, Guofu; Tu, Yingfeng; Yang, Yali; Cai, Ping; Nie, Ting

2002-12-07

A combination of polymer physics and synthetic chemistry has enabled us to develop self-assembly assisted polymerization (SAAP), leading to the preparation of long multi-block copolymers with an ordered chain sequence and controllable block lengths.
Survey and Analysis of Microsatellites in the Silkworm, Bombyx mori

PubMed Central

Prasad, M. Dharma; Muthulakshmi, M.; Madhu, M.; Archak, Sunil; Mita, K.; Nagaraju, J.

2005-01-01

We studied microsatellite frequency and distribution in 21.76-Mb random genomic sequences, 0.67-Mb BAC sequences from the Z chromosome, and 6.3-Mb EST sequences of Bombyx mori. We mined microsatellites of ≥15 bases of mononucleotide repeats and ≥5 repeat units of other classes of repeats. We estimated that microsatellites account for 0.31% of the genome of B. mori. Microsatellite tracts of A, AT, and ATT were the most abundant whereas their number drastically decreased as the length of the repeat motif increased. In general, tri- and hexanucleotide repeats were overrepresented in the transcribed sequences except TAA, GTA, and TGA, which were in excess in genomic sequences. The Z chromosome sequences contained shorter repeat types than the rest of the chromosomes in addition to a higher abundance of AT-rich repeats. Our results showed that base composition of the flanking sequence has an influence on the origin and evolution of microsatellites. Transitions/transversions were high in microsatellites of ESTs, whereas the genomic sequence had an equal number of substitutions and indels. The average heterozygosity value for 23 polymorphic microsatellite loci surveyed in 13 diverse silkmoth strains having 2–14 alleles was 0.54. Only 36 (18.2%) of 198 microsatellite loci were polymorphic between the two divergent silkworm populations and 10 (5%) loci revealed null alleles. The microsatellite map generated using these polymorphic markers resulted in 8 linkage groups. B. mori microsatellite loci were the most conserved in its immediate ancestor, B. mandarina, followed by the wild saturniid silkmoth, Antheraea assama. PMID:15371363
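The mining thresholds quoted in this abstract (at least 15 bases for mononucleotide repeats, at least 5 repeat units for other classes) can be illustrated with a small regular-expression scan. This is only a sketch under assumptions: the upper motif size of 6, the upper-case A/C/G/T alphabet, and the function names are choices made here, and the authors' actual mining tool is not named in the abstract.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_microsatellites(seq, max_unit=6, min_mono_bases=15, min_units=5):
    """Return (start, motif, repeat_units) for each perfect repeat tract found."""
    hits = []
    for size in range(1, max_unit + 1):
        min_rep = min_mono_bases if size == 1 else min_units
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. "AA" or "ATAT", which belong to a shorter repeat class.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // size))
    return hits


toy = "GG" + "A" * 16 + "CG" + "AT" * 6 + "GCC"
hits = find_microsatellites(toy)
```

Only perfect (uninterrupted) tracts are reported; tolerating substitutions or indels inside a tract would need a different approach.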
Clinical evaluation of panel testing by next-generation sequencing (NGS) for gene mutations in myeloid neoplasms.

PubMed

Au, Chun Hang; Wa, Anna; Ho, Dona N; Chan, Tsun Leung; Ma, Edmond S K

2016-01-22

Genomic techniques in recent years have allowed the identification of many mutated genes important in the pathogenesis of acute myeloid leukemia (AML). Together with cytogenetic aberrations, these gene mutations are powerful prognostic markers in AML and can be used to guide patient management, for example selection of optimal post-remission therapy. The mutated genes also hold promise as therapeutic targets themselves. We evaluated the applicability of a gene panel for the detection of AML mutations in a diagnostic molecular pathology laboratory. Fifty patient samples comprising 46 AML and 4 other myeloid neoplasms were accrued for the study. They consisted of 19 males and 31 females at a median age of 60 years (range: 18-88 years). A total of 54 genes (full coding exons of 15 genes and exonic hotspots of 39 genes) were targeted by 568 amplicons that ranged from 225 to 275 bp. The combined coverage was 141 kb in sequence length. Amplicon libraries were prepared by TruSight myeloid sequencing panel (Illumina, CA) and paired-end sequencing runs were performed on a MiSeq (Illumina) genome sequencer. Sequences obtained were analyzed by in-house bioinformatics pipeline, namely BWA-MEM, Samtools, GATK, Pindel, Ensembl Variant Effect Predictor and a novel algorithm ITDseek. The mean count of sequencing reads obtained per sample was 3.81 million and the mean sequencing depth was over 3000X. Seventy-seven mutations in 24 genes were detected in 37 of 50 samples (74 %). On average, 2 mutations (range 1-5) were detected per positive sample. TP53 gene mutations were found in 3 out of 4 patients with complex and unfavorable cytogenetics. Comparing NGS results with that of conventional molecular testing showed a concordance rate of 95.5 %. After further resolution and application of a novel bioinformatics algorithm ITDseek to aid the detection of FLT3 internal tandem duplication (ITD), the concordance rate was revised to 98.2 %. 
Gene panel testing by NGS approach was applicable for sensitive and accurate detection of actionable AML gene mutations in the clinical laboratory to individualize patient management. A novel algorithm ITDseek was presented that improved the detection of FLT3-ITD of varying length, position and at low allelic burden.
High-Throughput Next-Generation Sequencing of Polioviruses

PubMed Central

Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

2016-01-01

The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
An annotated cDNA library of juvenile Euprymna scolopes with and without colonization by the symbiont Vibrio fischeri

PubMed Central

Chun, Carlene K; Scheetz, Todd E; Bonaldo, Maria de Fatima; Brown, Bartley; Clemens, Anik; Crookes-Goodson, Wendy J; Crouch, Keith; DeMartini, Tad; Eyestone, Mari; Goodson, Michael S; Janssens, Bernadette; Kimbell, Jennifer L; Koropatnick, Tanya A; Kucaba, Tamara; Smith, Christina; Stewart, Jennifer J; Tong, Deyan; Troll, Joshua V; Webster, Sarahrose; Winhall-Rice, Jane; Yap, Cory; Casavant, Thomas L; McFall-Ngai, Margaret J; Soares, M Bento

2006-01-01

Background: Biologists are becoming increasingly aware that the interaction of animals, including humans, with their coevolved bacterial partners is essential for health. This growing awareness has been a driving force for the development of models for the study of beneficial animal-bacterial interactions. In the squid-vibrio model, symbiotic Vibrio fischeri induce dramatic developmental changes in the light organ of host Euprymna scolopes over the first hours to days of their partnership. We report here the creation of a juvenile light-organ specific EST database. Results: We generated eleven cDNA libraries from the light organ of E. scolopes at developmentally significant time points with and without colonization by V. fischeri. Single pass 3' sequencing efforts generated 42,564 expressed sequence tags (ESTs) of which 35,421 passed our quality criteria and were then clustered via the UIcluster program into 13,962 nonredundant sequences. The cDNA clones representing these nonredundant sequences were sequenced from the 5' end of the vector and 58% of these resulting sequences overlapped significantly with the associated 3' sequence to generate 8,067 contigs with an average sequence length of 1,065 bp. All sequences were annotated with BLASTX (E-value < -03) and Gene Ontology (GO). Conclusion: Both the number of ESTs generated from each library and GO categorizations are reflective of the activity state of the light organ during these early stages of symbiosis. Future analyses of the sequences identified in these libraries promise to provide valuable information not only about pathways involved in colonization and early development of the squid light organ, but also about pathways conserved in response to bacterial colonization across the animal kingdom. PMID:16780587
Mining new crystal protein genes from Bacillus thuringiensis on the basis of mixed plasmid-enriched genome sequencing and a computational pipeline.

PubMed

Ye, Weixing; Zhu, Lei; Liu, Yingying; Crickmore, Neil; Peng, Donghai; Ruan, Lifang; Sun, Ming

2012-07-01

We have designed a high-throughput system for the identification of novel crystal protein genes (cry) from Bacillus thuringiensis strains. The system was developed with two goals: (i) to acquire the mixed plasmid-enriched genomic sequence of B. thuringiensis using next-generation sequencing biotechnology, and (ii) to identify cry genes with a computational pipeline (using BtToxin_scanner). In our pipeline method, we employed three different kinds of well-developed prediction methods, BLAST, hidden Markov model (HMM), and support vector machine (SVM), to predict the presence of Cry toxin genes. The pipeline proved to be fast (average speed, 1.02 Mb/min for proteins and open reading frames [ORFs] and 1.80 Mb/min for nucleotide sequences), sensitive (it detected 40% more protein toxin genes than a keyword extraction method using genomic sequences downloaded from GenBank), and highly specific. Twenty-one strains from our laboratory's collection were selected based on their plasmid pattern and/or crystal morphology. The plasmid-enriched genomic DNA was extracted from these strains and mixed for Illumina sequencing. The sequencing data were de novo assembled, and a total of 113 candidate cry sequences were identified using the computational pipeline. Twenty-seven candidate sequences were selected on the basis of their low level of sequence identity to known cry genes, and eight full-length genes were obtained with PCR. Finally, three new cry-type genes (primary ranks) and five cry holotypes, which were designated cry8Ac1, cry7Ha1, cry21Ca1, cry32Fa1, and cry21Da1 by the B. thuringiensis Toxin Nomenclature Committee, were identified. The system described here is both efficient and cost-effective and can greatly accelerate the discovery of novel cry genes.
Is plant mitochondrial RNA editing a source of phylogenetic incongruence? An answer from in silico and in vivo data sets.

PubMed

Picardi, Ernesto; Quagliariello, Carla

2008-03-26

In plant mitochondria, the post-transcriptional RNA editing process converts C to U at a number of specific sites of the mRNA sequence and usually restores phylogenetically conserved codons and the encoded amino acid residues. Sites undergoing RNA editing evolve at a higher rate than sites not modified by the process. As a result, editing sites strongly affect the evolution of plant mitochondrial genomes, representing an important source of sequence variability and potentially informative characters. To date no clear and convincing evidence has established whether or not editing sites really affect the topology of reconstructed phylogenetic trees. For this reason, we investigated here the effect of RNA editing on the tree building process of twenty different plant mitochondrial gene sequences and by means of computer simulations. Based on our simulation study we suggest that the editing 'noise' in tree topology inference is mainly manifested at the cDNA level. In particular, editing sites tend to confuse tree topologies when artificial genomic and cDNA sequences are generated shorter than 500 bp and with an editing percentage higher than 5.0%. Similar results have been also obtained with genuine plant mitochondrial genes. In this latter instance, indeed, the topology incongruence increases when the editing percentage goes up from about 3.0 to 14.0%. However, when the average gene length is higher than 1,000 bp (rps3, matR and atp1) no differences in the comparison between inferred genomic and cDNA topologies could be detected. Our findings by the here reported in silico and in vivo computer simulation system seem to strongly suggest that editing sites contribute in the generation of misleading phylogenetic trees if the analyzed mitochondrial gene sequence is highly edited (higher than 3.0%) and reduced in length (shorter than 500 bp). 
In the current lack of direct experimental evidence the results presented here encourage, thus, the use of genomic mitochondrial rather than cDNA sequences for reconstructing phylogenetic events in land plants.

A modification in the technique of computing average lengths from the scales of fishes

USGS Publications Warehouse

Van Oosten, John

1953-01-01

In virtually all the studies that employ scales, otoliths, or bony structures to obtain the growth history of fishes, it has been the custom to compute lengths for each individual fish and from these data obtain the average growth rates for any particular group. This method involves a considerable amount of mathematical manipulation, time, and effort. Theoretically it should be possible to obtain the same information simply by averaging the scale measurements for each year of life and the length of the fish employed and computing the average lengths from these data. This method would eliminate all calculations for individual fish. Although Van Oosten (1929: 338) pointed out many years ago the validity of this method of computation, his statements apparently have been overlooked by subsequent investigators.
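The two computation routes contrasted in this abstract can be compared in a short Python sketch. It assumes, for illustration only, a direct-proportion back-calculation (length at annulus i equals fish length times annulus radius divided by total scale radius). The abstract does not state which formula is used, and the record layout and numbers below are hypothetical.

```python
from statistics import mean


def per_fish_average(fish):
    """Back-calculate lengths for every fish, then average per year of life."""
    per_fish = [
        [f["length"] * a / f["scale_radius"] for a in f["annuli"]] for f in fish
    ]
    return [mean(year) for year in zip(*per_fish)]


def averaged_measurements(fish):
    """Average the measurements first, then back-calculate once per year of life."""
    mean_length = mean(f["length"] for f in fish)
    mean_radius = mean(f["scale_radius"] for f in fish)
    mean_annuli = [mean(year) for year in zip(*(f["annuli"] for f in fish))]
    return [mean_length * a / mean_radius for a in mean_annuli]


# Hypothetical fish of one age group: lengths in mm, scale measurements in mm.
toy_fish = [
    {"length": 200.0, "scale_radius": 4.0, "annuli": [1.0, 2.0]},
    {"length": 300.0, "scale_radius": 6.0, "annuli": [1.5, 3.0]},
]
individual = per_fish_average(toy_fish)
averaged = averaged_measurements(toy_fish)
```

For this toy data the two routes give the same averages; with real measurements under this assumed formula they need not coincide exactly, so the sketch is mainly a way to check how close they are for a given sample.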
Use of the LUS in sequence allele designations to facilitate probabilistic genotyping of NGS-based STR typing results.

PubMed

Just, Rebecca S; Irwin, Jodi A

2018-05-01

Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate - in the near term - probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. 
The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools employed. Further, these biologically based, easy-to-derive designations uphold clear relationships between parent alleles and their stutter products, enabling analysis in fully continuous probabilistic programs that model stutter while avoiding the algorithmic complexities that come with string based searches. Though using repeat unit plus LUS length as the allele designator does not capture variation that occurs outside of the core repeat regions, this straightforward approach would permit the large majority of known STR sequence variation to be used for mixture deconvolution and, in turn, result in more informative mixture statistics in the near term. Ultimately, the method could bridge the gap from current length-based probabilistic systems to facilitate broader adoption of NGS by forensic DNA testing laboratories. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
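The longest uninterrupted stretch described in this abstract can be illustrated with a minimal Python sketch. It assumes the input is the repeat region only and that the reference repeat motif is known. The "units_LUS" label format and both function names are hypothetical choices for this example, not the nomenclature defined in the paper.

```python
import re


def longest_uninterrupted_stretch(repeat_region, motif):
    """Largest number of consecutive, uninterrupted copies of motif in the region."""
    # A lookahead lets runs start at any position, so no reading frame is missed.
    pattern = re.compile(r"(?=((?:%s)+))" % re.escape(motif))
    runs = [len(m.group(1)) // len(motif) for m in pattern.finditer(repeat_region)]
    return max(runs, default=0)


def allele_designation(repeat_region, motif):
    """Hypothetical label: length-based repeat count, underscore, LUS length."""
    units = len(repeat_region) // len(motif)
    return "%d_%d" % (units, longest_uninterrupted_stretch(repeat_region, motif))


# Toy allele: seven four-base units, one of which interrupts the TCTA run.
toy_region = "TCTA" * 2 + "TCTG" + "TCTA" * 4
label = allele_designation(toy_region, "TCTA")
```

Two alleles of the same length but with the interruption in a different place would receive different labels here, which is the kind of same-length sequence variation the abstract proposes to expose to length-based probabilistic software.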
Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

2016-02-16

The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having swollenin activity and uses thereof

DOEpatents

Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius

2015-11-04

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius

2015-09-01

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having cellobiohydrolase activity and uses thereof

DOEpatents

Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

2015-09-15

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having acetyl xylan esterase activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

2015-10-20

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having carbohydrate degrading activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius

2015-08-18

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Chromosome arm-specific BAC end sequences permit comparative analysis of homoeologous chromosomes and genomes of polyploid wheat

PubMed Central

2012-01-01

Background Bread wheat, one of the world’s staple food crops, has the largest, highly repetitive and polyploid genome among the cereal crops. The wheat genome holds the key to crop genetic improvement against challenges such as climate change, environmental degradation, and water scarcity. To unravel the complex wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) is pursuing a chromosome- and chromosome arm-based approach to physical mapping and sequencing. Here we report on the use of a BAC library made from flow-sorted telosomic chromosome 3A short arm (t3AS) for marker development and analysis of sequence composition and comparative evolution of homoeologous genomes of hexaploid wheat. Results The end-sequencing of 9,984 random BACs from a chromosome arm 3AS-specific library (TaaCsp3AShA) generated 11,014,359 bp of high quality sequence from 17,591 BAC-ends with an average length of 626 bp. The sequence represents 3.2% of t3AS with an average DNA sequence read every 19 kb. Overall, 79% of the sequence consisted of repetitive elements, 1.38% as coding regions (estimated 2,850 genes) and another 19% of unknown origin. Comparative sequence analysis suggested that 70-77% of the genes present in both 3A and 3B were syntenic with model species. Among the transposable elements, gypsy/sabrina (12.4%) was the most abundant repeat and was significantly more frequent in 3A compared to homoeologous chromosome 3B. Twenty novel repetitive sequences were also identified using de novo repeat identification. BESs were screened to identify simple sequence repeats (SSR) and transposable element junctions. A total of 1,057 SSRs were identified with a density of one per 10.4 kb, and 7,928 junctions between transposable elements (TE) and other sequences were identified with a density of one per 1.39 kb. 
With the objective of enhancing the marker density of chromosome 3AS, oligonucleotide primers were successfully designed from 758 SSRs and 695 Insertion Site Based Polymorphisms (ISBPs). Of the 96 ISBP primer pairs tested, 28 (29%) were 3A-specific, compared to 17 (18%) of the 96 SSR primer pairs. Conclusion This work reports on the use of a wheat chromosome arm 3AS-specific BAC library for the targeted generation of sequence data from a particular region of the huge genome of wheat. A large quantity of sequences was generated from the A genome of hexaploid wheat for comparative genome analysis with homoeologous B and D genomes and other model grass genomes. Hundreds of molecular markers were developed from the 3AS arm-specific sequences; these and other sequences will be useful in gene discovery and physical mapping. PMID:22559868
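The summary figures reported in this abstract can be cross-checked with simple arithmetic. The following Python sketch recomputes the average BAC-end length and the SSR and TE-junction densities from the totals given above. The function and variable names are chosen here for illustration only and do not come from the paper.

```python
# Totals as reported in the abstract above.
total_bp = 11_014_359      # high-quality sequence, in bp
bac_ends = 17_591          # number of BAC-end sequences
ssrs = 1_057               # simple sequence repeats found
te_junctions = 7_928       # TE junctions found


def mean_length(total_bases, n_sequences):
    """Average sequence length in bp: total bases divided by sequence count."""
    return total_bases / n_sequences


def density_kb(total_bases, n_features):
    """Average spacing between features, in kb of sequence per feature."""
    return total_bases / n_features / 1000


avg_len = mean_length(total_bp, bac_ends)
ssr_kb = density_kb(total_bp, ssrs)
junction_kb = density_kb(total_bp, te_junctions)

print(f"average BAC-end length: {avg_len:.0f} bp")
print(f"one SSR per {ssr_kb:.1f} kb")
print(f"one TE junction per {junction_kb:.2f} kb")
```

The recomputed values agree with the reported 626 bp, 10.4 kb and 1.39 kb once rounded to the same precision.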
Transcriptome de novo assembly sequencing and analysis of the toxic dinoflagellate Alexandrium catenella using the Illumina platform.

PubMed

Zhang, Shu; Sui, Zhenghong; Chang, Lianpeng; Kang, Kyoungho; Ma, Jinhua; Kong, Fanna; Zhou, Wei; Wang, Jinguo; Guo, Liliang; Geng, Huili; Zhong, Jie; Ma, Qingxia

2014-03-10

In this article, high-throughput de novo transcriptomic sequencing was performed in Alexandrium catenella, which provided the first view of the gene repertoire in this dinoflagellate based on next-generation sequencing (NGS) technologies. A total of 118,304 unigenes were identified with an average length of 673 bp. Of these unigenes, 77,936 (65.9%) were annotated with known proteins based on sequence similarities, among which 24,149 and 22,956 unigenes were assigned to gene ontology (GO) categories and clusters of orthologous groups (COGs), respectively. Furthermore, 16,467 unigenes were mapped onto 322 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway database. We also detected 1,143 simple sequence repeats (SSRs), in which the tri-nucleotide repeat motif (69.3%) was the most abundant. The genetic facts and significance derived from the transcriptome dataset were suggested and discussed. All four core nucleosomal histones and linker histones were detected, in addition to the unigenes involved in histone modifications. A total of 190 unigenes were identified as being involved in the endocytosis pathway, and clathrin-dependent endocytosis was suggested to play a role in the heterotrophy of A. catenella. A conserved 22-nt spliced leader (SL) was identified in 21 unigenes, which suggested the existence of trans-splicing processing of mRNA in A. catenella. Crown Copyright © 2013. Published by Elsevier B.V. All rights reserved.
High efficiency family shuffling based on multi-step PCR and in vivo DNA recombination in yeast: statistical and functional analysis of a combinatorial library between human cytochrome P450 1A1 and 1A2.

PubMed

Abécassis, V; Pompon, D; Truan, G

2000-10-15

The design of a family shuffling strategy (CLERY: Combinatorial Libraries Enhanced by Recombination in Yeast) associating PCR-based and in vivo recombination and expression in yeast is described. This strategy was tested using human cytochrome P450 CYP1A1 and CYP1A2 as templates, which share 74% nucleotide sequence identity. Construction of highly shuffled libraries of mosaic structures and reduction of parental gene contamination were two major goals. Library characterization involved multiprobe hybridization on DNA macro-arrays. The statistical analysis of randomly selected clones revealed a high proportion of chimeric genes (86%) and a homogeneous representation of the parental contribution among the sequences (55.8 +/- 2.5% for parental sequence 1A2). A microtiter plate screening system was designed to achieve colorimetric detection of polycyclic hydrocarbon hydroxylation by transformed yeast cells. Full sequences of five randomly picked and five functionally selected clones were analyzed. Results confirmed the shuffling efficiency and allowed calculation of the average length of sequence exchange and mutation rates. The efficient and statistically representative generation of mosaic structures by this type of family shuffling in a yeast expression system constitutes a novel and promising tool for structure-function studies and tuning enzymatic activities of multicomponent eucaryote complexes involving non-soluble enzymes.
Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

PubMed Central

2011-01-01

Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to the ecological significance, Phalaenopsis has been considered as an economically important floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies. PMID:21749684
Trace metals in liver, skin and muscle of Lethrinus lentjan fish species in relation to body length and sex.

PubMed

Al-Yousuf, M H; El-Shahawi; Al-Ghais, S M

2000-07-10

A post-Gulf sea water pollution assessment program was carried out in the liver, skin and muscle tissues of the localized Lethrinus lentjan fish species [Family: Lethrinidae (Teleost)]. The concentrations of the major heavy metals were monitored at different sites along the western coast of the United Arab Emirates (UAE) on the Arabian Gulf. The concentrations of Zn, Cu and Mn were found to follow the order: liver > skin > muscle while the cadmium level follows the sequence: liver > muscle > skin. The influence of fish sex and body length on the accumulation of those metals in the tested fish organs was critically investigated. The average metal concentrations in liver, skin and muscle of female fish were found to be higher than those found in the male fish. The detected metal levels were generally similar to previous pre-war, 1991 levels. The study concludes that the marine fish from the Arabian Gulf are comparatively clean and do not constitute a risk for human health.
De Novo Assembly and Characterization of the Transcriptome of the Chinese Medicinal Herb, Gentiana rigescens

PubMed Central

Zhang, Xiaodong; Allan, Andrew C.; Li, Caixia; Wang, Yuanzhong; Yao, Qiuyang

2015-01-01

Gentiana rigescens is an important medicinal herb in China. The main validated medicinal component gentiopicroside is synthesized in shoots, but is mainly found in the plant’s roots. The gentiopicroside biosynthetic pathway and its regulatory control remain to be elucidated. Genome resources of gentian are limited. Next-generation sequencing (NGS) technologies can aid in supplying global gene expression profiles. In this study we present sequence and transcript abundance data for the root and leaf transcriptome of G. rigescens, obtained using the Illumina Hiseq2000. Over fifty million clean reads were obtained from leaf and root libraries. This yields 76,717 unigenes with an average length of 753 bp. Among these, 33,855 unigenes were identified as putative homologs of annotated sequences in public protein and nucleotide databases. Digital abundance analysis identified 3306 unigenes differentially enriched between leaf and root. Unigenes found in both tissues were categorized according to their putative functional categories. Of the differentially expressed genes, over 130 were annotated as related to terpenoid biosynthesis. This work is the first study of global transcriptome analyses in gentian. These sequences and putative functional data comprise a resource for future investigation of terpenoid biosynthesis in Gentianaceae species and annotation of the gentiopicroside biosynthetic pathway and its regulatory mechanisms. PMID:26006235
Large-Scale Collection and Analysis of Full-Length cDNAs from Brachypodium distachyon and Integration with Pooideae Sequence Resources

PubMed Central

Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

2013-01-01

A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698
Tickling the retina: integration of subthreshold electrical pulses can activate retinal neurons

NASA Astrophysics Data System (ADS)

Sekhar, S.; Jalligampala, A.; Zrenner, E.; Rathbun, D. L.

2016-08-01

Objective. The field of retinal prosthetics has made major progress over the last decade, restoring visual percepts to people suffering from retinitis pigmentosa. The stimulation pulses used by present implants are suprathreshold, meaning individual pulses are designed to activate the retina. In this paper we explore subthreshold pulse sequences as an alternate stimulation paradigm. Subthreshold pulses have the potential to address important open problems such as fading of visual percepts when patients are stimulated at moderate pulse repetition rates and the difficulty in preferentially stimulating different retinal pathways. Approach. As a first step in addressing these issues we used Gaussian white noise electrical stimulation combined with spike-triggered averaging to interrogate whether a subthreshold sequence of pulses can be used to activate the mouse retina. Main results. We demonstrate that the retinal network can integrate multiple subthreshold electrical stimuli under an experimental paradigm immediately relevant to retinal prostheses. Furthermore, these characteristic stimulus sequences varied in their shape and integration window length across the population of retinal ganglion cells. Significance. Because the subthreshold sequences activate the retina at stimulation rates that would typically induce strong fading (25 Hz), such retinal ‘tickling’ has the potential to minimize the fading problem. Furthermore, the diversity found across the cell population in characteristic pulse sequences suggests that these sequences could be used to selectively address the different retinal pathways (e.g. ON versus OFF). Both of these outcomes may significantly improve visual perception in retinal implant patients.
Genetic Map Construction and Quantitative Trait Locus (QTL) Detection of Growth-Related Traits in Litopenaeus vannamei for Selective Breeding Applications

PubMed Central

Andriantahina, Farafidy; Liu, Xiaolin; Huang, Hao

2013-01-01

Growth is a priority trait from the point of view of genetic improvement. Molecular markers linked to quantitative trait loci (QTL) have been regarded as useful for marker-assisted selection (MAS) in complex traits such as growth. Using an intermediate F2 cross of slow and fast growth parents, a genetic linkage map of Pacific whiteleg shrimp, Litopenaeus vannamei, based on amplified fragment length polymorphisms (AFLP) and simple sequence repeats (SSR) markers was constructed. Meanwhile, QTL analysis was performed for growth-related traits. The linkage map consisted of 451 marker loci (429 AFLPs and 22 SSRs) which formed 49 linkage groups with an average marker space of 7.6 cM; they spanned a total length of 3627.6 cM, covering 79.50% of estimated genome size. Fourteen QTLs were identified for growth-related traits, including three QTLs each for body weight (BW), total length (TL) and partial carapace length (PCL), two QTLs for body length (BL), and one QTL each for first abdominal segment depth (FASD), third abdominal segment depth (TASD) and first abdominal segment width (FASW), which explained 2.62 to 61.42% of phenotypic variation. Moreover, comparison of linkage maps between L. vannamei and Penaeus japonicus was applied, providing a new insight into the genetic base of QTL affecting the growth-related traits. The new results will be useful for conducting MAS breeding schemes in L. vannamei. PMID:24086466
Multiple symbol partially coherent detection of MPSK

NASA Technical Reports Server (NTRS)

Simon, M. K.; Divsalar, D.

1992-01-01

It is shown that by using the known (or estimated) value of carrier tracking loop signal to noise ratio (SNR) in the decision metric, it is possible to improve the error probability performance of a partially coherent multiple phase-shift-keying (MPSK) system relative to that corresponding to the commonly used ideal coherent decision rule. Using a maximum-likelihood approach, an optimum decision metric is derived and shown to take the form of a weighted sum of the ideal coherent decision metric (i.e., correlation) and the noncoherent decision metric which is optimum for differential detection of MPSK. The performance of a receiver based on this optimum decision rule is derived and shown to provide continued improvement with increasing length of observation interval (data symbol sequence length). Unfortunately, increasing the observation length does not eliminate the error floor associated with the finite loop SNR. Nevertheless, in the limit of infinite observation length, the average error probability performance approaches the algebraic sum of the error floor and the performance of ideal coherent detection, i.e., at any error probability above the error floor, there is no degradation due to the partial coherence. It is shown that this limiting behavior is virtually achievable with practical size observation lengths. Furthermore, the performance is quite insensitive to mismatch between the estimate of loop SNR (e.g., obtained from measurement) fed to the decision metric and its true value. These results may be of use in low-cost Earth-orbiting or deep-space missions employing coded modulations.
Comparing K-mer based methods for improved classification of 16S sequences.

PubMed

Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars

2015-07-01

The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. The differences in classification error obtained by the methods seemed to be small, but they were stable for both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data is needed to improve classification from here. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal and more robust training data set is crucial.
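The K-mer counting by sliding windows mentioned in this abstract can be illustrated with a minimal Python sketch. It shows only the feature-extraction step that such classifiers share. It is not the authors' implementation or the RDP classifier, and the function names are hypothetical.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count overlapping K-mers by sliding a window of width k along the sequence."""
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty range and hence an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_frequencies(sequence, k):
    """Convert raw K-mer counts to relative frequencies (empty dict if no K-mers)."""
    counts = count_kmers(sequence, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Toy example: a 7-base sequence contains 7 - 3 + 1 = 5 overlapping 3-mers.
example_counts = count_kmers("ACGTACG", 3)
example_freqs = kmer_frequencies("ACGTACG", 3)
print(example_counts)
print(example_freqs)
```

A classifier would typically feed such count or frequency vectors, computed for a fixed K, into a model such as naïve Bayes or nearest-neighbour; those modelling steps are omitted here.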

Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger

PubMed Central

Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

2010-01-01

In this study we successfully constructed a full-length cDNA library from Siberian tiger, Panthera tigris altaica, the most well-known wild animal. Total RNA was extracted from cultured Siberian tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants from the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal and 7.4% as weak matches. 28.0% of them were found to be related to enzyme/catalytic protein, 20.9% of ESTs to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism, 3.2% to cell cycle, and 8.9% of ESTs were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genomic research of Siberian tigers. PMID:20941376
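The category shares reported in this abstract can be sanity-checked by summing them. The short Python sketch below does so, assuming that the final figure for novel genes (8.9) is a percentage like the others. The dictionary keys are shortened labels introduced here for illustration.

```python
# Functional-category shares of the 282 ESTs, in percent, as listed in the abstract.
functional_shares = {
    "enzyme/catalytic protein": 28.0,
    "metabolism": 20.9,
    "transport": 13.1,
    "signal transducer/cell communication": 12.1,
    "structure protein": 9.9,
    "immunity protein/defense metabolism": 3.9,
    "cell cycle": 3.2,
    "novel genes": 8.9,  # assumed to be a percentage
}

# BLASTX match classes, in percent.
match_shares = {"strong": 53.9, "nominal": 38.6, "weak": 7.4}

functional_total = round(sum(functional_shares.values()), 1)
match_total = round(sum(match_shares.values()), 1)

print(f"functional categories sum to {functional_total}%")
print(f"BLASTX match classes sum to {match_total}%")
```

Under that assumption the functional categories account for all ESTs, and the match classes sum to 99.9%, which is consistent with rounding.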
De Novo Transcriptome Sequencing Reveals Important Molecular Networks and Metabolic Pathways of the Plant, Chlorophytum borivilianum

PubMed Central

Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

2013-01-01

Chlorophytum borivilianum, an endangered medicinal plant species, is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only a few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight into this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of the transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. A few genes of alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689
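The RPKM measure named in this abstract normalizes a transcript's read count by transcript length (in kilobases) and by sequencing depth (in millions of mapped reads). The Python sketch below uses the standard definition. The function name and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(reads_mapped, transcript_length_bp, total_mapped_reads):
    """Reads per Kilobase per Million mapped reads.

    RPKM = reads_mapped / (length_in_kb * total_mapped_reads_in_millions)
         = reads_mapped * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return reads_mapped * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 100 reads on a 1,000 bp transcript
# in a library of 1,000,000 mapped reads.
example_rpkm = rpkm(100, 1_000, 1_000_000)
print(f"RPKM = {example_rpkm:.1f}")
```

Because of the length normalization, a transcript twice as long with the same read count receives half the RPKM, which makes values comparable across transcripts of different lengths within one library.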
De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

PubMed Central

2011-01-01

Background Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales - a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations Fagopyrum species have not been the subject of large-scale sequencing projects. Results Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. Conclusions 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated. PMID:21232141
De Novo Sequencing and Characterization of the Floral Transcriptome of Dendrocalamus latiflorus (Poaceae: Bambusoideae)

PubMed Central

Li, De-Zhu; Guo, Zhen-Hua

2012-01-01

Background Transcriptome sequencing can be used to determine gene sequences and transcript abundance in non-model species, and the advent of next-generation sequencing (NGS) technologies has greatly decreased the cost and time required for this process. Transcriptome data are especially desirable in bamboo species, as certain members constitute an economically and culturally important group of mostly semelparous plants with remarkable flowering features, yet little bamboo genomic research has been performed. Here we present, for the first time, extensive sequence and transcript abundance data for the floral transcriptome of a key bamboo species, Dendrocalamus latiflorus, obtained using the Illumina GAII sequencing platform. Our further goal was to identify patterns of gene expression during bamboo flower development. Results Approximately 96 million sequencing reads were generated and assembled de novo, yielding 146,395 high quality unigenes with an average length of 461 bp. Of these, 80,418 were identified as putative homologs of annotated sequences in the public protein databases, of which 290 were associated with the floral transition and 47 were related to flower development. Digital abundance analysis identified 26,529 transcripts differentially enriched between two developmental stages, young flower buds and older developing flowers. Unigenes found at each stage were categorized according to their putative functional categories. These sequence and putative function data comprise a resource for future investigation of the floral transition and flower development in bamboo species. Conclusions Our results present the first broad survey of a bamboo floral transcriptome. Although it will be necessary to validate the functions carried out by these genes, these results represent a starting point for future functional research on D. latiflorus and related species. PMID:22916120
Variation in leader length of bitterbrush

Treesearch

Richard L. Hubbard; David. Dunaway

1958-01-01

The estimation of herbage production and utilization in browse plants has been a problem for many years. Most range technicians have simply estimated the average length of twigs or leaders, then expressed use by deer and livestock as a percentage thereof based on the estimated average length left after grazing. Riordan used this method on mountain mahogany (
Employment of Near Full-Length Ribosome Gene TA-Cloning and Primer-Blast to Detect Multiple Species in a Natural Complex Microbial Community Using Species-Specific Primers Designed with Their Genome Sequences.

PubMed

Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou

2016-11-01

It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. Most widely used high throughput next-generation sequencing methods can only generate information mainly for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosome RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomics background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosome RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.
The influence of sequence context and length on the kinetics of DNA duplex formation from complementary hairpins possessing (CNG) repeats.

PubMed

Paiva, Anthony M; Sheardy, Richard D

2005-04-20

The formation of unusual structures during DNA replication has been invoked for gene expansion in genomes possessing triplet repeat sequences, CNG, where N = A, C, G, or T. In particular, it has been suggested that the daughter strand of the leading strand partially dissociates from the parent strand and forms a hairpin. The equilibrium between the fully duplexed parent:daughter species and the parent:hairpin species is dependent upon their relative stabilities and the rates of reannealing of the daughter strand back to the parent. These stabilities and rates are ultimately influenced by the sequence context of the DNA and its length. Previous work has demonstrated that longer strands are more stable than shorter strands and that the identity of N also influences the thermal stability [Paiva, A. M.; Sheardy, R. D. Biochemistry 2004, 43, 14218-14227]. Here, we show that the rate of duplex formation from complementary hairpins is also sequence context and length dependent. In particular, longer duplexes have higher activation energies than shorter duplexes of the same sequence context. Further, [(CCG):(GGC)] duplexes have lower activation energies than corresponding [(CAG):(GTC)] duplexes of the same length. Hence, hairpins formed from long CNG sequences are more thermodynamically stable and have slower kinetics for reannealing to their complement than shorter analogues. Gene expansion can now be explained in terms of thermodynamics and kinetics.
Cellulosic nanowhiskers. Theory and application of light scattering from polydisperse spheroids in the Rayleigh-Gans-Debye regime.

PubMed

Braun, Birgit; Dorgan, John R; Chandler, John P

2008-04-01

Mathematical treatment of light scattering within the Rayleigh-Gans-Debye limit for spheroids with polydispersity in both length and diameter is developed and experimentally tested using cellulosic nanowhiskers (CNW). Polydispersity indices are obtained by fitting the theoretical form factor to experimental data. Good agreement is achieved using a polydispersity of 2.3 for the length, independent of the type of acid used. Diameter polydispersities are 2.1 and 3.0 for sulfuric and hydrochloric acids, respectively. These polydispersities allow the determination of average dimensions from the z-average mean-square radius (z) and the weight-average molecular weight (Mw) easily obtained from Berry plots. For cotton linter hydrolyzed by hydrochloric acid, the average length and diameter are 244 and 22 nm. This compares to average length and diameter of 272 and 13 nm for sulfuric acid. This study establishes a new light-scattering methodology as a quick and robust tool for size characterization of polydisperse spheroidal nanoparticles.
A retrotransposable element from the mosquito Anopheles gambiae.

PubMed Central

Besansky, N J

1990-01-01

A family of middle repetitive elements from the African malaria vector Anopheles gambiae is described. Approximately 100 copies of the element, designated T1Ag, are dispersed in the genome. Full-length elements are 4.6 kilobase pairs in length, but truncation of the 5' end is common. Nucleotide sequences of one full-length, two 5'-truncated, and two 5' ends of T1Ag elements were determined and aligned to define a consensus sequence. Sequence analysis revealed two long, overlapping open reading frames followed by a polyadenylation signal, AATAAA, and a tail consisting of tandem repetitions of the motif TGAAA. No direct or inverted long terminal repeats (LTRs) were detected. The first open reading frame, 442 amino acids in length, includes a domain resembling that of nucleic acid-binding proteins. The second open reading frame, 975 amino acids long, resembles the reverse transcriptases of a category of retrotransposable elements without LTRs, variously termed class II retrotransposons, class III elements or non-LTR retrotransposons. Similarity at the sequence and structural levels places T1Ag in this category. PMID:1689457
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

PubMed

Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

PubMed Central

Dasenko, Mark A.

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
Using videogrammetry and 3D image reconstruction to identify crime suspects

NASA Astrophysics Data System (ADS)

Klasen, Lena M.; Fahlander, Olov

1997-02-01

The anthropometry and movements are unique for every individual human being. We identify persons we know by recognizing the way they look and move. By quantifying these measures and using image processing methods this method can serve as a tool in the work of the police as a complement to the ability of the human eye. The idea is to use virtual 3-D parameterized models of the human body to measure the anthropometry and movements of a crime suspect. The Swedish National Laboratory of Forensic Science in cooperation with SAAB Military Aircraft have developed methods for measuring the lengths of persons from video sequences. However, there is so much unused information in a digital image sequence from a crime scene. The main approach for this paper is to give an overview of the current research project at Linkoping University, Image Coding Group, where methods to measure anthropometrical data and movements by using virtual 3-D parameterized models of the person in the crime scene are being developed. The length of an individual might vary up to plus or minus 10 cm depending on whether the person is in upright position or not. When measuring during the best available conditions, the length still varies within plus or minus 1 cm. Using a full 3-D model provides a rich set of anthropometric measures describing the person in the crime scene. Once having obtained such a model the movements can be quantified as well. The results depend strongly on the accuracy of the 3-D model and the strategy of having such an accurate 3-D model is to make one estimate per image frame by using 3-D scene reconstruction, and an averaged 3-D model as the final result from which the anthropometry and movements are calculated.
Minding the gap: Frequency of indels in mtDNA control region sequence data and influence on population genetic analyses

USGS Publications Warehouse

Pearce, J.M.

2006-01-01

Insertions and deletions (indels) result in sequences of various lengths when homologous gene regions are compared among individuals or species. Although indels are typically phylogenetically informative, occurrence and incorporation of these characters as gaps in intraspecific population genetic data sets are rarely discussed. Moreover, the impact of gaps on estimates of fixation indices, such as FST, has not been reviewed. Here, I summarize the occurrence and population genetic signal of indels among 60 published studies that involved alignments of multiple sequences from the mitochondrial DNA (mtDNA) control region of vertebrate taxa. Among 30 studies observing indels, an average of 12% of both variable and parsimony-informative sites were composed of these sites. There was no consistent trend between levels of population differentiation and the number of gap characters in a data block. Across all studies, the average influence on estimates of ΦST was small, explaining only an additional 1.8% of among population variance (range 0.0-8.0%). Studies most likely to observe an increase in ΦST with the inclusion of gap characters were those with < 20 variable sites, but a near equal number of studies with few variable sites did not show an increase. In contrast to studies at interspecific levels, the influence of indels for intraspecific population genetic analyses of control region DNA appears small, dependent upon total number of variable sites in the data block, and related to species-specific characteristics and the spatial distribution of mtDNA lineages that contain indels. © 2006 Blackwell Publishing Ltd.
Characterization of four species of Trichuris (Nematoda: Enoplida) by their second internal transcribed spacer ribosomal DNA sequence.

PubMed

Oliveros, R; Cutillas, C; De Rojas, M; Arias, P

2000-12-01

Adult worms of Trichuris ovis and T. globulosa were collected from Ovis aries (sheep) and Capra hircus (goats). T. suis was isolated from Sus scrofa domestica (swine) and T. leporis was isolated from Lepus europaeus (rabbits) in Spain. Genomic DNA was isolated and a ribosomal internal transcribed spacer (ITS2) was amplified and sequenced using polymerase-chain-reaction (PCR) techniques. The ITS2 of T. ovis and T. globulosa was 407 nucleotides in length and had a GC content of about 62%. Furthermore, the ITS2 of T. suis and T. leporis was 534 and 418 nucleotides in length and had a GC content of about 64.8% and 62.4%, respectively. There was evidence of slight variation in the sequence within individuals of all species analyzed, indicating intraindividual variation in the sequence of different copies of the ribosomal DNA. Furthermore, low-level intraspecific variation was detected. Sequence analyses of ITS2 products of T. ovis and T. globulosa demonstrated no sequence difference between them. Nevertheless, differences were detected between the ITS2 sequences of T. suis, T. leporis, and T. ovis, indicating that Trichuris species can reliably be differentiated by their ITS2 sequences and PCR-linked restriction-fragment-length polymorphism (RFLP).
Stochastic Modeling based on Dictionary Approach for the Generation of Daily Precipitation Occurrences

NASA Astrophysics Data System (ADS)

Panu, U. S.; Ng, W.; Rasmussen, P. F.

2009-12-01

The modeling of weather states (i.e., precipitation occurrences) is critical when the historical data are not long enough for the desired analysis. Stochastic models (e.g., Markov Chain and Alternating Renewal Process (ARP)) of the precipitation occurrence processes generally assume the existence of short-term temporal-dependency between the neighboring states while implying the existence of long-term independency (randomness) of states in precipitation records. Existing temporal-dependent models for the generation of precipitation occurrences are restricted either by the fixed-length memory (e.g., the order of a Markov chain model), or by the reining states in segments (e.g., persistency of homogenous states within dry/wet-spell lengths of an ARP). The modeling of variable segment lengths and states could be an arduous task and a flexible modeling approach is required for the preservation of various segmented patterns of precipitation data series. An innovative Dictionary approach has been developed in the field of genome pattern recognition for the identification of frequently occurring genome segments in DNA sequences. The genome segments delineate the biologically meaningful "words" (i.e., segments with specific patterns in a series of discrete states) that can be jointly modeled with variable lengths and states. A meaningful "word", in hydrology, can refer to a segment of precipitation occurrence comprising wet or dry states. Such flexibility would provide a unique advantage over the traditional stochastic models for the generation of precipitation occurrences. Three stochastic models, namely, the alternating renewal process using Geometric distribution, the second-order Markov chain model, and the Dictionary approach have been assessed to evaluate their efficacy for the generation of daily precipitation sequences. 
Comparisons involved three guiding principles, namely (i) the ability of models to preserve the short-term temporal-dependency in data through the concepts of autocorrelation, average mutual information, and Hurst exponent, (ii) the ability of models to preserve the persistency within the homogenous dry/wet weather states through analysis of dry/wet-spell lengths between the observed and generated data, and (iii) the ability to assess the goodness-of-fit of models through the likelihood estimates (i.e., AIC and BIC). The past 30 years of observed daily precipitation records from 10 Canadian meteorological stations were utilized for comparative analyses of the three models. In general, the Markov chain model performed well. The remaining models were found to be competitive with one another depending upon the scope and purpose of the comparison. Although the Markov chain model has a certain advantage in the generation of daily precipitation occurrences, the structural flexibility offered by the Dictionary approach in modeling the varied segment lengths of heterogeneous weather states provides a distinct and powerful advantage in the generation of precipitation sequences.
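As a rough illustration of one of the three model types named in the abstract above, a second-order Markov chain for wet/dry occurrences can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: states are coded 0 (dry) and 1 (wet), transition probabilities are estimated by plain counting without smoothing, unseen two-day histories fall back to a hypothetical default probability, and the observed sequence is a toy example rather than station data.

```python
import random
from collections import defaultdict
from itertools import groupby


def fit_second_order(states):
    """Estimate P(wet today | states of the previous two days) by counting."""
    counts = defaultdict(lambda: [0, 0])  # history -> [dry count, wet count]
    for i in range(2, len(states)):
        history = (states[i - 2], states[i - 1])
        counts[history][states[i]] += 1
    return {h: c[1] / (c[0] + c[1]) for h, c in counts.items()}


def generate(p_wet, n_days, start=(0, 0), seed=0, default=0.5):
    """Generate a 0/1 occurrence sequence from the fitted probabilities.

    `default` (an assumption) is used for histories never seen in fitting.
    """
    rng = random.Random(seed)
    seq = list(start)
    while len(seq) < n_days:
        p = p_wet.get((seq[-2], seq[-1]), default)
        seq.append(1 if rng.random() < p else 0)
    return seq[:n_days]


def spell_lengths(states, state):
    """Lengths of consecutive runs of `state` (wet or dry spells)."""
    return [len(list(run)) for value, run in groupby(states) if value == state]


# Toy record: two dry days followed by two wet days, repeated.
observed = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
p_wet = fit_second_order(observed)
simulated = generate(p_wet, 8)
wet_spells = spell_lengths(simulated, 1)
```

Comparing spell_lengths() on observed and generated sequences corresponds to the spell-length check described in guiding principle (ii).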
High coverage of the complete mitochondrial genome of the rare Gray's beaked whale (Mesoplodon grayi) using Illumina next generation sequencing.

PubMed

Thompson, Kirsten F; Patel, Selina; Williams, Liam; Tsai, Peter; Constantine, Rochelle; Baker, C Scott; Millar, Craig D

2016-01-01

Using an Illumina platform, we shot-gun sequenced the complete mitochondrial genome of Gray's beaked whale (Mesoplodon grayi) to an average coverage of 152X. We performed a de novo assembly using SOAPdenovo2 and determined the total mitogenome length to be 16,347 bp. The nucleotide composition was asymmetric (33.3% A, 24.6% C, 12.6% G, 29.5% T) with an overall GC content of 37.2%. The gene organization was similar to that of other cetaceans with 13 protein-coding genes, 2 rRNAs (12S and 16S), 22 predicted tRNAs and 1 control region or D-loop. We found no evidence of heteroplasmy or nuclear copies of mitochondrial DNA in this individual. Beaked whales within the genus Mesoplodon are rarely seen at sea and their basic biology is poorly understood. These data will contribute to resolving the phylogeography and population ecology of this speciose group.
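The overall GC content in the abstract above is the sum of the reported G and C percentages (12.6% + 24.6% = 37.2%). The short sketch below checks that arithmetic and shows how the same figure would be computed from a sequence string. The function name and the example sequence are hypothetical, and ignoring non-ACGT symbols is an assumption made here for simplicity.

```python
def gc_content(sequence):
    """Percentage of G and C among the A/C/G/T bases of a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Nucleotide composition reported in the abstract (percent).
composition = {"A": 33.3, "C": 24.6, "G": 12.6, "T": 29.5}
gc_from_abstract = composition["G"] + composition["C"]

# Hypothetical short sequence, for illustration only.
example_gc = gc_content("ATGCATGCAT")
```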
Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics

PubMed Central

Weitemier, Kevin; Straub, Shannon C. K.; Cronn, Richard C.; Fishbein, Mark; Schmickl, Roswitha; McDonnell, Angela; Liston, Aaron

2014-01-01

• Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. • Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics. PMID:25225629
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

PubMed Central

Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

2015-01-01

Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
Trends in the components of extreme water levels signal a rotation of winds in strong storms in the eastern Baltic Sea

NASA Astrophysics Data System (ADS)

Pindsoo, Katri; Soomere, Tarmo

2016-04-01

The water level time series and particularly temporal variations in water level extremes usually do not follow any simple rule. Still, the analysis of linear trends in extreme values of surge levels is a convenient tool to obtain a first approximation of the future projections of the risks associated with coastal floodings. We demonstrate how this tool can be used to extract essential information about concealed changes in the forcing factors of seas and oceans. A specific feature of the Baltic Sea is that sequences of even moderate storms may raise the average sea level by almost 1 m for a few weeks. Such events occur once in a few years. They substantially contribute to the extreme water levels in the eastern Baltic Sea: the most devastating coastal floodings occur when a strong storm from unfortunate direction arrives during such an event. We focus on the separation of subtidal (weekly-scale) processes from those which are caused by a single storm and on establishing how much these two kinds of events have contributed to the increase in the extreme water levels in the eastern Baltic Sea. The analysis relies on numerically reconstructed sea levels produced by the RCO (Rossby Center, Swedish Meteorological and Hydrological Institute) ocean model for 1961-2005. The reaction of sea surface to single storm events is isolated from the local water level time series using a running average over a fixed interval. The distribution of average water levels has an almost Gaussian shape for averaging lengths from a few days to a few months. The residual (total water level minus the average) can be interpreted as a proxy of the local storm surges. Interestingly, for the 8-day average this residual almost exactly follows the exponential distribution. Therefore, for this averaging length the heights of local storm surges reflect an underlying Poisson process. This feature is universal for the entire eastern Baltic Sea coast. 
The slopes of the exponential distribution for low and high water levels are different, vary markedly along the coast and provide a useful quantification of the vulnerability of single coastal segments with respect to coastal flooding. The formal linear trends in the extreme values of these water level components exhibit radically different spatial variations. The slopes of the trends in the weekly average are almost constant (~4 cm/decade for 8-day running average) along the entire eastern Baltic Sea coast. This first of all indicates that the duration of storm sequences has increased. The trends for maxima of local storm surge heights represent almost the entire spatial variability in the water level extremes. Their slopes are almost zero at the open Baltic Proper coasts of the Western Estonian archipelago. Therefore, an increase in wind speed in strong storms is unlikely in this area. In contrast, the slopes in question reach 5-7 cm/decade in the eastern Gulf of Finland and Gulf of Riga. This feature suggests that wind direction in strongest storms may have rotated in the northern Baltic Sea.
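The separation described in the abstract above, a running average over a fixed interval plus a residual interpreted as a proxy for local storm surge, can be sketched as below. This is an illustration under assumptions, not the authors' procedure: the abstract does not say whether the window is centered or trailing, so a centered window with an odd number of samples, truncated at the series ends, is assumed here, and the input series is made up.

```python
def running_mean(values, window):
    """Centered running mean; the window is truncated at the series ends.

    `window` is a number of samples and is assumed to be odd.
    """
    half = window // 2
    means = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        means.append(sum(values[lo:hi]) / (hi - lo))
    return means


def split_water_level(values, window):
    """Return (running average, residual), where residual = total - average."""
    average = running_mean(values, window)
    residual = [v - a for v, a in zip(values, average)]
    return average, residual


# Hypothetical water levels (cm) with a single short surge in the middle.
levels = [0, 0, 3, 0, 0]
average, residual = split_water_level(levels, window=3)
```

In this toy series the surge of 3 cm is split into a slowly varying part of 1 cm and a residual of 2 cm at the peak.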

Transcriptome analysis of Bupleurum chinense focusing on genes involved in the biosynthesis of saikosaponins

PubMed Central

2011-01-01

Abstract Background Bupleurum chinense DC. is a widely used traditional Chinese medicinal plant. Saikosaponins are the major bioactive constituents of B. chinense, but relatively little is known about saikosaponin biosynthesis. The 454 pyrosequencing technology provides a promising opportunity for finding novel genes that participate in plant metabolism. Consequently, this technology may help to identify the candidate genes involved in the saikosaponin biosynthetic pathway. Results One-quarter of the 454 pyrosequencing runs produced a total of 195,088 high-quality reads, with an average read length of 356 bases (NCBI SRA accession SRA039388). A de novo assembly generated 24,037 unique sequences (22,748 contigs and 1,289 singletons), 12,649 (52.6%) of which were annotated against three public protein databases using a basic local alignment search tool (E-value ≤1e-10). All unique sequences were compared with NCBI expressed sequence tags (ESTs) (237) and encoding sequences (44) from the Bupleurum genus, and with a Sanger-sequenced EST dataset (3,111). The 23,173 (96.4%) unique sequences obtained in the present study represent novel Bupleurum genes. The ESTs of genes related to saikosaponin biosynthesis were found to encode known enzymes that catalyze the formation of the saikosaponin backbone; 246 cytochrome P450 (P450s) and 102 glycosyltransferases (GTs) unique sequences were also found in the 454 dataset. Full length cDNAs of 7 P450s and 7 uridine diphosphate GTs (UGTs) were verified by reverse transcriptase polymerase chain reaction or by cloning using 5' and/or 3' rapid amplification of cDNA ends. Two P450s and three UGTs were identified as the most likely candidates involved in saikosaponin biosynthesis. This finding was based on the coordinate up-regulation of their expression with β-AS in methyl jasmonate-treated adventitious roots and on their similar expression patterns with β-AS in various B. chinense tissues. 
Conclusions A collection of high-quality ESTs for B. chinense obtained by 454 pyrosequencing is provided here for the first time. These data should aid further research on the functional genomics of B. chinense and other Bupleurum species. The candidate genes for enzymes involved in saikosaponin biosynthesis, especially the P450s and UGTs, that were revealed provide a substantial foundation for follow-up research on the metabolism and regulation of the saikosaponins. PMID:22047182
Knee, Shoulder, and Fundamentals of Arthroscopic Surgery Training: Validation of a Virtual Arthroscopy Simulator.

PubMed

Tofte, Josef N; Westerlind, Brian O; Martin, Kevin D; Guetschow, Brian L; Uribe-Echevarria, Bastián; Rungprai, Chamnanni; Phisitkul, Phinit

2017-03-01

To validate the knee, shoulder, and virtual Fundamentals of Arthroscopic Training (FAST) modules on a virtual arthroscopy simulator via correlations with arthroscopy case experience and postgraduate year. Orthopaedic residents and faculty from one institution performed a standardized sequence of knee, shoulder, and FAST modules to evaluate baseline arthroscopy skills. Total operation time, camera path length, and composite total score (metric derived from multiple simulator measurements) were compared with case experience and postgraduate level. Values reported are Pearson r; alpha = 0.05. 35 orthopaedic residents (6 per postgraduate year), 2 fellows, and 3 faculty members (2 sports, 1 foot and ankle), including 30 male and 5 female residents, were voluntarily enrolled March to June 2015. Knee: training year correlated significantly with year-averaged knee composite score, r = 0.92, P = .004, 95% confidence interval (CI) = 0.84, 0.96; operation time, r = -0.92, P = .004, 95% CI = -0.96, -0.84; and camera path length, r = -0.97, P = .0004, 95% CI = -0.98, -0.93. Knee arthroscopy case experience correlated significantly with composite score, r = 0.58, P = .0008, 95% CI = 0.27, 0.77; operation time, r = -0.54, P = .002, 95% CI = -0.75, -0.22; and camera path length, r = -0.62, P = .0003, 95% CI = -0.8, -0.33. Shoulder: training year correlated strongly with average shoulder composite score, r = 0.90, P = .006, 95% CI = 0.81, 0.95; operation time, r = -0.94, P = .001, 95% CI = -0.97, -0.89; and camera path length, r = -0.89, P = .007, 95% CI = -0.95, -0.80. Shoulder arthroscopy case experience correlated significantly with average composite score, r = 0.52, P = .003, 95% CI = 0.2, 0.74; strongly with operation time, r = -0.62, P = .0002, 95% CI = -0.8, -0.33; and camera path length, r = -0.37, P = .044, 95% CI = -0.64, -0.01, by training year. 
FAST: training year correlated significantly with 3 combined FAST activity average composite scores, r = 0.81, P = .0279, 95% CI = 0.65, 0.90; operation times, r = -0.86, P = .012, 95% CI = -0.93, -0.74; and camera path lengths, r = -0.85, P = .015, 95% CI = -0.92, -0.72. Total arthroscopy cases performed did not correlate significantly with overall FAST performance. We found significant correlations between both training year and knee and shoulder arthroscopy experience when compared with performance as measured by composite score, camera path length, and operation time during a simulated diagnostic knee and shoulder arthroscopy, respectively. Three FAST activities demonstrated significant correlations with training year but not arthroscopy case experience as measured by composite score, camera path length, and operation time. We attempt to validate an arthroscopy simulator that could be used to supplement arthroscopy skills training for orthopaedic residents. Copyright © 2016 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.
Completion of full length genome sequence of novel avian paramyxovirus strain APMV/Shimane67 isolated from migratory wild geese in Japan.

PubMed

Yamamoto, Eiji; Ito, Toshihiro; Ito, Hiroshi

2016-11-01

The nucleotide sequences of the nucleocapsid protein (N), phosphoprotein (P), matrix protein (M), hemagglutinin-neuraminidase (HN), and large polymerase protein (L) genes, the 3'-end leader, the 5'-end trailer and the intergenic regions of the avian paramyxovirus (APMV) strain goose/Shimane/67/2000 (APMV/Shimane67) were determined. Together with previously reported data on the fusion protein (F) gene sequence [46], this study completes the genome sequence of APMV/Shimane67. The genome of APMV/Shimane67 is 16,146 nucleotides in length and contains six genes in the order 3'-N-P-M-F-HN-L-5'. The features of the APMV/Shimane67 genome (e.g., nucleotide length of whole genome and each of the six genes, and predicted amino acid length of each of the six genes) were distinct from those of other APMV serotypes. Phylogenetic analysis indicated that although APMV/Shimane67 grouped with APMV-1, -9 and -12, the evolutionary distance between APMV/Shimane67 and these viruses was longer than that observed between intra-serotype viruses. These results show that the genome sequence of APMV/Shimane67 has specific characteristics and is distinguishable from other types of APMV.
Molecular identification of Trichuris vulpis and Trichuris suis isolated from different hosts.

PubMed

Cutillas, Cristina; de Rojas, Manuel; Ariza, Concepción; Ubeda, José Manuel; Guevara, Diego

2007-01-01

Trichuris suis was isolated from the cecum of two different hosts (Sus scrofa domestica -- swine and Sus scrofa scrofa -- wild boar) and Trichuris vulpis from dogs in Sevilla, Spain. Genomic DNA was isolated, and the internal transcribed spacer (ITS)1-5.8S-ITS2 segment of the ribosomal DNA (rDNA) was amplified by polymerase chain reaction and sequenced. The sequence of T. suis from both hosts was 1,396 bp in length while that of T. vulpis was 1,044 bp. The ITS1 of both T. suis populations was 661 nucleotides in length, while the ITS2 was 534 nucleotides in length. The ITS1 of T. vulpis was 410 nucleotides in length, while the ITS2 was 433 nucleotides in length. The 5.8S gene of both T. suis and T. vulpis was 154 nucleotides long. Intraindividual and intraspecific variations were detected in the rDNA of both species. Microsatellites were observed in all the individuals assayed. Sequence analysis of the ITSs and the 5.8S gene demonstrated no sequence differences between T. suis isolated from the two hosts (S. scrofa domestica -- swine and S. scrofa scrofa -- wild boar). Nevertheless, clear differences were detected between the ITS1 and ITS2 of T. suis and T. vulpis. Furthermore, a comparative molecular analysis between both species and the previously published ITS1-5.8S-ITS2 sequence data of Trichuris ovis, Trichuris leporis, Trichuris muris, Trichuris arvicolae, and Trichuris skrjabini was carried out. A common homology zone was detected in the ITS1 sequence of all species of trichurids.
Molecular cloning and sequence analysis of full-length growth hormone cDNAs from six important economic fishes.

PubMed

Zhang, Jing-Nan; Song, Ping; Hu, Jia-Rui; Mo, Sai-Jun; Peng, Mao-Yu; Zhou, Wei; Zou, Ji-Xing; Hu, Yin-Chang

2005-01-01

In this study, full-length growth hormone (GH) cDNAs were isolated from six economically important fishes: Siniperca kneri, Epinephelus coioides, Monopterus albus, Silurus asotus, Misgurnus anguillicaudatus and Carassius auratus gibelio Bloch. This is the first time these GH sequences have been cloned, except for that of E. coioides. The lengths of the cDNAs are 953 bp, 1,023 bp, 825 bp, 1,082 bp, 1,154 bp and 1,180 bp, respectively. Each sequence includes an ORF of about 600 bp that encodes a protein of about 200 amino acids: 204 amino acids for the S. kneri, E. coioides and M. albus GHs, 200 amino acids for the S. asotus GH, and 210 amino acids for the M. anguillicaudatus and C. auratus gibelio GHs. A detailed sequence analysis of the six GHs together with many other fish sequences was then performed. The six sequences all showed high homology to other sequences, especially to sequences within the same order, and many conserved residues were identified, most of them localized in five domains. The phylogenetic trees (MP and NJ) of many fish GH ORF sequences (including the six new ones), with Amia calva as outgroup, were generally resolved and largely congruent with the morphology-based tree, though some incongruities were observed, suggesting that the GH ORF deserves more attention in teleostean phylogeny.
Large-Scale SNP Discovery and Genotyping for Constructing a High-Density Genetic Map of Tea Plant Using Specific-Locus Amplified Fragment Sequencing (SLAF-seq)

PubMed Central

Ma, Chun-Lei; Jin, Ji-Qiang; Li, Chun-Fang; Wang, Rong-Kai; Zheng, Hong-Kun; Yao, Ming-Zhe; Chen, Liang

2015-01-01

Genetic maps are important tools in plant genomics and breeding. The present study reports the large-scale discovery of single nucleotide polymorphisms (SNPs) for genetic map construction in tea plant. We developed a total of 6,042 valid SNP markers using specific-locus amplified fragment sequencing (SLAF-seq), and subsequently mapped them onto the previous framework map. The final map contained 6,448 molecular markers, distributed across fifteen linkage groups corresponding to the number of tea plant chromosomes. The total map length was 3,965 cM, with an average inter-locus distance of 1.0 cM. This map is the first SNP-based reference map of tea plant, as well as the most saturated one developed to date. The SNP markers and map resources generated in this study provide a wealth of genetic information that can serve as a foundation for downstream genetic analyses, such as the fine mapping of quantitative trait loci (QTL), map-based cloning, marker-assisted selection, and anchoring of scaffolds to facilitate the process of whole genome sequencing projects for tea plant. PMID:26035838
Transcriptome and gene expression analysis during flower blooming in Rosa chinensis 'Pallida'.

PubMed

Yan, Huijun; Zhang, Hao; Chen, Min; Jian, Hongying; Baudino, Sylvie; Caissard, Jean-Claude; Bendahmane, Mohammed; Li, Shubin; Zhang, Ting; Zhou, Ningning; Qiu, Xianqin; Wang, Qigang; Tang, Kaixue

2014-04-25

Rosa chinensis 'Pallida' (Rosa L.) is one of the most important ancient rose cultivars originating from China. It contributed the 'tea scent' trait to modern roses. However, little information is available on the gene regulatory networks involved in scent biosynthesis and metabolism in Rosa. In this study, the transcriptome of R. chinensis 'Pallida' petals at different developmental stages, from flower buds to senescent flowers, was investigated using Illumina sequencing technology. De novo assembly generated 89,614 clusters with an average length of 428 bp. Based on sequence similarity searches against known proteins, 62.9% of total clusters were annotated. Out of these annotated transcripts, 25,705 and 37,159 sequences were assigned to gene ontology and clusters of orthologous groups, respectively. The dataset provides information on transcripts putatively associated with known scent metabolic pathways. Digital gene expression (DGE) was obtained using RNA samples from flower bud, open flower and senescent flower stages. Comparative DGE and quantitative real time PCR permitted the identification of five transcripts encoding proteins putatively associated with scent biosynthesis in roses. The study provides a foundation for scent-related gene discovery in roses. Copyright © 2014. Published by Elsevier B.V.
In silico analysis of L-asparaginase from different source organisms.

PubMed

Dwivedi, Vivek Dhar; Mishra, Sarad Kumar

2014-06-01

L-asparaginases are enzymes widely distributed among plants, fungi and bacteria. This enzyme catalyzes the conversion of l-asparagine to l-aspartate and ammonia and, to a lesser extent, the formation of l-glutamate from l-glutamine. In the present study, forty-five full-length amino acid sequences of L-asparaginases from bacteria, fungi and plants were collected and subjected to multiple sequence alignment (MSA), domain identification, amino acid composition analysis, and phylogenetic tree construction. MSA revealed that two glycine residues were conserved in all analyzed species, two further glycine residues were conserved in all the fungal and bacterial sources, and three glycine residues were conserved in all plant and bacterial sources, while no residue was conserved specifically between plant and fungal L-asparaginases. Phylogenetic analysis produced two major sequence clusters. One cluster contains eleven species of fungi, twelve species of bacteria, and one species of plant, whereas the other contains fourteen species of plant, four species of fungi and three species of bacteria. The amino acid composition analysis revealed that the average frequency of alanine is 10.77 percent, which is very high in comparison with the other amino acids in all analyzed species.
Long aftershock sequences within continents and implications for earthquake hazard assessment.

PubMed

Stein, Seth; Liu, Mian

2009-11-05

One of the most powerful features of plate tectonics is that the known plate motions give insight into both the locations and average recurrence interval of future large earthquakes on plate boundaries. Plate tectonics gives no insight, however, into where and when earthquakes will occur within plates, because the interiors of ideal plates should not deform. As a result, within plate interiors, assessments of earthquake hazards rely heavily on the assumption that the locations of small earthquakes shown by the short historical record reflect continuing deformation that will cause future large earthquakes. Here, however, we show that many of these recent earthquakes are probably aftershocks of large earthquakes that occurred hundreds of years ago. We present a simple model predicting that the length of aftershock sequences varies inversely with the rate at which faults are loaded. Aftershock sequences within the slowly deforming continents are predicted to be significantly longer than the decade typically observed at rapidly loaded plate boundaries. These predictions are in accord with observations. So the common practice of treating continental earthquakes as steady-state seismicity overestimates the hazard in presently active areas and underestimates it elsewhere.
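The inverse relation stated in the abstract can be illustrated with a small scaling sketch. It assumes a strict 1/rate proportionality anchored to the roughly decade-long sequences the abstract mentions for plate boundaries. The function name and the loading rates in the usage line are hypothetical, and this is not the authors' model.

```python
def aftershock_duration_years(loading_rate, ref_rate, ref_duration_years=10.0):
    """Scale a reference aftershock duration inversely with fault loading rate.

    loading_rate and ref_rate must share the same units (for example mm/yr).
    ref_duration_years defaults to the ~10 years the abstract cites for
    rapidly loaded plate boundaries.
    """
    if loading_rate <= 0 or ref_rate <= 0:
        raise ValueError("loading rates must be positive")
    return ref_duration_years * ref_rate / loading_rate


# Hypothetical rates: a boundary fault loaded at 50 units/yr versus an
# intraplate fault loaded at 1 unit/yr.
print(aftershock_duration_years(loading_rate=1.0, ref_rate=50.0))
```

With these illustrative rates, a fault loaded fifty times more slowly would have a sequence fifty times longer, on the order of centuries, which matches the qualitative claim in the abstract.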
Characterization, Genome Sequence, and Analysis of Escherichia Phage CICC 80001, a Bacteriophage Infecting an Efficient L-Aspartic Acid Producing Escherichia coli.

PubMed

Xu, Youqiang; Ma, Yuyue; Yao, Su; Jiang, Zengyan; Pei, Jiangsen; Cheng, Chi

2016-03-01

Escherichia phage CICC 80001 was isolated from the bacteriophage-contaminated medium of Escherichia coli strain HY-05C (CICC 11022S), which produces L-aspartic acid. The phage had a head diameter of 45-50 nm and a tail of about 10 nm. The one-step growth curve showed a latent period of 10 min and a rise period of about 20 min. The average burst size was about 198 phage particles per infected cell. Tests were conducted on the plaques, multiplicity of infection, and host range. The genome of CICC 80001 was sequenced and annotated; it is 38,810 bp in length. The key proteins leading to host-cell lysis were phylogenetically analyzed. One protein belonged to class II holin, and the other two belonged to the endopeptidase family and N-acetylmuramoyl-L-alanine amidase family, respectively. The genome showed 82.7% sequence identity with that of Enterobacteria phage T7, and carried ten unique open reading frames. A bacteriophage-resistant E. coli strain, designated CICC 11021S, was bred, and its L-aspartase activity was 84.4% of that of CICC 11022S.
Comprehensive analysis of the T-cell receptor beta chain gene in rhesus monkey by high throughput sequencing

PubMed Central

Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui

2015-01-01

Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410
[Complete genome sequencing and analyses of rabies viruses isolated from wild animals (Chinese Ferret-Badger) in Zhejiang province].

PubMed

Lei, Yong-Liang; Wang, Xiao-Guang; Liu, Fu-Ming; Chen, Xiu-Ying; Ye, Bi-Feng; Mei, Jian-Hua; Lan, Jin-Quan; Tang, Qing

2009-08-01

By sequencing the full-length genomes of rabies viruses from two Chinese ferret-badgers, we analyzed the genetic variation of rabies viruses at the molecular level to obtain information on their prevalence and variation in Zhejiang, and to enrich the genome database of rabies virus street strains isolated from Chinese wildlife. Overlapping fragments were amplified by RT-PCR and assembled into full-length genomes, and nucleotide and deduced protein similarities were analyzed. The N genes from Chinese ferret-badger, sika deer, vole, dog, and vaccine strains were then analyzed phylogenetically. Complete sequencing showed that the two genomes had the same genetic structure, 11,923 nt in length, including a 58-nt leader, 1353-nt NP, 894-nt PP, 609-nt MP, 1575-nt GP, 6386-nt LP, intergenic regions (IGRs) of 2, 5, and 5 nt, a 423-nt pseudogene-like sequence (Psi), and a 70-nt trailer. BLAST and multiple sequence alignment showed that both genomes were consistent with the properties of the genus Lyssavirus, family Rhabdoviridae. Nucleotide and amino acid similarity was highest among Chinese strains, especially among strains from the same host species. For the two genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, indicating that most of the nucleotide mutations were probably synonymous. Compared with the reference rabies viruses, the five protein-coding regions showed no changes in length and no recombination, only a few point mutations, suggesting that the five proteins are stable. The sites and types of variation in the two ferret-badger genomes were similar to those of the reference vaccine or street strains. Multiple sequence alignment and phylogenetic analyses placed the two strains in genotype 1, with the distinct geographic characteristics of China. All the evidence suggests that these two ferret-badger rabies viruses are likely street viruses already circulating in wildlife.
A new approach to process control using Instability Index

NASA Astrophysics Data System (ADS)

Weintraub, Jeffrey; Warrick, Scott

2016-03-01

The merits of a robust Statistical Process Control (SPC) methodology have long been established. In response to the numerous SPC rule combinations, processes, and the high cost of containment, the Instability Index (ISTAB) is presented as a tool for managing these complexities. ISTAB focuses limited resources on key issues and provides a window into the stability of manufacturing operations. ISTAB takes advantage of the statistical nature of processes by comparing the observed average run length (OARL) to the expected run length (ARL), resulting in a gap value called the ISTAB index. The ISTAB index has three characteristic behaviors that are indicative of defects in an SPC instance. Case 1: The observed average run length is excessively long relative to expectation. ISTAB > 0 is indicating the possibility that the limits are too wide. Case 2: The observed average run length is consistent with expectation. ISTAB near zero is indicating that the process is stable. Case 3: The observed average run length is inordinately short relative to expectation. ISTAB < 0 is indicating that the limits are too tight, the process is unstable or both. The probability distribution of run length is the basis for establishing an ARL. We demonstrate that the geometric distribution is a good approximation to run length across a wide variety of rule sets. Excessively long run lengths are associated with one kind of defect in an SPC instance; inordinately short run lengths are associated with another. A sampling distribution is introduced as a way to quantify excessively long and inordinately short observed run lengths. This paper provides detailed guidance for action limits on these run lengths. ISTAB as a statistical method of review facilitates automated instability detection. This paper proposes a management system based on ISTAB as an enhancement to more traditional SPC approaches.
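A minimal sketch of the comparison described above, under stated assumptions: the expected ARL is taken as the mean of a geometric distribution (1/p, where p is the per-point signal probability), and the gap is expressed as a relative difference. The abstract does not give the exact ISTAB formula, so `istab_gap` is a hypothetical stand-in that only reproduces the sign convention of the three cases. The run lengths in the usage lines are invented for illustration.

```python
import math


def expected_arl(p_signal):
    """Mean run length when run length is geometric with per-point signal probability p."""
    if not 0 < p_signal <= 1:
        raise ValueError("p_signal must be in (0, 1]")
    return 1.0 / p_signal


def observed_arl(run_lengths):
    """Average of the observed run lengths between signals."""
    return sum(run_lengths) / len(run_lengths)


def istab_gap(oarl, arl):
    """Hypothetical gap measure: positive if runs are longer than expected, negative if shorter."""
    return (oarl - arl) / arl


# Per-point false-alarm probability of a single 3-sigma rule on normal data.
p = math.erfc(3 / math.sqrt(2))
arl = expected_arl(p)

print(istab_gap(observed_arl([900, 1100]), arl))    # long runs: limits may be too wide
print(istab_gap(observed_arl([50, 80, 120]), arl))  # short runs: tight limits or instability
```

A gap near zero would correspond to Case 2, where the observed run lengths are consistent with expectation.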
Genetic analysis of the Yavapai Native Americans from West-Central Arizona using the Illumina MiSeq FGx™ forensic genomics system.

PubMed

Wendt, Frank R; Churchill, Jennifer D; Novroski, Nicole M M; King, Jonathan L; Ng, Jillian; Oldt, Robert F; McCulloh, Kelly L; Weise, Jessica A; Smith, David Glenn; Kanthaswamy, Sreetharan; Budowle, Bruce

2016-09-01

Forensically relevant genetic markers were typed for sixty-two Yavapai Native Americans using the ForenSeq™ DNA Signature Prep Kit. These data are invaluable to the human identity community due to the greater genetic differentiation among Native American tribes than among other subdivisions within major populations of the United States. Autosomal, X-chromosomal, and Y-chromosomal short tandem repeat (STR) and identity-informative (iSNPs), ancestry-informative (aSNPs), and phenotype-informative (pSNPs) single nucleotide polymorphism (SNP) allele frequencies are reported. Sequence-based allelic variants were observed in 13 autosomal, 3 X, and 3 Y STRs. These observations increased observed and expected heterozygosities for autosomal STRs by 0.081±0.068 and 0.073±0.063, respectively, and decreased single-locus random match probabilities by 0.051±0.043 for 13 autosomal STRs. The autosomal random match probabilities (RMPs) were 2.37×10^-26 and 2.81×10^-29 for length-based and sequence-based alleles, respectively. There were 22 and 25 unique Y-STR haplotypes among 26 males, generating haplotype diversities of 0.95 and 0.96, for length-based and sequence-based alleles, respectively. Of the 26 haplotypes generated, 17 were assigned to haplogroup Q, three to haplogroup R1b, two each to haplogroups E1b1b and L, and one each to haplogroups R1a and I1. Male and female sequence-based X-STR random match probabilities were 3.28×10^-7 and 1.22×10^-6, respectively. The average observed and expected heterozygosities for 94 iSNPs were 0.39±0.12 and 0.39±0.13, respectively, and the combined iSNP RMP was 1.08×10^-32. The combined STR and iSNP RMPs were 2.55×10^-58 and 3.02×10^-61 for length-based and sequence-based STR alleles, respectively.
Ancestry and phenotypic SNP information, performed using the ForenSeq™ Universal Analysis Software, predicted black hair, brown eyes, and some probability of East Asian ancestry for all but one sample that clustered between European and Admixed American ancestry on a principal components analysis. These data serve as the first population assessment using the ForenSeq™ panel and highlight the value of employing sequence-based alleles for forensic DNA typing to increase heterozygosity, which is beneficial for identity testing in populations with reduced genetic diversity. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
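The combined STR and iSNP random match probabilities reported above are consistent with multiplying the separate RMPs, which is the usual product rule for independent marker sets. The sketch below assumes independence and reads the reported values as powers of ten. The function name is hypothetical, and small differences from the published figures are expected because the inputs are rounded.

```python
import math


def combined_rmp(*rmps):
    """Product-rule combination of random match probabilities for independent marker sets."""
    return math.prod(rmps)


autosomal_str_length_based = 2.37e-26    # reported autosomal STR RMP, length-based
autosomal_str_sequence_based = 2.81e-29  # reported autosomal STR RMP, sequence-based
isnp = 1.08e-32                          # reported combined iSNP RMP

print(f"{combined_rmp(autosomal_str_length_based, isnp):.2e}")
print(f"{combined_rmp(autosomal_str_sequence_based, isnp):.2e}")
```

The products land within about 1% of the reported combined values, which is the level of agreement rounded inputs allow.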
Transcriptome Assembly and Analysis of Tibetan Hulless Barley (Hordeum vulgare L. var. nudum) Developing Grains, with Emphasis on Quality Properties

PubMed Central

Chen, Xin; Long, Hai; Gao, Ping; Deng, Guangbing; Pan, Zhifen; Liang, Junjun; Tang, Yawei; Tashi, Nyima; Yu, Maoqun

2014-01-01

Background Hulless barley is attracting increasing attention due to its unique nutritional value and potential health benefits. However, the molecular biology of barley grain development and nutrient storage is not well understood. Furthermore, the genetic potential of hulless barley has not been fully tapped for breeding. Methodology/Principal Findings In the present study, we investigated the transcriptome features during hulless barley grain development. Using Illumina paired-end RNA-Sequencing, we generated two data sets of the developing grain transcriptomes from two hulless barley landraces. A total of 13.1 and 12.9 million paired-end reads with lengths of 90 bp were generated from the two varieties and were assembled into 48,863 and 45,788 unigenes, respectively. A combined dataset of 46,485 All-Unigenes with an average length of 542 bp was generated from the two transcriptomes, and 36,278 of these were annotated with gene descriptions, conserved protein domains or gene ontology terms. Furthermore, sequences and expression levels of genes related to the biosynthesis of storage reserve compounds (starch, protein, and β-glucan) were analyzed, and their temporal and spatial patterns were deduced from the transcriptome data of cultivated barley Morex. Conclusions/Significance We established an integrated database of sequences and functional annotations and examined the expression profiles of the developing grains of Tibetan hulless barley. The characterization of genes encoding storage proteins and enzymes of starch synthesis and (1–3;1–4)-β-D-glucan synthesis provided an overview of changes in gene expression associated with grain nutrition and health properties. Furthermore, the characterization of these genes provides a gene reservoir, which helps in quality improvement of hulless barley. PMID:24871534
Stability Mechanisms of Laccase Isoforms using a Modified FoldX Protocol Applicable to Widely Different Proteins.

PubMed

Christensen, Niels J; Kepp, Kasper P

2013-07-09

A recent computational protocol that accurately predicts and rationalizes protein multisite mutant stabilities has been extended to handle widely different isoforms of laccases. We apply the protocol to four isoenzymes of Trametes versicolor laccase (TvL) with variable lengths (498-503 residues) and thermostability (Topt ∼ 45-80 °C) and with 67-77% sequence identity. The extended protocol uses (i) statistical averaging, (ii) a molecular-dynamics-validated "compromise" homology model to minimize bias that causes proteins close in sequence to a structural template to be too stable due to having the benefits of the better sampled template (typically from a crystal structure), (iii) correction for hysteresis that favors the input template to overdestabilize, and (iv) a preparative protocol to provide robust input sequences of equal length. The computed ΔΔG values are in good agreement with the major trends in experimental stabilities; that is, the approach may be applicable for fast estimates of the relative stabilities of proteins with as little as 70% identity, something that is currently extremely challenging. The computed stability changes associated with variations are Gaussian-distributed, in good agreement with experimental distributions of stability effects from mutation. The residues causing the differential stability of the four isoforms are consistent with a range of compiled laccase wild type data, suggesting that we may have identified general drivers of laccase stability. Several sites near Cu, notably 79, 241, and 245, or near substrate, mainly 265, are identified that contribute to stability-function trade-offs, of relevance to the search for new proficient and stable variants of these important industrial enzymes.
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

PubMed Central

2011-01-01

Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. 
Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934
Carbohydrate degrading polypeptide and uses thereof

DOEpatents

Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

2015-10-20

The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Peroxidase gene discovery from the horseradish transcriptome.

PubMed

Näätsaari, Laura; Krainer, Florian W; Schubert, Michael; Glieder, Anton; Thallinger, Gerhard G

2014-03-24

Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms. Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group of isoenzymes.
Peroxidase gene discovery from the horseradish transcriptome

PubMed Central

2014-01-01

Background Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. Results In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. Conclusions This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms. 
Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group of isoenzymes. PMID:24666710
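The ORF-finding step mentioned in this record can be illustrated with a minimal sketch. It assumes a forward-strand scan only, the standard ATG start codon and TAA/TAG/TGA stop codons; the function name `find_orfs` and the `min_len` threshold are hypothetical and are not taken from the study, which does not describe its ORF finder.

```python
# Minimal forward-strand ORF scan (illustrative only; not the study's pipeline).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=9):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    start is the index of the ATG, end is the exclusive index just past
    the stop codon, and frame is 0, 1 or 2.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

A real pipeline would also scan the reverse complement and would usually apply a much larger minimum length.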

Amperometric biosensor based on glassy carbon electrode modified with long-length carbon nanotube and enzyme

NASA Astrophysics Data System (ADS)

Furutaka, Hajime; Nemoto, Kentaro; Inoue, Yuki; Hidaka, Hiroki; Muguruma, Hitoshi; Inoue, Hitoshi; Ohsawa, Tatsuya

2016-05-01

An amperometric biosensor based on a glassy carbon electrode modified with long-length multiwalled carbon nanotubes (MWCNTs) and the enzyme nicotinamide-adenine-dinucleotide-dependent glucose dehydrogenase (GDH) is presented. We demonstrate the effect of the MWCNT length on the amperometric response of the enzyme biosensor. The long MWCNTs have an average length of 200 µm, whereas the normal-length MWCNTs have an average length of 1 µm. The response of the long MWCNT-GDH electrode is 2 times more sensitive than that of the normal-length MWCNT-GDH electrode in the concentration range from 0.25 to 35 mM. The results of electrochemical impedance spectroscopy measurements suggest that the long-length MWCNT-GDH electrode formed a better electron transfer network than the normal-length one.
Convolutional encoding of self-dual codes

NASA Technical Reports Server (NTRS)

Solomon, G.

1994-01-01

There exist almost complete convolutional encodings of self-dual codes, i.e., block codes of rate 1/2 with weights w, w = 0 mod 4. The codes are of length 8m with the convolutional portion of length 8m-2 and the nonsystematic information of length 4m-1. The last two bits are parity checks on the two (4m-1) length parity sequences. The final information bit complements one of the extended parity sequences of length 4m. Solomon and van Tilborg have developed algorithms to generate these for the Quadratic Residue (QR) Codes of lengths 48 and beyond. For these codes and reasonable constraint lengths, there are sequential decodings for both hard and soft decisions. There are also possible Viterbi-type decodings that may be simple, as in a convolutional encoding/decoding of the extended Golay Code. In addition, the previously found constraint length K = 9 for the QR (48, 24;12) Code is lowered here to K = 8.
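The length bookkeeping in this abstract can be checked with a short sketch. It only evaluates the expressions stated above (8m, 8m-2, 4m-1, 4m) for a given m; the function name `self_dual_layout` and the dictionary keys are hypothetical labels chosen for illustration.

```python
def self_dual_layout(m):
    """Evaluate the lengths quoted in the abstract for a code of length 8m."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return {
        "block_length": 8 * m,                 # total code length
        "convolutional_portion": 8 * m - 2,    # all but the last two parity bits
        "nonsystematic_information": 4 * m - 1,
        "parity_sequence_length": 4 * m - 1,   # each of the two parity sequences
        "extended_parity_length": 4 * m,
    }


if __name__ == "__main__":
    # The QR (48, 24; 12) code mentioned above corresponds to m = 6.
    print(self_dual_layout(6))
```

For m = 6 the nonsystematic information (23 bits) plus the final information bit gives 24 information bits, which is consistent with the stated rate of 1/2 for a length-48 code.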
Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India.

PubMed

Bondre, Vijay P; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N

2016-11-01

Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. 
Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections.
Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India

PubMed Central

Bondre, Vijay P.; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N.

2016-01-01

Background & objectives: Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Methods: Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Results: Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. 
Interpretation & conclusions: Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections. PMID:28361829
Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

PubMed

Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

2014-01-01

Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence on a genome-wide scale, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one genomic region at the 3'-terminal of the papain-like cysteine protease domain, were found to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed that these regions and the full-length genome performed identically in genotyping, dividing the HEV isolates involved into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.
Length estimations of presumed upward connecting leaders in lightning flashes to flat water and flat ground

NASA Astrophysics Data System (ADS)

Stolzenburg, Maribeth; Marshall, Thomas C.; Karunarathne, Sumedhe; Orville, Richard E.

2018-10-01

Using video data recorded at 50,000 frames per second for nearby negative lightning flashes, estimates are derived for the length of positive upward connecting leaders (UCLs) that presumably formed prior to new ground attachments. Return strokes were 1.7 to 7.8 km distant, yielding image resolutions of 4.25 to 19.5 m. No UCLs are imaged in these data, indicating those features were too transient or too dim compared to other lightning processes that are imaged at these resolutions. Upper bound lengths for 17 presumed UCLs are determined from the height above flat ground or water of the successful stepped leader tip in the image immediately prior to (within 20 μs before) the return stroke. Better estimates of maximum UCL lengths are determined using the downward stepped leader tip's speed of advance and the estimated return stroke time within its first frame. For 17 strokes, the upper bound length of the possible UCL averages 31.6 m and ranges from 11.3 to 50.3 m. Among the close strokes (those with spatial resolution <8 m per pixel), the five which connected to water (salt water lagoon) have UCL upper bound estimates averaging significantly shorter (24.1 m) than the average for the three close strokes which connected to land (36.9 m). The better estimates of maximum UCL lengths for the eight close strokes average 20.2 m, with a slightly shorter average of 18.3 m for the five that connected to water. All the better estimates of UCL maximum lengths are <38 m in this dataset.
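The numbers in this abstract are linked by simple arithmetic, sketched below. The frame interval follows directly from the 50,000 frames per second rate, and the metres-per-pixel scale factor is inferred from the two distance/resolution pairs quoted above, assuming resolution scales linearly with distance. The refinement function is only one plausible reading of "using the leader tip's speed of advance and the estimated return stroke time": it subtracts the distance the leader would cover before the return stroke from the last imaged tip height. All function names and the sample inputs are hypothetical.

```python
FRAME_RATE_HZ = 50_000
# Inferred from the abstract: 4.25 m per pixel at 1.7 km (assumed linear in distance).
M_PER_PIXEL_PER_KM = 4.25 / 1.7


def frame_interval_us():
    """Time between consecutive frames, in microseconds."""
    return 1e6 / FRAME_RATE_HZ


def pixel_resolution_m(distance_km):
    """Approximate image resolution in metres per pixel at a given range."""
    return distance_km * M_PER_PIXEL_PER_KM


def refined_ucl_bound_m(tip_height_m, leader_speed_m_per_s, time_to_stroke_s):
    """Upper bound on UCL length after the leader descends for time_to_stroke_s.

    Assumed interpretation: the leader keeps advancing at constant speed
    between the last pre-stroke frame and the return stroke.
    """
    return max(0.0, tip_height_m - leader_speed_m_per_s * time_to_stroke_s)


if __name__ == "__main__":
    print(frame_interval_us(), "microseconds per frame")
    print(pixel_resolution_m(7.8), "m per pixel at 7.8 km")
    # Hypothetical inputs, not values from the study.
    print(refined_ucl_bound_m(30.0, 5e5, 10e-6), "m")
```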
Capacity of MIMO free space optical communications using multiple partially coherent beams propagation through non-Kolmogorov strong turbulence.

PubMed

Deng, Peng; Kavehrad, Mohsen; Liu, Zhiwen; Zhou, Zhou; Yuan, Xiuhua

2013-07-01

We study the average capacity performance for multiple-input multiple-output (MIMO) free-space optical (FSO) communication systems using multiple partially coherent beams propagating through non-Kolmogorov strong turbulence, assuming equal gain combining diversity configuration and the sum of multiple gamma-gamma random variables for multiple independent partially coherent beams. The closed-form expressions of scintillation and average capacity are derived and then used to analyze the dependence on the number of independent diversity branches, power law α, refractive-index structure parameter, propagation distance and spatial coherence length of source beams. The results show that the average capacity increases more significantly with the increase in the rank of the MIMO channel matrix than with the diversity order. The effect of the diversity order on the average capacity is independent of the power law, turbulence strength parameter and spatial coherence length, whereas these effects on average capacity are gradually mitigated as the diversity order increases. The average capacity increases and saturates with decreasing spatial coherence length, at rates depending on the diversity order, power law and turbulence strength. There exist optimal values of the spatial coherence length and diversity configuration for maximizing the average capacity of MIMO FSO links over a variety of atmospheric turbulence conditions.
An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

PubMed

Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

2011-01-01

cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which are a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
Dissection of the Octoploid Strawberry Genome by Deep Sequencing of the Genomes of Fragaria Species

PubMed Central

Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N.

2014-01-01

Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis, by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence, to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species. PMID:24282021
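The N50 statistic quoted for FANhybrid_r1.2 is a standard assembly metric: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that definition follows; the function name `n50` and the toy input are illustrative and not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence downwards until half the
    # total assembly length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
```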
Mining candidate genes associated with powdery mildew resistance in cucumber via super-BSA by specific length amplified fragment (SLAF) sequencing.

PubMed

Zhang, Peng; Zhu, Yuqiang; Wang, Lili; Chen, Liping; Zhou, Shengjun

2015-12-14

Powdery mildew (PM) is the most common fungal disease of cucumber and other cucurbit crops. Breeding PM-resistant materials is an effective way to defend against this disease, and recent developments in genetics and genomics have made clear that studying resistance genes is essential for breeding highly PM-resistant plants. With the ever-increasing throughput of next-generation sequencing (NGS), specific length amplified fragment sequencing (SLAF-seq), a high-resolution strategy for large-scale de novo SNP discovery, is gradually being applied to functional gene mining. Here we combined bulked segregant analysis (BSA) with SLAF-seq to identify candidate genes associated with PM resistance in cucumber. A segregating population comprising 251 F2 individuals was developed using H136 (female parent) as the susceptible parent and BK2 (male parent) as the resistance donor. After the PMR test, total genomic DNA was prepared from each plant. Systemic genomic analysis of the GC content, repeat sequences, etc. was carried out with the prediction software SLAF_Predict to establish conditions that ensure the uniformity and density of the molecular markers. After samples were gel purified, SLAFs were generated at Biomarker Technologies Corporation in Beijing. Based on the SLAF tags and the PMR test results, the hot regions were annotated. A total of 73,100 high-quality SLAF tags with an average depth of 99.11× were sequenced. Among these, 5,355 polymorphic tags were identified with a polymorphism rate of 7.34 %, including 7.09 % SNPs and other polymorphism types. Finally, 140 associated SLAFs were identified, and two main hot regions were detected on chromosomes 1 and 6, which contained five genes involved in defense response, toxin metabolism, cell stress response, and injury response in cucumber. 
Associated markers identified by super-BSA in this study could not only speed up the study of PMR genes but also provide a feasible solution for marker-assisted breeding of PMR cucumber. Moreover, this approach could also be extended to any other species with a reference genome.
8 January 2013 Mw=5.7 North Aegean Sea Earthquake Sequence

NASA Astrophysics Data System (ADS)

Kürçer, Akın; Yalçın, Hilal; Gülen, Levent; Kalafat, Doǧan

2014-05-01

The deformation of the North Aegean Sea is mainly controlled by the westernmost segments of the North Anatolian Fault Zone (NAFZ). On January 8, 2013, a moderate earthquake (Mw = 5.7) occurred in the North Aegean Sea, which may be considered part of the westernmost splay of the NAFZ. A series of aftershocks with magnitudes varying from 1.9 to 5.0 occurred within four months following the mainshock. In this study, a total of 23 earthquake moment tensor solutions that belong to the 2013 earthquake sequence have been obtained by using KOERI and AFAD seismic data. The most widely used Gephart & Forsyth (1984) and Michael (1987) methods have been used to carry out stress tensor inversions. Based on the earthquake moment tensor solutions, distribution of epicenters and seismotectonic setting, the source of this earthquake sequence is a N75°E trending pure dextral strike-slip fault. The temporal and spatial distribution of earthquakes indicates that the rupture propagated unilaterally from SW to NE. The length of the fault has been calculated as approximately 12 km using the aftershock distribution and the empirical equations suggested by Wells and Coppersmith (1994). The stress tensor analysis indicates that the dominant faulting type in the region is strike-slip and the direction of the regional compressive stress is WNW-ESE. The 1968 Aghios earthquake (Ms=7.3; Ambraseys and Jackson, 1998) and 2013 North Aegean Sea earthquake sequences clearly show that the regional stress has been transferred from SW to NE in this region. The last historical earthquake, the Bozcaada earthquake (M=7.05), occurred northeast of the 2013 earthquake sequence in 1672. The elapsed time (342 years) and regional stress transfer indicate that the 1672 earthquake segment is probably a seismic gap. 
According to the empirical equations, the surface rupture length of the 1672 earthquake segment was about 47 km, with a maximum displacement of 170 cm and average displacement of 107 cm. These values indicate that the 1672 earthquake segment is a potential earthquake hazard for this region.
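The empirical scaling step can be sketched as follows. The abstract does not state which regressions were applied, so this example assumes the log-linear form log10(Y) = a + b·M and uses strike-slip coefficients as commonly quoted from Wells and Coppersmith (1994); these coefficients are an assumption here and should be checked against the original tables before any use. With M = 7.05 they give values close to the rupture length and displacements reported above.

```python
# Assumed strike-slip regression coefficients (a, b) for log10(Y) = a + b * M,
# as commonly quoted from Wells and Coppersmith (1994). Verify before use.
STRIKE_SLIP_COEFFS = {
    "surface_rupture_length_km": (-3.55, 0.74),
    "maximum_displacement_m": (-7.03, 1.03),
    "average_displacement_m": (-6.32, 0.90),
}


def scale_from_magnitude(quantity, magnitude):
    """Estimate a rupture parameter from magnitude with a log-linear regression."""
    a, b = STRIKE_SLIP_COEFFS[quantity]
    return 10 ** (a + b * magnitude)


if __name__ == "__main__":
    for name in STRIKE_SLIP_COEFFS:
        print(name, round(scale_from_magnitude(name, 7.05), 2))
```

The regressions carry substantial scatter, so outputs of this kind are order-of-magnitude guides and not precise predictions.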
Changes in average length of stay and average charges generated following institution of PSRO review.

PubMed Central

Westphal, M; Frazier, E; Miller, M C

1979-01-01

A five-year review of accounting data at a university hospital shows that immediately following institution of concurrent PSRO admission and length of stay review of Medicare-Medicaid patients, there was a significant decrease in length of stay and a fall in average charges generated per patient against the inflationary trend. Similar changes did not occur for the non-Medicare-Medicaid patients who were not reviewed. The observed changes occurred even though the review procedure rarely resulted in the denial of services to patients, suggesting an indirect effect of review. PMID:393658
Influence of F0 and Sequence Length of Audio and Electroglottographic Signals on Perturbation Measures for Voice Assessment.

PubMed

Hohm, Julian; Döllinger, Michael; Bohr, Christopher; Kniesburges, Stefan; Ziethe, Anke

2015-07-01

Within the functional assessment of voice disorders, an objective analysis of measured parameters from audio, electroglottographic (EGG), or visual signals is desired. In a typical clinical situation, reliable objective analysis is not always possible due to missing standardization and unknown stability of the clinical parameters. The aim of this study was to investigate the robustness/stability of measured clinical parameters of the audio and EGG signals in a typical clinical setting to ensure a reliable objective analysis. In particular, the influence of F0 and of the sequence length on several definitions of jitter and shimmer will be analyzed. Seventy-four young healthy women produced a sustained vowel /a/ and an upward triad with abrupt changeovers. Different sequence lengths (100, 150, 500, and 1000 ms) of sustained phonation and triads (100 and 150 ms) were extracted from the audio and EGG signals. In total, six variations of jitter and four variations of shimmer parameters were analyzed. Jitter%, Jitter11p, and JitterPPQ of the audio signal as well as Jittermean, Shimmer, and Shimmer11p of the EGG signal are unaffected by both sequence length and F0. Influence of F0 and sequence length on several perturbation measures of the audio and EGG signals was identified. For an objective clinical voice assessment, unaffected definitions of jitter and shimmer should be preferred and applied to enable comparability between different recordings, examinations, and studies. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
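For orientation, the most common "local" perturbation measure can be sketched as below: the mean absolute difference between consecutive cycles, divided by the mean value and expressed as a percentage. Applied to cycle periods it gives a jitter percentage; applied to cycle peak amplitudes it gives a shimmer percentage. This is a generic textbook definition assumed for illustration; the abstract does not give formulas for its six jitter and four shimmer variants (such as Jitter11p or JitterPPQ), and the function name is hypothetical.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference relative to the mean, in percent.

    values: per-cycle periods (for jitter) or peak amplitudes (for shimmer).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


if __name__ == "__main__":
    # Hypothetical cycle periods in milliseconds.
    periods_ms = [5.00, 5.02, 4.98, 5.01, 4.99]
    print(local_perturbation_percent(periods_ms))
```

Because the number of cycles in a fixed-duration excerpt depends on F0, shorter sequences at lower F0 average over fewer cycles, which is one reason such measures can be sensitive to both factors.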
Structural analysis of two length variants of the rDNA intergenic spacer from Eruca sativa.

PubMed

Lakshmikumaran, M; Negi, M S

1994-03-01

Restriction enzyme analysis of the rRNA genes of Eruca sativa indicated the presence of many length variants within a single plant and also between different cultivars which is unusual for most crucifers studied so far. Two length variants of the rDNA intergenic spacer (IGS) from a single individual E. sativa (cv. Itsa) plant were cloned and characterized. The complete nucleotide sequences of both the variants (3 kb and 4 kb) were determined. The intergenic spacer contains three families of tandemly repeated DNA sequences denoted as A, B and C. However, the long (4 kb) variant shows the presence of an additional repeat, denoted as D, which is a duplication of a 224 bp sequence just upstream of the putative transcription initiation site. Repeat units belonging to the three different families (A, B and C) were in the size range of 22 to 30 bp. Such short repeat elements are present in the IGS of most of the crucifers analysed so far. Sequence analysis of the variants (3 kb and 4 kb) revealed that the length heterogeneity of the spacer is located at three different regions and is due to the varying copy numbers of repeat units belonging to families A and B. Length variation of the spacer is also due to the presence of a large duplication (D repeats) in the 4 kb variant which is absent in the 3 kb variant. The putative transcription initiation site was identified by comparisons with the rDNA sequences from other plant species.
DNA interactions with a Methylene Blue redox indicator depend on the DNA length and are sequence specific.

PubMed

Farjami, Elaheh; Clima, Lilia; Gothelf, Kurt V; Ferapontova, Elena E

2010-06-01

A DNA molecular beacon approach was used for the analysis of interactions between DNA and Methylene Blue (MB) as a redox indicator of a hybridization event. DNA hairpin structures of different length and guanine (G) content were immobilized onto gold electrodes in their folded states through the alkanethiol linker at the 5'-end. Binding of MB to the folded hairpin DNA was electrochemically studied and compared with binding to the duplex structure formed by hybridization of the hairpin DNA to a complementary DNA strand. Variation of the electrochemical signal from the DNA-MB complex was shown to depend primarily on the DNA length and sequence used: the G-C base pairs were the preferential sites of MB binding in the duplex. For short 20 nts long DNA sequences, the increased electrochemical response from MB bound to the duplex structure was consistent with the increased amount of bound and electrochemically readable MB molecules (i.e. MB molecules that are available for the electron transfer (ET) reaction with the electrode). With longer DNA sequences, the balance between the amounts of the electrochemically readable MB molecules bound to the hairpin DNA and to the hybrid was opposite: a part of the MB molecules bound to the long-sequence DNA duplex seem to be electrochemically mute due to long ET distance. The increasing electrochemical response from MB bound to the short-length DNA hybrid contrasts with the decreasing signal from MB bound to the long-length DNA hybrid and allows an "off"-"on" genosensor development.
Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

PubMed Central

Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC

2006-01-01

Background: At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results: Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France was cloned, sequenced and compared, nucleotide sequence differences were demonstrated at six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion: High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted. PMID:16398935
Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data.

PubMed

Parker, Nicolas J; Parker, Andrew G

2008-04-18

The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to quickly search a set of reads for near-exact text matches. A set of tools is provided to search a large data set of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC that can be used by anyone without prior knowledge of Linux and without having to install a Linux setup on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the data set and gathering counts of sequences in the reads. Demonstrations are given of the use of the tools to help with checking an assembly against the fragment data set; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension. The additional information contained in a pyrophosphate sequencing data set beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.
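The idea of searching reads for near-exact text matches can be sketched with a brute-force scan that tolerates a small number of mismatches. This is an illustrative stand-in and not the published tools, whose implementation is not described in the abstract; the function name and parameters are hypothetical, and only substitutions are handled (no insertions or deletions, which matter for homopolymer errors in pyrophosphate data).

```python
def find_near_matches(reads, query, max_mismatches=1):
    """Return (read_index, position, mismatches) for near-exact hits.

    Slides the query along every read and counts mismatching characters
    (Hamming distance) in each window.
    """
    hits = []
    q = len(query)
    for read_index, read in enumerate(reads):
        for pos in range(len(read) - q + 1):
            window = read[pos:pos + q]
            mismatches = sum(1 for a, b in zip(window, query) if a != b)
            if mismatches <= max_mismatches:
                hits.append((read_index, pos, mismatches))
    return hits


if __name__ == "__main__":
    reads = ["ACGTACGT", "TTTTGGGG"]
    print(find_near_matches(reads, "ACGA", max_mismatches=1))
```

The scan costs time proportional to total read length times query length, which is acceptable for spot checks but would need indexing for very large data sets.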
Investigation of timing effects in modified composite quadrupolar echo pulse sequences by mean of average Hamiltonian theory

NASA Astrophysics Data System (ADS)

Mananga, Eugene Stephane

2018-01-01

The utility of the average Hamiltonian theory and its antecedent, the Magnus expansion, is presented. We assessed the concept of convergence of the Magnus expansion in quadrupolar spectroscopy of spin-1 via the square of the magnitude of the average Hamiltonian. We investigated this approach for two specific modified composite pulse sequences: COM-Im and COM-IVm. It is demonstrated that the size of the square of the magnitude of the zero-order average Hamiltonian obtained on the appropriate basis is a viable approach to study the convergence of the Magnus expansion. The approach turns out to be efficient in studying pulse sequences in general and can be very useful to investigate coherent averaging in the development of high-resolution NMR techniques in solids. This approach allows a theoretical comparison of the two modified composite pulse sequences COM-Im and COM-IVm. We also compare theoretically the current modified composite sequences (COM-Im and COM-IVm) to the recently published modified composite pulse sequences (MCOM-I, MCOM-IV, MCOM-I_d, MCOM-IV_d).
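For reference, the standard textbook form of the average Hamiltonian (Magnus) expansion over a cycle of duration t_c is sketched below, in units where ħ = 1 and with the interaction-frame Hamiltonian written as \tilde{H}(t). These symbols are generic notation assumed here, not taken from the paper. The last line shows one common way to define a squared magnitude (the squared Frobenius norm), which is only an assumption about what the abstract means by "the square of the magnitude".

```latex
\begin{align}
U(t_c) &= \exp\!\left[-i\, t_c \left(\bar{H}^{(0)} + \bar{H}^{(1)} + \bar{H}^{(2)} + \cdots\right)\right], \\
\bar{H}^{(0)} &= \frac{1}{t_c} \int_0^{t_c} \tilde{H}(t)\, dt, \\
\bar{H}^{(1)} &= \frac{-i}{2 t_c} \int_0^{t_c} dt_2 \int_0^{t_2} dt_1\,
  \left[\tilde{H}(t_2), \tilde{H}(t_1)\right], \\
\left\lVert \bar{H}^{(0)} \right\rVert^2 &= \operatorname{Tr}\!\left(\bar{H}^{(0)\dagger}\, \bar{H}^{(0)}\right)
  \quad \text{(assumed definition).}
\end{align}
```

A smaller zero-order magnitude relative to the cycle rate 1/t_c generally indicates that higher-order terms contribute less, which is the sense in which such a quantity can serve as a convergence indicator.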
Analysis of deep learning methods for blind protein contact prediction in CASP12.

PubMed

Wang, Sheng; Sun, Siqi; Xu, Jinbo

2018-03-01

Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.
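The "top L/5" accuracy metric can be made concrete with a short sketch. It assumes the sequence-separation convention usually used in CASP evaluations (long-range: |i - j| >= 24; medium-range: 12 <= |i - j| <= 23) and divides by the number of predictions actually evaluated; both choices are assumptions, since the abstract does not spell them out. The function name and inputs are hypothetical.

```python
def top_l_over_k_accuracy(predictions, true_contacts, seq_len,
                          k=5, min_sep=24, max_sep=None):
    """Fraction of the top L/k scored residue pairs that are true contacts.

    predictions: iterable of (i, j, score) residue-pair predictions.
    true_contacts: set of (i, j) pairs with i < j.
    min_sep / max_sep: sequence-separation window (defaults select long-range).
    """
    in_range = [
        (i, j, score) for i, j, score in predictions
        if abs(i - j) >= min_sep and (max_sep is None or abs(i - j) <= max_sep)
    ]
    # Rank by predicted score, highest first, and keep the top L/k pairs.
    in_range.sort(key=lambda p: p[2], reverse=True)
    top = in_range[:max(1, seq_len // k)]
    if not top:
        return 0.0
    correct = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return correct / len(top)


if __name__ == "__main__":
    # Hypothetical toy predictions for a 60-residue protein.
    preds = [(0, 30, 0.9), (1, 40, 0.8), (2, 50, 0.1), (3, 5, 0.99)]
    truth = {(0, 30)}
    print(top_l_over_k_accuracy(preds, truth, seq_len=60, k=30))
```

Medium-range accuracy under the same assumptions would use `min_sep=12, max_sep=23`.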
High-Density Genetic Linkage Map Construction and Quantitative Trait Locus Mapping for Hawthorn (Crataegus pinnatifida Bunge).

PubMed

Zhao, Yuhui; Su, Kai; Wang, Gang; Zhang, Liping; Zhang, Jijun; Li, Junpeng; Guo, Yinshan

2017-07-14

Genetic linkage maps are an important tool in genetic and genomic research. In this study, two hawthorn cultivars, Qiujinxing and Damianqiu, and 107 progenies from a cross between them were used for constructing a high-density genetic linkage map using the 2b-restriction site-associated DNA (2b-RAD) sequencing method, as well as for mapping quantitative trait loci (QTL) for flavonoid content. In total, 206,411,693 single-end reads were obtained, with an average sequencing depth of 57× in the parents and 23× in the progeny. After quality trimming, 117,896 high-quality 2b-RAD tags were retained, of which 42,279 were polymorphic; of these, 12,951 markers were used for constructing the genetic linkage map. The map contained 17 linkage groups and 3,894 markers, with a total map length of 1,551.97 cM and an average marker interval of 0.40 cM. QTL mapping identified 21 QTLs associated with flavonoid content in 10 linkage groups, which explained 16.30-59.00% of the variance. This is the first high-density linkage map for hawthorn, which will serve as a basis for fine-scale QTL mapping and marker-assisted selection of important traits in hawthorn germplasm and will facilitate chromosome assignment for hawthorn whole-genome assemblies in the future.
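The average marker interval reported above follows directly from the total map length and the marker count. The short sketch below reproduces that arithmetic; the function name is illustrative, and the optional correction for the number of linkage groups (each group of m markers has m - 1 intervals) is an assumption about how such averages are sometimes computed, not something stated in the abstract.

```python
def average_marker_interval(total_cm, n_markers, n_groups=0):
    """Average spacing between adjacent markers, in cM.

    With n_groups=0 this is simply total length / marker count.
    Passing the number of linkage groups divides by the number of
    intervals instead (markers minus groups).
    """
    return total_cm / (n_markers - n_groups)


# Figures from the abstract: 1,551.97 cM, 3,894 markers, 17 linkage groups.
print(round(average_marker_interval(1551.97, 3894), 2))
print(round(average_marker_interval(1551.97, 3894, n_groups=17), 2))
```

Both conventions round to the 0.40 cM quoted in the abstract.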

cWINNOWER algorithm for finding fuzzy dna motifs

NASA Technical Reports Server (NTRS)

Liang, S.; Samanta, M. P.; Biegel, B. A.

2004-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
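The definition of a signal used here (a pattern within d mutations of a motif of length l) can be made concrete with a small sketch. It only illustrates the (l, d) signal definition for a motif that is already known; it is not the cWINNOWER algorithm, which must find an unknown motif from sub-cliques of mutually similar patterns. The function names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence, motif, d):
    """Return (position, pattern) for every window within d mutations of motif."""
    l = len(motif)
    return [
        (i, sequence[i:i + l])
        for i in range(len(sequence) - l + 1)
        if hamming(sequence[i:i + l], motif) <= d
    ]


# Toy example with (l, d) = (4, 1).
print(find_signals("ACGTACGAACGT", "ACGT", 1))
```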
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

NASA Technical Reports Server (NTRS)

Liang, Shoudan

2003-01-01

The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4).
Draft Genome Sequence of Pseudomonas sp. Strain LFM046, a Producer of Medium-Chain-Length Polyhydroxyalkanoate

PubMed Central

Cardinali-Rezende, Juliana; Alexandrino, Paulo Moises Raduan; Nahat, Rafael Augusto Theodoro Pereira de Souza; Sant’Ana, Débora Parrine Vieira; Silva, Luiziana Ferreira; Gomez, José Gregório Cabrera

2015-01-01

Pseudomonas sp. LFM046 is a medium-chain-length polyhydroxyalkanoate (PHAMCL) producer capable of using various carbon sources (carbohydrates, organic acids, and vegetable oils) and was first isolated from sugarcane cultivation soil in Brazil. The genome sequence was found to be 5.97 Mb long with a G+C content of 66%. PMID:26294616
Genetic linkage map and QTL identification for adventitious rooting traits in red gum eucalypts.

PubMed

Sumathi, Murugan; Bachpai, Vijaya Kumar Waman; Mayavel, A; Dasgupta, Modhumita Ghosh; Nagarajan, Binai; Rajasugunasekar, D; Sivakumar, Veerasamy; Yasodha, Ramasamy

2018-05-01

The eucalypt species Eucalyptus tereticornis and Eucalyptus camaldulensis show tolerance to drought and salinity conditions, respectively, and are widely cultivated in arid and semiarid regions of tropical countries. In this study, a genetic linkage map was developed for the interspecific cross E. tereticornis × E. camaldulensis using a pseudo-testcross strategy with simple sequence repeat (SSR), intersimple sequence repeat (ISSR), and sequence-related amplified polymorphism (SRAP) markers. The consensus genetic map comprised a total of 283 markers (84 SSRs, 94 ISSRs, and 105 SRAP markers) on 11 linkage groups spanning a genetic distance of 1163.4 cM. Blasting the SSR sequences against E. grandis sequences allowed an alignment of 64%, and the average ratio of genetic-to-physical distance was 1.7 Mbp/cM, which strengthens the evidence that a high degree of synteny and colinearity exists among eucalypt genomes. Blast searches also revealed that 37% of SSRs had homologies with genes, which could potentially be used in a variety of downstream applications including candidate gene polymorphism. Quantitative trait loci (QTL) analysis for adventitious rooting traits revealed six QTL for rooting percent and root length on five chromosomes with interval and composite interval mapping. All the QTL explained 12.0-14.7% of the phenotypic variance, showing the involvement of major-effect QTL in adventitious rooting traits. Increasing the density of markers would facilitate the detection of a greater number of small-effect QTL and would also help identify the genes involved in the rooting process.
cgDNA: a software package for the prediction of sequence-dependent coarse-grain free energies of B-form DNA.

PubMed

Petkevičiūtė, D; Pasi, M; Gonzalez, O; Maddocks, J H

2014-11-10

cgDNA is a package for the prediction of sequence-dependent configuration-space free energies for B-form DNA at the coarse-grain level of rigid bases. For a fragment of any given length and sequence, cgDNA calculates the configuration of the associated free energy minimizer, i.e. the relative positions and orientations of each base, along with a stiffness matrix, which together govern differences in free energies. The model predicts non-local (i.e. beyond base-pair step) sequence dependence of the free energy minimizer. Configurations can be input or output in either the Curves+ definition of the usual helical DNA structural variables, or as a PDB file of coordinates of base atoms. We illustrate the cgDNA package by comparing predictions of free energy minimizers from (a) the cgDNA model, (b) time-averaged atomistic molecular dynamics (or MD) simulations, and (c) NMR or X-ray experimental observation, for (i) the Dickerson-Drew dodecamer and (ii) three oligomers containing A-tracts. The cgDNA predictions are rather close to those of the MD simulations, but many orders of magnitude faster to compute. Both the cgDNA and MD predictions are in reasonable agreement with the available experimental data. Our conclusion is that cgDNA can serve as a highly efficient tool for studying structural variations in B-form DNA over a wide range of sequences. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Insilico profiling of microRNAs in Korean ginseng (Panax ginseng Meyer)

PubMed Central

Mathiyalagan, Ramya; Subramaniyam, Sathiyamoorthy; Natarajan, Sathishkumar; Kim, Yeon Ju; Sun, Myung Suk; Kim, Se Young; Kim, Yu-Jin; Yang, Deok Chun

2013-01-01

MicroRNAs (miRNAs) are a class of recently discovered non-coding small RNA molecules, on average approximately 21 nucleotides in length, which play numerous important biological roles in gene regulation in various organisms. The miRNA database (release 18) has 18,226 miRNAs, which have been deposited from different species. Although miRNAs have been identified and validated in many plant species, no studies have been reported on discovering miRNAs in Panax ginseng Meyer, also known as Korean ginseng, a traditional medicinal plant in oriental medicine. It has triterpene ginseng saponins called ginsenosides, which are responsible for its various pharmacological activities. Predicting conserved miRNAs by homology-based analysis with available expressed sequence tag (EST) sequences can be powerful if the species lacks whole genome sequence information. In this study, using an EST-based computational approach, 69 conserved miRNAs belonging to 44 miRNA families were identified in Korean ginseng. The digital gene expression patterns of predicted conserved miRNAs were analyzed by deep sequencing using small RNA sequences of flower buds, leaves, and lateral roots. We found that many of the identified miRNAs showed tissue-specific expression. Using the in silico method, 346 potential targets were identified for the predicted 69 conserved miRNAs by searching the ginseng EST database, and the predicted targets were mainly involved in secondary metabolic processes, responses to biotic and abiotic stress, and transcription regulator activities, as well as a variety of other metabolic processes. PMID:23717176
In Planta Synthesis of Designer-Length Tobacco Mosaic Virus-Based Nano-Rods That Can Be Used to Fabricate Nano-Wires.

PubMed

Saunders, Keith; Lomonossoff, George P

2017-01-01

We have utilized plant-based transient expression to produce tobacco mosaic virus (TMV)-based nano-rods of predetermined lengths. This is achieved by expressing RNAs containing the TMV origin of assembly sequence (OAS) and the sequence of the TMV coat protein either on the same RNA molecule or on two separate constructs. We show that the length of the resulting nano-rods is dependent upon the length of the RNA that possesses the OAS element. By expressing a version of the TMV coat protein that incorporates a metal-binding peptide at its C-terminus in the presence of RNA containing the OAS we have been able to produce nano-rods of predetermined length that are coated with cobalt-platinum. These nano-rods have the properties of defined-length nano-wires that make them ideal for many developing bionanotechnological processes.
In Planta Synthesis of Designer-Length Tobacco Mosaic Virus-Based Nano-Rods That Can Be Used to Fabricate Nano-Wires

PubMed Central

Saunders, Keith; Lomonossoff, George P.

2017-01-01

We have utilized plant-based transient expression to produce tobacco mosaic virus (TMV)-based nano-rods of predetermined lengths. This is achieved by expressing RNAs containing the TMV origin of assembly sequence (OAS) and the sequence of the TMV coat protein either on the same RNA molecule or on two separate constructs. We show that the length of the resulting nano-rods is dependent upon the length of the RNA that possesses the OAS element. By expressing a version of the TMV coat protein that incorporates a metal-binding peptide at its C-terminus in the presence of RNA containing the OAS we have been able to produce nano-rods of predetermined length that are coated with cobalt-platinum. These nano-rods have the properties of defined-length nano-wires that make them ideal for many developing bionanotechnological processes. PMID:28878782
Visual ModuleOrganizer: a graphical interface for the detection and comparative analysis of repeat DNA modules

PubMed Central

2014-01-01

Background: DNA repeats, such as transposable elements, minisatellites and palindromic sequences, are abundant in sequences and have been shown to have significant and functional roles in the evolution of the host genomes. In a previous study, we introduced the concept of a repeat DNA module, a flexible motif present in at least two occurrences in the sequences. This concept was embedded into ModuleOrganizer, a tool allowing the detection of repeat modules in a set of sequences. However, its implementation remains difficult for larger sequences. Results: Here we present Visual ModuleOrganizer, a Java graphical interface that enables a new and optimized version of the ModuleOrganizer tool. To implement this version, it was recoded in C++ with compressed suffix tree data structures. This leads to lower memory usage (at least a 120-fold decrease on average) and reduces the computation time by at least a factor of four during the module detection process in large sequences. The Visual ModuleOrganizer interface allows users to easily choose ModuleOrganizer parameters and to graphically display the results. Moreover, Visual ModuleOrganizer dynamically handles graphical results through four main parameters: gene annotations, overlapping modules with known annotations, location of the module in a minimal number of sequences, and the minimal length of the modules. As a case study, the analysis of FoldBack4 sequences clearly demonstrated that our tools can be extended to comparative and evolutionary analyses of any repeat sequence elements in a set of genomic sequences. With the increasing number of sequences available in public databases, it is now possible to perform comparative analyses of repeated DNA modules in a graphic and friendly manner within a reasonable time period. Availability: The Visual ModuleOrganizer interface and the new version of the ModuleOrganizer tool are freely available at: http://lcb.cnrs-mrs.fr/spip.php?rubrique313. PMID:24678954
Towards predicting the encoding capability of MR fingerprinting sequences.

PubMed

Sommer, K; Amthor, T; Doneva, M; Koken, P; Meineke, J; Börnert, P

2017-09-01

Sequence optimization and appropriate sequence selection is still an unmet need in magnetic resonance fingerprinting (MRF). The main challenge in MRF sequence design is the lack of an appropriate measure of the sequence's encoding capability. To find such a measure, three different candidates for judging the encoding capability have been investigated: local and global dot-product-based measures judging dictionary entry similarity as well as a Monte Carlo method that evaluates the noise propagation properties of an MRF sequence. Consistency of these measures for different sequence lengths as well as the capability to predict actual sequence performance in both phantom and in vivo measurements was analyzed. While the dot-product-based measures yielded inconsistent results for different sequence lengths, the Monte Carlo method was in a good agreement with phantom experiments. In particular, the Monte Carlo method could accurately predict the performance of different flip angle patterns in actual measurements. The proposed Monte Carlo method provides an appropriate measure of MRF sequence encoding capability and may be used for sequence optimization. Copyright © 2017 Elsevier Inc. All rights reserved.
A third-generation microsatellite-based linkage map of the honey bee, Apis mellifera, and its comparison with the sequence-based physical map.

PubMed

Solignac, Michel; Mougel, Florence; Vautrin, Dominique; Monnerot, Monique; Cornuet, Jean-Marie

2007-01-01

The honey bee is a key model for social behavior and this feature led to the selection of the species for genome sequencing. A genetic map is a necessary companion to the sequence. In addition, because there was originally no physical map for the honey bee genome project, a meiotic map was the only resource for organizing the sequence assembly on the chromosomes. We present the genetic (meiotic) map here and describe the main features that emerged from comparison with the sequence-based physical map. The genetic map of the honey bee is saturated and the chromosomes are oriented from the centromeric to the telomeric regions. The map is based on 2,008 markers and is about 40 Morgans (M) long, resulting in a marker density of one every 2.05 centiMorgans (cM). For the 186 megabases (Mb) of the genome mapped and assembled, this corresponds to a very high average recombination rate of 22.04 cM/Mb. Honey bee meiosis shows a relatively homogeneous recombination rate along and across chromosomes, as well as within and between individuals. Interference is higher than inferred from the Kosambi function of distance. In addition, numerous recombination hotspots are dispersed over the genome. The very large genetic length of the honey bee genome, its small physical size and an almost complete genome sequence with a relatively low number of genes suggest a very promising future for association mapping in the honey bee, particularly as the existence of haploid males allows easy bulk segregant analysis.
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.

PubMed

Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E

1995-05-01

An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
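As a rough illustration of detrended fluctuation analysis, the sketch below maps a sequence to a "DNA walk", removes a least-squares linear trend in non-overlapping boxes, and fits the slope of log F(n) against log n. The purine/pyrimidine step rule, the box sizes, and all function names are assumptions chosen for illustration and are not taken from the paper; an uncorrelated sequence is expected to give an exponent near 0.5.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C, T), -1 for anything else."""
    walk, position = [], 0
    for base in seq.upper():
        position += 1 if base in "CT" else -1
        walk.append(position)
    return walk


def detrended_sum_of_squares(segment):
    """Sum of squared residuals after removing a least-squares line."""
    n = len(segment)
    mean_x = (n - 1) / 2
    mean_y = sum(segment) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(segment))
    slope = sxy / sxx
    return sum((y - (mean_y + slope * (x - mean_x))) ** 2
               for x, y in enumerate(segment))


def fluctuation(walk, box):
    """Root-mean-square detrended fluctuation F(n) for box size n."""
    n_boxes = len(walk) // box
    total = sum(detrended_sum_of_squares(walk[b * box:(b + 1) * box])
                for b in range(n_boxes))
    return math.sqrt(total / (n_boxes * box))


def dfa_exponent(seq, boxes=(8, 16, 32, 64, 128)):
    """Slope of log F(n) versus log n, fitted by least squares."""
    walk = dna_walk(seq)
    xs = [math.log(b) for b in boxes]
    ys = [math.log(fluctuation(walk, b)) for b in boxes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# Synthetic uncorrelated sequence (not GenBank data).
random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(4000))
print(round(dfa_exponent(random_seq), 2))
```

An exponent clearly above 0.5 over a range of box sizes would indicate long-range correlation of the kind the abstract reports for noncoding sequences.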
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

NASA Technical Reports Server (NTRS)

Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

PubMed Central

2011-01-01

Background: The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results: We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions: The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484
Rapid microsatellite marker development for African mahogany (Khaya senegalensis, Meliaceae) using next-generation sequencing and assessment of its intra-specific genetic diversity.

PubMed

Karan, M; Evans, D S; Reilly, D; Schulte, K; Wright, C; Innes, D; Holton, T A; Nikles, D G; Dickinson, G R

2012-03-01

Khaya senegalensis (African mahogany or dry-zone mahogany) is a high-value hardwood timber species with great potential for forest plantations in northern Australia. The species is distributed across the sub-Saharan belt from Senegal to Sudan and Uganda. Because of heavy exploitation and constraints on natural regeneration and sustainable planting, it is now classified as a vulnerable species. Here, we describe the development of microsatellite markers for K. senegalensis using next-generation sequencing to assess its intra-specific diversity across its natural range, which is a key for successful breeding programs and effective conservation management of the species. Next-generation sequencing yielded 93,943 sequences with an average read length of 234 bp. The assembled sequences contained 1030 simple sequence repeats, with primers designed for 522 microsatellite loci. Twenty-one microsatellite loci were tested with 11 showing reliable amplification and polymorphism in K. senegalensis. The 11 novel microsatellites, together with one previously published, were used to assess 73 accessions belonging to the Australian K. senegalensis domestication program, sampled from across the natural range of the species. STRUCTURE analysis shows two major clusters, one comprising mainly accessions from west Africa (Senegal to Benin) and the second based in the far eastern limits of the range in Sudan and Uganda. Higher levels of genetic diversity were found in material from western Africa. This suggests that new seed collections from this region may yield more diverse genotypes than those originating from Sudan and Uganda in eastern Africa. © 2011 Blackwell Publishing Ltd.
RAPD and Internal Transcribed Spacer Sequence Analyses Reveal Zea nicaraguensis as a Section Luxuriantes Species Close to Zea luxurians

PubMed Central

Wang, Pei; Lu, Yanli; Zheng, Mingmin; Rong, Tingzhao; Tang, Qilin

2011-01-01

Genetic relationship of a newly discovered teosinte from Nicaragua, Zea nicaraguensis with waterlogging tolerance, was determined based on randomly amplified polymorphic DNA (RAPD) markers and the internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA using 14 accessions from Zea species. RAPD analysis showed that a total of 5,303 fragments were produced by 136 random decamer primers, of which 84.86% bands were polymorphic. RAPD-based UPGMA analysis demonstrated that the genus Zea can be divided into section Luxuriantes including Zea diploperennis, Zea luxurians, Zea perennis and Zea nicaraguensis, and section Zea including Zea mays ssp. mexicana, Zea mays ssp. parviglumis, Zea mays ssp. huehuetenangensis and Zea mays ssp. mays. ITS sequence analysis showed the lengths of the entire ITS region of the 14 taxa in Zea varied from 597 to 605 bp. The average GC content was 67.8%. In addition to the insertion/deletions, 78 variable sites were recorded in the total ITS region with 47 in ITS1, 5 in 5.8S, and 26 in ITS2. Sequences of these taxa were analyzed with neighbor-joining (NJ) and maximum parsimony (MP) methods to construct the phylogenetic trees, selecting Tripsacum dactyloides L. as the outgroup. The phylogenetic relationships of Zea species inferred from the ITS sequences are highly concordant with the RAPD evidence that resolved two major subgenus clades. Both RAPD and ITS sequence analyses indicate that Zea nicaraguensis is more closely related to Zea luxurians than the other teosintes and cultivated maize, which should be regarded as a section Luxuriantes species. PMID:21525982
Assembly of a phased diploid Candida albicans genome facilitates allele-specific measurements and provides a simple model for repeat and indel structure

PubMed Central

2013-01-01

Background: Candida albicans is a ubiquitous opportunistic fungal pathogen that afflicts immunocompromised human hosts. With rare and transient exceptions the yeast is diploid, yet despite its clinical relevance the respective sequences of its two homologous chromosomes have not been completely resolved. Results: We construct a phased diploid genome assembly by deep sequencing a standard laboratory wild-type strain and a panel of strains homozygous for particular chromosomes. The assembly has 700-fold coverage on average, allowing extensive revision and expansion of the number of known SNPs and indels. This phased genome significantly enhances the sensitivity and specificity of allele-specific expression measurements by enabling pooling and cross-validation of signal across multiple polymorphic sites. Additionally, the diploid assembly reveals pervasive and unexpected patterns in allelic differences between homologous chromosomes. Firstly, we see striking clustering of indels, concentrated primarily in the repeat sequences in promoters. Secondly, both indels and their repeat-sequence substrate are enriched near replication origins. Finally, we reveal an intimate link between repeat sequences and indels, which argues that repeat length is under selective pressure for most eukaryotes. This connection is described by a concise one-parameter model that explains repeat-sequence abundance in C. albicans as a function of the indel rate, and provides a general framework to interpret repeat abundance in species ranging from bacteria to humans. Conclusions: The phased genome assembly and insights into repeat plasticity will be valuable for better understanding allele-specific phenomena and genome evolution. PMID:24025428
Isolation and characterization of full-length putative alcohol dehydrogenase genes from polygonum minus

NASA Astrophysics Data System (ADS)

Hamid, Nur Athirah Abd; Ismail, Ismanizan

2013-11-01

Polygonum minus, locally known as Kesum, is an aromatic herb with a high secondary metabolite content. Alcohol dehydrogenase (ADH) is an important enzyme that catalyzes the reversible oxidation of alcohol to aldehyde in the presence of NAD(P)(H) as a co-factor. The main focus of this research is to identify the ADH gene. Total RNA was extracted from leaves of P. minus that had been treated with 150 μM jasmonic acid. The full-length cDNA sequence of ADH was isolated via rapid amplification of cDNA ends (RACE). Subsequently, in silico analysis was conducted on the full-length cDNA sequence, and PCR was performed on genomic DNA to determine the exon and intron organization. Two ADH sequences, designated PmADH1 and PmADH2, were successfully isolated. Both sequences have an ORF of 801 bp, which encodes 266 aa residues. Nucleotide sequence comparison of PmADH1 and PmADH2 indicated that the two sequences are highly similar in the ORF region but divergent in the 3' untranslated regions (UTR). The amino acid sequences differ at residue 107: PmADH1 contains a Gly (G) residue, while PmADH2 contains a Cys (C) residue. The intron-exon organization of the two sequences is also the same, with 3 introns and 4 exons. Based on in silico analysis, both sequences contain the "classical" short-chain alcohol dehydrogenases/reductases ((c) SDRs) conserved domain. The results suggest that both sequences are members of the short-chain alcohol dehydrogenase family.
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

PubMed

Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

2017-07-01

PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.
Queues with Choice via Delay Differential Equations

NASA Astrophysics Data System (ADS)

Pender, Jamol; Rand, Richard H.; Wesson, Elizabeth

Delay or queue length information has the potential to influence the decision of a customer to join a queue. Thus, it is imperative for managers of queueing systems to understand how the information that they provide will affect the performance of the system. To this end, we construct and analyze two two-dimensional deterministic fluid models that incorporate customer choice behavior based on delayed queue length information. In the first fluid model, customers join each queue according to a Multinomial Logit Model; however, the queue length information the customer receives is delayed by a constant Δ. We show that the delay can cause oscillations or asynchronous behavior in the model, depending on the value of Δ. In the second model, customers receive information about the queue length through a moving average of the queue length. Although it has been shown empirically that giving patients moving average information causes oscillations and asynchronous behavior to occur in U.S. hospitals, we show analytically for the first time that the moving average fluid model can exhibit oscillations and determine their dependence on the moving average window. Thus, our analysis provides new insight into how operators of service systems should report queue length information to customers and how delayed information can produce unwanted system dynamics.
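The first fluid model described above can be illustrated with a small simulation. The abstract does not give the equations, so the sketch below assumes one plausible form: each queue q_i receives a share of the arrival rate lam given by a multinomial logit (softmax) of the negated delayed queue lengths, and drains at rate mu * q_i. The function name, the parameter values, the constant initial history and the forward-Euler integration are all illustrative assumptions, not the authors' method.

```python
import math


def simulate_two_queues(lam, mu, delay, q0=(4.9, 5.1), t_end=40.0, dt=0.01):
    """Forward-Euler integration of an assumed two-queue delayed-choice model:

        q_i'(t) = lam * p_i(t - delay) - mu * q_i(t)

    where p_i is the multinomial logit weight exp(-q_i) / sum_j exp(-q_j)
    evaluated on the delayed queue lengths.
    """
    lag = int(round(delay / dt))
    # Constant history on [-delay, 0]; the last entry is the state at t = 0.
    hist = [tuple(q0)] * (lag + 1)
    for _ in range(int(round(t_end / dt))):
        q1, q2 = hist[-1]          # current state
        d1, d2 = hist[-1 - lag]    # state seen by arriving customers
        m = min(d1, d2)            # shift exponents for numerical stability
        w1, w2 = math.exp(-(d1 - m)), math.exp(-(d2 - m))
        p1 = w1 / (w1 + w2)
        hist.append((q1 + dt * (lam * p1 - mu * q1),
                     q2 + dt * (lam * (1.0 - p1) - mu * q2)))
    return hist[lag:]              # trajectory from t = 0 onward


def tail_swing(traj, n):
    """Range of q1 - q2 over the last n samples (a crude oscillation measure)."""
    diffs = [a - b for a, b in traj[-n:]]
    return max(diffs) - min(diffs)


no_delay = simulate_two_queues(lam=10.0, mu=1.0, delay=0.0)
long_delay = simulate_two_queues(lam=10.0, mu=1.0, delay=2.0)
print(tail_swing(no_delay, 1000), tail_swing(long_delay, 1000))
```

Under these assumed parameters the two queues equalize when the information is current, whereas the long delay leaves a persistent swing between them, which is the qualitative behavior the abstract describes.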

RT-PCR and sequence analysis of the full-length fusion protein of Canine Distemper Virus from domestic dogs.

PubMed

Romanutti, Carina; Gallo Calderón, Marina; Keller, Leticia; Mattion, Nora; La Torre, José

2016-02-01

During 2007-2014, 84 out of 236 (35.6%) samples from domestic dogs submitted to our laboratory for diagnostic purposes were positive for Canine Distemper Virus (CDV), as analyzed by RT-PCR amplification of a fragment of the nucleoprotein gene. Fifty-nine of them (70.2%) were from dogs that had been vaccinated against CDV. The full-length gene encoding the Fusion (F) protein of fifteen isolates was sequenced and compared with those of other CDVs, including wild-type and vaccine strains. Phylogenetic analysis using the full-length F gene sequences grouped all the Argentinean CDV strains in the SA2 clade. Sequence identity with the Onderstepoort vaccine strain was 89.0-90.6%, and the highest divergence was found in the 135 amino acids corresponding to the F protein signal peptide, Fsp (64.4-66.7% identity). In contrast, this region was highly conserved among the local strains (94.1-100% identity). One extra putative N-glycosylation site was identified in the F gene of Argentinean CDV strains with respect to the vaccine strain. The present report is the first to analyze full-length F protein sequences of CDV strains circulating in Argentina, and contributes to the knowledge of the molecular epidemiology of CDV, which may help in understanding future disease outbreaks. Copyright © 2015 Elsevier B.V. All rights reserved.
Physical Activity, Physical Fitness and Leukocyte Telomere Length: the Cardiovascular Health Study.

PubMed Central

Soares-Miranda, Luisa; Imamura, Fumiaki; Siscovick, David; Jenny, Nancy Swords; Fitzpatrick, Annette L; Mozaffarian, Dariush

2015-01-01

Introduction The influence of physical activity (PA) and physical fitness (PF) at older ages on changes in telomere length (TL), repetitive DNA sequences that may mark biologic aging, is not well-established. Few prior studies have been conducted in older adults, these were mainly cross-sectional, and few evaluated PF. Methods We investigated cross-sectional and prospective associations of PA and PF with leukocyte TL among 582 older adults (age 73±5 y at baseline) in the Cardiovascular Health Study, having serial TL measures and PA and PF assessed multiple times. Cross-sectional associations were assessed using multivariable repeated-measures regression, in which cumulatively averaged PA and PF measures were related to TL. Longitudinal analyses assessed cumulatively averaged PA and PF against later changes in TL; and changes in cumulatively averaged PA and PF against changes in TL. Results Cross-sectionally, greater walking distance and chair test performance, but not other PA and PF measures, were each associated with longer TL (p-trend=0.007, 0.04 respectively). In longitudinal analyses, no significant associations of baseline PA and PF with change in TL were observed. In contrast, changes in leisure-time activity and chair test performance were each inversely associated with changes in TL. Conclusions Cross-sectional analyses suggest that greater PA and PF are associated with longer TL. Prospective analyses show that changes in PA and PF are associated with differences in changes in TL. Even later in life, changes in certain PA and PF measures are associated with changes in TL, suggesting that leisure-time activity and fitness could reduce leukocyte telomere attrition among older adults. PMID:26083773
The Impact of Averaging Window Length on the “Desaturation” Indexes Obtained Via Overnight Pulse Oximetry at High Altitude

PubMed Central

Cross, Troy J.; Keller-Ross, Manda; Issa, Amine; Wentz, Robert; Taylor, Bryan; Johnson, Bruce

2015-01-01

Study Objectives: To determine the impact of averaging window-length on the “desaturation” indexes (DIs) obtained via overnight pulse oximetry (SpO2) at high altitude. Design: Overnight SpO2 data were collected during a 10-day sojourn at high altitude. SpO2 was obtained using a commercial wrist-worn finger oximeter whose firmware was modified to store unaveraged beat-to-beat data. Simple moving averages of window lengths spanning 2 to 20 cardiac beats were retrospectively applied to beat-to-beat SpO2 datasets. After SpO2 artifacts were removed, the following DIs were then calculated for each of the averaged datasets: oxygen desaturation index (ODI); total sleep time with SpO2 < 80% (TST < 80), and the lowest SpO2 observed during sleep (SpO2 low). Setting: South Base Camp, Mt. Everest (5,364 m elevation). Participants: Five healthy, adult males (35 ± 5 y; 180 ± 1 cm; 85 ± 4 kg). Interventions: N/A. Measurements and Results: 49 datasets were obtained from the 5 participants, totalling 239 hours of data. For all window lengths ≥ 2 beats, ODI and TST < 80 were lower, and SpO2 low was higher than those values obtained from the beat-to-beat SpO2 time series data (P < 0.05). Conclusions: Our findings indicate that increasing oximeter averaging window length progressively underestimates the frequency and magnitude of sleep disordered breathing events at high altitude, as indirectly assessed via the desaturation indexes. Citation: Cross TJ, Keller-Ross M, Issa A, Wentz R, Taylor B, Johnson B. The impact of averaging window length on the “desaturation” indexes obtained via overnight pulse oximetry at high altitude. SLEEP 2015;38(8):1331–1334. PMID:25581919
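The smoothing effect reported in this abstract can be illustrated with a toy example. The sketch below applies a simple moving average to a short, made-up beat-to-beat SpO2 series and compares the lowest value and the number of samples below 80% before and after averaging. The data values and function name are hypothetical, and the ODI is not computed because the abstract does not define its scoring rule.

```python
def simple_moving_average(values, window):
    """Simple moving average with a fixed window length (in samples/beats)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averaged = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest sample, drop the oldest.
        running += values[i] - values[i - window]
        averaged.append(running / window)
    return averaged


# Hypothetical beat-to-beat SpO2 readings (%), with one brief dip below 80%.
beats = [90, 91, 78, 90, 92, 91]
smoothed = simple_moving_average(beats, window=3)

print(min(beats), min(smoothed))
print(sum(v < 80 for v in beats), sum(v < 80 for v in smoothed))
```

In this toy series the brief dip disappears after averaging: the lowest value rises and no averaged sample falls below 80%, consistent with the direction of the bias the study reports.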
De Novo Transcriptome of the Hemimetabolous German Cockroach (Blattella germanica)

PubMed Central

Zhou, Xiaojie; Qian, Kun; Tong, Ying; Zhu, Junwei Jerry; Qiu, Xinghui; Zeng, Xiaopeng

2014-01-01

Background The German cockroach, Blattella germanica, is an important insect pest that transmits various pathogens mechanically and causes severe allergic diseases. This insect has long served as a model system for studies of insect biology, physiology and ecology. However, the lack of genome or transcriptome information severely hinders further understanding of the German cockroach at the molecular level and on a genome-wide scale. To explore the transcriptome and identify unique sequences of interest, we subjected the B. germanica transcriptome to massively parallel pyrosequencing and generated the first reference transcriptome for B. germanica. Methodology/Principal Findings A total of 1,365,609 raw reads with an average length of 529 bp were generated by pyrosequencing a mixed cDNA library from different life stages of the German cockroach, including maturing oothecae, nymphs, and adult females and males. The raw reads were de novo assembled into 48,800 contigs and 3,961 singletons with high-quality unique sequences. These sequences were annotated and classified functionally using BLAST, GO and KEGG, and genes putatively encoding detoxification enzyme systems, insecticide targets, key components in systematic RNA interference, immunity and chemoreception pathways were identified. A total of 3,601 simple sequence repeat (SSR) loci were also predicted. Conclusions/Significance The whole-transcriptome pyrosequencing data from this study provide a usable genetic resource for future identification of potential functional genes involved in various biological processes. PMID:25265537
Draft sequencing and analysis of the genome of pufferfish Takifugu flavidus.

PubMed

Gao, Yang; Gao, Qiang; Zhang, Huan; Wang, Lingling; Zhang, Fuchong; Yang, Chuanyan; Song, Linsheng

2014-12-01

The pufferfish Takifugu flavidus is an economically important species due to its outstanding flavour and high market value. It has also been regarded as an excellent model for genetic studies for decades. In the present study, three mate-pair libraries of the T. flavidus genome were sequenced on the SOLiD 4 next-generation sequencing platform, and the draft genome was constructed from the short reads using an assisted assembly strategy. The draft consists of 50,947 scaffolds with an N50 value of 305.7 kb, and the average GC content was 45.2%. The combined length of repetitive sequences was 26.5 Mb, which accounted for 6.87% of the genome, indicating that the compactness of the T. flavidus genome is similar to that of the T. rubripes genome. A total of 1,253 non-coding RNA genes and 30,285 protein-encoding genes were assigned to the genome. There were 132, 775 and 394 presumptive genes playing roles in colour pattern variation, relatively slow growth and lipid metabolism, respectively. Among them, genes involved in the microtubule-dependent transport system, angiogenesis, the decapentaplegic pathway and lipid mobilization were significantly expanded in the T. flavidus genome. This draft genome provides a valuable resource for understanding and improving both fundamental and applied research with pufferfish in the future. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
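For readers unfamiliar with the N50 statistic quoted in this abstract, the sketch below shows the usual definition: the length L such that scaffolds of length at least L together contain at least half of the assembled bases. The scaffold lengths used here are made-up illustrative values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    half_total = sum(lengths) / 2
    cumulative = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        cumulative += length
        if cumulative >= half_total:
            return length


# Hypothetical scaffold lengths (any unit, e.g. kb).
print(n50([2, 3, 4, 5, 6]))
```

A larger N50 indicates that more of the assembly sits in long scaffolds, although it says nothing about correctness.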
Transcriptome sequencing and annotation of the microalgae Dunaliella tertiolecta: Pathway description and gene discovery for production of next-generation biofuels

PubMed Central

2011-01-01

Background Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock. PMID:21401935
Molecular complexity of successive bacterial epidemics deconvoluted by comparative pathogenomics.

PubMed

Beres, Stephen B; Carroll, Ronan K; Shea, Patrick R; Sitkiewicz, Izabela; Martinez-Gutierrez, Juan Carlos; Low, Donald E; McGeer, Allison; Willey, Barbara M; Green, Karen; Tyrrell, Gregory J; Goldman, Thomas D; Feldgarden, Michael; Birren, Bruce W; Fofanov, Yuriy; Boos, John; Wheaton, William D; Honisch, Christiane; Musser, James M

2010-03-02

Understanding the fine-structure molecular architecture of bacterial epidemics has been a long-sought goal of infectious disease research. We used short-read-length DNA sequencing coupled with mass spectroscopy analysis of SNPs to study the molecular pathogenomics of three successive epidemics of invasive infections involving 344 serotype M3 group A Streptococcus in Ontario, Canada. Sequencing the genome of 95 strains from the three epidemics, coupled with analysis of 280 biallelic SNPs in all 344 strains, revealed an unexpectedly complex population structure composed of a dynamic mixture of distinct clonally related complexes. We discovered that each epidemic is dominated by micro- and macrobursts of multiple emergent clones, some with distinct strain genotype-patient phenotype relationships. On average, strains were differentiated from one another by only 49 SNPs and 11 insertion-deletion events (indels) in the core genome. Ten percent of SNPs are strain specific; that is, each strain has a unique genome sequence. We identified nonrandom temporal-spatial patterns of strain distribution within and between the epidemic peaks. The extensive full-genome data permitted us to identify genes with significantly increased rates of nonsynonymous (amino acid-altering) nucleotide polymorphisms, thereby providing clues about selective forces operative in the host. Comparative expression microarray analysis revealed that closely related strains differentiated by seemingly modest genetic changes can have significantly divergent transcriptomes. We conclude that enhanced understanding of bacterial epidemics requires a deep-sequencing, geographically centric, comparative pathogenomics strategy.
Transcriptome sequencing and differential gene expression analysis in Viola yedoensis Makino (Fam. Violaceae) responsive to cadmium (Cd) pollution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gao, Jian; Luo, Mao; Zhu, Ye

2015-03-27

Viola yedoensis Makino is an important traditional Chinese medicinal plant adapted to cadmium (Cd)-polluted regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were identified between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly consistent with the Solexa analysis. This study provides the first global analysis of the V. yedoensis transcriptome and will support further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 54,479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.
Experimental evidence of a φ Josephson junction.

PubMed

Sickinger, H; Lipman, A; Weides, M; Mints, R G; Kohlstedt, H; Koelle, D; Kleiner, R; Goldobin, E

2012-09-07

We demonstrate experimentally the existence of Josephson junctions having a doubly degenerate ground state with an average Josephson phase ψ=±φ. The value of φ can be chosen by design in the interval 0<φ<π. The junctions used in our experiments are fabricated as 0-π Josephson junctions of moderate normalized length with asymmetric 0 and π regions. We show that (a) these φ Josephson junctions have two critical currents, corresponding to the escape of the phase ψ from -φ and +φ states, (b) the phase ψ can be set to a particular state by tuning an external magnetic field, or (c) by using a proper bias current sweep sequence. The experimental observations are in agreement with previous theoretical predictions.
Identification of Medically Important Yeasts Using PCR-Based Detection of DNA Sequence Polymorphisms in the Internal Transcribed Spacer 2 Region of the rRNA Genes

PubMed Central

Chen, Y. C.; Eisner, J. D.; Kattar, M. M.; Rassoulian-Barrett, S. L.; LaFe, K.; Yarfitz, S. L.; Limaye, A. P.; Cookson, B. T.

2000-01-01

Identification of medically relevant yeasts can be time-consuming and inaccurate with current methods. We evaluated PCR-based detection of sequence polymorphisms in the internal transcribed spacer 2 (ITS2) region of the rRNA genes as a means of fungal identification. Clinical isolates (401), reference strains (6), and type strains (27), representing 34 species of yeasts were examined. The length of PCR-amplified ITS2 region DNA was determined with single-base precision in less than 30 min by using automated capillary electrophoresis. Unique, species-specific PCR products ranging from 237 to 429 bp were obtained from 92% of the clinical isolates. The remaining 8%, divided into groups with ITS2 regions which differed by ≤2 bp in mean length, all contained species-specific DNA sequences easily distinguishable by restriction enzyme analysis. These data, and the specificity of length polymorphisms for identifying yeasts, were confirmed by DNA sequence analysis of the ITS2 region from 93 isolates. Phenotypic and ITS2-based identification was concordant for 427 of 434 yeast isolates examined using sequence identity of ≥99%. Seven clinical isolates contained ITS2 sequences that did not agree with their phenotypic identification, and ITS2-based phylogenetic analyses indicate the possibility of new or clinically unusual species in the Rhodotorula and Candida genera. This work establishes an initial database, validated with over 400 clinical isolates, of ITS2 length and sequence polymorphisms for 34 species of yeasts. We conclude that size and restriction analysis of PCR-amplified ITS2 region DNA is a rapid and reliable method to identify clinically significant yeasts, including potentially new or emerging pathogenic species. PMID:10834993
Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

PubMed

Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

2015-12-01

Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), Diversity (D, in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.
De novo characterization of fall dormant and nondormant alfalfa (Medicago sativa L.) leaf transcriptome and identification of candidate genes related to fall dormancy.

PubMed

Zhang, Senhao; Shi, Yinghua; Cheng, Ningning; Du, Hongqi; Fan, Wenna; Wang, Chengzhang

2015-01-01

Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide. Fall dormancy is an adaptive character related to the biomass production and winter survival in alfalfa. The physiological, biochemical and molecular mechanisms causing fall dormancy and the related genes have not been well studied. In this study, we sequenced two standard varieties of alfalfa (dormant and non-dormant) at two time points and generated approximately 160 million high quality paired-end sequence reads using sequencing by synthesis (SBS) technology. The de novo transcriptome assembly generated a set of 192,875 transcripts with an average length of 856 bp representing about 165.1 Mb of the alfalfa leaf transcriptome. After assembly, 111,062 (57.6%) transcripts were annotated against the NCBI non-redundant database. A total of 30,165 (15.6%) transcripts were mapped to 323 Kyoto Encyclopedia of Genes and Genomes pathways. We also identified 41,973 simple sequence repeats, which can be used to generate markers for alfalfa, and 1,541 transcription factors were identified across 1,350 transcripts. Gene expression comparisons between dormant and non-dormant alfalfa at different time points were performed, and we identified several differentially expressed genes potentially related to fall dormancy. Gene Ontology and pathway information was also obtained. We sequenced and assembled the leaf transcriptome of alfalfa related to fall dormancy, and also identified some genes of interest involved in the fall dormancy mechanism. Thus, our research focused on studying fall dormancy in alfalfa through transcriptome sequencing. The sequencing and gene expression data generated in this study may be used further to elucidate the complete mechanisms governing fall dormancy in alfalfa.
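The summary figures in this abstract are mutually consistent, which a few lines of arithmetic can confirm: the transcript count times the average length gives the total transcriptome size, and the annotated and pathway-mapped counts divided by the transcript count give the quoted percentages. The variable names below are illustrative only.

```python
transcripts = 192_875
average_length_bp = 856
annotated = 111_062
mapped_to_kegg = 30_165

# Total assembled bases = number of transcripts x mean transcript length.
total_mb = transcripts * average_length_bp / 1e6

# Fractions of transcripts annotated and mapped, as percentages.
annotated_pct = 100 * annotated / transcripts
kegg_pct = 100 * mapped_to_kegg / transcripts

print(round(total_mb, 1), round(annotated_pct, 1), round(kegg_pct, 1))
```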
De Novo Characterization of Fall Dormant and Nondormant Alfalfa (Medicago sativa L.) Leaf Transcriptome and Identification of Candidate Genes Related to Fall Dormancy

PubMed Central

Cheng, Ningning; Du, Hongqi; Fan, Wenna; Wang, Chengzhang

2015-01-01

Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide. Fall dormancy is an adaptive character related to the biomass production and winter survival in alfalfa. The physiological, biochemical and molecular mechanisms causing fall dormancy and the related genes have not been well studied. In this study, we sequenced two standard varieties of alfalfa (dormant and non-dormant) at two time points and generated approximately 160 million high quality paired-end sequence reads using sequencing by synthesis (SBS) technology. The de novo transcriptome assembly generated a set of 192,875 transcripts with an average length of 856 bp representing about 165.1 Mb of the alfalfa leaf transcriptome. After assembly, 111,062 (57.6%) transcripts were annotated against the NCBI non-redundant database. A total of 30,165 (15.6%) transcripts were mapped to 323 Kyoto Encyclopedia of Genes and Genomes pathways. We also identified 41,973 simple sequence repeats, which can be used to generate markers for alfalfa, and 1,541 transcription factors were identified across 1,350 transcripts. Gene expression comparisons between dormant and non-dormant alfalfa at different time points were performed, and we identified several differentially expressed genes potentially related to fall dormancy. Gene Ontology and pathway information was also obtained. We sequenced and assembled the leaf transcriptome of alfalfa related to fall dormancy, and also identified some genes of interest involved in the fall dormancy mechanism. Thus, our research focused on studying fall dormancy in alfalfa through transcriptome sequencing. The sequencing and gene expression data generated in this study may be used further to elucidate the complete mechanisms governing fall dormancy in alfalfa. PMID:25799491
RNA-seq analysis of Rubus idaeus cv. Nova: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

PubMed

Hyun, Tae Kyung; Lee, Sarah; Kumar, Dhinesh; Rim, Yeonggil; Kumar, Ritesh; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean

2014-10-01

Using Illumina sequencing technology, we have generated large-scale transcriptome sequencing data containing abundant information on genes involved in the metabolic pathways in R. idaeus cv. Nova fruits. Rubus idaeus (red raspberry) is an economically important crop that contains numerous nutrients, micronutrients and phytochemicals with essential benefits to human health. The molecular mechanism underlying the ripening process and phytochemical biosynthesis in red raspberry is attributed to changes in gene expression, but very limited transcriptomic and genomic information is available in public databases. To address this issue, we generated more than 51 million sequencing reads from R. idaeus cv. Nova fruit using Illumina RNA-Seq technology. After de novo assembly, we obtained 42,604 unigenes with an average length of 812 bp. At the protein level, the Nova fruit transcriptome showed 77% and 68% sequence similarity with Rubus coreanus and Fragaria vesca, respectively, indicating the evolutionary relationship between them. In addition, 69% of assembled unigenes were annotated using public databases including the NCBI non-redundant, Cluster of Orthologous Groups and Gene Ontology databases, suggesting that our transcriptome dataset provides a valuable resource for investigating metabolic processes in red raspberry. To analyze the relationship between several novel transcripts and the amounts of metabolites such as γ-aminobutyric acid and anthocyanins, real-time PCR and target metabolite analysis were performed on two different ripening stages of Nova. This is the first attempt to use the Illumina sequencing platform for RNA sequencing and de novo assembly of Nova fruit without a reference genome. Our data provide the most comprehensive transcriptome resource available for Rubus fruits, and will be useful for understanding the ripening process and for breeding R. idaeus cultivars with improved fruit quality.
De novo sequencing analysis of the Rosa roxburghii fruit transcriptome reveals putative ascorbate biosynthetic genes and EST-SSR markers.

PubMed

Yan, Xiuqin; Zhang, Xue; Lu, Min; He, Yong; An, Huaming

2015-04-25

Rosa roxburghii Tratt. is a well-known ornamental rose species native to China. In addition, the fruits of this species are valued for their nutritional and medicinal characteristics, especially their high ascorbic acid (AsA) levels. Nevertheless, AsA biosynthesis in R. roxburghii fruit has not been explored in detail because of a lack of genomic resources for this species. High-throughput transcriptomic sequencing generating large volumes of transcript sequence data can aid in gene discovery and molecular marker development. In this study, we generated more than 53 million clean reads using Illumina paired-end sequencing technology. De novo assembly yielded 106,590 unigenes, with an average length of 343 bp. On the basis of sequence similarity to known proteins, 9301 and 2393 unigenes were classified into Gene Ontology and Clusters of Orthologous Group categories, respectively. There were 7480 unigenes assigned to 124 pathways in the Kyoto Encyclopedia of Gene and Genome pathway database. BLASTx searches identified 498 unique putative transcripts encoding various transcription factors, some known to regulate fruit development. qRT-PCR validated the expressions of most of the genes encoding the main enzymes involved in ascorbate biosynthesis. In addition, 9131 potential simple sequence repeat (SSR) loci were identified among the unigenes. One hundred and two primer pairs were synthesized and 71 pairs produced an amplification product during initial screening. Among the amplified products, 30 were polymorphic in the 16 R. roxburghii germplasms tested. Our study was the first to produce a large volume of transcriptome data from R. roxburghii. The resulting sequence collection is a valuable resource for gene discovery and marker-assisted selective breeding in this rose species. Copyright © 2015 Elsevier B.V. All rights reserved.
RNA-Seq Analysis of Cocos nucifera: Transcriptome Sequencing and De Novo Assembly for Subsequent Functional Genomics Approaches

PubMed Central

Xia, Wei; Mason, Annaliese S.; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

2013-01-01

Background Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. Methodology/Principal Findings To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the Genbank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Conclusions/Significance Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species. PMID:23555859
Sequence capture of ultraconserved elements from bird museum specimens.

PubMed

McCormack, John E; Tsai, Whitney L E; Faircloth, Brant C

2016-09-01

New DNA sequencing technologies are allowing researchers to explore the genomes of the millions of natural history specimens collected prior to the molecular era. Yet, we know little about how well specific next-generation sequencing (NGS) techniques work with the degraded DNA typically extracted from museum specimens. Here, we use one type of NGS approach, sequence capture of ultraconserved elements (UCEs), to collect data from bird museum specimens as old as 120 years. We targeted 5060 UCE loci in 27 western scrub-jays (Aphelocoma californica) representing three evolutionary lineages that could be species, and we collected an average of 3749 UCE loci containing 4460 single nucleotide polymorphisms (SNPs). Despite older specimens producing fewer and shorter loci in general, we collected thousands of markers from even the oldest specimens. More sequencing reads per individual helped to boost the number of UCE loci we recovered from older specimens, but more sequencing was not as successful at increasing the length of loci. We detected contamination in some samples and determined that contamination was more prevalent in older samples that were subject to less sequencing. For the phylogeny generated from concatenated UCE loci, contamination led to incorrect placement of some individuals. In contrast, a species tree constructed from SNPs called within UCE loci correctly placed individuals into three monophyletic groups, perhaps because of the stricter analytical procedures used for SNP calling. This study and other recent studies on the genomics of museum specimens have profound implications for natural history collections, where millions of older specimens should now be considered genomic resources. © 2015 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
RNA-Seq analysis of Cocos nucifera: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

PubMed

Fan, Haikuo; Xiao, Yong; Yang, Yaodong; Xia, Wei; Mason, Annaliese S; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

2013-01-01

Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the Genbank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species.
A generalized global alignment algorithm.

PubMed

Huang, Xiaoqiu; Chao, Kun-Mao

2003-01-22

Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
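The time and space bounds quoted in the abstract above can be illustrated with a standard global alignment score computed from two rows of the dynamic programming table. This is a minimal sketch of ordinary Needleman-Wunsch scoring with a linear gap cost, not the GAP3 model itself; the function name and the scoring parameters are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b (illustrative scoring values).

    Time is proportional to len(a) * len(b); only two rows of the table
    are kept, so memory is proportional to len(b).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: substitute/match, gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The sketch returns only the score. Recovering the alignment itself within linear space usually requires a divide-and-conquer scheme in the style of Hirschberg, which is omitted here.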
A multidimensional generalization of Heilbronn's theorem on the average length of a finite continued fraction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Illarionov, A A

2014-03-31

Heilbronn's theorem on the average length of a finite continued fraction is generalized to the multidimensional case in terms of relative minima of the lattices which were introduced by Voronoy and Minkowski. Bibliography: 21 titles.
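For orientation, the one-dimensional statement being generalized can be explored numerically. The sketch below assumes the usual convention that the length of the continued fraction of a/b (with 0 < a < b) equals the number of division steps in the Euclidean algorithm, and compares the average over all a coprime to b with the leading term (12 ln 2 / π²) ln b as the theorem is commonly quoted. Function names are hypothetical, and lower-order terms are deliberately left out.

```python
from math import gcd, log, pi


def cf_length(a, b):
    """Number of partial quotients of a/b after the integer part, for 0 < a < b."""
    steps = 0
    while a:
        a, b = b % a, a  # one Euclidean division step
        steps += 1
    return steps


def average_cf_length(b):
    """Mean continued-fraction length over numerators coprime to b (b >= 2)."""
    lengths = [cf_length(a, b) for a in range(1, b) if gcd(a, b) == 1]
    return sum(lengths) / len(lengths)


def heilbronn_main_term(b):
    """Leading term of the average length; lower-order terms are omitted."""
    return 12 * log(2) / pi**2 * log(b)


if __name__ == "__main__":
    for b in (101, 1009, 10007):
        print(b, average_cf_length(b), heilbronn_main_term(b))
```

The two printed columns should grow at the same logarithmic rate; they are not expected to coincide, because the sketch ignores the constant and error terms.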

A survey of the sorghum transcriptome using single-molecule long reads

DOE PAGES

Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...

2016-06-24

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.
A survey of the sorghum transcriptome using single-molecule long reads

PubMed Central

Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.

2016-01-01

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290
Production of a full-length infectious GFP-tagged cDNA clone of Beet mild yellowing virus for the study of plant-polerovirus interactions.

PubMed

Stevens, Mark; Viganó, Felicita

2007-04-01

The full-length cDNA of Beet mild yellowing virus (Broom's Barn isolate) was sequenced and cloned into the vector pLitmus 29 (pBMYV-BBfl). The sequence of BMYV-BBfl (5721 bases) shared 96% and 98% nucleotide identity with the other complete sequences of BMYV (BMYV-2ITB, France and BMYV-IPP, Germany respectively). Full-length capped RNA transcripts of pBMYV-BBfl were synthesised and found to be biologically active in Arabidopsis thaliana protoplasts following electroporation or PEG inoculation when the protoplasts were subsequently analysed using serological and molecular methods. The BMYV sequence was modified by inserting DNA that encoded the jellyfish green fluorescent protein (GFP) into the P5 gene close to its 3' end. A. thaliana protoplasts electroporated with these RNA transcripts were biologically active and up to 2% of transfected protoplasts showed GFP-specific fluorescence. The exploitation of these cDNA clones for the study of the biology of beet poleroviruses is discussed.
Analysis of the leaf transcriptome of Musa acuminata during interaction with Mycosphaerella musicola: gene assembly, annotation and marker development

PubMed Central

2013-01-01

Background Although banana (Musa sp.) is an important edible crop, contributing towards poverty alleviation and food security, limited transcriptome datasets are available for use in accelerated molecular-based breeding in this genus. 454 GS-FLX Titanium technology was employed to determine the sequence of gene transcripts in genotypes of Musa acuminata ssp. burmannicoides Calcutta 4 and M. acuminata subgroup Cavendish cv. Grande Naine, contrasting in resistance to the fungal pathogen Mycosphaerella musicola, causal organism of Sigatoka leaf spot disease. To enrich for transcripts under biotic stress responses, full length-enriched cDNA libraries were prepared from whole plant leaf materials, both uninfected and artificially challenged with pathogen conidiospores. Results The study generated 846,762 high quality sequence reads, with an average length of 334 bp and totalling 283 Mbp. De novo assembly generated 36,384 and 35,269 unigene sequences for M. acuminata Calcutta 4 and Cavendish Grande Naine, respectively. A total of 64.4% of the unigenes were annotated through Basic Local Alignment Search Tool (BLAST) similarity analyses against public databases. Assembled sequences were functionally mapped to Gene Ontology (GO) terms, with unigene functions covering a diverse range of molecular functions, biological processes and cellular components. Genes from a number of defense-related pathways were observed in transcripts from each cDNA library. Over 99% of contig unigenes mapped to exon regions in the reference M. acuminata DH Pahang whole genome sequence. A total of 4068 genic-SSR loci were identified in Calcutta 4 and 4095 in Cavendish Grande Naine. A subset of 95 potential defense-related gene-derived simple sequence repeat (SSR) loci were validated for specific amplification and polymorphism across M. acuminata accessions. 
Fourteen loci were polymorphic, with alleles per polymorphic locus ranging from 3 to 8 and polymorphism information content ranging from 0.34 to 0.82. Conclusions A large set of unigenes were characterized in this study for both M. acuminata Calcutta 4 and Cavendish Grande Naine, increasing the number of public domain Musa ESTs. This transcriptome is an invaluable resource for furthering our understanding of biological processes elicited during biotic stresses in Musa. Gene-based markers will facilitate molecular breeding strategies, forming the basis of genetic linkage mapping and analysis of quantitative trait loci. PMID:23379821
Simulation study of entropy production in the one-dimensional Vlasov system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dai, Zongliang, E-mail: liangliang1223@gmail.com; Wang, Shaojie

2016-07-15

The coarse-grain averaged distribution function of the one-dimensional Vlasov system is obtained by numerical simulation. The entropy productions in cases of the random field, the linear Landau damping, and the bump-on-tail instability are computed with the coarse-grain averaged distribution function. The computed entropy production is converged with increasing length of coarse-grain average. When the distribution function differs slightly from a Maxwellian distribution, the converged value agrees with the result computed by using the definition of thermodynamic entropy. The length of the coarse-grain average to compute the coarse-grain averaged distribution function is discussed.
Seasonal differences in the testicular transcriptome profile of free-living European beavers (Castor fiber L.) determined by the RNA-Seq method

PubMed Central

Paukszto, Łukasz; Jastrzębski, Jan P.; Czerwińska, Joanna; Chojnowska, Katarzyna; Kamińska, Barbara; Kurzyńska, Aleksandra; Smolińska, Nina; Giżejewski, Zygmunt; Kamiński, Tadeusz

2017-01-01

The European beaver (Castor fiber L.) is an important free-living rodent that inhabits Eurasian temperate forests. Beavers are often referred to as ecosystem engineers because they create or change existing habitats, enhance biodiversity and prepare the environment for diverse plant and animal species. Beavers are protected in most European Union countries, but their genomic background remains unknown. In this study, gene expression patterns in beaver testes and the variations in genetic expression in breeding and non-breeding seasons were determined by high-throughput transcriptome sequencing. Paired-end sequencing in the Illumina HiSeq 2000 sequencer produced a total of 373.06 million of high-quality reads. De novo assembly of contigs yielded 130,741 unigenes with an average length of 1,369.3 nt, N50 value of 1,734, and average GC content of 46.51%. A comprehensive analysis of the testicular transcriptome revealed more than 26,000 highly expressed unigenes which exhibited the highest homology with Rattus norvegicus and Ictidomys tridecemlineatus genomes. More than 8,000 highly expressed genes were found to be involved in fundamental biological processes, cellular components or molecular pathways. The study also revealed 42 genes whose regulation differed between breeding and non-breeding seasons. During the non-breeding period, the expression of 37 genes was up-regulated, and the expression of 5 genes was down-regulated relative to the breeding season. The identified genes encode molecules which are involved in signaling transduction, DNA repair, stress responses, inflammatory processes, metabolism and steroidogenesis. Our results pave the way for further research into season-dependent variations in beaver testes. PMID:28678806
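The summary statistics reported for the assembly above (average unigene length, N50 and GC content) can be computed from a list of sequences as sketched below. The function name is hypothetical, the input is assumed to be plain nucleotide strings, and N50 is taken in its usual sense: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases.

```python
def assembly_stats(seqs):
    """Return count, mean length, N50 and GC percentage for a list of sequences."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    lengths = sorted((len(s) for s in seqs), reverse=True)
    total = sum(lengths)

    # Walk from the longest sequence down until half of all bases are covered.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    return {
        "count": len(lengths),
        "mean_length": total / len(lengths),
        "n50": n50,
        "gc_percent": 100 * gc / total,
    }


if __name__ == "__main__":
    print(assembly_stats(["GGCCATATAT", "ATATAT", "GCGC"]))
```

Because N50 weights long sequences more heavily than the mean does, it is typically larger than the average length, as in the values reported above (1,734 versus 1,369.3 nt).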
Concentration and diversity of uncultured Legionella spp. in two unchlorinated drinking water supplies with different concentrations of natural organic matter.

PubMed

Wullings, Bart A; Bakker, Geo; van der Kooij, Dick

2011-01-01

Two unchlorinated drinking water supplies were investigated to assess the potential of water treatment and distribution systems to support the growth of Legionella spp. The treatment plant for supply A distributed treated groundwater with a low concentration (<0.5 ppm of C) of natural organic matter (NOM), and the treatment plant for supply B distributed treated groundwater with a high NOM concentration (8 ppm of C). In both supplies, the water temperature ranged from about 10°C after treatment to 18°C during distribution. The concentrations of Legionella spp. in distributed water, analyzed with quantitative PCR (Q-PCR), averaged 2.9 (± 1.9) × 10² cells liter⁻¹ in supply A and 2.5 (± 1.6) × 10³ cells liter⁻¹ in supply B. No Legionella was observed with the culture method. A total of 346 clones (96 operational taxonomical units [OTUs] with ≥97% sequence similarity) were retrieved from water and biofilms of supply A and 251 (43 OTUs) from supply B. The estimation of the average value of total species richness (Chao1) in supply A (153) was clearly higher than that for supply B (58). In each supply, about 77% of the sequences showed <97% similarity to described species. Sequences related to L. pneumophila were only incidentally observed. The Legionella populations of the two supplies are divided into two distinct clusters based on distances in the phylogenetic tree as fractions of the branch length. Thus, a large variety of mostly yet-undescribed Legionella spp. proliferates in unchlorinated water supplies at temperatures below 18°C. The lowest concentration and greatest diversity were observed in the supply with the low NOM concentration.
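The Chao1 richness estimates mentioned above extrapolate from the observed OTUs using the number of OTUs seen exactly once and exactly twice. The sketch below uses the bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)); the abstract does not say which variant the authors applied, so this choice and the function name are assumptions.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU clone counts."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                      # OTUs actually observed
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Hypothetical clone library: six OTUs, three of them seen only once.
    print(chao1([1, 1, 1, 2, 2, 5]))
```

The per-OTU clone counts behind the reported estimates (153 and 58) are not given in the abstract, so those values cannot be reproduced from it; the example input is invented purely to exercise the formula.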
Unprecedented high-resolution view of bacterial operon architecture revealed by RNA sequencing.

PubMed

Conway, Tyrrell; Creecy, James P; Maddox, Scott M; Grissom, Joe E; Conkle, Trevor L; Shadid, Tyler M; Teramoto, Jun; San Miguel, Phillip; Shimada, Tomohiro; Ishihama, Akira; Mori, Hirotada; Wanner, Barry L

2014-07-08

We analyzed the transcriptome of Escherichia coli K-12 by strand-specific RNA sequencing at single-nucleotide resolution during steady-state (logarithmic-phase) growth and upon entry into stationary phase in glucose minimal medium. To generate high-resolution transcriptome maps, we developed an organizational schema which showed that in practice only three features are required to define operon architecture: the promoter, terminator, and deep RNA sequence read coverage. We precisely annotated 2,122 promoters and 1,774 terminators, defining 1,510 operons with an average of 1.98 genes per operon. Our analyses revealed an unprecedented view of E. coli operon architecture. A large proportion (36%) of operons are complex with internal promoters or terminators that generate multiple transcription units. For 43% of operons, we observed differential expression of polycistronic genes, despite being in the same operons, indicating that E. coli operon architecture allows fine-tuning of gene expression. We found that 276 of 370 convergent operons terminate inefficiently, generating complementary 3' transcript ends which overlap on average by 286 nucleotides, and 136 of 388 divergent operons have promoters arranged such that their 5' ends overlap on average by 168 nucleotides. We found 89 antisense transcripts of 397-nucleotide average length, 7 unannotated transcripts within intergenic regions, and 18 sense transcripts that completely overlap operons on the opposite strand. Of 519 overlapping transcripts, 75% correspond to sequences that are highly conserved in E. coli (>50 genomes). Our data extend recent studies showing unexpected transcriptome complexity in several bacteria and suggest that antisense RNA regulation is widespread. Importance: We precisely mapped the 5' and 3' ends of RNA transcripts across the E. coli K-12 genome by using a single-nucleotide analytical approach. Our resulting high-resolution transcriptome maps show that ca. one-third of E. coli operons are complex, with internal promoters and terminators generating multiple transcription units and allowing differential gene expression within these operons. We discovered extensive antisense transcription that results from more than 500 operons, which fully overlap or extensively overlap adjacent divergent or convergent operons. The genomic regions corresponding to these antisense transcripts are highly conserved in E. coli (including Shigella species), although it remains to be proven whether or not they are functional. Our observations of features unearthed by single-nucleotide transcriptome mapping suggest that deeper layers of transcriptional regulation in bacteria are likely to be revealed in the future. Copyright © 2014 Conway et al.
Ovine mitochondrial DNA sequence variation and its association with production and reproduction traits within an Afec-Assaf flock.

PubMed

Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E

2012-07-01

Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full length mitochondria sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondria sequences) revealed sequence variation in 10 out of the 13 protein coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein coding sequences represent non-synonymous mutations. Sequence variation was observed also in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lamb (n = 1286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lambs perinatal survival rate, birth weight, and daily growth rate of lambs up to 150 d that averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lamb born/ewe lambing for the HA, HB, and the HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.
Unit and internal chain profiles of maca amylopectin.

PubMed

Zhang, Ling; Li, Guantian; Yao, Weirong; Zhu, Fan

2018-03-01

Unit chain length distributions of amylopectin and its φ, β-limit dextrins, which reflect amylopectin internal structure from three maca starches, were determined by high-performance anion-exchange chromatography with pulsed amperometric detection after debranching, and the samples were compared with maize starch. The amylopectins exhibited average chain lengths ranging from 16.72 to 17.16, with ranges of total internal chain length, external chain length, and internal chain length of the maca amylopectins at 12.49 to 13.68, 11.24 to 11.89, and 4.27 to 4.48. The average chain length, external chain length, internal chain length, and total internal chain length were comparable in three maca amylopectins. Amylopectins of the three maca genotypes studied here presented no significant differences in their unit chain length profiles, but did show significant differences in their internal chain profiles. Additional genetic variations between different maca genotypes need to be studied to provide unit- and internal chain profiles of maca amylopectin. Copyright © 2017. Published by Elsevier Ltd.
Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics.

PubMed

Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke

2010-03-30

The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom-background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate the progress in tomato genomics, we developed a collection of fully-sequenced 13,227 Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants but rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. 
The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional genomics and molecular breeding. Full-length cDNA sequences and their annotations are provided in the database KaFTom http://www.pgb.kazusa.or.jp/kaftom/ via the website of the National Bioresource Project Tomato http://tomato.nbrp.jp.
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

PubMed Central

Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

2009-01-01

Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to be helpful in order to distinguish between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full-length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). 
This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
A survey and evaluations of histogram-based statistics in alignment-free sequence comparison.

PubMed

Luczak, Brian B; James, Benjamin T; Girgis, Hani Z

2017-12-06

Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover's distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover's distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. The source code of the benchmarking tool is available as Supplementary Materials. © The Author 2017. 
Published by Oxford University Press.
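The idea of comparing two sequences through their k-mer histograms, as surveyed above, can be sketched in a few lines. The Manhattan distance used here is only an illustrative choice and is not claimed to be one of the statistics the authors evaluated; the function names are hypothetical.

```python
from collections import Counter


def kmer_histogram(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def manhattan_distance(h1, h2):
    """Sum of absolute count differences over all words seen in either histogram."""
    return sum(abs(h1[w] - h2[w]) for w in set(h1) | set(h2))


def length_difference(a, b):
    """Absolute difference in sequence length, a cheap companion statistic."""
    return abs(len(a) - len(b))


if __name__ == "__main__":
    x, y = "ACGTACGT", "ACGTTCGT"
    hx, hy = kmer_histogram(x, 3), kmer_histogram(y, 3)
    print(manhattan_distance(hx, hy), length_difference(x, y))
```

Building each histogram takes time linear in the sequence length, which is the source of the speed advantage over quadratic alignment. A smaller k shrinks the histogram and the memory it needs.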
Physical Properties of Umbral Dots Observed in Sunspots: A Hinode Observation

NASA Astrophysics Data System (ADS)

Yadav, Rahul; Mathew, Shibu K.

2018-04-01

Umbral dots (UDs) are small-scale bright features observed in the umbral part of sunspots and pores. It is well established that they are manifestations of magnetoconvection phenomena inside umbrae. We study the physical properties of UDs in different sunspots and their dependence on decay rate and filling factor. We have selected high-resolution, G-band continuum filtergrams of seven sunspots from Hinode to study their physical properties. We have also used Michelson Doppler Imager (MDI) continuum images to estimate the decay rate of selected sunspots. An identification and tracking algorithm was developed to identify the UDs in time sequences. The statistical analysis of UDs exhibits an averaged maximum intensity and effective diameter of 0.26 I_{QS} and 270 km. Furthermore, the lifetime, horizontal speed, trajectory length, and displacement length (birth-death distance) of UDs are 8.19 minutes, 0.5 km s⁻¹, 284 km, and 155 km, respectively. We also find a positive correlation between intensity-diameter, intensity-lifetime, and diameter-lifetime of UDs. However, UD properties do not show any significant relation with the decay rate or filling factor.
Dynamic probability control limits for risk-adjusted Bernoulli CUSUM charts.

PubMed

Zhang, Xiang; Woodall, William H

2015-11-10

The risk-adjusted Bernoulli cumulative sum (CUSUM) chart developed by Steiner et al. (2000) is an increasingly popular tool for monitoring clinical and surgical performance. In practice, however, the use of a fixed control limit for the chart leads to a quite variable in-control average run length performance for patient populations with different risk score distributions. To overcome this problem, we determine simulation-based dynamic probability control limits (DPCLs) patient-by-patient for the risk-adjusted Bernoulli CUSUM charts. By maintaining the probability of a false alarm at a constant level conditional on no false alarm for previous observations, our risk-adjusted CUSUM charts with DPCLs have consistent in-control performance at the desired level with approximately geometrically distributed run lengths. Our simulation results demonstrate that our method does not rely on any information or assumptions about the patients' risk distributions. The use of DPCLs for risk-adjusted Bernoulli CUSUM charts allows each chart to be designed for the corresponding particular sequence of patients for a surgeon or hospital. Copyright © 2015 John Wiley & Sons, Ltd.
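The chart statistic underlying the abstract above can be sketched as follows. Each patient contributes a log-likelihood-ratio score that depends on the predicted risk p and on the odds ratio under the alternative hypothesis, with the null odds ratio taken as 1; the statistic is the running sum of these scores, reset at zero. This is a hedged sketch of the score as it is commonly written for the Steiner et al. chart. The function names and the default odds ratio of 2 are assumptions, and the simulation that produces the dynamic limits is not shown.

```python
from math import log


def cusum_weight(outcome, risk, odds_ratio_alt=2.0):
    """Log-likelihood-ratio score for one patient (null odds ratio = 1)."""
    denom = 1 - risk + odds_ratio_alt * risk
    if outcome == 1:                      # adverse event observed
        return log(odds_ratio_alt / denom)
    return log(1 / denom)                 # no adverse event


def risk_adjusted_cusum(outcomes, risks, odds_ratio_alt=2.0):
    """Return the CUSUM path S_t = max(0, S_{t-1} + W_t)."""
    s = 0.0
    path = []
    for outcome, risk in zip(outcomes, risks):
        s = max(0.0, s + cusum_weight(outcome, risk, odds_ratio_alt))
        path.append(s)
    return path


if __name__ == "__main__":
    # Hypothetical patient sequence: outcomes (1 = event) and predicted risks.
    print(risk_adjusted_cusum([0, 1, 0, 1, 1], [0.05, 0.30, 0.10, 0.20, 0.15]))
```

With a fixed control limit, an alarm would be raised the first time the path exceeds a constant h. Under the approach described above, that constant would instead be replaced by a limit computed separately for each patient in the sequence.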
Analysis of the genome of fish lymphocystis disease virus isolated directly from epidermal tumours of pleuronectes.

PubMed

Darai, G; Anders, K; Koch, H G; Delius, H; Gelderblom, H; Samalecos, C; Flügel, R M

1983-04-30

Virions of fish lymphocystis disease virus (FLDV), a member of the iridovirus family, were isolated directly from lymphocystis disease lesions of individual flatfishes and purified by sucrose and subsequent cesium chloride gradient centrifugation to homogeneity as judged by electron microscopy. The isolated FLDV DNAs appear to be heterogeneous in size. Contour length measurements of 43 DNA molecules gave an average length of 49 ± 23 microns, corresponding to 93 ± 44 × 10^6 D. Molecular weight estimations of FLDV DNA by restriction enzyme analysis resulted in only 64.8 × 10^6 D, indicating an excess length of the DNA of about 50%. FLDV DNA was sensitive to lambda 5'-exonuclease and to E. coli 3'-exonuclease III without preference of any one terminal DNA restriction fragment. Denaturation and reannealing experiments of FLDV DNA resulted in the formation of circular DNA molecules of 34.25 microns contour length (= 65.22 × 10^6 D). This result suggests that FLDV DNA contains directly repeated sequences at both ends and that it is terminally redundant. FLDV DNA is methylated in cytosine. FLDV DNA did not hybridize with frog virus DNA, indicating that the two iridoviruses are not closely related to each other. Restriction enzyme analysis and Southern blot hybridizations revealed that FLDV isolates can be classified into two different strains: FLDV strain 1 occurs in flounders and plaice, whereas strain 2 is usually found in lesions of dabs.
The complete chloroplast genome sequence of Hibiscus syriacus.

PubMed

Kwon, Hae-Yun; Kim, Joon-Hyeok; Kim, Sea-Hyun; Park, Ji-Min; Lee, Hyoshin

2016-09-01

The complete chloroplast genome sequence of Hibiscus syriacus L. is presented in this study. The genome is 161,019 bp in length and has a typical circular structure containing a pair of inverted repeats of 25,745 bp each, separated by a large single-copy region of 89,698 bp and a small single-copy region of 19,831 bp. The overall GC content is 36.8%. One hundred and fourteen genes were annotated, including 81 protein-coding genes, 4 ribosomal RNA genes and 29 transfer RNA genes.
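As a quick consistency check on the figures in this abstract, the four regions of the circular genome (two single-copy regions plus two copies of the inverted repeat) should add up to the reported total. The variable names below are illustrative; the numbers are taken from the abstract.

```python
# Region lengths in bp, as reported in the abstract.
LSC = 89_698   # large single-copy region
SSC = 19_831   # small single-copy region
IR = 25_745    # one inverted repeat; the genome carries two copies

total = LSC + SSC + 2 * IR
print(total)  # 161019, matching the reported genome length

# Share of the genome occupied by the two inverted repeats.
ir_fraction = 2 * IR / total
print(f"{ir_fraction:.1%}")  # 32.0%
```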
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

PubMed Central

Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

2001-01-01

Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that corresponds to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022
Score distributions of gapped multiple sequence alignments down to the low-probability tail

NASA Astrophysics Data System (ADS)

Fieth, Pascal; Hartmann, Alexander K.

2016-08-01

Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in the case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.

Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

NASA Astrophysics Data System (ADS)

Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

2016-11-01

In this study, bioinformatic analysis of the genome sequence of E. aerogenes was performed to identify genes encoding gelatinase. Enterobacter aerogenes was isolated from hot spring water and is a gelatinase-producing bacterium with species specificity for porcine and fish gelatin. This bacterium therefore offers the possibility of producing enzymes that are specific to the gelatin of each species. The Enterobacter aerogenes genome was partially sequenced, giving a total sequence size of 5.0 megabase pairs (Mbp). The pre-processing pipeline yielded 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58%. Genome assembly produced 120 contigs, with 67.5% of contigs over 1 kilobase pair (kbp), an N50 contig length of 124,856 bp and a GC content of 55.17%. About 4,705 protein-coding genes were identified by protein prediction analysis. Two candidate genes were selected as having the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases: NODE_9_length_26866_cov_148.013245_12, containing a 1,029 base pair (bp) sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence encoding 238 amino acids. Two pairs of primers (forward and reverse) were then designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
Design and implementation of low complexity wake-up receiver for underwater acoustic sensor networks

NASA Astrophysics Data System (ADS)

Yue, Ming

This thesis designs a low-complexity dual Pseudorandom Noise (PN) scheme for identity (ID) detection and coarse frame synchronization. The two PN sequences for a node are identical and are separated by a specified length of gap which serves as the ID of different sensor nodes. The dual PN sequences are short in length but are capable of combating severe underwater acoustic (UWA) multipath fading channels that exhibit time varying impulse responses up to 100 taps. The receiver ID detection is implemented on a microcontroller MSP430F5529 by calculating the correlation between the two segments of the PN sequence with the specified separation gap. When the gap length is matched, the correlator outputs a peak which triggers the wake-up enable. The time index of the correlator peak is used as the coarse synchronization of the data frame. The correlator is implemented by an iterative algorithm that uses only one multiplication and two additions for each sample input regardless of the length of the PN sequence, thus achieving low computational complexity. The real-time processing requirement is also met via direct memory access (DMA) and two circular buffers to accelerate data transfer between the peripherals and the memory. The proposed dual PN detection scheme has been successfully tested by simulated fading channels and real-world measured channels. The results show that, in long multipath channels with more than 60 taps, the proposed scheme achieves high detection rate and low false alarm rate using maximal-length sequences as short as 31 bits to 127 bits, therefore it is suitable as a low-power wake-up receiver. The future research will integrate the wake-up receiver with Digital Signal Processors (DSP) for payload detection.
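The correlator described above can be illustrated with a short sketch. It assumes the statistic is the sliding sum of products between each sample and the sample `lag` positions earlier, where `lag` is the PN length plus the ID gap; under that assumption a running sum needs one multiplication and two additions per input sample, whatever the PN length. The function name, the buffer layout, and the toy frame are illustrative and are not taken from the thesis or its MSP430 implementation.

```python
from collections import deque

def gap_correlator(samples, pn_len, lag):
    """Yield C[n] = sum over k < pn_len of x[n-k] * x[n-k-lag], one value per sample."""
    history = deque([0.0] * lag, maxlen=lag)          # the last `lag` input samples
    products = deque([0.0] * pn_len, maxlen=pn_len)   # the last `pn_len` products
    acc = 0.0
    for x in samples:
        prod = x * history[0]    # one multiplication: x[n] * x[n - lag]
        acc += prod              # first addition: newest product enters the window
        acc -= products[0]       # second addition: oldest product leaves the window
        products.append(prod)    # both circular buffers drop their oldest entry
        history.append(x)
        yield acc

# Toy noise-free frame: a 7-chip +/-1 pattern, a 3-sample gap, then the same pattern.
pn = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
frame = pn + [0.0] * 3 + pn
out = list(gap_correlator(frame, pn_len=len(pn), lag=len(pn) + 3))
peak_index = max(range(len(out)), key=out.__getitem__)
print(peak_index, out[peak_index])
```

A receiver configured with a different `lag` would not see the full-height peak for this frame, which is how the gap length can act as a node ID; the index of the peak gives the coarse frame timing.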
Systematic Evaluation of the Dependence of Deoxyribozyme Catalysis on Random Region Length

PubMed Central

Velez, Tania E.; Singh, Jaydeep; Xiao, Ying; Allen, Emily C.; Wong, On Yi; Chandra, Madhavaiah; Kwon, Sarah C.; Silverman, Scott K.

2012-01-01

Functional nucleic acids are DNA and RNA aptamers that bind targets, or they are deoxyribozymes and ribozymes that have catalytic activity. These functional DNA and RNA sequences can be identified from random-sequence pools by in vitro selection, which requires choosing the length of the random region. Shorter random regions allow more complete coverage of sequence space but may not permit the structural complexity necessary for binding or catalysis. In contrast, longer random regions are sampled incompletely but may allow adoption of more complicated structures that enable function. In this study, we systematically examined random region length (N20 through N60) for two particular deoxyribozyme catalytic activities, DNA cleavage and tyrosine-RNA nucleopeptide linkage formation. For both activities, we previously identified deoxyribozymes using only N40 regions. In the case of DNA cleavage, here we found that shorter N20 and N30 regions allowed robust catalytic function, either by DNA hydrolysis or by DNA deglycosylation and strand scission via β-elimination, whereas longer N50 and N60 regions did not lead to catalytically active DNA sequences. Follow-up selections with N20, N30, and N40 regions revealed an interesting interplay of metal ion cofactors and random region length. Separately, for Tyr-RNA linkage formation, N30 and N60 regions provided catalytically active sequences, whereas N20 was unsuccessful, and the N40 deoxyribozymes were functionally superior (in terms of rate and yield) to N30 and N60. Collectively, the results indicate that with future in vitro selection experiments for DNA and RNA catalysts, and by extension for aptamers, random region length should be an important experimental variable. PMID:23088677
Novel full-length major histocompatibility complex class I allele discovery and haplotype definition in pig-tailed macaques.

PubMed

Semler, Matthew R; Wiseman, Roger W; Karl, Julie A; Graham, Michael E; Gieger, Samantha M; O'Connor, David H

2018-06-01

Pig-tailed macaques (Macaca nemestrina, Mane) are important models for human immunodeficiency virus (HIV) studies. Their infectability with minimally modified HIV makes them a uniquely valuable animal model to mimic human infection with HIV and progression to acquired immunodeficiency syndrome (AIDS). However, variation in the pig-tailed macaque major histocompatibility complex (MHC) and the impact of individual transcripts on the pathogenesis of HIV and other infectious diseases is understudied compared to that of rhesus and cynomolgus macaques. In this study, we used Pacific Biosciences single-molecule real-time circular consensus sequencing to describe full-length MHC class I (MHC-I) transcripts for 194 pig-tailed macaques from three breeding centers. We then used the full-length sequences to infer Mane-A and Mane-B haplotypes containing groups of MHC-I transcripts that co-segregate due to physical linkage. In total, we characterized full-length open reading frames (ORFs) for 313 Mane-A, Mane-B, and Mane-I sequences that defined 86 Mane-A and 106 Mane-B MHC-I haplotypes. Pacific Biosciences technology allows us to resolve these Mane-A and Mane-B haplotypes to the level of synonymous allelic variants. The newly defined haplotypes and transcript sequences containing full-length ORFs provide an important resource for infectious disease researchers as certain MHC haplotypes have been shown to provide exceptional control of simian immunodeficiency virus (SIV) replication and prevention of AIDS-like disease in nonhuman primates. The increased allelic resolution provided by Pacific Biosciences sequencing also benefits transplant research by allowing researchers to more specifically match haplotypes between donors and recipients to the level of nonsynonymous allelic variation, thus reducing the risk of graft-versus-host disease.
Designing robust watermark barcodes for multiplex long-read sequencing.

PubMed

Ezpeleta, Joaquín; Krsticevic, Flavia J; Bulacio, Pilar; Tapia, Elizabeth

2017-03-15

To attain acceptable sample misassignment rates, current approaches to multiplex single-molecule real-time sequencing require upstream quality improvement, which is obtained from multiple passes over the sequenced insert and significantly reduces the effective read length. In order to fully exploit the raw read length on multiplex applications, robust barcodes capable of dealing with the full single-pass error rates are needed. We present a method for designing sequencing barcodes that can withstand a large number of insertion, deletion and substitution errors and are suitable for use in multiplex single-molecule real-time sequencing. The manuscript focuses on the design of barcodes for full-length single-pass reads, impaired by challenging error rates in the order of 11%. The proposed barcodes can multiplex hundreds or thousands of samples while achieving sample misassignment probabilities as low as 10^-7 under the above conditions, and are designed to be compatible with chemical constraints imposed by the sequencing process. Software tools for constructing watermark barcode sets and demultiplexing barcoded reads, together with example sets of barcodes and synthetic barcoded reads, are freely available at www.cifasis-conicet.gov.ar/ezpeleta/NS-watermark.
Prediction of enhancer-promoter interactions via natural language processing.

PubMed

Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui

2018-05-09

Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. We propose a novel computational framework EP2vec to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method in natural language processing. Then, we train a classifier to predict EPIs using the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, which outperforms existing methods. We demonstrate the robustness of sequence embedding features by carrying out sensitivity analysis. In addition, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting an attention mechanism. Last, we show that even better performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features and experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPI identification.
UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences.

PubMed

Du, Pu-Feng; Zhao, Wei; Miao, Yang-Yang; Wei, Le-Yi; Wang, Likun

2017-11-14

With the avalanche of biological sequences in public databases, one of the most challenging problems in computational biology is to predict their biological functions and cellular attributes. Most of the existing prediction algorithms can only handle fixed-length numerical vectors. Therefore, it is important to be able to represent biological sequences with various lengths using fixed-length numerical vectors. Although several algorithms, as well as software implementations, have been developed to address this problem, these existing programs can only provide a fixed number of representation modes. Every time a new sequence representation mode is developed, a new program will be needed. In this paper, we propose the UltraPse as a universal software platform for this problem. The function of the UltraPse is not only to generate various existing sequence representation modes, but also to simplify all future programming works in developing novel representation modes. The extensibility of UltraPse is particularly enhanced. It allows the users to define their own representation mode, their own physicochemical properties, or even their own types of biological sequences. Moreover, UltraPse is also the fastest software of its kind. The source code package, as well as the executables for both Linux and Windows platforms, can be downloaded from the GitHub repository.
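To illustrate what a fixed-length representation of variable-length sequences looks like, the sketch below computes k-mer composition, one of the simplest representation modes. It is a generic illustration and does not use or reproduce the UltraPse interface; the function name and defaults are assumptions.

```python
from itertools import product

def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Map a sequence of any length to a vector of len(alphabet)**k k-mer frequencies."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])   # windows with characters outside the alphabet are skipped
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    # Normalize so that sequences of different lengths are comparable.
    return [c / total for c in counts] if total else [0.0] * len(kmers)

short_vec = kmer_vector("ACGT")
long_vec = kmer_vector("ACGTACGTTTGACCA")
print(len(short_vec), len(long_vec))  # 16 16
```

Both inputs yield a 16-dimensional vector for k = 2, so they can be passed to any predictor that expects fixed-length numerical input. Richer modes add physicochemical properties or positional correlations on top of this kind of composition vector.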
Conventional and cross-correlation brain-stem auditory evoked responses in the white leghorn chick: rate manipulations

NASA Technical Reports Server (NTRS)

Burkard, R.; Jones, S.; Jones, T.

1994-01-01

Rate-dependent changes in the chick brain-stem auditory evoked response (BAER) using conventional averaging and a cross-correlation technique were investigated. Five 15- to 19-day-old white leghorn chicks were anesthetized with Chloropent. In each chick, the left ear was acoustically stimulated. Electrical pulses of 0.1-ms duration were shaped, attenuated, and passed through a current driver to an Etymotic ER-2 which was sealed in the ear canal. Electrical activity from stainless-steel electrodes was amplified, filtered (300-3000 Hz) and digitized at 20 kHz. Click levels included 70 and 90 dB peSPL. In each animal, conventional BAERs were obtained at rates ranging from 5 to 90 Hz. BAERs were also obtained using a cross-correlation technique involving pseudorandom pulse sequences called maximum length sequences (MLSs). The minimum time between pulses, called the minimum pulse interval (MPI), ranged from 0.5 to 6 ms. Two BAERs were obtained for each condition. Dependent variables included the latency and amplitude of the cochlear microphonic (CM), wave 2 and wave 3. BAERs were observed in all chicks, for all level by rate combinations for both conventional and MLS BAERs. There was no effect of click level or rate on the latency of the CM. The latency of waves 2 and 3 increased with decreasing click level and increasing rate. CM amplitude decreased with decreasing click level, but was not influenced by click rate for the 70 dB peSPL condition. For the 90 dB peSPL click, CM amplitude was uninfluenced by click rate for conventional averaging. For MLS BAERs, CM amplitude was similar to conventional averaging for longer MPIs.(ABSTRACT TRUNCATED AT 250 WORDS).
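For readers unfamiliar with maximum length sequences, the sketch below generates one period of a 31-element MLS with a linear recurrence over GF(2). The order and tap choice are illustrative assumptions (they correspond to the primitive polynomial x^5 + x^3 + 1) and are not the sequences used in the study. The final lines show the property that makes cross-correlation recovery possible: the circular autocorrelation of the +/-1 version is sharply peaked.

```python
def mls(order=5, taps=(5, 2)):
    """Return one period (2**order - 1 bits) of a maximum length sequence.

    Each new bit is the XOR of the bits `t` positions back for t in `taps`.
    The default taps give a maximal-length recurrence for order 5; other
    orders need their own primitive tap sets.
    """
    bits = [1] + [0] * (order - 1)        # any non-zero seed works
    while len(bits) < 2 ** order - 1:
        nxt = 0
        for t in taps:
            nxt ^= bits[-t]
        bits.append(nxt)
    return bits

seq = mls()
print(len(seq), sum(seq))  # 31 16

# Circular autocorrelation of the +/-1 sequence: 31 at zero shift, -1 elsewhere.
signed = [1 if b else -1 for b in seq]
n = len(signed)
autocorr = [sum(signed[i] * signed[(i + s) % n] for i in range(n)) for s in range(n)]
print(autocorr[0], set(autocorr[1:]))  # 31 {-1}
```

In a stimulation paradigm, a 1 in the sequence would mark a pulse and the time step between sequence elements would play the role of the minimum pulse interval.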
Three-disk microswimmer in a supported fluid membrane

NASA Astrophysics Data System (ADS)

Ota, Yui; Hosaka, Yuto; Yasuda, Kento; Komura, Shigeyuki

2018-05-01

A model of a three-disk micromachine swimming in a quasi-two-dimensional supported membrane is proposed. We calculate the average swimming velocity as a function of the disk size and the arm length. Due to the presence of the hydrodynamic screening length in the quasi-two-dimensional fluid, the geometric factor appearing in the average velocity exhibits three different asymptotic behaviors depending on the microswimmer size and the hydrodynamic screening length. This is in sharp contrast with a microswimmer in a three-dimensional bulk fluid, which shows only a single scaling behavior. We also find that the maximum velocity is obtained when the disks are equal-sized, whereas it is minimized when the average arm lengths are identical. The intrinsic drag of the disks on the substrate does not alter the scaling behaviors of the geometric factor.
Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

PubMed

Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

2014-02-01

Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To this end, an automatic sequence subgrouping method named 'Subgrouping Automata' (SA) was constructed. First, the tree analysis module analyzes the structure of the tree and calculates all possible subgroups at each node. The sequence similarity analysis module calculates the average sequence similarity for all subgroups at each node. The representative sequence generation module finds a representative sequence for each subgroup using profile analysis and self-scoring. Average sequence similarities are calculated for all nodes, and 'Subgrouping Automata' searches for the node showing the statistically maximum increase in sequence similarity using Student's t-value. The node showing the maximum t-value, which gives the most significant difference in average sequence similarity between two adjacent nodes, is taken as the optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping.
Intelligence and Complexity of the Averaged Evoked Potential: An Attentional Theory.

ERIC Educational Resources Information Center

Bates, Tim; And Others

1995-01-01

A study measuring average evoked potentials in 21 college students finds that intelligence test scores correlate significantly with the difference between string length in attended and nonattended conditions, a finding that suggests that previous inconsistencies in reporting string length-intelligence correlations may have resulted from confound…
Tandem alternative polyadenylation events of genes in non-eosinophilic nasal polyp tissue identified by high-throughput sequencing analysis

PubMed Central

TIAN, PENG; LI, JIE; LIU, XIANG; LI, YUXI; CHEN, MEIHENG; MA, YUN; ZHENG, YI QING; FU, YONGGUI; ZOU, HUA

2014-01-01

Nasal polyps (NP) are highly associated with the disorder of immune cells. Alternative polyadenylation (APA) produces mRNA isoforms with different lengths of 3′-untranslated region (UTR) and regulates gene expression. It has been shown that this APA-mediated regulation of 3′UTR length is an immune-associated phenomenon. The aim of this study was to investigate the genome-wide alternative tandem 3′UTR length switching events in non-eosinophilic nasal polyp tissue. Thirteen patients diagnosed as having non-eosinophilic nasal polyps were included in this study. Nasal polyp tissue and control mucosa were collected during surgery. The 3′ end library of cDNA was constructed. The recovered libraries were sequenced with second-generation sequencing technology, and the sequencing data were analyzed by an in-house bioinformatics pipeline. Tandem 3′UTR length switching between samples was detected by a test of linear trend alternative to independence. We found a significant alteration in the tandem 3′UTR length in 1,920 genes in nasal polyp samples. Functional annotation results showed that several gene ontology (GO) terms were enriched in the list of genes with switched APA sites, including regulation of transcription, macromolecule catabolic localization and mRNA processing. The results suggested that APA-mediated alternative 3′UTR regulation plays an important role in the post-transcriptional regulation of gene expression in non-eosinophilic nasal polyps. PMID:24715051
An improved model for whole genome phylogenetic analysis by Fourier transform.

PubMed

Yin, Changchuan; Yau, Stephen S-T

2015-10-07

DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences undergone rearrangements as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor and the 2D mapping reduces the nucleotide composition bias in distance measure, and thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra are in the same dimensionality of the Fourier frequency space, then the Euclidean distances of full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, can be applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. 
The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
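The pipeline described in this abstract (numerical mapping, DFT, scaling of power spectra to a common length, Euclidean distance) can be illustrated with a minimal sketch. The abstract does not specify the 2D mapping or the even scaling algorithm, so the `MAPPING` table and the linear-interpolation `stretch` function below are hypothetical stand-ins, not the authors' method. The naive DFT is quadratic in sequence length and is only meant for short toy inputs.

```python
import cmath
import math

# Hypothetical 2D mapping (an assumption, not the paper's scheme):
# each nucleotide becomes a point in the plane, stored as a complex number.
MAPPING = {"A": 1 + 0j, "T": -1 + 0j, "C": 0 + 1j, "G": 0 - 1j}


def power_spectrum(seq):
    """Return the DFT power spectrum |X[k]|^2 of the mapped sequence."""
    x = [MAPPING[base] for base in seq.upper()]
    n = len(x)
    spectrum = []
    for k in range(n):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def stretch(spectrum, target_len):
    """Stretch a spectrum to target_len points by linear interpolation.

    A simple stand-in for the even scaling step named in the abstract.
    """
    n = len(spectrum)
    if n == target_len:
        return list(spectrum)
    if n == 1:
        return [spectrum[0]] * target_len
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(spectrum[lo] * (1 - frac) + spectrum[hi] * frac)
    return out


def dft_distance(seq_a, seq_b):
    """Euclidean distance between power spectra scaled to the longer length."""
    spec_a, spec_b = power_spectrum(seq_a), power_spectrum(seq_b)
    target = max(len(spec_a), len(spec_b))
    a, b = stretch(spec_a, target), stretch(spec_b, target)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

A pairwise matrix of `dft_distance` values could then be passed to any hierarchical clustering routine to build a tree, which is the use the abstract describes.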
Miniature Transposable Sequences Are Frequently Mobilized in the Bacterial Plant Pathogen Pseudomonas syringae pv. phaseolicola

PubMed Central

Bardaji, Leire; Añorga, Maite; Jackson, Robert W.; Martínez-Bilbao, Alejandro; Yanguas-Casás, Natalia; Murillo, Jesús

2011-01-01

Mobile genetic elements are widespread in Pseudomonas syringae, and often associate with virulence genes. Genome reannotation of the model bean pathogen P. syringae pv. phaseolicola 1448A identified seventeen types of insertion sequences and two miniature inverted-repeat transposable elements (MITEs) with a biased distribution, representing 2.8% of the chromosome, 25.8% of the 132-kb virulence plasmid and 2.7% of the 52-kb plasmid. Employing an entrapment vector containing sacB, we estimated that transposition frequency oscillated between 2.6×10^-5 and 1.1×10^-6, depending on the clone, although it was stable for each clone after consecutive transfers in culture media. Transposition frequency was similar for bacteria grown in rich or minimal media, and from cells recovered from compatible and incompatible plant hosts, indicating that growth conditions do not influence transposition in strain 1448A. Most of the entrapped insertions contained a full-length IS801 element, with the remaining insertions corresponding to sequences smaller than any transposable element identified in strain 1448A, and collectively identified as miniature sequences. From these, fragments of 229, 360 and 679-nt of the right end of IS801 ended in a consensus tetranucleotide and likely resulted from one-ended transposition of IS801. An average 0.7% of the insertions analyzed consisted of IS801 carrying a fragment of variable size from gene PSPPH_0008/PSPPH_0017, showing that IS801 can mobilize DNA in vivo. Retrospective analysis of complete plasmids and genomes of P. syringae suggests, however, that most fragments of IS801 are likely the result of reorganizations rather than one-ended transpositions, and that this element might preferentially contribute to genome flexibility by generating homologous regions of recombination. 
A further miniature sequence previously found to affect host range specificity and virulence, designated MITEPsy1 (100-nt), represented an average 2.4% of the total number of insertions entrapped in sacB, demonstrating for the first time the mobilization of a MITE in bacteria. PMID:22016774
Miniature transposable sequences are frequently mobilized in the bacterial plant pathogen Pseudomonas syringae pv. phaseolicola.

PubMed

Bardaji, Leire; Añorga, Maite; Jackson, Robert W; Martínez-Bilbao, Alejandro; Yanguas-Casás, Natalia; Murillo, Jesús

2011-01-01

Mobile genetic elements are widespread in Pseudomonas syringae, and often associate with virulence genes. Genome reannotation of the model bean pathogen P. syringae pv. phaseolicola 1448A identified seventeen types of insertion sequences and two miniature inverted-repeat transposable elements (MITEs) with a biased distribution, representing 2.8% of the chromosome, 25.8% of the 132-kb virulence plasmid and 2.7% of the 52-kb plasmid. Employing an entrapment vector containing sacB, we estimated that transposition frequency oscillated between 2.6×10^-5 and 1.1×10^-6, depending on the clone, although it was stable for each clone after consecutive transfers in culture media. Transposition frequency was similar for bacteria grown in rich or minimal media, and from cells recovered from compatible and incompatible plant hosts, indicating that growth conditions do not influence transposition in strain 1448A. Most of the entrapped insertions contained a full-length IS801 element, with the remaining insertions corresponding to sequences smaller than any transposable element identified in strain 1448A, and collectively identified as miniature sequences. From these, fragments of 229, 360 and 679-nt of the right end of IS801 ended in a consensus tetranucleotide and likely resulted from one-ended transposition of IS801. An average 0.7% of the insertions analyzed consisted of IS801 carrying a fragment of variable size from gene PSPPH_0008/PSPPH_0017, showing that IS801 can mobilize DNA in vivo. Retrospective analysis of complete plasmids and genomes of P. syringae suggests, however, that most fragments of IS801 are likely the result of reorganizations rather than one-ended transpositions, and that this element might preferentially contribute to genome flexibility by generating homologous regions of recombination. 
A further miniature sequence previously found to affect host range specificity and virulence, designated MITEPsy1 (100-nt), represented an average 2.4% of the total number of insertions entrapped in sacB, demonstrating for the first time the mobilization of a MITE in bacteria.
Draft Genome Sequence of Pseudomonas sp. Strain LFM046, a Producer of Medium-Chain-Length Polyhydroxyalkanoate.

PubMed

Cardinali-Rezende, Juliana; Alexandrino, Paulo Moises Raduan; Nahat, Rafael Augusto Theodoro Pereira de Souza; Sant'Ana, Débora Parrine Vieira; Silva, Luiziana Ferreira; Gomez, José Gregório Cabrera; Taciro, Marilda Keico

2015-08-20

Pseudomonas sp. LFM046 is a medium-chain-length polyhydroxyalkanoate (PHAMCL) producer capable of using various carbon sources (carbohydrates, organic acids, and vegetable oils) and was first isolated from sugarcane cultivation soil in Brazil. The genome sequence was found to be 5.97 Mb long with a G+C content of 66%. Copyright © 2015 Cardinali-Rezende et al.
Identification of gyrB and rpoB gene mutations and differentially expressed proteins between a novobiocin-resistant Aeromonas hydrophila catfish vaccine strain and its virulent parent strain

USDA-ARS's Scientific Manuscript database

Sequence comparison between the full-length 2412 bp DNA gyrase subunit B (gyrB) gene of a novobiocin resistant Aeromonas hydrophila AH11NOVO vaccine strain and that of its virulent parent strain AH11P revealed 10 missense mutations. Similarly, sequence comparison between the full-length 4092 bp RNA ...
The Repeat Sequences and Elevated Substitution Rates of the Chloroplast accD Gene in Cupressophytes

PubMed Central

Li, Jia; Su, Yingjuan; Wang, Ting

2018-01-01

The plastid accD gene encodes a subunit of the acetyl-CoA carboxylase (ACCase) enzyme. The accD gene is thought to have expanded in length in Cryptomeria japonica, Taiwania cryptomerioides, Cephalotaxus, Taxus chinensis, and Podocarpus lambertii, mainly because of tandemly repeated sequences. However, it is still unknown whether the accD gene length in other cupressophytes has expanded. Here, in order to investigate how widespread this phenomenon is, 18 cupressophyte accD sequences and their surrounding regions were sequenced and analyzed. Together with 39 sequences from GenBank, our taxon sampling covered all the extant gymnosperm orders. The repetitive elements and substitution rates of accD among 57 gymnosperm species were analyzed. The results show that: (1) the reading frame length of the accD gene in the 18 cupressophyte species has also expanded; (2) many repetitive elements were identified in the accD gene of cupressophyte lineages; (3) the synonymous and non-synonymous substitution rates of accD were accelerated in cupressophytes; and (4) accD was located at rearrangement endpoints. These results suggest that repetitive elements may mediate chloroplast genome rearrangement and accelerate substitution rates. PMID:29731764
Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

PubMed Central

Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi

2006-01-01

We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants. PMID:16914452
Analysis of Differentially Expressed Genes Associated with Coronatine-Induced Laticifer Differentiation in the Rubber Tree by Subtractive Hybridization Suppression

PubMed Central

Zhang, Shi-Xin; Wu, Shao-Hua; Chen, Yue-Yi; Tian, Wei-Min

2015-01-01

The secondary laticifer in the secondary phloem is differentiated from the vascular cambia of the rubber tree (Hevea brasiliensis Muell. Arg.). The number of secondary laticifers is closely related to the rubber yield potential of Hevea. Pharmacological data show that jasmonic acid and its precursor linolenic acid are effective in inducing secondary laticifer differentiation in epicormic shoots of the rubber tree. In the present study, an experimental system of coronatine-induced laticifer differentiation was developed to perform SSH identification of genes with differential expression. A total of 528 positive clones were obtained by blue-white screening, of which 248 clones came from the forward SSH library while 280 clones came from the reverse SSH library. Approximately 215 of the 248 clones and 171 of the 280 clones contained cDNA inserts by colony PCR screening. A total of 286 of the 386 ESTs were detected to be differentially expressed by reverse northern blot and sequenced. Approximately 147 unigenes with an average length of 497 bp from the forward and 109 unigenes with an average length of 514 bp from the reverse SSH libraries were assembled and annotated. The unigenes were associated with the stress/defense response, plant hormone signal transduction and structure development. It is suggested that Ca2+ signal transduction and redox seem to be involved in differentiation, while PGA and EIF are associated with the division of cambium initials for COR-induced secondary laticifer differentiation in the rubber tree. PMID:26147807

Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.

PubMed

Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D

2015-05-01

Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.
Rumen Bacterial Diversity of 80 to 110-Day-Old Goats Using 16S rRNA Sequencing

PubMed Central

Han, Xufeng; Yang, Yuxin; Yan, Hailong; Wang, Xiaolong; Qu, Lei; Chen, Yulin

2015-01-01

The ability of rumen microorganisms to use fibrous plant matter plays an important role in ruminant animals; however, little information about rumen colonization by microbial populations after weaning has been reported. In this study, high-throughput sequencing was used to investigate the establishment of this microbial population in 80 to 110-day-old goats. Illumina sequencing of goat rumen samples yielded 101,356,610 nucleotides that were assembled into 256,868 reads with an average read length of 394 nucleotides. Taxonomic analysis of metagenomic reads indicated that the predominant phyla were distinct at different growth stages. The phyla Firmicutes and Synergistetes were predominant in samples taken from 80 to 100-day-old goats, but Bacteroidetes and Firmicutes became the most abundant phyla in samples from 110-day-old animals. There was a remarkable variation in the microbial populations with age; Firmicutes and Synergistetes decreased after weaning, but Bacteroidetes and Proteobacteria increased from 80 to 110 day of age. These findings suggested that colonization of the rumen by microorganisms is related to their function in the rumen digestive system. These results give a better understanding of the role of rumen microbes and the establishment of the microbial population, which help to maintain the host’s health and improve animal performance. PMID:25700157
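The average read length quoted in this abstract can be checked directly from the two totals it reports. The short sketch below does that arithmetic; the function name `mean_read_length` is only illustrative.

```python
def mean_read_length(total_nucleotides, n_reads):
    """Mean read length as total bases divided by number of reads."""
    return total_nucleotides / n_reads


# Totals as quoted in the abstract.
mean_len = mean_read_length(101_356_610, 256_868)
print(f"{mean_len:.1f}")  # prints 394.6
```

The quotient is about 394.6 nucleotides, which matches the reported 394 if the figure was truncated rather than rounded.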
A comprehensive characterization of simple sequence repeats in pepper genomes provides valuable resources for marker development in Capsicum.

PubMed

Cheng, Jiaowen; Zhao, Zicheng; Li, Bo; Qin, Cheng; Wu, Zhiming; Trejo-Saavedra, Diana L; Luo, Xirong; Cui, Junjie; Rivera-Bustamante, Rafael F; Li, Shuaicheng; Hu, Kailin

2016-01-07

The sequences of the full set of pepper genomes including nuclear, mitochondrial and chloroplast are now available for use. However, the overall distribution of simple sequence repeats (SSRs) in these genomes and their practical implications for molecular marker development in Capsicum have not yet been described. Here, an average of 868,047.50, 45.50 and 30.00 SSR loci were identified in the nuclear, mitochondrial and chloroplast genomes of pepper, respectively. Subsequently, systematic comparisons of various species, genome types, motif lengths, repeat numbers and classified types were executed and discussed. In addition, a local database composed of 113,500 in silico unique SSR primer pairs was built using a homemade bioinformatics workflow. As a pilot study, 65 polymorphic markers were validated among a wide collection of 21 Capsicum genotypes with allele number and polymorphic information content value per marker ranging from 2 to 6 and 0.05 to 0.64, respectively. Finally, a comparison of the clustering results with those of a previous study indicated the usability of the newly developed SSR markers. In summary, this first report on the comprehensive characterization of SSR motifs in pepper genomes and the very large set of SSR primer pairs will benefit various genetic studies in Capsicum.
A comprehensive characterization of simple sequence repeats in pepper genomes provides valuable resources for marker development in Capsicum

PubMed Central

Cheng, Jiaowen; Zhao, Zicheng; Li, Bo; Qin, Cheng; Wu, Zhiming; Trejo-Saavedra, Diana L.; Luo, Xirong; Cui, Junjie; Rivera-Bustamante, Rafael F.; Li, Shuaicheng; Hu, Kailin

2016-01-01

The sequences of the full set of pepper genomes including nuclear, mitochondrial and chloroplast are now available for use. However, the overall distribution of simple sequence repeats (SSRs) in these genomes and their practical implications for molecular marker development in Capsicum have not yet been described. Here, an average of 868,047.50, 45.50 and 30.00 SSR loci were identified in the nuclear, mitochondrial and chloroplast genomes of pepper, respectively. Subsequently, systematic comparisons of various species, genome types, motif lengths, repeat numbers and classified types were executed and discussed. In addition, a local database composed of 113,500 in silico unique SSR primer pairs was built using a homemade bioinformatics workflow. As a pilot study, 65 polymorphic markers were validated among a wide collection of 21 Capsicum genotypes with allele number and polymorphic information content value per marker ranging from 2 to 6 and 0.05 to 0.64, respectively. Finally, a comparison of the clustering results with those of a previous study indicated the usability of the newly developed SSR markers. In summary, this first report on the comprehensive characterization of SSR motifs in pepper genomes and the very large set of SSR primer pairs will benefit various genetic studies in Capsicum. PMID:26739748
A systematic comparison of error correction enzymes by next-generation sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lubock, Nathan B.; Zhang, Di; Sidore, Angus M.

Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error correction methods are cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment. Here, we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in model gene assemblies and systematically compared six different error correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best for increasing the number of perfect assemblies (up to 25.2-fold). We are able to quantify differential specificities such as ErrASE preferentially corrects C/G transversions whereas T7 Endonuclease I preferentially corrects A/T transversions. More generally, this experimental and computational pipeline is a fast, scalable and extensible way to analyze errors in gene assemblies, to profile error correction methods, and to benchmark DNA synthesis methods.
Comprehensive Transcriptome Study to Develop Molecular Resources of the Copepod Calanus sinicus for Their Potential Ecological Applications

PubMed Central

Yang, Qing; Sun, Fanyue; Yang, Zhi; Li, Hongjun

2014-01-01

Calanus sinicus Brodsky (Copepoda, Crustacea) is a dominant zooplanktonic species widely distributed in the margin seas of the Northwest Pacific Ocean. In this study, we utilized an RNA-Seq-based approach to develop molecular resources for C. sinicus. Adult samples were sequenced using the Illumina HiSeq 2000 platform. The sequencing data generated 69,751 contigs from 58.9 million filtered reads. The assembled contigs had an average length of 928.8 bp. Gene annotation allowed the identification of 43,417 unigene hits against the NCBI database. Gene ontology (GO) and KEGG pathway mapping analysis revealed various functional genes related to diverse biological functions and processes. Transcripts potentially involved in stress response and lipid metabolism were identified among these genes. Furthermore, 4,871 microsatellites and 110,137 single nucleotide polymorphisms (SNPs) were identified in the C. sinicus transcriptome sequences. SNP validation by the melting temperature (Tm)-shift method suggested that 16 primer pairs amplified target products and showed biallelic polymorphism among 30 individuals. The present work demonstrates the power of Illumina-based RNA-Seq for the rapid development of molecular resources in nonmodel species. The validated SNP set from our study is currently being utilized in an ongoing ecological analysis to support a future study of C. sinicus population genetics. PMID:24982883
Identification, characterization and expression analysis of lineage-specific genes within sweet orange (Citrus sinensis).

PubMed

Xu, Yuantao; Wu, Guizhi; Hao, Baohai; Chen, Lingling; Deng, Xiuxin; Xu, Qiang

2015-11-23

With the availability of a rapidly increasing number of genome and transcriptome sequences, lineage-specific genes (LSGs) can be identified and characterized. Like other conserved functional genes, LSGs play important roles in biological evolution and functions. Two sets of citrus LSGs, 296 citrus-specific genes (CSGs) and 1039 orphan genes specific to sweet orange, were identified by comparative analysis between the sweet orange genome sequences and 41 genomes and 273 transcriptomes. With the two sets of genes, gene structure and gene expression pattern were investigated. On average, both the CSGs and orphan genes have fewer exons, shorter gene length and higher GC content when compared with evolutionarily conserved genes (ECs). Expression profiling indicated that most of the LSGs were expressed in various tissues of sweet orange and some of them exhibited distinct temporal and spatial expression patterns. In particular, the orphan genes were preferentially expressed in callus, which is an important pluripotent tissue of citrus. In addition, some of the CSGs and orphan genes were expressed in response to abiotic stress, indicating their potential functions during interaction with the environment. This study identified and characterized two sets of LSGs in citrus, dissected their sequence features and expression patterns, and provided valuable clues for future functional analysis of the LSGs in sweet orange.
A systematic comparison of error correction enzymes by next-generation sequencing

DOE PAGES

Lubock, Nathan B.; Zhang, Di; Sidore, Angus M.; ...

2017-08-01

Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error correction methods are cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment. Here, we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in model gene assemblies and systematically compared six different error correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best for increasing the number of perfect assemblies (up to 25.2-fold). We are able to quantify differential specificities such as ErrASE preferentially corrects C/G transversions whereas T7 Endonuclease I preferentially corrects A/T transversions. More generally, this experimental and computational pipeline is a fast, scalable and extensible way to analyze errors in gene assemblies, to profile error correction methods, and to benchmark DNA synthesis methods.
Long-read sequencing improves assembly of Trichinella genomes 10-fold, revealing substantial synteny between lineages diverged over 7 million years.

PubMed

Thompson, Peter C; Zarlenga, Dante S; Liu, Ming-Yuan; Rosenthal, Benjamin M

2017-09-01

Genome assemblies can form the basis of comparative analyses fostering insight into the evolutionary genetics of a parasite's pathogenicity, host-pathogen interactions, environmental constraints and invasion biology; however, the length and complexity of many parasite genomes has hampered the development of well-resolved assemblies. In order to improve Trichinella genome assemblies, the genome of the sylvatic encapsulated species Trichinella murrelli was sequenced using third-generation, long-read technology and, using syntenic comparisons, scaffolded to a reference genome assembly of Trichinella spiralis, markedly improving both. A high-quality draft assembly for T. murrelli was achieved that totalled 63·2 Mbp, half of which was condensed into 26 contigs each longer than 571 000 bp. When compared with previous assemblies for parasites in the genus, ours required 10-fold fewer contigs, which were five times longer, on average. Better assembly across repetitive regions also enabled resolution of 8 Mbp of previously indeterminate sequence. Furthermore, syntenic comparisons identified widespread scaffold misassemblies in the T. spiralis reference genome. The two new assemblies, organized for the first time into three chromosomal scaffolds, will be valuable resources for future studies linking phenotypic traits within each species to their underlying genetic bases.
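The statement that half of the assembly sits in 26 contigs each longer than 571 000 bp reads like the standard N50/L50 contiguity summary (L50 = 26, N50 of about 571 kb), although the abstract does not use those terms. A minimal sketch of how such statistics are usually computed follows; the contig lengths in the usage line are toy values, not data from the paper.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half the assembly;
    L50 is the number of contigs in that set.
    """
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("contig_lengths must not be empty")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Toy example: total length 30, so half is 15; the two longest contigs reach 18.
print(n50_l50([10, 8, 6, 4, 2]))  # prints (8, 2)
```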
Campylobacter iguaniorum sp. nov., isolated from reptiles.

PubMed

Gilbert, Maarten J; Kik, Marja; Miller, William G; Duim, Birgitta; Wagenaar, Jaap A

2015-03-01

During sampling of reptiles for members of the class Epsilonproteobacteria, strains representing a member of the genus Campylobacter not belonging to any of the established taxa were isolated from lizards and chelonians. Initial amplified fragment length polymorphism, PCR and 16S rRNA sequence analysis showed that these strains were most closely related to Campylobacter fetus and Campylobacter hyointestinalis. A polyphasic study was undertaken to determine the taxonomic position of five strains. The strains were characterized by 16S rRNA and atpA sequence analysis, matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry and conventional phenotypic testing. Whole-genome sequences were determined for strains 1485E(T) and 2463D, and the average nucleotide and amino acid identities were determined for these strains. The strains formed a robust phylogenetic clade, divergent from all other species of the genus Campylobacter. In contrast to most currently known members of the genus Campylobacter, the strains showed growth at ambient temperatures, which might be an adaptation to their reptilian hosts. The results of this study clearly show that these strains isolated from reptiles represent a novel species within the genus Campylobacter, for which the name Campylobacter iguaniorum sp. nov. is proposed. The type strain is 1485E(T) ( = LMG 28143(T) = CCUG 66346(T)). © 2015 IUMS.
Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

NASA Technical Reports Server (NTRS)

Gatlin, L. L.

1974-01-01

Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
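The composition-based randomness measure mentioned in this abstract can be illustrated with a first-order Shannon redundancy calculation. The sketch below assumes the common definition R = 1 - H / log2(a), with H the entropy of the observed residue frequencies and a = 20 the amino acid alphabet size; the abstract does not give its formulas, so this definition and the function names are assumptions. It covers single-residue frequencies only, not the pair frequencies the abstract also mentions.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """First-order Shannon entropy (bits per residue) of a sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def redundancy(seq, alphabet_size=20):
    """Redundancy R = 1 - H / H_max, with H_max = log2(alphabet_size)."""
    return 1 - shannon_entropy(seq) / math.log2(alphabet_size)
```

A chain that uses all 20 amino acids equally often gives a redundancy near 0, while a homopolymer gives 1. Comparing the value for a real protein with values from shuffled or randomly generated chains of the same length would mirror the Monte Carlo comparison the abstract describes.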
One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly.

PubMed

Koren, Sergey; Phillippy, Adam M

2015-02-01

Like a jigsaw puzzle with large pieces, a genome sequenced with long reads is easier to assemble. However, recent sequencing technologies have favored lowering per-base cost at the expense of read length. This has dramatically reduced sequencing cost, but resulted in fragmented assemblies, which negatively affect downstream analyses and hinder the creation of finished (gapless, high-quality) genomes. In contrast, emerging long-read sequencing technologies can now produce reads tens of kilobases in length, enabling the automated finishing of microbial genomes for under $1000. This promises to improve the quality of reference databases and facilitate new studies of chromosomal structure and variation. We present an overview of these new technologies and the methods used to assemble long reads into complete genomes. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
Quantitative proteomics analysis with iTRAQ in human lenses with nuclear cataracts of different axial lengths.

PubMed

Zhou, Haiyan; Yan, Hong; Yan, Weijia; Wang, Xinchuan; Ma, Yong; Wang, Jianping

2016-01-01

The goal of this study was to identify and quantify the differentially expressed proteins in human nuclear cataract with different axial lengths. Thirty-six samples of human lens nuclei with hardness grade III or IV were obtained during cataract surgery with extracapsular cataract extraction (ECCE). Six healthy transparent human lens nuclei were obtained from fresh healthy cadaver eyes during corneal transplantation surgery. The lens nuclei were divided into seven groups (six lenses in each group) according to the optic axis: Group A (mean axial length 28.7±1.5 mm; average age 59.8±1.9 years), Group B (mean axial length 23.0±0.4 mm; average age 60.3±2.5 years), Group C (mean axial length 19.9±0.5 mm; average age 55.1±2.5 years), Group D (mean axial length 28.7±1.4 mm; average age 58.0±4.0 years), Group E (mean axial length 23.0±0.3 mm; average age 56.9±4.2 years), and Group F (mean axial length 20.7±0.6 mm; average age 57.6±5.3 years). The six healthy transparent human lenses were included in a younger group with standard optic axes, Group G (mean axial length 23.0±0.5 mm; average age 34.7±4.2 years). Water-soluble, water-insoluble, and water-insoluble-urea-soluble protein fractions were extracted from the samples. The three-part protein fractions from the individual lenses were combined to form the total proteins of each sample. The proteomic profiles of each group were analyzed using 8-plex isobaric tagging for relative and absolute protein quantification (iTRAQ) labeling combined with two-dimensional liquid chromatography tandem mass spectrometry (2D-LC-MS/MS). The data were analyzed with ProteinPilot software for peptide matching, protein identification, and quantification. Differentially expressed proteins were validated with western blotting. We employed biological and technical replicates and selected the intersection of the two sets of results, which included 40 proteins. 
From the 40 proteins identified, six were selected as differentially expressed proteins closely related to axial length. The six proteins were gap junction alpha-3 protein, beta-crystallin B2, T-complex protein 1 subunit beta, gamma-enolase, pyruvate kinase isozymes M1/M2, and sorbitol dehydrogenase. Levels of beta-crystallin B2 expression were decreased in nuclear cataracts with longer axial length. The results of the mass spectrometric analysis were consistent with the western blot validation. The discovery of these differentially expressed proteins provides valuable clues for understanding the pathogenesis of axial-related nuclear cataract. The results indicate that beta-crystallin B2 (CRBB2) may be involved in axial-related nuclear cataract pathogenesis. Further studies are needed to investigate the correlation between CRBB2 and axial-related nuclear cataract.
Pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of Plasmodium vivax in human patients.

PubMed

Merino, Emilio F; Fernandez-Becerra, Carmen; Madeira, Alda M B N; Machado, Ariane L; Durham, Alan; Gruber, Arthur; Hall, Neil; del Portillo, Hernando A

2003-07-21

Plasmodium vivax is the most widely distributed human malaria, responsible for 70-80 million clinical cases each year and large socio-economical burdens for countries such as Brazil where it is the most prevalent species. Unfortunately, due to the impossibility of growing this parasite in continuous in vitro culture, research on P. vivax remains largely neglected. A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatically processed to remove contaminants and low quality reads. A total of 806 sequences with an average length of 586 bp met such criteria and their clustering revealed 666 distinct events. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases that included P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value of <10^-30 was used to define a significant database match. ESTs were manually assigned a gene ontology (GO) terminology. A total of 769 ESTs could be assigned a putative identity based upon sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated and a GO terminology was assigned to 164 of them. These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC-content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help in the validation of gene predictions for P. vivax and to create a gene index of this malaria parasite.
Genomes: At the edge of chaos with maximum information capacity

NASA Astrophysics Data System (ADS)

Kong, Sing-Guan; Chen, Hong-Da; Torda, Andrew; Lee, H. C.

2016-12-01

We propose an order index, ϕ, which quantifies the notion of “life at the edge of chaos” when applied to genome sequences. It maps genomes to a number from 0 (random and of infinite length) to 1 (fully ordered) and applies regardless of sequence length and base composition. The 786 complete genomic sequences in GenBank were found to have ϕ values in a very narrow range, 0.037 ± 0.027. We show this implies that genomes are halfway towards being completely random, namely, at the edge of chaos. We argue that this narrow range represents the neighborhood of a fixed-point in the space of sequences, and genomes are driven there by the dynamics of a robust, predominantly neutral evolution process.
Clonal structure in Ichthyobacterium seriolicida, the causative agent of bacterial haemolytic jaundice in yellowtail, Seriola quinqueradiata, inferred from molecular epidemiological analysis.

PubMed

Matsuyama, T; Fukuda, Y; Sakai, T; Tanimoto, N; Nakanishi, M; Nakamura, Y; Takano, T; Nakayasu, C

2017-08-01

Bacterial haemolytic jaundice caused by Ichthyobacterium seriolicida has been responsible for mortality in farmed yellowtail, Seriola quinqueradiata, in western Japan since the 1980s. In this study, polymorphic analysis of I. seriolicida was performed using three molecular methods: amplified fragment length polymorphism (AFLP) analysis, multilocus sequence typing (MLST) and multiple-locus variable-number tandem repeat analysis (MLVA). Twenty-eight isolates were analysed using AFLP, while 31 isolates were examined by MLST and MLVA. No polymorphisms were identified by AFLP analysis using EcoRI and MseI, or by MLST of internal fragments of eight housekeeping genes. However, MLVA revealed variation in repeat numbers of three elements, allowing separation of the isolates into 16 sequence types. The unweighted pair group method using arithmetic averages cluster analysis of the MLVA data identified four major clusters, and all isolates belonged to clonal complexes. It is likely that I. seriolicida populations share a common ancestor, which may be a recently introduced strain. © 2016 John Wiley & Sons Ltd.
De Novo Assembly and Characterization of Fruit Transcriptome in Black Pepper (Piper nigrum)

PubMed Central

Hu, Lisong; Hao, Chaoyun; Fan, Rui; Wu, Baoduo; Tan, Lehe; Wu, Huasong

2015-01-01

Black pepper is one of the most popular and oldest spices in the world and valued for its pungent constituent alkaloids. Piperine is the main bioactive compound in pepper alkaloids, which perform unique physiological functions. However, the mechanisms of piperine synthesis are poorly understood. This study is the first to describe the fruit transcriptome of black pepper by sequencing on the Illumina HiSeq 2000 platform. A total of 56,281,710 raw reads were obtained and assembled. From these raw reads, 44,061 unigenes with an average length of 1,345 nt were generated. During functional annotation, 40,537 unigenes were annotated in Gene Ontology categories, Kyoto Encyclopedia of Genes and Genomes pathways, Swiss-Prot database, and Nucleotide Collection (NR/NT) database. In addition, 8,196 simple sequence repeats (SSRs) were detected. In a detailed analysis of the transcriptome, housekeeping genes for quantitative polymerase chain reaction internal control, polymorphic SSRs, and lysine/ornithine metabolism-related genes were identified. These results validated the availability of our database. Our study could provide useful data for further research on piperine synthesis in black pepper. PMID:26121657
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.

PubMed Central

Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R

1982-01-01

The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. PMID:6296791
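The length figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; it assumes (the abstract does not say so explicitly) that the 1251 bp reading frame includes the stop codon, which is what makes 416 amino acids come out. Variable names are illustrative.

```python
# Figures quoted in the abstract.
ORF_BP = 1251                 # reading frame, assumed here to include the stop codon
UTR5_NT = 36                  # transcription start, upstream of the translational start
UTR3_NT_RANGE = (86, 93)      # transcription stop, downstream of the translational stop

codons = ORF_BP // 3          # 417 codons in total
amino_acids = codons - 1      # the stop codon encodes no residue

# Non-polyadenylated mRNA length = 5' leader + reading frame + 3' trailer.
mrna_min = UTR5_NT + ORF_BP + UTR3_NT_RANGE[0]
mrna_max = UTR5_NT + ORF_BP + UTR3_NT_RANGE[1]

print(amino_acids, mrna_min, mrna_max)  # 416 1373 1380
```

Under these assumptions the gap between the computed 1373 to 1380 nucleotides and the observed 1500 nucleotides would be roughly 120 to 127 nucleotides, which is the kind of difference a poly(A) tail could account for.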
Full-length genome sequences of five hepatitis C virus isolates representing subtypes 3g, 3h, 3i and 3k, and a unique genotype 3 variant.

PubMed

Lu, Ling; Li, Chunhua; Yuan, Jie; Lu, Teng; Okamoto, Hiroaki; Murphy, Donald G

2013-03-01

We characterized the full-length genomes of five distinct hepatitis C virus (HCV)-3 isolates. These represent the first complete genomes for subtypes 3g and 3h, the second such genomes for 3k and 3i, and of one novel variant presently not assigned to a subtype. Each genome was determined from 18-25 overlapping fragments. They had lengths of 9579-9660 nt and each contained a single ORF encoding 3020-3025 aa. They were isolated from five patients residing in Canada; four were of Asian origin and one was of Somali origin. Phylogenetic analysis using 64 partial NS5B sequences differentiated 10 assigned subtypes, 3a-3i and 3k, and two additional lineages within genotype 3. From the data of this study, HCV-3 full-length sequences are now available for six of the assigned subtypes and one unassigned. Our findings should add insights to HCV evolutionary studies and clinical applications.

Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers

USDA-ARS's Scientific Manuscript database

Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.

PubMed

Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao

2018-05-01

STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since it can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance was evaluated from sequencing reads analysis, concordance study and sensitivity testing. High coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies, respectively, at the 34 STRs, indicating that MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
A Comparative Analysis of Selected Mechanical Aspects of the Ice Skating Stride.

ERIC Educational Resources Information Center

Marino, G. Wayne

This study quantitatively analyzes selected aspects of the skating strides of above-average and below-average ability skaters. Subproblems were to determine how stride length and stride rate are affected by changes in skating velocity, to ascertain whether the basic assumption that stride length accurately approximates horizontal movement of the…
Applications of statistical physics and information theory to the analysis of DNA sequences

NASA Astrophysics Data System (ADS)

Grosse, Ivo

2000-10-01

DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
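To make the "mutual information function" concrete, here is a minimal sketch that estimates I(k), the mutual information in bits between bases k positions apart, from pair frequencies. The toy sequence is an assumption made purely for illustration: every third position is biased toward A and the rest are uniform. It is not the pseudo-exon model from the thesis, and the function and variable names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(seq, k):
    """Estimate I(k) in bits between symbols k positions apart in seq."""
    pairs = Counter(zip(seq, seq[k:]))   # counts of (base at i, base at i + k)
    n = sum(pairs.values())
    left, right = Counter(), Counter()   # marginal counts of first and second base
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    # Sum of p_ab * log2(p_ab / (p_a * p_b)), written in terms of raw counts.
    return sum((c / n) * math.log2(c * n / (left[a] * right[b]))
               for (a, b), c in pairs.items())


# Toy sequence with a codon-like, position-dependent composition (assumed values).
rng = random.Random(0)
weights_by_phase = [
    [0.7, 0.1, 0.1, 0.1],      # phase 0: biased toward A
    [0.25, 0.25, 0.25, 0.25],  # phase 1: uniform
    [0.25, 0.25, 0.25, 0.25],  # phase 2: uniform
]
toy = [rng.choices("ACGT", weights=weights_by_phase[i % 3])[0]
       for i in range(30000)]

for k in range(1, 7):
    print(k, round(mutual_information(toy, k), 4))
```

In this toy setting the values at k = 3 and k = 6 should stand out above those at the other distances, which is the period-three pattern the abstract describes for genomic DNA. Real analyses would also need a finite-sample correction, which this sketch omits.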
Transcriptomics of In Vitro Immune-Stimulated Hemocytes from the Manila Clam Ruditapes philippinarum Using High-Throughput Sequencing

PubMed Central

Moreira, Rebeca; Balseiro, Pablo; Planas, Josep V.; Fuste, Berta; Beltran, Sergi; Novoa, Beatriz; Figueras, Antonio

2012-01-01

Background The Manila clam (Ruditapes philippinarum) is a worldwide cultured bivalve species with important commercial value. Diseases affecting this species can result in large economic losses. Because knowledge of the molecular mechanisms of the immune response in bivalves, especially clams, is scarce and fragmentary, we sequenced RNA from immune-stimulated R. philippinarum hemocytes by 454-pyrosequencing to identify genes involved in their immune defense against infectious diseases. Methodology and Principal Findings High-throughput deep sequencing of R. philippinarum using 454 pyrosequencing technology yielded 974,976 high-quality reads with an average read length of 250 bp. The reads were assembled into 51,265 contigs, and 44.7% of the sequences, translated into protein, were annotated successfully. The 35 most frequently found contigs included a large number of immune-related genes, and a more detailed analysis showed the presence of putative members of several immune pathways and processes such as apoptosis, the Toll-like signaling pathway and the complement cascade. We have found sequences from molecules never described in bivalves before, especially in the complement pathway where almost all the components are present. Conclusions This study represents the first transcriptome analysis using 454-pyrosequencing conducted on R. philippinarum focused on its immune system. Our results will provide a rich source of data to discover and identify new genes, which will serve as a basis for microarray construction and the study of gene expression as well as for the identification of genetic markers. The discovery of new immune sequences was very productive and resulted in a large variety of contigs that may play a role in the defense mechanisms of Ruditapes philippinarum. PMID:22536348
Identification and characterization of EBV genomes in spontaneously immortalized human peripheral blood B lymphocytes by NGS technology.

PubMed

Lei, Haiyan; Li, Tianwei; Hung, Guo-Chiuan; Li, Bingjie; Tsai, Shien; Lo, Shyh-Ching

2013-11-19

We conducted genomic sequencing to identify Epstein Barr Virus (EBV) genomes in 2 human peripheral blood B lymphocytes that underwent spontaneous immortalization promoted by mycoplasma infections in culture, using the high-throughput sequencing (HTS) Illumina MiSeq platform. The purpose of this study was to examine if rapid detection and characterization of a viral agent could be effectively achieved by HTS using a platform that has become readily available in general biology laboratories. Raw read sequences, averaging 175 bps in length, were mapped with DNA databases of human, bacteria, fungi and virus genomes using the CLC Genomics Workbench bioinformatics tool. Overall 37,757 out of 49,520,834 total reads in one lymphocyte line (# K4413-Mi) and 28,178 out of 45,335,960 reads in the other lymphocyte line (# K4123-Mi) were identified as EBV sequences. The two EBV genomes with estimated 35.22-fold and 31.06-fold sequence coverage respectively, designated K4413-Mi EBV and K4123-Mi EBV (GenBank accession number KC440852 and KC440851 respectively), are characteristic of type-1 EBV. Sequence comparison and phylogenetic analysis among K4413-Mi EBV, K4123-Mi EBV and the EBV genomes previously reported to GenBank as well as the NA12878 EBV genome assembled from database of the 1000 Genome Project showed that these 2 EBVs are most closely related to B95-8, an EBV previously isolated from a patient with infectious mononucleosis and WT-EBV. They are less similar to EBVs associated with nasopharyngeal carcinoma (NPC) from Hong Kong and China as well as the Akata strain of a case of Burkitt's lymphoma from Japan. They are most different from type 2 EBV found in Western African Burkitt's lymphoma.
A near full-length open reading frame next generation sequencing assay for genotyping and identification of resistance-associated variants in hepatitis C virus.

PubMed

Pedersen, M S; Fahnøe, U; Hansen, T A; Pedersen, A G; Jenssen, H; Bukh, J; Schønning, K

2018-06-01

The current treatment options for hepatitis C virus (HCV), based on direct acting antivirals (DAA), are dependent on virus genotype and previous treatment experience. Treatment failures have been associated with detection of resistance-associated substitutions (RASs) in the DAA targets of HCV, the NS3, NS5A and NS5B proteins. The objective was to develop a next generation sequencing based method that provides genotype and detection of HCV NS3, NS5A, and NS5B RASs without prior knowledge of sample genotype. In total, 101 residual plasma samples from patients with HCV covering 10 different viral subtypes across 4 genotypes with viral loads of 3.84-7.61 Log IU/mL were included. All samples were de-identified and consequently prior treatment status for patients was unknown. Almost full open reading frame amplicons (∼9 kb) were generated using RT-PCR with a single primer set. The resulting amplicons were sequenced with high throughput sequencing and analysed using an in-house developed script for detecting RASs. The method successfully amplified and sequenced 94% (95/101) of samples with an average coverage of 14,035; four of six failed samples were genotype 4a. Samples analysed twice yielded reproducible nucleotide frequencies across all sites. RASs were detected in 21/95 (22%) samples at a 15% threshold. The method identified one patient infected with two genotype 2b variants, and the presence of subgenomic deletion variants in 8 (8.4%) of 95 successfully sequenced samples. The presented method may provide identification of HCV genotype, RASs detection, and detect multiple HCV infection without prior knowledge of sample genotype. Copyright © 2018 Elsevier B.V. All rights reserved.
Genomic Organization of the Drosophila Telomere Retrotransposable Elements

DOE Office of Scientific and Technical Information (OSTI.GOV)

George, J.A.; DeBaryshe, P.G.; Traverse, K.L.

2006-10-16

The emerging sequence of the heterochromatic portion of the Drosophila melanogaster genome, with the most recent update of euchromatic sequence, gives the first genome-wide view of the chromosomal distribution of the telomeric retrotransposons, HeT-A, TART, and Tahre. As expected, these elements are entirely excluded from euchromatin, although sequence fragments of HeT-A and TART 3' untranslated regions are found in nontelomeric heterochromatin on the Y chromosome. The proximal ends of HeT-A/TART arrays appear to be a transition zone because only here do other transposable elements mix in the array. The sharp distinction between the distribution of telomeric elements and that of other transposable elements suggests that chromatin structure is important in telomere element localization. Measurements reported here show (1) D. melanogaster telomeres are very long, in the size range reported for inbred mouse strains (averaging 46 kb per chromosome end in Drosophila stock 2057). As in organisms with telomerase, their length varies depending on genotype. There is also slight under-replication in polytene nuclei. (2) Surprisingly, the relationship between the number of HeT-A and TART elements is not stochastic but is strongly correlated across stocks, supporting the idea that the two elements are interdependent. Although currently assembled portions of the HeT-A/TART arrays are from the most-proximal part of long arrays, ~61% of the total HeT-A sequence in these regions consists of intact, potentially active elements with little evidence of sequence decay, making it likely that the content of the telomere arrays turns over more extensively than has been thought.
Spatio-Temporal Structure, Path Characteristics, and Perceptual Grouping in Immediate Serial Spatial Recall

PubMed Central

De Lillo, Carlo; Kirby, Melissa; Poole, Daniel

2016-01-01

Immediate serial spatial recall measures the ability to retain sequences of locations in short-term memory and is considered the spatial equivalent of digit span. It is tested by requiring participants to reproduce sequences of movements performed by an experimenter or displayed on a monitor. Different organizational factors dramatically affect serial spatial recall but they are often confounded or underspecified. Untangling them is crucial for the characterization of working-memory models and for establishing the contribution of structure and memory capacity to spatial span. We report five experiments assessing the relative role and independence of factors that have been reported in the literature. Experiment 1 disentangled the effects of spatial clustering and path-length by manipulating the distance of items displayed on a touchscreen monitor. Long-path sequences segregated by spatial clusters were compared with short-path sequences not segregated by clusters. Recall was more accurate for sequences segregated by clusters independently from path-length. Experiment 2 featured conditions where temporal pauses were introduced between or within cluster boundaries during the presentation of sequences with the same paths. Thus, the temporal structure of the sequences was either consistent or inconsistent with a hierarchical representation based on segmentation by spatial clusters but the effect of structure could not be confounded with effects of path-characteristics. Pauses at cluster boundaries yielded more accurate recall, as predicted by a hierarchical model. In Experiment 3, the systematic manipulation of sequence structure, path-length, and presence of path-crossings of sequences showed that structure explained most of the variance, followed by the presence/absence of path-crossings, and path-length. 
Experiments 4 and 5 replicated the results of the previous experiments in immersive virtual reality navigation tasks where the viewpoint of the observer changed dynamically during encoding and recall. This suggested that the effects of structure in spatial span are not dependent on perceptual grouping processes induced by the aerial view of the stimulus array typically afforded by spatial recall tasks. These results demonstrate the independence of coding strategies based on structure from effects of path characteristics and perceptual grouping in immediate serial spatial recall. PMID:27891101
Giardia telomeric sequence d(TAGGG)4 forms two intramolecular G-quadruplexes in K+ solution: effect of loop length and sequence on the folding topology.

PubMed

Hu, Lanying; Lim, Kah Wai; Bouaziz, Serge; Phan, Anh Tuân

2009-11-25

Recently, it has been shown that in K+ solution the human telomeric sequence d[TAGGG(TTAGGG)3] forms a (3 + 1) intramolecular G-quadruplex, while the Bombyx mori telomeric sequence d[TAGG(TTAGG)3], which differs from the human counterpart only by one G deletion in each repeat, forms a chair-type intramolecular G-quadruplex, indicating an effect of G-tract length on the folding topology of G-quadruplexes. To explore the effect of loop length and sequence on the folding topology of G-quadruplexes, here we examine the structure of the four-repeat Giardia telomeric sequence d[TAGGG(TAGGG)3], which differs from the human counterpart only by one T deletion within the non-G linker in each repeat. We show by NMR that this sequence forms two different intramolecular G-quadruplexes in K+ solution. The first one is a novel basket-type antiparallel-stranded G-quadruplex containing two G-tetrads, a G·(A-G) triad, and two A·T base pairs; the three loops are consecutively edgewise-diagonal-edgewise. The second one is a propeller-type parallel-stranded G-quadruplex involving three G-tetrads; the three loops are all double-chain-reversal. Recurrence of several structural elements in the observed structures suggests a "cut and paste" principle for the design and prediction of G-quadruplex topologies, for which different elements could be extracted from one G-quadruplex and inserted into another.
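The repeat notation used in this abstract is compact, so it may help to expand the three sequences and compare their G-tracts and linkers directly. The sketch below is only a reading aid: the helper names are hypothetical, and it simply expands a leading block followed by three copies of the repeat unit, as the bracketed notation indicates.

```python
import re


def expand(first, repeat, copies):
    """Expand notation such as d[TAGGG(TTAGGG)3] into a plain base string."""
    return first + repeat * copies


def g_tracts_and_linkers(seq):
    """Return the lengths of the G runs and the non-G segments between them."""
    tract_lengths = [len(run) for run in re.findall(r"G+", seq)]
    linkers = [part for part in re.split(r"G+", seq) if part]
    return tract_lengths, linkers


sequences = {
    "human": expand("TAGGG", "TTAGGG", 3),
    "Bombyx mori": expand("TAGG", "TTAGG", 3),
    "Giardia": expand("TAGGG", "TAGGG", 3),
}

for name, seq in sequences.items():
    tracts, linkers = g_tracts_and_linkers(seq)
    print(f"{name:12s} {seq:24s} G-tracts={tracts} linkers={linkers}")
```

Laid out this way, the Bombyx mori sequence differs from the human one in G-tract length (two instead of three), while the Giardia sequence keeps the three-G tracts and shortens the internal linkers from TTA to TA.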
Complete mitogenome of the semi-aquatic grasshopper Oxya intricate (Stål.) (Insecta: Orthoptera: Catantopidae).

PubMed

Dong, Jia-Jia; Guan, De-Long; Xu, Sheng-Quan

2016-09-01

The complete mitogenome of Oxya intricate (Stål.) has been reconstructed from whole-genome Illumina sequencing data with an average coverage of 294×. The circular genome is 15,466 bp in length, and consists of 22 transfer RNAs (tRNAs), 13 protein-coding genes (PCGs), 2 ribosomal RNAs (rRNAs) and 1 D-loop region. All PCGs are initiated with ATN codons, and are terminated with TAR codons except for ND5 with the incomplete stop codon T. The nucleotide composition is asymmetric (42.5%A, 14.6%C, 10.6%G, 32.3%T) with an overall GC content of 25.2%. These data would contribute to the design of novel molecular markers for population and evolutionary studies of this and related orthopteran species.
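The composition figures quoted above can be checked for internal consistency with a short calculation. This sketch uses only the percentages and genome length given in the abstract; the approximate per-base counts it derives are estimates from rounded percentages, not values reported by the authors.

```python
# Percentages and genome length as quoted in the abstract.
composition = {"A": 42.5, "C": 14.6, "G": 10.6, "T": 32.3}
genome_bp = 15466

total = round(sum(composition.values()), 1)              # should be 100.0
gc = round(composition["G"] + composition["C"], 1)       # overall GC content
at = round(composition["A"] + composition["T"], 1)       # overall AT content

# Rough base counts implied by the rounded percentages (estimates only).
approx_counts = {base: round(genome_bp * pct / 100)
                 for base, pct in composition.items()}

print(total, gc, at)
print(approx_counts)
```

The four percentages sum to 100.0 and the G and C shares add up to the stated 25.2% GC content, so the reported figures agree with each other.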
Co-evolutionary constraints of globular proteins correlate with their folding rates.

PubMed

Mallik, Saurav; Kundu, Sudip

2015-08-04

Folding rates (lnkf) of globular proteins correlate with their biophysical properties, but the relationship between lnkf and patterns of sequence evolution remains elusive. We introduce 'relative co-evolution order' (rCEO) as length-normalized average primary chain separation of co-evolving pairs (CEPs), which negatively correlates with lnkf. In addition to pairs in native 3D contact, indirectly connected and structurally remote CEPs probably also play critical roles in protein folding. Correlation between rCEO and lnkf is stronger in multi-state proteins than two-state proteins, contrasting the case of contact order (co), where stronger correlation is found in two-state proteins. Finally, rCEO, co and lnkf are fitted into a 3D linear correlation. Copyright © 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
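Read literally, "length-normalized average primary chain separation" suggests a quantity of the same form as relative contact order: the mean sequence separation |i - j| over a set of residue pairs, divided by chain length. The sketch below implements that literal reading only. The abstract does not give the authors' exact formula or how co-evolving pairs are detected, so the function name, the input format, and the example pairs are all assumptions.

```python
def relative_order(pairs, chain_length):
    """Mean sequence separation of residue pairs, divided by chain length.

    pairs: iterable of (i, j) residue indices, e.g. co-evolving pairs (for an
    rCEO-like value) or native contacts (for a relative-contact-order-like value).
    """
    pairs = list(pairs)
    if not pairs or chain_length <= 0:
        raise ValueError("need at least one pair and a positive chain length")
    mean_separation = sum(abs(i - j) for i, j in pairs) / len(pairs)
    return mean_separation / chain_length


# Hypothetical co-evolving pairs in a 50-residue protein.
example_pairs = [(2, 10), (5, 40), (12, 15)]
print(relative_order(example_pairs, 50))
```

Under this reading, a protein whose co-evolving pairs are mostly far apart along the chain gets a larger value, which the abstract associates with slower folding.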
Genetic and Epigenetic Alterations of Brassica nigra Introgression Lines from Somatic Hybridization: A Resource for Cauliflower Improvement.

PubMed

Wang, Gui-Xiang; Lv, Jing; Zhang, Jie; Han, Shuo; Zong, Mei; Guo, Ning; Zeng, Xing-Ying; Zhang, Yue-Yun; Wang, You-Ping; Liu, Fan

2016-01-01

Broad phenotypic variations were obtained previously in derivatives from the asymmetric somatic hybridization of cauliflower "Korso" (Brassica oleracea var. botrytis, 2n = 18, CC genome) and black mustard "G1/1" (Brassica nigra, 2n = 16, BB genome). However, the mechanisms underlying these variations were unknown. In this study, 28 putative introgression lines (ILs) were pre-selected according to a series of morphological (leaf shape and color, plant height and branching, curd features, and flower traits) and physiological (black rot/club root resistance) characters. Multi-color fluorescence in situ hybridization revealed that these plants contained 18 chromosomes derived from "Korso." Molecular marker (65 simple sequence repeats and 77 amplified fragment length polymorphisms) analysis identified the presence of "G1/1" DNA segments (average 7.5%). Additionally, DNA profiling revealed many genetic and epigenetic differences among the ILs, including sequence alterations, deletions, and variation in patterns of cytosine methylation. The frequency of fragments lost (5.1%) was higher than that of novel bands (1.4%), and fragments specific to Brassica carinata (BBCC 2n = 34) were common (average 15.5%). Methylation-sensitive amplified polymorphism analysis indicated that methylation changes were common and that hypermethylation (12.4%) was more frequent than hypomethylation (4.8%). Our results suggested that asymmetric somatic hybridization and alien DNA introgression induced genetic and epigenetic alterations. Thus, these ILs represent an important, novel germplasm resource for cauliflower improvement that can be mined for diverse traits of interest to breeders and researchers.
Assessing the genetic diversity of Cu resistance in mine tailings through high-throughput recovery of full-length copA genes

PubMed Central

Li, Xiaofang; Zhu, Yong-Guan; Shaban, Babak; Bruxner, Timothy J. C.; Bond, Philip L.; Huang, Longbin

2015-01-01

Characterizing the genetic diversity of microbial copper (Cu) resistance at the community level remains challenging, mainly due to the polymorphism of the core functional gene copA. In this study, a local BLASTN method using a copA database built in this study was developed to recover full-length putative copA sequences from an assembled tailings metagenome; these sequences were then screened for potentially functioning CopA using conserved metal-binding motifs, inferred by evolutionary trace analysis of CopA sequences from known Cu resistant microorganisms. In total, 99 putative copA sequences were recovered from the tailings metagenome, out of which 70 were found with high potential to be functioning in Cu resistance. Phylogenetic analysis of selected copA sequences detected in the tailings metagenome showed that topology of the copA phylogeny is largely congruent with that of the 16S-based phylogeny of the tailings microbial community obtained in our previous study, indicating that the development of copA diversity in the tailings might be mainly through vertical descent with few lateral gene transfer events. The method established here can be used to explore copA (and potentially other metal resistance genes) diversity in any metagenome and has the potential to exhaust the full-length gene sequences for downstream analyses. PMID:26286020
Characterization of genetic sequence variation of 58 STR loci in four major population groups.

PubMed

Novroski, Nicole M M; King, Jonathan L; Churchill, Jennifer D; Seah, Lay Hong; Budowle, Bruce

2016-11-01

Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths that traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house Excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred and forty-six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence, compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred and seventy-six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b, had 1, 4, and 1 alleles, respectively, which contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. There were 50 markers that demonstrated a relative increase in diversity with the variant sequence alleles compared with those of traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
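The difference between alleles identified by length and alleles identified by sequence can be illustrated with a minimal sketch. It is not the workflow used in the study (which relied on STRait Razor and Excel workbooks); the function name, the marker labels "M1" and "M2", and the toy sequences are all hypothetical, and nucleotide length stands in for nominal allele length as a simplification.

```python
from collections import defaultdict


def count_alleles(observations):
    """Count distinct alleles per marker, by length and by sequence.

    observations: iterable of (marker, sequence) pairs (hypothetical input
    format). Returns two dicts: marker -> number of distinct lengths, and
    marker -> number of distinct sequences.
    """
    lengths = defaultdict(set)
    sequences = defaultdict(set)
    for marker, seq in observations:
        lengths[marker].add(len(seq))   # what a length-based assay can separate
        sequences[marker].add(seq)      # what sequencing can separate
    by_length = {m: len(v) for m, v in lengths.items()}
    by_sequence = {m: len(v) for m, v in sequences.items()}
    return by_length, by_sequence


# Toy data: two M1 alleles share a length but differ at one base.
toy = [
    ("M1", "TCTATCTATCTA"),
    ("M1", "TCTATCTGTCTA"),
    ("M1", "TCTATCTA"),
    ("M2", "GATAGATA"),
]
by_length, by_sequence = count_alleles(toy)
print(by_length, by_sequence)
```

Two alleles of equal length that differ internally collapse into one length-based allele but remain distinct by sequence, which is why sequence-based counts can only equal or exceed length-based counts.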
A better sequence-read simulator program for metagenomics.

PubMed

Johnson, Stephen; Trost, Brett; Long, Jeffrey R; Pittet, Vanessa; Kusalik, Anthony

2014-01-01

There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.
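The idea of a non-parametric read-length distribution can be sketched as follows. This is not BEAR's algorithm, which the abstract describes only as a machine-learning approach; it is a simple empirical resampler, and the function name and the example lengths are assumptions made for illustration.

```python
import random
from collections import Counter


def build_length_sampler(observed_lengths, seed=None):
    """Return a function that draws read lengths from the empirical
    distribution of observed_lengths (no uniform or normal model assumed)."""
    counts = Counter(observed_lengths)
    lengths = sorted(counts)
    weights = [counts[n] for n in lengths]  # observed frequency of each length
    rng = random.Random(seed)

    def sample(k):
        # Weighted draw with replacement from the observed lengths.
        return rng.choices(lengths, weights=weights, k=k)

    return sample


# Hypothetical observed read lengths from a real run.
sample = build_length_sampler([100, 100, 250, 400], seed=0)
simulated = sample(1000)
print(Counter(simulated))
```

Because the sampler only ever returns lengths that were observed, and in proportion to how often they were observed, it reproduces multimodal or skewed length profiles that a parametric model would smooth away.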
Harmonic Series Meets Fibonacci Sequence

ERIC Educational Resources Information Center

Chen, Hongwei; Kennedy, Chris

2012-01-01

The terms of a conditionally convergent series may be rearranged to converge to any prescribed real value. What if the harmonic series is grouped into Fibonacci length blocks? Or the harmonic series is arranged in alternating Fibonacci length blocks? Or rearranged and alternated into separate blocks of even and odd terms of Fibonacci length?
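The first question can be explored numerically with a short sketch. It assumes the grouping uses blocks of lengths F_1 = 1, F_2 = 1, F_3 = 2, F_4 = 3, and so on, which is our reading of the abstract rather than a statement of the authors' exact construction. Since the n-th block then runs from the term 1/F_{n+1} to the term 1/(F_{n+2} - 1), the usual integral comparison suggests that its sum is close to ln(F_{n+2}/F_{n+1}), which tends to the logarithm of the golden ratio.

```python
import math


def fibonacci_block_sums(num_blocks):
    """Sum the harmonic series 1 + 1/2 + 1/3 + ... in consecutive blocks
    whose lengths are the Fibonacci numbers 1, 1, 2, 3, 5, ..."""
    sums = []
    length, next_length = 1, 1   # F_1, F_2
    start = 1                    # index of the first term in the current block
    for _ in range(num_blocks):
        block = sum(1.0 / k for k in range(start, start + length))
        sums.append(block)
        start += length
        length, next_length = next_length, length + next_length
    return sums


sums = fibonacci_block_sums(20)
golden_ratio = (1 + math.sqrt(5)) / 2
print(sums[:4])
print(sums[-1], math.log(golden_ratio))
```

The first blocks are 1, 1/2, and 1/3 + 1/4, and the later block sums settle near ln((1 + √5)/2) ≈ 0.4812.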
Sampling solution traces for the problem of sorting permutations by signed reversals

PubMed Central

2012-01-01

Background: Traditional algorithms to solve the problem of sorting by signed reversals output just one optimal solution while the space of all optimal solutions can be huge. A so-called trace represents a group of solutions which share the same set of reversals that must be applied to sort the original permutation following a partial ordering. By using traces, we therefore can represent the set of optimal solutions in a more compact way. Algorithms for enumerating the complete set of traces of solutions were developed. However, due to their exponential complexity, their practical use is limited to small permutations. A partial enumeration of traces is a sampling of the complete set of traces and can be an alternative for the study of distinct evolutionary scenarios of big permutations. Ideally, the sampling should be done uniformly from the space of all optimal solutions. This is however conjectured to be ♯P-complete. Results: We propose and evaluate three algorithms for producing a sampling of the complete set of traces that instead can be shown in practice to preserve some of the characteristics of the space of all solutions. The first algorithm (RA) performs the construction of traces through a random selection of reversals on the list of optimal 1-sequences. The second algorithm (DFALT) consists in a slight modification of an algorithm that performs the complete enumeration of traces. Finally, the third algorithm (SWA) is based on a sliding window strategy to improve the enumeration of traces. All proposed algorithms were able to enumerate traces for permutations with up to 200 elements. Conclusions: We analysed the distribution of the enumerated traces with respect to their height and average reversal length. Various works indicate that the reversal length can be an important aspect in genome rearrangements. The algorithms RA and SWA show a tendency to lose traces with high average reversal length. Such traces are however rare, and qualitatively our results show that, for testable-sized permutations, the algorithms DFALT and SWA produce distributions which approximate the reversal length distributions observed with a complete enumeration of the set of traces. PMID:22704580
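The basic operation behind these results, a signed reversal, and the average reversal length statistic can be sketched briefly. The sketch does not implement traces or any of the RA, DFALT, or SWA algorithms; the function names are hypothetical, and taking the length of a reversal to be the number of elements it spans is an assumption about the definition.

```python
def apply_reversal(perm, i, j):
    """Apply a signed reversal to perm[i..j] (inclusive, 0-based):
    the segment is reversed and the sign of each element is flipped."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


def average_reversal_length(reversals):
    """Mean number of elements spanned by a list of (i, j) reversals."""
    return sum(j - i + 1 for i, j in reversals) / len(reversals)


# Sort the signed permutation [-1, -3, -2] into the identity [1, 2, 3].
perm = [-1, -3, -2]
scenario = [(0, 0), (1, 2)]
for i, j in scenario:
    perm = apply_reversal(perm, i, j)
print(perm, average_reversal_length(scenario))
```

In this toy scenario the two reversals act on disjoint intervals, so they can be applied in either order; solutions that differ only in such reorderings are the kind of redundancy that grouping solutions into traces is meant to remove.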
Career Length and Performance Among Professional Baseball Players Returning to Play After Hip Arthroscopy.

PubMed

Frangiamore, Salvatore J; Mannava, Sandeep; Briggs, Karen K; McNamara, Shannen; Philippon, Marc J

2018-05-01

Background: Hip arthroscopy has been shown to be effective for management of symptomatic femoroacetabular impingement (FAI) in professional athletes; however, it is unclear how hip arthroscopy affects career length and performance when professional baseball players return to play. Purpose: To determine the career length, performance, and return-to-play rates of professional baseball players after undergoing arthroscopy for symptomatic FAI. Study Design: Case series; Level of evidence, 4. Methods: Forty-four professional baseball players (51 hips) underwent hip arthroscopy for FAI between 2000 and 2015 by a single surgeon. Data were retrieved for each player from MLB.com, MiLB.com, Baseballreference.com, and individual team websites. Age, surgical procedure, and intraoperative findings were also used in analysis. Return to play was defined as playing in a preseason or regular season major or minor league game after arthroscopy. Career length was measured as total years played before and after arthroscopy. Performance measures included earned run average for pitchers, batting average for position players, and games played for all players. Results: Of the 44 players, there were 21 pitchers and 23 position players. Ninety-five percent (n = 42) were able to return to professional baseball after hip arthroscopy. The mean career length for all players was 9.5 years. The mean career length after return to play was 3.6 seasons (range, 1-14 seasons). Pitchers had a mean career length of 8.7 years (3.3 after surgery), and position players averaged a career length of 10 years (3.9 after surgery). There were no differences in performance measures between preinjury and postoperative values. Conclusion: Professional baseball players undergoing hip arthroscopy for FAI returned to sport and had similar performance as they did before injury. The average career length was 9.5 years. In our study cohort, more pitchers than position players underwent hip arthroscopy. Hip arthroscopy appears to be an effective surgical intervention, allowing for return to play after complete recovery.
Common position of indels that cause deviations from canonical genome organization in different measles virus strains.

PubMed

Ivancic-Jelecki, Jelena; Slovic, Anamarija; Šantak, Maja; Tešović, Goran; Forcic, Dubravko

2016-07-29

The canonical genome organization of measles virus (MV) is characterized by a total size of 15 894 nucleotides (nts) and a defined length of every genomic region, both coding and non-coding. Only rarely have reports of strains possessing non-canonical genomic properties (possessing indels, with or without a change of total genome length) been published. The observed mutations are mutually compensatory in the sense that the total genome length remains polyhexameric. Although programmed and highly precise pseudo-templated nucleotide additions during transcription are inherent to polymerases of all viruses belonging to the family Paramyxoviridae, a similar mechanism that would serve to non-randomly correct genome length, if an indel has occurred during replication, has so far not been described in the context of a complete virus genome. We compiled all complete MV genomic sequences (64 in total) available in open access sequence databases. Multiple sequence comparisons and phylogenetic analyses were performed with the aim of exploring whether non-recombinant and non-evolutionarily linked measles strains that show deviations from canonical genome organization possess a common genetic characteristic. In 11 MV sequences we detected deviations from canonical genome organization due to short indels located within homopolymeric stretches or next to them. In nine out of 11 identified non-canonical MV sequences, a common feature was observed: one mutation, either an insertion or a deletion, was located in a 28-nt-long region in the F gene 5' untranslated region (positions 5051-5078 in genomic cDNA of canonical strains). This segment is composed of five tandemly linked homopolymeric stretches; its consensus sequence is G6-7C7-8A6-7G1-3C5-6. Although none of the mononucleotide repeats within this segment has a fixed length, the total number of nts in canonical strains is always 28. These nine non-canonical strains, as well as the tenth (not mutated in the 5051-5078 segment), can be grouped in three clusters, based on their passage histories/epidemiological data/genetic similarities. There are no indications that the 3 clusters are evolutionarily linked, other than the fact that they all belong to clade D. A common narrow genomic region was found to be mutated in different, non-related, wild type strains, suggesting that this region might have a function in non-random genome length corrections occurring during MV replication.
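The two length constraints described in this abstract can be expressed as a small check. It is a sketch, not the authors' analysis: it reads the consensus G6-7C7-8A6-7G1-3C5-6 as repeat-count ranges (six to seven G, seven to eight C, and so on), which is our interpretation of the flattened notation, and the function names and example segments are hypothetical.

```python
import re

CANONICAL_GENOME_LENGTH = 15894   # nts, as stated in the abstract
CANONICAL_SEGMENT_LENGTH = 28     # nts, positions 5051-5078 in genomic cDNA

# Five tandem homopolymeric stretches, each with a variable repeat count.
SEGMENT_PATTERN = re.compile(r"G{6,7}C{7,8}A{6,7}G{1,3}C{5,6}")


def is_polyhexameric(genome_length):
    """True when the genome length is a multiple of six."""
    return genome_length % 6 == 0


def is_canonical_segment(segment):
    """True when the segment fits the consensus and totals exactly 28 nts."""
    return (len(segment) == CANONICAL_SEGMENT_LENGTH
            and SEGMENT_PATTERN.fullmatch(segment) is not None)


canonical = "G" * 6 + "C" * 7 + "A" * 6 + "G" * 3 + "C" * 6   # 28 nts
with_insertion = "G" * 6 + "C" * 8 + "A" * 6 + "G" * 3 + "C" * 6  # 29 nts

print(is_polyhexameric(CANONICAL_GENOME_LENGTH))   # 15894 = 6 * 2649
print(is_canonical_segment(canonical), is_canonical_segment(with_insertion))
```

The second example still fits the per-stretch ranges but fails the total-length check, which mirrors the observation that the individual repeats vary while the segment as a whole stays at 28 nts in canonical strains.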
