potential coding sequences: Topics by Science.gov

Sample records for potential coding sequences

Representation of DNA sequences with virtual potentials and their processing by (SEQREP) Kohonen self-organizing maps.

PubMed

Aires-de-Sousa, João; Aires-de-Sousa, Luisa

2003-01-01

We propose representing individual positions in DNA sequences by virtual potentials generated by other bases of the same sequence. This is a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can be used as a representation of the entire sequence (SEQREP code). It is a flexible code, with a length independent of the sequence size, does not require previous alignment, and is convenient for processing by neural networks or statistical techniques. To evaluate its biological significance, the SEQREP code was used for training Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding for HIV-1 envelope glycoprotein (env) into subtypes A-G. It was demonstrated that SOMs clustered sequences belonging to different classes into distinct regions. For independent test sets, very high rates of correct predictions were obtained (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment. Software for representing sequences by SEQREP code, and for training Kohonen SOMs is made freely available from http://www.dq.fct.unl.pt/qoa/jas/seqrep. Supplementary material is available at http://www.dq.fct.unl.pt/qoa/jas/seqrep/bioinf2002
Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.

PubMed

Suyama, Mikita; Lathe, Warren C; Bork, Peer

2005-10-10

We have identified 141 novel palindromic repetitive elements in the genome of euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on the sequence similarity. The low sequence identity within each of the groups suggests rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within the protein coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions.
Pattern recognition of electronic bit-sequences using a semiconductor mode-locked laser and spatial light modulators

NASA Astrophysics Data System (ADS)

Bhooplapur, Sharad; Akbulut, Mehmetkan; Quinlan, Franklyn; Delfyett, Peter J.

2010-04-01

A novel scheme for recognition of electronic bit-sequences is demonstrated. Two electronic bit-sequences that are to be compared are each mapped to a unique code from a set of Walsh-Hadamard codes. The codes are then encoded in parallel on the spectral phase of the frequency comb lines from a frequency-stabilized mode-locked semiconductor laser. Phase encoding is achieved by using two independent spatial light modulators based on liquid crystal arrays. Encoded pulses are compared using interferometric pulse detection and differential balanced photodetection. Orthogonal codes eight bits long are compared, and matched codes are successfully distinguished from mismatched codes with very low error rates, of around 10-18. This technique has potential for high-speed, high accuracy recognition of bit-sequences, with applications in keyword searches and internet protocol packet routing.
RNAcode: Robust discrimination of coding and noncoding regions in comparative sequence data

PubMed Central

Washietl, Stefan; Findeiß, Sven; Müller, Stephan A.; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L.; Stadler, Peter F.; Goldman, Nick

2011-01-01

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied “out of the box,” without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as “noncoding.” RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode. PMID:21357752
RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data.

PubMed

Washietl, Stefan; Findeiss, Sven; Müller, Stephan A; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L; Stadler, Peter F; Goldman, Nick

2011-04-01

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.
CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison

PubMed Central

Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

2004-01-01

The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464
Evaluating the protein coding potential of exonized transposable element sequences

PubMed Central

Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King

2007-01-01

Background Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: 1-a profile-based hidden Markov model (HMM) approach, 2-BLAST methods and 3-RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggests that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258
Sense-antisense (complementary) peptide interactions and the proteomic code; potential opportunities in biology and pharmaceutical science.

PubMed

Miller, Andrew D

2015-02-01

A sense peptide can be defined as a peptide whose sequence is coded by the nucleotide sequence (read 5' → 3') of the sense (positive) strand of DNA. Conversely, an antisense (complementary) peptide is coded by the corresponding nucleotide sequence (read 5' → 3') of the antisense (negative) strand of DNA. Research has been accumulating steadily to suggest that sense peptides are capable of specific interactions with their corresponding antisense peptides. Unfortunately, although more and more examples of specific sense-antisense peptide interactions are emerging, the very idea of such interactions does not conform to standard biology dogma and so there remains a sizeable challenge to lift this concept from being perceived as a peripheral phenomenon if not worse, into becoming part of the scientific mainstream. Specific interactions have now been exploited for the inhibition of number of widely different protein-protein and protein-receptor interactions in vitro and in vivo. Further, antisense peptides have also been used to induce the production of antibodies targeted to specific receptors or else the production of anti-idiotypic antibodies targeted against auto-antibodies. Such illustrations of utility would seem to suggest that observed sense-antisense peptide interactions are not just the consequence of a sequence of coincidental 'lucky-hits'. Indeed, at the very least, one might conclude that sense-antisense peptide interactions represent a potentially new and different source of leads for drug discovery. But could there be more to come from studies in this area? Studies on the potential mechanism of sense-antisense peptide interactions suggest that interactions may be driven by amino acid residue interactions specified from the genetic code. If so, such specified amino acid residue interactions could form the basis for an even wider amino acid residue interaction code (proteomic code) that links gene sequences to actual protein structure and function, even entire genomes to entire proteomes. The possibility that such a proteomic code should exist is discussed. So too the potential implications for biology and pharmaceutical science are also discussed were such a code to exist.
[Correlation of codon biases and potential secondary structures with mRNA translation efficiency in unicellular organisms].

PubMed

Vladimirov, N V; Likhoshvaĭ, V A; Matushkin, Iu G

2007-01-01

Gene expression is known to correlate with degree of codon bias in many unicellular organisms. However, such correlation is absent in some organisms. Recently we demonstrated that inverted complementary repeats within coding DNA sequence must be considered for proper estimation of translation efficiency, since they may form secondary structures that obstruct ribosome movement. We have developed a program for estimation of potential coding DNA sequence expression in defined unicellular organism using its genome sequence. The program computes elongation efficiency index. Computation is based on estimation of coding DNA sequence elongation efficiency, taking into account three key factors: codon bias, average number of inverted complementary repeats, and free energy of potential stem-loop structures formed by the repeats. The influence of these factors on translation is numerically estimated. An optimal proportion of these factors is computed for each organism individually. Quantitative translational characteristics of 384 unicellular organisms (351 bacteria, 28 archaea, 5 eukaryota) have been computed using their annotated genomes from NCBI GenBank. Five potential evolutionary strategies of translational optimization have been determined among studied organisms. A considerable difference of preferred translational strategies between Bacteria and Archaea has been revealed. Significant correlations between elongation efficiency index and gene expression levels have been shown for two organisms (S. cerevisiae and H. pylori) using available microarray data. The proposed method allows to estimate numerically the coding DNA sequence translation efficiency and to optimize nucleotide composition of heterologous genes in unicellular organisms. http://www.mgs.bionet.nsc.ru/mgs/programs/eei-calculator/.
Marks of Change in Sequences

NASA Astrophysics Data System (ADS)

Jürgensen, H.

2011-12-01

Given a sequence of events, how does one recognize that a change has occurred? We explore potential definitions of the concept of change in a sequence and propose that words in relativized solid codes might serve as indicators of change.
A Sabin 2-related poliovirus recombinant contains a homologous sequence of human enterovirus species C in the viral polymerase coding region.

PubMed

Zhang, Yong; Zhang, Fan; Zhu, Shuangli; Chen, Li; Yan, Dongmei; Wang, Dongyan; Tang, Ruiyan; Zhu, Hui; Hou, Xiaohui; An, Hongqiu; Zhang, Hong; Xu, Wenbo

2010-02-01

A type 2 vaccine-related poliovirus (strain CHN3024), differing from the Sabin 2 strain by 0.44% in the VP1 coding region was isolated from a patient with vaccine-associated paralytic poliomyelitis. Sequences downstream of nucleotide position 6735 (3D(pol) coding region) were derived from an unidentified sequence; no close match for a potential parent was found, but it could be classified into a non-polio human enteroviruses species C (HEV-C) phylogeny. The virus differed antigenically from the parental Sabin strain, having an amino acid substitution in the neutralizing antigenic site 1. The similarity between CHN3024 and Sabin 2 sequences suggests that the recombination was recent; this is supported by the estimation that the initiating OPV dose was given only 36-75 days before sampling. The patient's clinical manifestations, intratypic differentiation examination, and whole-genome sequencing showed that this recombinant exhibited characteristics of neurovirulent vaccine-derived polioviruses (VDPV), which may, thus, pose a potential threat to a polio-free world.
Code-modulated visual evoked potentials using fast stimulus presentation and spatiotemporal beamformer decoding.

PubMed

Wittevrongel, Benjamin; Van Wolputte, Elia; Van Hulle, Marc M

2017-11-08

When encoding visual targets using various lagged versions of a pseudorandom binary sequence of luminance changes, the EEG signal recorded over the viewer's occipital pole exhibits so-called code-modulated visual evoked potentials (cVEPs), the phase lags of which can be tied to these targets. The cVEP paradigm has enjoyed interest in the brain-computer interfacing (BCI) community for the reported high information transfer rates (ITR, in bits/min). In this study, we introduce a novel decoding algorithm based on spatiotemporal beamforming, and show that this algorithm is able to accurately identify the gazed target. Especially for a small number of repetitions of the coding sequence, our beamforming approach significantly outperforms an optimised support vector machine (SVM)-based classifier, which is considered state-of-the-art in cVEP-based BCI. In addition to the traditional 60 Hz stimulus presentation rate for the coding sequence, we also explore the 120 Hz rate, and show that the latter enables faster communication, with a maximal median ITR of 172.87 bits/min. Finally, we also report on a transition effect in the EEG signal following the onset of the stimulus sequence, and recommend to exclude the first 150 ms of the trials from decoding when relying on a single presentation of the stimulus sequence.
RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity.

PubMed

Ishikawa, Sohta A; Inagaki, Yuji; Hashimoto, Tetsuo

2012-01-01

In phylogenetic analyses of nucleotide sequences, 'homogeneous' substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, 'RY-coding' and 'non-homogeneous' models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
Curated eutherian third party data gene data sets.

PubMed

Premzl, Marko

2016-03-01

The free available eutherian genomic sequence data sets advanced scientific field of genomics. Of note, future revisions of gene data sets were expected, due to incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in European Nucleotide Archive as curated third party data gene data sets.
Kangaroo – A pattern-matching program for biological sequences

PubMed Central

2002-01-01

Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
Draft genome sequence of the phenazine-producing Pseudomonas fluorescens strain 2-79

USDA-ARS?s Scientific Manuscript database

Pseudomonas fluorescens strain 2-79, a natural isolate of the rhizosphere of wheat (Triticum aestivum L.), possesses antagonistic potential toward several fungal pathogens. We report the draft genome sequence of strain 2-79, which comprises 5,674 protein-coding sequences....
The primitive code and repeats of base oligomers as the primordial protein-encoding sequence.

PubMed Central

Ohno, S; Epplen, J T

1983-01-01

Even if the prebiotic self-replication of nucleic acids and the subsequent emergence of primitive, enzyme-independent tRNAs are accepted as plausible, the origin of life by spontaneous generation still appears improbable. This is because the just-emerged primitive translational machinery had to cope with base sequences that were not preselected for their coding potentials. Particularly if the primitive mitochondria-like code with four chain-terminating base triplets preceded the universal code, the translation of long, randomly generated, base sequences at this critical stage would have merely resulted in the production of short oligopeptides instead of long polypeptide chains. We present the base sequence of a mouse transcript containing tetranucleotide repeats conserved during evolution. Even if translated in accordance with the primitive mitochondria-like code, this transcript in its three reading frames can yield 245-, 246-, and 251-residue-long tetrapeptidic periodical polypeptides that are already acquiring longer periodicities. We contend that the first set of base sequences translated at the beginning of life were such oligonucleotide repeats. By quickly acquiring longer periodicities, their products must have soon gained characteristic secondary structures--alpha-helical or beta-sheet or both. PMID:6574491
Evolution of the alternative AQP2 gene: Acquisition of a novel protein-coding sequence in dolphins.

PubMed

Kishida, Takushi; Suzuki, Miwa; Takayama, Asuka

2018-01-01

Taxon-specific de novo protein-coding sequences are thought to be important for taxon-specific environmental adaptation. A recent study revealed that bottlenose dolphins acquired a novel isoform of aquaporin 2 generated by alternative splicing (alternative AQP2), which helps dolphins to live in hyperosmotic seawater. The AQP2 gene consists of four exons, but the alternative AQP2 gene lacks the fourth exon and instead has a longer third exon that includes the original third exon and a part of the original third intron. Here, we show that the latter half of the third exon of the alternative AQP2 arose from a non-protein-coding sequence. Intact ORF of this de novo sequence is shared not by all cetaceans, but only by delphinoids. However, this sequence is conservative in all modern cetaceans, implying that this de novo sequence potentially plays important roles for marine adaptation in cetaceans. Copyright © 2017 Elsevier Inc. All rights reserved.
Repeats of base oligomers as the primordial coding sequences of the primeval earth and their vestiges in modern genes.

PubMed

Ohno, S

1984-01-01

Three outstanding properties uniquely qualify repeats of base oligomers as the primordial coding sequences of all polypeptide chains. First, when compared with randomly generated base sequences in general, they are more likely to have long open reading frames. Second, periodical polypeptide chains specified by such repeats are more likely to assume either alpha-helical or beta-sheet secondary structures than are polypeptide chains of random sequence. Third, provided that the number of bases in the oligomeric unit is not a multiple of 3, these internally repetitious coding sequences are impervious to randomly sustained base substitutions, deletions, and insertions. This is because the recurring periodicity of their polypeptide chains is given by three consecutive copies of the oligomeric unit translated in three different reading frames. Accordingly, when one reading frame is open, the other two are automatically open as well, all three being capable of coding for polypeptide chains of identical periodicity. Under this circumstance, a frame shift due to the deletion or insertion of a number of bases that is not a multiple of 3 fails to alter the down-stream amino acid sequence, and even a base change causing premature chain-termination can silence only one of the three potential coding units. Newly arisen coding sequences in modern organisms are oligomeric repeats, and most of the older genes retain various vestiges of their original internal repetitions. Some of the genes (e.g., oncogenes) have even inherited the property of being impervious to randomly sustained base changes.
Flexible and polarization-controllable diffusion metasurface with optical transparency

NASA Astrophysics Data System (ADS)

Zhuang, Yaqiang; Wang, Guangming; Liang, Jiangang; Cai, Tong; Guo, Wenlong; Zhang, Qingfeng

2017-11-01

In this paper, a novel coding metasurface is proposed to realize polarization-controllable diffusion scattering. The anisotropic Jerusalem-cross unit cell is employed as the basic coding element due to its polarization-dependent phase response. The isotropic random coding sequence is firstly designed to obtain diffusion scattering, and the anisotropic random coding sequence is subsequently realized by adding different periodic coding sequences to the original isotropic one along different directions. For demonstration, we designed and fabricated a flexible polarization-controllable diffusion metasurface (PCDM) with both chessboard diffusion and hedge diffusion under different polarizations. The specular scattering reduction performance of the anisotropic metasurface is better than the isotropic one because the scattered energies are redirected away from the specular reflection direction. For potential applications, the flexible PCDM wrapped around a cylinder structure is investigated and tested for polarization-controllable diffusion scattering. The numerical and experimental results coincide well, indicating anisotropic low scatterings with comparable performances. This paper provides an alternative approach for designing high-performance, flexible, low-scattering platforms.

COME: a robust coding potential calculation tool for lncRNA identification and characterization based on multiple features.

PubMed

Hu, Long; Xu, Zhiyu; Hu, Boqin; Lu, Zhi John

2017-01-09

Recent genomic studies suggest that novel long non-coding RNAs (lncRNAs) are specifically expressed and far outnumber annotated lncRNA sequences. To identify and characterize novel lncRNAs in RNA sequencing data from new samples, we have developed COME, a coding potential calculation tool based on multiple features. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes it more accurate and robust than other well-known tools. We also showed that COME was able to substantially improve the consistency of predication results from other coding potential calculators. Moreover, COME annotates and characterizes each predicted lncRNA transcript with multiple lines of supporting evidence, which are not provided by other tools. Remarkably, we found that one subgroup of lncRNAs classified by such supporting features (i.e. conserved local RNA secondary structure) was highly enriched in a well-validated database (lncRNAdb). We further found that the conserved structural domains on lncRNAs had better chance than other RNA regions to interact with RNA binding proteins, based on the recent eCLIP-seq data in human, indicating their potential regulatory roles. Overall, we present COME as an accurate, robust and multiple-feature supported method for the identification and characterization of novel lncRNAs. The software implementation is available at https://github.com/lulab/COME. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Exome sequencing-driven discovery of coding polymorphisms associated with common metabolic phenotypes.

PubMed

Albrechtsen, A; Grarup, N; Li, Y; Sparsø, T; Tian, G; Cao, H; Jiang, T; Kim, S Y; Korneliussen, T; Li, Q; Nie, C; Wu, R; Skotte, L; Morris, A P; Ladenvall, C; Cauchi, S; Stančáková, A; Andersen, G; Astrup, A; Banasik, K; Bennett, A J; Bolund, L; Charpentier, G; Chen, Y; Dekker, J M; Doney, A S F; Dorkhan, M; Forsen, T; Frayling, T M; Groves, C J; Gui, Y; Hallmans, G; Hattersley, A T; He, K; Hitman, G A; Holmkvist, J; Huang, S; Jiang, H; Jin, X; Justesen, J M; Kristiansen, K; Kuusisto, J; Lajer, M; Lantieri, O; Li, W; Liang, H; Liao, Q; Liu, X; Ma, T; Ma, X; Manijak, M P; Marre, M; Mokrosiński, J; Morris, A D; Mu, B; Nielsen, A A; Nijpels, G; Nilsson, P; Palmer, C N A; Rayner, N W; Renström, F; Ribel-Madsen, R; Robertson, N; Rolandsson, O; Rossing, P; Schwartz, T W; Slagboom, P E; Sterner, M; Tang, M; Tarnow, L; Tuomi, T; van't Riet, E; van Leeuwen, N; Varga, T V; Vestmar, M A; Walker, M; Wang, B; Wang, Y; Wu, H; Xi, F; Yengo, L; Yu, C; Zhang, X; Zhang, J; Zhang, Q; Zhang, W; Zheng, H; Zhou, Y; Altshuler, D; 't Hart, L M; Franks, P W; Balkau, B; Froguel, P; McCarthy, M I; Laakso, M; Groop, L; Christensen, C; Brandslund, I; Lauritzen, T; Witte, D R; Linneberg, A; Jørgensen, T; Hansen, T; Wang, J; Nielsen, R; Pedersen, O

2013-02-01

Human complex metabolic traits are in part regulated by genetic determinants. Here we applied exome sequencing to identify novel associations of coding polymorphisms at minor allele frequencies (MAFs) >1% with common metabolic phenotypes. The study comprised three stages. We performed medium-depth (8×) whole exome sequencing in 1,000 cases with type 2 diabetes, BMI >27.5 kg/m(2) and hypertension and in 1,000 controls (stage 1). We selected 16,192 polymorphisms nominally associated (p < 0.05) with case-control status, from four selected annotation categories or from loci reported to associate with metabolic traits. These variants were genotyped in 15,989 Danes to search for association with 12 metabolic phenotypes (stage 2). In stage 3, polymorphisms showing potential associations were genotyped in a further 63,896 Europeans. Exome sequencing identified 70,182 polymorphisms with MAF >1%. In stage 2 we identified 51 potential associations with one or more of eight metabolic phenotypes covered by 45 unique polymorphisms. In meta-analyses of stage 2 and stage 3 results, we demonstrated robust associations for coding polymorphisms in CD300LG (fasting HDL-cholesterol: MAF 3.5%, p = 8.5 × 10(-14)), COBLL1 (type 2 diabetes: MAF 12.5%, OR 0.88, p = 1.2 × 10(-11)) and MACF1 (type 2 diabetes: MAF 23.4%, OR 1.10, p = 8.2 × 10(-10)). We applied exome sequencing as a basis for finding genetic determinants of metabolic traits and show the existence of low-frequency and common coding polymorphisms with impact on common metabolic traits. Based on our study, coding polymorphisms with MAF above 1% do not seem to have particularly high effect sizes on the measured metabolic traits.
ContEst16S: an algorithm that identifies contaminated prokaryotic genomes using 16S RNA gene sequences.

PubMed

Lee, Imchang; Chalita, Mauricio; Ha, Sung-Min; Na, Seong-In; Yoon, Seok-Hwan; Chun, Jongsik

2017-06-01

Thanks to the recent advancement of DNA sequencing technology, the cost and time of prokaryotic genome sequencing have been dramatically decreased. It has repeatedly been reported that genome sequencing using high-throughput next-generation sequencing is prone to contaminations due to its high depth of sequencing coverage. Although a few bioinformatics tools are available to detect potential contaminations, these have inherited limitations as they only use protein-coding genes. Here we introduce a new algorithm, called ContEst16S, to detect potential contaminations using 16S rRNA genes from genome assemblies. We screened 69 745 prokaryotic genomes from the NCBI Assembly Database using ContEst16S and found that 594 were contaminated by bacteria, human and plants. Of the predicted contaminated genomes, 8 % were not predicted by the existing protein-coding gene-based tool, implying that both methods can be complementary in the detection of contaminations. A web-based service of the algorithm is available at www.ezbiocloud.net/tools/contest16s.
An electrocorticographic BCI using code-based VEP for control in video applications: a single-subject study

PubMed Central

Kapeller, Christoph; Kamada, Kyousuke; Ogawa, Hiroshi; Prueckl, Robert; Scharinger, Josef; Guger, Christoph

2014-01-01

A brain-computer-interface (BCI) allows the user to control a device or software with brain activity. Many BCIs rely on visual stimuli with constant stimulation cycles that elicit steady-state visual evoked potentials (SSVEP) in the electroencephalogram (EEG). This EEG response can be generated with a LED or a computer screen flashing at a constant frequency, and similar EEG activity can be elicited with pseudo-random stimulation sequences on a screen (code-based BCI). Using electrocorticography (ECoG) instead of EEG promises higher spatial and temporal resolution and leads to more dominant evoked potentials due to visual stimulation. This work is focused on BCIs based on visual evoked potentials (VEP) and its capability as a continuous control interface for augmentation of video applications. One 35 year old female subject with implanted subdural grids participated in the study. The task was to select one out of four visual targets, while each was flickering with a code sequence. After a calibration run including 200 code sequences, a linear classifier was used during an evaluation run to identify the selected visual target based on the generated code-based VEPs over 20 trials. Multiple ECoG buffer lengths were tested and the subject reached a mean online classification accuracy of 99.21% for a window length of 3.15 s. Finally, the subject performed an unsupervised free run in combination with visual feedback of the current selection. Additionally, an algorithm was implemented that allowed to suppress false positive selections and this allowed the subject to start and stop the BCI at any time. The code-based BCI system attained very high online accuracy, which makes this approach very promising for control applications where a continuous control signal is needed. PMID:25147509
Using hidden Markov models and observed evolution to annotate viral genomes.

PubMed

McCauley, Stephen; Hein, Jotun

2006-06-01

ssRNA (single stranded) viral genomes are generally constrained in length and utilize overlapping reading frames to maximally exploit the coding potential within the genome length restrictions. This overlapping coding phenomenon leads to complex evolutionary constraints operating on the genome. In regions which code for more than one protein, silent mutations in one reading frame generally have a protein coding effect in another. To maximize coding flexibility in all reading frames, overlapping regions are often compositionally biased towards amino acids which are 6-fold degenerate with respect to the 64 codon alphabet. Previous methodologies have used this fact in an ad hoc manner to look for overlapping genes by motif matching. In this paper differentiated nucleotide compositional patterns in overlapping regions are incorporated into a probabilistic hidden Markov model (HMM) framework which is used to annotate ssRNA viral genomes. This work focuses on single sequence annotation and applies an HMM framework to ssRNA viral annotation. A description of how the HMM is parameterized, whilst annotating within a missing data framework is given. A Phylogenetic HMM (Phylo-HMM) extension, as applied to 14 aligned HIV2 sequences is also presented. This evolutionary extension serves as an illustration of the potential of the Phylo-HMM framework for ssRNA viral genomic annotation. The single sequence annotation procedure (SSA) is applied to 14 different strains of the HIV2 virus. Further results on alternative ssRNA viral genomes are presented to illustrate more generally the performance of the method. The results of the SSA method are encouraging however there is still room for improvement, and since there is overwhelming evidence to indicate that comparative methods can improve coding sequence (CDS) annotation, the SSA method is extended to a Phylo-HMM to incorporate evolutionary information. The Phylo-HMM extension is applied to the same set of 14 HIV2 sequences which are pre-aligned. The performance improvement that results from including the evolutionary information in the analysis is illustrated.
A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.

PubMed

Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor

2017-08-30

Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
Natural Antisense Transcripts: Molecular Mechanisms and Implications in Breast Cancers

PubMed Central

Latgé, Guillaume; Poulet, Christophe; Bours, Vincent; Jerusalem, Guy

2018-01-01

Natural antisense transcripts are RNA sequences that can be transcribed from both DNA strands at the same locus but in the opposite direction from the gene transcript. Because strand-specific high-throughput sequencing of the antisense transcriptome has only been available for less than a decade, many natural antisense transcripts were first described as long non-coding RNAs. Although the precise biological roles of natural antisense transcripts are not known yet, an increasing number of studies report their implication in gene expression regulation. Their expression levels are altered in many physiological and pathological conditions, including breast cancers. Among the potential clinical utilities of the natural antisense transcripts, the non-coding|coding transcript pairs are of high interest for treatment. Indeed, these pairs can be targeted by antisense oligonucleotides to specifically tune the expression of the coding-gene. Here, we describe the current knowledge about natural antisense transcripts, their varying molecular mechanisms as gene expression regulators, and their potential as prognostic or predictive biomarkers in breast cancers. PMID:29301303
Natural Antisense Transcripts: Molecular Mechanisms and Implications in Breast Cancers.

PubMed

Latgé, Guillaume; Poulet, Christophe; Bours, Vincent; Josse, Claire; Jerusalem, Guy

2018-01-02

Natural antisense transcripts are RNA sequences that can be transcribed from both DNA strands at the same locus but in the opposite direction from the gene transcript. Because strand-specific high-throughput sequencing of the antisense transcriptome has only been available for less than a decade, many natural antisense transcripts were first described as long non-coding RNAs. Although the precise biological roles of natural antisense transcripts are not known yet, an increasing number of studies report their implication in gene expression regulation. Their expression levels are altered in many physiological and pathological conditions, including breast cancers. Among the potential clinical utilities of the natural antisense transcripts, the non-coding|coding transcript pairs are of high interest for treatment. Indeed, these pairs can be targeted by antisense oligonucleotides to specifically tune the expression of the coding-gene. Here, we describe the current knowledge about natural antisense transcripts, their varying molecular mechanisms as gene expression regulators, and their potential as prognostic or predictive biomarkers in breast cancers.
Complete nucleotide sequence of the freshwater unicellular cyanobacterium Synechococcus elongatus PCC 6301 chromosome: gene content and organization.

PubMed

Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru

2007-01-01

The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).
Tenebrio molitor antifreeze protein gene identification and regulation.

PubMed

Qin, Wensheng; Walker, Virginia K

2006-02-15

The yellow mealworm, Tenebrio molitor, is a freeze susceptible, stored product pest. Its winter survival is facilitated by the accumulation of antifreeze proteins (AFPs), encoded by a small gene family. We have now isolated 11 different AFP genomic clones from 3 genomic libraries. All the clones had a single coding sequence, with no evidence of intervening sequences. Three genomic clones were further characterized. All have putative TATA box sequences upstream of the coding regions and multiple potential poly(A) signal sequences downstream of the coding regions. A TmAFP regulatory region, B1037, conferred transcriptional activity when ligated to a luciferase reporter sequence and after transfection into an insect cell line. A 143 bp core promoter including a TATA box sequence was identified. Its promoter activity was increased 4.4 times by inserting an exotic 245 bp intron into the construct, similar to the enhancement of transgenic expression seen in several other systems. The addition of a duplication of the first 120 bp sequence from the 143 bp core promoter decreased promoter activity by half. Although putative hormonal response sequences were identified, none of the five hormones tested enhanced reporter activity. These studies on the mechanisms of AFP transcriptional control are important for the consideration of any transfer of freeze-resistance phenotypes to beneficial hosts.
Reading the Second Code: Mapping Epigenomes to Understand Plant Growth, Development, and Adaptation to the Environment[OA

PubMed Central

2012-01-01

We have entered a new era in agricultural and biomedical science made possible by remarkable advances in DNA sequencing technologies. The complete sequence of an individual’s set of chromosomes (collectively, its genome) provides a primary genetic code for what makes that individual unique, just as the contents of every personal computer reflect the unique attributes of its owner. But a second code, composed of “epigenetic” layers of information, affects the accessibility of the stored information and the execution of specific tasks. Nature’s second code is enigmatic and must be deciphered if we are to fully understand and optimize the genetic potential of crop plants. The goal of the Epigenomics of Plants International Consortium is to crack this second code, and ultimately master its control, to help catalyze a new green revolution. PMID:22751210
Draft Genome Sequence of Leptolyngbya sp. KIOST-1, a Filamentous Cyanobacterium with Biotechnological Potential for Alimentary Purposes

PubMed Central

Kim, Ji Hyung

2016-01-01

Here, we report the draft genome of cyanobacterium Leptolyngbya sp. KIOST-1 isolated from a microalgal culture pond in South Korea. The genome consists of 13 contigs containing 6,320,172 bp, and a total of 5,327 coding sequences were predicted. This genomic information will allow further exploitation of its biotechnological potential for alimentary purposes. PMID:27635005
Coded spread spectrum digital transmission system design study

NASA Technical Reports Server (NTRS)

Heller, J. A.; Odenwalder, J. P.; Viterbi, A. J.

1974-01-01

Results are presented of a comprehensive study of the performance of Viterbi-decoded convolutional codes in the presence of nonideal carrier tracking and bit synchronization. A constraint length 7, rate 1/3 convolutional code and parameters suitable for the space shuttle coded communications links are used. Mathematical models are developed and theoretical and simulation results are obtained to determine the tracking and acquisition performance of the system. Pseudorandom sequence spread spectrum techniques are also considered to minimize potential degradation caused by multipath.
Ancient DNA sequence revealed by error-correcting codes.

PubMed

Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

2015-07-10

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes

PubMed Central

Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

2015-01-01

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
Two-terminal video coding.

PubMed

Yang, Yang; Stanković, Vladimir; Xiong, Zixiang; Zhao, Wei

2009-03-01

Following recent works on the rate region of the quadratic Gaussian two-terminal source coding problem and limit-approaching code designs, this paper examines multiterminal source coding of two correlated, i.e., stereo, video sequences to save the sum rate over independent coding of both sequences. Two multiterminal video coding schemes are proposed. In the first scheme, the left sequence of the stereo pair is coded by H.264/AVC and used at the joint decoder to facilitate Wyner-Ziv coding of the right video sequence. The first I-frame of the right sequence is successively coded by H.264/AVC Intracoding and Wyner-Ziv coding. An efficient stereo matching algorithm based on loopy belief propagation is then adopted at the decoder to produce pixel-level disparity maps between the corresponding frames of the two decoded video sequences on the fly. Based on the disparity maps, side information for both motion vectors and motion-compensated residual frames of the right sequence are generated at the decoder before Wyner-Ziv encoding. In the second scheme, source splitting is employed on top of classic and Wyner-Ziv coding for compression of both I-frames to allow flexible rate allocation between the two sequences. Experiments with both schemes on stereo video sequences using H.264/AVC, LDPC codes for Slepian-Wolf coding of the motion vectors, and scalar quantization in conjunction with LDPC codes for Wyner-Ziv coding of the residual coefficients give a slightly lower sum rate than separate H.264/AVC coding of both sequences at the same video quality.
Benzofurazane as a new redox label for electrochemical detection of DNA: towards multipotential redox coding of DNA bases.

PubMed

Balintová, Jana; Plucnara, Medard; Vidláková, Pavlína; Pohl, Radek; Havran, Luděk; Fojta, Miroslav; Hocek, Michal

2013-09-16

Benzofurazane has been attached to nucleosides and dNTPs, either directly or through an acetylene linker, as a new redox label for electrochemical analysis of nucleotide sequences. Primer extension incorporation of the benzofurazane-modified dNTPs by polymerases has been developed for the construction of labeled oligonucleotide probes. In combination with nitrophenyl and aminophenyl labels, we have successfully developed a three-potential coding of DNA bases and have explored the relevant electrochemical potentials. The combination of benzofurazane and nitrophenyl reducible labels has proved to be excellent for ratiometric analysis of nucleotide sequences and is suitable for bioanalytical applications. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Draft Genome Sequence of Bacillus licheniformis Strain YNP1-TSU Isolated from Whiterock Springs in Yellowstone National Park

PubMed Central

O'Hair, Joshua A.; Li, Hui; Thapa, Santosh; Scholz, Matthew B.

2017-01-01

ABSTRACT Novel cellulolytic microorganisms can potentially influence second-generation biofuel production. This paper reports the draft genome sequence of Bacillus licheniformis strain YNP1-TSU, isolated from hydrothermal-vegetative microbiomes inside Yellowstone National Park. The assembled sequence contigs predicted 4,230 coding genes, 66 tRNAs, and 10 rRNAs through automated annotation. PMID:28254968
Assessing the 5S ribosomal RNA heterogeneity in Arabidopsis thaliana using short RNA next generation sequencing data.

PubMed

Szymanski, Maciej; Karlowski, Wojciech M

2016-01-01

In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.
Draft Genome Sequence of Leptolyngbya sp. KIOST-1, a Filamentous Cyanobacterium with Biotechnological Potential for Alimentary Purposes.

PubMed

Kim, Ji Hyung; Kang, Do-Hyung

2016-09-15

Here, we report the draft genome of cyanobacterium Leptolyngbya sp. KIOST-1 isolated from a microalgal culture pond in South Korea. The genome consists of 13 contigs containing 6,320,172 bp, and a total of 5,327 coding sequences were predicted. This genomic information will allow further exploitation of its biotechnological potential for alimentary purposes. Copyright © 2016 Kim and Kang.

Combined sequence and sequence-structure-based methods for analyzing RAAS gene SNPs: a computational approach.

PubMed

Singh, Kh Dhanachandra; Karthikeyan, Muthusamy

2014-12-01

The renin-angiotensin-aldosterone system (RAAS) plays a key role in the regulation of blood pressure (BP). Mutations on the genes that encode components of the RAAS have played a significant role in genetic susceptibility to hypertension and have been intensively scrutinized. The identification of such probably causal mutations not only provides insight into the RAAS but may also serve as antihypertensive therapeutic targets and diagnostic markers. The methods for analyzing the SNPs from the huge dataset of SNPs, containing both functional and neutral SNPs is challenging by the experimental approach on every SNPs to determine their biological significance. To explore the functional significance of genetic mutation (SNPs), we adopted combined sequence and sequence-structure-based SNP analysis algorithm. Out of 3864 SNPs reported in dbSNP, we found 108 missense SNPs in the coding region and remaining in the non-coding region. In this study, we are reporting only those SNPs in coding region to be deleterious when three or more tools are predicted to be deleterious and which have high RMSD from the native structure. Based on these analyses, we have identified two SNPs of REN gene, eight SNPs of AGT gene, three SNPs of ACE gene, two SNPs of AT1R gene, three SNPs of CYP11B2 gene and three SNPs of CMA1 gene in the coding region were found to be deleterious. Further this type of study will be helpful in reducing the cost and time for identification of potential SNP and also helpful in selecting potential SNP for experimental study out of SNP pool.
Performance of concatenated Reed-Solomon trellis-coded modulation over Rician fading channels

NASA Technical Reports Server (NTRS)

Moher, Michael L.; Lodge, John H.

1990-01-01

A concatenated coding scheme for providing very reliable data over mobile-satellite channels at power levels similar to those used for vocoded speech is described. The outer code is a shorter Reed-Solomon code which provides error detection as well as error correction capabilities. The inner code is a 1-D 8-state trellis code applied independently to both the inphase and quadrature channels. To achieve the full error correction potential of this inner code, the code symbols are multiplexed with a pilot sequence which is used to provide dynamic channel estimation and coherent detection. The implementation structure of this scheme is discussed and its performance is estimated.
Novel methodologies for spectral classification of exon and intron sequences

NASA Astrophysics Data System (ADS)

Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.

2012-12-01

Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence in which the choice of nucleotide to numeric mapping affects how well its biological properties can be preserved and reflected from nucleotide domain to numerical domain. Digital spectral analysis of nucleotide sequences unfolds a period-3 power spectral value which is more prominent in an exon sequence as compared to that of an intron sequence. The success of a period-3 based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them to existing codes to determine appropriate representation, and to introduce novel thresholding methods for more accurate period-3 based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows: Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers an attractive performance. A windowed 1-sequence numerical representation (with window length of 9, 15, and 24 bases) offers a possible speed gain over non-windowed 4-sequence Voss representation which increases as sequence length increases. A winner threshold value (chosen from the best among two defined threshold values and one other threshold value) offers a top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value applicable to an unknown and arbitrary length sequence can be estimated from the winner threshold values of fixed length sequences with a comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences to better reveal embedded properties, and has potential applications in improved genome annotation.
StarScan: a web server for scanning small RNA targets from degradome sequencing data.

PubMed

Liu, Shun; Li, Jun-Hao; Wu, Jie; Zhou, Ke-Ren; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

2015-07-01

Endogenous small non-coding RNAs (sRNAs), including microRNAs, PIWI-interacting RNAs and small interfering RNAs, play important gene regulatory roles in animals and plants by pairing to the protein-coding and non-coding transcripts. However, computationally assigning these various sRNAs to their regulatory target genes remains technically challenging. Recently, a high-throughput degradome sequencing method was applied to identify biologically relevant sRNA cleavage sites. In this study, an integrated web-based tool, StarScan (sRNA target Scan), was developed for scanning sRNA targets using degradome sequencing data from 20 species. Given a sRNA sequence from plants or animals, our web server performs an ultrafast and exhaustive search for potential sRNA-target interactions in annotated and unannotated genomic regions. The interactions between small RNAs and target transcripts were further evaluated using a novel tool, alignScore. A novel tool, degradomeBinomTest, was developed to quantify the abundance of degradome fragments located at the 9-11th nucleotide from the sRNA 5' end. This is the first web server for discovering potential sRNA-mediated RNA cleavage events in plants and animals, which affords mechanistic insights into the regulatory roles of sRNAs. The StarScan web server is available at http://mirlab.sysu.edu.cn/starscan/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Draft Genome Sequence of Pseudomonas chlororaphis ATCC 9446, a Nonpathogenic Bacterium with Bioremediation and Industrial Potential.

PubMed

Moreno-Avitia, Fabian; Lozano, Luis; Utrilla, Jose; Bolívar, Francisco; Escalante, Adelfo

2017-06-08

Pseudomonas chlororaphis strain ATCC 9446 is a biocontrol-related organism. We report here its draft genome sequence assembled into 35 contigs consisting of 6,783,030 bp. Genome annotation predicted a total of 6,200 genes, 6,128 coding sequences, 81 pseudogenes, 58 tRNAs, 4 noncoding RNAs (ncRNAs), and 41 frameshifted genes. Copyright © 2017 Moreno-Avitia et al.
Role of non-coding RNAs in non-aging-related neurological disorders.

PubMed

Vieira, A S; Dogini, D B; Lopes-Cendes, I

2018-06-11

Protein coding sequences represent only 2% of the human genome. Recent advances have demonstrated that a significant portion of the genome is actively transcribed as non-coding RNA molecules. These non-coding RNAs are emerging as key players in the regulation of biological processes, and act as "fine-tuners" of gene expression. Neurological disorders are caused by a wide range of genetic mutations, epigenetic and environmental factors, and the exact pathophysiology of many of these conditions is still unknown. It is currently recognized that dysregulations in the expression of non-coding RNAs are present in many neurological disorders and may be relevant in the mechanisms leading to disease. In addition, circulating non-coding RNAs are emerging as potential biomarkers with great potential impact in clinical practice. In this review, we discuss mainly the role of microRNAs and long non-coding RNAs in several neurological disorders, such as epilepsy, Huntington disease, fragile X-associated ataxia, spinocerebellar ataxias, amyotrophic lateral sclerosis (ALS), and pain. In addition, we give information about the conditions where microRNAs have demonstrated to be potential biomarkers such as in epilepsy, pain, and ALS.
Partial sequence homogenization in the 5S multigene families may generate sequence chimeras and spurious results in phylogenetic reconstructions.

PubMed

Galián, José A; Rosato, Marcela; Rosselló, Josep A

2014-03-01

Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Many evolutionary studies that used the nuclear 5S rDNA gene family rarely used contiguous 5S coding sequences due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within arrays sequence homogenization is partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.
Complete Sequence of the mitochondrial genome of the tapeworm Hymenolepis diminuta: Gene arrangements indicate that platyhelminths are eutrochozoans

DOE Office of Scientific and Technical Information (OSTI.GOV)

von Nickisch-Rosenegk, Markus; Brown, Wesley M.; Boore, Jeffrey L.

2001-01-01

Using ''long-PCR'' we have amplified in overlapping fragments the complete mitochondrial genome of the tapeworm Hymenolepis diminuta (Platyhelminthes: Cestoda) and determined its 13,900 nucleotide sequence. The gene content is the same as that typically found for animal mitochondrial DNA (mtDNA) except that atp8 appears to be lacking, a condition found previously for several other animals. Despite the small size of this mtDNA, there are two large non-coding regions, one of which contains 13 repeats of a 31 nucleotide sequence and a potential stem-loop structure of 25 base pairs with an 11-member loop. Large potential secondary structures are identified also formore » the non-coding regions of two other cestode mtDNAs. Comparison of the mitochondrial gene arrangement of H. diminuta with those previously published supports a phylogenetic position of flatworms as members of the Eutrochozoa, rather than being basal to either a clade of protostomes or a clade of coelomates.« less
Molecular detection of bovine Noroviruses in Argentinean dairy calves: Circulation of a tentative new genotype.

PubMed

Ferragut, Fátima; Vega, Celina G; Mauroy, Axel; Conceição-Neto, Nádia; Zeller, Mark; Heylen, Elisabeth; Uriarte, Enrique Louge; Bilbao, Gladys; Bok, Marina; Matthijnssens, Jelle; Thiry, Etienne; Badaracco, Alejandra; Parreño, Viviana

2016-06-01

Bovine noroviruses are enteric pathogens detected in fecal samples of both diarrheic and non-diarrheic calves from several countries worldwide. However, epidemiological information regarding bovine noroviruses is still lacking for many important cattle producing countries from South America. In this study, three bovine norovirus genogroup III sequences were determined by conventional RT-PCR and Sanger sequencing in feces from diarrheic dairy calves from Argentina (B4836, B4848, and B4881, all collected in 2012). Phylogenetic studies based on a partial coding region for the RNA-dependent RNA polymerase (RdRp, 503 nucleotides) of these three samples suggested that two of them (B4836 and B4881) belong to genotype 2 (GIII.2) while the third one (B4848) was more closely related to genotype 1 (GIII.1) strains. By deep sequencing, the capsid region from two of these strains could be determined. This confirmed the circulation of genotype 1 (B4848) together with the presence of another sequence (B4881) sharing its highest genetic relatedness with genotype 1, but sufficiently distant to constitute a new genotype. This latter strain was shown in silico to be a recombinant: phylogenetic divergence was detected between its RNA-dependent RNA polymerase coding sequence (genotype GIII.2) and its capsid protein coding sequence (genotype GIII.1 or a potential norovirus genotype). According to this data, this strain could be the second genotype GIII.2_GIII.1 bovine norovirus recombinant described in literature worldwide. Further analysis suggested that this strain could even be a potential norovirus GIII genotype, tentatively named GIII.4. The data provides important epidemiological and evolutionary information on bovine noroviruses circulating in South America. Copyright © 2016. Published by Elsevier B.V.
Aquaporin 2 of Rhipicephalus (Boophilus) microplus as a potential target to control ticks and tick-borne parasites

USDA-ARS?s Scientific Manuscript database

In a collaboration with Washington State University and ARS-Pullman, WA researchers, we identified and sequenced a 1,059 base pair Rhipicephalus microplus transcript that contained the coding region for a water channel protein, Aquaporin 2 (RmAQP2). The clone sequencing resulted in the production of...
Draft Genome Sequence of the Entomopathogenic Bacterium Bacillus pumilus 15.1, a Strain Highly Toxic to the Mediterranean Fruit Fly Ceratitis capitata

PubMed Central

García-Ramón, Diana C.; Palma, Leopoldo; Berry, Colin; Osuna, Antonio

2015-01-01

We present the draft whole-genome sequence of the entomopathogenic Bacillus pumilus 15.1 strain that consists of 3,795,691 bp and 3,776 predicted protein-coding genes. This genome sequence provides the basis for understanding the potential mechanism behind the toxicity and virulence of B. pumilus 15.1 against the Mediterranean fruit fly. PMID:26404596
De novo assembly and characterization of the Trichuris trichiura adult worm transcriptome using Ion Torrent sequencing.

PubMed

Santos, Leonardo N; Silva, Eduardo S; Santos, André S; De Sá, Pablo H; Ramos, Rommel T; Silva, Artur; Cooper, Philip J; Barreto, Maurício L; Loureiro, Sebastião; Pinheiro, Carina S; Alcantara-Neves, Neuza M; Pacheco, Luis G C

2016-07-01

Infection with helminthic parasites, including the soil-transmitted helminth Trichuris trichiura (human whipworm), has been shown to modulate host immune responses and, consequently, to have an impact on the development and manifestation of chronic human inflammatory diseases. De novo derivation of helminth proteomes from sequencing of transcriptomes will provide valuable data to aid identification of parasite proteins that could be evaluated as potential immunotherapeutic molecules in near future. Herein, we characterized the transcriptome of the adult stage of the human whipworm T. trichiura, using next-generation sequencing technology and a de novo assembly strategy. Nearly 17.6 million high-quality clean reads were assembled into 6414 contiguous sequences, with an N50 of 1606bp. In total, 5673 protein-encoding sequences were confidentially identified in the T. trichiura adult worm transcriptome; of these, 1013 sequences represent potential newly discovered proteins for the species, most of which presenting orthologs already annotated in the related species T. suis. A number of transcripts representing probable novel non-coding transcripts for the species T. trichiura were also identified. Among the most abundant transcripts, we found sequences that code for proteins involved in lipid transport, such as vitellogenins, and several chitin-binding proteins. Through a cross-species expression analysis of gene orthologs shared by T. trichiura and the closely related parasites T. suis and T. muris it was possible to find twenty-six protein-encoding genes that are consistently highly expressed in the adult stages of the three helminth species. Additionally, twenty transcripts could be identified that code for proteins previously detected by mass spectrometry analysis of protein fractions of the whipworm somatic extract that present immunomodulatory activities. Five of these transcripts were amongst the most highly expressed protein-encoding sequences in the T. trichiura adult worm. Besides, orthologs of proteins demonstrated to have potent immunomodulatory properties in related parasitic helminths were also predicted from the T. trichiura de novo assembled transcriptome. Copyright © 2016. Published by Elsevier B.V.
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages

PubMed Central

Cambridge, Joshua M.; Blinkova, Alexandra L.; Salvador Rocha, Erick I.; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M.; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O.

2018-01-01

Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12–14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism. PMID:29293521
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages.

PubMed

Cambridge, Joshua M; Blinkova, Alexandra L; Salvador Rocha, Erick I; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O; Walker, James R

2018-01-01

Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12-14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism.
Non-Additive Expression of Homoeologous Genes is Established Upon Polyploidization in Hexaploid Wheat

USDA-ARS?s Scientific Manuscript database

Traditional views on the potential genetic effects of polyploidy in allohexaploid wheat (Triticum aestivum L.) have primarily emphasized aspects of greater coding sequence variation and the enhanced potential to acquire new gene functions through mutation of redundant loci. The extent and significa...
GeneBuilder: interactive in silico prediction of gene structure.

PubMed

Milanesi, L; D'Angelo, D; Rogozin, I B

1999-01-01

Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of the WebGene a the URL: http://www.itba.mi. cnr.it/webgene and TRADAT (TRAncription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
The GENCODE exome: sequencing the complete human exome

PubMed Central

Coffey, Alison J; Kokocinski, Felix; Calafato, Maria S; Scott, Carol E; Palta, Priit; Drury, Eleanor; Joyce, Christopher J; LeProust, Emily M; Harrow, Jen; Hunt, Sarah; Lehesjoki, Anna-Elina; Turner, Daniel J; Hubbard, Tim J; Palotie, Aarno

2011-01-01

Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing. PMID:21364695
Improving performance of DS-CDMA systems using chaotic complex Bernoulli spreading codes

NASA Astrophysics Data System (ADS)

Farzan Sabahi, Mohammad; Dehghanfard, Ali

2014-12-01

The most important goal of spreading spectrum communication system is to protect communication signals against interference and exploitation of information by unintended listeners. In fact, low probability of detection and low probability of intercept are two important parameters to increase the performance of the system. In Direct Sequence Code Division Multiple Access (DS-CDMA) systems, these properties are achieved by multiplying the data information in spreading sequences. Chaotic sequences, with their particular properties, have numerous applications in constructing spreading codes. Using one-dimensional Bernoulli chaotic sequence as spreading code is proposed in literature previously. The main feature of this sequence is its negative auto-correlation at lag of 1, which with proper design, leads to increase in efficiency of the communication system based on these codes. On the other hand, employing the complex chaotic sequences as spreading sequence also has been discussed in several papers. In this paper, use of two-dimensional Bernoulli chaotic sequences is proposed as spreading codes. The performance of a multi-user synchronous and asynchronous DS-CDMA system will be evaluated by applying these sequences under Additive White Gaussian Noise (AWGN) and fading channel. Simulation results indicate improvement of the performance in comparison with conventional spreading codes like Gold codes as well as similar complex chaotic spreading sequences. Similar to one-dimensional Bernoulli chaotic sequences, the proposed sequences also have negative auto-correlation. Besides, construction of complex sequences with lower average cross-correlation is possible with the proposed method.
Single stock dynamics on high-frequency data: from a compressed coding perspective.

PubMed

Fushing, Hsieh; Chen, Shu-Chun; Hwang, Chii-Ruey

2014-01-01

High-frequency return, trading volume and transaction number are digitally coded via a nonparametric computing algorithm, called hierarchical factor segmentation (HFS), and then are coupled together to reveal a single stock dynamics without global state-space structural assumptions. The base-8 digital coding sequence, which is capable of revealing contrasting aggregation against sparsity of extreme events, is further compressed into a shortened sequence of state transitions. This compressed digital code sequence vividly demonstrates that the aggregation of large absolute returns is the primary driving force for stimulating both the aggregations of large trading volumes and transaction numbers. The state of system-wise synchrony is manifested with very frequent recurrence in the stock dynamics. And this data-driven dynamic mechanism is seen to correspondingly vary as the global market transiting in and out of contraction-expansion cycles. These results not only elaborate the stock dynamics of interest to a fuller extent, but also contradict some classical theories in finance. Overall this version of stock dynamics is potentially more coherent and realistic, especially when the current financial market is increasingly powered by high-frequency trading via computer algorithms, rather than by individual investors.
Single Stock Dynamics on High-Frequency Data: From a Compressed Coding Perspective

PubMed Central

Fushing, Hsieh; Chen, Shu-Chun; Hwang, Chii-Ruey

2014-01-01

High-frequency return, trading volume and transaction number are digitally coded via a nonparametric computing algorithm, called hierarchical factor segmentation (HFS), and then are coupled together to reveal a single stock dynamics without global state-space structural assumptions. The base-8 digital coding sequence, which is capable of revealing contrasting aggregation against sparsity of extreme events, is further compressed into a shortened sequence of state transitions. This compressed digital code sequence vividly demonstrates that the aggregation of large absolute returns is the primary driving force for stimulating both the aggregations of large trading volumes and transaction numbers. The state of system-wise synchrony is manifested with very frequent recurrence in the stock dynamics. And this data-driven dynamic mechanism is seen to correspondingly vary as the global market transiting in and out of contraction-expansion cycles. These results not only elaborate the stock dynamics of interest to a fuller extent, but also contradict some classical theories in finance. Overall this version of stock dynamics is potentially more coherent and realistic, especially when the current financial market is increasingly powered by high-frequency trading via computer algorithms, rather than by individual investors. PMID:24586235

SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

PubMed Central

Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

2001-01-01

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
Whole-genome landscapes of major melanoma subtypes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hayward, Nicholas K.; Wilmott, James S.; Waddell, Nicola

Melanoma of the skin is a common cancer only in Europeans, whereas it arises in internal body surfaces (mucosal sites) and on the hands and feet (acral sites) in people throughout the world. We report analysis of whole-genome sequences from cutaneous, acral and mucosal subtypes of melanoma. The heavily mutated landscape of coding and non-coding mutations in cutaneous melanoma resolved novel signatures of mutagenesis attributable to ultraviolet radiation. But, acral and mucosal melanomas were dominated by structural changes and mutation signatures of unknown aetiology, not previously identified in melanoma. The number of genes affected by recurrent mutations disrupting non-coding sequencesmore » was similar to that affected by recurrent mutations to coding sequences. Significantly mutated genes included BRAF, CDKN2A, NRAS and TP53 in cutaneous melanoma, BRAF, NRAS and NF1 in acral melanoma and SF3B1 in mucosal melanoma. Mutations affecting the TERT promoter were the most frequent of all; however, neither they nor ATRX mutations, which correlate with alternative telomere lengthening, were associated with greater telomere length. In most cases, melanomas had potentially actionable mutations, most in components of the mitogen-activated protein kinase and phosphoinositol kinase pathways. The whole-genome mutation landscape of melanoma reveals diverse carcinogenic processes across its subtypes, some unrelated to sun exposure, and extends potential involvement of the non-coding genome in its pathogenesis.« less
Whole-genome landscapes of major melanoma subtypes

DOE PAGES

Hayward, Nicholas K.; Wilmott, James S.; Waddell, Nicola; ...

2017-05-03

Melanoma of the skin is a common cancer only in Europeans, whereas it arises in internal body surfaces (mucosal sites) and on the hands and feet (acral sites) in people throughout the world. We report analysis of whole-genome sequences from cutaneous, acral and mucosal subtypes of melanoma. The heavily mutated landscape of coding and non-coding mutations in cutaneous melanoma resolved novel signatures of mutagenesis attributable to ultraviolet radiation. But, acral and mucosal melanomas were dominated by structural changes and mutation signatures of unknown aetiology, not previously identified in melanoma. The number of genes affected by recurrent mutations disrupting non-coding sequencesmore » was similar to that affected by recurrent mutations to coding sequences. Significantly mutated genes included BRAF, CDKN2A, NRAS and TP53 in cutaneous melanoma, BRAF, NRAS and NF1 in acral melanoma and SF3B1 in mucosal melanoma. Mutations affecting the TERT promoter were the most frequent of all; however, neither they nor ATRX mutations, which correlate with alternative telomere lengthening, were associated with greater telomere length. In most cases, melanomas had potentially actionable mutations, most in components of the mitogen-activated protein kinase and phosphoinositol kinase pathways. The whole-genome mutation landscape of melanoma reveals diverse carcinogenic processes across its subtypes, some unrelated to sun exposure, and extends potential involvement of the non-coding genome in its pathogenesis.« less
Differentiated evolutionary conservatism and lack of polymorphism of crucial sex determination genes (SRY and SOX9) in four species of the family Canidae.

PubMed

Nowacka-Woszuk, Joanna; Switonski, Marek

2009-01-01

The sex determination process is under the control of several genes of which two (SRY and SOX9), encoding transcription factors, play a crucial role. It is well-known that mutations at these genes may cause the development of an intersexual phenotype. The aim of this study was to conduct a comparative analysis of the coding sequence and 5'-flanking regions of both genes in four species of the family Canidae (the dog, red fox, arctic fox and Chinese raccoon dog). Similarity of the coding sequence of the SOX9 gene among the studied species was higher (99.7-99.9%) than in the case of the SRY gene (96.7-97.3%). Only single nucleotide changes were found in the compared coding sequences, whereas in the 5'-flanking region of both genes nucleotide substitutions, as well as insertions and deletions were observed. None of the changes detected in the 5'-flanking region occurred within the potential consensus sequences for transcription factors. No polymorphism was found for either of these genes in any of the analyzed species.
MicroRNA-200c Modulates the Expression of MUC4 and MUC16 by Directly Targeting Their Coding Sequences in Human Pancreatic Cancer

PubMed Central

Radhakrishnan, Prakash; Mohr, Ashley M.; Grandgenett, Paul M.; Steele, Maria M.; Batra, Surinder K.; Hollingsworth, Michael A.

2013-01-01

Transmembrane mucins, MUC4 and MUC16 are associated with tumor progression and metastatic potential in human pancreatic adenocarcinoma. We discovered that miR-200c interacts with specific sequences within the coding sequence of MUC4 and MUC16 mRNAs, and evaluated the regulatory nature of this association. Pancreatic cancer cell lines S2.028 and T3M-4 transfected with miR-200c showed a 4.18 and 8.50 fold down regulation of MUC4 mRNA, and 4.68 and 4.82 fold down regulation of MUC16 mRNA compared to mock-transfected cells, respectively. A significant reduction of glycoprotein expression was also observed. These results indicate that miR-200c overexpression regulates MUC4 and MUC16 mucins in pancreatic cancer cells by directly targeting the mRNA coding sequence of each, resulting in reduced levels of MUC4 and MUC16 mRNA and protein. These data suggest that, in addition to regulating proteins that modulate EMT, miR-200c influences expression of cell surface mucins in pancreatic cancer. PMID:24204560
MicroRNA-200c modulates the expression of MUC4 and MUC16 by directly targeting their coding sequences in human pancreatic cancer.

PubMed

Radhakrishnan, Prakash; Mohr, Ashley M; Grandgenett, Paul M; Steele, Maria M; Batra, Surinder K; Hollingsworth, Michael A

2013-01-01

Transmembrane mucins, MUC4 and MUC16 are associated with tumor progression and metastatic potential in human pancreatic adenocarcinoma. We discovered that miR-200c interacts with specific sequences within the coding sequence of MUC4 and MUC16 mRNAs, and evaluated the regulatory nature of this association. Pancreatic cancer cell lines S2.028 and T3M-4 transfected with miR-200c showed a 4.18 and 8.50 fold down regulation of MUC4 mRNA, and 4.68 and 4.82 fold down regulation of MUC16 mRNA compared to mock-transfected cells, respectively. A significant reduction of glycoprotein expression was also observed. These results indicate that miR-200c overexpression regulates MUC4 and MUC16 mucins in pancreatic cancer cells by directly targeting the mRNA coding sequence of each, resulting in reduced levels of MUC4 and MUC16 mRNA and protein. These data suggest that, in addition to regulating proteins that modulate EMT, miR-200c influences expression of cell surface mucins in pancreatic cancer.
The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames

DOE Office of Scientific and Technical Information (OSTI.GOV)

Solovyev, V.V.; Salamov, A.A.; Lawrence, C.B.

1994-12-31

Discriminant analysis is applied to the problem of recognition 5`-, internal and 3`-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of particular types. The method based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine the information about significant triplet frequencies of various functional parts of splice site regions and preferences of oligonucleotide in protein coding and nation regions. The accuracy of our splice site recognition function is about 97%. A discriminant function for 5`-exon prediction includes hexanucleotide composition of upstream region, triplet composition around the ATG codon, ORF codingmore » potential, donor splice site potential and composition of downstream introit region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5`- intron region, donor splice site, coding region, acceptor splice site and Y-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89%, for exon sequences and 98% for intron sequences. A discriminant function for 3`-exon prediction includes octanucleolide composition of upstream nation region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of downstream region. We unite these three discriminant functions in exon predicting program FEX (find exons). FEX exactly predicts 70% of 1016 exons from the test of 181 complete genes with specificity 73%, and 89% exons are exactly or partially predicted. On the average, 85% of nucleotides were predicted accurately with specificity 91%.« less
[Transposition errors during learning to reproduce a sequence by the right- and the left-hand movements: simulation of positional and movement coding].

PubMed

Liakhovetskiĭ, V A; Bobrova, E V; Skopin, G N

2012-01-01

Transposition errors during the reproduction of a hand movement sequence make it possible to receive important information on the internal representation of this sequence in the motor working memory. Analysis of such errors showed that learning to reproduce sequences of the left-hand movements improves the system of positional coding (coding ofpositions), while learning of the right-hand movements improves the system of vector coding (coding of movements). Learning of the right-hand movements after the left-hand performance involved the system of positional coding "imposed" by the left hand. Learning of the left-hand movements after the right-hand performance activated the system of vector coding. Transposition errors during learning to reproduce movement sequences can be explained by neural network using either vector coding or both vector and positional coding.
Functional Testing of SLC26A4 Variants—Clinical and Molecular Analysis of a Cohort with Enlarged Vestibular Aqueduct from Austria

PubMed Central

Bernardinelli, Emanuele; Nofziger, Charity; Patsch, Wolfgang; Rasp, Gerd; Paulmichl, Markus; Dossena, Silvia

2018-01-01

The prevalence and spectrum of sequence alterations in the SLC26A4 gene, which codes for the anion exchanger pendrin, are population-specific and account for at least 50% of cases of non-syndromic hearing loss associated with an enlarged vestibular aqueduct. A cohort of nineteen patients from Austria with hearing loss and a radiological alteration of the vestibular aqueduct underwent Sanger sequencing of SLC26A4 and GJB2, coding for connexin 26. The pathogenicity of sequence alterations detected was assessed by determining ion transport and molecular features of the corresponding SLC26A4 protein variants. In this group, four uncharacterized sequence alterations within the SLC26A4 coding region were found. Three of these lead to protein variants with abnormal functional and molecular features, while one should be considered with no pathogenic potential. Pathogenic SLC26A4 sequence alterations were only found in 12% of patients. SLC26A4 sequence alterations commonly found in other Caucasian populations were not detected. This survey represents the first study on the prevalence and spectrum of SLC26A4 sequence alterations in an Austrian cohort and further suggests that genetic testing should always be integrated with functional characterization and determination of the molecular features of protein variants in order to unequivocally identify or exclude a causal link between genotype and phenotype. PMID:29320412
Informational structure of genetic sequences and nature of gene splicing

NASA Astrophysics Data System (ADS)

Trifonov, E. N.

1991-10-01

Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence ``talks'' to various bimolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus, being a composite structure in which one had the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of the Gnomic, language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their ``ontogenetic'' way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications is to resolve such conflicts. New data are presented strongly indicating that the gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
Genetic Code Analysis Toolkit: A novel tool to explore the coding properties of the genetic code and DNA sequences

NASA Astrophysics Data System (ADS)

Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.

2018-01-01

The genetic code is degenerated and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under current research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: Open source Homepage:http://www.gcat.bio/
SEQassembly: A Practical Tools Program for Coding Sequences Splicing

NASA Astrophysics Data System (ADS)

Lee, Hongbin; Yang, Hang; Fu, Lei; Qin, Long; Li, Huili; He, Feng; Wang, Bo; Wu, Xiaoming

CDS (Coding Sequences) is a portion of mRNA sequences, which are composed by a number of exon sequence segments. The construction of CDS sequence is important for profound genetic analysis such as genotyping. A program in MATLAB environment is presented, which can process batch of samples sequences into code segments under the guide of reference exon models, and splice these code segments of same sample source into CDS according to the exon order in queue file. This program is useful in transcriptional polymorphism detection and gene function study.
Correlation approach to identify coding regions in DNA sequences

NASA Technical Reports Server (NTRS)

Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

1994-01-01

Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
Statistical properties of DNA sequences

NASA Technical Reports Server (NTRS)

Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

1995-01-01

We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
Cellulases and coding sequences

DOEpatents

Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

2001-02-20

The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Cellulases and coding sequences

DOEpatents

Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

2001-01-01

The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Comparative Mitogenomics of Plant Bugs (Hemiptera: Miridae): Identifying the AGG Codon Reassignments between Serine and Lysine

PubMed Central

Wang, Pei; Song, Fan; Cai, Wanzhi

2014-01-01

Insect mitochondrial genomes are very important to understand the molecular evolution as well as for phylogenetic and phylogeographic studies of the insects. The Miridae are the largest family of Heteroptera encompassing more than 11,000 described species and of great economic importance. For better understanding the diversity and the evolution of plant bugs, we sequence five new mitochondrial genomes and present the first comparative analysis of nine mitochondrial genomes of mirids available to date. Our result showed that gene content, gene arrangement, base composition and sequences of mitochondrial transcription termination factor were conserved in plant bugs. Intra-genus species shared more conserved genomic characteristics, such as nucleotide and amino acid composition of protein-coding genes, secondary structure and anticodon mutations of tRNAs, and non-coding sequences. Control region possessed several distinct characteristics, including: variable size, abundant tandem repetitions, and intra-genus conservation; and was useful in evolutionary and population genetic studies. The AGG codon reassignments were investigated between serine and lysine in the genera Adelphocoris and other cimicomorphans. Our analysis revealed correlated evolution between reassignments of the AGG codon and specific point mutations at the antidocons of tRNALys and tRNASer(AGN). Phylogenetic analysis indicated that mitochondrial genome sequences were useful in resolving family level relationship of Cimicomorpha. Comparative evolutionary analysis of plant bug mitochondrial genomes allowed the identification of previously neglected coding genes or non-coding regions as potential molecular markers. The finding of the AGG codon reassignments between serine and lysine indicated the parallel evolution of the genetic code in Hemiptera mitochondrial genomes. PMID:24988409
Polyomavirus BK non-coding control region rearrangements in health and disease.

PubMed

Sharma, Preety M; Gupta, Gaurav; Vats, Abhay; Shapiro, Ron; Randhawa, Parmjeet S

2007-08-01

BK virus is an increasingly recognized pathogen in transplanted patients. DNA sequencing of this virus shows considerable genomic variability. To understand the clinical significance of rearrangements in the non-coding control region (NCCR) of BK virus (BKV), we report a meta-analysis of 507 sequences, including 40 sequences generated in our own laboratory, for associations between rearrangements and disease, tissue tropism, geographic origin, and viral genotype. NCCR rearrangements were less frequent in (a) asymptomatic BKV viruria compared to patients viral nephropathy (1.7% vs. 22.5%), and (b) viral genotype 1 compared to other genotypes (2.4% vs. 11.2%). Rearrangements were commoner in malignancy (78.6%), and Norwegians (45.7%), and less common in East Indians (0%), and Japanese (4.3%). A surprising number of rearranged sequences were reported from mononuclear cells of healthy subjects, whereas most plasma sequences were archetypal. This difference could not be related to potential recombinase activity in lymphocytes, as consensus recombination signal sequences could not be found in the NCCR region. NCCR rearrangements are neither required nor a sufficient condition to produce clinical disease. BKV nephropathy and hemorrhagic cystitis are not associated with any unique NCCR configuration or nucleotide sequence.
Disease-Causing 7.4 kb Cis-Regulatory Deletion Disrupting Conserved Non-Coding Sequences and Their Interaction with the FOXL2 Promotor: Implications for Mutation Screening

PubMed Central

Dostie, Josée; Lemire, Edmond; Bouchard, Philippe; Field, Michael; Jones, Kristie; Lorenz, Birgit; Menten, Björn; Buysse, Karen; Pattyn, Filip; Friedli, Marc; Ucla, Catherine; Rossier, Colette; Wyss, Carine; Speleman, Frank; De Paepe, Anne; Dekker, Job; Antonarakis, Stylianos E.; De Baere, Elfride

2009-01-01

To date, the contribution of disrupted potentially cis-regulatory conserved non-coding sequences (CNCs) to human disease is most likely underestimated, as no systematic screens for putative deleterious variations in CNCs have been conducted. As a model for monogenic disease we studied the involvement of genetic changes of CNCs in the cis-regulatory domain of FOXL2 in blepharophimosis syndrome (BPES). Fifty-seven molecularly unsolved BPES patients underwent high-resolution copy number screening and targeted sequencing of CNCs. Apart from three larger distant deletions, a de novo deletion as small as 7.4 kb was found at 283 kb 5′ to FOXL2. The deletion appeared to be triggered by an H-DNA-induced double-stranded break (DSB). In addition, it disrupts a novel long non-coding RNA (ncRNA) PISRT1 and 8 CNCs. The regulatory potential of the deleted CNCs was substantiated by in vitro luciferase assays. Interestingly, Chromosome Conformation Capture (3C) of a 625 kb region surrounding FOXL2 in expressing cellular systems revealed physical interactions of three upstream fragments and the FOXL2 core promoter. Importantly, one of these contains the 7.4 kb deleted fragment. Overall, this study revealed the smallest distant deletion causing monogenic disease and impacts upon the concept of mutation screening in human disease and developmental disorders in particular. PMID:19543368
Kullback Leibler divergence in complete bacterial and phage genomes

PubMed Central

Akhter, Sajia; Kashef, Mona T.; Ibrahim, Eslam S.; Bailey, Barbara

2017-01-01

The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses. PMID:29204318

Kullback Leibler divergence in complete bacterial and phage genomes.

PubMed

Akhter, Sajia; Aziz, Ramy K; Kashef, Mona T; Ibrahim, Eslam S; Bailey, Barbara; Edwards, Robert A

2017-01-01

The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback-Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.
Complete genome sequence of Pseudomonas citronellolis P3B5, a candidate for microbial phyllo-remediation of hydrocarbon-contaminated sites.

PubMed

Remus-Emsermann, Mitja N P; Schmid, Michael; Gekenidis, Maria-Theresia; Pelludat, Cosima; Frey, Jürg E; Ahrens, Christian H; Drissner, David

2016-01-01

Pseudomonas citronellolis is a Gram negative, motile gammaproteobacterium belonging to the order Pseudomonadales and the family Pseudomonadaceae . We isolated strain P3B5 from the phyllosphere of basil plants ( Ocimum basilicum L.). Here we describe the physiology of this microorganism, its full genome sequence, and detailed annotation. The 6.95 Mbp genome contains 6071 predicted protein coding sequences and 96 RNA coding sequences. P. citronellolis has been the subject of many studies including the investigation of long-chain aliphatic compounds and terpene degradation. Plant leaves are covered by long-chain aliphates making up a waxy layer that is associated with the leaf cuticle. In addition, basil leaves are known to contain high amounts of terpenoid substances, hinting to a potential nutrient niche that might be exploited by P. citronellolis . Furthermore, the isolated strain exhibited resistance to several antibiotics. To evaluate the potential of this strain as source of transferable antibiotic resistance genes on raw consumed herbs we therefore investigated if those resistances are encoded on mobile genetic elements. The availability of the genome will be helpful for comparative genomics of the phylogenetically broad pseudomonads, in particular with the sequence of the P. citronellolis type strain PRJDB205 not yet publicly available. The genome is discussed with respect to a phyllosphere related lifestyle, aliphate and terpenoid degradation, and antibiotic resistance.
DeNovoGUI: An Open Source Graphical User Interface for de Novo Sequencing of Tandem Mass Spectra

PubMed Central

2013-01-01

De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissible Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com. PMID:24295440
DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.

PubMed

Muth, Thilo; Weilnböck, Lisa; Rapp, Erdmann; Huber, Christian G; Martens, Lennart; Vaudel, Marc; Barsnes, Harald

2014-02-07

De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissible Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com .
Comparative genomics of biotechnologically important yeasts

USDA-ARS?s Scientific Manuscript database

Ascomycete yeasts are metabolically diverse, with great potential for biotechnology. Here, we report the comparative genome analysis of 29 taxonomically and biotechnologically important yeasts, including 16 newly sequenced. We identify a genetic code change, CUG-Ala, in Pachysolen tannophilus in the...
Next generation sequencing in women affected by nonsyndromic premature ovarian failure displays new potential causative genes and mutations.

PubMed

Fonseca, Dora Janeth; Patiño, Liliana Catherine; Suárez, Yohjana Carolina; de Jesús Rodríguez, Asid; Mateus, Heidi Eliana; Jiménez, Karen Marcela; Ortega-Recalde, Oscar; Díaz-Yamal, Ivonne; Laissue, Paul

2015-07-01

To identify new molecular actors involved in nonsyndromic premature ovarian failure (POF) etiology. This is a retrospective case-control cohort study. University research group and IVF medical center. Twelve women affected by nonsyndromic POF. The control group included 176 women whose menopause had occurred after age 50 and had no antecedents regarding gynecological disease. A further 345 women from the same ethnic origin (general population group) were also recruited to assess allele frequency for potentially deleterious sequence variants. Next generation sequencing (NGS), Sanger sequencing, and bioinformatics analysis. The complete coding regions of 70 candidate genes were massively sequenced, via NGS, in POF patients. Bioinformatics and genetics were used to confirm NGS results and to identify potential sequence variants related to the disease pathogenesis. We have identified mutations in two novel genes, ADAMTS19 and BMPR2, that are potentially related to POF origin. LHCGR mutations, which might have contributed to the phenotype, were also detected. We thus recommend NGS as a powerful tool for identifying new molecular actors in POF and for future diagnostic/prognostic purposes. Copyright © 2015 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
PepLine: a software pipeline for high-throughput direct mapping of tandem mass spectrometry data on genomic sequences.

PubMed

Ferro, Myriam; Tardif, Marianne; Reguer, Erwan; Cahuzac, Romain; Bruley, Christophe; Vermat, Thierry; Nugues, Estelle; Vigouroux, Marielle; Vandenbrouck, Yves; Garin, Jérôme; Viari, Alain

2008-05-01

PepLine is a fully automated software which maps MS/MS fragmentation spectra of trypsic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped on the six-frame translations of genomic sequences (second module) giving hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component to allow the whole pipeline to proceed in a fully automated manner using raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exons sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana envelope chloroplast samples. Our results demonstrate that PepLine competed with protein database searching softwares and was fast enough to potentially tackle large data sets and/or high size genomes. We also illustrate the potential of this approach for the detection of the intron/exon structure of genes.
Artificial Intelligence, DNA Mimicry, and Human Health.

PubMed

Stefano, George B; Kream, Richard M

2017-08-14

The molecular evolution of genomic DNA across diverse plant and animal phyla involved dynamic registrations of sequence modifications to maintain existential homeostasis to increasingly complex patterns of environmental stressors. As an essential corollary, driver effects of positive evolutionary pressure are hypothesized to effect concerted modifications of genomic DNA sequences to meet expanded platforms of regulatory controls for successful implementation of advanced physiological requirements. It is also clearly apparent that preservation of updated registries of advantageous modifications of genomic DNA sequences requires coordinate expansion of convergent cellular proofreading/error correction mechanisms that are encoded by reciprocally modified genomic DNA. Computational expansion of operationally defined DNA memory extends to coordinate modification of coding and previously under-emphasized noncoding regions that now appear to represent essential reservoirs of untapped genetic information amenable to evolutionary driven recruitment into the realm of biologically active domains. Additionally, expansion of DNA memory potential via chemical modification and activation of noncoding sequences is targeted to vertical augmentation and integration of an expanded cadre of transcriptional and epigenetic regulatory factors affecting linear coding of protein amino acid sequences within open reading frames.
A 12-year molecular survey of clinical herpes simplex virus type 2 isolates demonstrates the circulation of clade A and B strains in Germany.

PubMed

Schmidt-Chanasit, Jonas; Bialonski, Alexandra; Heinemann, Patrick; Ulrich, Rainer G; Günther, Stephan; Rabenau, Holger F; Doerr, Hans Wilhelm

2010-07-01

Recently two different herpes simplex virus type 2 (HSV-2) clades (A and B) were described on DNA sequence data of the glycoprotein E (gE), G (gG) and I (gI) genes. To type the circulating HSV-2 wild-type strains in Germany by a novel approach and to monitor potential changes in the molecular epidemiology between 1997 and 2008. A total of 64 clinical HSV-2 isolates were analyzed by a novel approach using the DNA sequences of the complete open reading frames of glycoprotein B (gB) and gG. Recombination analysis of the gB and gG gene sequences was performed to reveal intragenic recombinants. Based on the phylogenetic analysis of the gB coding DNA sequence 8 of 64 (12%) isolates were classified as clade A strains and 56 of 64 (88%) isolates were classified as clade B strains. Analysis of the gG coding DNA sequence classified 4 (6%) isolates as clade A strains and 60 (94%) isolates as clade B strains. In comparison, the 8 isolates classified as clade A strains using the gB sequence data were classified as clade B strains when using the gG coding DNA sequence, suggesting intergenic recombination events. Intragenic recombination events were not detected. The first molecular survey of clinical HSV-2 isolates from Germany demonstrated the circulation of clade A and B strains and of intergenic recombinants over a period of 12 years. Copyright (c) 2010 Elsevier B.V. All rights reserved.
Draft Genome Sequence of Bacillus cereus CITVM-11.1, a Strain Exhibiting Interesting Antifungal Activities.

PubMed

Caballero, Javier; Peralta, Cecilia; Molla, Antonella; Del Valle, Eleodoro E; Caballero, Primitivo; Berry, Colin; Felipe, Verónica; Yaryura, Pablo; Palma, Leopoldo

2018-01-01

Bacillus cereus is a gram-positive, spore-forming bacterium possessing an important and historical record as a human-pathogenic bacterium. However, several strains of this species exhibit interesting potential to be used as plant growth-promoting rhizobacteria. Here, we report the draft genome sequence of B. cereus strain CITVM-11.1, which consists of 37 contig sequences, accounting for 5,746,486 bp (with a GC content of 34.8%) and 5,752 predicted protein-coding sequences. Several of them could potentially be involved in plant-bacterium interactions and may contribute to the strong antagonistic activity shown by this strain against the charcoal root rot fungus, Macrophomina phaseolina. This genomic sequence also showed a number of genes that may confer this strain resistance against several polluting heavy metals and for the bioconversion of mycotoxins. © 2018 S. Karger AG, Basel.
Visually lossless compression of digital hologram sequences

NASA Astrophysics Data System (ADS)

Darakis, Emmanouil; Kowiel, Marcin; Näsänen, Risto; Naughton, Thomas J.

2010-01-01

Digital hologram sequences have great potential for the recording of 3D scenes of moving macroscopic objects as their numerical reconstruction can yield a range of perspective views of the scene. Digital holograms inherently have large information content and lossless coding of holographic data is rather inefficient due to the speckled nature of the interference fringes they contain. Lossy coding of still holograms and hologram sequences has shown promising results. By definition, lossy compression introduces errors in the reconstruction. In all of the previous studies, numerical metrics were used to measure the compression error and through it, the coding quality. Digital hologram reconstructions are highly speckled and the speckle pattern is very sensitive to data changes. Hence, numerical quality metrics can be misleading. For example, for low compression ratios, a numerically significant coding error can have visually negligible effects. Yet, in several cases, it is of high interest to know how much lossy compression can be achieved, while maintaining the reconstruction quality at visually lossless levels. Using an experimental threshold estimation method, the staircase algorithm, we determined the highest compression ratio that was not perceptible to human observers for objects compressed with Dirac and MPEG-4 compression methods. This level of compression can be regarded as the point below which compression is perceptually lossless although physically the compression is lossy. It was found that up to 4 to 7.5 fold compression can be obtained with the above methods without any perceptible change in the appearance of video sequences.
FORTRAN Automated Code Evaluation System (faces) system documentation, version 2, mod 0. [error detection codes/user manuals (computer programs)

NASA Technical Reports Server (NTRS)

1975-01-01

A system is presented which processes FORTRAN based software systems to surface potential problems before they become execution malfunctions. The system complements the diagnostic capabilities of compilers, loaders, and execution monitors rather than duplicating these functions. Also, it emphasizes frequent sources of FORTRAN problems which require inordinate manual effort to identify. The principle value of the system is extracting small sections of unusual code from the bulk of normal sequences. Code structures likely to cause immediate or future problems are brought to the user's attention. These messages stimulate timely corrective action of solid errors and promote identification of 'tricky' code. Corrective action may require recoding or simply extending software documentation to explain the unusual technique.
Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

PubMed Central

Hall, L; Laird, J E; Craig, R K

1984-01-01

Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. Images Fig. 1. PMID:6548375
Amyotrophic lateral sclerosis onset is influenced by the burden of rare variants in known amyotrophic lateral sclerosis genes.

PubMed

Cady, Janet; Allred, Peggy; Bali, Taha; Pestronk, Alan; Goate, Alison; Miller, Timothy M; Mitra, Robi D; Ravits, John; Harms, Matthew B; Baloh, Robert H

2015-01-01

To define the genetic landscape of amyotrophic lateral sclerosis (ALS) and assess the contribution of possible oligogenic inheritance, we aimed to comprehensively sequence 17 known ALS genes in 391 ALS patients from the United States. Targeted pooled-sample sequencing was used to identify variants in 17 ALS genes. Fragment size analysis was used to define ATXN2 and C9ORF72 expansion sizes. Genotype-phenotype correlations were made with individual variants and total burden of variants. Rare variant associations for risk of ALS were investigated at both the single variant and gene level. A total of 64.3% of familial and 27.8% of sporadic subjects carried potentially pathogenic novel or rare coding variants identified by sequencing or an expanded repeat in C9ORF72 or ATXN2; 3.8% of subjects had variants in >1 ALS gene, and these individuals had disease onset 10 years earlier (p = 0.0046) than subjects with variants in a single gene. The number of potentially pathogenic coding variants did not influence disease duration or site of onset. Rare and potentially pathogenic variants in known ALS genes are present in >25% of apparently sporadic and 64% of familial patients, significantly higher than previous reports using less comprehensive sequencing approaches. A significant number of subjects carried variants in >1 gene, which influenced the age of symptom onset and supports oligogenic inheritance as relevant to disease pathogenesis. © 2014 American Neurological Association.
DNA barcode goes two-dimensions: DNA QR code web server.

PubMed

Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

2012-01-01

The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
Lichenase and coding sequences

DOEpatents

Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

2000-08-15

The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-.beta.-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.
Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity

PubMed Central

Shabalina, Svetlana A.; Spiridonov, Nikolay A.; Kashina, Anna

2013-01-01

Messenger RNA is a key component of an intricate regulatory network of its own. It accommodates numerous nucleotide signals that overlap protein coding sequences and are responsible for multiple levels of regulation and generation of biological complexity. A wealth of structural and regulatory information, which mRNA carries in addition to the encoded amino acid sequence, raises the question of how these signals and overlapping codes are delineated along non-synonymous and synonymous positions in protein coding regions, especially in eukaryotes. Silent or synonymous codon positions, which do not determine amino acid sequences of the encoded proteins, define mRNA secondary structure and stability and affect the rate of translation, folding and post-translational modifications of nascent polypeptides. The RNA level selection is acting on synonymous sites in both prokaryotes and eukaryotes and is more common than previously thought. Selection pressure on the coding gene regions follows three-nucleotide periodic pattern of nucleotide base-pairing in mRNA, which is imposed by the genetic code. Synonymous positions of the coding regions have a higher level of hybridization potential relative to non-synonymous positions, and are multifunctional in their regulatory and structural roles. Recent experimental evidence and analysis of mRNA structure and interspecies conservation suggest that there is an evolutionary tradeoff between selective pressure acting at the RNA and protein levels. Here we provide a comprehensive overview of the studies that define the role of silent positions in regulating RNA structure and processing that exert downstream effects on proteins and their functions. PMID:23293005
FRAGS: estimation of coding sequence substitution rates from fragmentary data

PubMed Central

Swart, Estienne C; Hide, Winston A; Seoighe, Cathal

2004-01-01

Background Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data. PMID:15005802
Current Research on Non-Coding Ribonucleic Acid (RNA).

PubMed

Wang, Jing; Samuels, David C; Zhao, Shilin; Xiang, Yu; Zhao, Ying-Yong; Guo, Yan

2017-12-05

Non-coding ribonucleic acid (RNA) has without a doubt captured the interest of biomedical researchers. The ability to screen the entire human genome with high-throughput sequencing technology has greatly enhanced the identification, annotation and prediction of the functionality of non-coding RNAs. In this review, we discuss the current landscape of non-coding RNA research and quantitative analysis. Non-coding RNA will be categorized into two major groups by size: long non-coding RNAs and small RNAs. In long non-coding RNA, we discuss regular long non-coding RNA, pseudogenes and circular RNA. In small RNA, we discuss miRNA, transfer RNA, piwi-interacting RNA, small nucleolar RNA, small nuclear RNA, Y RNA, single recognition particle RNA, and 7SK RNA. We elaborate on the origin, detection method, and potential association with disease, putative functional mechanisms, and public resources for these non-coding RNAs. We aim to provide readers with a complete overview of non-coding RNAs and incite additional interest in non-coding RNA research.
Visual pattern image sequence coding

NASA Technical Reports Server (NTRS)

Silsbee, Peter; Bovik, Alan C.; Chen, Dapang

1990-01-01

The visual pattern image coding (VPIC) configurable digital image-coding process is capable of coding with visual fidelity comparable to the best available techniques, at compressions which (at 30-40:1) exceed all other technologies. These capabilities are associated with unprecedented coding efficiencies; coding and decoding operations are entirely linear with respect to image size and entail a complexity that is 1-2 orders of magnitude faster than any previous high-compression technique. The visual pattern image sequence coding to which attention is presently given exploits all the advantages of the static VPIC in the reduction of information from an additional, temporal dimension, to achieve unprecedented image sequence coding performance.

The primary transcriptome of the marine diazotroph Trichodesmium erythraeum IMS101

NASA Astrophysics Data System (ADS)

Pfreundt, Ulrike; Kopf, Matthias; Belkin, Natalia; Berman-Frank, Ilana; Hess, Wolfgang R.

2014-08-01

Blooms of the dinitrogen-fixing marine cyanobacterium Trichodesmium considerably contribute to new nitrogen inputs into tropical oceans. Intriguingly, only 60% of the Trichodesmium erythraeum IMS101 genome sequence codes for protein, compared with ~85% in other sequenced cyanobacterial genomes. The extensive non-coding genome fraction suggests space for an unusually high number of unidentified, potentially regulatory non-protein-coding RNAs (ncRNAs). To identify the transcribed fraction of the genome, here we present a genome-wide map of transcriptional start sites (TSS) at single nucleotide resolution, revealing the activity of 6,080 promoters. We demonstrate that T. erythraeum has the highest number of actively splicing group II introns and the highest percentage of TSS yielding ncRNAs of any bacterium examined to date. We identified a highly transcribed retroelement that serves as template repeat for the targeted mutation of at least 12 different genes by mutagenic homing. Our findings explain the non-coding portion of the T. erythraeum genome by the transcription of an unusually high number of non-coding transcripts in addition to the known high incidence of transposable elements. We conclude that riboregulation and RNA maturation-dependent processes constitute a major part of the Trichodesmium regulatory apparatus.
[Influence of "prehistory" of sequential movements of the right and the left hand on reproduction: coding of positions, movements and sequence structure].

PubMed

Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R

2011-01-01

The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.
Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues

PubMed Central

Kukurba, Kimberly R.; Zhang, Rui; Li, Xin; Smith, Kevin S.; Knowles, David A.; How Tan, Meng; Piskol, Robert; Lek, Monkol; Snyder, Michael; MacArthur, Daniel G.; Li, Jin Billy; Montgomery, Stephen B.

2014-01-01

Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants. PMID:24786518
Discrete Cosine Transform Image Coding With Sliding Block Codes

NASA Astrophysics Data System (ADS)

Divakaran, Ajay; Pearlman, William A.

1989-11-01

A transform trellis coding scheme for images is presented. A two dimensional discrete cosine transform is applied to the image followed by a search on a trellis structured code. This code is a sliding block code that utilizes a constrained size reproduction alphabet. The image is divided into blocks by the transform coding. The non-stationarity of the image is counteracted by grouping these blocks in clusters through a clustering algorithm, and then encoding the clusters separately. Mandela ordered sequences are formed from each cluster i.e identically indexed coefficients from each block are grouped together to form one dimensional sequences. A separate search ensues on each of these Mandela ordered sequences. Padding sequences are used to improve the trellis search fidelity. The padding sequences absorb the error caused by the building up of the trellis to full size. The simulations were carried out on a 256x256 image ('LENA'). The results are comparable to any existing scheme. The visual quality of the image is enhanced considerably by the padding and clustering.
DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server

PubMed Central

Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

2012-01-01

The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113
Platelet function is modified by common sequence variation in megakaryocyte super enhancers

PubMed Central

Petersen, Romina; Lambourne, John J.; Javierre, Biola M.; Grassi, Luigi; Kreuzhuber, Roman; Ruklisa, Dace; Rosa, Isabel M.; Tomé, Ana R.; Elding, Heather; van Geffen, Johanna P.; Jiang, Tao; Farrow, Samantha; Cairns, Jonathan; Al-Subaie, Abeer M.; Ashford, Sofie; Attwood, Antony; Batista, Joana; Bouman, Heleen; Burden, Frances; Choudry, Fizzah A.; Clarke, Laura; Flicek, Paul; Garner, Stephen F.; Haimel, Matthias; Kempster, Carly; Ladopoulos, Vasileios; Lenaerts, An-Sofie; Materek, Paulina M.; McKinney, Harriet; Meacham, Stuart; Mead, Daniel; Nagy, Magdolna; Penkett, Christopher J.; Rendon, Augusto; Seyres, Denis; Sun, Benjamin; Tuna, Salih; van der Weide, Marie-Elise; Wingett, Steven W.; Martens, Joost H.; Stegle, Oliver; Richardson, Sylvia; Vallier, Ludovic; Roberts, David J.; Freson, Kathleen; Wernisch, Lorenz; Stunnenberg, Hendrik G.; Danesh, John; Fraser, Peter; Soranzo, Nicole; Butterworth, Adam S.; Heemskerk, Johan W.; Turro, Ernest; Spivakov, Mikhail; Ouwehand, Willem H.; Astle, William J.; Downes, Kate; Kostadima, Myrto; Frontini, Mattia

2017-01-01

Linking non-coding genetic variants associated with the risk of diseases or disease-relevant traits to target genes is a crucial step to realize GWAS potential in the introduction of precision medicine. Here we set out to determine the mechanisms underpinning variant association with platelet quantitative traits using cell type-matched epigenomic data and promoter long-range interactions. We identify potential regulatory functions for 423 of 565 (75%) non-coding variants associated with platelet traits and we demonstrate, through ex vivo and proof of principle genome editing validation, that variants in super enhancers play an important role in controlling archetypical platelet functions. PMID:28703137
CRITICA: coding region identification tool invoking comparative analysis

NASA Technical Reports Server (NTRS)

Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

1999-01-01

Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in/pub/critica) and on the World Wide Web (http:/(/)rdpwww.life.uiuc.edu).
Flexible manipulation of terahertz wave reflection using polarization insensitive coding metasurfaces.

PubMed

Jiu-Sheng, Li; Ze-Jiang, Zhao; Jian-Quan, Yao

2017-11-27

In order to extend to 3-bit encoding, we propose notched-wheel structures as polarization insensitive coding metasurfaces to control terahertz wave reflection and suppress backward scattering. By using a coding sequence of "00110011…" along x-axis direction and 16 × 16 random coding sequence, we investigate the polarization insensitive properties of the coding metasurfaces. By designing the coding sequences of the basic coding elements, the terahertz wave reflection can be flexibly manipulated. Additionally, radar cross section (RCS) reduction in the backward direction is less than -10dB in a wide band. The present approach can offer application for novel terahertz manipulation devices.
Noncoding sequence classification based on wavelet transform analysis: part I

NASA Astrophysics Data System (ADS)

Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

2017-09-01

DNA sequences in human genome can be divided into the coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap Project projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode.

PubMed

Cai, Yong; Li, Peng; Li, Xi-Wen; Zhao, Jing; Chen, Hai; Yang, Qing; Hu, Hao

2017-07-01

In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. In order to improve the compression efficiency, GATC2Bytes and digital merger compression algorithms are proposed. HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China and the internal transcribed spacer 2 (ITS2) sequence code as the DNA sequence code were ready for conversion. In order to convert such data into a two-dimensional code, the following six steps were performed: First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, precompression processing of such data sets is undertaken. Third, precompression processing was undertaken with the P. ginseng DNA (ITS2) sequence codes. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined in accordance with the set data format. Such combined data can be compressed by Zlib, an open source data compression algorithm. Finally, the compressed data generated a two-dimensional code called a quick response code (QR code). Through the abovementioned converting process, it can be found that the number of bytes needed for storing P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can be greatly reduced. After GTCA2Bytes algorithm processing, the ITS2 compression rate reaches 75% and the chemical fingerprint compression rate exceeds 99.65% via filtration and digital merger compression algorithm processing. Therefore, the overall compression ratio even exceeds 99.36%. The capacity of the formed QR code is around 0.5k, which can easily and successfully be read and identified by any smartphone. P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can form a QR code after data processing, and therefore the QR code can be a perfect carrier of the authenticity and quality of P. ginseng information. This study provides a theoretical basis for the development of a quality traceability system of traditional Chinese medicine based on a two-dimensional code.
Many human accelerated regions are developmental enhancers

PubMed Central

Capra, John A.; Erwin, Genevieve D.; McKinsey, Gabriel; Rubenstein, John L. R.; Pollard, Katherine S.

2013-01-01

The genetic changes underlying the dramatic differences in form and function between humans and other primates are largely unknown, although it is clear that gene regulatory changes play an important role. To identify regulatory sequences with potentially human-specific functions, we and others used comparative genomics to find non-coding regions conserved across mammals that have acquired many sequence changes in humans since divergence from chimpanzees. These regions are good candidates for performing human-specific regulatory functions. Here, we analysed the DNA sequence, evolutionary history, histone modifications, chromatin state and transcription factor (TF) binding sites of a combined set of 2649 non-coding human accelerated regions (ncHARs) and predicted that at least 30% of them function as developmental enhancers. We prioritized the predicted ncHAR enhancers using analysis of TF binding site gain and loss, along with the functional annotations and expression patterns of nearby genes. We then tested both the human and chimpanzee sequence for 29 ncHARs in transgenic mice, and found 24 novel developmental enhancers active in both species, 17 of which had very consistent patterns of activity in specific embryonic tissues. Of these ncHAR enhancers, five drove expression patterns suggestive of different activity for the human and chimpanzee sequence at embryonic day 11.5. The changes to human non-coding DNA in these ncHAR enhancers may modify the complex patterns of gene expression necessary for proper development in a human-specific manner and are thus promising candidates for understanding the genetic basis of human-specific biology. PMID:24218637
Long non-coding RNA discovery across the genus anopheles reveals conserved secondary structures within and beyond the Gambiae complex.

PubMed

Jenkins, Adam M; Waterhouse, Robert M; Muskavitch, Marc A T

2015-04-23

Long non-coding RNAs (lncRNAs) have been defined as mRNA-like transcripts longer than 200 nucleotides that lack significant protein-coding potential, and many of them constitute scaffolds for ribonucleoprotein complexes with critical roles in epigenetic regulation. Various lncRNAs have been implicated in the modulation of chromatin structure, transcriptional and post-transcriptional gene regulation, and regulation of genomic stability in mammals, Caenorhabditis elegans, and Drosophila melanogaster. The purpose of this study is to identify the lncRNA landscape in the malaria vector An. gambiae and assess the evolutionary conservation of lncRNAs and their secondary structures across the Anopheles genus. Using deep RNA sequencing of multiple Anopheles gambiae life stages, we have identified 2,949 lncRNAs and more than 300 previously unannotated putative protein-coding genes. The lncRNAs exhibit differential expression profiles across life stages and adult genders. We find that across the genus Anopheles, lncRNAs display much lower sequence conservation than protein-coding genes. Additionally, we find that lncRNA secondary structure is highly conserved within the Gambiae complex, but diverges rapidly across the rest of the genus Anopheles. This study offers one of the first lncRNA secondary structure analyses in vector insects. Our description of lncRNAs in An. gambiae offers the most comprehensive genome-wide insights to date into lncRNAs in this vector mosquito, and defines a set of potential targets for the development of vector-based interventions that may further curb the human malaria burden in disease-endemic countries.
Is a Genome a Codeword of an Error-Correcting Code?

PubMed Central

Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo

2012-01-01

Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
Rare and Coding Region Genetic Variants Associated With Risk of Ischemic Stroke: The NHLBI Exome Sequence Project.

PubMed

Auer, Paul L; Nalls, Mike; Meschia, James F; Worrall, Bradford B; Longstreth, W T; Seshadri, Sudha; Kooperberg, Charles; Burger, Kathleen M; Carlson, Christopher S; Carty, Cara L; Chen, Wei-Min; Cupples, L Adrienne; DeStefano, Anita L; Fornage, Myriam; Hardy, John; Hsu, Li; Jackson, Rebecca D; Jarvik, Gail P; Kim, Daniel S; Lakshminarayan, Kamakshi; Lange, Leslie A; Manichaikul, Ani; Quinlan, Aaron R; Singleton, Andrew B; Thornton, Timothy A; Nickerson, Deborah A; Peters, Ulrike; Rich, Stephen S

2015-07-01

Stroke is the second leading cause of death and the third leading cause of years of life lost. Genetic factors contribute to stroke prevalence, and candidate gene and genome-wide association studies (GWAS) have identified variants associated with ischemic stroke risk. These variants often have small effects without obvious biological significance. Exome sequencing may discover predicted protein-altering variants with a potentially large effect on ischemic stroke risk. To investigate the contribution of rare and common genetic variants to ischemic stroke risk by targeting the protein-coding regions of the human genome. The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) analyzed approximately 6000 participants from numerous cohorts of European and African ancestry. For discovery, 365 cases of ischemic stroke (small-vessel and large-vessel subtypes) and 809 European ancestry controls were sequenced; for replication, 47 affected sibpairs concordant for stroke subtype and an African American case-control series were sequenced, with 1672 cases and 4509 European ancestry controls genotyped. The ESP's exome sequencing and genotyping started on January 1, 2010, and continued through June 30, 2012. Analyses were conducted on the full data set between July 12, 2012, and July 13, 2013. Discovery of new variants or genes contributing to ischemic stroke risk and subtype (primary analysis) and determination of support for protein-coding variants contributing to risk in previously published candidate genes (secondary analysis). We identified 2 novel genes associated with an increased risk of ischemic stroke: a protein-coding variant in PDE4DIP (rs1778155; odds ratio, 2.15; P = 2.63 × 10(-8)) with an intracellular signal transduction mechanism and in ACOT4 (rs35724886; odds ratio, 2.04; P = 1.24 × 10(-7)) with a fatty acid metabolism; confirmation of PDE4DIP was observed in affected sibpair families with large-vessel stroke subtype and in African Americans. Replication of protein-coding variants in candidate genes was observed for 2 previously reported GWAS associations: ZFHX3 (cardioembolic stroke) and ABCA1 (large-vessel stroke). Exome sequencing discovered 2 novel genes and mechanisms, PDE4DIP and ACOT4, associated with increased risk for ischemic stroke. In addition, ZFHX3 and ABCA1 were discovered to have protein-coding variants associated with ischemic stroke. These results suggest that genetic variation in novel pathways contributes to ischemic stroke risk and serves as a target for prediction, prevention, and therapy.
In-depth characterization of breast cancer tumor-promoting cell transcriptome by RNA sequencing and microarrays

PubMed Central

Soldà, Giulia; Merlino, Giuseppe; Fina, Emanuela; Brini, Elena; Moles, Anna; Cappelletti, Vera; Daidone, Maria Grazia

2016-01-01

Numerous studies have reported the existence of tumor-promoting cells (TPC) with self-renewal potential and a relevant role in drug resistance. However, pathways and modifications involved in the maintenance of such tumor subpopulations are still only partially understood. Sequencing-based approaches offer the opportunity for a detailed study of TPC including their transcriptome modulation. Using microarrays and RNA sequencing approaches, we compared the transcriptional profiles of parental MCF7 breast cancer cells with MCF7-derived TPC (i.e. MCFS). Data were explored using different bioinformatic approaches, and major findings were experimentally validated. The different analytical pipelines (Lifescope and Cufflinks based) yielded similar although not identical results. RNA sequencing data partially overlapped microarray results and displayed a higher dynamic range, although overall the two approaches concordantly predicted pathway modifications. Several biological functions were altered in TPC, ranging from production of inflammatory cytokines (i.e., IL-8 and MCP-1) to proliferation and response to steroid hormones. More than 300 non-coding RNAs were defined as differentially expressed, and 2,471 potential splicing events were identified. A consensus signature of genes up-regulated in TPC was derived and was found to be significantly associated with insensitivity to fulvestrant in a public breast cancer patient dataset. Overall, we obtained a detailed portrait of the transcriptome of a breast cancer TPC line, highlighted the role of non-coding RNAs and differential splicing, and identified a gene signature with a potential as a context-specific biomarker in patients receiving endocrine treatment. PMID:26556871
The application of coded excitation technology in medical ultrasonic Doppler imaging

NASA Astrophysics Data System (ADS)

Li, Weifeng; Chen, Xiaodong; Bao, Jing; Yu, Daoyin

2008-03-01

Medical ultrasonic Doppler imaging is one of the most important domains of modern medical imaging technology. The application of coded excitation technology in medical ultrasonic Doppler imaging system has the potential of higher SNR and deeper penetration depth than conventional pulse-echo imaging system, it also improves the image quality, and enhances the sensitivity of feeble signal, furthermore, proper coded excitation is beneficial to received spectrum of Doppler signal. Firstly, this paper analyzes the application of coded excitation technology in medical ultrasonic Doppler imaging system abstractly, showing the advantage and bright future of coded excitation technology, then introduces the principle and the theory of coded excitation. Secondly, we compare some coded serials (including Chirp and fake Chirp signal, Barker codes, Golay's complementary serial, M-sequence, etc). Considering Mainlobe Width, Range Sidelobe Level, Signal-to-Noise Ratio and sensitivity of Doppler signal, we choose Barker codes as coded serial. At last, we design the coded excitation circuit. The result in B-mode imaging and Doppler flow measurement coincided with our expectation, which incarnated the advantage of application of coded excitation technology in Digital Medical Ultrasonic Doppler Endoscope Imaging System.
Context influences on TALE–DNA binding revealed by quantitative profiling

PubMed Central

Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.

2015-01-01

Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805
Context influences on TALE-DNA binding revealed by quantitative profiling.

PubMed

Rogers, Julia M; Barrera, Luis A; Reyon, Deepak; Sander, Jeffry D; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L

2015-06-11

Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE-DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000-20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE-DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.
The chloroplast tRNALys(UUU) gene from mustard (Sinapis alba) contains a class II intron potentially coding for a maturase-related polypeptide.

PubMed

Neuhaus, H; Link, G

1987-01-01

The trnK gene endocing the tRNALys(UUU) has been located on mustard (Sinapis alba) chloroplast DNA, 263 bp upstream of the psbA gene on the same strand. The nucleotide sequence of the trnK gene and its flanking regions as well as the putative transcription start and termination sites are shown. The 5' end of the transcript lies 121 bp upstream of the 5' tRNA coding region and is preceded by procaryotic-type "-10" and "-35" sequence elements, while the 3' end maps 2.77 kb downstream to a DNA region with possible stemloop secondary structure. The anticodon loop of the tRNALys is interrupted by a 2,574 bp intron containing a long open reading frame, which codes for 524 amino acids. Based on conserved stem and loop structures, this intron has characteristic features of a class II intron. A region near the carboxyl terminus of the derived polypeptide appears structurally related to maturases.
Comparative analysis of mitochondrial genomes between a wheat K-type cytoplasmic male sterility (CMS) line and its maintainer line.

PubMed

Liu, Huitao; Cui, Peng; Zhan, Kehui; Lin, Qiang; Zhuo, Guoyin; Guo, Xiaoli; Ding, Feng; Yang, Wenlong; Liu, Dongcheng; Hu, Songnian; Yu, Jun; Zhang, Aimin

2011-03-29

Plant mitochondria, semiautonomous organelles that function as manufacturers of cellular ATP, have their own genome that has a slow rate of evolution and rapid rearrangement. Cytoplasmic male sterility (CMS), a common phenotype in higher plants, is closely associated with rearrangements in mitochondrial DNA (mtDNA), and is widely used to produce F1 hybrid seeds in a variety of valuable crop species. Novel chimeric genes deduced from mtDNA rearrangements causing CMS have been identified in several plants, such as rice, sunflower, pepper, and rapeseed, but there are very few reports about mtDNA rearrangements in wheat. In the present work, we describe the mitochondrial genome of a wheat K-type CMS line and compare it with its maintainer line. The complete mtDNA sequence of a wheat K-type (with cytoplasm of Aegilops kotschyi) CMS line, Ks3, was assembled into a master circle (MC) molecule of 647,559 bp and found to harbor 34 known protein-coding genes, three rRNAs (18 S, 26 S, and 5 S rRNAs), and 16 different tRNAs. Compared to our previously published sequence of a K-type maintainer line, Km3, we detected Ks3-specific mtDNA (> 100 bp, 11.38%) and repeats (> 100 bp, 29 units) as well as genes that are unique to each line: rpl5 was missing in Ks3 and trnH was absent from Km3. We also defined 32 single nucleotide polymorphisms (SNPs) in 13 protein-coding, albeit functionally irrelevant, genes, and predicted 22 unique ORFs in Ks3, representing potential candidates for K-type CMS. All these sequence variations are candidates for involvement in CMS. A comparative analysis of the mtDNA of several angiosperms, including those from Ks3, Km3, rice, maize, Arabidopsis thaliana, and rapeseed, showed that non-coding sequences of higher plants had mostly divergent multiple reorganizations during the mtDNA evolution of higher plants. The complete mitochondrial genome of the wheat K-type CMS line Ks3 is very different from that of its maintainer line Km3, especially in non-coding sequences. Sequence rearrangement has produced novel chimeric ORFs, which may be candidate genes for CMS. Comparative analysis of several angiosperm mtDNAs indicated that non-coding sequences are the most frequently reorganized during mtDNA evolution in higher plants.

Genomic Identification and Analysis of Shared Cis-regulator Elements in a Developmentally Critical homeobox Cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chris Amemiya

2003-04-01

The goals of this project were to isolate, characterize, and sequence the Dlx3/Dlx7 bigene cluster from twelve different species of mammals. The Dlx3 and Dlx7 genes are known to encode homeobox transcription factors involved in patterning of structures in the vertebrate jaw as well as vertebrate limbs. Genomic sequences from the respective taxa will subsequently be compared in order to identify conserved non-coding sequences that are potential cis-regulatory elements. Based on the comparisons they will fashion transgenic mouse experiments to functionally test the strength of the potential cis-regulatory elements. A goal of the project is to attempt to identify thosemore » elements that may function in coordinately regulating both Dlx3 and Dlx7 functions.« less
Primary structure of prostaglandin G/H synthase from sheep vesicular gland determined from the complementary DNA sequence.

PubMed Central

DeWitt, D L; Smith, W L

1988-01-01

Prostaglandin G/H synthase (8,11,14-icosatrienoate, hydrogen-donor:oxygen oxidoreductase, EC 1.14.99.1) catalyzes the first step in the formation of prostaglandins and thromboxanes, the conversion of arachidonic acid to prostaglandin endoperoxides G and H. This enzyme is the site of action of nonsteroidal anti-inflammatory drugs. We have isolated a 2.7-kilobase complementary DNA (cDNA) encompassing the entire coding region of prostaglandin G/H synthase from sheep vesicular glands. This cDNA, cloned from a lambda gt 10 library prepared from poly(A)+ RNA of vesicular glands, hybridizes with a single 2.75-kilobase mRNA species. The cDNA clone was selected using oligonucleotide probes modeled from amino acid sequences of tryptic peptides prepared from the purified enzyme. The full-length cDNA encodes a protein of 600 amino acids, including a signal sequence of 24 amino acids. Identification of the cDNA as coding for prostaglandin G/H synthase is based on comparison of amino acid sequences of seven peptides comprising 103 amino acids with the amino acid sequence deduced from the nucleotide sequence of the cDNA. The molecular weight of the unglycosylated enzyme lacking the signal peptide is 65,621. The synthase is a glycoprotein, and there are three potential sites for N-glycosylation, two of them in the amino-terminal half of the molecule. The serine reported to be acetylated by aspirin is at position 530, near the carboxyl terminus. There is no significant similarity between the sequence of the synthase and that of any other protein in amino acid or nucleotide sequence libraries, and a heme binding site(s) is not apparent from the amino acid sequence. The availability of a full-length cDNA clone coding for prostaglandin G/H synthase should facilitate studies of the regulation of expression of this enzyme and the structural features important for catalysis and for interaction with anti-inflammatory drugs. Images PMID:3125548
Cloning and sequence analysis of the invertase gene INV 1 from the yeast Pichia anomala.

PubMed

Pérez, J A; Rodríguez, J; Rodríguez, L; Ruiz, T

1996-02-01

A genomic library from the yeast Pichia anomala has been constructed and employed to clone the gene encoding the sucrose-hydrolysing enzyme invertase by complementation of a sucrose non-fermenting mutant of Saccharomyces cerevisiae. The cloned gene, INV1, was sequenced and found to encode a polypeptide of 550 amino acids which contained a 22 amino-acid signal sequence and ten potential glycosylation sites. The amino-acid sequence shows significant identity with other yeast invertases and also with Kluyveromyces marxianus inulinase, a yeast beta-fructofuranosidase which has a different substrate specificity. The nucleotide sequences of the 5' and 3' non-coding regions were found to contain several consensus motifs probably involved in the initiation and termination of gene transcription.
Using information content and base frequencies to distinguish mutations from genetic polymorphisms in splice junction recognition sites.

PubMed

Rogan, P K; Schneider, T D

1995-01-01

Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
Sequence similarity is more relevant than species specificity in probabilistic backtranslation.

PubMed

Ferro, Alfredo; Giugno, Rosalba; Pigola, Giuseppe; Pulvirenti, Alfredo; Di Pietro, Cinzia; Purrello, Michele; Ragusa, Marco

2007-02-21

Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. This paper describes EasyBack, a new parameter-free, fully-automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. The prediction quality of a protein backtranslation methis markedly increased by replacing the criterion of most used codon in the same species with a Hidden Markov Model trained with a set of most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
Euglena gracilis chloroplast DNA: analysis of a 1.6 kb intron of the psb C gene containing an open reading frame of 458 codons.

PubMed

Montandon, P E; Vasserot, A; Stutz, E

1986-01-01

We retrieved a 1.6 kbp intron separating two exons of the psb C gene which codes for the 44 kDa reaction center protein of photosystem II. This intron is 3 to 4 times the size of all previously sequenced Euglena gracilis chloroplast introns. It contains an open reading frame of 458 codons potentially coding for a basic protein of 54 kDa of yet unknown function. The intron boundaries follow consensus sequences established for chloroplast introns related to class II and nuclear pre-mRNA introns. Its 3'-terminal segment has structural features similar to class II mitochondrial introns with an invariant base A as possible branch point for lariat formation.
Quantitative Profiling of Peptides from RNAs classified as non-coding

PubMed Central

Prabakaran, Sudhakaran; Hemberg, Martin; Chauhan, Ruchi; Winter, Dominic; Tweedie-Cullen, Ry Y.; Dittrich, Christian; Hong, Elizabeth; Gunawardena, Jeremy; Steen, Hanno; Kreiman, Gabriel; Steen, Judith A.

2014-01-01

Only a small fraction of the mammalian genome codes for messenger RNAs destined to be translated into proteins, and it is generally assumed that a large portion of transcribed sequences - including introns and several classes of non-coding RNAs (ncRNAs) do not give rise to peptide products. A systematic examination of translation and physiological regulation of ncRNAs has not been conducted. Here, we use computational methods to identify the products of non-canonical translation in mouse neurons by analyzing unannotated transcripts in combination with proteomic data. This study supports the existence of non-canonical translation products from both intragenic and extragenic genomic regions, including peptides derived from anti-sense transcripts and introns. Moreover, the studied novel translation products exhibit temporal regulation similar to that of proteins known to be involved in neuronal activity processes. These observations highlight a potentially large and complex set of biologically regulated translational events from transcripts formerly thought to lack coding potential. PMID:25403355
Two Perspectives on the Origin of the Standard Genetic Code

NASA Astrophysics Data System (ADS)

Sengupta, Supratim; Aggarwal, Neha; Bandhu, Ashutosh Vishwa

2014-12-01

The origin of a genetic code made it possible to create ordered sequences of amino acids. In this article we provide two perspectives on code origin by carrying out simulations of code-sequence coevolution in finite populations with the aim of examining how the standard genetic code may have evolved from more primitive code(s) encoding a small number of amino acids. We determine the efficacy of the physico-chemical hypothesis of code origin in the absence and presence of horizontal gene transfer (HGT) by allowing a diverse collection of code-sequence sets to compete with each other. We find that in the absence of horizontal gene transfer, natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. However, for certain probabilities of the horizontal transfer events, a universal code emerges having a structure that is consistent with the standard genetic code.
An algebraic hypothesis about the primeval genetic code architecture.

PubMed

Sánchez, Robersy; Grau, Ricardo

2009-09-01

A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairing G identical with C and A=U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as, the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
cncRNAs: Bi-functional RNAs with protein coding and non-coding functions

PubMed Central

Kumari, Pooja; Sampath, Karuna

2015-01-01

For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed as non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term as ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remains unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036
Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

NASA Astrophysics Data System (ADS)

Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

2017-07-01

DNA sequence can be defined as a succession of letters, representing the order of nucleotides within DNA, using a permutation of four DNA base codes including adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and currently become highly developed, advanced and highly throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing could produce any ambiguous and not clear enough sequencing results that make them quite difficult to be determined whether these codes are A, T, G, or C. To solve these problems, in this study we can introduce other representation of DNA codes namely Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probability of A, T, G, C bases that could appear in Q and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct the improved scoring matrix for global sequence alignment processes, by applying a dot product method. Moreover, this scoring matrix produces better and higher quality of the match and mismatch score between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave, to analyze our target sequence which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from the Genebank, meanwhile the target DNA sequence are received from our collaborator database. As the results we found the Quaternion representations improve the quality of the sequence alignment score and we can conclude that DNA sequence target has maximum similarity with Streptococcus pneumoniae.
Disruption of long-distance highly conserved noncoding elements in neurocristopathies.

PubMed

Amiel, Jeanne; Benko, Sabina; Gordon, Christopher T; Lyonnet, Stanislas

2010-12-01

One of the key discoveries of vertebrate genome sequencing projects has been the identification of highly conserved noncoding elements (CNEs). Some characteristics of CNEs include their high frequency in mammalian genomes, their potential regulatory role in gene expression, and their enrichment in gene deserts nearby master developmental genes. The abnormal development of neural crest cells (NCCs) leads to a broad spectrum of congenital malformation(s), termed neurocristopathies, and/or tumor predisposition. Here we review recent findings that disruptions of CNEs, within or at long distance from the coding sequences of key genes involved in NCC development, result in neurocristopathies via the alteration of tissue- or stage-specific long-distance regulation of gene expression. While most studies on human genetic disorders have focused on protein-coding sequences, these examples suggest that investigation of genomic alterations of CNEs will provide a broader understanding of the molecular etiology of both rare and common human congenital malformations. © 2010 New York Academy of Sciences.
Draft Genome Sequence of Curtobacterium sp. Strain ER1/6, an Endophytic Strain Isolated from Citrus sinensis with Potential To Be Used as a Biocontrol Agent.

PubMed

Garrido, Leandro Maza; Alves, João Marcelo Pereira; Oliveira, Liliane Santana; Gruber, Arthur; Padilla, Gabriel; Araújo, Welington Luiz

2016-11-17

Herein, we report a draft genome sequence of the endophytic Curtobacterium sp. strain ER1/6, isolated from a surface-sterilized Citrus sinensis branch, and it presented the capability to control phytopathogens. Functional annotation of the ~3.4-Mb genome revealed 3,100 protein-coding genes, with many products related to known ecological and biotechnological aspects of this bacterium. Copyright © 2016 Garrido et al.
Draft Genome Sequence of Pantoea ananatis Strain 1.38, a Bacterium Isolated from the Rhizosphere of Oryza sativa var. Puntal That Shows Biotechnological Potential as an Inoculant

PubMed Central

Megías, Esaú; dos Reis Junior, Fábio Bueno; Ribeiro, Renan Augusto; Ollero, Francisco Javier; Megías, Manuel

2018-01-01

ABSTRACT Pantoea ananatis 1.38 is a strain isolated from the rhizosphere of irrigated rice in southern Spain. Its genome was estimated at 4,869,281 bp, with 4,644 coding sequences (CDSs). The genome encompasses several CDSs related to plant growth promotion, such as that for siderophore metabolism, and virulence genes characteristic of pathogenic Pantoea spp. are absent. PMID:29371365
Circular codes revisited: a statistical approach.

PubMed

Gonzalez, D L; Giannerini, S; Rosa, R

2011-04-21

In 1996 Arquès and Michel [1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45-58] discovered the existence of a common circular code in eukaryote and prokaryote genomes. Since then, circular code theory has provoked great interest and underwent a rapid development. In this paper we discuss some theoretical issues related to the synchronization properties of coding sequences and circular codes with particular emphasis on the problem of retrieval and maintenance of the reading frame. Motivated by the theoretical discussion, we adopt a rigorous statistical approach in order to try to answer different questions. First, we investigate the covering capability of the whole class of 216 self-complementary, C(3) maximal codes with respect to a large set of coding sequences. The results indicate that, on average, the code proposed by Arquès and Michel has the best covering capability but, still, there exists a great variability among sequences. Second, we focus on such code and explore the role played by the proportion of the bases by means of a hierarchy of permutation tests. The results show the existence of a sort of optimization mechanism such that coding sequences are tailored as to maximize or minimize the coverage of circular codes on specific reading frames. Such optimization clearly relates the function of circular codes with reading frame synchronization. Copyright © 2011 Elsevier Ltd. All rights reserved.
Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

PubMed Central

Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

2013-01-01

Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343
A genome-wide survey of maternal and embryonic transcripts during Xenopus tropicalis development.

PubMed

Paranjpe, Sarita S; Jacobi, Ulrike G; van Heeringen, Simon J; Veenstra, Gert Jan C

2013-11-06

Dynamics of polyadenylation vs. deadenylation determine the fate of several developmentally regulated genes. Decay of a subset of maternal mRNAs and new transcription define the maternal-to-zygotic transition, but the full complement of polyadenylated and deadenylated coding and non-coding transcripts has not yet been assessed in Xenopus embryos. To analyze the dynamics and diversity of coding and non-coding transcripts during development, both polyadenylated mRNA and ribosomal RNA-depleted total RNA were harvested across six developmental stages and subjected to high throughput sequencing. The maternally loaded transcriptome is highly diverse and consists of both polyadenylated and deadenylated transcripts. Many maternal genes show peak expression in the oocyte and include genes which are known to be the key regulators of events like oocyte maturation and fertilization. Of all the transcripts that increase in abundance between early blastula and larval stages, about 30% of the embryonic genes are induced by fourfold or more by the late blastula stage and another 35% by late gastrulation. Using a gene model validation and discovery pipeline, we identified novel transcripts and putative long non-coding RNAs (lncRNA). These lncRNA transcripts were stringently selected as spliced transcripts generated from independent promoters, with limited coding potential and a codon bias characteristic of noncoding sequences. Many lncRNAs are conserved and expressed in a developmental stage-specific fashion. These data reveal dynamics of transcriptome polyadenylation and abundance and provides a high-confidence catalogue of novel and long non-coding RNAs.
Development of a set of SNP markers present in expressed genes of the apple.

PubMed

Chagné, David; Gasic, Ksenija; Crowhurst, Ross N; Han, Yuepeng; Bassett, Heather C; Bowatte, Deepa R; Lawrence, Timothy J; Rikkerink, Erik H A; Gardiner, Susan E; Korban, Schuyler S

2008-11-01

Molecular markers associated with gene coding regions are useful tools for bridging functional and structural genomics. Due to their high abundance in plant genomes, single nucleotide polymorphisms (SNPs) are present within virtually all genomic regions, including most coding sequences. The objective of this study was to develop a set of SNPs for the apple by taking advantage of the wealth of genomics resources available for the apple, including a large collection of expressed sequenced tags (ESTs). Using bioinformatics tools, a search for SNPs within an EST database of approximately 350,000 sequences developed from a variety of apple accessions was conducted. This resulted in the identification of a total of 71,482 putative SNPs. As the apple genome is reported to be an ancient polyploid, attempts were made to verify whether those SNPs detected in silico were attributable either to allelic polymorphisms or to gene duplication or paralogous or homeologous sequence variations. To this end, a set of 464 PCR primer pairs was designed, PCR was amplified using two subsets of plants, and the PCR products were sequenced. The SNPs retrieved from these sequences were then mapped onto apple genetic maps, including a newly constructed map of a Royal Gala x A689-24 cross and a Malling 9 x Robusta 5, map using a bin mapping strategy. The SNP genotyping was performed using the high-resolution melting (HRM) technique. A total of 93 new markers containing 210 coding SNPs were successfully mapped. This new set of SNP markers for the apple offers new opportunities for understanding the genetic control of important horticultural traits using quantitative trait loci (QTL) or linkage disequilibrium analysis. These also serve as useful markers for aligning physical and genetic maps, and as potential transferable markers across the Rosaceae family.
Positive Selection Underlies Faster-Z Evolution of Gene Expression in Birds

PubMed Central

Dean, Rebecca; Harrison, Peter W.; Wright, Alison E.; Zimmer, Fabian; Mank, Judith E.

2015-01-01

The elevated rate of evolution for genes on sex chromosomes compared with autosomes (Fast-X or Fast-Z evolution) can result either from positive selection in the heterogametic sex or from nonadaptive consequences of reduced relative effective population size. Recent work in birds suggests that Fast-Z of coding sequence is primarily due to relaxed purifying selection resulting from reduced relative effective population size. However, gene sequence and gene expression are often subject to distinct evolutionary pressures; therefore, we tested for Fast-Z in gene expression using next-generation RNA-sequencing data from multiple avian species. Similar to studies of Fast-Z in coding sequence, we recover clear signatures of Fast-Z in gene expression; however, in contrast to coding sequence, our data indicate that Fast-Z in expression is due to positive selection acting primarily in females. In the soma, where gene expression is highly correlated between the sexes, we detected Fast-Z in both sexes, although at a higher rate in females, suggesting that many positively selected expression changes in females are also expressed in males. In the gonad, where intersexual correlations in expression are much lower, we detected Fast-Z for female gene expression, but crucially, not males. This suggests that a large amount of expression variation is sex-specific in its effects within the gonad. Taken together, our results indicate that Fast-Z evolution of gene expression is the product of positive selection acting on recessive beneficial alleles in the heterogametic sex. More broadly, our analysis suggests that the adaptive potential of Z chromosome gene expression may be much greater than that of gene sequence, results which have important implications for the role of sex chromosomes in speciation and sexual selection. PMID:26067773
Long Non-Coding RNAs Responsive to Salt and Boron Stress in the Hyper-Arid Lluteño Maize from Atacama Desert.

PubMed

Huanca-Mamani, Wilson; Arias-Carrasco, Raúl; Cárdenas-Ninasivincha, Steffany; Rojas-Herrera, Marcelo; Sepúlveda-Hermosilla, Gonzalo; Caris-Maldonado, José Carlos; Bastías, Elizabeth; Maracaja-Coutinho, Vinicius

2018-03-20

Long non-coding RNAs (lncRNAs) have been defined as transcripts longer than 200 nucleotides, which lack significant protein coding potential and possess critical roles in diverse cellular processes. Long non-coding RNAs have recently been functionally characterized in plant stress-response mechanisms. In the present study, we perform a comprehensive identification of lncRNAs in response to combined stress induced by salinity and excess of boron in the Lluteño maize, a tolerant maize landrace from Atacama Desert, Chile. We use deep RNA sequencing to identify a set of 48,345 different lncRNAs, of which 28,012 (58.1%) are conserved with other maize (B73, Mo17 or Palomero), with the remaining 41.9% belonging to potentially Lluteño exclusive lncRNA transcripts. According to B73 maize reference genome sequence, most Lluteño lncRNAs correspond to intergenic transcripts. Interestingly, Lluteño lncRNAs presents an unusual overall higher expression compared to protein coding genes under exposure to stressed conditions. In total, we identified 1710 putatively responsive to the combined stressed conditions of salt and boron exposure. We also identified a set of 848 stress responsive potential trans natural antisense transcripts ( trans -NAT) lncRNAs, which seems to be regulating genes associated with regulation of transcription, response to stress, response to abiotic stimulus and participating of the nicotianamine metabolic process. Reverse transcription-quantitative PCR (RT-qPCR) experiments were performed in a subset of lncRNAs, validating their existence and expression patterns. Our results suggest that a diverse set of maize lncRNAs from leaves and roots is responsive to combined salt and boron stress, being the first effort to identify lncRNAs from a maize landrace adapted to extreme conditions such as the Atacama Desert. The information generated is a starting point to understand the genomic adaptabilities suffered by this maize to surpass this extremely stressed environment.

Long Non-Coding RNAs Responsive to Salt and Boron Stress in the Hyper-Arid Lluteño Maize from Atacama Desert

PubMed Central

Huanca-Mamani, Wilson; Arias-Carrasco, Raúl; Cárdenas-Ninasivincha, Steffany; Rojas-Herrera, Marcelo; Sepúlveda-Hermosilla, Gonzalo; Caris-Maldonado, José Carlos; Bastías, Elizabeth; Maracaja-Coutinho, Vinicius

2018-01-01

Long non-coding RNAs (lncRNAs) have been defined as transcripts longer than 200 nucleotides, which lack significant protein coding potential and possess critical roles in diverse cellular processes. Long non-coding RNAs have recently been functionally characterized in plant stress–response mechanisms. In the present study, we perform a comprehensive identification of lncRNAs in response to combined stress induced by salinity and excess of boron in the Lluteño maize, a tolerant maize landrace from Atacama Desert, Chile. We use deep RNA sequencing to identify a set of 48,345 different lncRNAs, of which 28,012 (58.1%) are conserved with other maize (B73, Mo17 or Palomero), with the remaining 41.9% belonging to potentially Lluteño exclusive lncRNA transcripts. According to B73 maize reference genome sequence, most Lluteño lncRNAs correspond to intergenic transcripts. Interestingly, Lluteño lncRNAs presents an unusual overall higher expression compared to protein coding genes under exposure to stressed conditions. In total, we identified 1710 putatively responsive to the combined stressed conditions of salt and boron exposure. We also identified a set of 848 stress responsive potential trans natural antisense transcripts (trans-NAT) lncRNAs, which seems to be regulating genes associated with regulation of transcription, response to stress, response to abiotic stimulus and participating of the nicotianamine metabolic process. Reverse transcription-quantitative PCR (RT-qPCR) experiments were performed in a subset of lncRNAs, validating their existence and expression patterns. Our results suggest that a diverse set of maize lncRNAs from leaves and roots is responsive to combined salt and boron stress, being the first effort to identify lncRNAs from a maize landrace adapted to extreme conditions such as the Atacama Desert. The information generated is a starting point to understand the genomic adaptabilities suffered by this maize to surpass this extremely stressed environment. PMID:29558449
Second-generation sequencing of entire mitochondrial coding-regions (∼15.4 kb) holds promise for study of the phylogeny and taxonomy of human body lice and head lice.

PubMed

Xiong, H; Campelo, D; Pollack, R J; Raoult, D; Shao, R; Alem, M; Ali, J; Bilcha, K; Barker, S C

2014-08-01

The Illumina Hiseq platform was used to sequence the entire mitochondrial coding-regions of 20 body lice, Pediculus humanus Linnaeus, and head lice, P. capitis De Geer (Phthiraptera: Pediculidae), from eight towns and cities in five countries: Ethiopia, France, China, Australia and the U.S.A. These data (∼310 kb) were used to see how much more informative entire mitochondrial coding-region sequences were than partial mitochondrial coding-region sequences, and thus to guide the design of future studies of the phylogeny, origin, evolution and taxonomy of body lice and head lice. Phylogenies were compared from entire coding-region sequences (∼15.4 kb), entire cox1 (∼1.5 kb), partial cox1 (∼700 bp) and partial cytb (∼600 bp) sequences. On the one hand, phylogenies from entire mitochondrial coding-region sequences (∼15.4 kb) were much more informative than phylogenies from entire cox1 sequences (∼1.5 kb) and partial gene sequences (∼600 to ∼700 bp). For example, 19 branches had > 95% bootstrap support in our maximum likelihood tree from the entire mitochondrial coding-regions (∼15.4 kb) whereas the tree from 700 bp cox1 had only two branches with bootstrap support > 95%. Yet, by contrast, partial cytb (∼600 bp) and partial cox1 (∼486 bp) sequences were sufficient to genotype lice to Clade A, B or C. The sequences of the mitochondrial genomes of the P. humanus, P. capitis and P. schaeffi Fahrenholz studied are in NCBI GenBank under the accession numbers KC660761-800, KC685631-6330, KC241882-97, EU219988-95, HM241895-8 and JX080388-407. © 2014 The Royal Entomological Society.
Prunus necrotic ringspot ilarvirus: nucleotide sequence of RNA3 and the relationship to other ilarviruses based on coat protein comparison.

PubMed

Guo, D; Maiss, E; Adam, G; Casper, R

1995-05-01

The RNA3 of prunus necrotic ringspot ilarvirus (PNRSV) has been cloned and its entire sequence determined. The RNA3 consists of 1943 nucleotides (nt) and possesses two large open reading frames (ORFs) separated by an intergenic region of 74 nt. The 5' proximal ORF is 855 nt in length and codes for a protein of molecular mass 31.4 kDa which has homologies with the putative movement protein of other members of the Bromoviridae. The 3' proximal ORF of 675 nt is the cistron for the coat protein (CP) and has a predicted molecular mass of 24.9 kDa. The sequence of the 3' non-coding region (NCR) of PNRSV RNA3 showed a high degree of similarity with those of tobacco streak virus (TSV), prune dwarf virus (PDV), apple mosaic virus (ApMV) and also alfalfa mosaic virus (AIMV). In addition it contained potential stem-loop structures with interspersed AUGC motifs characteristic for ilar- and alfamoviruses. This conserved primary and secondary structure in all 3' NCRs may be responsible for the interaction with homologous and heterologous CPs and subsequent activation of genome replication. The CP gene of an ApMV isolate (ApMV-G) of 657 nt has also been cloned and sequenced. Although ApMV and PNRSV have a distant serological relationship, the deduced amino acid sequences of their CPs have an identity of only 51.8%. The N termini of PNRSV and ApMV CPs have in common a zinc-finger motif and the potential to form an amphipathic helix.
Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding.

PubMed

Nguyen, Dang; Luo, Wei; Venkatesh, Svetha; Phung, Dinh

2018-04-11

Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD (International Classification of Diseases (World Health Organization 2013)) code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.
Emotional communication in medical consultations with native and non-native patients applying two different methodological approaches.

PubMed

Kale, Emine; Skjeldestad, Kristin; Finset, Arnstein

2013-09-01

To explore the potential agreement between two different methods to investigate emotional communication of native and non-native patients in medical consultations. The data consisted of 12 videotaped hospital consultations with six native and six non-native patients. The consultations were coded according to coding rules of the Verona Coding definitions of Emotional Sequences (VR-CoDES) and afterwards analyzed by discourse analysis (DA) by two co-workers who were blind to the results from VR-CoDES. The agreement between VR-CoDES and DA was high in consultations with many cues and concerns, both with native and non-native patients. In consultations with no (or one cue) according to VR-CoDES criteria the DA still indicated the presence of emotionally salient expressions and themes. In some consultations cues to underlying emotions are communicated so vaguely or veiled by language barriers that standard VR-CoDES coding may miss subtle cues. Many of these sub-threshold cues could potentially be coded as cues according to VR-CoDES main coding categories, if criteria for coding vague or ambiguous cues had been better specified. Combining different analytical frameworks on the same dataset provide us new insights on emotional communication. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments.

PubMed

Zhou, Carol L Ecale

2015-01-01

In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
Complete Coding Genome Sequence for Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil

DTIC Science & Technology

2017-05-04

and capable of infecting a wide range of animal hosts (1–5). Here, we report the complete coding genome sequence (i.e., only missing portions of...segmented nature of the genome was not under- stood. Therefore, only the two genome segments with detectable sequence homolo- gies to flaviviruses were...originally reported (2). We revisited the data set of Maruyama et al. (2) and assembled the complete coding sequences for all four genome segments. We
Decoding the non-coding RNAs in Alzheimer's disease.

PubMed

Schonrock, Nicole; Götz, Jürgen

2012-11-01

Non-coding RNAs (ncRNAs) are integral components of biological networks with fundamental roles in regulating gene expression. They can integrate sequence information from the DNA code, epigenetic regulation and functions of multimeric protein complexes to potentially determine the epigenetic status and transcriptional network in any given cell. Humans potentially contain more ncRNAs than any other species, especially in the brain, where they may well play a significant role in human development and cognitive ability. This review discusses their emerging role in Alzheimer's disease (AD), a human pathological condition characterized by the progressive impairment of cognitive functions. We discuss the complexity of the ncRNA world and how this is reflected in the regulation of the amyloid precursor protein and Tau, two proteins with central functions in AD. By understanding this intricate regulatory network, there is hope for a better understanding of disease mechanisms and ultimately developing diagnostic and therapeutic tools.
Quantized phase coding and connected region labeling for absolute phase retrieval.

PubMed

Chen, Xiangcheng; Wang, Yuwei; Wang, Yajun; Ma, Mengchao; Zeng, Chunnian

2016-12-12

This paper proposes an absolute phase retrieval method for complex object measurement based on quantized phase-coding and connected region labeling. A specific code sequence is embedded into quantized phase of three coded fringes. Connected regions of different codes are labeled and assigned with 3-digit-codes combining the current period and its neighbors. Wrapped phase, more than 36 periods, can be restored with reference to the code sequence. Experimental results verify the capability of the proposed method to measure multiple isolated objects.
Functional interrogation of non-coding DNA through CRISPR genome editing

PubMed Central

Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.

2017-01-01

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
Memorization of Sequences of Movements of the Right or the Left Hand by Right- and Left-Handers: Vector Coding.

PubMed

Bobrova, E V; Bogacheva, I N; Lyakhovetskii, V A; Fabinskaja, A A; Fomina, E V

2017-01-01

In order to test the hypothesis of hemisphere specialization for different types of information coding (the right hemisphere, for positional coding; the left one, for vector coding), we analyzed the errors of right and left-handers during a task involving the memorization of sequences of movements by the left or the right hand, which activates vector coding by changing the order of movements in memorized sequences. The task was first performed by the right or the left hand, then by the opposite hand. It was found that both'right- and left-handers use the information about the previous movements of the dominant hand, but not of the non-dom" inant one. After changing the hand, right-handers use the information about previous movements of the second hand, while left-handers do not. We compared our results with the data of previous experiments, in which positional coding was activated, and concluded that both right- and left-handers use vector coding for memorizing the sequences of their dominant hands and positional coding for memorizing the sequences of non-dominant hand. No similar patterns of errors were found between right- and left-handers after changing the hand, which suggests that in right- and left-handersthe skills are transferred in different ways depending on the type of coding.
Characterization of mitochondrial genome of sea cucumber Stichopus horrens: a novel gene arrangement in Holothuroidea.

PubMed

Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing

2011-05-01

The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16257 bps long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand except for one protein-coding gene (nad6) and five tRNA genes (tRNA ( Ser(UCN) ), tRNA ( Gln ), tRNA ( Ala ), tRNA ( Val ), tRNA ( Asp )) which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew=0.025; GC skew=-0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein coding genes except nad3 and nad4 that use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the order of the genes in the mitochondrial DNA from the five holothurians that are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
Evolution of coding and non-coding genes in HOX clusters of a marsupial.

PubMed

Yu, Hongshi; Lindsay, James; Feng, Zhi-Ping; Frankenberg, Stephen; Hu, Yanqiu; Carone, Dawn; Shaw, Geoff; Pask, Andrew J; O'Neill, Rachel; Papenfuss, Anthony T; Renfree, Marilyn B

2012-06-18

The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.
Evolution of coding and non-coding genes in HOX clusters of a marsupial

PubMed Central

2012-01-01

Background The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Results Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. Conclusions This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial. PMID:22708672
Error and Error Mitigation in Low-Coverage Genome Assemblies

PubMed Central

Hubisz, Melissa J.; Lin, Michael F.; Kellis, Manolis; Siepel, Adam

2011-01-01

The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1–4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download. PMID:21340033
Venom gland transcriptomic and venom proteomic analyses of the scorpion Megacormus gertschi Díaz-Najera, 1966 (Scorpiones: Euscorpiidae: Megacorminae).

PubMed

Santibáñez-López, Carlos E; Cid-Uribe, Jimena I; Zamudio, Fernando Z; Batista, Cesar V F; Ortiz, Ernesto; Possani, Lourival D

2017-07-01

The soluble venom from the Mexican scorpion Megacormus gertschi of the family Euscorpiidae was obtained and its biological effects were tested in several animal models. This venom is not toxic to mice at doses of 100 μg per 20 g of mouse weight, while being lethal to arthropods (insects and crustaceans), at doses of 20 μg (for crickets) and 100 μg (for shrimps) per animal. Samples of the venom were separated by high performance liquid chromatography and circa 80 distinct chromatographic fractions were obtained from which 67 components have had their molecular weights determined by mass spectrometry analysis. The N-terminal amino acid sequence of seven protein/peptides were obtained by Edman degradation and are reported. Among the high molecular weight components there are enzymes with experimentally-confirmed phospholipase activity. A pair of telsons from this scorpion species was dissected, from which total RNA was extracted and used for cDNA library construction. Massive sequencing by the Illumina protocol, followed by de novo assembly, resulted in a total of 110,528 transcripts. From those, we were able to annotate 182, which putatively code for peptides/proteins with sequence similarity to previously-reported venom components available from different protein databases. Transcripts seemingly coding for enzymes showed the richest diversity, with 52 sequences putatively coding for proteases, 20 for phospholipases, 8 for lipases and 5 for hyaluronidases. The number of different transcripts potentially coding for peptides with sequence similarity to those that affect ion channels was 19, for putative antimicrobial peptides 19, and for protease inhibitor-like peptides, 18. Transcripts seemingly coding for other venom components were identified and described. The LC/MS analysis of a trypsin-digested venom aliquot resulted in 23 matches with the translated transcriptome database, which validates the transcriptome. The proteomic and transcriptomic analyses reported here constitute the first approach to study the venom components from a scorpion species belonging to the family Euscorpiidae. The data certainly show that this venom is different from all the ones described thus far in the literature. Copyright © 2017 Elsevier Ltd. All rights reserved.
Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes.

PubMed

Mahajan, Anubha; Wessel, Jennifer; Willems, Sara M; Zhao, Wei; Robertson, Neil R; Chu, Audrey Y; Gan, Wei; Kitajima, Hidetoshi; Taliun, Daniel; Rayner, N William; Guo, Xiuqing; Lu, Yingchang; Li, Man; Jensen, Richard A; Hu, Yao; Huo, Shaofeng; Lohman, Kurt K; Zhang, Weihua; Cook, James P; Prins, Bram Peter; Flannick, Jason; Grarup, Niels; Trubetskoy, Vassily Vladimirovich; Kravic, Jasmina; Kim, Young Jin; Rybin, Denis V; Yaghootkar, Hanieh; Müller-Nurasyid, Martina; Meidtner, Karina; Li-Gao, Ruifang; Varga, Tibor V; Marten, Jonathan; Li, Jin; Smith, Albert Vernon; An, Ping; Ligthart, Symen; Gustafsson, Stefan; Malerba, Giovanni; Demirkan, Ayse; Tajes, Juan Fernandez; Steinthorsdottir, Valgerdur; Wuttke, Matthias; Lecoeur, Cécile; Preuss, Michael; Bielak, Lawrence F; Graff, Marielisa; Highland, Heather M; Justice, Anne E; Liu, Dajiang J; Marouli, Eirini; Peloso, Gina Marie; Warren, Helen R; Afaq, Saima; Afzal, Shoaib; Ahlqvist, Emma; Almgren, Peter; Amin, Najaf; Bang, Lia B; Bertoni, Alain G; Bombieri, Cristina; Bork-Jensen, Jette; Brandslund, Ivan; Brody, Jennifer A; Burtt, Noël P; Canouil, Mickaël; Chen, Yii-Der Ida; Cho, Yoon Shin; Christensen, Cramer; Eastwood, Sophie V; Eckardt, Kai-Uwe; Fischer, Krista; Gambaro, Giovanni; Giedraitis, Vilmantas; Grove, Megan L; de Haan, Hugoline G; Hackinger, Sophie; Hai, Yang; Han, Sohee; Tybjærg-Hansen, Anne; Hivert, Marie-France; Isomaa, Bo; Jäger, Susanne; Jørgensen, Marit E; Jørgensen, Torben; Käräjämäki, Annemari; Kim, Bong-Jo; Kim, Sung Soo; Koistinen, Heikki A; Kovacs, Peter; Kriebel, Jennifer; Kronenberg, Florian; Läll, Kristi; Lange, Leslie A; Lee, Jung-Jin; Lehne, Benjamin; Li, Huaixing; Lin, Keng-Hung; Linneberg, Allan; Liu, Ching-Ti; Liu, Jun; Loh, Marie; Mägi, Reedik; Mamakou, Vasiliki; McKean-Cowdin, Roberta; Nadkarni, Girish; Neville, Matt; Nielsen, Sune F; Ntalla, Ioanna; Peyser, Patricia A; Rathmann, Wolfgang; Rice, Kenneth; Rich, Stephen S; Rode, Line; Rolandsson, Olov; Schönherr, Sebastian; Selvin, Elizabeth; Small, Kerrin S; Stančáková, Alena; Surendran, Praveen; Taylor, Kent D; Teslovich, Tanya M; Thorand, Barbara; Thorleifsson, Gudmar; Tin, Adrienne; Tönjes, Anke; Varbo, Anette; Witte, Daniel R; Wood, Andrew R; Yajnik, Pranav; Yao, Jie; Yengo, Loïc; Young, Robin; Amouyel, Philippe; Boeing, Heiner; Boerwinkle, Eric; Bottinger, Erwin P; Chowdhury, Rajiv; Collins, Francis S; Dedoussis, George; Dehghan, Abbas; Deloukas, Panos; Ferrario, Marco M; Ferrières, Jean; Florez, Jose C; Frossard, Philippe; Gudnason, Vilmundur; Harris, Tamara B; Heckbert, Susan R; Howson, Joanna M M; Ingelsson, Martin; Kathiresan, Sekar; Kee, Frank; Kuusisto, Johanna; Langenberg, Claudia; Launer, Lenore J; Lindgren, Cecilia M; Männistö, Satu; Meitinger, Thomas; Melander, Olle; Mohlke, Karen L; Moitry, Marie; Morris, Andrew D; Murray, Alison D; de Mutsert, Renée; Orho-Melander, Marju; Owen, Katharine R; Perola, Markus; Peters, Annette; Province, Michael A; Rasheed, Asif; Ridker, Paul M; Rivadineira, Fernando; Rosendaal, Frits R; Rosengren, Anders H; Salomaa, Veikko; Sheu, Wayne H-H; Sladek, Rob; Smith, Blair H; Strauch, Konstantin; Uitterlinden, André G; Varma, Rohit; Willer, Cristen J; Blüher, Matthias; Butterworth, Adam S; Chambers, John Campbell; Chasman, Daniel I; Danesh, John; van Duijn, Cornelia; Dupuis, Josée; Franco, Oscar H; Franks, Paul W; Froguel, Philippe; Grallert, Harald; Groop, Leif; Han, Bok-Ghee; Hansen, Torben; Hattersley, Andrew T; Hayward, Caroline; Ingelsson, Erik; Kardia, Sharon L R; Karpe, Fredrik; Kooner, Jaspal Singh; Köttgen, Anna; Kuulasmaa, Kari; Laakso, Markku; Lin, Xu; Lind, Lars; Liu, Yongmei; Loos, Ruth J F; Marchini, Jonathan; Metspalu, Andres; Mook-Kanamori, Dennis; Nordestgaard, Børge G; Palmer, Colin N A; Pankow, James S; Pedersen, Oluf; Psaty, Bruce M; Rauramaa, Rainer; Sattar, Naveed; Schulze, Matthias B; Soranzo, Nicole; Spector, Timothy D; Stefansson, Kari; Stumvoll, Michael; Thorsteinsdottir, Unnur; Tuomi, Tiinamaija; Tuomilehto, Jaakko; Wareham, Nicholas J; Wilson, James G; Zeggini, Eleftheria; Scott, Robert A; Barroso, Inês; Frayling, Timothy M; Goodarzi, Mark O; Meigs, James B; Boehnke, Michael; Saleheen, Danish; Morris, Andrew P; Rotter, Jerome I; McCarthy, Mark I

2018-04-01

We aggregated coding variant data for 81,412 type 2 diabetes cases and 370,832 controls of diverse ancestry, identifying 40 coding variant association signals (P < 2.2 × 10 -7 ); of these, 16 map outside known risk-associated loci. We make two important observations. First, only five of these signals are driven by low-frequency variants: even for these, effect sizes are modest (odds ratio ≤1.29). Second, when we used large-scale genome-wide association data to fine-map the associated variants in their regional context, accounting for the global enrichment of complex trait associations in coding sequence, compelling evidence for coding variant causality was obtained for only 16 signals. At 13 others, the associated coding variants clearly represent 'false leads' with potential to generate erroneous mechanistic inference. Coding variant associations offer a direct route to biological insight for complex diseases and identification of validated therapeutic targets; however, appropriate mechanistic inference requires careful specification of their causal contribution to disease predisposition.
Primer development to obtain complete coding sequence of HA and NA genes of influenza A/H3N2 virus.

PubMed

Agustiningsih, Agustiningsih; Trimarsanto, Hidayat; Setiawaty, Vivi; Artika, I Made; Muljono, David Handojo

2016-08-30

Influenza is an acute respiratory illness and has become a serious public health problem worldwide. The need to study the HA and NA genes in influenza A virus is essential since these genes frequently undergo mutations. This study describes the development of primer sets for RT-PCR to obtain complete coding sequence of Hemagglutinin (HA) and Neuraminidase (NA) genes of influenza A/H3N2 virus from Indonesia. The primers were developed based on influenza A/H3N2 sequence worldwide from Global Initiative on Sharing All Influenza Data (GISAID) and further tested using Indonesian influenza A/H3N2 archived samples of influenza-like illness (ILI) surveillance from 2008 to 2009. An optimum RT-PCR condition was acquired for all HA and NA fragments designed to cover complete coding sequence of HA and NA genes. A total of 71 samples were successfully sequenced for complete coding sequence both of HA and NA genes out of 145 samples of influenza A/H3N2 tested. The developed primer sets were suitable for obtaining complete coding sequences of HA and NA genes of Indonesian samples from 2008 to 2009.
GATA: A graphic alignment tool for comparative sequenceanalysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nix, David A.; Eisen, Michael B.

2005-01-01

Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dotmore » plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.« less
Phylogenetic Network for European mtDNA

PubMed Central

Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari

2001-01-01

The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229

Deciphering the glycosaminoglycan code with the help of microarrays.

PubMed

de Paz, Jose L; Seeberger, Peter H

2008-07-01

Carbohydrate microarrays have become a powerful tool to elucidate the biological role of complex sugars. Microarrays are particularly useful for the study of glycosaminoglycans (GAGs), a key class of carbohydrates. The high-throughput chip format enables rapid screening of large numbers of potential GAG sequences produced via a complex biosynthesis while consuming very little sample. Here, we briefly highlight the most recent advances involving GAG microarrays built with synthetic or naturally derived oligosaccharides. These chips are powerful tools for characterizing GAG-protein interactions and determining structure-activity relationships for specific sequences. Thereby, they contribute to decoding the information contained in specific GAG sequences.
Insights from the genome of a high alkaline cellulase producing Aspergillus fumigatus strain obtained from Peruvian Amazon rainforest.

PubMed

Paul, Sujay; Zhang, Angel; Ludeña, Yvette; Villena, Gretty K; Yu, Fengan; Sherman, David H; Gutiérrez-Correa, Marcel

2017-06-10

Here, we report the complete genome sequence of a high alkaline cellulase producing Aspergillus fumigatus strain LMB-35Aa isolated from soil of Peruvian Amazon rainforest. The genome is ∼27.5mb in size, comprises of 228 scaffolds with an average GC content of 50%, and is predicted to contain a total of 8660 protein-coding genes. Of which, 6156 are with known function; it codes for 607 putative CAZymes families potentially involved in carbohydrate metabolism. Several important cellulose degrading genes, such as endoglucanase A, endoglucanase B, endoglucanase D and beta-glucosidase, are also identified. The genome of A. fumigatus strain LMB-35Aa represents the first whole sequenced genome of non-clinical, high cellulase producing A. fumigatus strain isolated from forest soil. Copyright © 2017 Elsevier B.V. All rights reserved.
A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

PubMed

Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

2012-01-01

Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
Enabling plant synthetic biology through genome engineering.

PubMed

Baltes, Nicholas J; Voytas, Daniel F

2015-02-01

Synthetic biology seeks to create new biological systems, including user-designed plants and plant cells. These systems can be employed for a variety of purposes, ranging from producing compounds of industrial or therapeutic value, to reducing crop losses by altering cellular responses to pathogens or climate change. To realize the full potential of plant synthetic biology, techniques are required that provide control over the genetic code - enabling targeted modifications to DNA sequences within living plant cells. Such control is now within reach owing to recent advances in the use of sequence-specific nucleases to precisely engineer genomes. We discuss here the enormous potential provided by genome engineering for plant synthetic biology. Copyright © 2014 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Kerr, J.M.; Fisher, L.W.; Termine, J.D.

The authors have isolated and partially sequenced the human bone sialoprotein gene (IBSP). IBSP has been sublocalized by in situ hybridization to chromosome 4q38-q31 and is composed of six small exons (51 to 159 bp) and 1 large exon ([approximately]2.6 kb). The intron/exon junctions defined by sequence analysis are of class O, retaining an intact coding triplet. Sequence analysis of the 5[prime] upstream region revealed a TATAA (nucleotides -30 to-25 from the transcriptional start point) and a CCAAT (nucleotides -56 to-52) box, both in the reverse orientation. Intron 1 contains interesting structural elements composed of polypyrimidine repeats followed by amore » poly(AC)[sub n] tract. Both types of structural elements have been detected in promoter regions of other genes and have been implicated in transcriptional regulation. Several differences between the previously published cDNA sequence and the authors' sequence have been identified, most of which are contained within the untranslated exon 1. Three base revisions in the coding region include a G to T (Gly to Val, amino acid 195), T to C (Val to Ala, amino acid 268), and T to A (Glu to Asp, amino acid 270). In conclusion, the genomic organization and potential regulatory elements of human IBSP have been elucidated. 42 refs., 4 figs., 1 tab.« less
The first complete mitochondrial genome of Bactrocera tsuneonis (Miyake) (Diptera: Tephritidae) by next-generation sequencing and its phylogenetic implications.

PubMed

Zhang, Yue; Feng, Shiqian; Zeng, Yiying; Ning, Hong; Liu, Lijun; Zhao, Zihua; Jiang, Fan; Li, Zhihong

2018-06-23

Bactrocera tsuneonis (Miyake), generally known as the Japanese orange fly, is considered to be a major pest of commercial citrus crops. It has a limited distribution in China, Japan and Vietnam, but it has the potential to invade areas outside of Asia. More genetic information of B. tsuneonis should be obtained in order to develop effective methodologies for rapid and accurate molecular identification due to the difficulty of distinguishing it from Bactrocera minax based on morphological features. We report here the whole mitochondrial genome of B. tsuneonis sequenced by next-generation sequencing. This mitogenome sequence had a total length of 15,865 bp, a typical circular molecule comprising 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The structure and organization of the molecule were typical and similar compared with the published homologous sequences of other fruit flies in Tephritidae. The phylogenetic analyses based on the mitochondrial genome data presented a close genetic relationship between B. tsuneonis and B. minax. This is the first report of the complete mitochondrial genome of B. tsuneonis, and it can be used in further studies of species diagnosis, evolutionary biology, prevention and control. Copyright © 2018. Published by Elsevier B.V.
Unexpected allelic heterogeneity and spectrum of mutations in Fowler syndrome revealed by next-generation exome sequencing.

PubMed

Lalonde, Emilie; Albrecht, Steffen; Ha, Kevin C H; Jacob, Karine; Bolduc, Nathalie; Polychronakos, Constantin; Dechelotte, Pierre; Majewski, Jacek; Jabado, Nada

2010-08-01

Protein coding genes constitute approximately 1% of the human genome but harbor 85% of the mutations with large effects on disease-related traits. Therefore, efficient strategies for selectively sequencing complete coding regions (i.e., "whole exome") have the potential to contribute our understanding of human diseases. We used a method for whole-exome sequencing coupling Agilent whole-exome capture to the Illumina DNA-sequencing platform, and investigated two unrelated fetuses from nonconsanguineous families with Fowler Syndrome (FS), a stereotyped phenotype lethal disease. We report novel germline mutations in feline leukemia virus subgroup C cellular-receptor-family member 2, FLVCR2, which has recently been shown to cause FS. Using this technology, we identified three types of genetic abnormalities: point-mutations, insertions-deletions, and intronic splice-site changes (first pathogenic report using this technology), in the fetuses who both were compound heterozygotes for the disease. Although revealing a high level of allelic heterogeneity and mutational spectrum in FS, this study further illustrates the successful application of whole-exome sequencing to uncover genetic defects in rare Mendelian disorders. Of importance, we show that we can identify genes underlying rare, monogenic and recessive diseases using a limited number of patients (n=2), in the absence of shared genetic heritage and in the presence of allelic heterogeneity.
Genetic characterisation of the recent foot-and-mouth disease virus subtype A/IRN/2005

PubMed Central

Klein, Joern; Hussain, Manzoor; Ahmad, Munir; Normann, Preben; Afzal, Muhammad; Alexandersen, Soren

2007-01-01

Background According to the World Reference Laboratory for FMD, a new subtype of FMDV serotype A was detected in Iran in 2005. This subtype was designated A/IRN/2005, and rapidly spread throughout Iran and moved westwards into Saudi Arabia and Turkey where it was initially detected from August 2005 and subsequently caused major disease problems in the spring of 2006. The same subtype reached Jordan in 2007. As part of an ongoing project we have also detected this subtype in Pakistan with the first positive samples detected in April 2006. To characterise this subtype in detail, we have sequenced and analysed the complete coding sequence of three subtype A/IRN/2005 isolates collected in Pakistan in 2006, the complete coding sequence of one subtype A/IRN/2005 isolate collected during the first outbreak in Turkey in 2005 and, in addition, the partial 1D coding sequence derived from 4 epithelium samples and 34 swab-samples from Asian buffaloes or cattle subsequently found to be infected with the A/IRN/2005 subtype. Results The phylogenies of the genome regions encoding for the structural proteins, displayed, with the exception of 1A, distinct, serotype-specific clustering and an evolutionary relationship of the A/IRN/2005 sublineage with the A22 sublineage. Potential recombination events have been detected in parts of the genome region coding for the non-structural proteins of FMDV. In addition, amino acid substitutions have been detected in the deduced VP1 protein sequence, potentially related to clinical or subclinical outcome of FMD. Indications of differential susceptibility for developing a subclinical course of disease between Asian buffaloes and cattle have been detected. Furthermore, hitherto unknown insertions of 2 amino acids before the second start codon, as well as sublineage specific amino acids have been detected in the genome region encoding for the leader proteinase of A/IRN/2005 sublineage. Conclusion Our findings indicate that the A/IRN/2005 sublineage has undergone two different paths of evolution for the structural and non-structural genome regions. The structural genome regions have had their evolutionary starting point in the A22 sublineage. It can be assumed that, due to the quasispecies structure of FMDV populations and the error-prone replication process, advantageous mutations in a changed environment have been fixed and lead to the occurrence of the new A/IRN/2005 sublineage. Together with this mechanism, recombination within the non-structural genome regions, potentially modifying the virulence of the virus, may be involved in the success of this new sublineage. The possible origin of this recombinant virus may be a co-infection with Asia1 and a serotype A precursor of the A/IRN/2005 sublineage potentially within Asian Buffaloes, as these appears to relatively easy become infected, but usually without developing clinical disease and consequently showing not a strong acute inflammatory immune response against a second FMDV infection. PMID:18001482
Brain cDNA clone for human cholinesterase

DOE Office of Scientific and Technical Information (OSTI.GOV)

McTiernan, C.; Adkins, S.; Chatonnet, A.

1987-10-01

A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum.more » The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.« less
Functional interrogation of non-coding DNA through CRISPR genome editing.

PubMed

Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H

2017-05-15

Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Bacillus anthracis genome organization in light of whole transcriptome sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.

2010-03-22

Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score related to gene expression? We used the whole transcriptome shotgun sequencing of a bacterial pathogen Bacillus anthracis to assess correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computationalmore » predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessment of accuracy of current annotation of protein-coding genes in the B. anthracis genome we have shown that the transcriptome data indicate existence of more than a hundred genes missing in the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.« less
An FPGA Implementation to Detect Selective Cationic Antibacterial Peptides

PubMed Central

Polanco González, Carlos; Nuño Maganda, Marco Aurelio; Arias-Estrada, Miguel; del Rio, Gabriel

2011-01-01

Exhaustive prediction of physicochemical properties of peptide sequences is used in different areas of biological research. One example is the identification of selective cationic antibacterial peptides (SCAPs), which may be used in the treatment of different diseases. Due to the discrete nature of peptide sequences, the physicochemical properties calculation is considered a high-performance computing problem. A competitive solution for this class of problems is to embed algorithms into dedicated hardware. In the present work we present the adaptation, design and implementation of an algorithm for SCAPs prediction into a Field Programmable Gate Array (FPGA) platform. Four physicochemical properties codes useful in the identification of peptide sequences with potential selective antibacterial activity were implemented into an FPGA board. The speed-up gained in a single-copy implementation was up to 108 times compared with a single Intel processor cycle for cycle. The inherent scalability of our design allows for replication of this code into multiple FPGA cards and consequently improvements in speed are possible. Our results show the first embedded SCAPs prediction solution described and constitutes the grounds to efficiently perform the exhaustive analysis of the sequence-physicochemical properties relationship of peptides. PMID:21738652
Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora

PubMed Central

Eguiluz, Maria; Yuyama, Priscila Mary; Guzman, Frank; Rodrigues, Nureyev Ferreira; Margis, Rogerio

2017-01-01

Abstract Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential by its fruit production. Due to the high content of essential oils in their leaves and of anthocyanins in the fruits, there is also an increasing interest by the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with closer relationship to Syzygium cumini than with Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, contributing to resolve the complex taxonomy of this species and fill the gap in genetic characterization. PMID:29111566
A Code Division Multiple Access Communication System for the Low Frequency Band.

DTIC Science & Technology

1983-04-01

frequency channels spread-spectrum communication / complex sequences, orthogonal codes impulsive noise 20. ABSTRACT (Continue an reverse side It...their transmissions with signature sequences. Our LF/CDMA scheme is different in that each user’s signature sequence set consists of M orthogonal ...signature sequences. Our LF/CDMA scheme is different in that each user’s signature sequence set consists of M orthogonal sequences and thus log 2 M
Oligo/Polynucleotide-Based Gene Modification: Strategies and Therapeutic Potential

PubMed Central

Sargent, R. Geoffrey; Kim, Soya

2011-01-01

Oligonucleotide- and polynucleotide-based gene modification strategies were developed as an alternative to transgene-based and classical gene targeting-based gene therapy approaches for treatment of genetic disorders. Unlike the transgene-based strategies, oligo/polynucleotide gene targeting approaches maintain gene integrity and the relationship between the protein coding and gene-specific regulatory sequences. Oligo/polynucleotide-based gene modification also has several advantages over classical vector-based homologous recombination approaches. These include essentially complete homology to the target sequence and the potential to rapidly engineer patient-specific oligo/polynucleotide gene modification reagents. Several oligo/polynucleotide-based approaches have been shown to successfully mediate sequence-specific modification of genomic DNA in mammalian cells. The strategies involve the use of polynucleotide small DNA fragments, triplex-forming oligonucleotides, and single-stranded oligodeoxynucleotides to mediate homologous exchange. The primary focus of this review will be on the mechanistic aspects of the small fragment homologous replacement, triplex-forming oligonucleotide-mediated, and single-stranded oligodeoxynucleotide-mediated gene modification strategies as it relates to their therapeutic potential. PMID:21417933
Microbial metatranscriptomics in a permanent marine oxygen minimum zone.

PubMed

Stewart, Frank J; Ulloa, Osvaldo; DeLong, Edward F

2012-01-01

Simultaneous characterization of taxonomic composition, metabolic gene content and gene expression in marine oxygen minimum zones (OMZs) has potential to broaden perspectives on the microbial and biogeochemical dynamics in these environments. Here, we present a metatranscriptomic survey of microbial community metabolism in the Eastern Tropical South Pacific OMZ off northern Chile. Community RNA was sampled in late austral autumn from four depths (50, 85, 110, 200 m) extending across the oxycline and into the upper OMZ. Shotgun pyrosequencing of cDNA yielded 180,000 to 550,000 transcript sequences per depth. Based on functional gene representation, transcriptome samples clustered apart from corresponding metagenome samples from the same depth, highlighting the discrepancies between metabolic potential and actual transcription. BLAST-based characterizations of non-ribosomal RNA sequences revealed a dominance of genes involved with both oxidative (nitrification) and reductive (anammox, denitrification) components of the marine nitrogen cycle. Using annotations of protein-coding genes as proxies for taxonomic affiliation, we observed depth-specific changes in gene expression by key functional taxonomic groups. Notably, transcripts most closely matching the genome of the ammonia-oxidizing archaeon Nitrosopumilus maritimus dominated the transcriptome in the upper three depths, representing one in five protein-coding transcripts at 85 m. In contrast, transcripts matching the anammox bacterium Kuenenia stuttgartiensis dominated at the core of the OMZ (200 m; 1 in 12 protein-coding transcripts). The distribution of N. maritimus-like transcripts paralleled that of transcripts matching ammonia monooxygenase genes, which, despite being represented by both bacterial and archaeal sequences in the community DNA, were dominated (> 99%) by archaeal sequences in the RNA, suggesting a substantial role for archaeal nitrification in the upper OMZ. These data, as well as those describing other key OMZ metabolic processes (e.g. sulfur oxidation), highlight gene-specific expression patterns in the context of the entire community transcriptome, as well as identify key functional groups for taxon-specific genomic profiling. © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.
A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

PubMed

Kress, W John; Erickson, David L

2007-06-06

A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
Identification of G-quadruplex forming sequences in three manatee papillomaviruses

PubMed Central

Zahin, Maryam; Dean, William L.; Ghim, Shin-je; Joh, Joongho; Gray, Robert D.; Khanal, Sujita; Bossart, Gregory D.; Mignucci-Giannoni, Antonio A.; Rouchka, Eric C.; Jenson, Alfred B.; Trent, John O.; Chaires, Jonathan B.

2018-01-01

The Florida manatee (Trichechus manatus latirotris) is a threatened aquatic mammal in United States coastal waters. Over the past decade, the appearance of papillomavirus-induced lesions and viral papillomatosis in manatees has been a concern for those involved in the management and rehabilitation of this species. To date, three manatee papillomaviruses (TmPVs) have been identified in Florida manatees, one forming cutaneous lesions (TmPV1) and two forming genital lesions (TmPV3 and TmPV4). We identified DNA sequences with the potential to form G-quadruplex structures (G4) across the three genomes. G4 were located on both DNA strands and across coding and non-coding regions on all TmPVs, offering multiple targets for viral control. Although G4 have been identified in several viral genomes, including human PVs, most research has focused on canonical structures comprised of three G-tetrads. In contrast, the vast majority of sequences we identified would allow the formation of non-canonical structures with only two G-tetrads. Our biophysical analysis confirmed the formation of G4 with parallel topology in three such sequences from the E2 region. Two of the structures appear comprised of multiple stacked two G-tetrad structures, perhaps serving to increase structural stability. Computational analysis demonstrated enrichment of G4 sequences on all TmPVs on the reverse strand in the E2/E4 region and on both strands in the L2 region. Several G4 sequences occurred at similar regional locations on all PVs, most notably on the reverse strand in the E2 region. In other cases, G4 were identified at similar regional locations only on PVs forming genital lesions. On all TmPVs, G4 sequences were located in the non-coding region near putative E2 binding sites. Together, these findings suggest that G4 are possible regulatory elements in TmPVs. PMID:29630682
Coarse-grained sequences for protein folding and design.

PubMed

Brown, Scott; Fawzi, Nicolas J; Head-Gordon, Teresa

2003-09-16

We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the alpha/beta ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design.
Coarse-grained sequences for protein folding and design

PubMed Central

Brown, Scott; Fawzi, Nicolas J.; Head-Gordon, Teresa

2003-01-01

We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design. PMID:12963815

Coding of Class I and II aminoacyl-tRNA synthetases

PubMed Central

Carter, Charles W.

2018-01-01

SUMMARY The aminoacyl-tRNA synthetases and their cognate transfer RNAs translate the universal genetic code. The twenty canonical amino acids are sufficiently diverse to create a selective advantage for dividing amino acid activation between two distinct, apparently unrelated superfamilies of synthetases, Class I amino acids being generally larger and less polar, Class II amino acids smaller and more polar. Biochemical, bioinformatic, and protein engineering experiments support the hypothesis that the two Classes descended from opposite strands of the same ancestral gene. Parallel experimental deconstructions of Class I and II synthetases reveal parallel losses in catalytic proficiency at two novel modular levels—protozymes and Urzymes—associated with the evolution of catalytic activity. Bi-directional coding supports an important unification of the proteome; affords a genetic relatedness metric—middle base-pairing frequencies in sense/antisense alignments—that probes more deeply into the evolutionary history of translation than do single multiple sequence alignments; and has facilitated the analysis of hitherto unknown coding relationships in tRNA sequences. Reconstruction of native synthetases by modular thermodynamic cycles facilitated by domain engineering emphasizes the subtlety associated with achieving high specificity, shedding new light on allosteric relationships in contemporary synthetases. Synthetase Urzyme structural biology suggests that they are catalytically active molten globules, broadening the potential manifold of polypeptide catalysts accessible to primitive genetic coding and motivating revisions of the origins of catalysis. Finally, bi-directional genetic coding of some of the oldest genes in the proteome places major limitations on the likelihood that any RNA World preceded the origins of coded proteins. PMID:28828732
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.

PubMed

Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia

2015-01-01

Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study, however it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, which is evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.
Streamlined Genome Sequence Compression using Distributed Source Coding

PubMed Central

Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

2014-01-01

We aim at developing a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol will pick adaptively either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552
Molecular cloning of a Candida albicans gene (SSB1) coding for a protein related to the Hsp70 family.

PubMed

Maneu, V; Cervera, A M; Martinez, J P; Gozalbo, D

1997-06-15

We have cloned and sequenced a Candida albicans gene (SSB1) encoding a potential member of the heat-shock protein seventy (hsp70) family. The protein encoded by this gene contains 613 amino acids and shows a high degree (85%) of sequence identity to the ssb subfamily (ssb1 and ssb2) of the Saccharomyces cerevisiae hsp70 family. The transcribed mRNA (2.1 kb) is present in similar amounts both in yeast and germ tube cells of C. albicans.
An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics.

PubMed

Omasits, Ulrich; Varadarajan, Adithi R; Schmid, Michael; Goetze, Sandra; Melidis, Damianos; Bourqui, Marc; Nikolayeva, Olga; Québatte, Maxime; Patrignani, Andrea; Dehio, Christoph; Frey, Juerg E; Robinson, Mark D; Wollscheid, Bernd; Ahrens, Christian H

2017-12-01

Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae , Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote. © 2017 Omasits et al.; Published by Cold Spring Harbor Laboratory Press.
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

PubMed

Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

PubMed Central

Dasenko, Mark A.

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
The Mitochondrial Cytochrome Oxidase Subunit I Gene Occurs on a Minichromosome with Extensive Heteroplasmy in Two Species of Chewing Lice, Geomydoecus aurei and Thomomydoecus minor

PubMed Central

Pietan, Lucas L.; Spradling, Theresa A.

2016-01-01

In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

PubMed

2004-12-09

We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Identification and Molecular Characterization of Genes Coding Pharmaceutically Important Enzymes from Halo-Thermo Tolerant Bacillus

PubMed Central

Safary, Azam; Moniri, Rezvan; Hamzeh-Mivehroud, Maryam; Dastmalchi, Siavoush

2016-01-01

Purpose: Robust pharmaceutical and industrial enzymes from extremophile microorganisms are main source of enzymes with tremendous stability under harsh conditions which make them potential tools for commercial and biotechnological applications. Methods: The genome of a Gram-positive halo-thermotolerant Bacillus sp. SL1, new isolate from Saline Lake, was investigated for the presence of genes coding for potentially pharmaceutical enzymes. We determined gene sequences for the enzymes laccase (CotA), l-asparaginase (ansA3, ansA1), glutamate-specific endopeptidase (blaSE), l-arabinose isomerase (araA2), endo-1,4-β mannosidase (gmuG), glutaminase (glsA), pectate lyase (pelA), cellulase (bglC1), aldehyde dehydrogenase (ycbD) and allantoinases (pucH) in the genome of Bacillus sp. SL1. Results: Based on the DNA sequence alignment results, six of the studied enzymes of Bacillus sp. SL-1 showed 100% similarity at the nucleotide level to the same genes of B. licheniformis 14580 demonstrating extensive organizational relationship between these two strains. Despite high similarities between the B. licheniformis and Bacillus sp. SL-1 genomes, there are minor differences in the sequences of some enzyme. Approximately 30% of the enzyme sequences revealed more than 99% identity with some variations in nucleotides leading to amino acid substitution in protein sequences. Conclusion: Molecular characterization of this new isolate provides useful information regarding evolutionary relationship between B. subtilis and B. licheniformis species. Since, the most industrial processes are often performed in harsh conditions, enzymes from such halo-thermotolerant bacteria may provide economically and industrially appealing biocatalysts to be used under specific physicochemical situations in medical, pharmaceutical, chemical and other industries. PMID:28101462
Identification and Molecular Characterization of Genes Coding Pharmaceutically Important Enzymes from Halo-Thermo Tolerant Bacillus.

PubMed

Safary, Azam; Moniri, Rezvan; Hamzeh-Mivehroud, Maryam; Dastmalchi, Siavoush

2016-12-01

Purpose: Robust pharmaceutical and industrial enzymes from extremophile microorganisms are main source of enzymes with tremendous stability under harsh conditions which make them potential tools for commercial and biotechnological applications. Methods: The genome of a Gram-positive halo-thermotolerant Bacillus sp. SL1, new isolate from Saline Lake, was investigated for the presence of genes coding for potentially pharmaceutical enzymes. We determined gene sequences for the enzymes laccase (CotA), l-asparaginase (ansA3, ansA1), glutamate-specific endopeptidase (blaSE), l-arabinose isomerase (araA2), endo-1,4-β mannosidase (gmuG), glutaminase (glsA), pectate lyase (pelA), cellulase (bglC1), aldehyde dehydrogenase (ycbD) and allantoinases (pucH) in the genome of Bacillus sp. SL1. Results: Based on the DNA sequence alignment results, six of the studied enzymes of Bacillus sp. SL-1 showed 100% similarity at the nucleotide level to the same genes of B. licheniformis 14580 demonstrating extensive organizational relationship between these two strains. Despite high similarities between the B. licheniformis and Bacillus sp. SL-1 genomes, there are minor differences in the sequences of some enzyme. Approximately 30% of the enzyme sequences revealed more than 99% identity with some variations in nucleotides leading to amino acid substitution in protein sequences. Conclusion: Molecular characterization of this new isolate provides useful information regarding evolutionary relationship between B. subtilis and B. licheniformis species. Since, the most industrial processes are often performed in harsh conditions, enzymes from such halo-thermotolerant bacteria may provide economically and industrially appealing biocatalysts to be used under specific physicochemical situations in medical, pharmaceutical, chemical and other industries.
Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays

PubMed Central

Boerner, Susan; McGinnis, Karen M.

2012-01-01

Background Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants. Methodology/Principal Findings To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci. Conclusions/Significance Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA mediated mechanisms. PMID:22916204
Sequencing proteins with transverse ionic transport in nanochannels.

PubMed

Boynton, Paul; Di Ventra, Massimiliano

2016-05-03

De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms and all sequence modifications that occur after a protein has been constructed from its corresponding DNA code. By obtaining the order of the amino acids that compose a given protein one can then determine both its secondary and tertiary structures through structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Here, we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel. We find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.
A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

PubMed

Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

2015-01-01

The upstream region of coding genes is important for several reasons, for instance locating transcription factor, binding sites, and start site initiation in genomic DNA. Motivated by a recently conducted study, where multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in analysis, where conserved coding genes were found by using pan-genomics concept for each considered prokaryotic species. PLS uses position specific scoring matrix (PSSM) to study the characteristics of upstream region. Results obtained by PLS based method were compared with Gini importance of random forest (RF) and support vector machine (SVM), which is much used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and suggested approach identifies prokaryotic upstream region significantly better to RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
Golay sequences coded coherent optical OFDM for long-haul transmission

NASA Astrophysics Data System (ADS)

Qin, Cui; Ma, Xiangrong; Hua, Tao; Zhao, Jing; Yu, Huilong; Zhang, Jian

2017-09-01

We propose to use binary Golay sequences in coherent optical orthogonal frequency division multiplexing (CO-OFDM) to improve the long-haul transmission performance. The Golay sequences are generated by binary Reed-Muller codes, which have low peak-to-average power ratio and certain error correction capability. A low-complexity decoding algorithm for the Golay sequences is then proposed to recover the signal. Under same spectral efficiency, the QPSK modulated OFDM with binary Golay sequences coding with and without discrete Fourier transform (DFT) spreading (DFTS-QPSK-GOFDM and QPSK-GOFDM) are compared with the normal BPSK modulated OFDM with and without DFT spreading (DFTS-BPSK-OFDM and BPSK-OFDM) after long-haul transmission. At a 7% forward error correction code threshold (Q2 factor of 8.5 dB), it is shown that DFTS-QPSK-GOFDM outperforms DFTS-BPSK-OFDM by extending the transmission distance by 29% and 18%, in non-dispersion managed and dispersion managed links, respectively.
A benchmark study of scoring methods for non-coding mutations.

PubMed

Drubay, Damien; Gautheret, Daniel; Michiels, Stefan

2018-05-15

Detailed knowledge of coding sequences has led to different candidate models for pathogenic variant prioritization. Several deleteriousness scores have been proposed for the non-coding part of the genome, but no large-scale comparison has been realized to date to assess their performance. We compared the leading scoring tools (CADD, FATHMM-MKL, Funseq2 and GWAVA) and some recent competitors (DANN, SNP and SOM scores) for their ability to discriminate assumed pathogenic variants from assumed benign variants (using the ClinVar, COSMIC and 1000 genomes project databases). Using the ClinVar benchmark, CADD was the best tool for detecting the pathogenic variants that are mainly located in protein coding gene regions. Using the COSMIC benchmark, FATHMM-MKL, GWAVA and SOMliver outperformed the other tools for pathogenic variants that are typically located in lincRNAs, pseudogenes and other parts of the non-coding genome. However, all tools had low precision, which could potentially be improved by future non-coding genome feature discoveries. These results may have been influenced by the presence of potential benign variants in the COSMIC database. The development of a gold standard as consistent as ClinVar for these regions will be necessary to confirm our tool ranking. The Snakemake, C++ and R codes are freely available from https://github.com/Oncostat/BenchmarkNCVTools and supported on Linux. damien.drubay@gustaveroussy.fr or stefan.michiels@gustaveroussy.fr. Supplementary data are available at Bioinformatics online.
Characterisation of divergent flavivirus NS3 and NS5 protein sequences detected in Rhipicephalus microplus ticks from Brazil

PubMed Central

Maruyama, Sandra Regina; Castro-Jorge, Luiza Antunes; Ribeiro, José Marcos Chaves; Gardinassi, Luiz Gustavo; Garcia, Gustavo Rocha; Brandão, Lucinda Giampietro; Rodrigues, Aline Rezende; Okada, Marcos Ituo; Abrão, Emiliana Pereira; Ferreira, Beatriz Rossetti; da Fonseca, Benedito Antonio Lopes; de Miranda-Santos, Isabel Kinney Ferreira

2013-01-01

Transcripts similar to those that encode the nonstructural (NS) proteins NS3 and NS5 from flaviviruses were found in a salivary gland (SG) complementary DNA (cDNA) library from the cattle tick Rhipicephalus microplus. Tick extracts were cultured with cells to enable the isolation of viruses capable of replicating in cultured invertebrate and vertebrate cells. Deep sequencing of the viral RNA isolated from culture supernatants provided the complete coding sequences for the NS3 and NS5 proteins and their molecular characterisation confirmed similarity with the NS3 and NS5 sequences from other flaviviruses. Despite this similarity, phylogenetic analyses revealed that this potentially novel virus may be a highly divergent member of the genus Flavivirus. Interestingly, we detected the divergent NS3 and NS5 sequences in ticks collected from several dairy farms widely distributed throughout three regions of Brazil. This is the first report of flavivirus-like transcripts in R. microplus ticks. This novel virus is a potential arbovirus because it replicated in arthropod and mammalian cells; furthermore, it was detected in a cDNA library from tick SGs and therefore may be present in tick saliva. It is important to determine whether and by what means this potential virus is transmissible and to monitor the virus as a potential emerging tick-borne zoonotic pathogen. PMID:24626302
Complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera, and comparative analyses with other grass genomes

PubMed Central

Saski, Christopher; Lee, Seung-Bum; Fjellheim, Siri; Guda, Chittibabu; Jansen, Robert K.; Luo, Hong; Tomkins, Jeffrey; Rognli, Odd Arne; Clarke, Jihong Liu

2009-01-01

Comparisons of complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera to six published grass chloroplast genomes reveal that gene content and order are similar but two microstructural changes have occurred. First, the expansion of the IR at the SSC/IRa boundary that duplicates a portion of the 5′ end of ndhH is restricted to the three genera of the subfamily Pooideae (Agrostis, Hordeum and Triticum). Second, a 6 bp deletion in ndhK is shared by Agrostis, Hordeum, Oryza and Triticum, and this event supports the sister relationship between the subfamilies Erhartoideae and Pooideae. Repeat analysis identified 19–37 direct and inverted repeats 30 bp or longer with a sequence identity of at least 90%. Seventeen of the 26 shared repeats are found in all the grass chloroplast genomes examined and are located in the same genes or intergenic spacer (IGS) regions. Examination of simple sequence repeats (SSRs) identified 16–21 potential polymorphic SSRs. Five IGS regions have 100% sequence identity among Zea mays, Saccharum officinarum and Sorghum bicolor, whereas no spacer regions were identical among Oryza sativa, Triticum aestivum, H. vulgare and A. stolonifera despite their close phylogenetic relationship. Alignment of EST sequences and DNA coding sequences identified six C–U conversions in both Sorghum bicolor and H. vulgare but only one in A. stolonifera. Phylogenetic trees based on DNA sequences of 61 protein-coding genes of 38 taxa using both maximum parsimony and likelihood methods provide moderate support for a sister relationship between the subfamilies Erhartoideae and Pooideae. PMID:17534593
Characterization of a species-specific repetitive DNA from a highly endangered wild animal, Rhinoceros unicornis, and assessment of genetic polymorphism by microsatellite associated sequence amplification (MASA).

PubMed

Ali, S; Azfer, M A; Bashamboo, A; Mathur, P K; Malik, P K; Mathur, V B; Raha, A K; Ansari, S

1999-03-04

We have cloned and sequenced a 906bp EcoRI repeat DNA fraction from Rhinoceros unicornis genome. The contig pSS(R)2 is AT rich with 340 A (37.53%), 187 C (20.64%), 173 G (19.09%) and 206 T (22.74%). The sequence contains MALT box, NF-E1, Poly-A signal, lariat consensus sequences, TATA box, translational initiation sequences and several stop codons. Translation of the contig showed seven different types of protein motifs, among which, EGF-like domain cysteine pattern signatures and Bowman-Birk serine protease inhibitor family signatures were prominent. The presence of eukaryotic transcriptional elements, protein signatures and analysis of subset sequences in the 5' region from 1 to 165nt indicating coding potential (test code value=0.97) suggest possible regulatory and/or functional role(s) of these sequences in the rhino genome. Translation of the complementary strand from 906 to 706nt and 190 to 2nt showed proteins of more than 7kDa rich in non-polar residues. This suggests that pSS(R)2 is either a part of, or adjacent to, a functional gene. The contig contains mostly non-consecutive simple repeat units from 2 to 17nt with varying frequencies, of which four base motifs were found to be predominant. Zoo-blot hybridization revealed that pSS(R)2 sequences are unique to R. unicornis genome because they do not cross-hybridize, even with the genomic DNA of South African black rhino Diceros bicornis. Southern blot analysis of R. unicornis genomic DNA with pSS(R)2 and other synthetic oligo probes revealed a high level of genetic homogeneity, which was also substantiated by microsatellite associated sequence amplification (MASA). Owing to its uniqueness, the pSS(R)2 probe has a potential application in the area of conservation biology for unequivocal identification of horn or other body tissues of R. unicornis. The evolutionary aspect of this repeat fraction in the context of comparative genome analysis is discussed.
MouSensor: A Versatile Genetic Platform to Create Super Sniffer Mice for Studying Human Odor Coding.

PubMed

D'Hulst, Charlotte; Mina, Raena B; Gershon, Zachary; Jamet, Sophie; Cerullo, Antonio; Tomoiaga, Delia; Bai, Li; Belluscio, Leonardo; Rogers, Matthew E; Sirotin, Yevgeniy; Feinstein, Paul

2016-07-26

Typically, ∼0.1% of the total number of olfactory sensory neurons (OSNs) in the main olfactory epithelium express the same odorant receptor (OR) in a singular fashion and their axons coalesce into homotypic glomeruli in the olfactory bulb. Here, we have dramatically increased the total number of OSNs expressing specific cloned OR coding sequences by multimerizing a 21-bp sequence encompassing the predicted homeodomain binding site sequence, TAATGA, known to be essential in OR gene choice. Singular gene choice is maintained in these "MouSensors." In vivo synaptopHluorin imaging of odor-induced responses by known M71 ligands shows functional glomerular activation in an M71 MouSensor. Moreover, a behavioral avoidance task demonstrates that specific odor detection thresholds are significantly decreased in multiple transgenic lines, expressing mouse or human ORs. We have developed a versatile platform to study gene choice and axon identity, to create biosensors with great translational potential, and to finally decode human olfaction. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.

PubMed

Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins

2018-06-05

With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data in particular de novo sequencing was rapidly produced at relatively low costs. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on the feature extraction from complex network measurements. The method initially transform the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET have classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and non-biased by the organism. The proposed methodology is implemented in open source in R language and freely available for download at https://cran.r-project.org/package=BASiNET.
The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

PubMed

Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

2016-01-01

Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensible for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis,C. pneumoniae,C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA,pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis,C. pneumoniae,C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species. © 2016 S. Karger AG, Basel.
The eukaryotic genome is structurally and functionally more like a social insect colony than a book.

PubMed

Qiu, Guo-Hua; Yang, Xiaoyan; Zheng, Xintian; Huang, Cuiqin

2017-11-01

Traditionally, the genome has been described as the 'book of life'. However, the metaphor of a book may not reflect the dynamic nature of the structure and function of the genome. In the eukaryotic genome, the number of centrally located protein-coding sequences is relatively constant across species, but the amount of noncoding DNA increases considerably with the increase of organismal evolutional complexity. Therefore, it has been hypothesized that the abundant peripheral noncoding DNA protects the genome and the central protein-coding sequences in the eukaryotic genome. Upon comparison with the habitation, sociality and defense mechanisms of a social insect colony, it is found that the genome is similar to a social insect colony in various aspects. A social insect colony may thus be a better metaphor than a book to describe the spatial organization and physical functions of the genome. The potential implications of the metaphor are also discussed.
Increased apomixis expression concurrent with genetic and epigenetic variation in a newly synthesized Eragrostis curvula polyploid

NASA Astrophysics Data System (ADS)

Zappacosta, Diego C.; Ochogavía, Ana C.; Rodrigo, Juan M.; Romero, José R.; Meier, Mauro S.; Garbus, Ingrid; Pessino, Silvina C.; Echenique, Viviana C.

2014-04-01

Eragrostis curvula includes biotypes reproducing through obligate and facultative apomixis or, rarely, full sexuality. We previously generated a ``tetraploid-dihaploid-tetraploid'' series of plants consisting of a tetraploid apomictic plant (T), a sexual dihaploid plant (D) and a tetraploid artificial colchiploid (C). Initially, plant C was nearly 100% sexual. However, its capacity to form non-reduced embryo sacs dramatically increased over a four year period (2003-2007) to reach levels of 85-90%. Here, we confirmed high rates of apomixis in plant C, and used AFLPs and MSAPs to characterize the genetic and epigenetic variation observed in this plant in 2007 as compared to 2003. Of the polymorphic sequences, some had no coding potential whereas others were homologous to retrotransposons and/or protein-coding-like sequences. Our results suggest that in this particular plant system increased apomixis expression is concurrent with genetic and epigenetic modifications, possibly involving transposable elements.
A Deeper Examination of Thorellius atrox Scorpion Venom Components with Omic Techonologies.

PubMed

Romero-Gutierrez, Teresa; Peguero-Sanchez, Esteban; Cevallos, Miguel A; Batista, Cesar V F; Ortiz, Ernesto; Possani, Lourival D

2017-12-12

This communication reports a further examination of venom gland transcripts and venom composition of the Mexican scorpion Thorellius atrox using RNA-seq and tandem mass spectrometry. The RNA-seq, which was performed with the Illumina protocol, yielded more than 20,000 assembled transcripts. Following a database search and annotation strategy, 160 transcripts were identified, potentially coding for venom components. A novel sequence was identified that potentially codes for a peptide with similarity to spider ω-agatoxins, which act on voltage-gated calcium channels, not known before to exist in scorpion venoms. Analogous transcripts were found in other scorpion species. They could represent members of a new scorpion toxin family, here named omegascorpins. The mass fingerprint by LC-MS identified 135 individual venom components, five of which matched with the theoretical masses of putative peptides translated from the transcriptome. The LC-MS/MS de novo sequencing allowed to reconstruct and identify 42 proteins encoded by assembled transcripts, thus validating the transcriptome analysis. Earlier studies conducted with this scorpion venom permitted the identification of only twenty putative venom components. The present work performed with more powerful and modern omic technologies demonstrates the capacity of accomplishing a deeper characterization of scorpion venom components and the identification of novel molecules with potential applications in biomedicine and the study of ion channel physiology.
A Two-Locus Global DNA Barcode for Land Plants: The Coding rbcL Gene Complements the Non-Coding trnH-psbA Spacer Region

PubMed Central

Kress, W. John; Erickson, David L.

2007-01-01

Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588
A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N1-methyladenosine modification.

PubMed

Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P

2017-03-01

Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5 ' proximal- i ntron- m inus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N 1 -methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N 1 -methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production.

PubMed

Roth, Melissa S; Cokus, Shawn J; Gallaher, Sean D; Walter, Andreas; Lopez, David; Erickson, Erika; Endelman, Benjamin; Westcott, Daniel; Larabell, Carolyn A; Merchant, Sabeeha S; Pellegrini, Matteo; Niyogi, Krishna K

2017-05-23

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis , because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase ( BKT ), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.
Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

DOE PAGES

Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.; ...

2017-05-08

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. Here, to advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ~58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniformmore » gene density over chromosomes, low repetitive sequence content (~6%), and a high fraction of protein-coding sequence (~39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (~73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. Finally, the high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.« less
Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. Here, to advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ~58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniformmore » gene density over chromosomes, low repetitive sequence content (~6%), and a high fraction of protein-coding sequence (~39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (~73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. Finally, the high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.« less
Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

PubMed Central

Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.; Walter, Andreas; Lopez, David; Erickson, Erika; Endelman, Benjamin; Westcott, Daniel; Larabell, Carolyn A.; Merchant, Sabeeha S.; Pellegrini, Matteo

2017-01-01

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production. PMID:28484037
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.

PubMed

O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F

2011-07-01

To identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosinbinding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences for DCM-affected and control dogs and the published canine genome were compared. None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant was detected in some Doberman Pinschers that may contribute to the development of DCM given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in development of DCM in Doberman Pinschers are warranted.
A DS-UWB Cognitive Radio System Based on Bridge Function Smart Codes

NASA Astrophysics Data System (ADS)

Xu, Yafei; Hong, Sheng; Zhao, Guodong; Zhang, Fengyuan; di, Jinshan; Zhang, Qishan

This paper proposes a direct-sequence UWB Gaussian pulse of cognitive radio systems based on bridge function smart sequence matrix and the Gaussian pulse. As the system uses the spreading sequence code, that is the bridge function smart code sequence, the zero correlation zones (ZCZs) which the bridge function sequences' auto-correlation functions had, could reduce multipath fading of the pulse interference. The Modulated channel signal was sent into the IEEE 802.15.3a UWB channel. We analysis the ZCZs's inhibition to the interference multipath interference (MPI), as one of the main system sources interferences. The simulation in SIMULINK/MATLAB is described in detail. The result shows the system has better performance by comparison with that employing Walsh sequence square matrix, and it was verified by the formula in principle.
The influence of viral coding sequences on pestivirus IRES activity reveals further parallels with translation initiation in prokaryotes.

PubMed Central

Fletcher, Simon P; Ali, Iraj K; Kaminski, Ann; Digard, Paul; Jackson, Richard J

2002-01-01

Classical swine fever virus (CSFV) is a member of the pestivirus family, which shares many features in common with hepatitis C virus (HCV). It is shown here that CSFV has an exceptionally efficient cis-acting internal ribosome entry segment (IRES), which, like that of HCV, is strongly influenced by the sequences immediately downstream of the initiation codon, and is optimal with viral coding sequences in this position. Constructs that retained 17 or more codons of viral coding sequence exhibited full IRES activity, but with only 12 codons, activity was approximately 66% of maximum in vitro (though close to maximum in transfected BHK cells), whereas with just 3 codons or fewer, the activity was only approximately 15% of maximum. The minimal coding region elements required for high activity were exchanged between HCV and CSFV. Although maximum activity was observed in each case with the homologous combination of coding region and 5' UTR, the heterologous combinations were sufficiently active to rule out a highly specific functional interplay between the 5' UTR and coding sequences. On the other hand, inversion of the coding sequences resulted in low IRES activity, particularly with the HCV coding sequences. RNA structure probing showed that the efficiency of internal initiation of these chimeric constructs correlated most closely with the degree of single-strandedness of the region around and immediately downstream of the initiation codon. The low activity IRESs could not be rescued by addition of supplementary eIF4A (the initiation factor with ATP-dependent RNA helicase activity). The extreme sensitivity to secondary structure around the initiation codon is likely to be due to the fact that the eIF4F complex (which has eIF4A as one of its subunits) is not required for and does not participate in initiation on these IRESs. PMID:12515388
Structural and functional partitioning of bread wheat chromosome 3B.

PubMed

Choulet, Frédéric; Alberti, Adriana; Theil, Sébastien; Glover, Natasha; Barbe, Valérie; Daron, Josquin; Pingault, Lise; Sourdille, Pierre; Couloux, Arnaud; Paux, Etienne; Leroy, Philippe; Mangenot, Sophie; Guilhot, Nicolas; Le Gouis, Jacques; Balfourier, Francois; Alaux, Michael; Jamilloux, Véronique; Poulain, Julie; Durand, Céline; Bellec, Arnaud; Gaspin, Christine; Safar, Jan; Dolezel, Jaroslav; Rogers, Jane; Vandepoele, Klaas; Aury, Jean-Marc; Mayer, Klaus; Berges, Hélène; Quesneville, Hadi; Wincker, Patrick; Feuillet, Catherine

2014-07-18

We produced a reference sequence of the 1-gigabase chromosome 3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial chromosomes in pools, we assembled a sequence of 774 megabases carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of transposable elements. The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses indicated high wheat-specific inter- and intrachromosomal gene duplication activities that are potential sources of variability for adaption. In addition to providing a better understanding of the organization, function, and evolution of a large and polyploid genome, the availability of a high-quality sequence anchored to genetic maps will accelerate the identification of genes underlying important agronomic traits. Copyright © 2014, American Association for the Advancement of Science.
Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics

PubMed Central

2012-01-01

Background Detecting the borders between coding and non-coding regions is an essential step in the genome annotation. And information entropy measures are useful for describing the signals in genome sequence. However, the accuracies of previous methods of finding borders based on entropy segmentation method still need to be improved. Methods In this study, we first applied a new recursive entropic segmentation method on DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results Comparing with the previous segmentation methods, the experimental results on three bacteria genomes, Rickettsia prowazekii, Borrelia burgdorferi and E.coli, show that our approach improves the accuracy for finding the borders between coding and non-coding regions in DNA sequences. Conclusions This paper presents a new segmentation method in prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacteria genomes, comparing to A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences. PMID:23282225
Characterization of a Theta-Type Plasmid from Lactobacillus sakei: a Potential Basis for Low-Copy-Number Vectors in Lactobacilli

PubMed Central

Alpert, Carl-Alfred; Crutz-Le Coq, Anne-Marie; Malleret, Christine; Zagorec, Monique

2003-01-01

The complete nucleotide sequence of the 13-kb plasmid pRV500, isolated from Lactobacillus sakei RV332, was determined. Sequence analysis enabled the identification of genes coding for a putative type I restriction-modification system, two genes coding for putative recombinases of the integrase family, and a region likely involved in replication. The structural features of this region, comprising a putative ori segment containing 11- and 22-bp repeats and a repA gene coding for a putative initiator protein, indicated that pRV500 belongs to the pUCL287 subfamily of theta-type replicons. A 3.7-kb fragment encompassing this region was fused to an Escherichia coli replicon to produce the shuttle vector pRV566 and was observed to be functional in L. sakei for plasmid replication. The L. sakei replicon alone could not support replication in E. coli. Plasmid pRV500 and its derivative pRV566 were determined to be at very low copy numbers in L. sakei. pRV566 was maintained at a reasonable rate over 20 generations in several lactobacilli, such as Lactobacillus curvatus, Lactobacillus casei, and Lactobacillus plantarum, in addition to L. sakei, making it an interesting basis for developing vectors. Sequence relationships with other plasmids are described and discussed. PMID:12957947
The first draft genome of the aquatic model plant Lemna minor opens the route for future stress physiology research and biotechnological applications.

PubMed

Van Hoeck, Arne; Horemans, Nele; Monsieurs, Pieter; Cao, Hieu Xuan; Vandenhove, Hildegarde; Blust, Ronny

2015-01-01

Freshwater duckweed, comprising the smallest, fastest growing and simplest macrophytes has various applications in agriculture, phytoremediation and energy production. Lemna minor, the so-called common duckweed, is a model system of these aquatic plants for ecotoxicological bioassays, genetic transformation tools and industrial applications. Given the ecotoxic relevance and high potential for biomass production, whole-genome information of this cosmopolitan duckweed is needed. The 472 Mbp assembly of the L. minor genome (2n = 40; estimated 481 Mbp; 98.1 %) contains 22,382 protein-coding genes and 61.5 % repetitive sequences. The repeat content explains 94.5 % of the genome size difference in comparison with the greater duckweed, Spirodela polyrhiza (2n = 40; 158 Mbp; 19,623 protein-coding genes; and 15.79 % repetitive sequences). Comparison of proteins from other monocot plants, protein ortholog identification, OrthoMCL, suggests 1356 duckweed-specific groups (3367 proteins, 15.0 % total L. minor proteins) and 795 Lemna-specific groups (2897 proteins, 12.9 % total L. minor proteins). Interestingly, proteins involved in biosynthetic processes in response to various stimuli and hydrolase activities are enriched in the Lemna proteome in comparison with the Spirodela proteome. The genome sequence and annotation of L. minor protein-coding genes provide new insights in biological understanding and biomass production applications of Lemna species.
An NGS Workflow Blueprint for DNA Sequencing Data and Its Application in Individualized Molecular Oncology

PubMed Central

Li, Jian; Batcha, Aarif Mohamed Nazeer; Grüning, Björn; Mansmann, Ulrich R.

2015-01-01

Next-generation sequencing (NGS) technologies that have advanced rapidly in the past few years possess the potential to classify diseases, decipher the molecular code of related cell processes, identify targets for decision-making on targeted therapy or prevention strategies, and predict clinical treatment response. Thus, NGS is on its way to revolutionize oncology. With the help of NGS, we can draw a finer map for the genetic basis of diseases and can improve our understanding of diagnostic and prognostic applications and therapeutic methods. Despite these advantages and its potential, NGS is facing several critical challenges, including reduction of sequencing cost, enhancement of sequencing quality, improvement of technical simplicity and reliability, and development of semiautomated and integrated analysis workflow. In order to address these challenges, we conducted a literature research and summarized a four-stage NGS workflow for providing a systematic review on NGS-based analysis, explaining the strength and weakness of diverse NGS-based software tools, and elucidating its potential connection to individualized medicine. By presenting this four-stage NGS workflow, we try to provide a minimal structural layout required for NGS data storage and reproducibility. PMID:27081306
Nonspatial Sequence Coding in CA1 Neurons

PubMed Central

Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

2016-01-01

The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal region CA1 while rats performed a nonspatial sequence memory task. We found that hippocampal neurons code for the temporal context of items (whether odors were presented in the correct or incorrect sequential position) and that this activity is linked with memory performance. The discovery of this novel form of temporal coding in hippocampal neurons advances our fundamental understanding of the neurobiology of episodic memory and will serve as a foundation for our cross-species, multitechnique approach aimed at elucidating the neural mechanisms underlying memory impairments in aging and dementia. PMID:26843637

Peptide library synthesis on spectrally encoded beads for multiplexed protein/peptide bioassays

NASA Astrophysics Data System (ADS)

Nguyen, Huy Q.; Brower, Kara; Harink, Björn; Baxter, Brian; Thorn, Kurt S.; Fordyce, Polly M.

2017-02-01

Protein-peptide interactions are essential for cellular responses. Despite their importance, these interactions remain largely uncharacterized due to experimental challenges associated with their measurement. Current techniques (e.g. surface plasmon resonance, fluorescence polarization, and isothermal calorimetry) either require large amounts of purified material or direct fluorescent labeling, making high-throughput measurements laborious and expensive. In this report, we present a new technology for measuring antibody-peptide interactions in vitro that leverages spectrally encoded beads for biological multiplexing. Specific peptide sequences are synthesized directly on encoded beads with a 1:1 relationship between peptide sequence and embedded code, thereby making it possible to track many peptide sequences throughout the course of an experiment within a single small volume. We demonstrate the potential of these bead-bound peptide libraries by: (1) creating a set of 46 peptides composed of 3 commonly used epitope tags (myc, FLAG, and HA) and single amino-acid scanning mutants; (2) incubating with a mixture of fluorescently-labeled antimyc, anti-FLAG, and anti-HA antibodies; and (3) imaging these bead-bound libraries to simultaneously identify the embedded spectral code (and thus the sequence of the associated peptide) and quantify the amount of each antibody bound. To our knowledge, these data demonstrate the first customized peptide library synthesized directly on spectrally encoded beads. While the implementation of the technology provided here is a high-affinity antibody/protein interaction with a small code space, we believe this platform can be broadly applicable to any range of peptide screening applications, with the capability to multiplex into libraries of hundreds to thousands of peptides in a single assay.
New genetic variants of LATS1 detected in urinary bladder and colon cancer.

PubMed

Saadeldin, Mona K; Shawer, Heba; Mostafa, Ahmed; Kassem, Neemat M; Amleh, Asma; Siam, Rania

2014-01-01

LATS1, the large tumor suppressor 1 gene, encodes for a serine/threonine kinase protein and is implicated in cell cycle progression. LATS1 is down-regulated in various human cancers, such as breast cancer, and astrocytoma. Point mutations in LATS1 were reported in human sarcomas. Additionally, loss of heterozygosity of LATS1 chromosomal region predisposes to breast, ovarian, and cervical tumors. In the current study, we investigated LATS1 genetic variations including single nucleotide polymorphisms (SNPs), in 28 Egyptian patients with either urinary bladder or colon cancers. The LATS1 gene was amplified and sequenced and the expression of LATS1 at the RNA level was assessed in 12 urinary bladder cancer samples. We report, the identification of a total of 29 variants including previously identified SNPs within LATS1 coding and non-coding sequences. A total of 18 variants were novel. Majority of the novel variants, 13, were mapped to intronic sequences and un-translated regions of the gene. Four of the five novel variants located in the coding region of the gene, represented missense mutations within the serine/threonine kinase catalytic domain. Interestingly, LATS1 RNA steady state levels was lost in urinary bladder cancerous tissue harboring four specific SNPs (16045 + 41736 + 34614 + 56177) positioned in the 5'UTR, intron 6, and two silent mutations within exon 4 and exon 8, respectively. This study identifies novel single-base-sequence alterations in the LATS1 gene. These newly identified variants could potentially be used as novel diagnostic or prognostic tools in cancer.
Computational Prediction of the Immunomodulatory Potential of RNA Sequences.

PubMed

Nagpal, Gandharva; Chaudhary, Kumardeep; Dhanda, Sandeep Kumar; Raghava, Gajendra Pal Singh

2017-01-01

Advances in the knowledge of various roles played by non-coding RNAs have stimulated the application of RNA molecules as therapeutics. Among these molecules, miRNA, siRNA, and CRISPR-Cas9 associated gRNA have been identified as the most potent RNA molecule classes with diverse therapeutic applications. One of the major limitations of RNA-based therapeutics is immunotoxicity of RNA molecules as it may induce the innate immune system. In contrast, RNA molecules that are potent immunostimulators are strong candidates for use in vaccine adjuvants. Thus, it is important to understand the immunotoxic or immunostimulatory potential of these RNA molecules. The experimental techniques for determining immunostimulatory potential of siRNAs are time- and resource-consuming. To overcome this limitation, recently our group has developed a web-based server "imRNA" for predicting the immunomodulatory potential of RNA sequences. This server integrates a number of modules that allow users to perform various tasks including (1) generation of RNA analogs with reduced immunotoxicity, (2) identification of highly immunostimulatory regions in RNA sequence, and (3) virtual screening. This server may also assist users in the identification of minimum mutations required in a given RNA sequence to minimize its immunomodulatory potential that is required for designing RNA-based therapeutics. Besides, the server can be used for designing RNA-based vaccine adjuvants as it may assist users in the identification of mutations required for increasing immunomodulatory potential of a given RNA sequence. In summary, this chapter describes major applications of the "imRNA" server in designing RNA-based therapeutics and vaccine adjuvants (http://www.imtech.res.in/raghava/imrna/).
A novel peptide from the ACEI/BPP-CNP precursor in the venom of Crotalus durissus collilineatus.

PubMed

Higuchi, Shigesada; Murayama, Nobuhiro; Saguchi, Ken-ichi; Ohi, Hiroaki; Fujita, Yoshiaki; da Silva, Nelson Jorge; de Siqueira, Rodrigo José Bezerra; Lahlou, Saad; Aird, Steven D

2006-10-01

In crotaline venoms, angiotensin-converting enzyme inhibitors [ACEIs, also known as bradykinin potentiating peptides (BPPs)], are products of a gene coding for an ACEI/BPP-C-type natriuretic peptide (CNP) precursor. In the genes from Bothrops jararaca and Gloydius blomhoffii, ACEI/BPP sequences are repeated. Sequencing of a cDNA clone from venom glands of Crotalus durissus collilineatus showed that two ACEIs/BPPs are located together at the N-terminus, but without repeats. An additional sequence for CNP was unexpectedly found at the C-terminus. Homologous genes for the ACEI/BPP-CNP precursor suggest that most crotaline venoms contain both ACEIs/BPPs and CNP. The sequence of ACEIs/BPPs is separated from the CNP sequence by a long spacer sequence. Previously, there was no evidence that this spacer actually coded any expressed peptides. Aird and Kaiser (1986, unpublished) previously isolated and sequenced a peptide of 11 residues (TPPAGPDVGPR) from Crotalus viridis viridis venom. In the present study, analysis of the cDNA clone from C. d. collilineatus revealed a nearly identical sequence in the ACEI/BPP-CNP spacer. Fractionation of the crude venom by reverse phase HPLC (C(18)), and analysis of the fractions by mass spectrometry (MS) indicated a component of 1020.5 Da. Amino acid sequencing by MS/MS confirmed that C. d. collilineatus venom contains the peptide TPPAGPDGGPR. Its high proline content and paired proline residues are typical of venom hypotensive peptides, although it lacks the usual N-terminal pyroglutamate. It has no demonstrable hypotensive activity when injected intravenously in rats; however, its occurrence in the venoms of dissimilar species suggests that its presence is not accidental. Evidence suggests that these novel toxins probably activate anaphylatoxin C3a receptors.
Verona Coding Definitions of Emotional Sequences (VR-CoDES): Conceptual framework and future directions.

PubMed

Piccolo, Lidia Del; Finset, Arnstein; Mellblom, Anneli V; Figueiredo-Braga, Margarida; Korsvold, Live; Zhou, Yuefang; Zimmermann, Christa; Humphris, Gerald

2017-12-01

To discuss the theoretical and empirical framework of VR-CoDES and potential future direction in research based on the coding system. The paper is based on selective review of papers relevant to the construction and application of VR-CoDES. VR-CoDES system is rooted in patient-centered and biopsychosocial model of healthcare consultations and on a functional approach to emotion theory. According to the VR-CoDES, emotional interaction is studied in terms of sequences consisting of an eliciting event, an emotional expression by the patient and the immediate response by the clinician. The rationale for the emphasis on sequences, on detailed classification of cues and concerns, and on the choices of explicit vs. non-explicit responses and providing vs. reducing room for further disclosure, as basic categories of the clinician responses, is described. Results from research on VR-CoDES may help raise awareness of emotional sequences. Future directions in applying VR-CoDES in research may include studies on predicting patient and clinician behavior within the consultation, qualitative analyses of longer sequences including several VR-CoDES triads, and studies of effects of emotional communication on health outcomes. VR-CoDES may be applied to develop interventions to promote good handling of patients' emotions in healthcare encounters. Copyright © 2017 Elsevier B.V. All rights reserved.
Gene sequences for cytochromes p450 1A1 and 1A2: the need for biomarker development in sea otters (Enhydra lutris).

PubMed

Hook, Sharon E; Cobb, Michael E; Oris, James T; Anderson, Jack W

2008-11-01

There has been recent public concern regarding the impacts of environmental pollution on populations of otters. Population level impacts have been seen with otter (Lutra lutra) populations in Europe due to polychlorinated biphenyls, and with some segments of the Prince William Sound, AK, sea otter (Enhydra lutris) population following the Exxon Valdez oil spill. Despite public interest in these animals and their ecological significance, there are few tools that allow for the study of otter's response to contaminant exposure. Cytochrome p450 1A (CYP1A) performs the first step in metabolizing many xenobiotics, including many polychlorinated biphenyls and polycyclic aromatic hydrocarbons. CYP1A induction is a frequently used biomarker of exposure to these compounds. Despite the potential importance of this gene in ecological risk assessment, the complete coding sequence has not been published for any otter species. This study's objective was to isolate the gene for CYP1A1 and CYP1A2 in sea otters using a series of PCR-based approaches. The coding sequences from CYP1A1 and CYP1A2 from sea otters were identified and published in GenBank. Both CYP1A sequences are homologous to those obtained from marine mammals and other carnivores. These sequences will be useful as tools for researchers assessing contaminant exposure in mustelid populations.
Complete plastid genome sequence of Daucus carota: implications for biotechnology and phylogeny of angiosperms.

PubMed

Ruhlman, Tracey; Lee, Seung-Bum; Jansen, Robert K; Hostetler, Jessica B; Tallon, Luke J; Town, Christopher D; Daniell, Henry

2006-08-31

Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats > or = 30 bp with a sequence identity > or = 90%. Phylogenetic analysis of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements.
Positive Selection Underlies Faster-Z Evolution of Gene Expression in Birds.

PubMed

Dean, Rebecca; Harrison, Peter W; Wright, Alison E; Zimmer, Fabian; Mank, Judith E

2015-10-01

The elevated rate of evolution for genes on sex chromosomes compared with autosomes (Fast-X or Fast-Z evolution) can result either from positive selection in the heterogametic sex or from nonadaptive consequences of reduced relative effective population size. Recent work in birds suggests that Fast-Z of coding sequence is primarily due to relaxed purifying selection resulting from reduced relative effective population size. However, gene sequence and gene expression are often subject to distinct evolutionary pressures; therefore, we tested for Fast-Z in gene expression using next-generation RNA-sequencing data from multiple avian species. Similar to studies of Fast-Z in coding sequence, we recover clear signatures of Fast-Z in gene expression; however, in contrast to coding sequence, our data indicate that Fast-Z in expression is due to positive selection acting primarily in females. In the soma, where gene expression is highly correlated between the sexes, we detected Fast-Z in both sexes, although at a higher rate in females, suggesting that many positively selected expression changes in females are also expressed in males. In the gonad, where intersexual correlations in expression are much lower, we detected Fast-Z for female gene expression, but crucially, not males. This suggests that a large amount of expression variation is sex-specific in its effects within the gonad. Taken together, our results indicate that Fast-Z evolution of gene expression is the product of positive selection acting on recessive beneficial alleles in the heterogametic sex. More broadly, our analysis suggests that the adaptive potential of Z chromosome gene expression may be much greater than that of gene sequence, results which have important implications for the role of sex chromosomes in speciation and sexual selection. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing.

PubMed

Hargreaves, Adam D; Mulley, John F

2015-01-01

Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0-2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5' and 3' UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species.
Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing

PubMed Central

Hargreaves, Adam D.

2015-01-01

Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0–2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5′ and 3′ UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species. PMID:26623194
Biological Information Transfer Beyond the Genetic Code: The Sugar Code

NASA Astrophysics Data System (ADS)

Gabius, H.-J.

In the era of genetic engineering, cloning, and genome sequencing the focus of research on the genetic code has received an even further accentuation in the public eye. In attempting, however, to understand intra- and intercellular recognition processes comprehensively, the two biochemical dimensions established by nucleic acids and proteins are not sufficient to satisfactorily explain all molecular events in, for example, cell adhesion or routing. The consideration of further code systems is essential to bridge this gap. A third biochemical alphabet forming code words with an information storage capacity second to no other substance class in rather small units (words, sentences) is established by monosaccharides (letters). As hardware oligosaccharides surpass peptides by more than seven orders of magnitude in the theoretical ability to build isomers, when the total of conceivable hexamers is calculated. In addition to the sequence complexity, the use of magnetic resonance spectroscopy and molecular modeling has been instrumental in discovering that even small glycans can often reside in not only one but several distinct low-energy conformations (keys). Intriguingly, conformers can display notably different capacities to fit snugly into the binding site of nonhomologous receptors (locks). This process, experimentally verified for two classes of lectins, is termed "differential conformer selection." It adds potential for shifts of the conformer equilibrium to modulate ligand properties dynamically and reversibly to the well-known changes in sequence (including anomeric positioning and linkage points) and in pattern of substitution, for example, by sulfation. In the intimate interplay with sugar receptors (lectins, enzymes, and antibodies) the message of coding units of the sugar code is deciphered. Their recognition will trigger postbinding signaling and the intended biological response. Knowledge about the driving forces for the molecular rendezvous, i.e., contributions of bidentate or cooperative hydrogen bonds, dispersion forces, stacking, and solvent rearrangement, will enable the design of high-affinity ligands or mimetics thereof. They embody clinical applications reaching from receptor localization in diagnostic pathology to cell type-selective targeting of drugs and inhibition of undesired cell adhesion in bacterial/viral infections, inflammation, or metastasis.
Comparison of simple sequence repeats in 19 Archaea.

PubMed

Trivedi, S

2006-12-05

All organisms that have been studied until now have been found to have differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea genomes where complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Different from what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.

PubMed

Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi

2016-03-01

Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding octopamine-and dopamine-receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Coherent direct sequence optical code multiple access encoding-decoding efficiency versus wavelength detuning.

PubMed

Pastor, D; Amaya, W; García-Olcina, R; Sales, S

2007-07-01

We present a simple theoretical model of and the experimental verification for vanishing of the autocorrelation peak due to wavelength detuning on the coding-decoding process of coherent direct sequence optical code multiple access systems based on a superstructured fiber Bragg grating. Moreover, the detuning vanishing effect has been explored to take advantage of this effect and to provide an additional degree of multiplexing and/or optical code tuning.
FOURTH SEMINAR TO THE MEMORY OF D.N. KLYSHKO: Algebraic solution of the synthesis problem for coded sequences

NASA Astrophysics Data System (ADS)

Leukhin, Anatolii N.

2005-08-01

The algebraic solution of a 'complex' problem of synthesis of phase-coded (PC) sequences with the zero level of side lobes of the cyclic autocorrelation function (ACF) is proposed. It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems, which can be solved, as the number factorisation problem, by algebraic methods by using the theory of Galois fields and groups.
Helper-Dependent Adenoviral Vectors.

PubMed

Rosewell, Amanda; Vetrini, Francesco; Ng, Philip

2011-10-29

Helper-dependent adenoviral vectors are devoid of all viral coding sequences, possess a large cloning capacity, and can efficiently transduce a wide variety of cell types from various species independent of the cell cycle to mediate long-term transgene expression without chronic toxicity. These non-integrating vectors hold tremendous potential for a variety of gene transfer and gene therapy applications. Here, we review the production technologies, applications, obstacles to clinical translation and their potential resolutions, and the future challenges and unanswered questions regarding this promising gene transfer technology.
Helper-Dependent Adenoviral Vectors

PubMed Central

Rosewell, Amanda; Vetrini, Francesco; Ng, Philip

2012-01-01

Helper-dependent adenoviral vectors are devoid of all viral coding sequences, possess a large cloning capacity, and can efficiently transduce a wide variety of cell types from various species independent of the cell cycle to mediate long-term transgene expression without chronic toxicity. These non-integrating vectors hold tremendous potential for a variety of gene transfer and gene therapy applications. Here, we review the production technologies, applications, obstacles to clinical translation and their potential resolutions, and the future challenges and unanswered questions regarding this promising gene transfer technology. PMID:24533227
Interframe vector wavelet coding technique

NASA Astrophysics Data System (ADS)

Wus, John P.; Li, Weiping

1997-01-01

Wavelet coding is often used to divide an image into multi- resolution wavelet coefficients which are quantized and coded. By 'vectorizing' scalar wavelet coding and combining this with vector quantization (VQ), vector wavelet coding (VWC) can be implemented. Using a finite number of states, finite-state vector quantization (FSVQ) takes advantage of the similarity between frames by incorporating memory into the video coding system. Lattice VQ eliminates the potential mismatch that could occur using pre-trained VQ codebooks. It also eliminates the need for codebook storage in the VQ process, thereby creating a more robust coding system. Therefore, by using the VWC coding method in conjunction with the FSVQ system and lattice VQ, the formulation of a high quality very low bit rate coding systems is proposed. A coding system using a simple FSVQ system where the current state is determined by the previous channel symbol only is developed. To achieve a higher degree of compression, a tree-like FSVQ system is implemented. The groupings are done in this tree-like structure from the lower subbands to the higher subbands in order to exploit the nature of subband analysis in terms of the parent-child relationship. Class A and Class B video sequences from the MPEG-IV testing evaluations are used in the evaluation of this coding method.
Organization and annotation of the Xcat critical region: elimination of seven positional candidate genes.

PubMed

Huang, Kristen M; Geunes-Boyer, Scarlett; Wu, Sufen; Dutra, Amalia; Favor, Jack; Stambolian, Dwight

2004-05-01

Xcat mice display X-linked congenital cataracts and are a mouse model for the human X-linked cataract disease Nance Horan syndrome (NHS). The genetic defect in Xcat mice and NHS patients is not known. We isolated and sequenced a BAC contig representing a portion of the Xcat critical region. We combined our sequencing data with the most recent mouse sequence assemblies from both Celera and public databases. The sequence of the 2.2-Mb Xcat critical region was then analyzed for potential Xcat candidate genes. The coding regions of the seven known genes within this area (Rai2, Rbbp7, Ctps2, Calb3, Grpr, Reps2, and Syap1) were sequenced in Xcat mice and no mutations were detected. The expression of Rai2 was quantitatively identical in wild-type and Xcat mutant eyes. These results indicate that the Xcat mutation is within a novel, undiscovered gene.
Prediction of plant lncRNA by ensemble machine learning classifiers.

PubMed

Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian

2018-05-02

In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.

Learning of spatio-temporal codes in a coupled oscillator system.

PubMed

Orosz, Gábor; Ashwin, Peter; Townley, Stuart

2009-07-01

In this paper, we consider a learning strategy that allows one to transmit information between two coupled phase oscillator systems (called teaching and learning systems) via frequency adaptation. The dynamics of these systems can be modeled with reference to a number of partially synchronized cluster states and transitions between them. Forcing the teaching system by steady but spatially nonhomogeneous inputs produces cyclic sequences of transitions between the cluster states, that is, information about inputs is encoded via a "winnerless competition" process into spatio-temporal codes. The large variety of codes can be learned by the learning system that adapts its frequencies to those of the teaching system. We visualize the dynamics using "weighted order parameters (WOPs)" that are analogous to "local field potentials" in neural systems. Since spatio-temporal coding is a mechanism that appears in olfactory systems, the developed learning rules may help to extract information from these neural ensembles.
Mining co-occurrence and sequence patterns from cancer diagnoses in New York State.

PubMed

Wang, Yu; Hou, Wei; Wang, Fusheng

2018-01-01

The goal of this study is to discover disease co-occurrence and sequence patterns from large scale cancer diagnosis histories in New York State. In particular, we want to identify disparities among different patient groups. Our study will provide essential knowledge for clinical researchers to further investigate comorbidities and disease progression for improving the management of multiple diseases. We used inpatient discharge and outpatient visit records from the New York State Statewide Planning and Research Cooperative System (SPARCS) from 2011-2015. We grouped each patient's visit history to generate diagnosis sequences for seven most popular cancer types. We performed frequent disease co-occurrence mining using the Apriori algorithm, and frequent disease sequence patterns discovery using the cSPADE algorithm. Different types of cancer demonstrated distinct patterns. Disparities of both disease co-occurrence and sequence patterns were observed from patients within different age groups. There were also considerable disparities in disease co-occurrence patterns with respect to different claim types (i.e., inpatient, outpatient, emergency department and ambulatory surgery). Disparities regarding genders were mostly found where the cancer types were gender specific. Supports of most patterns were usually higher for males than for females. Compared with secondary diagnosis codes, primary diagnosis codes can convey more stable results. Two disease sequences consisting of the same diagnoses but in different orders were usually with different supports. Our results suggest that the methods adopted can generate potentially interesting and clinically meaningful disease co-occurrence and sequence patterns, and identify disparities among various patient groups. These patterns could imply comorbidities and disease progressions.
RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

PubMed Central

Wright, Imogen A.; Travers, Simon A.

2014-01-01

The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

PubMed Central

Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

2011-01-01

Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Speech processing using conditional observable maximum likelihood continuity mapping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogden, John; Nix, David

A computer implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence ofmore » speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.« less
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

NASA Technical Reports Server (NTRS)

Cooke, Daniel; Rushton, Nelson

2013-01-01

With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for highend computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language that is, a programming language that is closer to a human s way of thinking than to a machine s. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequen tial/singlecore code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify- Produce (CSP) and Normalize-Trans - pose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
GeneMachine: gene prediction and sequence annotation.

PubMed

Makalowska, I; Ryan, J F; Baxevanis, A D

2001-09-01

A number of free-standing programs have been developed in order to help researchers find potential coding regions and deduce gene structure for long stretches of what is essentially 'anonymous DNA'. As these programs apply inherently different criteria to the question of what is and is not a coding region, multiple algorithms should be used in the course of positional cloning and positional candidate projects to assure that all potential coding regions within a previously-identified critical region are identified. We have developed a gene identification tool called GeneMachine which allows users to query multiple exon and gene prediction programs in an automated fashion. BLAST searches are also performed in order to see whether a previously-characterized coding region corresponds to a region in the query sequence. A suite of Perl programs and modules are used to run MZEF, GENSCAN, GRAIL 2, FGENES, RepeatMasker, Sputnik, and BLAST. The results of these runs are then parsed and written into ASN.1 format. Output files can be opened using NCBI Sequin, in essence using Sequin as both a workbench and as a graphical viewer. The main feature of GeneMachine is that the process is fully automated; the user is only required to launch GeneMachine and then open the resulting file with Sequin. Annotations can then be made to these results prior to submission to GenBank, thereby increasing the intrinsic value of these data. GeneMachine is freely-available for download at http://genome.nhgri.nih.gov/genemachine. A public Web interface to the GeneMachine server for academic and not-for-profit users is available at http://genemachine.nhgri.nih.gov. The Web supplement to this paper may be found at http://genome.nhgri.nih.gov/genemachine/supplement/.
ANN modeling of DNA sequences: new strategies using DNA shape code.

PubMed

Parbhane, R V; Tambe, S S; Kulkarni, B D

2000-09-01

Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome

PubMed Central

Ferlaino, Michael; Rogers, Mark F.; Shihab, Hashem A.; Mort, Matthew; Cooper, David N.; Gaunt, Tom R.; Campbell, Colin

2018-01-01

Background Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. Results We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. Conclusions FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome. PMID:28985712
Non-coding RNAs in lung cancer

PubMed Central

Ricciuti, Biagio; Mecca, Carmen; Crinò, Lucio; Baglivo, Sara; Cenci, Matteo; Metro, Giulio

2014-01-01

The discovery that protein-coding genes represent less than 2% of all human genome, and the evidence that more than 90% of it is actively transcribed, changed the classical point of view of the central dogma of molecular biology, which was always based on the assumption that RNA functions mainly as an intermediate bridge between DNA sequences and protein synthesis machinery. Accumulating data indicates that non-coding RNAs are involved in different physiological processes, providing for the maintenance of cellular homeostasis. They are important regulators of gene expression, cellular differentiation, proliferation, migration, apoptosis, and stem cell maintenance. Alterations and disruptions of their expression or activity have increasingly been associated with pathological changes of cancer cells, this evidence and the prospect of using these molecules as diagnostic markers and therapeutic targets, make currently non-coding RNAs among the most relevant molecules in cancer research. In this paper we will provide an overview of non-coding RNA function and disruption in lung cancer biology, also focusing on their potential as diagnostic, prognostic and predictive biomarkers. PMID:25593996
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome.

PubMed

Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin

2017-10-06

Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.
Integrating evolutionary and regulatory information with a multispecies approach implicates genes and pathways in obsessive-compulsive disorder.

PubMed

Noh, Hyun Ji; Tang, Ruqi; Flannick, Jason; O'Dushlaine, Colm; Swofford, Ross; Howrigan, Daniel; Genereux, Diane P; Johnson, Jeremy; van Grootheest, Gerard; Grünblatt, Edna; Andersson, Erik; Djurfeldt, Diana R; Patel, Paresh D; Koltookian, Michele; M Hultman, Christina; Pato, Michele T; Pato, Carlos N; Rasmussen, Steven A; Jenike, Michael A; Hanna, Gregory L; Stewart, S Evelyn; Knowles, James A; Ruhrmann, Stephan; Grabe, Hans-Jörgen; Wagner, Michael; Rück, Christian; Mathews, Carol A; Walitza, Susanne; Cath, Daniëlle C; Feng, Guoping; Karlsson, Elinor K; Lindblad-Toh, Kerstin

2017-10-17

Obsessive-compulsive disorder is a severe psychiatric disorder linked to abnormalities in glutamate signaling and the cortico-striatal circuit. We sequenced coding and regulatory elements for 608 genes potentially involved in obsessive-compulsive disorder in human, dog, and mouse. Using a new method that prioritizes likely functional variants, we compared 592 cases to 560 controls and found four strongly associated genes, validated in a larger cohort. NRXN1 and HTR2A are enriched for coding variants altering postsynaptic protein-binding domains. CTTNBP2 (synapse maintenance) and REEP3 (vesicle trafficking) are enriched for regulatory variants, of which at least six (35%) alter transcription factor-DNA binding in neuroblastoma cells. NRXN1 achieves genome-wide significance (p = 6.37 × 10 -11 ) when we include 33,370 population-matched controls. Our findings suggest synaptic adhesion as a key component in compulsive behaviors, and show that targeted sequencing plus functional annotation can identify potentially causative variants, even when genomic data are limited.Obsessive-compulsive disorder (OCD) is a neuropsychiatric disorder with symptoms including intrusive thoughts and time-consuming repetitive behaviors. Here Noh and colleagues identify genes enriched for functional variants associated with increased risk of OCD.
Selfish DNA in protein-coding genes of Rickettsia.

PubMed

Ogata, H; Audic, S; Barbe, V; Artiguenave, F; Fournier, P E; Raoult, D; Claverie, J M

2000-10-13

Rickettsia conorii, the aetiological agent of Mediterranean spotted fever, is an intracellular bacterium transmitted by ticks. Preliminary analyses of the nearly complete genome sequence of R. conorii have revealed 44 occurrences of a previously undescribed palindromic repeat (150 base pairs long) throughout the genome. Unexpectedly, this repeat was found inserted in-frame within 19 different R. conorii open reading frames likely to encode functional proteins. We found the same repeat in proteins of other Rickettsia species. The finding of a mobile element inserted in many unrelated genes suggests the potential role of selfish DNA in the creation of new protein sequences.
Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

PubMed Central

Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

2008-01-01

Background Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy. Conclusion Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues. PMID:18973670
Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples.

PubMed

Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

2008-10-30

Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12-17 bp), C. elegans (11-17 bp), A. thaliana (11-17 bp), S. cerevisiae (10-16 bp) and E. coli (9-15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy. Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues.
Glycans: bioactive signals decoded by lectins.

PubMed

Gabius, Hans-Joachim

2008-12-01

The glycan part of cellular glycoconjugates affords a versatile means to build biochemical signals. These oligosaccharides have an exceptional talent in this respect. They surpass any other class of biomolecule in coding capacity within an oligomer (code word). Four structural factors account for this property: the potential for variability of linkage points, anomeric position and ring size as well as the aptitude for branching (first and second dimensions of the sugar code). Specific intermolecular recognition is favoured by abundant potential for hydrogen/co-ordination bonds and for C-H/pi-interactions. Fittingly, an array of protein folds has developed in evolution with the ability to select certain glycans from the natural diversity. The thermodynamics of this reaction profits from the occurrence of these ligands in only a few energetically favoured conformers, comparing favourably with highly flexible peptides (third dimension of the sugar code). Sequence, shape and local aspects of glycan presentation (e.g. multivalency) are key factors to regulate the avidity of lectin binding. At the level of cells, distinct glycan determinants, a result of enzymatic synthesis and dynamic remodelling, are being defined as biomarkers. Their presence gains a functional perspective by co-regulation of the cognate lectin as effector, for example in growth regulation. The way to tie sugar signal and lectin together is illustrated herein for two tumour model systems. In this sense, orchestration of glycan and lectin expression is an efficient means, with far-reaching relevance, to exploit the coding potential of oligosaccharides physiologically and medically.
Genomics dataset of unidentified disclosed isolates.

PubMed

Rekadwad, Bhagwan N

2016-09-01

Analysis of DNA sequences is necessary for higher hierarchical classification of the organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset is chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. The quick response codes were generated. AT/GC content of the DNA sequences analysis was carried out. The QR is helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage code and enzyme code studied under the restriction digestion study, which helpful for performing studies using short DNA sequences was reported. The dataset disclosed here is the new revelatory data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

PubMed

Hazes, Bart

2014-02-28

Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
Influence of genome-scale RNA structure disruption on the replication of murine norovirus—similar replication kinetics in cell culture but attenuation of viral fitness in vivo

PubMed Central

McFadden, Nora; Arias, Armando; Dry, Inga; Bailey, Dalan; Witteveldt, Jeroen; Evans, David J.; Goodfellow, Ian; Simmonds, Peter

2013-01-01

Mechanisms by which certain RNA viruses, such as hepatitis C virus, establish persistent infections and cause chronic disease are of fundamental importance in viral pathogenesis. Mammalian positive-stranded RNA viruses establishing persistence typically possess genome-scale ordered RNA secondary structure (GORS) in their genomes. Murine norovirus (MNV) persists in immunocompetent mice and provides an experimental model to functionally characterize GORS. Substitution mutants were constructed with coding sequences in NS3/4- and NS6/7-coding regions replaced with sequences with identical coding and (di-)nucleotide composition but disrupted RNA secondary structure (F1, F2, F1/F2 mutants). Mutants replicated with similar kinetics to wild-type (WT) MNV3 in RAW264.7 cells and primary macrophages, exhibited similar (highly restricted) induction and susceptibility to interferon-coupled cellular responses and equal replication fitness by serial passaging of co-cultures. In vivo, both WT and F1/F2 mutant viruses persistently infected mice, although F1, F2 and F1/F2 mutant viruses were rapidly eliminated 1–7 days post-inoculation in competition experiments with WT. F1/F2 mutants recovered from tissues at 9 months showed higher synonymous substitution rates than WT and nucleotide substitutions that potentially restored of RNA secondary structure. GORS plays no role in basic replication of MNV but potentially contributes to viral fitness and persistence in vivo. PMID:23630317
[Digital signal processing of a novel neuron discharge model stimulation strategy for cochlear implants].

PubMed

Yang, Yiwei; Xu, Yuejin; Miu, Jichang; Zhou, Linghong; Xiao, Zhongju

2012-10-01

To apply the classic leakage integrate-and-fire models, based on the mechanism of the generation of physiological auditory stimulation, in the information processing coding of cochlear implants to improve the auditory result. The results of algorithm simulation in digital signal processor (DSP) were imported into Matlab for a comparative analysis. Compared with CIS coding, the algorithm of membrane potential integrate-and-fire (MPIF) allowed more natural pulse discharge in a pseudo-random manner to better fit the physiological structures. The MPIF algorithm can effectively solve the problem of the dynamic structure of the delivered auditory information sequence issued in the auditory center and allowed integration of the stimulating pulses and time coding to ensure the coherence and relevance of the stimulating pulse time.

Spliced DNA Sequences in the Paramecium Germline: Their Properties and Evolutionary Potential

PubMed Central

Catania, Francesco; McGrath, Casey L.; Doak, Thomas G.; Lynch, Michael

2013-01-01

Despite playing a crucial role in germline-soma differentiation, the evolutionary significance of developmentally regulated genome rearrangements (DRGRs) has received scant attention. An example of DRGR is DNA splicing, a process that removes segments of DNA interrupting genic and/or intergenic sequences. Perhaps, best known for shaping immune-system genes in vertebrates, DNA splicing plays a central role in the life of ciliated protozoa, where thousands of germline DNA segments are eliminated after sexual reproduction to regenerate a functional somatic genome. Here, we identify and chronicle the properties of 5,286 sequences that putatively undergo DNA splicing (i.e., internal eliminated sequences [IESs]) across the genomes of three closely related species of the ciliate Paramecium (P. tetraurelia, P. biaurelia, and P. sexaurelia). The study reveals that these putative IESs share several physical characteristics. Although our results are consistent with excision events being largely conserved between species, episodes of differential IES retention/excision occur, may have a recent origin, and frequently involve coding regions. Our findings indicate interconversion between somatic—often coding—DNA sequences and noncoding IESs, and provide insights into the role of DNA splicing in creating potentially functional genetic innovation. PMID:23737328
Variability and transmission by Aphis glycines of North American and Asian Soybean mosaic virus isolates.

PubMed

Domier, L L; Latorre, I J; Steinlage, T A; McCoppin, N; Hartman, G L

2003-10-01

The variability of North American and Asian strains and isolates of Soybean mosaic virus was investigated. First, polymerase chain reaction (PCR) products representing the coat protein (CP)-coding regions of 38 SMVs were analyzed for restriction fragment length polymorphisms (RFLP). Second, the nucleotide and predicted amino acid sequence variability of the P1-coding region of 18 SMVs and the helper component/protease (HC/Pro) and CP-coding regions of 25 SMVs were assessed. The CP nucleotide and predicted amino acid sequences were the most similar and predicted phylogenetic relationships similar to those obtained from RFLP analysis. Neither RFLP nor sequence analyses of the CP-coding regions grouped the SMVs by geographical origin. The P1 and HC/Pro sequences were more variable and separated the North American and Asian SMV isolates into two groups similar to previously reported differences in pathogenic diversity of the two sets of SMV isolates. The P1 region was the most informative of the three regions analyzed. To assess the biological relevance of the sequence differences in the HC/Pro and CP coding regions, the transmissibility of 14 SMV isolates by Aphis glycines was tested. All field isolates of SMV were transmitted efficiently by A. glycines, but the laboratory isolates analyzed were transmitted poorly. The amino acid sequences from most, but not all, of the poorly transmitted isolates contained mutations in the aphid transmission-associated DAG and/or KLSC amino acid sequence motifs of CP and HC/Pro, respectively.
Closed Genome Sequence of Chryseobacterium piperi Strain CTMT/ATCC BAA-1782, a Gram-Negative Bacterium with Clostridial Neurotoxin-Like Coding Sequences

PubMed Central

Wentz, Travis G.; Muruvanda, Tim; Thirunavukkarasu, Nagarajan; Hoffmann, Maria; Allard, Marc W.; Hodge, David R.; Pillai, Segaran P.; Hammack, Thomas S.; Brown, Eric W.

2017-01-01

ABSTRACT Clostridial neurotoxins, including botulinum and tetanus neurotoxins, are among the deadliest known bacterial toxins. Until recently, the horizontal mobility of this toxin gene family appeared to be limited to the genus Clostridium. We report here the closed genome sequence of Chryseobacterium piperi, a Gram-negative bacterium containing coding sequences with homology to clostridial neurotoxin family proteins. PMID:29192076
Improving the genome annotation of the acarbose producer Actinoplanes sp. SE50/110 by sequencing enriched 5'-ends of primary transcripts.

PubMed

Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred

2014-11-20

Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.
Diversity and Divergence of Dinoflagellate Histone Proteins

PubMed Central

Marinov, Georgi K.; Lynch, Michael

2015-01-01

Histone proteins and the nucleosomal organization of chromatin are near-universal eukaroytic features, with the exception of dinoflagellates. Previous studies have suggested that histones do not play a major role in the packaging of dinoflagellate genomes, although several genomic and transcriptomic surveys have detected a full set of core histone genes. Here, transcriptomic and genomic sequence data from multiple dinoflagellate lineages are analyzed, and the diversity of histone proteins and their variants characterized, with particular focus on their potential post-translational modifications and the conservation of the histone code. In addition, the set of putative epigenetic mark readers and writers, chromatin remodelers and histone chaperones are examined. Dinoflagellates clearly express the most derived set of histones among all autonomous eukaryote nuclei, consistent with a combination of relaxation of sequence constraints imposed by the histone code and the presence of numerous specialized histone variants. The histone code itself appears to have diverged significantly in some of its components, yet others are conserved, implying conservation of the associated biochemical processes. Specifically, and with major implications for the function of histones in dinoflagellates, the results presented here strongly suggest that transcription through nucleosomal arrays happens in dinoflagellates. Finally, the plausible roles of histones in dinoflagellate nuclei are discussed. PMID:26646152
Draft genome sequence of the extremely halophilic archaeon Haladaptatus cibarius type strain D43(T) isolated from fermented seafood.

PubMed

Lee, Hae-Won; Kim, Dae-Won; Lee, Mi-Hwa; Kim, Byung-Yong; Cho, Yong-Joon; Yim, Kyung June; Song, Hye Seon; Rhee, Jin-Kyu; Seo, Myung-Ji; Choi, Hak-Jong; Choi, Jong-Soon; Lee, Dong-Gi; Yoon, Changmann; Nam, Young-Do; Roh, Seong Woon

2015-01-01

An extremely halophilic archaeon, Haladaptatus cibarius D43(T), was isolated from traditional Korean salt-rich fermented seafood. Strain D43(T) shows the highest 16S rRNA gene sequence similarity (98.7 %) with Haladaptatus litoreus RO1-28(T), is Gram-negative staining, motile, and extremely halophilic. Despite potential industrial applications of extremely halophilic archaea, their genome characteristics remain obscure. Here, we describe the whole genome sequence and annotated features of strain D43(T). The 3,926,724 bp genome includes 4,092 protein-coding and 57 RNA genes (including 6 rRNA and 49 tRNA genes) with an average G + C content of 57.76 %.
Complete genome sequence of Sulfurimonas autotrophica type strain (OK10T)

PubMed Central

Sikorski, Johannes; Munk, Christine; Lapidus, Alla; Ngatchou Djao, Olivier Duplex; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Han, Cliff; Cheng, Jan-Fang; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Sims, David; Meincke, Linda; Brettin, Thomas; Detter, John C.; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Lang, Elke; Spring, Stefan; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

2010-01-01

Sulfurimonas autotrophica Inagaki et al. 2003 is the type species of the genus Sulfurimonas. This genus is of interest because of its significant contribution to the global sulfur cycle as it oxidizes sulfur compounds to sulfate and by its apparent habitation of deep-sea hydrothermal and marine sulfidic environments as potential ecological niche. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second complete genome sequence of the genus Sulfurimonas and the 15th genome in the family Helicobacteraceae. The 2,153,198 bp long genome with its 2,165 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304749
Motion Detection in Ultrasound Image-Sequences Using Tensor Voting

NASA Astrophysics Data System (ADS)

Inba, Masafumi; Yanagida, Hirotaka; Tamura, Yasutaka

2008-05-01

Motion detection in ultrasound image sequences using tensor voting is described. We have been developing an ultrasound imaging system adopting a combination of coded excitation and synthetic aperture focusing techniques. In our method, frame rate of the system at distance of 150 mm reaches 5000 frame/s. Sparse array and short duration coded ultrasound signals are used for high-speed data acquisition. However, many artifacts appear in the reconstructed image sequences because of the incompleteness of the transmitted code. To reduce the artifacts, we have examined the application of tensor voting to the imaging method which adopts both coded excitation and synthetic aperture techniques. In this study, the basis of applying tensor voting and the motion detection method to ultrasound images is derived. It was confirmed that velocity detection and feature enhancement are possible using tensor voting in the time and space of simulated ultrasound three-dimensional image sequences.
Circular RNA expression in basal cell carcinoma.

PubMed

Sand, Michael; Bechara, Falk G; Sand, Daniel; Gambichler, Thilo; Hahn, Stephan A; Bromba, Michael; Stockfleth, Eggert; Hessam, Schapoor

2016-05-01

Circular RNAs (circRNAs), are nonprotein coding RNAs consisting of a circular loop with multiple miRNA, binding sites called miRNA response elements (MREs), functioning as miRNA sponges. This study was performed to identify differentially expressed circRNAs and their MREs in basal cell carcinoma (BCC). Microarray circRNA expression profiles were acquired from BCC and control followed by qRT-PCR validation. Bioinformatical target prediction revealed multiple MREs. Sequence analysis was performed concerning MRE interaction potential with the BCC miRNome. We identified 23 upregulated and 48 downregulated circRNAs with 354 miRNA response elements capable of sequestering miRNA target sequences of the BCC miRNome. The present study describes a variety of circRNAs that are potentially involved in the molecular pathogenesis of BCC.
Analysis of the Pantoea ananatis pan-genome reveals factors underlying its ability to colonize and interact with plant, insect and vertebrate hosts.

PubMed

De Maayer, Pieter; Chan, Wai Yin; Rubagotti, Enrico; Venter, Stephanus N; Toth, Ian K; Birch, Paul R J; Coutinho, Teresa A

2014-05-27

Pantoea ananatis is found in a wide range of natural environments, including water, soil, as part of the epi- and endophytic flora of various plant hosts, and in the insect gut. Some strains have proven effective as biological control agents and plant-growth promoters, while other strains have been implicated in diseases of a broad range of plant hosts and humans. By analysing the pan-genome of eight sequenced P. ananatis strains isolated from different sources we identified factors potentially underlying its ability to colonize and interact with hosts in both the plant and animal Kingdoms. The pan-genome of the eight compared P. ananatis strains consisted of a core genome comprised of 3,876 protein coding sequences (CDSs) and a sizeable accessory genome consisting of 1,690 CDSs. We estimate that ~106 unique CDSs would be added to the pan-genome with each additional P. ananatis genome sequenced in the future. The accessory fraction is derived mainly from integrated prophages and codes mostly for proteins of unknown function. Comparison of the translated CDSs on the P. ananatis pan-genome with the proteins encoded on all sequenced bacterial genomes currently available revealed that P. ananatis carries a number of CDSs with orthologs restricted to bacteria associated with distinct hosts, namely plant-, animal- and insect-associated bacteria. These CDSs encode proteins with putative roles in transport and metabolism of carbohydrate and amino acid substrates, adherence to host tissues, protection against plant and animal defense mechanisms and the biosynthesis of potential pathogenicity determinants including insecticidal peptides, phytotoxins and type VI secretion system effectors. P. ananatis has an 'open' pan-genome typical of bacterial species that colonize several different environments. The pan-genome incorporates a large number of genes encoding proteins that may enable P. ananatis to colonize, persist in and potentially cause disease symptoms in a wide range of plant and animal hosts.
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.

PubMed Central

Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R

1982-01-01

The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. Images PMID:6296791
mRNA changes in nucleus accumbens related to methamphetamine addiction in mice

NASA Astrophysics Data System (ADS)

Zhu, Li; Li, Jiaqi; Dong, Nan; Guan, Fanglin; Liu, Yufeng; Ma, Dongliang; Goh, Eyleen L. K.; Chen, Teng

2016-11-01

Methamphetamine (METH) is a highly addictive psychostimulant that elicits aberrant changes in the expression of microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) in the nucleus accumbens of mice, indicating a potential role of METH in post-transcriptional regulations. To decipher the potential consequences of these post-transcriptional regulations in response to METH, we performed strand-specific RNA sequencing (ssRNA-Seq) to identify alterations in mRNA expression and their alternative splicing in the nucleus accumbens of mice following exposure to METH. METH-mediated changes in mRNAs were analyzed and correlated with previously reported changes in non-coding RNAs (miRNAs and lncRNAs) to determine the potential functions of these mRNA changes observed here and how non-coding RNAs are involved. A total of 2171 mRNAs were differentially expressed in response to METH with functions involved in synaptic plasticity, mitochondrial energy metabolism and immune response. 309 and 589 of these mRNAs are potential targets of miRNAs and lncRNAs respectively. In addition, METH treatment decreases mRNA alternative splicing, and there are 818 METH-specific events not observed in saline-treated mice. Our results suggest that METH-mediated addiction could be attributed by changes in miRNAs and lncRNAs and consequently, changes in mRNA alternative splicing and expression. In conclusion, our study reported a methamphetamine-modified nucleus accumbens transcriptome and provided non-coding RNA-mRNA interaction networks possibly involved in METH addiction.
Construction of a robust microarray from a non-model species (largemouth bass) using pyrosequencing technology

PubMed Central

Garcia-Reyero, Natàlia; Griffitt, Robert J.; Liu, Li; Kroll, Kevin J.; Farmerie, William G.; Barber, David S.; Denslow, Nancy D.

2009-01-01

A novel custom microarray for largemouth bass (Micropterus salmoides) was designed with sequences obtained from a normalized cDNA library using the 454 Life Sciences GS-20 pyrosequencer. This approach yielded in excess of 58 million bases of high-quality sequence. The sequence information was combined with 2,616 reads obtained by traditional suppressive subtractive hybridizations to derive a total of 31,391 unique sequences. Annotation and coding sequences were predicted for these transcripts where possible. 16,350 annotated transcripts were selected as target sequences for the design of the custom largemouth bass oligonucleotide microarray. The microarray was validated by examining the transcriptomic response in male largemouth bass exposed to 17β-œstradiol. Transcriptomic responses were assessed in liver and gonad, and indicated gene expression profiles typical of exposure to œstradiol. The results demonstrate the potential to rapidly create the tools necessary to assess large scale transcriptional responses in non-model species, paving the way for expanded impact of toxicogenomics in ecotoxicology. PMID:19936325
Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

NASA Technical Reports Server (NTRS)

Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) a n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, while (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.

PubMed

Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E

1995-05-01

An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to dtermine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

NASA Technical Reports Server (NTRS)

Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to dtermine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
A deep learning method for lincRNA detection using auto-encoder algorithm.

PubMed

Yu, Ning; Yu, Zeng; Pan, Yi

2017-12-06

RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meantime, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conversed patterns/motifs tethered together by non-conserved regions. The two evidences give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than message RNAs are generated. That is, the transcribed RNAs participate the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of predictive deep neural network on the lincRNA data sets compared with support vector machine and traditional neural network. In addition, it is validated through the newly discovered lincRNA data set and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that deep learning method has the extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by those newly annotated lincRNA data, deep learning methods based on auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. As our knowledge, this is the first application to adopt the deep learning techniques for identifying lincRNA transcription sequences.
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

PubMed

Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M; Agarwala, Vineeta; Gaulton, Kyle J; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Dennis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana Cn; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Altshuler, David; Burtt, Noël P; Florez, Jose C; Boehnke, Michael; McCarthy, Mark I

2017-12-19

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls

PubMed Central

Jason, Flannick; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M.; Agarwala, Vineeta; Gaulton, Kyle J.; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J.; Rivas, Manuel A.; Perry, John R. B.; Sim, Xueling; Blackwell, Thomas W.; Robertson, Neil R.; Rayner, N William; Cingolani, Pablo; Locke, Adam E.; Tajes, Juan Fernandez; Highland, Heather M.; Dupuis, Josee; Chines, Peter S.; Lindgren, Cecilia M.; Hartl, Christopher; Jackson, Anne U.; Chen, Han; Huyghe, Jeroen R.; van de Bunt, Martijn; Pearson, Richard D.; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M.; Gamazon, Eric R.; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A.; Below, Jennifer E.; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L.; Pasko, Dorota; Parker, Stephen C. J.; Varga, Tibor V.; Green, Todd; Beer, Nicola L.; Day-Williams, Aaron G.; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J.; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P.; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F.; Han, Bok-Ghee; Jenkinson, Christopher P.; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C. Y.; Palmer, Nicholette D.; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E.; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D.; Neale, Benjamin M.; Purcell, Shaun; Butterworth, Adam S.; Howson, Joanna M. M.; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K. L.; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H. T.; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E.; Rybin, Dennis; Farook, Vidya S.; Fowler, Sharon P.; Freedman, Barry I.; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J.; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K.; Puppala, Sobha; Scott, William R.; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A.; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C.; Mangino, Massimo; Bonnycastle, Lori L.; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L.; Herder, Christian; Groves, Christopher J.; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A.; Doney, Alex S. F.; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J.; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E.; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H.; Stirrups, Kathleen; Wood, Andrew R.; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O.; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P.; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B.; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N. A.; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M.; Syvänen, Ann-Christine; Bergman, Richard N.; Bharadwaj, Dwaipayan; Bottinger, Erwin P.; Cho, Yoon Shin; Chandak, Giriraj R.; Chan, Juliana CN; Chia, Kee Seng; Daly, Mark J.; Ebrahim, Shah B.; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A.; Lehman, Donna M.; Jia, Weiping; Ma, Ronald C. W.; Pollin, Toni I.; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J. F.; Small, Kerrin S.; Ried, Janina S.; DeFronzo, Ralph A.; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J.; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W.; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R.; Gloyn, Anna L.; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D.; Hattersley, Andrew T.; Bowden, Donald W.; Collins, Francis S.; Atzmon, Gil; Chambers, John C.; Spector, Timothy D.; Laakso, Markku; Strom, Tim M.; Bell, Graeme I.; Blangero, John; Duggirala, Ravindranath; Tai, E. Shyong; McVean, Gilean; Hanis, Craig L.; Wilson, James G.; Seielstad, Mark; Frayling, Timothy M.; Meigs, James B.; Cox, Nancy J.; Sladek, Rob; Lander, Eric S.; Gabriel, Stacey; Mohlke, Karen L.; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J.; Morris, Andrew P.; Kang, Hyun Min; Altshuler, David; Burtt, Noël P.; Florez, Jose C.; Boehnke, Michael; McCarthy, Mark I.

2017-01-01

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D. PMID:29257133
Global characterization of copy number variants in epilepsy patients from whole genome sequencing

PubMed Central

Meloche, Caroline; Andrade, Danielle M.; Lafreniere, Ron G.; Gravel, Micheline; Spiegelman, Dan; Dionne-Laporte, Alexandre; Boelman, Cyrus; Hamdan, Fadi F.; Michaud, Jacques L.; Rouleau, Guy; Minassian, Berge A.; Bourque, Guillaume; Cossette, Patrick

2018-01-01

Epilepsy will affect nearly 3% of people at some point during their lifetime. Previous copy number variants (CNVs) studies of epilepsy have used array-based technology and were restricted to the detection of large or exonic events. In contrast, whole-genome sequencing (WGS) has the potential to more comprehensively profile CNVs but existing analytic methods suffer from limited accuracy. We show that this is in part due to the non-uniformity of read coverage, even after intra-sample normalization. To improve on this, we developed PopSV, an algorithm that uses multiple samples to control for technical variation and enables the robust detection of CNVs. Using WGS and PopSV, we performed a comprehensive characterization of CNVs in 198 individuals affected with epilepsy and 301 controls. For both large and small variants, we found an enrichment of rare exonic events in epilepsy patients, especially in genes with predicted loss-of-function intolerance. Notably, this genome-wide survey also revealed an enrichment of rare non-coding CNVs near previously known epilepsy genes. This enrichment was strongest for non-coding CNVs located within 100 Kbp of an epilepsy gene and in regions associated with changes in the gene expression, such as expression QTLs or DNase I hypersensitive sites. Finally, we report on 21 potentially damaging events that could be associated with known or new candidate epilepsy genes. Our results suggest that comprehensive sequence-based profiling of CNVs could help explain a larger fraction of epilepsy cases. PMID:29649218

Not All Order Memory Is Equal: Test Demands Reveal Dissociations in Memory for Sequence Information

ERIC Educational Resources Information Center

Jonker, Tanya R.; MacLeod, Colin M.

2017-01-01

Remembering the order of a sequence of events is a fundamental feature of episodic memory. Indeed, a number of formal models represent temporal context as part of the memory system, and memory for order has been researched extensively. Yet, the nature of the code(s) underlying sequence memory is still relatively unknown. Across 4 experiments that…
Complete mitochondrial genome sequence of the heart failure model of cardiomyopathic Syrian hamster (Mesocricetus auratus).

PubMed

Hu, Bo; Liu, Dong-Xing; Zhang, Yu-Qing; Song, Jian-Tao; Ji, Xian-Fei; Hou, Zhi-Qiang; Zhang, Zhen-Hai

2016-05-01

In this study we sequenced the complete mitochondrial genome sequencing of a heart failure model of cardiomyopathic Syrian hamster (Mesocricetus auratus) for the first time. The total length of the mitogenome was 16,267 bp. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region.
Code-Switching to Know a TL Equivalent of an L1 Word: Request-Provision-Acknowledgement (RPA) Sequence

ERIC Educational Resources Information Center

Lucero, Edgar

2011-01-01

This article focuses on the learner's use of Code-switching to learn the TL (Target Language) equivalent of an L1 word. The interactional pattern that this situation creates defines the Request-Provision-Acknowledgement (RPA) sequence. The article explains each of the turns of the sequence under the combination of the Ethnomethodological…
Quantitative analysis of the anti-noise performance of an m-sequence in an electromagnetic method

NASA Astrophysics Data System (ADS)

Yuan, Zhe; Zhang, Yiming; Zheng, Qijia

2018-02-01

An electromagnetic method with a transmitted waveform coded by an m-sequence achieved better anti-noise performance compared to the conventional manner with a square-wave. The anti-noise performance of the m-sequence varied with multiple coding parameters; hence, a quantitative analysis of the anti-noise performance for m-sequences with different coding parameters was required to optimize them. This paper proposes the concept of an identification system, with the identified Earth impulse response obtained by measuring the system output with the input of the voltage response. A quantitative analysis of the anti-noise performance of the m-sequence was achieved by analyzing the amplitude-frequency response of the corresponding identification system. The effects of the coding parameters on the anti-noise performance are summarized by numerical simulation, and their optimization is further discussed in our conclusions; the validity of the conclusions is further verified by field experiment. The quantitative analysis method proposed in this paper provides a new insight into the anti-noise mechanism of the m-sequence, and could be used to evaluate the anti-noise performance of artificial sources in other time-domain exploration methods, such as the seismic method.
RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.

PubMed

Wright, Imogen A; Travers, Simon A

2014-07-01

The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genomics dataset on unclassified published organism (patent US 7547531).

PubMed

Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier

2016-12-01

Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism. With the intention that, DNA sequence analysis is very crucial to learn about hierarchical classification of that particular organism. This dataset (patent US 7547531) is chosen to simplify all the complex raw data buried in undisclosed DNA sequences which help to open doors for new collaborations. In this data, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from NCBI BioSample database. Quick response (QR) code of those DNA sequences was constructed by DNA BarID tool. QR code is useful for the identification and comparison of isolates with other organisms. AT/GC content of the DNA sequences was determined using ENDMEMO GC Content Calculator, which indicates their stability at different temperature. The highest GC content was observed in GP445188 (62.5%) which was followed by GP445198 (61.8%) and GP445189 (59.44%), while lowest was in GP445178 (24.39%). In addition, New England BioLabs (NEB) database was used to identify cleavage code indicating the 5, 3 and blunt end and enzyme code indicating the methylation site of the DNA sequences was also shown. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
A Deeper Examination of Thorellius atrox Scorpion Venom Components with Omic Techonologies

PubMed Central

Romero-Gutierrez, Teresa; Batista, Cesar V. F.

2017-01-01

This communication reports a further examination of venom gland transcripts and venom composition of the Mexican scorpion Thorellius atrox using RNA-seq and tandem mass spectrometry. The RNA-seq, which was performed with the Illumina protocol, yielded more than 20,000 assembled transcripts. Following a database search and annotation strategy, 160 transcripts were identified, potentially coding for venom components. A novel sequence was identified that potentially codes for a peptide with similarity to spider ω-agatoxins, which act on voltage-gated calcium channels, not known before to exist in scorpion venoms. Analogous transcripts were found in other scorpion species. They could represent members of a new scorpion toxin family, here named omegascorpins. The mass fingerprint by LC-MS identified 135 individual venom components, five of which matched with the theoretical masses of putative peptides translated from the transcriptome. The LC-MS/MS de novo sequencing allowed to reconstruct and identify 42 proteins encoded by assembled transcripts, thus validating the transcriptome analysis. Earlier studies conducted with this scorpion venom permitted the identification of only twenty putative venom components. The present work performed with more powerful and modern omic technologies demonstrates the capacity of accomplishing a deeper characterization of scorpion venom components and the identification of novel molecules with potential applications in biomedicine and the study of ion channel physiology. PMID:29231872
Detection of hyper-conserved regions in hepatitis B virus X gene potentially useful for gene therapy.

PubMed

González, Carolina; Tabernero, David; Cortese, Maria Francesca; Gregori, Josep; Casillas, Rosario; Riveiro-Barciela, Mar; Godoy, Cristina; Sopena, Sara; Rando, Ariadna; Yll, Marçal; Lopez-Martinez, Rosa; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco

2018-05-21

To detect hyper-conserved regions in the hepatitis B virus (HBV) X gene ( HBX ) 5' region that could be candidates for gene therapy. The study included 27 chronic hepatitis B treatment-naive patients in various clinical stages (from chronic infection to cirrhosis and hepatocellular carcinoma, both HBeAg-negative and HBeAg-positive), and infected with HBV genotypes A-F and H. In a serum sample from each patient with viremia > 3.5 log IU/mL, the HBX 5' end region [nucleotide (nt) 1255-1611] was PCR-amplified and submitted to next-generation sequencing (NGS). We assessed genotype variants by phylogenetic analysis, and evaluated conservation of this region by calculating the information content of each nucleotide position in a multiple alignment of all unique sequences (haplotypes) obtained by NGS. Conservation at the HBx protein amino acid (aa) level was also analyzed. NGS yielded 1333069 sequences from the 27 samples, with a median of 4578 sequences/sample (2487-9279, IQR 2817). In 14/27 patients (51.8%), phylogenetic analysis of viral nucleotide haplotypes showed a complex mixture of genotypic variants. Analysis of the information content in the haplotype multiple alignments detected 2 hyper-conserved nucleotide regions, one in the HBX upstream non-coding region (nt 1255-1286) and the other in the 5' end coding region (nt 1519-1603). This last region coded for a conserved amino acid region (aa 63-76) that partially overlaps a Kunitz-like domain. Two hyper-conserved regions detected in the HBX 5' end may be of value for targeted gene therapy, regardless of the patients' clinical stage or HBV genotype.
Regulatory versus coding signatures of natural selection in a candidate gene involved in the adaptive divergence of whitefish species pairs (Coregonus spp.)

PubMed Central

Jeukens, Julie; Bernatchez, Louis

2012-01-01

While gene expression divergence is known to be involved in adaptive phenotypic divergence and speciation, the relative importance of regulatory and structural evolution of genes is poorly understood. A recent next-generation sequencing experiment allowed identifying candidate genes potentially involved in the ongoing speciation of sympatric dwarf and normal lake whitefish (Coregonus clupeaformis), such as cytosolic malate dehydrogenase (MDH1), which showed both significant expression and sequence divergence. The main goal of this study was to investigate into more details the signatures of natural selection in the regulatory and coding sequences of MDH1 in lake whitefish and test for parallelism of these signatures with other coregonine species. Sequencing of the two regions in 118 fish from four sympatric pairs of whitefish and two cisco species revealed a total of 35 single nucleotide polymorphisms (SNPs), with more genetic diversity in European compared to North American coregonine species. While the coding region was found to be under purifying selection, an SNP in the proximal promoter exhibited significant allele frequency divergence in a parallel manner among independent sympatric pairs of North American lake whitefish and European whitefish (C. lavaretus). According to transcription factor binding simulation for 22 regulatory haplotypes of MDH1, putative binding profiles were fairly conserved among species, except for the region around this SNP. Moreover, we found evidence for the role of this SNP in the regulation of MDH1 expression level. Overall, these results provide further evidence for the role of natural selection in gene regulation evolution among whitefish species pairs and suggest its possible link with patterns of phenotypic diversity observed in coregonine species. PMID:22408741
Regulatory versus coding signatures of natural selection in a candidate gene involved in the adaptive divergence of whitefish species pairs (Coregonus spp.).

PubMed

Jeukens, Julie; Bernatchez, Louis

2012-01-01

While gene expression divergence is known to be involved in adaptive phenotypic divergence and speciation, the relative importance of regulatory and structural evolution of genes is poorly understood. A recent next-generation sequencing experiment allowed identifying candidate genes potentially involved in the ongoing speciation of sympatric dwarf and normal lake whitefish (Coregonus clupeaformis), such as cytosolic malate dehydrogenase (MDH1), which showed both significant expression and sequence divergence. The main goal of this study was to investigate into more details the signatures of natural selection in the regulatory and coding sequences of MDH1 in lake whitefish and test for parallelism of these signatures with other coregonine species. Sequencing of the two regions in 118 fish from four sympatric pairs of whitefish and two cisco species revealed a total of 35 single nucleotide polymorphisms (SNPs), with more genetic diversity in European compared to North American coregonine species. While the coding region was found to be under purifying selection, an SNP in the proximal promoter exhibited significant allele frequency divergence in a parallel manner among independent sympatric pairs of North American lake whitefish and European whitefish (C. lavaretus). According to transcription factor binding simulation for 22 regulatory haplotypes of MDH1, putative binding profiles were fairly conserved among species, except for the region around this SNP. Moreover, we found evidence for the role of this SNP in the regulation of MDH1 expression level. Overall, these results provide further evidence for the role of natural selection in gene regulation evolution among whitefish species pairs and suggest its possible link with patterns of phenotypic diversity observed in coregonine species.
High compression image and image sequence coding

NASA Technical Reports Server (NTRS)

Kunt, Murat

1989-01-01

The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.
Translation efficiency of heterologous proteins is significantly affected by the genetic context of RBS sequences in engineered cyanobacterium Synechocystis sp. PCC 6803.

PubMed

Thiel, Kati; Mulaku, Edita; Dandapani, Hariharan; Nagy, Csaba; Aro, Eva-Mari; Kallio, Pauli

2018-03-02

Photosynthetic cyanobacteria have been studied as potential host organisms for direct solar-driven production of different carbon-based chemicals from CO 2 and water, as part of the development of sustainable future biotechnological applications. The engineering approaches, however, are still limited by the lack of comprehensive information on most optimal expression strategies and validated species-specific genetic elements which are essential for increasing the intricacy, predictability and efficiency of the systems. This study focused on the systematic evaluation of the key translational control elements, ribosome binding sites (RBS), in the cyanobacterial host Synechocystis sp. PCC 6803, with the objective of expanding the palette of tools for more rigorous engineering approaches. An expression system was established for the comparison of 13 selected RBS sequences in Synechocystis, using several alternative reporter proteins (sYFP2, codon-optimized GFPmut3 and ethylene forming enzyme) as quantitative indicators of the relative translation efficiencies. The set-up was shown to yield highly reproducible expression patterns in independent analytical series with low variation between biological replicates, thus allowing statistical comparison of the activities of the different RBSs in vivo. While the RBSs covered a relatively broad overall expression level range, the downstream gene sequence was demonstrated in a rigorous manner to have a clear impact on the resulting translational profiles. This was expected to reflect interfering sequence-specific mRNA-level interaction between the RBS and the coding region, yet correlation between potential secondary structure formation and observed translation levels could not be resolved with existing in silico prediction tools. The study expands our current understanding on the potential and limitations associated with the regulation of protein expression at translational level in engineered cyanobacteria. The acquired information can be used for selecting appropriate RBSs for optimizing over-expression constructs or multicistronic pathways in Synechocystis, while underlining the complications in predicting the activity due to gene-specific interactions which may reduce the translational efficiency for a given RBS-gene combination. Ultimately, the findings emphasize the need for additional characterized insulator sequence elements to decouple the interaction between the RBS and the coding region for future engineering approaches.
BeerDeCoded: the open beer metagenome project.

PubMed

Sobel, Jonathan; Henry, Luc; Rotman, Nicolas; Rando, Gianpaolo

2017-01-01

Next generation sequencing has radically changed research in the life sciences, in both academic and corporate laboratories. The potential impact is tremendous, yet a majority of citizens have little or no understanding of the technological and ethical aspects of this widespread adoption. We designed BeerDeCoded as a pretext to discuss the societal issues related to genomic and metagenomic data with fellow citizens, while advancing scientific knowledge of the most popular beverage of all. In the spirit of citizen science, sample collection and DNA extraction were carried out with the participation of non-scientists in the community laboratory of Hackuarium, a not-for-profit organisation that supports unconventional research and promotes the public understanding of science. The dataset presented herein contains the targeted metagenomic profile of 39 bottled beers from 5 countries, based on internal transcribed spacer (ITS) sequencing of fungal species. A preliminary analysis reveals the presence of a large diversity of wild yeast species in commercial brews. With this project, we demonstrate that coupling simple laboratory procedures that can be carried out in a non-professional environment with state-of-the-art sequencing technologies and targeted metagenomic analyses, can lead to the detection and identification of the microbial content in bottled beer.
BeerDeCoded: the open beer metagenome project

PubMed Central

Sobel, Jonathan; Henry, Luc; Rotman, Nicolas; Rando, Gianpaolo

2017-01-01

Next generation sequencing has radically changed research in the life sciences, in both academic and corporate laboratories. The potential impact is tremendous, yet a majority of citizens have little or no understanding of the technological and ethical aspects of this widespread adoption. We designed BeerDeCoded as a pretext to discuss the societal issues related to genomic and metagenomic data with fellow citizens, while advancing scientific knowledge of the most popular beverage of all. In the spirit of citizen science, sample collection and DNA extraction were carried out with the participation of non-scientists in the community laboratory of Hackuarium, a not-for-profit organisation that supports unconventional research and promotes the public understanding of science. The dataset presented herein contains the targeted metagenomic profile of 39 bottled beers from 5 countries, based on internal transcribed spacer (ITS) sequencing of fungal species. A preliminary analysis reveals the presence of a large diversity of wild yeast species in commercial brews. With this project, we demonstrate that coupling simple laboratory procedures that can be carried out in a non-professional environment with state-of-the-art sequencing technologies and targeted metagenomic analyses, can lead to the detection and identification of the microbial content in bottled beer. PMID:29123645
Novel genes and mutations in patients affected by recurrent pregnancy loss.

PubMed

Quintero-Ronderos, Paula; Mercier, Eric; Fukuda, Michiko; González, Ronald; Suárez, Carlos Fernando; Patarroyo, Manuel Alfonso; Vaiman, Daniel; Gris, Jean-Christophe; Laissue, Paul

2017-01-01

Recurrent pregnancy loss is a frequently occurring human infertility-related disease affecting ~1% of women. It has been estimated that the cause remains unexplained in >50% cases which strongly suggests that genetic factors may contribute towards the phenotype. Concerning its molecular aetiology numerous studies have had limited success in identifying the disease's genetic causes. This might have been due to the fact that hundreds of genes are involved in each physiological step necessary for guaranteeing reproductive success in mammals. In such scenario, next generation sequencing provides a potentially interesting tool for research into recurrent pregnancy loss causative mutations. The present study involved whole-exome sequencing and an innovative bioinformatics analysis, for the first time, in 49 unrelated women affected by recurrent pregnancy loss. We identified 27 coding variants (22 genes) potentially related to the phenotype (41% of patients). The affected genes, which were enriched by potentially deleterious sequence variants, belonged to distinct molecular cascades playing key roles in implantation/pregnancy biology. Using a quantum chemical approach method we established that mutations in MMP-10 and FGA proteins led to substantial energetic modifications suggesting an impact on their functions and/or stability. The next generation sequencing and bioinformatics approaches presented here represent an efficient way to find mutations, having potentially moderate/strong functional effects, associated with recurrent pregnancy loss aetiology. We consider that some of these variants (and genes) represent probable future biomarkers for recurrent pregnancy loss.
Detection of Streptococcus pyogenes virulence genes in Streptococcus dysgalactiae subsp. equisimilis from Vellore, India.

PubMed

Babbar, Anshu; Itzek, Andreas; Pieper, Dietmar H; Nitsche-Schmitz, D Patric

2018-03-12

Streptococcus dysgalactiae subsp. equisimilis (SDSE), belonging to the group C and G streptococci, are human pathogens reported to cause clinical manifestations similar to infections caused by Streptococcus pyogenes. To scrutinize the distribution of gene coding for S. pyogenes virulence factors in SDSE, 255 isolates were collected from humans infected with SDSE in Vellore, a region in southern India, with high incidence of SDSE infections. Initial evaluation indicated SDSE isolates comprising of 82.35% group G and 17.64% group C. A multiplex PCR system was used to detect 21 gene encoding virulence-associated factors of S. pyogenes, like superantigens, DNases, proteinases, and other immune modulatory toxins. As validated by DNA sequencing of the PCR products, sequences homologous to speC, speG, speH, speI, speL, ssa and smeZ of the family of superantigen coding genes and for DNases like sdaD and sdc were detected in the SDSE collection. Furthermore, there was high abundance (48.12% in group G and 86.6% in group C SDSE) of scpA, the gene coding for C5a peptidase in these isolates. Higher abundance of S. pyogenes virulence factor genes was observed in SDSE of Lancefield group C as compared to group G, even though the incidence rates in former were lower. This study not only substantiates detection of S. pyogenes virulence factor genes in whole genome sequenced SDSE but also makes significant contribution towards the understanding of SDSE and its increasing virulence potential.
Draft Genome Sequence of Marinobacter sp. Strain ANT_B65, Isolated from Antarctic Marine Sponge.

PubMed

de França, Paula; Camilo, Esther; Fantinatti-Garboginni, Fabiana

2018-01-04

Marinobacter sp. strain ANT_B65 was isolated from sponge collected in King George Island, Antarctica. The draft genome of 4,173,840 bp encodes 3,743 protein-coding open reading frames. The genome will provide insights into the strain's potential use in the production of natural products. Copyright © 2018 de França et al.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).

PubMed

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-04-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)

PubMed Central

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-01-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

PubMed

Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

2016-06-15

Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. yasu@bio.keio.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

Draft genome sequence of carbapenem-resistant Shewanella algae strain AC isolated from small abalone (Haliotis diversicolor).

PubMed

Huang, Yao-Ting; Cheng, Jan-Fang; Chen, Shi-Yu; Hong, Yu-Kai; Wu, Zong-Yen; Liu, Po-Yu

2018-06-19

Shewanella algae is an environmental marine bacteria and an emerging opportunistic human pathogen. Moreover, there are increasing reports of strains showing multi-drug resistance, particularly carbapenem-resistant isolates. Although S. algae have been found in bivalve shellfish aquaculture, there is very little genome-wide data on resistant determinants in S. algae from shellfish. In the study, we aimed to determine the whole genome sequence of carbapenem-resistant S. algae strain AC isolated from small abalone in Taiwan. Genome DNA was sequenced using an Illumina MiSeq platform using 250bp paired-end reads. De novo genome assembly was performed using Velvet v1.2.07. The whole genome was annotated and several candidate genes for antimicrobial resistance were identified. The genome size was calculated at 4,751,156bp, with a mean G＋C content of 53.09%. A total of 4,164 protein-coding sequences, 7 rRNAs, 85 tRNAs, and 5 non-coding RNAs were identified. The genome contains genes associated with resistance to β-lactams, trimethoprim, tetracycline, colistin, and quinolone resistance. Multiple efflux pump genes were also detected. Small abalone is a potential source of foodborne drug resistant S. algae. The genome sequence of a carbapenem-resistant S. algae strain AC isolated from small abalone will provide valuable information for further study of the dissemination of resistance genes at the human-animal interface. Copyright © 2018. Published by Elsevier Ltd.
Efficient analysis of mouse genome sequences reveal many nonsense variants

PubMed Central

Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude

2016-01-01

Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

PubMed

Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

2010-05-07

Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
VaDiR: an integrated approach to Variant Detection in RNA.

PubMed

Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy

2018-02-01

Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered. But the cost associated with it is currently limiting its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA(VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
Draft Genome Sequence of Cellulolytic and Xylanolytic Paenibacillus sp. A59, Isolated from Decaying Forest Soil from Patagonia, Argentina

PubMed Central

Ghio, Silvina; Martinez Cáceres, Alfredo I.; Talia, Paola; Grasso, Daniel H.

2015-01-01

Paenibacillus sp. A59 was isolated from decaying forest soil in Argentina and characterized as a xylanolytic strain. We report the draft genome sequence of this isolate, with an estimated genome size of 7 Mb which harbor 6,424 coding sequences. Genes coding for hydrolytic enzymes involved in lignocellulose deconstruction were predicted. PMID:26494679
Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity

NASA Astrophysics Data System (ADS)

Mukherjee, Shashi Bajaj; Sen, Pradip Kumar

2010-10-01

Studying periodic pattern is expected as a standard line of attack for recognizing DNA sequence in identification of gene and similar problems. But peculiarly very little significant work is done in this direction. This paper studies statistical properties of DNA sequences of complete genome using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings and standard Fourier technique is applied to study the periodicity. Distinct statistical behaviour of periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
Long Non-coding RNAs and Their Biological Roles in Plants

PubMed Central

Liu, Xue; Hao, Lili; Li, Dayong; Zhu, Lihuang; Hu, Songnian

2015-01-01

With the development of genomics and bioinformatics, especially the extensive applications of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, the lncRNAs have been considered as important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in few species, our current knowledge of their biological functions is still limited. Here, we have summarized recent studies on their identification, characteristics, classification, bioinformatics, resources, and current exploration of their biological functions in plants. PMID:25936895
Anisotropic transmissive coding metamaterials based on dispersion modulation of spoof surface plasmon polaritons

NASA Astrophysics Data System (ADS)

Pang, Yongqiang; Li, Yongfeng; Zhang, Jieqiu; Chen, Hongya; Xu, Zhuo; Qu, Shaobo

2018-06-01

Anisotropic transmissive coding metamaterials (CMMs) have been designed and demonstrated in this work. High-efficiency transmission with the amplitudes close to unity is achieved by ultrathin metallic tapered blade structures, on which incident waves can be highly coupled into spoof surface plasmon polaritons (SSPPs). The transmission phase can be therefore manipulated with much freedom by designing the dispersion of the SSPPs. These tapered blade structures are designed as the anisotropic unit cells of the CMMs. Two 1-bit anisotropic CMMs with different coding sequences were first designed and simulated, and then a 2-bit anisotropic CMM was designed and measured experimentally. The measured results agree well with the simulations. It is expected that this work provides an alternative method for designing the transmissive CMMs, and may find potential applications in the beam forming technique.
Outbreak of poliomyelitis in Finland in 1984-85 - Re-analysis of viral sequences using the current standard approach.

PubMed

Simonen, Marja-Leena; Roivainen, Merja; Iber, Jane; Burns, Cara; Hovi, Tapani

2010-01-01

In 1984, a wild type 3 poliovirus (PV3/FIN84) spread all over Finland causing nine cases of paralytic poliomyelitis and one case of aseptic meningitis. The outbreak was ended in 1985 with an intensive vaccination campaign. By limited sequence comparison with previously isolated PV3 strains, closest relatives of PV3/FIN84 were found among strains circulating in the Mediterranean region. Now we wanted to reanalyse the relationships using approaches currently exploited in poliovirus surveillance. Cell lysates of 22 strains isolated during the outbreak and stored frozen were subjected to RT-PCR amplification in three genomic regions without prior subculture. Sequences of the entire VP1 coding region, 150 nucleotides in the VP1-2A junction, most of the 5' non-coding region, partial sequences of the 3D RNA polymerase coding region and partial 3' non-coding region were compared within the outbreak and with sequences available in data banks. In addition, complete nucleotide sequences were obtained for 2 strains isolated from two different cases of disease during the outbreak. The results confirmed the previously described wide intraepidemic variation of the strains, including amino acid substitutions in antigenic sites, as well as the likely Mediterranean region origin of the strains. Simplot and bootscanning analyses of the complete genomes indicated complicated evolutionary history of the non-capsid coding regions of the genome suggesting several recombinations with different HEV-C viruses in the past.
Characterization of the Fb-Nof Transposable Element of Drosophila Melanogaster

PubMed Central

Harden, N.; Ashburner, M.

1990-01-01

FB-NOF is a composite transposable element of Drosophila melanogaster. It is composed of foldback sequences, of variable length, which flank a 4-kb NOF sequence with 308-bp inverted repeat termini. The NOF sequence could potentially code for a 120-kD polypeptide. The FB-NOF element is responsible for unstable mutations of the white gene (w(c) and w(DZL)) and is associated with the large TEs of G. Ising. Although most strains of D. melanogaster have 20-30 sites of FB insertion, FB-NOF elements are usually rare, many strains lack this composite element or have only one copy of it. A few strains, including w(DZL) and Basc have many (8-21) copies of FB-NOF, and these show a tendency to insert at ``hot-spots.'' These strains also have an increased number of FB elements. The DNA sequence of the NOF region associated with TE146(Z) has been determined. PMID:2174013
Forty-four novel protein-coding loci discovered using a proteomics informed by transcriptomics (PIT) approach in rat male germ cells.

PubMed

Chocu, Sophie; Evrard, Bertrand; Lavigne, Régis; Rolland, Antoine D; Aubry, Florence; Jégou, Bernard; Chalmel, Frédéric; Pineau, Charles

2014-11-01

Spermatogenesis is a complex process, dependent upon the successive activation and/or repression of thousands of gene products, and ends with the production of haploid male gametes. RNA sequencing of male germ cells in the rat identified thousands of novel testicular unannotated transcripts (TUTs). Although such RNAs are usually annotated as long noncoding RNAs (lncRNAs), it is possible that some of these TUTs code for protein. To test this possibility, we used a "proteomics informed by transcriptomics" (PIT) strategy combining RNA sequencing data with shotgun proteomics analyses of spermatocytes and spermatids in the rat. Among 3559 TUTs and 506 lncRNAs found in meiotic and postmeiotic germ cells, 44 encoded at least one peptide. We showed that these novel high-confidence protein-coding loci exhibit several genomic features intermediate between those of lncRNAs and mRNAs. We experimentally validated the testicular expression pattern of two of these novel protein-coding gene candidates, both highly conserved in mammals: one for a vesicle-associated membrane protein we named VAMP-9, and the other for an enolase domain-containing protein. This study confirms the potential of PIT approaches for the discovery of protein-coding transcripts initially thought to be untranslated or unknown transcripts. Our results contribute to the understanding of spermatogenesis by characterizing two novel proteins, implicated by their strong expression in germ cells. The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium under the data set identifier PXD000872. © 2014 by the Society for the Study of Reproduction, Inc.
Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions.

PubMed

Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M; Greenwood, Alex D; Roca, Alfred L

2015-01-15

The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. Copyright © 2014 Elsevier Inc. All rights reserved.
Genetic code, hamming distance and stochastic matrices.

PubMed

He, Matthew X; Petoukhov, Sergei V; Ricci, Paolo E

2004-09-01

In this paper we use the Gray code representation of the genetic code C=00, U=10, G=11 and A=01 (C pairs with G, A pairs with U) to generate a sequence of genetic code-based matrices. In connection with these code-based matrices, we use the Hamming distance to generate a sequence of numerical matrices. We then further investigate the properties of the numerical matrices and show that they are doubly stochastic and symmetric. We determine the frequency distributions of the Hamming distances, building blocks of the matrices, decomposition and iterations of matrices. We present an explicit decomposition formula for the genetic code-based matrix in terms of permutation matrices, which provides a hypercube representation of the genetic code. It is also observed that there is a Hamiltonian cycle in a genetic code-based hypercube.
Mitochondrial genomes of the jungle crow Corvus macrorhynchos (Passeriformes: Corvidae) from shed feathers and a phylogenetic analysis of genus Corvus using mitochondrial protein-coding genes.

PubMed

Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M

2016-07-01

The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus spelendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Integrated DNA/RNA targeted genomic profiling of diffuse large B-cell lymphoma using a clinical assay.

PubMed

Intlekofer, Andrew M; Joffe, Erel; Batlevi, Connie L; Hilden, Patrick; He, Jie; Seshan, Venkatraman E; Zelenetz, Andrew D; Palomba, M Lia; Moskowitz, Craig H; Portlock, Carol; Straus, David J; Noy, Ariela; Horwitz, Steven M; Gerecitano, John F; Moskowitz, Alison; Hamlin, Paul; Matasar, Matthew J; Kumar, Anita; van den Brink, Marcel R; Knapp, Kristina M; Pichardo, Janine D; Nahas, Michelle K; Trabucco, Sally E; Mughal, Tariq; Copeland, Amanda R; Papaemmanuil, Elli; Moarii, Mathai; Levine, Ross L; Dogan, Ahmet; Miller, Vincent A; Younes, Anas

2018-06-12

We sought to define the genomic landscape of diffuse large B-cell lymphoma (DLBCL) by using formalin-fixed paraffin-embedded (FFPE) biopsy specimens. We used targeted sequencing of genes altered in hematologic malignancies, including DNA coding sequence for 405 genes, noncoding sequence for 31 genes, and RNA coding sequence for 265 genes (FoundationOne-Heme). Short variants, rearrangements, and copy number alterations were determined. We studied 198 samples (114 de novo, 58 previously treated, and 26 large-cell transformation from follicular lymphoma). Median number of GAs per case was 6, with 97% of patients harboring at least one alteration. Recurrent GAs were detected in genes with established roles in DLBCL pathogenesis (e.g. MYD88, CREBBP, CD79B, EZH2), as well as notable differences compared to prior studies such as inactivating mutations in TET2 (5%). Less common GAs identified potential targets for approved or investigational therapies, including BRAF, CD274 (PD-L1), IDH2, and JAK1/2. TP53 mutations were more frequently observed in relapsed/refractory DLBCL, and predicted for lack of response to first-line chemotherapy, identifying a subset of patients that could be prioritized for novel therapies. Overall, 90% (n = 169) of the patients harbored a GA which could be explored for therapeutic intervention, with 54% (n = 107) harboring more than one putative target.
GrTEdb: the first web-based database of transposable elements in cotton (Gossypium raimondii).

PubMed

Xu, Zhenzhen; Liu, Jing; Ni, Wanchao; Peng, Zhen; Guo, Yue; Ye, Wuwei; Huang, Fang; Zhang, Xianggui; Xu, Peng; Guo, Qi; Shen, Xinlian; Du, Jianchang

2017-01-01

Although several diploid and tetroploid Gossypium species genomes have been sequenced, the well annotated web-based transposable elements (TEs) database is lacking. To better understand the roles of TEs in structural, functional and evolutionary dynamics of the cotton genome, a comprehensive, specific, and user-friendly web-based database, Gossypium raimondii transposable elements database (GrTEdb), was constructed. A total of 14 332 TEs were structurally annotated and clearly categorized in G. raimondii genome, and these elements have been classified into seven distinct superfamilies based on the order of protein-coding domains, structures and/or sequence similarity, including 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1 , 12 Mutators , 435 PIF-Harbingers , 275 CACTAs and 14 Helitrons . Meanwhile, the web-based sequence browsing, searching, downloading and blast tool were implemented to help users easily and effectively to annotate the TEs or TE fragments in genomic sequences from G. raimondii and other closely related Gossypium species. GrTEdb provides resources and information related with TEs in G. raimondii , and will facilitate gene and genome analyses within or across Gossypium species, evaluating the impact of TEs on their host genomes, and investigating the potential interaction between TEs and protein-coding genes in Gossypium species. http://www.grtedb.org/. © The Author(s) 2017. Published by Oxford University Press.
Complete genome sequencing of the luminescent bacterium, Vibrio qinghaiensis sp. Q67 using PacBio technology

NASA Astrophysics Data System (ADS)

Gong, Liang; Wu, Yu; Jian, Qijie; Yin, Chunxiao; Li, Taotao; Gupta, Vijai Kumar; Duan, Xuewu; Jiang, Yueming

2018-01-01

Vibrio qinghaiensis sp.-Q67 (Vqin-Q67) is a freshwater luminescent bacterium that continuously emits blue-green light (485 nm). The bacterium has been widely used for detecting toxic contaminants. Here, we report the complete genome sequence of Vqin-Q67, obtained using third-generation PacBio sequencing technology. Continuous long reads were attained from three PacBio sequencing runs and reads >500 bp with a quality value of >0.75 were merged together into a single dataset. This resultant highly-contiguous de novo assembly has no genome gaps, and comprises two chromosomes with substantial genetic information, including protein-coding genes, non-coding RNA, transposon and gene islands. Our dataset can be useful as a comparative genome for evolution and speciation studies, as well as for the analysis of protein-coding gene families, the pathogenicity of different Vibrio species in fish, the evolution of non-coding RNA and transposon, and the regulation of gene expression in relation to the bioluminescence of Vqin-Q67.
Weight distributions for turbo codes using random and nonrandom permutations

NASA Technical Reports Server (NTRS)

Dolinar, S.; Divsalar, D.

1995-01-01

This article takes a preliminary look at the weight distributions achievable for turbo codes using random, nonrandom, and semirandom permutations. Due to the recursiveness of the encoders, it is important to distinguish between self-terminating and non-self-terminating input sequences. The non-self-terminating sequences have little effect on decoder performance, because they accumulate high encoded weight until they are artificially terminated at the end of the block. From probabilistic arguments based on selecting the permutations randomly, it is concluded that the self-terminating weight-2 data sequences are the most important consideration in the design of constituent codes; higher-weight self-terminating sequences have successively decreasing importance. Also, increasing the number of codes and, correspondingly, the number of permutations makes it more and more likely that the bad input sequences will be broken up by one or more of the permuters. It is possible to design nonrandom permutations that ensure that the minimum distance due to weight-2 input sequences grows roughly as the square root of (2N), where N is the block length. However, these nonrandom permutations amplify the bad effects of higher-weight inputs, and as a result they are inferior in performance to randomly selected permutations. But there are 'semirandom' permutations that perform nearly as well as the designed nonrandom permutations with respect to weight-2 input sequences and are not as susceptible to being foiled by higher-weight inputs.
Complete plastid genome sequence of Daucus carota: Implications for biotechnology and phylogeny of angiosperms

PubMed Central

Ruhlman, Tracey; Lee, Seung-Bum; Jansen, Robert K; Hostetler, Jessica B; Tallon, Luke J; Town, Christopher D; Daniell, Henry

2006-01-01

Background Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. Results The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analysis of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. Conclusion The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements. PMID:16945140
Next-generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis.

PubMed

Webb, Kristen M; Rosenthal, Benjamin M

2011-01-01

The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution has promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence was found to exceed 99.3%. Genome content and gene order was not found to be significantly different from that of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.

Cloning and sequence determination of the gene coding for the pyruvate phosphate dikinase of Entamoeba histolytica.

PubMed

Saavedra-Lira, E; Pérez-Montfort, R

1994-05-16

We isolated three overlapping clones from a DNA genomic library of Entamoeba histolytica strain HM1:IMSS, whose translated nucleotide (nt) sequence shows similarities of 51, 48 and 47% with the amino acid (aa) sequences reported for the pyruvate phosphate dikinases from Bacteroides symbiosus, maize and Flaveria trinervia, respectively. The reading frame determined codes for a protein of 886 aa.
Draft Genome Sequence of Cellulolytic and Xylanolytic Paenibacillus sp. A59, Isolated from Decaying Forest Soil from Patagonia, Argentina.

PubMed

Ghio, Silvina; Martinez Cáceres, Alfredo I; Talia, Paola; Grasso, Daniel H; Campos, Eleonora

2015-10-22

Paenibacillus sp. A59 was isolated from decaying forest soil in Argentina and characterized as a xylanolytic strain. We report the draft genome sequence of this isolate, with an estimated genome size of 7 Mb which harbor 6,424 coding sequences. Genes coding for hydrolytic enzymes involved in lignocellulose deconstruction were predicted. Copyright © 2015 Ghio et al.
Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

PubMed Central

Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren

2015-01-01

There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098
The mitochondrial genomes of the acoelomorph worms Paratomella rubra, Isodiametra pulchra and Archaphanostoma ylvae.

PubMed

Robertson, Helen E; Lapraz, François; Egger, Bernhard; Telford, Maximilian J; Schiffer, Philipp H

2017-05-12

Acoels are small, ubiquitous - but understudied - marine worms with a very simple body plan. Their internal phylogeny is still not fully resolved, and the position of their proposed phylum Xenacoelomorpha remains debated. Here we describe mitochondrial genome sequences from the acoels Paratomella rubra and Isodiametra pulchra, and the complete mitochondrial genome of the acoel Archaphanostoma ylvae. The P. rubra and A. ylvae sequences are typical for metazoans in size and gene content. The larger I. pulchra mitochondrial genome contains both ribosomal genes, 21 tRNAs, but only 11 protein-coding genes. We find evidence suggesting a duplicated sequence in the I. pulchra mitochondrial genome. The P. rubra, I. pulchra and A. ylvae mitochondria have a unique genome organisation in comparison to other metazoan mitochondrial genomes. We found a large degree of protein-coding gene and tRNA overlap with little non-coding sequence in the compact P. rubra genome. Conversely, the A. ylvae and I. pulchra genomes have many long non-coding sequences between genes, likely driving genome size expansion in the latter. Phylogenetic trees inferred from mitochondrial genes retrieve Xenacoelomorpha as an early branching taxon in the deuterostomes. Sequence divergence analysis between P. rubra sampled in England and Spain indicates cryptic diversity.
Metal resistance sequences and transgenic plants

DOEpatents

Meagher, Richard Brian; Summers, Anne O.; Rugh, Clayton L.

1999-10-12

The present invention provides nucleic acid sequences encoding a metal ion resistance protein, which are expressible in plant cells. The metal resistance protein provides for the enzymatic reduction of metal ions including but not limited to divalent Cu, divalent mercury, trivalent gold, divalent cadmium, lead ions and monovalent silver ions. Transgenic plants which express these coding sequences exhibit increased resistance to metal ions in the environment as compared with plants which have not been so genetically modified. Transgenic plants with improved resistance to organometals including alkylmercury compounds, among others, are provided by the further inclusion of plant-expressible organometal lyase coding sequences, as specifically exemplified by the plant-expressible merB coding sequence. Furthermore, these transgenic plants which have been genetically modified to express the metal resistance coding sequences of the present invention can participate in the bioremediation of metal contamination via the enzymatic reduction of metal ions. Transgenic plants resistant to organometals can further mediate remediation of organic metal compounds, for example, alkylmetal compounds including but not limited to methyl mercury, methyl lead compounds, methyl cadmium and methyl arsenic compounds, in the environment by causing the freeing of mercuric or other metal ions and the reduction of the ionic mercury or other metal ions to the less toxic elemental mercury or other metals.
Complete mitochondrial genome of the whiter-spotted flower chafer, Protaetia brevitarsis (Coleoptera: Scarabaeidae).

PubMed

Kim, Min Jee; Im, Hyun Hwak; Lee, Kwang Youll; Han, Yeon Soo; Kim, Iksoo

2014-06-01

Abstract The complete nucleotide sequences of the mitochondrial genome from the whiter-spotted flower chafer, Protaetia brevitarsis (Coleoptera: Scarabaeidae), was determined. The 20,319-bp long circular genome is the longest among completely sequenced Coleoptera. As is typical in animals, the P. brevitarsis genome consisted of two ribosomal RNAs, 22 transfer RNAs, 13 protein-coding genes and one A + T-rich region. Although the size of the coding genes was typical, the non-coding A + T-rich region was 5654 bp, which is the longest in insects. The extraordinary length of this region was composed of 28,117-bp tandem repeats and 782-bp tandem repeats. These repeat sequences were encompassed by three non-repeat sequences constituting 1804 bp.
EDGE 2017 R&D 100 Entry with Appendix

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chain, Patrick Sam Guy; Davenport, Karen Walston; Li, Po-E

Diabetes, infertility, cancer, and Alzheimer’s disease—the key to one day preventing or even curing such afflictions and diseases (both infectious and genetically driven) may be locked in our own genetic code and the code of microorganisms that inhabit our bodies. The study of this code, known as genomics, has recently become much more promising as a result of two things: (1) vast improvements in high-throughput, nextgeneration sequencing (NSG), and (2) an exponential decrease in the cost of such sequencing. For example, it originally cost approximately $3 billion to sequence the human genome; today, this genome could be resequenced for lessmore » than $1,000.« less
Draft Genome Sequence of Geobacillus sp. LEMMY01, a Thermophilic Bacterium Isolated from the Site of a Burning Grass Pile

PubMed Central

de Souza, Yuri Pinheiro Alves; da Mota, Fábio Faria

2017-01-01

ABSTRACT We report here the 3,586,065-bp draft genome of Geobacillus sp. LEMMY01, which was isolated (axenic culture) from a thermophilic chemolitoautotrophic consortium obtained from the site of a burning grass pile. The genome contains biosynthetic gene clusters coding for secondary metabolites, such as terpene and lantipeptide, confirming the biotechnological potential of this strain. PMID:28495764
Non-coding RNAs in cancer brain metastasis

PubMed Central

Wu, Kerui; Sharma, Sambad; Venkat, Suresh; Liu, Keqin; Zhou, Xiaobo; Watabe, Kounosuke

2017-01-01

More than 90% of cancer death is attributed to metastatic disease, and the brain is one of the major metastatic sites of melanoma, colon, renal, lung and breast cancers. Despite the recent advancement of targeted therapy for cancer, the incidence of brain metastasis is increasing. One reason is that most therapeutic drugs can’t penetrate blood-brain-barrier and tumor cells find the brain as sanctuary site. In this review, we describe the pathophysiology of brain metastases to introduce the latest understandings of metastatic brain malignancies. This review also particularly focuses on non-coding RNAs and their roles in cancer brain metastasis. Furthermore, we discuss the roles of the extracellular vesicles as they are known to transport information between cells to initiate cancer cell-microenvironment communication. The potential clinical translation of non-coding RNAs as a tool for diagnosis and for treatment is also discussed in this review. At the end, the computational aspects of non-coding RNA detection, the sequence and structure calculation and epigenetic regulation of non-coding RNA in brain metastasis are discussed. PMID:26709907
Scaling features of noncoding DNA

NASA Technical Reports Server (NTRS)

Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.

1999-01-01

We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prochnik, Simon E.; Umen, James; Nedelcu, Aurora

2010-07-01

Analysis of the Volvox carteri genome reveals that this green alga's increased organismal complexity and multicellularity are associated with modifications in protein families shared with its unicellular ancestor, and not with large-scale innovations in protein coding capacity. The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are uniquely suited for investigating the evolution of multicellularity and development. We sequenced the 138 Mb genome of V. carteri and compared its {approx}14,500 predicted proteins to those of its unicellular relative, Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similarmore » protein-coding potentials, and few species-specific protein-coding gene predictions. Interestingly, volvocine algal-specific proteins are enriched in Volvox, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.« less
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly

PubMed Central

Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka

2010-01-01

Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

PubMed

Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

1988-02-01

Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.
Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

PubMed Central

Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

1988-01-01

Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. Images PMID:3257578
[Learning and Repetive Reproduction of Memorized Sequences by the Right and the Left Hand].

PubMed

Bobrova, E V; Lyakhovetskii, V A; Bogacheva, I N

2015-01-01

An important stage of learning a new skill is repetitive reproduction of one and the same sequence of movements, which plays a significant role in forming of the movement stereotypes. Two groups of right-handers repeatedly memorized (6-10 repetitions) the sequences of their hand transitions by experimenter in 6 positions, firstly by the right hand (RH), and then--by the left hand (LH) or vice versa. Random sequences previously unknown to the volunteers were reproduced in the 11 series. Modified sequences were tested in the 2nd and 3rd series, where the same elements' positions were presented in different order. The processes of repetitive sequence reproduction were similar for RH and LH. However, the learning of the modified sequences differed: Information about elements' position disregarding the reproduction order was used only when LH initiated task performing. This information was not used when LH followed RH and when RH performed the task. Consequently, the type of information coding activated by LH helped learn the positions of sequence elements, while the type of information coding activated by RH prevented learning. It is supposedly connected with the predominant role of right hemisphere in the processes of positional coding and motor learning.
Sequence Polishing Library (SPL) v10.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oberortner, Ernst

The Sequence Polishing Library (SPL) is a suite of software tools in order to automate "Design for Synthesis and Assembly" workflows. Specifically: The SPL "Converter" tool converts files among the following sequence data exchange formats: CSV, FASTA, GenBank, and Synthetic Biology Open Language (SBOL); The SPL "Juggler" tool optimizes the codon usages of DNA coding sequences according to an optimization strategy, a user-specific codon usage table and genetic code. In addition, the SPL "Juggler" can translate amino acid sequences into DNA sequences.:The SPL "Polisher" verifies NA sequences against DNA synthesis constraints, such as GC content, repeating k-mers, and restriction sites.more » In case of violations, the "Polisher" reports the violations in a comprehensive manner. The "Polisher" tool can also modify the violating regions according to an optimization strategy, a user-specific codon usage table and genetic code;The SPL "Partitioner" decomposes large DNA sequences into smaller building blocks with partial overlaps that enable an efficient assembly. The "Partitioner" enables the user to configure the characteristics of the overlaps, which are mostly determined by the utilized assembly protocol, such as length, GC content, or melting temperature.« less
Multiple Access Interference Reduction Using Received Response Code Sequence for DS-CDMA UWB System

NASA Astrophysics Data System (ADS)

Toh, Keat Beng; Tachikawa, Shin'ichi

This paper proposes a combination of novel Received Response (RR) sequence at the transmitter and a Matched Filter-RAKE (MF-RAKE) combining scheme receiver system for the Direct Sequence-Code Division Multiple Access Ultra Wideband (DS-CDMA UWB) multipath channel model. This paper also demonstrates the effectiveness of the RR sequence in Multiple Access Interference (MAI) reduction for the DS-CDMA UWB system. It suggests that by using conventional binary code sequence such as the M sequence or the Gold sequence, there is a possibility of generating extra MAI in the UWB system. Therefore, it is quite difficult to collect the energy efficiently although the RAKE reception method is applied at the receiver. The main purpose of the proposed system is to overcome the performance degradation for UWB transmission due to the occurrence of MAI during multiple accessing in the DS-CDMA UWB system. The proposed system improves the system performance by improving the RAKE reception performance using the RR sequence which can reduce the MAI effect significantly. Simulation results verify that significant improvement can be obtained by the proposed system in the UWB multipath channel models.
Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)

PubMed Central

Martin, Andrew C. R.

2014-01-01

The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and ’dotifying’ repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836
Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

PubMed

Martin, Andrew C R

2014-01-01

The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.
Inter-individual variation in expression: a missing link in biomarker biology?

PubMed

Little, Peter F R; Williams, Rohan B H; Wilkins, Marc R

2009-01-01

The past decade has seen an explosion of variation data demonstrating that diversity of both protein-coding sequences and of regulatory elements of protein-coding genes is common and of functional importance. In this article, we argue that genetic diversity can no longer be ignored in studies of human biology, even research projects without explicit genetic experimental design, and that this knowledge can, and must, inform research. By way of illustration, we focus on the potential role of genetic data in case-control studies to identify and validate cancer protein biomarkers. We argue that a consideration of genetics, in conjunction with proteomic biomarker discovery projects, should improve the proportion of biomarkers that can accurately classify patients.

A high-speed BCI based on code modulation VEP

NASA Astrophysics Data System (ADS)

Bin, Guangyu; Gao, Xiaorong; Wang, Yijun; Li, Yun; Hong, Bo; Gao, Shangkai

2011-04-01

Recently, electroencephalogram-based brain-computer interfaces (BCIs) have attracted much attention in the fields of neural engineering and rehabilitation due to their noninvasiveness. However, the low communication speed of current BCI systems greatly limits their practical application. In this paper, we present a high-speed BCI based on code modulation of visual evoked potentials (c-VEP). Thirty-two target stimuli were modulated by a time-shifted binary pseudorandom sequence. A multichannel identification method based on canonical correlation analysis (CCA) was used for target identification. The online system achieved an average information transfer rate (ITR) of 108 ± 12 bits min-1 on five subjects with a maximum ITR of 123 bits min-1 for a single subject.
RNAcentral: an international database of ncRNA sequences

DOE PAGES

Williams, Kelly Porter

2014-10-28

The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.
The complete validated mitochondrial genome of the silver gemfish Rexea solandri (Cuvier, 1832) (Perciformes, Gempylidae).

PubMed

Bustamante, Carlos; Ovenden, Jennifer R

2016-01-01

The silver gemfish Rexea solandri is an important economic resource but Vulnerable to overfishing in Australian waters. The complete mitochondrial genome sequence is described from 1.6 million reads obtained via next generation sequencing. The total length of the mitogenome is 16,350 bp comprising 2 rRNA, 13 protein-coding genes, 22 tRNA and 2 non-coding regions. The mitogenome sequence was validated against sequences of PCR fragments and BLAST queries of Genbank. Gene order was equivalent to that found in marine fishes.
High rate concatenated coding systems using bandwidth efficient trellis inner codes

NASA Technical Reports Server (NTRS)

Deng, Robert H.; Costello, Daniel J., Jr.

1989-01-01

High-rate concatenated coding systems with bandwidth-efficient trellis inner codes and Reed-Solomon (RS) outer codes are investigated for application in high-speed satellite communication systems. Two concatenated coding schemes are proposed. In one the inner code is decoded with soft-decision Viterbi decoding, and the outer RS code performs error-correction-only decoding (decoding without side information). In the other, the inner code is decoded with a modified Viterbi algorithm, which produces reliability information along with the decoded output. In this algorithm, path metrics are used to estimate the entire information sequence, whereas branch metrics are used to provide reliability information on the decoded sequence. This information is used to erase unreliable bits in the decoded output. An errors-and-erasures RS decoder is then used for the outer code. The two schemes have been proposed for high-speed data communication on NASA satellite channels. The rates considered are at least double those used in current NASA systems, and the results indicate that high system reliability can still be achieved.
Variations in endothelin receptor B subtype 2 (EDNRB2) coding sequences and mRNA expression levels in 4 Muscovy duck plumage colour phenotypes.

PubMed

Wu, N; Qin, H; Wang, M; Bian, Y; Dong, B; Sun, G; Zhao, W; Chang, G; Xu, Q; Chen, G

2017-04-01

1. Endothelin receptor B subtype 2 (EDNRB2) is a paralog of EDNRB, which encodes a 7-transmembrane G-protein coupled receptor. Previous studies reported that EDNRB was essential for melanoblast migration in mammals and ducks. 2. Muscovy ducks have different plumage colour phenotypes. Variations in EDNRB2 coding sequences (CDSs) and mRNA expression levels were investigated in 4 different Muscovy duck plumage colour phenotypes, including black, black mutant, silver and white head. 3. The EDNRB2 gene from Muscovy duck was cloned; it had a length of 6435 bp and encoded 437 amino acids. The coding region was screened and potential single nucleotide polymorphisms were identified. Eight mutations were obtained, including one missense variant (c.64C > T) and 7 synonymous substitutions. The substitutions were associated with plumage colour phenotypes. 4. The EDNRB2 mRNA expression levels were compared between feather pulp from black birds and black mutant birds. The results indicated that EDNRB2 transcripts in feather pulp were significantly higher in black feathers than in white feathers. 5. The results determined the variation of EDNRB2 CDS and mRNA expression in Muscovy ducks of various plumage colours.
The draft genome of the pest tephritid fruit fly Bactrocera tryoni: resources for the genomic analysis of hybridising species.

PubMed

Gilchrist, Anthony Stuart; Shearman, Deborah C A; Frommer, Marianne; Raphael, Kathryn A; Deshpande, Nandan P; Wilkins, Marc R; Sherwin, William B; Sved, John A

2014-12-20

The tephritid fruit flies include a number of economically important pests of horticulture, with a large accumulated body of research on their biology and control. Amongst the Tephritidae, the genus Bactrocera, containing over 400 species, presents various species groups of potential utility for genetic studies of speciation, behaviour or pest control. In Australia, there exists a triad of closely-related, sympatric Bactrocera species which do not mate in the wild but which, despite distinct morphologies and behaviours, can be force-mated in the laboratory to produce fertile hybrid offspring. To exploit the opportunities offered by genomics, such as the efficient identification of genetic loci central to pest behaviour and to the earliest stages of speciation, investigators require genomic resources for future investigations. We produced a draft de novo genome assembly of Australia's major tephritid pest species, Bactrocera tryoni. The male genome (650-700 Mbp) includes approximately 150 Mb of interspersed repetitive DNA sequences and 60 Mb of satellite DNA. Assessment using conserved core eukaryotic sequences indicated 98% completeness. Over 16,000 MAKER-derived gene models showed a large degree of overlap with other Dipteran reference genomes. The sequence of the ribosomal RNA transcribed unit was also determined. Unscaffolded assemblies of B. neohumeralis and B. jarvisi were then produced; comparison with B. tryoni showed that the species are more closely related than any Drosophila species pair. The similarity of the genomes was exploited to identify 4924 potentially diagnostic indels between the species, all of which occur in non-coding regions. This first draft B. tryoni genome resembles other dipteran genomes in terms of size and putative coding sequences. For all three species included in this study, we have identified a comprehensive set of non-redundant repetitive sequences, including the ribosomal RNA unit, and have quantified the major satellite DNA families. These genetic resources will facilitate the further investigations of genetic mechanisms responsible for the behavioural and morphological differences between these three species and other tephritids. We have also shown how whole genome sequence data can be used to generate simple diagnostic tests between very closely-related species where only one of the species is scaffolded.
Complete genome sequence of lymphocystis disease virus isolated from China.

PubMed

Zhang, Qi-Ya; Xiao, Feng; Xie, Jian; Li, Zheng-Qiu; Gui, Jian-Fang

2004-07-01

Lymphocystis diseases in fish throughout the world have been extensively described. Here we report the complete genome sequence of lymphocystis disease virus isolated in China (LCDV-C), an LCDV isolated from cultured flounder (Paralichthys olivaceus) with lymphocystis disease in China. The LCDV-C genome is 186,250 bp, with a base composition of 27.25% G+C. Computer-assisted analysis revealed 240 potential open reading frames (ORFs) and 176 nonoverlapping putative viral genes, which encode polypeptides ranging from 40 to 1,193 amino acids. The percent coding density is 67%, and the average length of each ORF is 702 bp. A search of the GenBank database using the 176 individual putative genes revealed 103 homologues to the corresponding ORFs of LCDV-1 and 73 potential genes that were not found in LCDV-1 and other iridoviruses. Among the 73 genes, there are 8 genes that contain conserved domains of cellular genes and 65 novel genes that do not show any significant homology with the sequences in public databases. Although a certain extent of similarity between putative gene products of LCDV-C and corresponding proteins of LCDV-1 was revealed, no colinearity was detected when their ORF arrangements and coding strategies were compared to each other, suggesting that a high degree of genetic rearrangements between them has occurred. And a large number of tandem and overlapping repeated sequences were observed in the LCDV-C genome. The deduced amino acid sequence of the major capsid protein (MCP) presents the highest identity to those of LCDV-1 and other iridoviruses among the LCDV-C gene products. Furthermore, a phylogenetic tree was constructed based on the multiple alignments of nine MCP amino acid sequences. Interestingly, LCDV-C and LCDV-1 were clustered together, but their amino acid identity is much less than that in other clusters. The unexpected levels of divergence between their genomes in size, gene organization, and gene product identity suggest that LCDV-C and LCDV-1 shouldn't belong to a same species and that LCDV-C should be considered a species different from LCDV-1.
Complete Genome Sequence of Lymphocystis Disease Virus Isolated from China

PubMed Central

Zhang, Qi-Ya; Xiao, Feng; Xie, Jian; Li, Zheng-Qiu; Gui, Jian-Fang

2004-01-01

Lymphocystis diseases in fish throughout the world have been extensively described. Here we report the complete genome sequence of lymphocystis disease virus isolated in China (LCDV-C), an LCDV isolated from cultured flounder (Paralichthys olivaceus) with lymphocystis disease in China. The LCDV-C genome is 186,250 bp, with a base composition of 27.25% G+C. Computer-assisted analysis revealed 240 potential open reading frames (ORFs) and 176 nonoverlapping putative viral genes, which encode polypeptides ranging from 40 to 1,193 amino acids. The percent coding density is 67%, and the average length of each ORF is 702 bp. A search of the GenBank database using the 176 individual putative genes revealed 103 homologues to the corresponding ORFs of LCDV-1 and 73 potential genes that were not found in LCDV-1 and other iridoviruses. Among the 73 genes, there are 8 genes that contain conserved domains of cellular genes and 65 novel genes that do not show any significant homology with the sequences in public databases. Although a certain extent of similarity between putative gene products of LCDV-C and corresponding proteins of LCDV-1 was revealed, no colinearity was detected when their ORF arrangements and coding strategies were compared to each other, suggesting that a high degree of genetic rearrangements between them has occurred. And a large number of tandem and overlapping repeated sequences were observed in the LCDV-C genome. The deduced amino acid sequence of the major capsid protein (MCP) presents the highest identity to those of LCDV-1 and other iridoviruses among the LCDV-C gene products. Furthermore, a phylogenetic tree was constructed based on the multiple alignments of nine MCP amino acid sequences. Interestingly, LCDV-C and LCDV-1 were clustered together, but their amino acid identity is much less than that in other clusters. The unexpected levels of divergence between their genomes in size, gene organization, and gene product identity suggest that LCDV-C and LCDV-1 shouldn't belong to a same species and that LCDV-C should be considered a species different from LCDV-1. PMID:15194775
Intact coding region of the serotonin transporter gene in obsessive-compulsive disorder

DOE Office of Scientific and Technical Information (OSTI.GOV)

Altemus, M.; Murphy, D.L.; Greenberg, B.

1996-07-26

Epidemiologic studies indicate that obsessive-compulsive disorder is genetically transmitted in some families, although no genetic abnormalities have been identified in individuals with this disorder. The selective response of obsessive-compulsive disorder to treatment with agents which block serotonin reuptake suggests the gene coding for the serotonin transporter as a candidate gene. The primary structure of the serotonin-transporter coding region was sequenced in 22 patients with obsessive-compulsive disorder, using direct PCR sequencing of cDNA synthesized from platelet serotonin-transporter mRNA. No variations in amino acid sequence were found among the obsessive-compulsive disorder patients or healthy controls. These results do not support a rolemore » for alteration in the primary structure of the coding region of the serotonin-transporter gene in the pathogenesis of obsessive-compulsive disorder. 27 refs.« less
Novel coding, translation, and gene expression of a replicating covalently closed circular RNA of 220 nt.

PubMed

AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer

2014-10-07

The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap UGAUGA (underline highlights the initiation codon AUG within the combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact "nanogenome."
Comprehensive Identification of Long Non-coding RNAs in Purified Cell Types from the Brain Reveals Functional LncRNA in OPC Fate Determination.

PubMed

Dong, Xiaomin; Chen, Kenian; Cuevas-Diaz Duran, Raquel; You, Yanan; Sloan, Steven A; Zhang, Ye; Zong, Shan; Cao, Qilin; Barres, Ben A; Wu, Jia Qian

2015-12-01

Long non-coding RNAs (lncRNAs) (> 200 bp) play crucial roles in transcriptional regulation during numerous biological processes. However, it is challenging to comprehensively identify lncRNAs, because they are often expressed at low levels and with more cell-type specificity than are protein-coding genes. In the present study, we performed ab initio transcriptome reconstruction using eight purified cell populations from mouse cortex and detected more than 5000 lncRNAs. Predicting the functions of lncRNAs using cell-type specific data revealed their potential functional roles in Central Nervous System (CNS) development. We performed motif searches in ENCODE DNase I digital footprint data and Mouse ENCODE promoters to infer transcription factor (TF) occupancy. By integrating TF binding and cell-type specific transcriptomic data, we constructed a novel framework that is useful for systematically identifying lncRNAs that are potentially essential for brain cell fate determination. Based on this integrative analysis, we identified lncRNAs that are regulated during Oligodendrocyte Precursor Cell (OPC) differentiation from Neural Stem Cells (NSCs) and that are likely to be involved in oligodendrogenesis. The top candidate, lnc-OPC, shows highly specific expression in OPCs and remarkable sequence conservation among placental mammals. Interestingly, lnc-OPC is significantly up-regulated in glial progenitors from experimental autoimmune encephalomyelitis (EAE) mouse models compared to wild-type mice. OLIG2-binding sites in the upstream regulatory region of lnc-OPC were identified by ChIP (chromatin immunoprecipitation)-Sequencing and validated by luciferase assays. Loss-of-function experiments confirmed that lnc-OPC plays a functional role in OPC genesis. Overall, our results substantiated the role of lncRNA in OPC fate determination and provided an unprecedented data source for future functional investigations in CNS cell types. We present our datasets and analysis results via the interactive genome browser at our laboratory website that is freely accessible to the research community. This is the first lncRNA expression database of collective populations of glia, vascular cells, and neurons. We anticipate that these studies will advance the knowledge of this major class of non-coding genes and their potential roles in neurological development and diseases.
Gene and genon concept: coding versus regulation

PubMed Central

2007-01-01

We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea

PubMed Central

Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

2015-01-01

Previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses on coding region SNP markers and control region mutation motifs. In this study, we tried to see if the same triple multiplex analysis for coding regions SNPs could be also applicable to ancient samples from East Asia as the complementation for sequence analysis of mtDNA control region. By the study on Joseon skeleton samples, we know that mtDNA haplogroup determined by coding region SNP markers successfully falls within the same haplogroup that sequence analysis on control region can assign. Considering that ancient samples in previous studies make no small number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as good complimentary to the conventional haplogroup determination, especially of archaeological human bone samples buried underground over long periods. PMID:26345190
Reading biological processes from nucleotide sequences

NASA Astrophysics Data System (ADS)

Murugan, Anand

Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical mechanisms.
Stability of Yellow Fever Virus under Recombinatory Pressure as Compared with Chikungunya Virus

PubMed Central

McGee, Charles E.; Tsetsarkin, Konstantin A.; Guy, Bruno; Lang, Jean; Plante, Kenneth; Vanlandingham, Dana L.; Higgs, Stephen

2011-01-01

Recombination is a mechanism whereby positive sense single stranded RNA viruses exchange segments of genetic information. Recent phylogenetic analyses of naturally occurring recombinant flaviviruses have raised concerns regarding the potential for the emergence of virulent recombinants either post-vaccination or following co-infection with two distinct wild-type viruses. To characterize the conditions and sequences that favor RNA arthropod-borne virus recombination we constructed yellow fever virus (YFV) 17D recombinant crosses containing complementary deletions in the envelope protein coding sequence. These constructs were designed to strongly favor recombination, and the detection conditions were optimized to achieve high sensitivity recovery of putative recombinants. Full length recombinant YFV 17D virus was never detected under any of the experimental conditions examined, despite achieving estimated YFV replicon co-infection levels of ∼2.4×106 in BHK-21 (vertebrate) cells and ∼1.05×105 in C710 (arthropod) cells. Additionally YFV 17D superinfection resistance was observed in vertebrate and arthropod cells harboring a primary infection with wild-type YFV Asibi strain. Furthermore recombination potential was also evaluated using similarly designed chikungunya virus (CHIKV) replicons towards validation of this strategy for recombination detection. Non-homologus recombination was observed for CHIKV within the structural gene coding sequence resulting in an in-frame duplication of capsid and E3 gene. Based on these data, it is concluded that even in the unlikely event of a high level acute co-infection of two distinct YFV genomes in an arthropod or vertebrate host, the generation of viable flavivirus recombinants is extremely unlikely. PMID:21826243
Stability of yellow fever virus under recombinatory pressure as compared with chikungunya virus.

PubMed

McGee, Charles E; Tsetsarkin, Konstantin A; Guy, Bruno; Lang, Jean; Plante, Kenneth; Vanlandingham, Dana L; Higgs, Stephen

2011-01-01

Recombination is a mechanism whereby positive sense single stranded RNA viruses exchange segments of genetic information. Recent phylogenetic analyses of naturally occurring recombinant flaviviruses have raised concerns regarding the potential for the emergence of virulent recombinants either post-vaccination or following co-infection with two distinct wild-type viruses. To characterize the conditions and sequences that favor RNA arthropod-borne virus recombination we constructed yellow fever virus (YFV) 17D recombinant crosses containing complementary deletions in the envelope protein coding sequence. These constructs were designed to strongly favor recombination, and the detection conditions were optimized to achieve high sensitivity recovery of putative recombinants. Full length recombinant YFV 17D virus was never detected under any of the experimental conditions examined, despite achieving estimated YFV replicon co-infection levels of ∼2.4 x 10⁶ in BHK-21 (vertebrate) cells and ∼1.05 x 10⁵ in C₇10 (arthropod) cells. Additionally YFV 17D superinfection resistance was observed in vertebrate and arthropod cells harboring a primary infection with wild-type YFV Asibi strain. Furthermore recombination potential was also evaluated using similarly designed chikungunya virus (CHIKV) replicons towards validation of this strategy for recombination detection. Non-homologus recombination was observed for CHIKV within the structural gene coding sequence resulting in an in-frame duplication of capsid and E3 gene. Based on these data, it is concluded that even in the unlikely event of a high level acute co-infection of two distinct YFV genomes in an arthropod or vertebrate host, the generation of viable flavivirus recombinants is extremely unlikely.
A new polymorphic and multicopy MHC gene family related to nonmammalian class I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leelayuwat, C.; Degli-Esposti, M.A.; Abraham, L.J.

1994-12-31

The authors have used genomic analysis to characterize a region of the central major histocompatibility complex (MHC) spanning {approximately} 300 kilobases (kb) between TNF and HLA-B. This region has been suggested to carry genetic factors relevant to the development of autoimmune diseases such as myasthenia gravis (MG) and insulin dependent diabetes mellitus (IDDM). Genomic sequence was analyzed for coding potential, using two neural network programs, GRAIL and GeneParser. A genomic probe, JAB, containing putative coding sequences (PERB11) located 60 kb centromeric of HLA-B, was used for northern analysis of human tissues. Multiple transcripts were detected. Southern analysis of genomic DNAmore » and overlapping YAC clones, covering the region from BAT1 to HLA-F, indicated that there are at least five copies of PERB11, four of which are located within this region of the MHC. The partial cDNA sequence of PERB11 was obtained from poly-A RNA derived from skeletal muscle. The putative amino acid sequence of PERB11 shares {approximately} 30% identity to MHC class I molecules from various species, including reptiles, chickens, and frogs, as well as to other MHC class I-like molecules, such as the IgG FcR of the mouse and rat and the human Zn-{alpha}2-glycoprotein. From direct comparison of amino acid sequences, it is concluded that PERB11 is a distinct molecule more closely related to nonmammalian than known mammalian MHC class I molecules. Genomic sequence analysis of PERB11 from five MHC ancestral haplotypes (AH) indicated that the gene is polymorphic at both DNA and protein level. The results suggest that the authors have identified a novel polymorphic gene family with multiple copies within the MHC. 48 refs., 10 figs., 2 tabs.« less
Comparative Genetic Analyses of Human Rhinovirus C (HRV-C) Complete Genome from Malaysia.

PubMed

Khaw, Yam Sim; Chan, Yoke Fun; Jafar, Faizatul Lela; Othman, Norlijah; Chee, Hui Yee

2016-01-01

Human rhinovirus-C (HRV-C) has been implicated in more severe illnesses than HRV-A and HRV-B, however, the limited number of HRV-C complete genomes (complete 5' and 3' non-coding region and open reading frame sequences) has hindered the in-depth genetic study of this virus. This study aimed to sequence seven complete HRV-C genomes from Malaysia and compare their genetic characteristics with the 18 published HRV-Cs. Seven Malaysian HRV-C complete genomes were obtained with newly redesigned primers. The seven genomes were classified as HRV-C6, C12, C22, C23, C26, C42, and pat16 based on the VP4/VP2 and VP1 pairwise distance threshold classification. Five of the seven Malaysian isolates, namely, 3430-MY-10/C22, 8713-MY-10/C23, 8097-MY-11/C26, 1570-MY-10/C42, and 7383-MY-10/pat16 are the first newly sequenced complete HRV-C genomes. All seven Malaysian isolates genomes displayed nucleotide similarity of 63-81% among themselves and 63-96% with other HRV-Cs. Malaysian HRV-Cs had similar putative immunogenic sites, putative receptor utilization and potential antiviral sites as other HRV-Cs. The genomic features of Malaysian isolates were similar to those of other HRV-Cs. Negative selections were frequently detected in HRV-Cs complete coding sequences indicating that these sequences were under functional constraint. The present study showed that HRV-Cs from Malaysia have diverse genetic sequences but share conserved genomic features with other HRV-Cs. This genetic information could provide further aid in the understanding of HRV-C infection.
Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions.

PubMed

Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize; Zhao, Yun; Zhao, Hai

2017-01-01

Phylogenetic relationship within different genera of Lemnoideae, a kind of small aquatic monocotyledonous plants, was not well resolved, using either morphological characters or traditional markers. Given that rich genetic information in chloroplast genome makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. DNAs were sequenced with next-generation sequencing. The duckweeds chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Eight complete duckweeds chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela , Landoltia , Lemna , Wolffiella , and Wolffia . This study demonstrates potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also showed the possibility of using chloroplast DNA data to elucidate those phylogenies which were not yet solved well by traditional methods even in plants other than duckweeds.
QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically

PubMed Central

Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel

2015-01-01

Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequence predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effectors evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082

Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions

PubMed Central

Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize

2017-01-01

Background Phylogenetic relationship within different genera of Lemnoideae, a kind of small aquatic monocotyledonous plants, was not well resolved, using either morphological characters or traditional markers. Given that rich genetic information in chloroplast genome makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. Methods DNAs were sequenced with next-generation sequencing. The duckweeds chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Results Eight complete duckweeds chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. Discussion This study demonstrates potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also showed the possibility of using chloroplast DNA data to elucidate those phylogenies which were not yet solved well by traditional methods even in plants other than duckweeds. PMID:29302399
Comparative Genetic Analyses of Human Rhinovirus C (HRV-C) Complete Genome from Malaysia

PubMed Central

Khaw, Yam Sim; Chan, Yoke Fun; Jafar, Faizatul Lela; Othman, Norlijah; Chee, Hui Yee

2016-01-01

Human rhinovirus-C (HRV-C) has been implicated in more severe illnesses than HRV-A and HRV-B, however, the limited number of HRV-C complete genomes (complete 5′ and 3′ non-coding region and open reading frame sequences) has hindered the in-depth genetic study of this virus. This study aimed to sequence seven complete HRV-C genomes from Malaysia and compare their genetic characteristics with the 18 published HRV-Cs. Seven Malaysian HRV-C complete genomes were obtained with newly redesigned primers. The seven genomes were classified as HRV-C6, C12, C22, C23, C26, C42, and pat16 based on the VP4/VP2 and VP1 pairwise distance threshold classification. Five of the seven Malaysian isolates, namely, 3430-MY-10/C22, 8713-MY-10/C23, 8097-MY-11/C26, 1570-MY-10/C42, and 7383-MY-10/pat16 are the first newly sequenced complete HRV-C genomes. All seven Malaysian isolates genomes displayed nucleotide similarity of 63–81% among themselves and 63–96% with other HRV-Cs. Malaysian HRV-Cs had similar putative immunogenic sites, putative receptor utilization and potential antiviral sites as other HRV-Cs. The genomic features of Malaysian isolates were similar to those of other HRV-Cs. Negative selections were frequently detected in HRV-Cs complete coding sequences indicating that these sequences were under functional constraint. The present study showed that HRV-Cs from Malaysia have diverse genetic sequences but share conserved genomic features with other HRV-Cs. This genetic information could provide further aid in the understanding of HRV-C infection. PMID:27199901
Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

PubMed

Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

2014-01-13

Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be able to distinguish miRNAs from other sequences sufficiently discriminative, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification.
Nodeomics: Pathogen Detection in Vertebrate Lymph Nodes Using Meta-Transcriptomics

USGS Publications Warehouse

Wittekindt, Nicola E.; Padhi, Abinash; Schuster, Stephan C.; Qi, Ji; Zhao, Fangqing; Tomsho, Lynn P.; Kasson, Lindsay R.; Packard, Michael; Cross, Paul C.; Poss, Mary

2010-01-01

The ongoing emergence of human infections originating from wildlife highlights the need for better knowledge of the microbial community in wildlife species where traditional diagnostic approaches are limited. Here we evaluate the microbial biota in healthy mule deer (Odocoileus hemionus) by analyses of lymph node meta-transcriptomes. cDNA libraries from five individuals and two pools of samples were prepared from retropharyngeal lymph node RNA enriched for polyadenylated RNA and sequenced using Roche-454 Life Sciences technology. Protein-coding and 16S ribosomal RNA (rRNA) sequences were taxonomically profiled using protein and rRNA specific databases. Representatives of all bacterial phyla were detected in the seven libraries based on protein-coding transcripts indicating that viable microbiota were present in lymph nodes. Residents of skin and rumen, and those ubiquitous in mule deer habitat dominated classifiable bacterial species. Based on detection of both rRNA and protein-coding transcripts, we identified two new proteobacterial species; a Helicobacter closely related to Helicobacter cetorum in the Helicobacter pylori/Helicobacter acinonychis complex and an Acinetobacter related to Acinetobacter schindleri. Among viruses, a novel gamma retrovirus and other members of the Poxviridae and Retroviridae were identified. We additionally evaluated bacterial diversity by amplicon sequencing the hypervariable V6 region of 16S rRNA and demonstrate that overall taxonomic diversity is higher with the meta-transcriptomic approach. These data provide the most complete picture to date of the microbial diversity within a wildlife host. Our research advances the use of meta-transcriptomics to study microbiota in wildlife tissues, which will facilitate detection of novel organisms with pathogenic potential to human and animals.
Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

PubMed

Schafer, Phillip B; Jin, Dezhe Z

2014-03-01

Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
Coding visual features extracted from video sequences.

PubMed

Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

2014-05-01

Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.
Complete mitochondrial genome of Camponotus atrox (Hymenoptera: Formicidae): a new tRNA arrangement in Hymenoptera.

PubMed

Kim, Min Jee; Hong, Eui Jeong; Kim, Iksoo

2016-01-01

We sequenced the complete mitochondrial (mt) genome of Camponotus atrox (Hymenoptera: Formicidae), which is only distributed in Korea. The genome was 16 540 bp in size and contained typical sets of genes (13 protein-coding genes, 22 tRNAs, and 2 rRNAs). The C. atrox A+T-rich region, at 1402 bp, was the longest of all sequenced ant genomes and was composed of an identical tandem repeat consisting of six 100-bp copies and one 96-bp copy. A total of 315 bp of intergenic spacer sequence was spread over 23 regions. An alignment of the spacer sequences in ants was largely feasible among congeneric species, and there was substantial sequence divergence, indicating their potential use as molecular markers for congeneric species. The A/T contents at the first and second codon positions of protein-coding genes (PCGs) were similar for ant species, including C. atrox (73.9% vs. 72.3%, on average). With increased taxon sampling among hymenopteran superfamilies, differences in the divergence rates (i.e., the non-synonymous substitution rates) between the suborders Symphyta and Apocrita were detected, consistent with previous results. The C. atrox mt genome had a unique gene arrangement, trnI-trnM-trnQ, at the A+T-rich region and ND2 junction (underline indicates inverted gene). This may have originated from a tandem duplication of trnM-trnI, resulting in trnM-trnI-trnM-trnI-trnQ, and the subsequent loss of the first trnM and second trnI, resulting in trnI-trnM-trnQ.
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.

PubMed

Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A

2010-02-01

Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer

PubMed Central

Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.

2010-01-01

Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640
Creating reference gene annotation for the mouse C57BL6/J genome assembly.

PubMed

Mudge, Jonathan M; Harrow, Jennifer

2015-10-01

Annotation on the reference genome of the C57BL6/J mouse has been an ongoing project ever since the draft genome was first published. Initially, the principle focus was on the identification of all protein-coding genes, although today the importance of describing long non-coding RNAs, small RNAs, and pseudogenes is recognized. Here, we describe the progress of the GENCODE mouse annotation project, which combines manual annotation from the HAVANA group with Ensembl computational annotation, alongside experimental and in silico validation pipelines from other members of the consortium. We discuss the more recent incorporation of next-generation sequencing datasets into this workflow, including the usage of mass-spectrometry data to potentially identify novel protein-coding genes. Finally, we will outline how the C57BL6/J genebuild can be used to gain insights into the variant sites that distinguish different mouse strains and species.
Long non-coding RNAs and their biological roles in plants.

PubMed

Liu, Xue; Hao, Lili; Li, Dayong; Zhu, Lihuang; Hu, Songnian

2015-06-01

With the development of genomics and bioinformatics, especially the extensive applications of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, the lncRNAs have been considered as important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in few species, our current knowledge of their biological functions is still limited. Here, we have summarized recent studies on their identification, characteristics, classification, bioinformatics, resources, and current exploration of their biological functions in plants. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.
Probability of coding of a DNA sequence: an algorithm to predict translated reading frames from their thermodynamic characteristics.

PubMed Central

Tramontano, A; Macchiato, M F

1986-01-01

An algorithm to determine the probability that a reading frame codifies for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761
Identification of novel craniofacial regulatory domains located far upstream of SOX9 and disrupted in Pierre Robin sequence

PubMed Central

Gordon, Christopher T.; Attanasio, Catia; Bhatia, Shipra; Benko, Sabina; Ansari, Morad; Tan, Tiong Y.; Munnich, Arnold; Pennacchio, Len A.; Abadie, Véronique; Temple, I. Karen; Goldenberg, Alice; van Heyningen, Veronica; Amiel, Jeanne; FitzPatrick, David; Kleinjan, Dirk A.; Visel, Axel; Lyonnet, Stanislas

2015-01-01

Mutations in the coding sequence of SOX9 cause campomelic dysplasia (CD), a disorder of skeletal development associated with 46,XY disorders of sex development (DSDs). Translocations, deletions and duplications within a ~2 Mb region upstream of SOX9 can recapitulate the CD-DSD phenotype fully or partially, suggesting the existence of an unusually large cis-regulatory control region. Pierre Robin sequence (PRS) is a craniofacial disorder that is frequently an endophenotype of CD and a locus for isolated PRS at ~1.2-1.5 Mb upstream of SOX9 has been previously reported. The craniofacial regulatory potential within this locus, and within the greater genomic domain surrounding SOX9, remains poorly defined. We report two novel deletions upstream of SOX9 in families with PRS, allowing refinement of the regions harbouring candidate craniofacial regulatory elements. In parallel, ChIP-Seq for p300 binding sites in mouse craniofacial tissue led to the identification of several novel craniofacial enhancers at the SOX9 locus, which were validated in transgenic reporter mice and zebrafish. Notably, some of the functionally validated elements fall within the PRS deletions. These studies suggest that multiple non-coding elements contribute to the craniofacial regulation of SOX9 expression, and that their disruption results in PRS. PMID:24934569
Molecular control of copper homeostasis in filamentous fungi: increased expression of a metallothionein gene during aging of Podospora anserina.

PubMed

Averbeck, N B; Borghouts, C; Hamann, A; Specke, V; Osiewacz, H D

2001-01-01

The lifespan of the ascomycete Podospora anserina was previously demonstrated to be significantly increased in a copper-uptake mutant, suggesting that copper is a potential stressor involved in degenerative processes. In order to determine whether changes in copper stress occur in the cells during normal aging of cultures, we cloned and characterized a gene coding for a component of the molecular machinery involved in the control of copper homeostasis. This gene, PaMt1, is a single-copy gene that encodes a metallothionein of 26 amino acids. The coding sequence of PaMt1 is interrupted by a single intron. The deduced amino acid sequence shows a high degree of sequence identity to metallothioneins of the filamentous ascomycete Neurospora crassa and the basidiomycete Agaricus bisporus, and to the N-terminal portion of mammalian metallothioneins. Levels of PaMt1 transcript increase in response to elevated amounts of copper in the growth medium and during aging of wild-type cultures. In contrast, in the long-lived mutant grisea, transcript levels first increase but then decrease again. The ability of wild-type cultures to respond to exogenous copper stress via the induction of PaMt1 transcription is not affected as they grow older.
mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

PubMed

Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

2013-08-15

Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
The complete mitochondrial genome sequence of the spider habronattus oregonensis reveals rearranged and extremely truncated tRNAs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Masta, Susan E.; Boore, Jeffrey L.

2004-01-31

We sequenced the entire mitochondrial genome of the jumping spider Habronattus oregonensis of the arachnid order Araneae (Arthropoda: Chelicerata). A number of unusual features distinguish this genome from other chelicerate and arthropod mitochondrial genomes. Most of the transfer RNA gene sequences are greatly reduced in size and cannot be folded into typical cloverleaf-shaped secondary structures. At least nine of the tRNA sequences lack the potential to form TYC arm stem pairings, and instead are inferred to have TV-replacement loops. Furthermore, sequences that could encode the 3' aminoacyl acceptor stems in at least 10 tRNAs appear to be lacking, because fullymore » paired acceptor stems are not possible and because the downstream sequences instead encode adjacent genes. Hence, these appear to be among the smallest known tRNA genes. We postulate that an RNA editing mechanism must exist to restore the 3' aminoacyl acceptor stems in order to allow the tRNAs to function. At least seven tRN As are rearranged with respect to the chelicerate Limulus polyphemus, although the arrangement of the protein-coding genes is identical. Most mitochondrial protein-coding genes of H. oregonensis have ATN as initiation codons, as commonly found in arthropod mtDNAs, but cytochrome oxidase subunit 2 and 3 genes apparently use UUG as an initiation codon. Finally, many of the gene sequences overlap one another and are truncated. This 14,381 bp genome, the first mitochondrial genome of a spider yet sequenced, is one of the smallest arthropod mitochondrial genomes known. We suggest that post transcriptional RNA editing can likely maintain function of the tRNAs while permitting the accumulation of mutations that would otherwise be deleterious. Such mechanisms may have allowed for the minimization of the spider mitochondrial genome.« less
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

PubMed

Cao, Yinhe; Tung, Wen-Wen; Gao, J B

2004-01-01

With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
A candidate gene for choanal atresia in alpaca.

PubMed

Reed, Kent M; Bauer, Miranda M; Mendoza, Kristelle M; Armién, Aníbal G

2010-03-01

Choanal atresia (CA) is a common nasal craniofacial malformation in New World domestic camelids (alpaca and llama). CA results from abnormal development of the nasal passages and is especially debilitating to newborn crias. CA in camelids shares many of the clinical manifestations of a similar condition in humans (CHARGE syndrome). Herein we report on the regulatory gene CHD7 of alpaca, whose homologue in humans is most frequently associated with CHARGE. Sequence of the CHD7 coding region was obtained from a non-affected cria. The complete coding region was 9003 bp, corresponding to a translated amino acid sequence of 3000 aa. Additional genomic sequences corresponding to a significant portion of the CHD7 gene were identified and assembled from the 2x alpaca whole genome sequence, providing confirmatory sequence for much of the CHD7 coding region. The alpaca CHD7 mRNA sequence was 97.9% similar to the human sequence, with the greatest sequence difference being an insertion in exon 38 that results in a polyalanine repeat (A12). Polymorphism in this repeat was tested for association with CA in alpaca by cloning and sequencing the repeat from both affected and non-affected individuals. Variation in length of the poly-A repeat was not associated with CA. Complete sequencing of the CHD7 gene will be necessary to determine whether other mutations in CHD7 are the cause of CA in camelids.
Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences.

PubMed

Pang, Erli; Wu, Xiaomei; Lin, Kui

2016-06-01

Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
Translational resistivity/conductivity of coding sequences during exponential growth of Escherichia coli.

PubMed

Takai, Kazuyuki

2017-01-21

Codon adaptation index (CAI) has been widely used for prediction of expression of recombinant genes in Escherichia coli and other organisms. However, CAI has no mechanistic basis that rationalizes its application to estimation of translational efficiency. Here, I propose a model based on which we could consider how codon usage is related to the level of expression during exponential growth of bacteria. In this model, translation of a gene is considered as an analog of electric current, and an analog of electric resistance corresponding to each gene is considered. "Translational resistance" is dependent on the steady-state concentration and the sequence of the mRNA species, and "translational resistivity" is dependent only on the mRNA sequence. The latter is the sum of two parts: one is the resistivity for the elongation reaction (coding sequence resistivity), and the other comes from all of the other steps of the decoding reaction. This electric circuit model clearly shows that some conditions should be met for codon composition of a coding sequence to correlate well with its expression level. On the other hand, I calculated relative frequency of each of the 61 sense codon triplets translated during exponential growth of E. coli from a proteomic dataset covering over 2600 proteins. A tentative method for estimating relative coding sequence resistivity based on the data is presented. Copyright Â© 2016. Published by Elsevier Ltd.

Origins of Genes: "Big Bang" or Continuous Creation?

NASA Astrophysics Data System (ADS)

Kesse, Paul K.; Gibbs, Adrian

1992-10-01

Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.
Protein structure and the sequential structure of mRNA: alpha-helix and beta-sheet signals at the nucleotide level.

PubMed

Brunak, S; Engelbrecht, J

1996-06-01

A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from the clustering of rare codons, but from the similarity of codons coding for very abundant amino acid residues at the N- and C-termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S-like ribosomal RNA sequences and related to mechanisms for maintaining the correct reading frame by the ribosome.
Evolution and Diversity of the Human Hepatitis D Virus Genome

PubMed Central

Huang, Chi-Ruei; Lo, Szecheng J.

2010-01-01

Human hepatitis delta virus (HDV) is the smallest RNA virus in genome. HDV genome is divided into a viroid-like sequence and a protein-coding sequence which could have originated from different resources and the HDV genome was eventually constituted through RNA recombination. The genome subsequently diversified through accumulation of mutations selected by interactions between the mutated RNA and proteins with host factors to successfully form the infectious virions. Therefore, we propose that the conservation of HDV nucleotide sequence is highly related with its functionality. Genome analysis of known HDV isolates shows that the C-terminal coding sequences of large delta antigen (LDAg) are the highest diversity than other regions of protein-coding sequences but they still retain biological functionality to interact with the heavy chain of clathrin can be selected and maintained. Since viruses interact with many host factors, including escaping the host immune response, how to design a program to predict RNA genome evolution is a great challenging work. PMID:20204073
Draft Genome Sequence of Geobacillus sp. LEMMY01, a Thermophilic Bacterium Isolated from the Site of a Burning Grass Pile.

PubMed

de Souza, Yuri Pinheiro Alves; da Mota, Fábio Faria; Rosado, Alexandre Soares

2017-05-11

We report here the 3,586,065-bp draft genome of Geobacillus sp. LEMMY01, which was isolated (axenic culture) from a thermophilic chemolitoautotrophic consortium obtained from the site of a burning grass pile. The genome contains biosynthetic gene clusters coding for secondary metabolites, such as terpene and lantipeptide, confirming the biotechnological potential of this strain. Copyright © 2017 de Souza et al.
Draft Genome Sequence of Roseovarius sp. A-2, an Iodide-Oxidizing Bacterium Isolated from Natural Gas Brine Water, Chiba, Japan.

PubMed

Yuliana, Tri; Nakajima, Nobuyoshi; Yamamura, Shigeki; Tomita, Masaru; Suzuki, Haruo; Amachi, Seigo

2017-01-01

Roseovarius sp. A-2 is a heterotrophic iodide (I - )-oxidizing bacterium isolated from iodide-rich natural gas brine water in Chiba, Japan. This strain oxidizes iodide to molecular iodine (I 2 ) by means of an extracellular multicopper oxidase. Here we report the draft genome sequence of strain A-2. The draft genome contained 46 tRNA genes, 1 copy of a 16S-23S-5S rRNA operon, and 4,514 protein coding DNA sequences, of which 1,207 (27%) were hypothetical proteins. The genome contained a gene encoding IoxA, a multicopper oxidase previously found to catalyze the oxidation of iodide in Iodidimonas sp. Q-1. This draft genome provides detailed insights into the metabolism and potential application of Roseovarius sp. A-2.
Fast and low-cost structured light pattern sequence projection.

PubMed

Wissmann, Patrick; Forster, Frank; Schmitt, Robert

2011-11-21

We present a high-speed and low-cost approach for structured light pattern sequence projection. Using a fast rotating binary spatial light modulator, our method is potentially capable of projection frequencies in the kHz domain, while enabling pattern rasterization as low as 2 μm pixel size and inherently linear grayscale reproduction quantized at 12 bits/pixel or better. Due to the circular arrangement of the projected fringe patterns, we extend the widely used ray-plane triangulation method to ray-cone triangulation and provide a detailed description of the optical calibration procedure. Using the proposed projection concept in conjunction with the recently published coded phase shift (CPS) pattern sequence, we demonstrate high accuracy 3-D measurement at 200 Hz projection frequency and 20 Hz 3-D reconstruction rate. © 2011 Optical Society of America
Random digital encryption secure communication system

NASA Technical Reports Server (NTRS)

Doland, G. D. (Inventor)

1982-01-01

The design of a secure communication system is described. A product code, formed from two pseudorandom sequences of digital bits, is used to encipher or scramble data prior to transmission. The two pseudorandom sequences are periodically changed at intervals before they have had time to repeat. One of the two sequences is transmitted continuously with the scrambled data for synchronization. In the receiver portion of the system, the incoming signal is compared with one of two locally generated pseudorandom sequences until correspondence between the sequences is obtained. At this time, the two locally generated sequences are formed into a product code which deciphers the data from the incoming signal. Provision is made to ensure synchronization of the transmitting and receiving portions of the system.
Expressed gene sequence of the IFN-gamma-response chemokine CXCL9 of cattle, horses, and swine

USDA-ARS?s Scientific Manuscript database

This report describes the cloning and characterization of expressed gene sequences of bovine, equine, and swine CXCL9 from RNA obtained from peripheral blood mononuclear cell (PBMC) or other tissues. The bovine coding region was 378 nucleotides in length, while the equine and swine coding regions w...
Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

PubMed

Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu

2012-01-01

Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantization of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST-and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also analyzed to characterize putative polypeptide translational products and associate them with specific genes and protein functions.
ISSYS: An integrated synergistic Synthesis System

NASA Technical Reports Server (NTRS)

Dovi, A. R.

1980-01-01

Integrated Synergistic Synthesis System (ISSYS), an integrated system of computer codes in which the sequence of program execution and data flow is controlled by the user, is discussed. The commands available to exert such control, the ISSYS major function and rules, and the computer codes currently available in the system are described. Computational sequences frequently used in the aircraft structural analysis and synthesis are defined. External computer codes utilized by the ISSYS system are documented. A bibliography on the programs is included.
Transcriptome Analysis of Scorpion Species Belonging to the Vaejovis Genus

PubMed Central

Quintero-Hernández, Verónica; Ramírez-Carreto, Santos; Romero-Gutiérrez, María Teresa; Valdez-Velázquez, Laura L.; Becerril, Baltazar; Possani, Lourival D.; Ortiz, Ernesto

2015-01-01

Scorpions belonging to the Buthidae family have traditionally drawn much of the biochemist’s attention due to the strong toxicity of their venoms. Scorpions not toxic to mammals, however, also have complex venoms. They have been shown to be an important source of bioactive peptides, some of them identified as potential drug candidates for the treatment of several emerging diseases and conditions. It is therefore important to characterize the large diversity of components found in the non-Buthidae venoms. As a contribution to this goal, this manuscript reports the construction and characterization of cDNA libraries from four scorpion species belonging to the Vaejovis genus of the Vaejovidae family: Vaejovis mexicanus, V. intrepidus, V. subcristatus and V. punctatus. Some sequences coding for channel-acting toxins were found, as expected, but the main transcribed genes in the glands actively producing venom were those coding for non disulfide-bridged peptides. The ESTs coding for putative channel-acting toxins, corresponded to sodium channel β toxins, to members of the potassium channel-acting α or κ families, and to calcium channel-acting toxins of the calcin family. Transcripts for scorpine-like peptides of two different lengths were found, with some of the species coding for the two kinds. One sequence coding for La1-like peptides, of yet unknown function, was found for each species. Finally, the most abundant transcripts corresponded to peptides belonging to the long chain multifunctional NDBP-2 family and to the short antimicrobials of the NDBP-4 family. This apparent venom composition is in correspondence with the data obtained to date for other non-Buthidae species. Our study constitutes the first approach to the characterization of the venom gland transcriptome for scorpion species belonging to the Vaejovidae family. PMID:25659089
Transcriptome analysis of scorpion species belonging to the Vaejovis genus.

PubMed

Quintero-Hernández, Verónica; Ramírez-Carreto, Santos; Romero-Gutiérrez, María Teresa; Valdez-Velázquez, Laura L; Becerril, Baltazar; Possani, Lourival D; Ortiz, Ernesto

2015-01-01

Scorpions belonging to the Buthidae family have traditionally drawn much of the biochemist's attention due to the strong toxicity of their venoms. Scorpions not toxic to mammals, however, also have complex venoms. They have been shown to be an important source of bioactive peptides, some of them identified as potential drug candidates for the treatment of several emerging diseases and conditions. It is therefore important to characterize the large diversity of components found in the non-Buthidae venoms. As a contribution to this goal, this manuscript reports the construction and characterization of cDNA libraries from four scorpion species belonging to the Vaejovis genus of the Vaejovidae family: Vaejovis mexicanus, V. intrepidus, V. subcristatus and V. punctatus. Some sequences coding for channel-acting toxins were found, as expected, but the main transcribed genes in the glands actively producing venom were those coding for non disulfide-bridged peptides. The ESTs coding for putative channel-acting toxins, corresponded to sodium channel β toxins, to members of the potassium channel-acting α or κ families, and to calcium channel-acting toxins of the calcin family. Transcripts for scorpine-like peptides of two different lengths were found, with some of the species coding for the two kinds. One sequence coding for La1-like peptides, of yet unknown function, was found for each species. Finally, the most abundant transcripts corresponded to peptides belonging to the long chain multifunctional NDBP-2 family and to the short antimicrobials of the NDBP-4 family. This apparent venom composition is in correspondence with the data obtained to date for other non-Buthidae species. Our study constitutes the first approach to the characterization of the venom gland transcriptome for scorpion species belonging to the Vaejovidae family.
Connection anonymity analysis in coded-WDM PONs

NASA Astrophysics Data System (ADS)

Sue, Chuan-Ching

2008-04-01

A coded wavelength division multiplexing passive optical network (WDM PON) is presented for fiber to the home (FTTH) systems to protect against eavesdropping. The proposed scheme applies spectral amplitude coding (SAC) with a unipolar maximal-length sequence (M-sequence) code matrix to generate a specific signature address (coding) and to retrieve its matching address codeword (decoding) by exploiting the cyclic properties inherent in array waveguide grating (AWG) routers. In addition to ensuring the confidentiality of user data, the proposed coded-WDM scheme is also a suitable candidate for the physical layer with connection anonymity. Under the assumption that the eavesdropper applies a photo-detection strategy, it is shown that the coded WDM PON outperforms the conventional TDM PON and WDM PON schemes in terms of a higher degree of connection anonymity. Additionally, the proposed scheme allows the system operator to partition the optical network units (ONUs) into appropriate groups so as to achieve a better degree of anonymity.
Identification of a novel 15.5 kb SHOX deletion associated with marked intrafamilial phenotypic variability and analysis of its molecular origin.

PubMed

Alexandrou, Angelos; Papaevripidou, Ioannis; Tsangaras, Kyriakos; Alexandrou, Ioanna; Tryfonidis, Marios; Christophidou-Anastasiadou, Violetta; Zamba-Papanicolaou, Eleni; Koumbaris, George; Neocleous, Vassos; Phylactou, Leonidas A; Skordis, Nicos; Tanteles, George A; Sismani, Carolina

2016-12-01

Haploinsufficiency of the short stature homeobox contaning SHOX gene has been shown to result in a spectrum of phenotypes ranging from Leri-Weill dyschondrosteosis (LWD) at the more severe end to SHOX-related short stature at the milder end of the spectrum. Most alterations are whole gene deletions, point mutations within the coding region, or microdeletions in its flanking sequences. Here, we present the clinical and molecular data as well as the potential molecular mechanism underlying a novel microdeletion, causing a variable SHOX-related haploinsufficiency disorder in a three-generation family. The phenotype resembles that of LWD in females, in males, however, the phenotypic expression is milder. The 15523-bp SHOX intragenic deletion, encompassing exons 3-6, was initially detected by array-CGH, followed by MLPA analysis. Sequencing of the breakpoints indicated an Alu recombination-mediated deletion (ARMD) as the potential causative mechanism.
Novel coding, translation, and gene expression of a replicating covalently closed circular RNA of 220 nt

PubMed Central

AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer

2014-01-01

The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap UGAUGA (underline highlights the initiation codon AUG within the combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact “nanogenome.” PMID:25253891
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3; The Map and Related Decoding Algirithms

NASA Technical Reports Server (NTRS)

Lin, Shu; Fossorier, Marc

1998-01-01

In a coded communication system with equiprobable signaling, MLD minimizes the word error probability and delivers the most likely codeword associated with the corresponding received sequence. This decoding has two drawbacks. First, minimization of the word error probability is not equivalent to minimization of the bit error probability. Therefore, MLD becomes suboptimum with respect to the bit error probability. Second, MLD delivers a hard-decision estimate of the received sequence, so that information is lost between the input and output of the ML decoder. This information is important in coded schemes where the decoded sequence is further processed, such as concatenated coding schemes, multi-stage and iterative decoding schemes. In this chapter, we first present a decoding algorithm which both minimizes bit error probability, and provides the corresponding soft information at the output of the decoder. This algorithm is referred to as the MAP (maximum aposteriori probability) decoding algorithm.
Automatic vehicle location system

NASA Technical Reports Server (NTRS)

Hansen, G. R., Jr. (Inventor)

1973-01-01

An automatic vehicle detection system is disclosed, in which each vehicle whose location is to be detected carries active means which interact with passive elements at each location to be identified. The passive elements comprise a plurality of passive loops arranged in a sequence along the travel direction. Each of the loops is tuned to a chosen frequency so that the sequence of the frequencies defines the location code. As the vehicle traverses the sequence of the loops as it passes over each loop, signals only at the frequency of the loop being passed over are coupled from a vehicle transmitter to a vehicle receiver. The frequencies of the received signals in the receiver produce outputs which together represent a code of the traversed location. The code location is defined by a painted pattern which reflects light to a vehicle carried detector whose output is used to derive the code defined by the pattern.
Digital data for quick response (QR) codes of alkalophilic Bacillus pumilus to identify and to compare bacilli isolated from Lonar Crator Lake, India.

PubMed

Rekadwad, Bhagwan N; Khobragade, Chandrahasya N

2016-06-01

Microbiologists are routinely engaged isolation, identification and comparison of isolated bacteria for their novelty. 16S rRNA sequences of Bacillus pumilus were retrieved from NCBI repository and generated QR codes for sequences (FASTA format and full Gene Bank information). 16SrRNA were used to generate quick response (QR) codes of Bacillus pumilus isolated from Lonar Crator Lake (19° 58' N; 76° 31' E), India. Bacillus pumilus 16S rRNA gene sequences were used to generate CGR, FCGR and PCA. These can be used for visual comparison and evaluation respectively. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/. This generated digital data helps to evaluate and compare any Bacillus pumilus strain, minimizes laboratory efforts and avoid misinterpretation of the species.
gCUP: rapid GPU-based HIV-1 co-receptor usage prediction for next-generation sequencing.

PubMed

Olejnik, Michael; Steuwer, Michel; Gorlatch, Sergei; Heider, Dominik

2014-11-15

Next-generation sequencing (NGS) has a large potential in HIV diagnostics, and genotypic prediction models have been developed and successfully tested in the recent years. However, albeit being highly accurate, these computational models lack computational efficiency to reach their full potential. In this study, we demonstrate the use of graphics processing units (GPUs) in combination with a computational prediction model for HIV tropism. Our new model named gCUP, parallelized and optimized for GPU, is highly accurate and can classify >175 000 sequences per second on an NVIDIA GeForce GTX 460. The computational efficiency of our new model is the next step to enable NGS technologies to reach clinical significance in HIV diagnostics. Moreover, our approach is not limited to HIV tropism prediction, but can also be easily adapted to other settings, e.g. drug resistance prediction. The source code can be downloaded at http://www.heiderlab.de d.heider@wz-straubing.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Sequence differences in the diagnostic region of the cysteine protease 8 gene of Tritrichomonas foetus parasites of cats and cattle.

PubMed

Sun, Zichen; Stack, Colin; Šlapeta, Jan

2012-05-25

In order to investigate the genetic variation between Tritrichomonas foetus from bovine and feline origins, cysteine protease 8 (CP8) coding sequence was selected as the polymorphic DNA marker. Direct sequencing of CP8 coding sequence of T. foetus from four feline isolates and two bovine isolates with polymerase chain reaction successfully revealed conserved nucleotide polymorphisms between feline and bovine isolates. These results provide useful information for CP8-based molecular differentiation of T. foetus genotypes. Copyright © 2011 Elsevier B.V. All rights reserved.

EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

PubMed

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-07-01

EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.
Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

PubMed

Lathe, R

1985-05-05

Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

NASA Astrophysics Data System (ADS)

Liebovitch, Larry

1998-03-01

The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking code, where some digits are functions of other digits to maintain the fidelity of transmitted informaiton. Does DNA also utitlize a DIGITAL error chekcing code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols. It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence. The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. We did not find evidence for such error correcting codes in these genes. However, we analyzed only a small amount of DNA and if digitial error correcting schemes are present in DNA, they may be more subtle than such simple linear block codes. The basic issue we raise here, is how information is stored in DNA and an appreciation that digital symbol sequences, such as DNA, admit of interesting schemes to store and protect the fidelity of their information content. Liebovitch, Tao, Todorov, Levine. 1996. Biophys. J. 71:1539-1544. Supported by NIH grant EY6234.
A Bioinformatics-Based Alternative mRNA Splicing Code that May Explain Some Disease Mutations Is Conserved in Animals.

PubMed

Qu, Wen; Cingolani, Pablo; Zeeberg, Barry R; Ruden, Douglas M

2017-01-01

Deep sequencing of cDNAs made from spliced mRNAs indicates that most coding genes in many animals and plants have pre-mRNA transcripts that are alternatively spliced. In pre-mRNAs, in addition to invariant exons that are present in almost all mature mRNA products, there are at least 6 additional types of exons, such as exons from alternative promoters or with alternative polyA sites, mutually exclusive exons, skipped exons, or exons with alternative 5' or 3' splice sites. Our bioinformatics-based hypothesis is that, in analogy to the genetic code, there is an "alternative-splicing code" in introns and flanking exon sequences, analogous to the genetic code, that directs alternative splicing of many of the 36 types of introns. In humans, we identified 42 different consensus sequences that are each present in at least 100 human introns. 37 of the 42 top consensus sequences are significantly enriched or depleted in at least one of the 36 types of introns. We further supported our hypothesis by showing that 96 out of 96 analyzed human disease mutations that affect RNA splicing, and change alternative splicing from one class to another, can be partially explained by a mutation altering a consensus sequence from one type of intron to that of another type of intron. Some of the alternative splicing consensus sequences, and presumably their small-RNA or protein targets, are evolutionarily conserved from 50 plant to animal species. We also noticed the set of introns within a gene usually share the same splicing codes, thus arguing that one sub-type of splicesosome might process all (or most) of the introns in a given gene. Our work sheds new light on a possible mechanism for generating the tremendous diversity in protein structure by alternative splicing of pre-mRNAs.
Evolutionary Dynamics of Microsatellite Distribution in Plants: Insight from the Comparison of Sequenced Brassica, Arabidopsis and Other Angiosperm Species

PubMed Central

Shi, Jiaqin; Huang, Shunmou; Fu, Donghui; Yu, Jinyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

2013-01-01

Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than those for their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable elements contents. The pattern of microsatellite distribution may differ according to the different genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally accordant with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type) the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but have little effect on the microsatellite distribution with respect to motif length, type and repeat number. Interestingly, several microsatellite characteristics seemed to be constant in plant evolution, which can be well explained by the general biological rules. PMID:23555856
Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

DOE PAGES

Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

2015-05-12

Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with anymore » transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.« less
Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.

Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with anymore » transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.« less
PIPI: PTM-Invariant Peptide Identification Using Coding Method.

PubMed

Yu, Fengchao; Li, Ning; Yu, Weichuan

2016-12-02

In computational proteomics, the identification of peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost associated with database search increases exponentially with respect to the number of modified amino acids and linearly with respect to the number of potential PTM types at each amino acid. The problem becomes intractable very quickly if we want to enumerate all possible PTM patterns. To address this issue, one group of methods named restricted tools (including Mascot, Comet, and MS-GF+) only allow a small number of PTM types in database search process. Alternatively, the other group of methods named unrestricted tools (including MS-Alignment, ProteinProspector, and MODa) avoids enumerating PTM patterns with an alignment-based approach to localizing and characterizing modified amino acids. However, because of the large search space and PTM localization issue, the sensitivity of these unrestricted tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI belongs to the category of unrestricted tools. It first codes peptide sequences into Boolean vectors and codes experimental spectra into real-valued vectors. For each coded spectrum, it then searches the coded sequence database to find the top scored peptide sequences as candidates. After that, PIPI uses dynamic programming to localize and characterize modified amino acids in each candidate. We used simulation experiments and real data experiments to evaluate the performance in comparison with restricted tools (i.e., Mascot, Comet, and MS-GF+) and unrestricted tools (i.e., Mascot with error tolerant search, MS-Alignment, ProteinProspector, and MODa). Comparison with restricted tools shows that PIPI has a close sensitivity and running speed. Comparison with unrestricted tools shows that PIPI has the highest sensitivity except for Mascot with error tolerant search and ProteinProspector. These two tools simplify the task by only considering up to one modified amino acid in each peptide, which results in a higher sensitivity but has difficulty in dealing with multiple modified amino acids. The simulation experiments also show that PIPI has the lowest false discovery proportion, the highest PTM characterization accuracy, and the shortest running time among the unrestricted tools.
Comparative and Evolutionary Analyses of Meloidogyne spp. Based on Mitochondrial Genome Sequences

PubMed Central

García, Laura Evangelina; Sánchez-Puerta, M. Virginia

2015-01-01

Molecular taxonomy and evolution of nematodes have been recently the focus of several studies. Mitochondrial sequences were proposed as an alternative for precise identification of Meloidogyne species, to study intraspecific variability and to follow maternal lineages. We characterized the mitochondrial genomes (mtDNAs) of the root knot nematodes M. floridensis, M. hapla and M. incognita. These were AT rich (81–83%) and highly compact, encoding 12 proteins, 2 rRNAs, and 22 tRNAs. Comparisons with published mtDNAs of M. chitwoodi, M. incognita (another strain) and M. graminicola revealed that they share protein and rRNA gene order but differ in the order of tRNAs. The mtDNAs of M. floridensis and M. incognita were strikingly similar (97–100% identity for all coding regions). In contrast, M. floridensis, M. chitwoodi, M. hapla and M. graminicola showed 65–84% nucleotide identity for coding regions. Variable mitochondrial sequences are potentially useful for evolutionary and taxonomic studies. We developed a molecular taxonomic marker by sequencing a highly-variable ~2 kb mitochondrial region, nad5-cox1, from 36 populations of root-knot nematodes to elucidate relationships within the genus Meloidogyne. Isolates of five species formed monophyletic groups and showed little intraspecific variability. We also present a thorough analysis of the mitochondrial region cox2-rrnS. Phylogenies based on either mitochondrial region had good discrimination power but could not discriminate between M. arenaria, M. incognita and M. floridensis. PMID:25799071
Molecular cloning and characterization of a novel RING zinc-finger protein gene up-regulated under in vitro salt stress in cassava.

PubMed

dos Reis, Sávio Pinho; Tavares, Liliane de Souza Conceição; Costa, Carinne de Nazaré Monteiro; Brígida, Aílton Borges Santa; de Souza, Cláudia Regina Batista

2012-06-01

Cassava (Manihot esculenta Crantz) is one of the world's most important food crops. It is cultivated mainly in developing countries of tropics, since its root is a major source of calories for low-income people due to its high productivity and resistance to many abiotic and biotic factors. A previous study has identified a partial cDNA sequence coding for a putative RING zinc finger in cassava storage root. The RING zinc finger protein is a specialized type of zinc finger protein found in many organisms. Here, we isolated the full-length cDNA sequence coding for M. esculenta RZF (MeRZF) protein by a combination of 5' and 3' RACE assays. BLAST analysis showed that its deduced amino acid sequence has a high level of similarity to plant proteins of RZF family. MeRZF protein contains a signature sequence motif for a RING zinc finger at its C-terminal region. In addition, this protein showed a histidine residue at the fifth coordination site, likely belonging to the RING-H2 subgroup, as confirmed by our phylogenetic analysis. There is also a transmembrane domain in its N-terminal region. Finally, semi-quantitative RT-PCR assays showed that MeRZF expression is increased in detached leaves treated with sodium chloride. Here, we report the first evidence of a RING zinc finger gene of cassava showing potential role in response to salt stress.
A comparative analysis of MC4R gene sequence, polymorphism, and chromosomal localization in Chinese raccoon dog and Arctic fox.

PubMed

Skorczyk, Anna; Flisikowski, Krzysztof; Switonski, Marek

2012-05-01

Numerous mutations of the human melanocortin receptor type 4 (MC4R) gene are responsible for monogenic obesity, and some of them appear to be associated with predisposition or resistance to polygenic obesity. Thus, this gene is considered a functional candidate for fat tissue accumulation and body weight in domestic mammals. The aim of the study was comparative analysis of chromosome localization, nucleotide sequence, and polymorphism of the MC4R gene in two farmed species of the Canidae family, namely the Chinese raccoon dog (Nycterutes procyonoides procyonoides) and the arctic fox (Alopex lagopus). The whole coding sequence, including fragments of 3'UTR and 5'UTR, shows 89% similarity between the arctic fox (1276 bp) and Chinese raccoon dog (1213 bp). Altogether, 30 farmed Chinese raccoon dogs and 30 farmed arctic foxes were searched for polymorphisms. In the Chinese raccoon dog, only one silent substitution in the coding sequence was identified; whereas in the arctic fox, four InDels and two single-nucleotide polymorphisms (SNPs) in the 5'UTR and six silent SNPs in the exon were found. The studied gene was mapped by FISH to the Chinese raccoon dog chromosome 9 (NPP9q1.2) and arctic fox chromosome 24 (ALA24q1.2-1.3). The obtained results are discussed in terms of genome evolution of species belonging to the family Canidae and their potential use in animal breeding.
Whole genome sequencing and integrative genomic analysis approach on two 22q11.2 deletion syndrome family trios for genotype to phenotype correlations

PubMed Central

Chung, Jonathan H.; Cai, Jinlu; Suskin, Barrie G.; Zhang, Zhengdong; Coleman, Karlene

2015-01-01

The 22q11.2 deletion syndrome (22q11DS) affects 1:4000 live births and presents with highly variable phenotype expressivity. In this study, we developed an analytical approach utilizing whole genome sequencing and integrative analysis to discover genetic modifiers. Our pipeline combined available tools in order to prioritize rare, predicted deleterious, coding and non-coding single nucleotide variants (SNVs) and insertion/deletions (INDELs) from whole genome sequencing (WGS). We sequenced two unrelated probands with 22q11DS, with contrasting clinical findings, and their unaffected parents. Proband P1 had cognitive impairment, psychotic episodes, anxiety, and tetralogy of Fallot (TOF); while proband P2 had juvenile rheumatoid arthritis but no other major clinical findings. In P1, we identified common variants in COMT and PRODH on 22q11.2 as well as rare potentially deleterious DNA variants in other behavioral/neurocognitive genes. We also identified a de novo SNV in ADNP2 (NM_014913.3:c.2243G>C), encoding a neuroprotective protein that may be involved in behavioral disorders. In P2, we identified a novel non-synonymous SNV in ZFPM2 (NM_012082.3:c.1576C>T), a known causative gene for TOF, which may act as a protective variant downstream of TBX1, haploinsufficiency of which is responsible for congenital heart disease in individuals with 22q11DS. PMID:25981510
NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

PubMed

Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

2016-11-01

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
RNAcentral: A comprehensive database of non-coding RNA sequences

DOE PAGES

Williams, Kelly Porter; Lau, Britney Yan

2016-10-28

RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating database to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similaritymore » searches as well as genome browsing functionality.« less
RNAcentral: A comprehensive database of non-coding RNA sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Kelly Porter; Lau, Britney Yan

RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating database to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similaritymore » searches as well as genome browsing functionality.« less
A specific indel marker for the Philippines Schistosoma japonicum revealed by analysis of mitochondrial genome sequences.

PubMed

Li, Juan; Chen, Fen; Sugiyama, Hiromu; Blair, David; Lin, Rui-Qing; Zhu, Xing-Quan

2015-07-01

In the present study, near-complete mitochondrial (mt) genome sequences for Schistosoma japonicum from different regions in the Philippines and Japan were amplified and sequenced. Comparisons among S. japonicum from the Philippines, Japan, and China revealed a geographically based length difference in mt genomes, but the mt genomic organization and gene arrangement were the same. Sequence differences among samples from the Philippines and all samples from the three endemic areas were 0.57-2.12 and 0.76-3.85 %, respectively. The most variable part of the mt genome was the non-coding region. In the coding portion of the genome, protein-coding genes varied more than rRNA genes and tRNAs. The near-complete mt genome sequences for Philippine specimens were identical in length (14,091 bp) which was 4 bp longer than those of S. japonicum samples from Japan and China. This indel provides a unique genetic marker for S. japonicum samples from the Philippines. Phylogenetic analyses based on the concatenated amino acids of 12 protein-coding genes showed that samples of S. japonicum clustered according to their geographical origins. The identified mitochondrial indel marker will be useful for tracing the source of S. japonicum infection in humans and animals in Southeast Asia.
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

PubMed Central

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-01-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
Exome sequencing and arrayCGH detection of gene sequence and copy number variation between ILS and ISS mouse strains.

PubMed

Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M

2014-06-01

It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to contribute to the alcohol-related phenotypic differences associated with these strains.
Identification of Putative Nuclear Receptors and Steroidogenic Enzymes in Murray-Darling Rainbowfish (Melanotaenia fluviatilis) Using RNA-Seq and De Novo Transcriptome Assembly.

PubMed

Bain, Peter A; Papanicolaou, Alexie; Kumar, Anupama

2015-01-01

Murray-Darling rainbowfish (Melanotaenia fluviatilis [Castelnau, 1878]; Atheriniformes: Melanotaeniidae) is a small-bodied teleost currently under development in Australasia as a test species for aquatic toxicological studies. To date, efforts towards the development of molecular biomarkers of contaminant exposure have been hindered by the lack of available sequence data. To address this, we sequenced messenger RNA from brain, liver and gonads of mature male and female fish and generated a high-quality draft transcriptome using a de novo assembly approach. 149,742 clusters of putative transcripts were obtained, encompassing 43,841 non-redundant protein-coding regions. Deduced amino acid sequences were annotated by functional inference based on similarity with sequences from manually curated protein sequence databases. The draft assembly contained protein-coding regions homologous to 95.7% of the complete cohort of predicted proteins from the taxonomically related species, Oryzias latipes (Japanese medaka). The mean length of rainbowfish protein-coding sequences relative to their medaka homologues was 92.1%, indicating that despite the limited number of tissues sampled a large proportion of the total expected number of protein-coding genes was captured in the study. Because of our interest in the effects of environmental contaminants on endocrine pathways, we manually curated subsets of coding regions for putative nuclear receptors and steroidogenic enzymes in the rainbowfish transcriptome, revealing 61 candidate nuclear receptors encompassing all known subfamilies, and 41 putative steroidogenic enzymes representing all major steroidogenic enzymes occurring in teleosts. The transcriptome presented here will be a valuable resource for researchers interested in biomarker development, protein structure and function, and contaminant-response genomics in Murray-Darling rainbowfish.
Applications of statistical physics and information theory to the analysis of DNA sequences

NASA Astrophysics Data System (ADS)

Grosse, Ivo

2000-10-01

DNA carries the genetic information of most living organisms, and the of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question if there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search for such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.

Detecting authorized and unauthorized genetically modified organisms containing vip3A by real-time PCR and next-generation sequencing.

PubMed

Liang, Chanjuan; van Dijk, Jeroen P; Scholtens, Ingrid M J; Staats, Martijn; Prins, Theo W; Voorhuijzen, Marleen M; da Silva, Andrea M; Arisi, Ana Carolina Maisonnave; den Dunnen, Johan T; Kok, Esther J

2014-04-01

The growing number of biotech crops with novel genetic elements increasingly complicates the detection of genetically modified organisms (GMOs) in food and feed samples using conventional screening methods. Unauthorized GMOs (UGMOs) in food and feed are currently identified through combining GMO element screening with sequencing the DNA flanking these elements. In this study, a specific and sensitive qPCR assay was developed for vip3A element detection based on the vip3Aa20 coding sequences of the recently marketed MIR162 maize and COT102 cotton. Furthermore, SiteFinding-PCR in combination with Sanger, Illumina or Pacific BioSciences (PacBio) sequencing was performed targeting the flanking DNA of the vip3Aa20 element in MIR162. De novo assembly and Basic Local Alignment Search Tool searches were used to mimic UGMO identification. PacBio data resulted in relatively long contigs in the upstream (1,326 nucleotides (nt); 95 % identity) and downstream (1,135 nt; 92 % identity) regions, whereas Illumina data resulted in two smaller contigs of 858 and 1,038 nt with higher sequence identity (>99 % identity). Both approaches outperformed Sanger sequencing, underlining the potential for next-generation sequencing in UGMO identification.
Shared prefetching to reduce execution skew in multi-threaded systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eichenberger, Alexandre E; Gunnels, John A

Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated basedmore » on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.« less
Chromosomal localization and partial genomic structure of the human peroxisome proliferator activated receptor-gamma (hPPAR gamma) gene.

PubMed

Beamer, B A; Negri, C; Yen, C J; Gavrilova, O; Rumberger, J M; Durcan, M J; Yarnall, D P; Hawkins, A L; Griffin, C A; Burns, D K; Roth, J; Reitman, M; Shuldiner, A R

1997-04-28

We determined the chromosomal localization and partial genomic structure of the coding region of the human PPAR gamma gene (hPPAR gamma), a nuclear receptor important for adipocyte differentiation and function. Sequence analysis and long PCR of human genomic DNA with primers that span putative introns revealed that intron positions and sizes of hPPAR gamma are similar to those previously determined for the mouse PPAR gamma gene[13]. Fluorescent in situ hybridization localized hPPAR gamma to chromosome 3, band 3p25. Radiation hybrid mapping with two independent primer pairs was consistent with hPPAR gamma being within 1.5 Mb of marker D3S1263 on 3p25-p24.2. These sequences of the intron/exon junctions of the 6 coding exons shared by hPPAR gamma 1 and hPPAR gamma 2 will facilitate screening for possible mutations. Furthermore, D3S1263 is a suitable polymorphic marker for linkage analysis to evaluate PPAR gamma's potential contribution to genetic susceptibility to obesity, lipoatrophy, insulin resistance, and diabetes.
Characterization of an Equine α-S2-Casein Variant Due to a 1.3 kb Deletion Spanning Two Coding Exons

PubMed Central

Brinkmann, Julia; Koudelka, Tomas; Keppler, Julia K.; Tholey, Andreas; Schwarz, Karin; Thaller, Georg; Tetens, Jens

2015-01-01

The production and consumption of mare’s milk in Europe has gained importance, mainly based on positive health effects and a lower allergenic potential as compared to cows’ milk. The allergenicity of milk is to a certain extent affected by different genetic variants. In classical dairy species, much research has been conducted into the genetic variability of milk proteins, but the knowledge in horses is scarce. Here, we characterize two major forms of equine αS2-casein arising from genomic 1.3 kb in-frame deletion involving two coding exons, one of which represents an equid specific duplication. Findings at the DNA-level have been verified by cDNA sequencing from horse milk of mares with different genotypes. At the protein-level, we were able to show by SDS-page and in-gel digestion with subsequent LC-MS analysis that both proteins are actually expressed. The comparison with published sequences of other equids revealed that the deletion has probably occurred before the ancestor of present-day asses and zebras diverged from the horse lineage. PMID:26444874
The evolution of transcriptional regulation in eukaryotes

NASA Technical Reports Server (NTRS)

Wray, Gregory A.; Hahn, Matthew W.; Abouheif, Ehab; Balhoff, James P.; Pizer, Margaret; Rockman, Matthew V.; Romano, Laura A.

2003-01-01

Gene expression is central to the genotype-phenotype relationship in all organisms, and it is an important component of the genetic basis for evolutionary change in diverse aspects of phenotype. However, the evolution of transcriptional regulation remains understudied and poorly understood. Here we review the evolutionary dynamics of promoter, or cis-regulatory, sequences and the evolutionary mechanisms that shape them. Existing evidence indicates that populations harbor extensive genetic variation in promoter sequences, that a substantial fraction of this variation has consequences for both biochemical and organismal phenotype, and that some of this functional variation is sorted by selection. As with protein-coding sequences, rates and patterns of promoter sequence evolution differ considerably among loci and among clades for reasons that are not well understood. Studying the evolution of transcriptional regulation poses empirical and conceptual challenges beyond those typically encountered in analyses of coding sequence evolution: promoter organization is much less regular than that of coding sequences, and sequences required for the transcription of each locus reside at multiple other loci in the genome. Because of the strong context-dependence of transcriptional regulation, sequence inspection alone provides limited information about promoter function. Understanding the functional consequences of sequence differences among promoters generally requires biochemical and in vivo functional assays. Despite these challenges, important insights have already been gained into the evolution of transcriptional regulation, and the pace of discovery is accelerating.
Convolutional encoding of self-dual codes

NASA Technical Reports Server (NTRS)

Solomon, G.

1994-01-01

There exist almost complete convolutional encodings of self-dual codes, i.e., block codes of rate 1/2 with weights w, w = 0 mod 4. The codes are of length 8m with the convolutional portion of length 8m-2 and the nonsystematic information of length 4m-1. The last two bits are parity checks on the two (4m-1) length parity sequences. The final information bit complements one of the extended parity sequences of length 4m. Solomon and van Tilborg have developed algorithms to generate these for the Quadratic Residue (QR) Codes of lengths 48 and beyond. For these codes and reasonable constraint lengths, there are sequential decodings for both hard and soft decisions. There are also possible Viterbi-type decodings that may be simple, as in a convolutional encoding/decoding of the extended Golay Code. In addition, the previously found constraint length K = 9 for the QR (48, 24;12) Code is lowered here to K = 8.
Simple and efficient identification of rare recessive pathologically important sequence variants from next generation exome sequence data.

PubMed

Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T

2013-07-01

Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. © 2013 WILEY PERIODICALS, INC.
Gene and translation initiation site prediction in metagenomic sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John

2012-01-01

Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translationmore » initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.« less
Improved Contrast-Enhanced Ultrasound Imaging With Multiplane-Wave Imaging.

PubMed

Gong, Ping; Song, Pengfei; Chen, Shigao

2018-02-01

Contrast-enhanced ultrasound (CEUS) imaging has great potential for use in new ultrasound clinical applications such as myocardial perfusion imaging and abdominal lesion characterization. In CEUS imaging, contrast agents (i.e., microbubbles) are used to improve contrast between blood and tissue because of their high nonlinearity under low ultrasound pressure. However, the quality of CEUS imaging sometimes suffers from a low signal-to-noise ratio (SNR) in deeper imaging regions when a low mechanical index (MI) is used to avoid microbubble disruption, especially for imaging at off-resonance transmit frequencies. In this paper, we propose a new strategy of combining CEUS sequences with the recently proposed multiplane-wave (MW) compounding method to improve the SNR of CEUS in deeper imaging regions without increasing MI or sacrificing frame rate. The MW-CEUS method emits multiple Hadamard-coded CEUS pulses in each transmission event (i.e., pulse-echo event). The received echo signals first undergo fundamental bandpass filtering (i.e., the filter is centered on the transmit frequency) to eliminate the microbubble's second-harmonic signals because they cannot be encoded by pulse inversion. The filtered signals are then Hadamard decoded and realigned in fast time to recover the signals as they would have been obtained using classic CEUS pulses, followed by designed recombination to cancel the linear tissue responses. The MW-CEUS method significantly improved contrast-to-tissue ratio and SNR of CEUS imaging by transmitting longer coded pulses. The image resolution was also preserved. The microbubble disruption ratio and motion artifacts in MW-CEUS were similar to those of classic CEUS imaging. In addition, the MW-CEUS sequence can be adapted to other transmission coding formats. These properties of MW-CEUS can potentially facilitate CEUS imaging for many clinical applications, especially assessing deep abdominal organs or the heart.
Isolation of an intertypic poliovirus capsid recombinant from a child with vaccine-associated paralytic poliomyelitis.

PubMed

Martín, Javier; Samoilovich, Elena; Dunn, Glynis; Lackenby, Angie; Feldman, Esphir; Heath, Alan; Svirchevskaya, Ekaterina; Cooper, Gill; Yermalovich, Marina; Minor, Philip D

2002-11-01

The isolation of a capsid intertypic poliovirus recombinant from a child with vaccine-associated paralytic poliomyelitis is described. Virus 31043 had a Sabin-derived type 3-type 2-type 1 recombinant genome with a 5'-end crossover point within the capsid coding region. The result was a poliovirus chimera containing the entire coding sequence for antigenic site 3a derived from the Sabin type 2 strain. The recombinant virus showed altered antigenic properties but did not acquire type 2 antigenic characteristics. The significance of the presence in nature of such poliovirus chimeras and the consequences for the current efforts to detect potentially dangerous vaccine-derived poliovirus strains are discussed in the context of the global polio eradication initiative.
Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.

PubMed

Hua, Wei; Wang, Jiasong; Zhao, Jian

2014-01-01

Based on the study of Ramanujan sum and Ramanujan coefficient, this paper suggests the concepts of discrete Ramanujan transform and spectrum. Using Voss numerical representation, one maps a symbolic DNA strand as a numerical DNA sequence, and deduces the discrete Ramanujan spectrum of the numerical DNA sequence. It is well known that of discrete Fourier power spectrum of protein coding sequence has an important feature of 3-base periodicity, which is widely used for DNA sequence analysis by the technique of discrete Fourier transform. It is performed by testing the signal-to-noise ratio at frequency N/3 as a criterion for the analysis, where N is the length of the sequence. The results presented in this paper show that the property of 3-base periodicity can be only identified as a prominent spike of the discrete Ramanujan spectrum at period 3 for the protein coding regions. The signal-to-noise ratio for discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing the protein coding regions from the noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested and the histograms and tables from the computational results illustrate the reliability of our method. In addition, we have analyzed theoretically and gotten the conclusion that the algorithm for calculating discrete Ramanujan spectrum owns the lower computational complexity and higher computational accuracy. The computational experiments show that the technique by using discrete Ramanujan spectrum for classifying different DNA sequences is a fast and effective method. Copyright © 2014 Elsevier Ltd. All rights reserved.
The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

PubMed

Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

2015-08-06

Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Single nucleotide polymorphisms in common bean: their discovery and genotyping using a multiplex detection system

USDA-ARS?s Scientific Manuscript database

Single-nucleotide Polymorphism (SNP) markers are by far the most common form of DNA polymorphism in a genome. The objectives of this study were to discover SNPs in common bean comparing sequences from coding and non-coding regions obtained from Genbank and genomic DNA and to compare sequencing resu...
Specific and Modular Binding Code for Cytosine Recognition in Pumilio/FBF (PUF) RNA-binding Domains

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dong, Shuyun; Wang, Yang; Cassidy-Amstutz, Caleb

2011-10-28

Pumilio/fem-3 mRNA-binding factor (PUF) proteins possess a recognition code for bases A, U, and G, allowing designed RNA sequence specificity of their modular Pumilio (PUM) repeats. However, recognition side chains in a PUM repeat for cytosine are unknown. Here we report identification of a cytosine-recognition code by screening random amino acid combinations at conserved RNA recognition positions using a yeast three-hybrid system. This C-recognition code is specific and modular as specificity can be transferred to different positions in the RNA recognition sequence. A crystal structure of a modified PUF domain reveals specific contacts between an arginine side chain and themore » cytosine base. We applied the C-recognition code to design PUF domains that recognize targets with multiple cytosines and to generate engineered splicing factors that modulate alternative splicing. Finally, we identified a divergent yeast PUF protein, Nop9p, that may recognize natural target RNAs with cytosine. This work deepens our understanding of natural PUF protein target recognition and expands the ability to engineer PUF domains to recognize any RNA sequence.« less
Pooled Sequencing of 531 Genes in Inflammatory Bowel Disease Identifies an Associated Rare Variant in BTNL2 and Implicates Other Immune Related Genes

PubMed Central

Prescott, Natalie J.; Lehne, Benjamin; Stone, Kristina; Lee, James C.; Taylor, Kirstin; Knight, Jo; Papouli, Efterpi; Mirza, Muddassar M.; Simpson, Michael A.; Spain, Sarah L.; Lu, Grace; Fraternali, Franca; Bumpstead, Suzannah J.; Gray, Emma; Amar, Ariella; Bye, Hannah; Green, Peter; Chung-Faye, Guy; Hayee, Bu’Hussain; Pollok, Richard; Satsangi, Jack; Parkes, Miles; Barrett, Jeffrey C.; Mansfield, John C.; Sanderson, Jeremy; Lewis, Cathryn M.; Weale, Michael E.; Schlitt, Thomas; Mathew, Christopher G.

2015-01-01

The contribution of rare coding sequence variants to genetic susceptibility in complex disorders is an important but unresolved question. Most studies thus far have investigated a limited number of genes from regions which contain common disease associated variants. Here we investigate this in inflammatory bowel disease by sequencing the exons and proximal promoters of 531 genes selected from both genome-wide association studies and pathway analysis in pooled DNA panels from 474 cases of Crohn’s disease and 480 controls. 80 variants with evidence of association in the sequencing experiment or with potential functional significance were selected for follow up genotyping in 6,507 IBD cases and 3,064 population controls. The top 5 disease associated variants were genotyped in an extension panel of 3,662 IBD cases and 3,639 controls, and tested for association in a combined analysis of 10,147 IBD cases and 7,008 controls. A rare coding variant p.G454C in the BTNL2 gene within the major histocompatibility complex was significantly associated with increased risk for IBD (p = 9.65x10−10, OR = 2.3[95% CI = 1.75–3.04]), but was independent of the known common associated CD and UC variants at this locus. Rare (<1%) and low frequency (1–5%) variants in 3 additional genes showed suggestive association (p<0.005) with either an increased risk (ARIH2 c.338-6C>T) or decreased risk (IL12B p.V298F, and NICN p.H191R) of IBD. These results provide additional insights into the involvement of the inhibition of T cell activation in the development of both sub-phenotypes of inflammatory bowel disease. We suggest that although rare coding variants may make a modest overall contribution to complex disease susceptibility, they can inform our understanding of the molecular pathways that contribute to pathogenesis. PMID:25671699
Loss-of-function mutations in APOC3, triglycerides, and coronary disease.

PubMed

Crosby, Jacy; Peloso, Gina M; Auer, Paul L; Crosslin, David R; Stitziel, Nathan O; Lange, Leslie A; Lu, Yingchang; Tang, Zheng-zheng; Zhang, He; Hindy, George; Masca, Nicholas; Stirrups, Kathleen; Kanoni, Stavroula; Do, Ron; Jun, Goo; Hu, Youna; Kang, Hyun Min; Xue, Chenyi; Goel, Anuj; Farrall, Martin; Duga, Stefano; Merlini, Pier Angelica; Asselta, Rosanna; Girelli, Domenico; Olivieri, Oliviero; Martinelli, Nicola; Yin, Wu; Reilly, Dermot; Speliotes, Elizabeth; Fox, Caroline S; Hveem, Kristian; Holmen, Oddgeir L; Nikpay, Majid; Farlow, Deborah N; Assimes, Themistocles L; Franceschini, Nora; Robinson, Jennifer; North, Kari E; Martin, Lisa W; DePristo, Mark; Gupta, Namrata; Escher, Stefan A; Jansson, Jan-Håkan; Van Zuydam, Natalie; Palmer, Colin N A; Wareham, Nicholas; Koch, Werner; Meitinger, Thomas; Peters, Annette; Lieb, Wolfgang; Erbel, Raimund; Konig, Inke R; Kruppa, Jochen; Degenhardt, Franziska; Gottesman, Omri; Bottinger, Erwin P; O'Donnell, Christopher J; Psaty, Bruce M; Ballantyne, Christie M; Abecasis, Goncalo; Ordovas, Jose M; Melander, Olle; Watkins, Hugh; Orho-Melander, Marju; Ardissino, Diego; Loos, Ruth J F; McPherson, Ruth; Willer, Cristen J; Erdmann, Jeanette; Hall, Alistair S; Samani, Nilesh J; Deloukas, Panos; Schunkert, Heribert; Wilson, James G; Kooperberg, Charles; Rich, Stephen S; Tracy, Russell P; Lin, Dan-Yu; Altshuler, David; Gabriel, Stacey; Nickerson, Deborah A; Jarvik, Gail P; Cupples, L Adrienne; Reiner, Alex P; Boerwinkle, Eric; Kathiresan, Sekar

2014-07-03

Plasma triglyceride levels are heritable and are correlated with the risk of coronary heart disease. Sequencing of the protein-coding regions of the human genome (the exome) has the potential to identify rare mutations that have a large effect on phenotype. We sequenced the protein-coding regions of 18,666 genes in each of 3734 participants of European or African ancestry in the Exome Sequencing Project. We conducted tests to determine whether rare mutations in coding sequence, individually or in aggregate within a gene, were associated with plasma triglyceride levels. For mutations associated with triglyceride levels, we subsequently evaluated their association with the risk of coronary heart disease in 110,970 persons. An aggregate of rare mutations in the gene encoding apolipoprotein C3 (APOC3) was associated with lower plasma triglyceride levels. Among the four mutations that drove this result, three were loss-of-function mutations: a nonsense mutation (R19X) and two splice-site mutations (IVS2+1G→A and IVS3+1G→T). The fourth was a missense mutation (A43T). Approximately 1 in 150 persons in the study was a heterozygous carrier of at least one of these four mutations. Triglyceride levels in the carriers were 39% lower than levels in noncarriers (P<1×10(-20)), and circulating levels of APOC3 in carriers were 46% lower than levels in noncarriers (P=8×10(-10)). The risk of coronary heart disease among 498 carriers of any rare APOC3 mutation was 40% lower than the risk among 110,472 noncarriers (odds ratio, 0.60; 95% confidence interval, 0.47 to 0.75; P=4×10(-6)). Rare mutations that disrupt APOC3 function were associated with lower levels of plasma triglycerides and APOC3. Carriers of these mutations were found to have a reduced risk of coronary heart disease. (Funded by the National Heart, Lung, and Blood Institute and others.).
Comparative analysis of long non-coding RNAs in Atlantic and Coho salmon reveals divergent transcriptome responses associated with immunity and tissue repair during sea lice infestation.

PubMed

Valenzuela-Muñoz, Valentina; Valenzuela-Miranda, Diego; Gallardo-Escárate, Cristian

2018-05-24

The increasing capacity of transcriptomic analysis by high throughput sequencing has highlighted the presence of a large proportion of transcripts that do not encode proteins. In particular, long non-coding RNAs (lncRNAs) are sequences with low coding potential and conservation among species. Moreover, cumulative evidence has revealed important roles in post-transcriptional gene modulation in several taxa. In fish, the role of lncRNAs has been scarcely studied and even less so during the immune response against sea lice. In the present study we mined for lncRNAs in Atlantic salmon (Salmo salar) and Coho salmon (Oncorhynkus kisutch), which are affected by the sea louse Caligus rogercresseyi, evaluating the degree of sequence conservation between these two fish species and their putative roles during the infection process. Herein, Atlantic and Coho salmon were infected with 35 lice/fish and evaluated after 7 and 14 days post-infestation (dpi). For RNA sequencing, samples from skin and head kidney were collected. A total of 5658/4140 and 3678/2123 lncRNAs were identified in uninfected/infected Atlantic and Coho salmon transcriptomes, respectively. Species-specific transcription patterns were observed in exclusive lncRNAs according to the tissue analyzed. Furthermore, neighbor gene GO enrichment analysis of the top 100 highly regulated lncRNAs in Atlantic salmon showed that lncRNAs were localized near genes related to the immune response. On the other hand, in Coho salmon the highly regulated lncRNAs were localized near genes involved in tissue repair processes. This study revealed high regulation of lncRNAs closely localized to immune and tissue repair-related genes in Atlantic and Coho salmon, respectively, suggesting putative roles for lncRNAs in salmon against sea lice infestation. Copyright © 2018 Elsevier Ltd. All rights reserved.
Loss-of-Function Mutations in APOC3, Triglycerides, and Coronary Disease

PubMed Central

2014-01-01

Background Plasma triglyceride levels are heritable and are correlated with the risk of coronary heart disease. Sequencing of the protein-coding regions of the human genome (the exome) has the potential to identify rare mutations that have a large effect on phenotype. Methods We sequenced the protein-coding regions of 18,666 genes in each of 3734 participants of European or African ancestry in the Exome Sequencing Project. We conducted tests to determine whether rare mutations in coding sequence, individually or in aggregate within a gene, were associated with plasma triglyceride levels. For mutations associated with triglyceride levels, we subsequently evaluated their association with the risk of coronary heart disease in 110,970 persons. Results An aggregate of rare mutations in the gene encoding apolipoprotein C3 (APOC3) was associated with lower plasma triglyceride levels. Among the four mutations that drove this result, three were loss-of-function mutations: a nonsense mutation (R19X) and two splice-site mutations (IVS2+1G→A and IVS3+1G→T). The fourth was a missense mutation (A43T). Approximately 1 in 150 persons in the study was a heterozygous carrier of at least one of these four mutations. Triglyceride levels in the carriers were 39% lower than levels in noncarriers (P<1×10−20), and circulating levels of APOC3 in carriers were 46% lower than levels in noncarriers (P = 8×10−10). The risk of coronary heart disease among 498 carriers of any rare APOC3 mutation was 40% lower than the risk among 110,472 noncarriers (odds ratio, 0.60; 95% confidence interval, 0.47 to 0.75; P = 4×10−6). Conclusions Rare mutations that disrupt APOC3 function were associated with lower levels of plasma triglycerides and APOC3. Carriers of these mutations were found to have a reduced risk of coronary heart disease. (Funded by the National Heart, Lung, and Blood Institute and others.) PMID:24941081
Novel numerical and graphical representation of DNA sequences and proteins.

PubMed

Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D

2006-12-01

We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
Ovine mitochondrial DNA sequence variation and its association with production and reproduction traits within an Afec-Assaf flock.

PubMed

Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E

2012-07-01

Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full length mitochondria sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondria sequences) revealed sequence variation in 10 out of the 13 protein coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein coding sequences represent non-synonymous mutations. Sequence variation was observed also in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lamb (n = 1286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lambs perinatal survival rate, birth weight, and daily growth rate of lambs up to 150 d that averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lamb born/ewe lambing for the HA, HB, and the HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.

The LINE-1 DNA sequences in four mammalian orders predict proteins that conserve homologies to retrovirus proteins.

PubMed Central

Fanning, T; Singer, M

1987-01-01

Recent work suggests that one or more members of the highly repeated LINE-1 (L1) DNA family found in all mammals may encode one or more proteins. Here we report the sequence of a portion of an L1 cloned from the domestic cat (Felis catus). These data permit comparison of the L1 sequences in four mammalian orders (Carnivore, Lagomorph, Rodent and Primate) and the comparison supports the suggested coding potential. In two separate, noncontiguous regions in the carboxy terminal half of the proteins predicted from the DNA sequences, there are several strongly conserved segments. In one region, these share homology with known or suspected reverse transcriptases, as described by others in rodents and primates. In the second region, closer to the carboxy terminus, the strongly conserved segments are over 90% homologous among the four orders. One of the latter segments is cysteine rich and resembles the putative metal binding domains of nucleic acid binding proteins, including those of TFIIIA and retroviruses. PMID:3562227
Working memory for pitch, timbre, and words

PubMed Central

Tillmann, Barbara

2012-01-01

Aiming to further our understanding of fundamental mechanisms of auditory working memory (WM), the present study compared performance for three auditory materials (words, tones, timbres). In a forward recognition task (Experiment 1) participants indicated whether the order of the items in the second sequence was the same as in the first sequence. In a backward recognition task (Experiment 2) participants indicated whether the items of the second sequence were played in the correct backward order. In Experiment 3 participants performed an articulatory suppression task during the retention delay of the backward task. To investigate potential length effects the number of items per sequence was manipulated. Overall findings underline the benefit of a cross-material experimental approach and suggest that human auditory WM is not a unitary system. Whereas WM processes for timbres differed from those for tones and words, similarities and differences were observed for words and tones: Both types of stimuli appear to rely on rehearsal mechanisms, but might differ in the involved sensorimotor codes. PMID:23116413
Complete genome sequence of Enterobacter sp. IIT-BT 08: A potential microbial strain for high rate hydrogen production.

PubMed

Khanna, Namita; Ghosh, Ananta Kumar; Huntemann, Marcel; Deshpande, Shweta; Han, James; Chen, Amy; Kyrpides, Nikos; Mavrommatis, Kostas; Szeto, Ernest; Markowitz, Victor; Ivanova, Natalia; Pagani, Ioanna; Pati, Amrita; Pitluck, Sam; Nolan, Matt; Woyke, Tanja; Teshima, Hazuki; Chertkov, Olga; Daligault, Hajnalka; Davenport, Karen; Gu, Wei; Munk, Christine; Zhang, Xiaojing; Bruce, David; Detter, Chris; Xu, Yan; Quintana, Beverly; Reitenga, Krista; Kunde, Yulia; Green, Lance; Erkkila, Tracy; Han, Cliff; Brambilla, Evelyne-Marie; Lang, Elke; Klenk, Hans-Peter; Goodwin, Lynne; Chain, Patrick; Das, Debabrata

2013-12-20

Enterobacter sp. IIT-BT 08 belongs to Phylum: Proteobacteria, Class: Gammaproteobacteria, Order: Enterobacteriales, Family: Enterobacteriaceae. The organism was isolated from the leaves of a local plant near the Kharagpur railway station, Kharagpur, West Bengal, India. It has been extensively studied for fermentative hydrogen production because of its high hydrogen yield. For further enhancement of hydrogen production by strain development, complete genome sequence analysis was carried out. Sequence analysis revealed that the genome was linear, 4.67 Mbp long and had a GC content of 56.01%. The genome properties encode 4,393 protein-coding and 179 RNA genes. Additionally, a putative pathway of hydrogen production was suggested based on the presence of formate hydrogen lyase complex and other related genes identified in the genome. Thus, in the present study we describe the specific properties of the organism and the generation, annotation and analysis of its genome sequence as well as discuss the putative pathway of hydrogen production by this organism.
Swine and Poultry Pathogens: the Complete Genome Sequences of Two Strains of Mycoplasma hyopneumoniae and a Strain of Mycoplasma synoviae†

PubMed Central

Vasconcelos, Ana Tereza R.; Ferreira, Henrique B.; Bizarro, Cristiano V.; Bonatto, Sandro L.; Carvalho, Marcos O.; Pinto, Paulo M.; Almeida, Darcy F.; Almeida, Luiz G. P.; Almeida, Rosana; Alves-Filho, Leonardo; Assunção, Enedina N.; Azevedo, Vasco A. C.; Bogo, Maurício R.; Brigido, Marcelo M.; Brocchi, Marcelo; Burity, Helio A.; Camargo, Anamaria A.; Camargo, Sandro S.; Carepo, Marta S.; Carraro, Dirce M.; de Mattos Cascardo, Júlio C.; Castro, Luiza A.; Cavalcanti, Gisele; Chemale, Gustavo; Collevatti, Rosane G.; Cunha, Cristina W.; Dallagiovanna, Bruno; Dambrós, Bibiana P.; Dellagostin, Odir A.; Falcão, Clarissa; Fantinatti-Garboggini, Fabiana; Felipe, Maria S. S.; Fiorentin, Laurimar; Franco, Gloria R.; Freitas, Nara S. A.; Frías, Diego; Grangeiro, Thalles B.; Grisard, Edmundo C.; Guimarães, Claudia T.; Hungria, Mariangela; Jardim, Sílvia N.; Krieger, Marco A.; Laurino, Jomar P.; Lima, Lucymara F. A.; Lopes, Maryellen I.; Loreto, Élgion L. S.; Madeira, Humberto M. F.; Manfio, Gilson P.; Maranhão, Andrea Q.; Martinkovics, Christyanne T.; Medeiros, Sílvia R. B.; Moreira, Miguel A. M.; Neiva, Márcia; Ramalho-Neto, Cicero E.; Nicolás, Marisa F.; Oliveira, Sergio C.; Paixão, Roger F. C.; Pedrosa, Fábio O.; Pena, Sérgio D. J.; Pereira, Maristela; Pereira-Ferrari, Lilian; Piffer, Itamar; Pinto, Luciano S.; Potrich, Deise P.; Salim, Anna C. M.; Santos, Fabrício R.; Schmitt, Renata; Schneider, Maria P. C.; Schrank, Augusto; Schrank, Irene S.; Schuck, Adriana F.; Seuanez, Hector N.; Silva, Denise W.; Silva, Rosane; Silva, Sérgio C.; Soares, Célia M. A.; Souza, Kelly R. L.; Souza, Rangel C.; Staats, Charley C.; Steffens, Maria B. R.; Teixeira, Santuza M. R.; Urmenyi, Turan P.; Vainstein, Marilene H.; Zuccherato, Luciana W.; Simpson, Andrew J. G.; Zaha, Arnaldo

2005-01-01

This work reports the results of analyses of three complete mycoplasma genomes, a pathogenic (7448) and a nonpathogenic (J) strain of the swine pathogen Mycoplasma hyopneumoniae and a strain of the avian pathogen Mycoplasma synoviae; the genome sizes of the three strains were 920,079 bp, 897,405 bp, and 799,476 bp, respectively. These genomes were compared with other sequenced mycoplasma genomes reported in the literature to examine several aspects of mycoplasma evolution. Strain-specific regions, including integrative and conjugal elements, and genome rearrangements and alterations in adhesin sequences were observed in the M. hyopneumoniae strains, and all of these were potentially related to pathogenicity. Genomic comparisons revealed that reduction in genome size implied loss of redundant metabolic pathways, with maintenance of alternative routes in different species. Horizontal gene transfer was consistently observed between M. synoviae and Mycoplasma gallisepticum. Our analyses indicated a likely transfer event of hemagglutinin-coding DNA sequences from M. gallisepticum to M. synoviae. PMID:16077101
Characterization of an endogenous retrovirus class in elephants and their relatives

PubMed Central

Greenwood, Alex D; Englbrecht, Claudia C; MacPhee, Ross DE

2004-01-01

Background Endogenous retrovirus-like elements (ERV-Ls, primed with tRNA leucine) are a diverse group of reiterated sequences related to foamy viruses and widely distributed among mammals. As shown in previous investigations, in many primates and rodents this class of elements has remained transpositionally active, as reflected by increased copy number and high sequence diversity within and among taxa. Results Here we examine whether proviral-like sequences may be suitable molecular probes for investigating the phylogeny of groups known to have high element diversity. As a test we characterized ERV-Ls occurring in a sample of extant members of superorder Uranotheria (Asian and African elephants, manatees, and hyraxes). The ERV-L complement in this group is even more diverse than previously suspected, and there is sequence evidence for active expansion, particularly in elephantids. Many of the elements characterized have protein coding potential suggestive of activity. Conclusions In general, the evidence supports the hypothesis that the complement had a single origin within basal Uranotheria. PMID:15476555
Fast tandem mass spectra-based protein identification regardless of the number of spectra or potential modifications examined.

PubMed

Falkner, Jayson; Andrews, Philip

2005-05-15

Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOFTOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and XTandem. Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.
5’-Terminal AUGs in Escherichia coli mRNAs with Shine-Dalgarno Sequences: Identification and Analysis of Their Roles in Non-Canonical Translation Initiation

PubMed Central

Beck, Heather J.; Fleming, Ian M. C.

2016-01-01

Analysis of the Escherichia coli transcriptome identified a unique subset of messenger RNAs (mRNAs) that contain a conventional untranslated leader and Shine-Dalgarno (SD) sequence upstream of the gene’s start codon while also containing an AUG triplet at the mRNA’s 5’- terminus (5’-uAUG). Fusion of the coding sequence specified by the 5’-terminal putative AUG start codon to a lacZ reporter gene, as well as primer extension inhibition assays, reveal that the majority of the 5’-terminal upstream open reading frames (5’-uORFs) tested support some level of lacZ translation, indicating that these mRNAs can function both as leaderless and canonical SD-leadered mRNAs. Although some of the uORFs were expressed at low levels, others were expressed at levels close to that of the respective downstream genes and as high as the naturally leaderless cI mRNA of bacteriophage λ. These 5’-terminal uORFs potentially encode peptides of varying lengths, but their functions, if any, are unknown. In an effort to determine whether expression from the 5’-terminal uORFs impact expression of the immediately downstream cistron, we examined expression from the downstream coding sequence after mutations were introduced that inhibit efficient 5’-uORF translation. These mutations were found to affect expression from the downstream cistrons to varying degrees, suggesting that some 5’-uORFs may play roles in downstream regulation. Since the 5’-uAUGs found on these conventionally leadered mRNAs can function to bind ribosomes and initiate translation, this indicates that canonical mRNAs containing 5’-uAUGs should be examined for their potential to function also as leaderless mRNAs. PMID:27467758
Identification of medicinal plants in the family Fabaceae using a potential DNA barcode ITS2.

PubMed

Gao, Ting; Yao, Hui; Song, Jingyuan; Liu, Chang; Zhu, Yingjie; Ma, Xinye; Pang, Xiaohui; Xu, Hongxi; Chen, Shilin

2010-07-06

To test whether the ITS2 region is an effective marker for use in authenticating of the family Fabaceae which contains many important medicinal plants. The ITS2 regions of 114 samples in Fabaceae were amplified. Sequence assembly was assembled by CodonCode Aligner V3.0. In combination with sequences from public database, the sequences were aligned by Clustal W, and genetic distances were computed using MEGA V4.0. The intra- vs. inter-specific variations were assessed by six metrics, wilcoxon two-sample tests and "barcoding gaps". Species identification was accomplished using TaxonGAP V2.4, BLAST1 and the nearest distance method. ITS2 sequences had considerable variation at the genus and species level. The intra-specific divergence ranged from 0% to 14.4%, with an average of 1.7%, and the inter-specific divergence ranged from 0% to 63.0%, with an average of 8.6%. Twenty-four species found in the Chinese Pharmacopoeia, along with another 66 species including their adulterants, were successfully identified based on ITS2 sequences. In addition, ITS2 worked well, with over 80.0% of species and 100% of genera being correctly differentiated for the 1507 sequences derived from 1126 species belonging to 196 genera. Our findings support the notion that ITS2 can be used as an efficient and powerful marker and a potential barcode to distinguish various species in Fabaceae. Copyright (c) 2010 Elsevier Ireland Ltd. All rights reserved.
Origins of genes: "big bang" or continuous creation?

PubMed Central

Keese, P K; Gibbs, A

1992-01-01

Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes. PMID:1329098
Abstract feature codes: The building blocks of the implicit learning system.

PubMed

Eberhardt, Katharina; Esser, Sarah; Haider, Hilde

2017-07-01

According to the Theory of Event Coding (TEC; Hommel, Müsseler, Aschersleben, & Prinz, 2001), action and perception are represented in a shared format in the cognitive system by means of feature codes. In implicit sequence learning research, it is still common to make a conceptual difference between independent motor and perceptual sequences. This supposedly independent learning takes place in encapsulated modules (Keele, Ivry, Mayr, Hazeltine, & Heuer 2003) that process information along single dimensions. These dimensions have remained underspecified so far. It is especially not clear whether stimulus and response characteristics are processed in separate modules. Here, we suggest that feature dimensions as they are described in the TEC should be viewed as the basic content of modules of implicit learning. This means that the modules process all stimulus and response information related to certain feature dimensions of the perceptual environment. In 3 experiments, we investigated by means of a serial reaction time task the nature of the basic units of implicit learning. As a test case, we used stimulus location sequence learning. The results show that a stimulus location sequence and a response location sequence cannot be learned without interference (Experiment 2) unless one of the sequences can be coded via an alternative, nonspatial dimension (Experiment 3). These results support the notion that spatial location is one module of the implicit learning system and, consequently, that there are no separate processing units for stimulus versus response locations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Upolu virus and Aransas Bay virus, Two Presumptive Bunyaviruses, Are Novel Members of the Family Orthomyxoviridae

PubMed Central

Chowdhary, Rashmi; Travassos da Rosa, Amelia; Hutchison, Stephen K.; Popov, Vsevolod; Street, Craig; Tesh, Robert B.; Lipkin, W. Ian

2014-01-01

ABSTRACT Emerging and zoonotic pathogens pose continuing threats to human health and ongoing challenges to diagnostics. As nucleic acid tests are playing increasingly prominent roles in diagnostics, the genetic characterization of molecularly uncharacterized agents is expected to significantly enhance detection and surveillance capabilities. We report the identification of two previously unrecognized members of the family Orthomyxoviridae, which includes the influenza viruses and the tick-transmitted Thogoto and Dhori viruses. We provide morphological, serologic, and genetic evidence that Upolu virus (UPOV) from Australia and Aransas Bay virus (ABV) from North America, both previously considered potential bunyaviruses based on electron microscopy and physicochemical features, are orthomyxoviruses instead. Their genomes show up to 68% nucleotide sequence identity to Thogoto virus (segment 2; ∼74% at the amino acid level) and a more distant relationship to Dhori virus, the two prototype viruses of the recognized species of the genus Thogotovirus. Despite sequence similarity, the coding potentials of UPOV and ABV differed from that of Thogoto virus, instead being like that of Dhori virus. Our findings suggest that the tick-transmitted viruses UPOV and ABV represent geographically distinct viruses in the genus Thogotovirus of the family Orthomyxoviridae that do not fit in the two currently recognized species of this genus. IMPORTANCE Upolu virus (UPOV) and Aransas Bay virus (ABV) are shown to be orthomyxoviruses instead of bunyaviruses, as previously thought. Genetic characterization and adequate classification of agents are paramount in this molecular age to devise appropriate surveillance and diagnostics. Although more closely related to Thogoto virus by sequence, UPOV and ABV differ in their coding potentials by lacking a proposed pathogenicity factor. In this respect, they are similar to Dhori virus, which, despite the lack of a pathogenicity factor, can cause disease. These findings enable further studies into the evolution and pathogenicity of orthomyxoviruses. PMID:24574415
Portable and Error-Free DNA-Based Data Storage.

PubMed

Yazdi, S M Hossein Tabatabaei; Gabrys, Ryan; Milenkovic, Olgica

2017-07-10

DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.
Adaptive decoding of convolutional codes

NASA Astrophysics Data System (ADS)

Hueske, K.; Geldmacher, J.; Götze, J.

2007-06-01

Convolutional codes, which are frequently used as error correction codes in digital transmission systems, are generally decoded using the Viterbi Decoder. On the one hand the Viterbi Decoder is an optimum maximum likelihood decoder, i.e. the most probable transmitted code sequence is obtained. On the other hand the mathematical complexity of the algorithm only depends on the used code, not on the number of transmission errors. To reduce the complexity of the decoding process for good transmission conditions, an alternative syndrome based decoder is presented. The reduction of complexity is realized by two different approaches, the syndrome zero sequence deactivation and the path metric equalization. The two approaches enable an easy adaptation of the decoding complexity for different transmission conditions, which results in a trade-off between decoding complexity and error correction performance.
Optimization of algorithm of coding of genetic information of Chlamydia

NASA Astrophysics Data System (ADS)

Feodorova, Valentina A.; Ulyanov, Sergey S.; Zaytsev, Sergey S.; Saltykov, Yury V.; Ulianova, Onega V.

2018-04-01

New method of coding of genetic information using coherent optical fields is developed. Universal technique of transformation of nucleotide sequences of bacterial gene into laser speckle pattern is suggested. Reference speckle patterns of the nucleotide sequences of omp1 gene of typical wild strains of Chlamydia trachomatis of genovars D, E, F, G, J and K and Chlamydia psittaci serovar I as well are generated. Algorithm of coding of gene information into speckle pattern is optimized. Fully developed speckles with Gaussian statistics for gene-based speckles have been used as criterion of optimization.
Simulated Assessment of Interference Effects in Direct Sequence Spread Spectrum (DSSS) QPSK Receiver

DTIC Science & Technology

2014-03-27

bit error rate BPSK binary phase shift keying CDMA code division multiple access CSI comb spectrum interference CW continuous wave DPSK differential... CDMA ) and GPS systems which is a Gold code. This code is generated by a modulo-2 operation between two different preferred m-sequences. The preferred m...10 SNR Sim (dB) S N R O ut ( dB ) SNR RF SNR DS Figure 3.26: Comparison of input S NRS im and S NROut of the band-pass RF filter (S NRRF) and
Decoding the complex genetic causes of heart diseases using systems biology.

PubMed

Djordjevic, Djordje; Deshpande, Vinita; Szczesnik, Tomasz; Yang, Andrian; Humphreys, David T; Giannoulatou, Eleni; Ho, Joshua W K

2015-03-01

The pace of disease gene discovery is still much slower than expected, even with the use of cost-effective DNA sequencing and genotyping technologies. It is increasingly clear that many inherited heart diseases have a more complex polygenic aetiology than previously thought. Understanding the role of gene-gene interactions, epigenetics, and non-coding regulatory regions is becoming increasingly critical in predicting the functional consequences of genetic mutations identified by genome-wide association studies and whole-genome or exome sequencing. A systems biology approach is now being widely employed to systematically discover genes that are involved in heart diseases in humans or relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the complex genetic causes of congenital and complex heart diseases. This review summarises state-of-the-art genomic and bioinformatics techniques that are used in accelerating the pace of disease gene discovery in heart diseases. Accompanying this review, we provide an interactive web-resource for systems biology analysis of mammalian heart development and diseases, CardiacCode ( http://CardiacCode.victorchang.edu.au/ ). CardiacCode features a dataset of over 700 pieces of manually curated genetic or molecular perturbation data, which enables the inference of a cardiac-specific GRN of 280 regulatory relationships between 33 regulator genes and 129 target genes. We believe this growing resource will fill an urgent unmet need to fully realise the true potential of predictive and personalised genomic medicine in tackling human heart disease.
Hiding message into DNA sequence through DNA coding and chaotic maps.

PubMed

Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

2014-09-01

The paper proposes an improved reversible substitution method to hide data into deoxyribonucleic acid (DNA) sequence, and four measures have been taken to enhance the robustness and enlarge the hiding capacity, such as encode the secret message by DNA coding, encrypt it by pseudo-random sequence, generate the relative hiding locations by piecewise linear chaotic map, and embed the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method has a better performance compared with the competing methods with respect to robustness and capacity.
Physics behind the mechanical nucleosome positioning code

NASA Astrophysics Data System (ADS)

Zuiddam, Martijn; Everaers, Ralf; Schiessel, Helmut

2017-11-01

The positions along DNA molecules of nucleosomes, the most abundant DNA-protein complexes in cells, are influenced by the sequence-dependent DNA mechanics and geometry. This leads to the "nucleosome positioning code", a preference of nucleosomes for certain sequence motives. Here we introduce a simplified model of the nucleosome where a coarse-grained DNA molecule is frozen into an idealized superhelical shape. We calculate the exact sequence preferences of our nucleosome model and find it to reproduce qualitatively all the main features known to influence nucleosome positions. Moreover, using well-controlled approximations to this model allows us to come to a detailed understanding of the physics behind the sequence preferences of nucleosomes.
EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

PubMed Central

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-01-01

EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408
An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

PubMed

Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

2011-01-01

cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus.

PubMed Central

Gustafson, G; Armour, S L

1986-01-01

The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus (BSMV) has been determined. The sequence is 3289 nucleotides in length and contains four open reading frames (ORFs) which code for proteins of Mr 22,147 (ORF1), Mr 58,098 (ORF2), Mr 17,378 (ORF3), and Mr 14,119 (ORF4). The predicted N-terminal amino acid sequence of the polypeptide encoded by the ORF nearest the 5'-end of the RNA (ORF1) is identical (after the initiator methionine) to the published N-terminal amino acid sequence of BSMV coat protein for 29 of the first 30 amino acids. ORF2 occupies the central portion of the coding region of RNA beta and ORF3 is located at the 3'-end. The ORF4 sequence overlaps the 3'-region of ORF2 and the 5'-region of ORF3 and differs in codon usage from the other three RNA beta ORFs. The coding region of RNA beta is followed by a poly(A) tract and a 238 nucleotide tRNA-like structure which are common to all three BSMV genomic RNAs. Images PMID:3754962
Colour cyclic code for Brillouin distributed sensors

NASA Astrophysics Data System (ADS)

Le Floch, Sébastien; Sauser, Florian; Llera, Miguel; Rochat, Etienne

2015-09-01

For the first time, a colour cyclic coding (CCC) is theoretically and experimentally demonstrated for Brillouin optical time-domain analysis (BOTDA) distributed sensors. Compared to traditional intensity-modulated cyclic codes, the code presents an additional gain of √2 while keeping the same number of sequences as for a colour coding. A comparison with a standard BOTDA sensor is realized and validates the theoretical coding gain.
The genome sequence of Dyella jiangningensis FCAV SCS01 from a lignocellulose-decomposing microbial consortium metagenome reveals potential for biotechnological applications.

PubMed

Desiderato, Joana G; Alvarenga, Danillo O; Constancio, Milena T L; Alves, Lucia M C; Varani, Alessandro M

2018-05-14

Cellulose and its associated polymers are structural components of the plant cell wall, constituting one of the major sources of carbon and energy in nature. The carbon cycle is dependent on cellulose- and lignin-decomposing microbial communities and their enzymatic systems acting as consortia. These microbial consortia are under constant exploration for their potential biotechnological use. Herein, we describe the characterization of the genome of Dyella jiangningensis FCAV SCS01, recovered from the metagenome of a lignocellulose-degrading microbial consortium, which was isolated from a sugarcane crop soil under mechanical harvesting and covered by decomposing straw. The 4.7 Mbp genome encodes 4,194 proteins, including 36 glycoside hydrolases (GH), supporting the hypothesis that this bacterium may contribute to lignocellulose decomposition. Comparative analysis among fully sequenced Dyella species indicate that the genome synteny is not conserved, and that D. jiangningensis FCAV SCS01 carries 372 unique genes, including an alpha-glucosidase and maltodextrin glucosidase coding genes, and other potential biomass degradation related genes. Additional genomic features, such as prophage-like, genomic islands and putative new biosynthetic clusters were also uncovered. Overall, D. jiangningensis FCAV SCS01 represents the first South American Dyella genome sequenced and shows an exclusive feature among its genus, related to biomass degradation.
Clinically actionable mutation profiles in patients with cancer identified by whole-genome sequencing

PubMed Central

Mizani, Tuba; Hamblin, Angela; Parton, Marina; Orosz, Zsolt; Athanasou, Nick; Hassan, Bass; Flanagan, Adrienne M.; Ahmed, Ahmed; Winter, Stuart; Harris, Adrian; Popitsch, Niko; Church, David; Taylor, Jenny C.

2018-01-01

Next-generation sequencing (NGS) efforts have established catalogs of mutations relevant to cancer development. However, the clinical utility of this information remains largely unexplored. Here, we present the results of the first eight patients recruited into a clinical whole-genome sequencing (WGS) program in the United Kingdom. We performed PCR-free WGS of fresh frozen tumors and germline DNA at 75× and 30×, respectively, using the HiSeq2500 HTv4. Subtracted tumor VCFs and paired germlines were subjected to comprehensive analysis of coding and noncoding regions, integration of germline with somatically acquired variants, and global mutation signatures and pathway analyses. Results were classified into tiers and presented to a multidisciplinary tumor board. WGS results helped to clarify an uncertain histopathological diagnosis in one case, led to informed or supported prognosis in two cases, leading to de-escalation of therapy in one, and indicated potential treatments in all eight. Overall 26 different tier 1 potentially clinically actionable findings were identified using WGS compared with six SNVs/indels using routine targeted NGS. These initial results demonstrate the potential of WGS to inform future diagnosis, prognosis, and treatment choice in cancer and justify the systematic evaluation of the clinical utility of WGS in larger cohorts of patients with cancer. PMID:29610388
Delineating slowly and rapidly evolving fractions of the Drosophila genome.

PubMed

Keith, Jonathan M; Adams, Peter; Stephen, Stuart; Mattick, John S

2008-05-01

Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component was species dependent, and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes available in BED format.
Metagenomic Analysis of the Microbiota from the Crop of an Invasive Snail Reveals a Rich Reservoir of Novel Genes

PubMed Central

Cardoso, Alexander M.; Cavalcante, Janaína J. V.; Cantão, Maurício E.; Thompson, Claudia E.; Flatschart, Roberto B.; Glogauer, Arnaldo; Scapin, Sandra M. N.; Sade, Youssef B.; Beltrão, Paulo J. M. S. I.; Gerber, Alexandra L.; Martins, Orlando B.; Garcia, Eloi S.; de Souza, Wanderley; Vasconcelos, Ana Tereza R.

2012-01-01

The shortage of petroleum reserves and the increase in CO2 emissions have raised global concerns and highlighted the importance of adopting sustainable energy sources. Second-generation ethanol made from lignocellulosic materials is considered to be one of the most promising fuels for vehicles. The giant snail Achatina fulica is an agricultural pest whose biotechnological potential has been largely untested. Here, the composition of the microbial population within the crop of this invasive land snail, as well as key genes involved in various biochemical pathways, have been explored for the first time. In a high-throughput approach, 318 Mbp of 454-Titanium shotgun metagenomic sequencing data were obtained. The predominant bacterial phylum found was Proteobacteria, followed by Bacteroidetes and Firmicutes. Viruses, Fungi, and Archaea were present to lesser extents. The functional analysis reveals a variety of microbial genes that could assist the host in the degradation of recalcitrant lignocellulose, detoxification of xenobiotics, and synthesis of essential amino acids and vitamins, contributing to the adaptability and wide-ranging diet of this snail. More than 2,700 genes encoding glycoside hydrolase (GH) domains and carbohydrate-binding modules were detected. When we compared GH profiles, we found an abundance of sequences coding for oligosaccharide-degrading enzymes (36%), very similar to those from wallabies and giant pandas, as well as many novel cellulase and hemicellulase coding sequences, which points to this model as a remarkable potential source of enzymes for the biofuel industry. Furthermore, this work is a major step toward the understanding of the unique genetic profile of the land snail holobiont. PMID:23133637
Unraveling the oral cancer lncRNAome: Identification of novel lncRNAs associated with malignant progression and HPV infection.

PubMed

Nohata, Nijiro; Abba, Martin C; Gutkind, J Silvio

2016-08-01

The role of long non-coding RNA (lncRNA) expression in human head and neck squamous cell carcinoma (HNSCC) is still poorly understood. In this study, we aimed at establishing the onco-lncRNAome profiling of HNSCC and to identify lncRNAs correlating with prognosis and patient survival. The Atlas of Noncoding RNAs in Cancer (TANRIC) database was employed to retrieve the lncRNA expression information generated from The Cancer Genome Atlas (TCGA) HNSCC RNA-sequencing data. RNA-sequencing data from HNSCC cell lines were also considered for this study. Bioinformatics approaches, such as differential gene expression analysis, survival analysis, principal component analysis, and Co-LncRNA enrichment analysis were performed. Using TCGA HNSCC RNA-sequencing data from 426 HNSCC and 42 adjacent normal tissues, we found 728 lncRNA transcripts significantly and differentially expressed in HNSCC. Among the 728 lncRNAs, 55 lncRNAs were significantly associated with poor prognosis, such as overall survival and/or disease-free survival. Next, we found 140 lncRNA transcripts significantly and differentially expressed between Human Papilloma Virus (HPV) positive tumors and HPV negative tumors. Thirty lncRNA transcripts were differentially expressed between TP53 mutated and TP53 wild type tumors. Co-LncRNA analysis suggested that protein-coding genes that are co-expressed with these deregulated lncRNAs might be involved in cancer associated molecular events. With consideration of differential expression of lncRNAs in a HNSCC cell lines panel (n=22), we found several lncRNAs that may represent potential targets for diagnosis, therapy and prevention of HNSCC. LncRNAs profiling could provide novel insights into the potential mechanisms of HNSCC oncogenesis. Copyright © 2016 Elsevier Ltd. All rights reserved.
Unraveling the Oral Cancer lncRNAome: Identification of Novel lncRNAs Associated with Malignant Progression and HPV Infection

PubMed Central

Nohata, Nijiro; Abba, Martin C.; Gutkind, J. Silvio

2017-01-01

Objectives The role of long non-coding RNA (lncRNA) expression in human head and neck squamous cell carcinoma (HNSCC) is still poorly understood. In this study, we aimed at establishing the onco-lncRNAome profiling of HNSCC and to identify lncRNAs correlating with prognosis and patient survival. Materials and Methods The Atlas of Noncoding RNAs in Cancer (TANRIC) database was employed to retrieve the lncRNA expression information generated from The Cancer Genome Atlas (TCGA) HNSCC RNA-sequencing data. RNA-sequencing data from HNSCC cell lines were also considered for this study. Bioinformatics approaches, such as differential gene expression analysis, survival analysis, principal component analysis, and Co-LncRNA enrichment analysis were performed. Results Using TCGA HNSCC RNA-sequencing data from 426 HNSCC and 42 adjacent normal tissues, we found 728 lncRNA transcripts significantly and differentially expressed in HNSCC. Among the 728 lncRNAs, 55 lncRNAs were significantly associated with poor prognosis, such as overall survival and/or disease-free survival. Next, we found 140 lncRNA transcripts significantly and differentially expressed between Human Papilloma Virus (HPV) positive tumors and HPV negative tumors. Thirty lncRNA transcripts were differentially expressed between TP53 mutated and TP53 wild type tumors. Co-LncRNA analysis suggested that protein-coding genes that are co-expressed with these deregulated lncRNAs might be involved in cancer associated molecular events. With consideration of differential expression of lncRNAs in a HNSCC cell lines panel (n=22), we found several lncRNAs that may represent potential targets for diagnosis, therapy and prevention of HNSCC. Conclusion LncRNAs profiling could provide novel insights into the potential mechanisms of HNSCC oncogenesis. PMID:27424183
Human pluripotent stem cells recurrently acquire and expand dominant negative P53 mutations

PubMed Central

Kamitaki, Nolan; Mitchell, Jana; Avior, Yishai; Mello, Curtis; Kashin, Seva; Mekhoubad, Shila; Ilic, Dusko; Charlton, Maura; Saphier, Genevieve; Handsaker, Robert E.; Genovese, Giulio; Bar, Shiran; Benvenisty, Nissim; McCarroll, Steven A.; Eggan, Kevin

2017-01-01

Human pluripotent stem cells (hPSCs) can self-renew indefinitely, making them an attractive source for regenerative therapies. This expansion potential has been linked with acquisition of large copy number variants (CNVs) that provide mutant cells with a growth advantage in culture1–3. However, the nature, extent, and functional impact of other acquired genome sequence mutations in cultured hPSCs is not known. Here, we sequenced the protein-coding genes (exomes) of 140 independent human embryonic stem cell (hESC) lines, including 26 lines prepared for potential clinical use4. We then applied computational strategies for identifying mutations present in a subset of cells5. Though such mosaic mutations were generally rare, we identified five unrelated hESC lines that carried six mutations in the TP53 gene that encodes the tumor suppressor P53. Notably, the TP53 mutations we observed are dominant negative and are the mutations most commonly seen in human cancers. We used droplet digital PCR to demonstrate that the TP53 mutant allelic fraction increased with passage number under standard culture conditions, suggesting that P53 mutation confers selective advantage. When we then mined published RNA sequencing data from 117 hPSC lines, we observed another nine TP53 mutations, all resulting in coding changes in the DNA binding domain of P53. Strikingly, in three lines, the allelic fraction exceeded 50%, suggesting additional selective advantage resulting from loss of heterozygosity at the TP53 locus. As the acquisition and favored expansion of cancer-associated mutations in hPSCs may go unnoticed during most applications, we suggest that careful genetic characterization of hPSCs and their differentiated derivatives should be carried out prior to clinical use. PMID:28445466
Nuclear Power Plant Cyber Security Discrete Dynamic Event Tree Analysis (LDRD 17-0958) FY17 Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wheeler, Timothy A.; Denman, Matthew R.; Williams, R. A.

Instrumentation and control of nuclear power is transforming from analog to modern digital assets. These control systems perform key safety and security functions. This transformation is occurring in new plant designs as well as in the existing fleet of plants as the operation of those plants is extended to 60 years. This transformation introduces new and unknown issues involving both digital asset induced safety issues and security issues. Traditional nuclear power risk assessment tools and cyber security assessment methods have not been modified or developed to address the unique nature of cyber failure modes and of cyber security threat vulnerabilities.more » iii This Lab-Directed Research and Development project has developed a dynamic cyber-risk in- formed tool to facilitate the analysis of unique cyber failure modes and the time sequencing of cyber faults, both malicious and non-malicious, and impose those cyber exploits and cyber faults onto a nuclear power plant accident sequence simulator code to assess how cyber exploits and cyber faults could interact with a plants digital instrumentation and control (DI&C) system and defeat or circumvent a plants cyber security controls. This was achieved by coupling an existing Sandia National Laboratories nuclear accident dynamic simulator code with a cyber emulytics code to demonstrate real-time simulation of cyber exploits and their impact on automatic DI&C responses. Studying such potential time-sequenced cyber-attacks and their risks (i.e., the associated impact and the associated degree of difficulty to achieve the attack vector) on accident management establishes a technical risk informed framework for developing effective cyber security controls for nuclear power.« less
Agouti sequence polymorphisms in coyotes, wolves and dogs suggest hybridization.

PubMed

Schmutz, Sheila M; Berryere, Thomas G; Barta, Jodi L; Reddick, Kimberley D; Schmutz, Josef K

2007-01-01

Domestic dogs have been shown to have multiple alleles of the Agouti Signal Peptide (ASIP) in exon 4 and we wished to determine the level of polymorphism in the common wild canids of Canada, wolves and coyotes, in comparison. All Canadian coyotes and most wolves have banded hairs. The ASIP coding sequence of the wolf did not vary from the domestic dog but one variant was detected in exon 4 of coyotes that did not alter the arginine at this position. Two other differences were found in the sequence flanking exon 4 of coyotes compared with the 45 dogs and 1 wolf. The coyotes also demonstrated a relatively common polymorphism in the 3' UTR sequence that could be used for population studies. One of the ASIP alleles (R96C) in domestic dogs causes a solid black coat color in homozygotes. Although some wolves are melanistic, this phenotype does not appear to be caused by this same mutation. However, one wolf, potentially a dog-wolf hybrid or descendant thereof, was heterozygous for this allele. Likewise 2 coyotes, potentially dog-coyote or wolf-coyote hybrid descendants, were heterozygous for the several polymorphisms in and flanking exon 4. We could conclude that these were coyote-dog hybrids because both were heterozygous for 2 mutations causing fawn coat color in dogs.
New encoded single-indicator sequences based on physico-chemical parameters for efficient exon identification.

PubMed

Meher, J K; Meher, P K; Dash, G N; Raval, M K

2012-01-01

The first step in gene identification problem based on genomic signal processing is to convert character strings into numerical sequences. These numerical sequences are then analysed spectrally or using digital filtering techniques for the period-3 peaks, which are present in exons (coding areas) and absent in introns (non-coding areas). In this paper, we have shown that single-indicator sequences can be generated by encoding schemes based on physico-chemical properties. Two new methods are proposed for generating single-indicator sequences based on hydration energy and dipole moments. The proposed methods produce high peak at exon locations and effectively suppress false exons (intron regions having greater peak than exon regions) resulting in high discriminating factor, sensitivity and specificity.
TANDEM: matching proteins with tandem mass spectra.

PubMed

Craig, Robertson; Beavis, Ronald C

2004-06-12

Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching. The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.
Ribosomal protein S14 transcripts are edited in Oenothera mitochondria.

PubMed Central

Schuster, W; Unseld, M; Wissinger, B; Brennicke, A

1990-01-01

The gene encoding ribosomal protein S14 (rps14) in Oenothera mitochondria is located upstream of the cytochrome b gene (cob). Sequence analysis of independently derived cDNA clones covering the entire rps14 coding region shows two nucleotides edited from the genomic DNA to the mRNA derived sequences by C to U modifications. A third editing event occurs four nucleotides upstream of the AUG initiation codon and improves a potential ribosome binding site. A CGG codon specifying arginine in a position conserved in evolution between chloroplasts and E. coli as a UGG tryptophan codon is not edited in any of the cDNAs analysed. An inverted repeat 3' of an unidentified open reading frame is located upstream of the rps14 gene. The inverted repeat sequence is highly conserved at analogous regions in other Oenothera mitochondrial loci. Images PMID:2326162
Complete Sequence and Comparative Analysis of the Chloroplast Genome of Coconut Palm (Cocos nucifera)

PubMed Central

Huang, Ya-Yi; Matzke, Antonius J. M.; Matzke, Marjori

2013-01-01

Coconut, a member of the palm family (Arecaceae), is one of the most economically important trees used by mankind. Despite its diverse morphology, coconut is recognized taxonomically as only a single species (Cocos nucifera L.). There are two major coconut varieties, tall and dwarf, the latter of which displays traits resulting from selection by humans. We report here the complete chloroplast (cp) genome of a dwarf coconut plant, and describe the gene content and organization, inverted repeat fluctuations, repeated sequence structure, and occurrence of RNA editing. Phylogenetic relationships of monocots were inferred based on 47 chloroplast protein-coding genes. Potential nodes for events of gene duplication and pseudogenization related to inverted repeat fluctuation were mapped onto the tree using parsimony criteria. We compare our findings with those from other palm species for which complete cp genome sequences are available. PMID:24023703
Transposable elements and G-quadruplexes.

PubMed

Kejnovsky, Eduard; Tokan, Viktor; Lexa, Matej

2015-09-01

A significant part of eukaryotic genomes is formed by transposable elements (TEs) containing not only genes but also regulatory sequences. Some of the regulatory sequences located within TEs can form secondary structures like hairpins or three-stranded (triplex DNA) and four-stranded (quadruplex DNA) conformations. This review focuses on recent evidence showing that G-quadruplex-forming sequences in particular are often present in specific parts of TEs in plants and humans. We discuss the potential role of these structures in the TE life cycle as well as the impact of G-quadruplexes on replication, transcription, translation, chromatin status, and recombination. The aim of this review is to emphasize that TEs may serve as vehicles for the genomic spread of G-quadruplexes. These non-canonical DNA structures and their conformational switches may constitute another regulatory system that, together with small and long non-coding RNA molecules and proteins, contribute to the complex cellular network resulting in the large diversity of eukaryotes.
Complete sequence and comparative analysis of the chloroplast genome of coconut palm (Cocos nucifera).

PubMed

Huang, Ya-Yi; Matzke, Antonius J M; Matzke, Marjori

2013-01-01

Coconut, a member of the palm family (Arecaceae), is one of the most economically important trees used by mankind. Despite its diverse morphology, coconut is recognized taxonomically as only a single species (Cocos nucifera L.). There are two major coconut varieties, tall and dwarf, the latter of which displays traits resulting from selection by humans. We report here the complete chloroplast (cp) genome of a dwarf coconut plant, and describe the gene content and organization, inverted repeat fluctuations, repeated sequence structure, and occurrence of RNA editing. Phylogenetic relationships of monocots were inferred based on 47 chloroplast protein-coding genes. Potential nodes for events of gene duplication and pseudogenization related to inverted repeat fluctuation were mapped onto the tree using parsimony criteria. We compare our findings with those from other palm species for which complete cp genome sequences are available.
Draft Genome Sequence of the Deinococcus-Thermus Bacterium Meiothermus ruber Strain A

DOE PAGES

Thiel, Vera; Tomsho, Lynn P.; Burhans, Richard; ...

2015-03-26

The draft genome sequence of the Deinococcus-Thermus group bacterium Meiothermus ruber strain A, isolated from a cyanobacterial enrichment culture obtained from Octopus Spring (Yellowstone National Park, WY), comprises 2,968,099 bp in 170 contigs. It is predicted to contain 2,895 protein-coding genes, 44 tRNA-coding genes, and 2 rRNA operons.
Draft Genome Sequence of Staphylococcus cohnii subsp. urealyticus Isolated from a Healthy Dog

PubMed Central

Wigmore, Sarah M.; Wareham, David W.

2017-01-01

ABSTRACT Staphylococcus cohnii subsp. urealyticus strain SW120 was isolated from the ear swab of a healthy dog. The isolate is resistant to methicillin and fusidic acid. The SW120 draft genome is 2,805,064 bp and contains 2,667 coding sequences, including 58 tRNAs and nine complete rRNA coding regions. PMID:28209829
Draft Genome Sequence of a Canine Isolate of Methicillin-Resistant Staphylococcus haemolyticus

PubMed Central

Wigmore, Sarah M.; Wareham, David W.

2017-01-01

ABSTRACT Staphylococcus haemolyticus strain SW007 was isolated from a nasal swab taken from a healthy dog. The isolate is resistant to methicillin, mupirocin, macrolides, and sulfonamides. The SW007 draft genome is 2,325,410 bp and contains 2,277 coding sequences, including 60 tRNAs and nine complete rRNA-coding regions. PMID:28385855

Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

USDA-ARS?s Scientific Manuscript database

Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...
Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

PubMed Central

2012-01-01

Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's genbank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
Sequencing and comparison of the Rickettsia genomes from the whitefly Bemisia tabaci Middle East Asia Minor I.

PubMed

Zhu, Dan-Tong; Xia, Wen-Qiang; Rao, Qiong; Liu, Shu-Sheng; Ghanim, Murad; Wang, Xiao-Wei

2016-08-01

The whitefly, Bemisia tabaci, harbors the primary symbiont 'Candidatus Portiera aleyrodidarum' and a variety of secondary symbionts. Among these secondary symbionts, Rickettsia is the only one that can be detected both inside and outside the bacteriomes. Infection with Rickettsia has been reported to influence several aspects of the whitefly biology, such as fitness, sex ratio, virus transmission and resistance to pesticides. However, mechanisms underlying these differences remain unclear, largely due to the lack of genomic information of Rickettsia. In this study, we sequenced the genome of two Rickettsia strains isolated from the Middle East Asia Minor 1 (MEAM1) species of the B. tabaci complex in China and Israel. Both Rickettsia genomes were of high coding density and AT-rich, containing more than 1000 coding sequences, much larger than that of the coexisted primary symbiont, Portiera. Moreover, the two Rickettsia strains isolated from China and Israel shared most of the genes with 100% identity and only nine genes showed sequence differences. The phylogenetic analysis using orthologs shared in the genus, inferred the proximity of Rickettsia in MEAM1 and Rickettsia bellii. Functional analysis revealed that Rickettsia was unable to synthesize amino acids required for complementing the whitefly nutrition. Besides, a type IV secretion system and a number of virulence-related genes were detected in the Rickettsia genome. The presence of virulence-related genes might benefit the symbiotic life of the bacteria, and hint on potential effects of Rickettsia on whiteflies. The genome sequences of Rickettsia provided a basis for further understanding the function of Rickettsia in whiteflies. © 2016 Institute of Zoology, Chinese Academy of Sciences.
Homoplastic microinversions and the avian tree of life

PubMed Central

2011-01-01

Background Microinversions are cytologically undetectable inversions of DNA sequences that accumulate slowly in genomes. Like many other rare genomic changes (RGCs), microinversions are thought to be virtually homoplasy-free evolutionary characters, suggesting that they may be very useful for difficult phylogenetic problems such as the avian tree of life. However, few detailed surveys of these genomic rearrangements have been conducted, making it difficult to assess this hypothesis or understand the impact of microinversions upon genome evolution. Results We surveyed non-coding sequence data from a recent avian phylogenetic study and found substantially more microinversions than expected based upon prior information about vertebrate inversion rates, although this is likely due to underestimation of these rates in previous studies. Most microinversions were lineage-specific or united well-accepted groups. However, some homoplastic microinversions were evident among the informative characters. Hemiplasy, which reflects differences between gene trees and the species tree, did not explain the observed homoplasy. Two specific loci were microinversion hotspots, with high numbers of inversions that included both the homoplastic as well as some overlapping microinversions. Neither stem-loop structures nor detectable sequence motifs were associated with microinversions in the hotspots. Conclusions Microinversions can provide valuable phylogenetic information, although power analysis indicates that large amounts of sequence data will be necessary to identify enough inversions (and similar RGCs) to resolve short branches in the tree of life. Moreover, microinversions are not perfect characters and should be interpreted with caution, just as with any other character type. Independent of their use for phylogenetic analyses, microinversions are important because they have the potential to complicate alignment of non-coding sequences. Despite their low rate of accumulation, they have clearly contributed to genome evolution, suggesting that active identification of microinversions will prove useful in future phylogenomic studies. PMID:21612607
FunGene: the functional gene pipeline and repository.

PubMed

Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R

2013-01-01

Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
Identification of a Conserved Non-Protein-Coding Genomic Element that Plays an Essential Role in Alphabaculovirus Pathogenesis

PubMed Central

Kikhno, Irina

2014-01-01

Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153
Characterization of the complete mitochondrial genome of Marshallagia marshalli and phylogenetic implications for the superfamily Trichostrongyloidea.

PubMed

Sun, Miao-Miao; Han, Liang; Zhang, Fu-Kai; Zhou, Dong-Hui; Wang, Shu-Qing; Ma, Jun; Zhu, Xing-Quan; Liu, Guo-Hua

2018-01-01

Marshallagia marshalli (Nematoda: Trichostrongylidae) infection can lead to serious parasitic gastroenteritis in sheep, goat, and wild ruminant, causing significant socioeconomic losses worldwide. Up to now, the study concerning the molecular biology of M. marshalli is limited. Herein, we sequenced the complete mitochondrial (mt) genome of M. marshalli and examined its phylogenetic relationship with selected members of the superfamily Trichostrongyloidea using Bayesian inference (BI) based on concatenated mt amino acid sequence datasets. The complete mt genome sequence of M. marshalli is 13,891 bp, including 12 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. All protein-coding genes are transcribed in the same direction. Phylogenetic analyses based on concatenated amino acid sequences of the 12 protein-coding genes supported the monophylies of the families Haemonchidae, Molineidae, and Dictyocaulidae with strong statistical support, but rejected the monophyly of the family Trichostrongylidae. The determination of the complete mt genome sequence of M. marshalli provides novel genetic markers for studying the systematics, population genetics, and molecular epidemiology of M. marshalli and its congeners.
The mitochondrial genome of the multicolored Asian lady beetle Harmonia axyridis (Pallas) and a phylogenetic analysis of the Polyphaga (Insecta: Coleoptera).

PubMed

Niu, Fang-Fang; Zhu, Liang; Wang, Su; Wei, Shu-Jun

2016-07-01

Here, we report the mitochondrial genome sequence of the multicolored Asian lady beetle Harmonia axyridis (Pallas, 1773) (Coleoptera: Coccinellidae) (GenBank accession No. KR108208). This is the first species with sequenced mitochondrial genome from the genus Harmonia. The current length with partitial A + T-rich region of this mitochondrial genome is 16,387 bp. All the typical genes were sequenced except the trnI and trnQ. As in most other sequenced mitochondrial genomes of Coleoptera, there is no re-arrangement in the sequenced region compared with the pupative ancestral arrangement of insects. All protein-coding genes start with ATN codons. Five, five and three protein-coding genes stop with termination codon TAA, TA and T, respectively. Phylogenetic analysis using Bayesian method based on the first and second codon positions of the protein-coding genes supported that the Scirtidae is a basal lineage of Polyphaga. The Harmonia and the Coccinella form a sister lineage. The monophyly of Staphyliniformia, Scarabaeiformia and Cucujiformia was supported. The Buprestidae was found to be a sister group to the Bostrichiformia.
Chimeric mitochondrial minichromosomes of the human body louse, Pediculus humanus: evidence for homologous and non-homologous recombination.

PubMed

Shao, Renfu; Barker, Stephen C

2011-02-15

The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse. Copyright © 2010 Elsevier B.V. All rights reserved.
Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences

PubMed Central

2014-01-01

Background Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be able to distinguish miRNAs from other sequences sufficiently discriminative, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification. PMID:24418292
Cloning and Sequencing of Defective Particles Derived from the Autonomous Parvovirus Minute Virus of Mice for the Construction of Vectors with Minimal cis-Acting Sequences

PubMed Central

Clément, Nathalie; Avalosse, Bernard; El Bakkouri, Karim; Velu, Thierry; Brandenburger, Annick

2001-01-01

The production of wild-type-free stocks of recombinant parvovirus minute virus of mice [MVM(p)] is difficult due to the presence of homologous sequences in vector and helper genomes that cannot easily be eliminated from the overlapping coding sequences. We have therefore cloned and sequenced spontaneously occurring defective particles of MVM(p) with very small genomes to identify the minimal cis-acting sequences required for DNA amplification and virus production. One of them has lost all capsid-coding sequences but is still able to replicate in permissive cells when nonstructural proteins are provided in trans by a helper plasmid. Vectors derived from this particle produce stocks with no detectable wild-type MVM after cotransfection with new, matched, helper plasmids that present no homology downstream from the transgene. PMID:11152501
Digital data for Quick Response (QR) codes of thermophiles to identify and compare the bacterial species isolated from Unkeshwar hot springs (India).

PubMed

Rekadwad, Bhagwan N; Khobragade, Chandrahasya N

2016-03-01

16S rRNA sequences of morphologically and biochemically identified 21 thermophilic bacteria isolated from Unkeshwar hot springs (19°85'N and 78°25'E), Dist. Nanded (India) has been deposited in NCBI repository. The 16S rRNA gene sequences were used to generate QR codes for sequences (FASTA format and full Gene Bank information). Diversity among the isolates is compared with known isolates and evaluated using CGR, FCGR and PCA i.e. visual comparison and evaluation respectively. Considerable biodiversity was observed among the identified bacteria isolated from Unkeshwar hot springs. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/.
On Asymptotically Good Ramp Secret Sharing Schemes

NASA Astrophysics Data System (ADS)

Geil, Olav; Martin, Stefano; Martínez-Peñas, Umberto; Matsumoto, Ryutaroh; Ruano, Diego

Asymptotically good sequences of linear ramp secret sharing schemes have been intensively studied by Cramer et al. in terms of sequences of pairs of nested algebraic geometric codes. In those works the focus is on full privacy and full reconstruction. In this paper we analyze additional parameters describing the asymptotic behavior of partial information leakage and possibly also partial reconstruction giving a more complete picture of the access structure for sequences of linear ramp secret sharing schemes. Our study involves a detailed treatment of the (relative) generalized Hamming weights of the considered codes.
Statistical and linguistic features of DNA sequences

NASA Technical Reports Server (NTRS)

Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

1995-01-01

We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

PubMed

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-02-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

PubMed

Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

2010-12-15

Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

PubMed Central

Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.

2014-01-01

Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [ ()////-], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences. PMID:25256396
The complete mitochondrial genome of Papilio glaucus and its phylogenetic implications.

PubMed

Shen, Jinhui; Cong, Qian; Grishin, Nick V

2015-09-01

Due to the intriguing morphology, lifecycle, and diversity of butterflies and moths, Lepidoptera are emerging as model organisms for the study of genetics, evolution and speciation. The progress of these studies relies on decoding Lepidoptera genomes, both nuclear and mitochondrial. Here we describe a protocol to obtain mitogenomes from Next Generation Sequencing reads performed for whole-genome sequencing and report the complete mitogenome of Papilio (Pterourus) glaucus. The circular mitogenome is 15,306 bp in length and rich in A and T. It contains 13 protein-coding genes (PCGs), 22 transfer-RNA-coding genes (tRNA), and 2 ribosomal-RNA-coding genes (rRNA), with a gene order typical for mitogenomes of Lepidoptera. We performed phylogenetic analyses based on PCG and RNA-coding genes or protein sequences using Bayesian Inference and Maximum Likelihood methods. The phylogenetic trees consistently show that among species with available mitogenomes Papilio glaucus is the closest to Papilio (Agehana) maraho from Asia.
Digital data for quick response (QR) codes of alkalophilic Bacillus pumilus to identify and to compare bacilli isolated from Lonar Crator Lake, India

PubMed Central

Rekadwad, Bhagwan N.; Khobragade, Chandrahasya N.

2016-01-01

Microbiologists are routinely engaged isolation, identification and comparison of isolated bacteria for their novelty. 16S rRNA sequences of Bacillus pumilus were retrieved from NCBI repository and generated QR codes for sequences (FASTA format and full Gene Bank information). 16SrRNA were used to generate quick response (QR) codes of Bacillus pumilus isolated from Lonar Crator Lake (19° 58′ N; 76° 31′ E), India. Bacillus pumilus 16S rRNA gene sequences were used to generate CGR, FCGR and PCA. These can be used for visual comparison and evaluation respectively. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/. This generated digital data helps to evaluate and compare any Bacillus pumilus strain, minimizes laboratory efforts and avoid misinterpretation of the species. PMID:27141529
Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline.

PubMed

Toffano-Nioche, Claire; Luo, Yufei; Kuchly, Claire; Wallon, Claire; Steinbach, Delphine; Zytnicki, Matthias; Jacq, Annick; Gautheret, Daniel

2013-09-01

RNA-seq experiments are now routinely used for the large scale sequencing of transcripts. In bacteria or archaea, such deep sequencing experiments typically produce 10-50 million fragments that cover most of the genome, including intergenic regions. In this context, the precise delineation of the non-coding elements is challenging. Non-coding elements include untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs) and transcripts produced from the antisense strand of genes (asRNA). Here we present a computational pipeline (DETR'PROK: detection of ncRNAs in prokaryotes) based on the Galaxy framework that takes as input a mapping of deep sequencing reads and performs successive steps of clustering, comparison with existing annotation and identification of transcribed non-coding fragments classified into putative 5' UTRs, sRNAs and asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.