Science.gov

Sample records for coding sequence incompleteness

  1. Systematic analysis of mRNA 5' coding sequence incompleteness in Danio rerio: an automated EST-based approach

    PubMed Central

    Frabetti, Flavia; Casadei, Raffaella; Lenzi, Luca; Canaider, Silvia; Vitale, Lorenza; Facchin, Federica; Carinci, Paolo; Zannotti, Maria; Strippoli, Pierluigi

    2007-01-01

    Background All standard methods for cDNA cloning are affected by a potential inability to effectively clone the 5' region of mRNA. The aim of this work was to estimate mRNA open reading frame (ORF) 5' region sequence completeness in the model organism Danio rerio (zebrafish). Results We implemented a novel automated approach (5'_ORF_Extender) that systematically compares available expressed sequence tags (ESTs) with all the zebrafish experimentally determined mRNA sequences, identifies additional sequence stretches at 5' region and scans for the presence of all conditions needed to define a new, extended putative ORF. Our software was able to identify 285 (3.3%) mRNAs with putatively incomplete ORFs at 5' region and, in three example cases selected (selt1a, unc119.2, nppa), the extended coding region at 5' end was cloned by reverse transcription-polymerase chain reaction (RT-PCR). Conclusion The implemented method, which could also be useful for the analysis of other genomes, allowed us to describe the relevance of the "5' end mRNA artifact" problem for genomic annotation and functional genomic experiment design in zebrafish. Open peer review This article was reviewed by Alexey V. Kochetov (nominated by Mikhail Gelfand), Shamil Sunyaev, and Gáspár Jékely. For the full reviews, please go to the Reviewers' Comments section. PMID:18042283
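
The abstract does not list the conditions that 5'_ORF_Extender checks, so the following is only a minimal sketch under a stated assumption: an extended putative ORF exists when the additional 5' stretch contains an in-frame ATG with no in-frame stop codon between it and the annotated start codon. The function name `find_extended_start` and its arguments are hypothetical and are not taken from the published software.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_extended_start(upstream, cds):
    """Return the index in `upstream` of the most 5' in-frame ATG, or None.

    `upstream` is the extra 5' sequence (e.g. contributed by an EST) that ends
    immediately before the annotated start codon; `cds` is the annotated ORF.
    Assumption: an in-frame stop codon in `upstream` rules out any extension
    beyond it.
    """
    upstream = upstream.upper()
    if not cds.upper().startswith("ATG"):
        raise ValueError("annotated ORF is expected to start with ATG")

    best = None
    # Walk backwards, codon by codon, in the frame of the annotated start.
    pos = len(upstream) - 3
    while pos >= 0:
        codon = upstream[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the reading frame is closed upstream of this point
        if codon == "ATG":
            best = pos  # keep going: a more 5' ATG may still be in frame
        pos -= 3
    return best
```

A return value of `None` means the sketch found no candidate extension; it does not distinguish a closed frame from an open frame that simply lacks an upstream ATG.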

  2. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  3. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  4. Lichenase and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  5. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts.

    PubMed

    Sun, Liang; Luo, Haitao; Bu, Dechao; Zhao, Guoguang; Yu, Kuntao; Zhang, Changhai; Liu, Yuanning; Chen, Runsheng; Zhao, Yi

    2013-09-01

    It is a challenge to classify protein-coding or non-coding transcripts, especially those re-constructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), by profiling adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. The implementation of CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data in a cross-species manner, which demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci. PMID:23892401
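
As a rough illustration of what "profiling adjoining nucleotide triplets" could mean, the sketch below counts pairs of adjacent triplets in one reading frame. It is not the CNCI scoring procedure, which the abstract does not describe; the function name `adjoining_triplet_counts` and the `frame` parameter are assumptions made for this example.

```python
from collections import Counter


def adjoining_triplet_counts(seq, frame=0):
    """Count pairs of adjacent nucleotide triplets in the given reading frame."""
    seq = seq.upper()
    # Split into complete triplets, dropping any trailing 1-2 nucleotides.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Pair each triplet with the one that immediately follows it.
    return Counter(zip(triplets, triplets[1:]))
```

Comparing such pair counts between known coding and non-coding sequences is one plausible way to build a composition-based signature, though the actual tool may differ substantially.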

  6. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
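
The two numerical designations described above can be computed directly from a sequence. This is a minimal sketch: the output format follows the examples in the abstract (A76C158G121T74 and (AAA)0(AAC)1(AAG)9 ... (TTT)0), while the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def base_count_label(seq):
    """Abbreviate a reading frame by its base count, e.g. 'A6C2G2T2'."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts[base]}" for base in "ACGT")


def codon_table_label(seq):
    """List all 64 codons in alphabetical order with their counts."""
    seq = seq.upper()
    # Count complete codons only; a trailing partial codon is ignored.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    codons = ["".join(c) for c in product("ACGT", repeat=3)]
    return "".join(f"({codon}){counts[codon]}" for codon in codons)
```

Because both labels depend only on the sequence, two database entries with identical labels are candidates for redundancy, which is the cross-referencing use the abstract proposes.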

  7. Fingerprinting Codes for Internet-Based Live Pay-TV System Using Balanced Incomplete Block Designs

    NASA Astrophysics Data System (ADS)

    Hou, Shuhui; Uehara, Tetsutaro; Satoh, Takashi; Morimura, Yoshitaka; Minoh, Michihiko

    In recent years, with the rapid growth of the Internet as well as the increasing demand for broadband services, live pay-television broadcasting via the Internet has become a promising business. To get this implemented, it is necessary to protect distributed contents from illegal copying and redistributing after they are accessed. A fingerprinting system is a useful tool for this purpose. This paper shows that the anti-collusion code has advantages over other existing fingerprinting codes in terms of efficiency and effectiveness for live pay-television broadcasting. Next, this paper presents how to achieve efficient and effective anti-collusion codes based on unital and affine plane, which are two known examples of balanced incomplete block design (BIBD). Meanwhile, performance evaluations of anti-collusion codes generated from unital and affine plane are conducted. Their practical explicit constructions are given last.

  8. High-quality draft genome sequence of the Thermus amyloliquefaciens type strain YIM 77409(T) with an incomplete denitrification pathway.

    PubMed

    Zhou, En-Min; Murugapiran, Senthil K; Mefferd, Chrisabelle C; Liu, Lan; Xian, Wen-Dong; Yin, Yi-Rui; Ming, Hong; Yu, Tian-Tian; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T B K; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Spunde, Alexander; Kyrpides, Nikos; Woyke, Tanja; Li, Wen-Jun; Hedlund, Brian P

    2016-01-01

    Thermus amyloliquefaciens type strain YIM 77409(T) is a thermophilic, Gram-negative, non-motile and rod-shaped bacterium isolated from Niujie Hot Spring in Eryuan County, Yunnan Province, southwest China. In the present study we describe the features of strain YIM 77409(T) together with its genome sequence and annotation. The genome is 2,160,855 bp long and consists of 6 scaffolds with 67.4 % average GC content. A total of 2,313 genes were predicted, comprising 2,257 protein-coding and 56 RNA genes. The genome is predicted to encode a complete glycolysis, pentose phosphate pathway, and tricarboxylic acid cycle. Additionally, a large number of transporters and enzymes for heterotrophy highlight the broad heterotrophic lifestyle of this organism. A denitrification gene cluster included genes predicted to encode enzymes for the sequential reduction of nitrate to nitrous oxide, consistent with the incomplete denitrification phenotype of this strain. PMID:26925197

  9. The multiple codes of nucleotide sequences.

    PubMed

    Trifonov, E N

    1989-01-01

    Nucleotide sequences carry genetic information of many different kinds, not just instructions for protein synthesis (triplet code). Several codes of nucleotide sequences are discussed including: (1) the translation framing code, responsible for correct triplet counting by the ribosome during protein synthesis; (2) the chromatin code, which provides instructions on appropriate placement of nucleosomes along the DNA molecules and their spatial arrangement; (3) a putative loop code for single-stranded RNA-protein interactions. The codes are degenerate and corresponding messages are not only interspersed but actually overlap, so that some nucleotides belong to several messages simultaneously. Tandemly repeated sequences frequently considered as functionless "junk" are found to be grouped into certain classes of repeat unit lengths. This indicates some functional involvement of these sequences. A hypothesis is formulated according to which the tandem repeats are given the role of weak enhancer-silencers that modulate, in a copy number-dependent way, the expression of proximal genes. Fast amplification and elimination of the repeats provides an attractive mechanism of species adaptation to a rapidly changing environment. PMID:2673451

  10. Nonspatial Sequence Coding in CA1 Neurons

    PubMed Central

    Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

    2016-01-01

    The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal

  11. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  12. Coding visual features extracted from video sequences.

    PubMed

    Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

    2014-05-01

    Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics. PMID:24818244

  13. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  14. Program generator for the Incomplete Cholesky Conjugate Gradient (ICCG) method with a symmetrizing preprocessor. [GENIC code package]

    SciTech Connect

    Kuo-Petravic, G.; Petravic, M.

    1980-03-01

    This paper is an extension of the previous paper, A Program Generator for the Incomplete LU-Decomposition-Conjugate Gradient (ILUCG) Method, which appeared in Computer Physics Communications. In that paper a generator program was presented which produced a code package to solve the system of equations $A\mathbf{x} = \mathbf{b}$, where $A$ is an arbitrary nonsingular matrix, by the ILUCG method. In the present paper an alternative generator program is offered which produces a code package applicable to the case where $A$ is symmetric and positive definite. The numerical algorithm used is the Incomplete Cholesky Conjugate Gradient (ICCG) method of Meijerink and Van der Vorst, which executes approximately twice as fast per iteration as the ILUCG method. In addition, an optional preprocessor is provided to treat the case of a not diagonally dominant nonsymmetric and nonsingular matrix $A$ by solving the equation $A^{T}A\mathbf{x} = A^{T}\mathbf{b}$.
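
The symmetrizing idea in the last sentence can be sketched as follows: for a nonsingular but nonsymmetric A, the product A^T A is symmetric positive definite, so a conjugate gradient iteration can be applied to A^T A x = A^T b. The sketch uses plain, unpreconditioned conjugate gradient; the incomplete Cholesky preconditioner that defines ICCG is deliberately omitted, and all function names are hypothetical rather than taken from the GENIC package.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(apply_M, rhs, tol=1e-12, max_iter=1000):
    """Solve M x = rhs for symmetric positive definite M (no preconditioner)."""
    x = [0.0] * len(rhs)
    r = list(rhs)          # residual for the zero starting guess
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Mp = apply_M(p)
        alpha = rs / dot(p, Mp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mi for ri, mi in zip(r, Mp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def solve_normal_equations(A, b):
    """Symmetrize A x = b as A^T A x = A^T b and solve it with CG."""
    At = transpose(A)
    return conjugate_gradient(lambda v: matvec(At, matvec(A, v)), matvec(At, b))
```

Forming the normal equations squares the condition number of A, which is one reason a preconditioner such as incomplete Cholesky matters in practice.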

  15. High-quality draft genome sequence of the Thermus amyloliquefaciens type strain YIM 77409T with an incomplete denitrification pathway

    DOE PAGESBeta

    Zhou, En-Min; Murugapiran, Senthil K.; Mefferd, Chrisabelle C.; Liu, Lan; Xian, Wen-Dong; Yin, Yi-Rui; Ming, Hong; Yu, Tian-Tian; Huntemann, Marcel; Clum, Alicia; et al

    2016-02-27

    Thermus amyloliquefaciens type strain YIM 77409T is a thermophilic, Gram-negative, non-motile and rod-shaped bacterium isolated from Niujie Hot Spring in Eryuan County, Yunnan Province, southwest China. In the present study we describe the features of strain YIM 77409T together with its genome sequence and annotation. The genome is 2,160,855 bp long and consists of 6 scaffolds with 67.4 % average GC content. A total of 2,313 genes were predicted, comprising 2,257 protein-coding and 56 RNA genes. The genome is predicted to encode a complete glycolysis, pentose phosphate pathway, and tricarboxylic acid cycle. Additionally, a large number of transporters and enzymes for heterotrophy highlight the broad heterotrophic lifestyle of this organism. Furthermore, a denitrification gene cluster included genes predicted to encode enzymes for the sequential reduction of nitrate to nitrous oxide, consistent with the incomplete denitrification phenotype of this strain.

  16. Whole Genome Sequencing: Cracking the Genetic Code for Foodborne Illness

    MedlinePlus

    ... have millions of different genomes, or sequences of genetic code, each as unique as a fingerprint. Get ...

  17. Designedly Incomplete Utterances: A Pedagogical Practice for Eliciting Knowledge Displays in Error Correction Sequences.

    ERIC Educational Resources Information Center

    Koshik, Irene

    2002-01-01

    Uses a conversation analytic framework to analyze a practice used by teachers in one-on-one, second language writing conferences when eliciting self-correction of students' written language errors. This type of turn used to elicit a knowledge display from the student is labeled designedly incomplete utterance (DIU). Teachers use DIUs made up of…

  18. Hybrid ARQ schemes employing coded modulation and sequence combining

    NASA Astrophysics Data System (ADS)

    Deng, Robert H.

    1994-06-01

    We propose and analyze two hybrid automatic-repeat-request (ARQ) schemes employing bandwidth efficient coded modulation and coded sequence combining. In the first scheme, a trellis-coded modulation (TCM) is used to control channel noise; while in the second scheme a concatenated coded modulation is employed. The concatenated coded modulation is formed by cascading a Reed-Solomon (RS) outer code and a coded modulation (BCM) inner code. In both schemes, the coded modulation decoder, by performing sequence combining and soft-decision maximum likelihood decoding, makes full use of the information available in all received sequences corresponding to a given information message. It is shown, by means of analysis as well as computer simulations, that both schemes are capable of providing high throughput efficiencies over a wide range of signal-to-noise ratios. The schemes are suitable for large file transfers over satellite communication links where high throughput and high reliability are required.

  19. Orpinomyces cellulase celf protein and coding sequences

    DOEpatents

    Li, Xin-Liang; Chen, Huizhong; Ljungdahl, Lars G.

    2000-09-05

    A cDNA (1,520 bp), designated celF, consisting of an open reading frame (ORF) encoding a polypeptide (CelF) of 432 amino acids was isolated from a cDNA library of the anaerobic rumen fungus Orpinomyces PC-2 constructed in Escherichia coli. Analysis of the deduced amino acid sequence showed that, starting from the N-terminus, CelF consists of a signal peptide, a cellulose binding domain (CBD) followed by an extremely Asn-rich linker region which separates the CBD and the catalytic domain. The latter is located at the C-terminus. The catalytic domain of CelF is highly homologous to CelA and CelC of Orpinomyces PC-2, to CelA of Neocallimastix patriciarum and also to cellobiohydrolase IIs (CBHIIs) from aerobic fungi. However, like CelA of Neocallimastix patriciarum, CelF does not have the noncatalytic repeated peptide domain (NCRPD) found in CelA and CelC from the same organism. The recombinant protein CelF hydrolyzes cellooligosaccharides in the pattern of CBHII, yielding only cellobiose as product with cellotetraose as the substrate. The genomic celF is interrupted by a 111 bp intron, located within the region coding for the CBD. The intron of celF has features in common with genes from aerobic filamentous fungi.

  20. Three Ingredients for Improved Global Aftershock Forecasts: Tectonic Region, Time-Dependent Catalog Incompleteness, and Inter-Sequence Variability

    NASA Astrophysics Data System (ADS)

    Page, M. T.; Hardebeck, J.; Felzer, K. R.; Michael, A. J.; van der Elst, N.

    2015-12-01

    Following a large earthquake, seismic hazard can be orders of magnitude higher than the long-term average as a result of aftershock triggering. Due to this heightened hazard, there is a demand from emergency managers and the public for rapid, authoritative, and reliable aftershock forecasts. In the past, USGS aftershock forecasts following large, global earthquakes have been released on an ad-hoc basis with inconsistent methods, and in some cases, aftershock parameters adapted from California. To remedy this, we are currently developing an automated aftershock product that will generate more accurate forecasts based on the Reasenberg and Jones (Science, 1989) method. To better capture spatial variations in aftershock productivity and decay, we estimate regional aftershock parameters for sequences within the Garcia et al. (BSSA, 2012) tectonic regions. We find that regional variations for mean aftershock productivity exceed a factor of 10. The Reasenberg and Jones method combines modified-Omori aftershock decay, Utsu productivity scaling, and the Gutenberg-Richter magnitude distribution. We additionally account for a time-dependent magnitude of completeness following large events in the catalog. We generalize the Helmstetter et al. (2005) equation for short-term aftershock incompleteness and solve for incompleteness levels in the global NEIC catalog following large mainshocks. In addition to estimating average sequence parameters within regions, we quantify the inter-sequence parameter variability. This allows for a more complete quantification of the forecast uncertainties and Bayesian updating of the forecast as sequence-specific information becomes available.
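
The Reasenberg and Jones (1989) model referred to above gives the rate of aftershocks of magnitude M or larger at time t after a mainshock of magnitude Mm as rate(t, M) = 10^(a + b(Mm - M)) (t + c)^(-p), which combines the productivity and magnitude terms with modified-Omori decay. The sketch below evaluates that rate and its time integral. The function names are hypothetical, and the parameters a, b, c and p must be supplied by the caller, since the regional values estimated in this work are not given in the abstract.

```python
import math


def rj_rate(t_days, mag, mainshock_mag, a, b, c, p):
    """Rate (events per day) of aftershocks with magnitude >= mag at time t."""
    productivity = 10 ** (a + b * (mainshock_mag - mag))
    return productivity * (t_days + c) ** (-p)


def rj_expected_count(t1, t2, mag, mainshock_mag, a, b, c, p):
    """Expected number of aftershocks with magnitude >= mag between t1 and t2."""
    productivity = 10 ** (a + b * (mainshock_mag - mag))
    if abs(p - 1.0) < 1e-12:
        # The integral of (t + c)^-1 is logarithmic.
        return productivity * (math.log(t2 + c) - math.log(t1 + c))
    return productivity * ((t2 + c) ** (1 - p) - (t1 + c) ** (1 - p)) / (1 - p)
```

This sketch assumes a complete catalog; the time-dependent magnitude of completeness and the inter-sequence variability discussed in the abstract are not modeled here.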

  1. SEQassembly: A Practical Tools Program for Coding Sequences Splicing

    NASA Astrophysics Data System (ADS)

    Lee, Hongbin; Yang, Hang; Fu, Lei; Qin, Long; Li, Huili; He, Feng; Wang, Bo; Wu, Xiaoming

    The CDS (coding sequence) is the portion of an mRNA sequence that is composed of a number of exon sequence segments. Constructing the CDS is important for in-depth genetic analyses such as genotyping. A program in the MATLAB environment is presented that processes batches of sample sequences into code segments under the guidance of reference exon models, and splices the code segments from the same sample source into a CDS according to the exon order in a queue file. This program is useful in transcriptional polymorphism detection and gene function study.
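
The final splicing step can be illustrated with a short sketch. The original program runs in MATLAB and its data formats are not described in the abstract, so the dictionary of segments, the exon identifiers and the function name `splice_cds` below are all assumptions made for illustration.

```python
def splice_cds(segments, exon_order):
    """Concatenate one sample's code segments in the given exon order.

    `segments` maps an exon identifier to its sequence segment;
    `exon_order` plays the role of the exon order read from a queue file.
    """
    missing = [exon for exon in exon_order if exon not in segments]
    if missing:
        raise KeyError(f"no segment found for exon(s): {missing}")
    return "".join(segments[exon] for exon in exon_order)
```

For example, `splice_cds({"exon2": "GGG", "exon1": "ATG", "exon3": "TAA"}, ["exon1", "exon2", "exon3"])` joins the three segments in exon order regardless of how they are stored.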

  2. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  3. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  4. Using Huffman coding method to visualize and analyze DNA sequences.

    PubMed

    Qi, Zhao-Hui; Li, Ling; Qi, Xiao-Qin

    2011-11-30

    On the basis of the Huffman coding method, we propose a new graphical representation of DNA sequence. The representation can avoid degeneracy and loss of information in the transfer of data from a DNA sequence to its graphical representation. Then a multicomponent vector from the representation is introduced to characterize quantitatively DNA sequences. The components of the vector are derived from the graphical representation of DNA primary sequence. The examination of similarities and dissimilarities among the complete coding sequences of β-globin gene of 11 species and six ND6 proteins shows the utility of the scheme. PMID:21953557
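
For readers unfamiliar with the underlying method, the sketch below builds a standard Huffman code for the nucleotides of a sequence from their frequencies. It shows only the coding step; the graphical representation and the multicomponent vector proposed in the paper are not described in enough detail in the abstract to reproduce, and the function names here are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(seq):
    """Return a dict mapping each symbol in `seq` to its Huffman bit string."""
    freq = Counter(seq.upper())
    if not freq:
        raise ValueError("sequence is empty")
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, partial code table); the unique
    # tie_breaker ensures the dicts themselves are never compared.
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(seq, codes):
    """Encode a sequence as a bit string using a code table."""
    return "".join(codes[base] for base in seq.upper())
```

Frequent nucleotides receive shorter codes, and because the code is prefix-free the bit string can be decoded unambiguously, which is consistent with the abstract's claim of avoiding loss of information.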

  5. FRAGS: estimation of coding sequence substitution rates from fragmentary data

    PubMed Central

    Swart, Estienne C; Hide, Winston A; Seoighe, Cathal

    2004-01-01

    Background Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data. PMID:15005802

  6. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  7. High-speed Viterbi decoding with overlapping code sequences

    NASA Technical Reports Server (NTRS)

    Ross, Michael D.; Osborne, William P.

    1993-01-01

    The Viterbi Algorithm for decoding convolutional codes and Trellis Coded Modulation is suited to VLSI implementation but contains a feedback loop which limits the speed of pipelined architecture. The feedback loop is circumvented by decoding independent sequences simultaneously, resulting in a 5-9 fold speed-up with a two-fold hardware expansion.

  8. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  9. CodingMotif: exact determination of overrepresented nucleotide motifs in coding sequences

    PubMed Central

    2012-01-01

    Background It has been increasingly appreciated that coding sequences harbor regulatory sequence motifs in addition to encoding for protein. These sequence motifs are expected to be overrepresented in nucleotide sequences bound by a common protein or small RNA. However, detecting overrepresented motifs has been difficult because of interference by constraints at the protein level. Sampling-based approaches to solve this problem based on codon-shuffling have been limited to exploring only an infinitesimal fraction of the sequence space and by their use of parametric approximations. Results We present a novel O(N (log N)^2)-time algorithm, CodingMotif, to identify nucleotide-level motifs of unusual copy number in protein-coding regions. Using a new dynamic programming algorithm we are able to exhaustively calculate the distribution of the number of occurrences of a motif over all possible coding sequences that encode the same amino acid sequence, given a background model for codon usage and dinucleotide biases. Our method takes advantage of the sparseness of loci where a given motif can occur, greatly speeding up the required convolution calculations. Knowledge of the distribution allows one to assess the exact non-parametric p-value of whether a given motif is over- or under-represented. We demonstrate that our method identifies known functional motifs more accurately than sampling and parametric-based approaches in a variety of coding datasets of various size, including ChIP-seq data for the transcription factors NRSF and GABP. Conclusions CodingMotif provides a theoretically and empirically-demonstrated advance for the detection of motifs overrepresented in coding sequences. We expect CodingMotif to be useful for identifying motifs in functional genomic datasets such as DNA-protein binding, RNA-protein binding, or microRNA-RNA binding within coding regions. A software implementation is available at http://bioinformatics.bc.edu/chuanglab/codingmotif.tar PMID

  10. The Coding and Inter-Manual Transfer of Movement Sequences

    PubMed Central

    Shea, Charles H.; Kovacs, Attila J.; Panzer, Stefan

    2011-01-01

    The manuscript reviews recent experiments that use inter-manual transfer and inter-manual practice paradigms to determine the coordinate system (visual–spatial or motor) used in the coding of movement sequences during physical and observational practice. The results indicated that multi-element movement sequences are more effectively coded in visual–spatial coordinates even following extended practice, while very early in practice movement sequences with only a few movement elements and relatively short durations are coded in motor coordinates. Likewise, inter-manual practice of relatively simple movement sequences shows benefits of right and left limb practice that involves the same motor coordinates, while the opposite is true for more complex sequences. The results suggest that the coordinate system used to code the sequence information is linked to both the task characteristics and the control processes used to produce the sequence. These findings have the potential to greatly enhance our understanding of why in some conditions participants following practice with one limb or observation of one limb practice can effectively perform the task with the contralateral limb, while in other (often similar) conditions they cannot. PMID:21716583

  11. Streamlined Genome Sequence Compression using Distributed Source Coding

    PubMed Central

    Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

    2014-01-01

    We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552

  12. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    PubMed

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination. PMID:16381882

  13. Dose proportionality and pharmacokinetics of carvedilol sustained-release formulation: a single dose-ascending 10-sequence incomplete block study

    PubMed Central

    Kim, Yo Han; Choi, Hee Youn; Noh, Yook-Hwan; Lee, Shi Hyang; Lim, Hyeong-Seok; Kim, Chin; Bae, Kyun-Seop

    2015-01-01

    Background Carvedilol is a third-generation β-blocker indicated for congestive heart failure and high blood pressure. The aim of this study was to investigate the dose proportionality of the carvedilol sustained-release (SR) formulation in healthy male subjects. Methods An open-label, single dose-ascending, 10-sequence, 3-period balanced incomplete block study was performed using healthy male subjects. In varying sequences, each subject received three of five carvedilol SR formulations (8, 16, 32, 64, or 128 mg once). The treatment periods were separated by a washout period of 7 days. Serial blood samples were collected up to 48 h after dosing. The plasma concentrations of carvedilol were determined by using validated liquid chromatography–tandem mass spectrometry. Pharmacokinetic parameters including the area under the plasma concentration–time curve (AUC) from time 0 to the last measurable time (AUClast), AUC extrapolated to infinity (AUCinf), and the measured peak plasma concentration (Cmax) were obtained by noncompartmental analysis. Dose proportionality was evaluated if the ln–ln plots of AUClast, AUCinf, and Cmax versus dose were linear and the 90% confidence intervals (CIs) of the slopes were within 0.9195 and 1.0805. Tolerability was assessed by vital signs, electrocardiogram, clinical laboratory tests, and monitoring of adverse events (AEs) throughout the study. Results A total of 31 subjects were enrolled, and 30 completed the study. The assessment of dose proportionality meets the statistical criteria; the point estimates of slope were 1.0104 (90% CI: 0.9849–1.0359) for AUClast, 1.0003 (90% CI: 0.9748–1.0258) for AUCinf, and 0.9901 (90% CI: 0.9524–1.0277) for Cmax, respectively. All AEs were mild, and none of the subjects dropped out due to AEs. Conclusion In this study, exposure to carvedilol was proportional over the therapeutic dose range of 8–128 mg. The carvedilol SR formulation was well tolerated. PMID:26089641
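
    The acceptance limits of 0.9195 and 1.0805 quoted in this abstract are consistent with a commonly used power-model criterion, in which the 90% CI of the ln–ln slope must lie within 1 + ln(0.8)/ln(r) and 1 + ln(1.25)/ln(r), where r is the ratio of the highest to the lowest dose. The abstract does not state how its limits were derived, so the formula, the 0.8 and 1.25 bounds, and the function names in the sketch below are assumptions made for illustration only.

```python
import math

def slope_acceptance_interval(dose_low, dose_high, theta_low=0.8, theta_high=1.25):
    """Acceptance limits for the ln-ln slope under the power model.

    Assumption: limits are 1 + ln(theta)/ln(r), with r = dose_high / dose_low.
    """
    r = dose_high / dose_low
    lower = 1 + math.log(theta_low) / math.log(r)
    upper = 1 + math.log(theta_high) / math.log(r)
    return lower, upper

def within_limits(ci, limits):
    """True if the whole confidence interval lies inside the acceptance limits."""
    return limits[0] <= ci[0] and ci[1] <= limits[1]

# Dose range from the abstract: 8 mg to 128 mg (r = 16).
limits = slope_acceptance_interval(8, 128)

# 90% CIs of the slopes as reported in the abstract.
reported_ci = {
    "AUClast": (0.9849, 1.0359),
    "AUCinf": (0.9748, 1.0258),
    "Cmax": (0.9524, 1.0277),
}

print("limits:", tuple(round(x, 4) for x in limits))
for name, ci in reported_ci.items():
    print(name, within_limits(ci, limits))
```

    Under these assumptions the computed limits round to the values given in the abstract, and all three reported intervals fall inside them.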

  14. Machine-Checked Sequencer for Critical Embedded Code Generator

    NASA Astrophysics Data System (ADS)

    Izerrouken, Nassima; Pantel, Marc; Thirioux, Xavier

    This paper presents the development of a correct-by-construction block sequencer for GeneAuto, a qualifiable (according to DO178B/ED12B recommendation) automatic code generator. It transforms Simulink models to MISRA C code for safety critical systems. Our approach, which combines a classical development process with formal specification and verification using proof assistants, led to fruitful preliminary exchanges with certification authorities. We present parts of the classical user and tool requirements and the derived formal specifications, implementation and verification for the correctness and termination of the block sequencer. This sequencer has been successfully applied to real-size industrial use cases from various transportation domain partners and led to the detection of requirement errors and a correct-by-construction implementation.

  15. Radio frequency interference effect on PN code sequence lock detector

    NASA Technical Reports Server (NTRS)

    Kwon, Hyuck M.; Tu, Kwei; Loh, Y. C.

    1991-01-01

    The authors find the probabilities of detection and false alarm of the pseudonoise (PN) sequence code lock detector when strong radio frequency interference (RFI) hits the communications link. Both a linear model and a soft-limiter nonlinear model for a transponder receiver are considered. In addition, both continuous wave (CW) RFI and pulse RFI are analyzed, and a discussion is included of how strong CW RFI can knock out the PN code lock detector in a linear or a soft-limiter transponder. As an example, the Space Station Freedom forward S-band PN system is evaluated. It is shown that a soft-limiter transponder can protect the PN code lock detector against a typical pulse RFI, but it can degrade the PN code lock detector performance more than a linear transponder if CW RFI hits the link.

  16. Evolution of codes, crosstalk, and sequence niches in biomolecular signaling

    NASA Astrophysics Data System (ADS)

    Myers, Christopher

    2007-03-01

    Signaling and regulation in cellular networks is mediated through biomolecular interactions, which can be somewhat promiscuous, involving the molecular recognition of broad sets of binding targets. This leads to some basic questions concerning crosstalk among similar sets of biomolecules: does it occur, to what extent can it be avoided, how can phenotypic errors due to crosstalk be minimized, and when might crosstalk be advantageous? Beyond biology, questions of this sort have connections to phase transitions in constraint satisfaction problems, and to the theory of message coding in noisy channels. Expanding upon my previous work exploring the nature of the satisfiability (SAT-UNSAT) transition in a simple model of protein-protein interactions, this talk will investigate the role of sequence evolution in shaping high-dimensional sequence niches and biomolecular codes.

  17. Coding Deficits in Noise-Induced Hidden Hearing Loss May Stem from Incomplete Repair of Ribbon Synapses in the Cochlea

    PubMed Central

    Shi, Lijuan; Chang, Yin; Li, Xiaowei; Aiken, Steven J.; Liu, Lijie; Wang, Jian

    2016-01-01

    Recent evidence has shown that noise-induced damage to the synapse between inner hair cells (IHCs) and type I afferent auditory nerve fibers (ANFs) may occur in the absence of permanent threshold shift (PTS), and that synapses connecting IHCs with low spontaneous rate (SR) ANFs are disproportionately affected. Due to the functional importance of low-SR ANF units for temporal processing and signal coding in noisy backgrounds, deficits in cochlear coding associated with noise-induced damage may result in significant difficulties with temporal processing and hearing in noise (i.e., “hidden hearing loss”). However, significant noise-induced coding deficits have not been reported at the single unit level following the loss of low-SR units. We have found evidence to suggest that some aspects of neural coding are not significantly changed with the initial loss of low-SR ANFs, and that further coding deficits arise in association with the subsequent reestablishment of the synapses. This suggests that synaptopathy in hidden hearing loss may be the result of insufficient repair of disrupted synapses, and not simply due to the loss of low-SR units. These coding deficits include decreases in driven spike rate for intensity coding as well as several aspects of temporal coding: spike latency, peak-to-sustained spike ratio and the recovery of spike rate as a function of click-interval. PMID:27252621

  18. Sequence and Structural Analyses for Functional Non-coding RNAs

    NASA Astrophysics Data System (ADS)

    Sakakibara, Yasubumi; Sato, Kengo

    Analysis and detection of functional RNAs are currently important topics in both molecular biology and bioinformatics research. Several computational methods based on stochastic context-free grammars (SCFGs) have been developed for modeling and analysing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNAs and are used for structural alignments of RNA sequences. Such stochastic models, however, are not sufficient to discriminate member sequences of an RNA family from non-members, and hence to detect non-coding RNA regions from genome sequences. Recently, the support vector machine (SVM) and kernel function techniques have been actively studied and proposed as a solution to various problems in bioinformatics. SVMs are trained from positive and negative samples and have strong, accurate discrimination abilities, and hence are more appropriate for the discrimination tasks. A few kernel functions that extend the string kernel to measure the similarity of two RNA sequences from the viewpoint of secondary structures have been proposed. In this article, we give an overview of recent progress in SCFG-based methods for RNA sequence analysis and novel kernel functions tailored to measure the similarity of two RNA sequences and developed for use with support vector machines (SVM) in discriminating members of an RNA family from non-members.

  19. Multifractal detrended cross-correlation analysis of coding and non-coding DNA sequences through chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Satish, B.; Srinivas, K.; Rao, P. Madhusudana; Manimaran, P.

    2015-10-01

    We propose a new approach combining the chaos game representation and the two dimensional multifractal detrended cross correlation analysis methods to examine multifractal behavior in power law cross correlation between any pair of nucleotide sequences of unequal lengths. In this work, we analyzed the characteristic behavior of coding and non-coding DNA sequences of eight prokaryotes. The results show the presence of strong multifractal nature between coding and non-coding sequences of all data sets. We found that this integrative approach helps us to consider complete DNA sequences for characterization, and further it may be useful for classification, clustering, identification of class affiliation of nucleotide sequences etc. with high precision.

  20. The most frequent short sequences in non-coding DNA.

    PubMed

    Subirana, Juan A; Messeguer, Xavier

    2010-03-01

    The purpose of this work is to determine the most frequent short sequences in non-coding DNA. They may play a role in maintaining the structure and function of eukaryotic chromosomes. We present a simple method for the detection and analysis of such sequences in several genomes, including Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. We also study two chromosomes of man and mouse with a length similar to the whole genomes of the other species. We provide a list of the most common sequences of 9-14 bases in each genome. As expected, they are present in human Alu sequences. Our programs may also give a graph and a list of their position in the genome. Detection of clusters is also possible. In most cases, these sequences contain few alternating regions. Their intrinsic structure and their influence on nucleosome formation are not known. In particular, we have found new features of short sequences in C. elegans, which are distributed in heterogeneous clusters. They appear as punctuation marks in the chromosomes. Such clusters are not found in either A. thaliana or D. melanogaster. We discuss the possibility that they play a role in centromere function and homolog recognition in meiosis. PMID:19966278
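
    The counting step described in this abstract can be sketched as a sliding-window tally of k-mers. This is a minimal illustration, not the authors' program: the function name, the handling of ambiguous bases, and the toy input are assumptions.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def most_frequent_kmers(sequence, k, top=3):
    """Return the `top` most frequent k-mers in a sequence.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped; this filtering rule is an assumption of the sketch.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts.most_common(top)

# Toy input: ten A's followed by one C, scanned with k = 9.
print(most_frequent_kmers("AAAAAAAAAAC", 9, top=1))
```

    The abstract reports sequences of 9-14 bases, so a real analysis would repeat the call for each k in that range over a whole genome; memory use then becomes the main design concern, which this sketch does not address.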

  1. The Cipher Code of Simple Sequence Repeats in "Vampire Pathogens".

    PubMed

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-01-01

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, red blood cells tropism microbes, like "vampire pathogens" (VP), succeed in matching scarce nutrients and surviving strong immunity reactions. Here, we found VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non Vampire Pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser) and significantly higher counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 in VP than N_VP. Furthermore, significant differences (P < 0.001) on the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation. PMID:26215592
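
    Counting (AG)n dimeric repeats of the kind discussed in this abstract can be sketched with a regular expression. The minimum repeat number of 3 and the function name are assumptions; the abstract does not give the exact detection rules the authors used.

```python
import re

def find_ag_ssrs(sequence, min_repeats=3):
    """Locate runs of the dinucleotide AG repeated at least `min_repeats` times.

    Returns a list of (start_position, repeat_count) tuples, 0-based.
    """
    pattern = re.compile(r"(?:AG){%d,}" % min_repeats)
    return [(m.start(), len(m.group()) // 2)
            for m in pattern.finditer(sequence.upper())]

# Toy sequence: one (AG)3 run, one (AG)2 run (too short), one (AG)4 run.
toy = "TT" + "AG" * 3 + "TT" + "AG" * 2 + "CC" + "AG" * 4
print(find_ag_ssrs(toy))
```

    Only the forward strand is scanned here; counting (TC)n or (GA)n repeats, which the abstract also mentions, would need the corresponding patterns.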

  2. Licensee Event Report sequence coding and search procedure workshop

    SciTech Connect

    Cottrell, W.B.; Gallaher, R.B.

    1981-03-01

    Since mid-1980, the Office for Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC) has been developing procedures for the systematic review and analysis of Licensee Event Reports (LERs). These procedures generally address several areas of concern, including identification of significant trends and patterns, event sequence of occurrences, component failures, and system and plant effects. The AEOD and NSIC conducted a workshop on the new coding procedure at the American Museum of Science and Energy in Oak Ridge, TN, on November 24, 1980.

  3. Code-Time Diversity for Direct Sequence Spread Spectrum Systems

    PubMed Central

    Hassan, A. Y.

    2014-01-01

    Time diversity is achieved in direct sequence spread spectrum by receiving different faded delayed copies of the transmitted symbols from different uncorrelated channel paths when the transmission signal bandwidth is greater than the coherence bandwidth of the channel. In this paper, a new time diversity scheme is proposed for spread spectrum systems. It is called code-time diversity. In this new scheme, N spreading codes are used to transmit one data symbol over N successive symbol intervals. The diversity order in the proposed scheme equals the number of spreading codes used, N, multiplied by the number of uncorrelated paths of the channel, L. The paper presents the transmitted signal model. Two demodulator structures are proposed based on the received signal models from Rayleigh flat and frequency selective fading channels. Probability of error in the proposed diversity scheme is also calculated for the same two fading channels. Finally, simulation results are presented and compared with those of maximal ratio combiner (MRC) and multiple-input and multiple-output (MIMO) systems. PMID:24982925

  4. An integrative approach to predicting the functional effects of non-coding and coding sequence variation

    PubMed Central

    Shihab, Hashem A.; Rogers, Mark F.; Gough, Julian; Mort, Matthew; Cooper, David N.; Day, Ian N. M.; Gaunt, Tom R.; Campbell, Colin

    2015-01-01

    Motivation: Technological advances have enabled the identification of an increasingly large spectrum of single nucleotide variants within the human genome, many of which may be associated with monogenic disease or complex traits. Here, we propose an integrative approach, named FATHMM-MKL, to predict the functional consequences of both coding and non-coding sequence variants. Our method utilizes various genomic annotations, which have recently become available, and learns to weight the significance of each component annotation source. Results: We show that our method outperforms current state-of-the-art algorithms, CADD and GWAVA, when predicting the functional consequences of non-coding variants. In addition, FATHMM-MKL is comparable to the best of these algorithms when predicting the impact of coding variants. The method includes a confidence measure to rank order predictions. Availability and implementation: The FATHMM-MKL webserver is available at: http://fathmm.biocompute.org.uk Contact: H.Shihab@bristol.ac.uk or Mark.Rogers@bristol.ac.uk or C.Campbell@bristol.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25583119

  5. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    NASA Astrophysics Data System (ADS)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  6. A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence

    PubMed Central

    Forman, Joshua J.; Legesse-Miller, Aster; Coller, Hilary A.

    2008-01-01

    Recognition sites for microRNAs (miRNAs) have been reported to be located in the 3′ untranslated regions of transcripts. In a computational screen for highly conserved motifs within coding regions, we found an excess of sequences conserved at the nucleotide level within coding regions in the human genome, the highest scoring of which are enriched for miRNA target sequences. To validate our results, we experimentally demonstrated that the let-7 miRNA directly targets the miRNA-processing enzyme Dicer within its coding sequence, thus establishing a mechanism for a miRNA/Dicer autoregulatory negative feedback loop. We also found computational evidence to suggest that miRNA target sites in coding regions and 3′ UTRs may differ in mechanism. This work demonstrates that miRNAs can directly target transcripts within their coding region in animals, and it suggests that a complete search for the regulatory targets of miRNAs should be expanded to include genes with recognition sites within their coding regions. As more genomes are sequenced, the methodological approach that we used for identifying motifs with high sequence conservation will be increasingly valuable for detecting functional sequence motifs within coding regions. PMID:18812516

  7. Coding in 2D: Using Intentional Dispersity to Enhance the Information Capacity of Sequence-Coded Polymer Barcodes.

    PubMed

    Laure, Chloé; Karamessini, Denise; Milenkovic, Olgica; Charles, Laurence; Lutz, Jean-François

    2016-08-26

    A 2D approach was studied for the design of polymer-based molecular barcodes. Uniform oligo(alkoxyamine amide)s, containing a monomer-coded binary message, were synthesized by orthogonal solid-phase chemistry. Sets of oligomers with different chain-lengths were prepared. The physical mixture of these uniform oligomers leads to an intentional dispersity (1st dimension fingerprint), which is measured by electrospray mass spectrometry. Furthermore, the monomer sequence of each component of the mass distribution can be analyzed by tandem mass spectrometry (2nd dimension sequencing). By summing the sequence information of all components, a binary message can be read. A 4-byte extended ASCII-coded message was written on a set of six uniform oligomers. Alternatively, a 3-byte sequence was written on a set of five oligomers. In both cases, the coded binary information was recovered. PMID:27484303

  8. Functional annotation of non-coding sequence variants

    PubMed Central

    Ritchie, Graham R. S.; Dunham, Ian; Zeggini, Eleftheria; Flicek, Paul

    2016-01-01

    Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants that fall in protein-coding regions our understanding of the genetic code and splicing allow us to identify likely candidates, but interpreting variants that fall outside of genic regions is more difficult. Here we present a new tool, GWAVA, which supports prioritisation of non-coding variants by integrating a range of annotations. PMID:24487584

  9. Molecular evolution of coding and non-coding sequences of the growth hormone receptor (GHR) gene in the family Bovidae.

    PubMed

    Maj, Andrzej; Zwierzchowski, Lech

    2006-01-01

    The GHR gene exon 1A and exon 4 with fragments of its flanking introns were sequenced in twelve Bovidae species and the obtained sequences were aligned and analysed by the ClustalW method. In coding exon 4 only three interspecies differences were found, one of which had an effect on the amino-acid sequence--leucine 152 proline. The average mutation frequency in non-coding exon 1A was 10.5 per 100 bp, and was 4.6-fold higher than that in coding exon 4 (2.3 per 100 bp). The mutation frequency in intron sequences was similar to that in non-coding exon 1A (8.9 vs 10.5/100 bp). For non-coding exon 1A, the mutation levels were lower within than between the subfamilies Bovinae and Caprinae. Exon 4 was 100% identical within the genera Ovis, Capra, Bison, and Bos and 97.7% identical for Ovibos moschatus, Ammotragus lervia and Bovinae species. The identity level of non-coding exon 1A of the GHR gene was 93.8% between species belonging to Bovinae and Caprinae. The average mutation rate was 0.2222/100 bp/MY and 0.0513/100 bp/MY for the Bovidae GHR gene exons 1A and 4, respectively. Thus, the GHR gene is well conserved in the Bovidae family. Also, in this study some novel intraspecies polymorphisms were found for cattle and sheep. PMID:17044257

  10. OrfPredictor: predicting protein-coding regions in EST-derived sequences.

    PubMed

    Min, Xiang Jia; Butler, Gregory; Storms, Reginald; Tsang, Adrian

    2005-07-01

    OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in BLASTX alignments; otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The output is the predicted peptide sequences in the FASTA format, and a definition line that includes the query ID, the translation reading frame and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly for large-scale EST projects. OrfPredictor is available at https://fungalgenome.concordia.ca/tools/OrfPredictor.html. PMID:15980561
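
    To illustrate what reporting a reading frame with begin and end positions involves, the sketch below scans all six frames for the longest ATG-to-stop open reading frame. This is a deliberately simple stand-in and is not OrfPredictor's method: the abstract does not describe which intrinsic signals the server uses, and every name in the code is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def longest_orf_on_strand(seq):
    """Longest ATG...stop ORF on one strand as (start, end, frame).

    Positions are 0-based with an exclusive end; frame is 0, 1 or 2.
    """
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end, frame)
                start = None
    return best

def longest_orf(sequence):
    """Search both strands; return (length, strand, frame, start, end) or None.

    Frame is reported as 1-3; coordinates on the '-' strand refer to the
    reverse-complemented sequence.
    """
    seq = sequence.upper()
    candidates = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        hit = longest_orf_on_strand(s)
        if hit is not None:
            start, end, frame = hit
            candidates.append((end - start, strand, frame + 1, start, end))
    return max(candidates) if candidates else None

print(longest_orf("CCATGAAATTTGGGTAACC"))
```

    EST-derived sequences are often truncated, so a practical tool would also need to handle ORFs that lack a start or stop codon; this sketch ignores them.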

  11. Complete Coding Genome Sequence of Putative Novel Bluetongue Virus Serotype 27

    PubMed Central

    Jenckel, Maria; Bréard, Emmanuel; Schulz, Claudia; Sailleau, Corinne; Viarouge, Cyril; Hoffmann, Bernd; Beer, Martin; Zientara, Stéphan

    2015-01-01

    We announce the complete coding genome sequence of a novel bluetongue virus (BTV) serotype (BTV-n = putative BTV-27) detected in goats in Corsica, France, in 2014. Sequence analysis confirmed the closest relationship between sequences of the novel BTV serotype and BTV-25 and BTV-26, recently discovered in Switzerland and Kuwait, respectively. PMID:25767218

  12. Sequence analysis of the 3' non-coding region of mouse immunoglobulin light chain messenger RNA.

    PubMed Central

    Hamlyn, P H; Gillam, S; Smith, M; Milstein, C

    1977-01-01

    Using an oligonucleotide d(pT10-C-A) as primer, cDNA has been transcribed from the 3' non-coding region of mouse immunoglobulin light chain mRNA and sequenced by a modification of the 'plus-minus' gel method. The sequence obtained has partially corrected and extended a previously obtained sequence. The new data contains an unusual sequence in which a trinucleotide is repeated seven times. PMID:405661

  13. Sequences encoding identical peptides for the analysis and manipulation of coding DNA

    PubMed Central

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for e.g. heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible; this indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression. PMID:23861567

  14. Sequences encoding identical peptides for the analysis and manipulation of coding DNA.

    PubMed

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for, e.g., heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible, which indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression. PMID:23861567

  15. Multi-criterial coding sequence prediction. Combination of GeneMark with two novel, coding-character specific quantities.

    PubMed

    Almirantis, Yannis; Nikolaou, Christoforos

    2005-10-01

    This work applies two recently formulated quantities, strongly correlated with the coding character of a sequence, as an additional "module" on GeneMark, in a three-criterial method. The difference in the statistical approaches implicated by the methods combined here, is expected to contribute to an efficient assignment of functionality to unannotated genomic sequences. The developed combined algorithm is used to fractionalize a collection of GeneMark-predicted exons into sub-collections of different expectation to be coding. A further modification of the algorithm allows for the assignment of an improved estimation of the probability to be coding, to GeneMark-predicted exons. This is on the basis of a suitable training set of GeneMark-predicted exons of known functionality. PMID:15809100

  16. Peculiar symmetry of DNA sequences and evidence suggesting its evolutionary origin in a primeval genetic code

    NASA Astrophysics Data System (ADS)

    Jolivet, R.; Rothen, F.

    2001-08-01

    Statistical analysis of the distribution of codons in DNA coding sequences of bacteria or archaea suggests that, at some stage of the prebiotic world, the most successful RNA replicating sequences afforded some tendency toward a weak form of palindromic symmetry, namely complementary symmetry. As a consequence, as soon as the machinery allowing translation into proteins was beginning to settle, we assume that primeval versions of the genetic code essentially consisted of pairs of sense-antisense codons. Present-day DNA sequences display footprints of this early symmetry, provided that statistics are made over coding sequences issued from groups of organisms and not only from the genome of an individual species. These fossil traces are proven to be significant from the statistical point of view. They shed some light onto the possible evolution of the genetic code and set some constraints on the way it had to follow.

  17. Stochastic model of homogeneous coding and latent periodicity in DNA sequences.

    PubMed

    Chaley, Maria; Kutyrkin, Vladimir

    2016-02-01

    The concept of latent triplet periodicity in coding DNA sequences, which has been extensively discussed earlier, is confirmed by the analysis of a number of eukaryotic genomes, where latent periodicity of a new type, called profile periodicity, is recognized in the CDSs. An original model of Stochastic Homogeneous Organization of Coding (SHOC-model) in a textual string is proposed. This model explains the existence of latent profile periodicity and regularity in DNA sequences. PMID:26656186

  18. A machine learning strategy to identify candidate binding sites in human protein-coding sequence

    PubMed Central

    Down, Thomas; Leong, Bernard; Hubbard, Tim JP

    2006-01-01

    Background The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences, however identifying them computationally is hard since exon sequences are also constrained by their functional role in coding for proteins. Results This strategy identified a collection of motifs including several previously reported splice enhancer elements. Although only trained on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence. Conclusion We have trained a computational model able to detect signals in coding exons which seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent binding sites for both previously unrecognized proteins which influence RNA splicing as well as other regulatory elements. PMID:17002805

  19. Coherent direct sequence optical code multiple access encoding-decoding efficiency versus wavelength detuning.

    PubMed

    Pastor, D; Amaya, W; García-Olcina, R; Sales, S

    2007-07-01

    We present a simple theoretical model of and the experimental verification for vanishing of the autocorrelation peak due to wavelength detuning on the coding-decoding process of coherent direct sequence optical code multiple access systems based on a superstructured fiber Bragg grating. Moreover, the detuning vanishing effect has been explored to take advantage of this effect and to provide an additional degree of multiplexing and/or optical code tuning. PMID:17603606

  20. The Coding and Effector Transfer of Movement Sequences

    ERIC Educational Resources Information Center

    Kovacs, Attila J.; Muhlbauer, Thomas; Shea, Charles H.

    2009-01-01

    Three experiments utilizing a 14-element arm movement sequence were designed to determine if reinstating the visual-spatial coordinates, which require movements to the same spatial locations utilized during acquisition, results in better effector transfer than reinstating the motor coordinates, which require the same pattern of homologous muscle…

  1. Nanopore Sequencing: Electrical Measurements of the Code of Life

    PubMed Central

    Timp, Winston; Mirsaidov, Utkur M.; Wang, Deqiang; Comer, Jeff; Aksimentiev, Aleksei; Timp, Gregory

    2011-01-01

    Sequencing a single molecule of deoxyribonucleic acid (DNA) using a nanopore is a revolutionary concept because it combines the potential for long read lengths (>5 kbp) with high speed (1 bp/10 ns), while obviating the need for costly amplification procedures due to the exquisite single molecule sensitivity. The prospects for implementing this concept seem bright. The cost savings from the removal of required reagents, coupled with the speed of nanopore sequencing places the $1000 genome within grasp. However, challenges remain: high fidelity reads demand stringent control over both the molecular configuration in the pore and the translocation kinetics. The molecular configuration determines how the ions passing through the pore come into contact with the nucleotides, while the translocation kinetics affect the time interval in which the same nucleotides are held in the constriction as the data is acquired. Proteins like α-hemolysin and its mutants offer exquisitely precise self-assembled nanopores and have demonstrated the facility for discriminating individual nucleotides, but it is currently difficult to design protein structure ab initio, which frustrates tailoring a pore for sequencing genomic DNA. Nanopores in solid-state membranes have been proposed as an alternative because of the flexibility in fabrication and ease of integration into a sequencing platform. Preliminary results have shown that with careful control of the dimensions of the pore and the shape of the electric field, control of DNA translocation through the pore is possible. Furthermore, discrimination between different base pairs of DNA may be feasible. Thus, a nanopore promises inexpensive, reliable, high-throughput sequencing, which could thrust genomic science into personal medicine. PMID:21572978

  2. The primordial sequence, ribosomes, and the genetic code.

    NASA Technical Reports Server (NTRS)

    Fox, S. W.; Yuki, A.; Waehneldt, T. V.; Lacey, J. C., Jr.

    1971-01-01

    Experimental investigation of the key question of the origin of life concerning the chronological order in the primordial sequence of nucleic acid, protein, and cell. It is pointed out that, when viewed against the background of experiments on the selective reaction of basic homopolyamino acids with mononucleotides (Lacey and Pruitt, 1969; Woese, 1968), the experiments made help to establish a basis for understanding how information originally flowed from proteins to nucleic acids.

  3. Do Intron and Coding Sequences of Some Human-Mouse Orthologs Evolve as a Single Unit?

    PubMed

    Fuertes, Miguel Angel; Rodrigo, José Ramón; Alonso, Carlos

    2016-06-01

    It has been previously suggested that both the coding and the associated non-coding sequences of some human-mouse orthologs could evolve as a single unit. This letter deals with the observation that between mouse and humans some orthologs change significantly their compositional features as an indication that the molecular evolution is a local process. Moreover, the data shown indicate that the coding and the intron sequences of these orthologs do not evolve independently but instead both undergo a concerted evolution, evolving as a single unit, from a compositional cluster in mouse to a different compositional cluster in human. PMID:27220874

  4. Classification of Arabidopsis thaliana gene sequences: clustering of coding sequences into two groups according to codon usage improves gene prediction.

    PubMed

    Mathé, C; Peresetsky, A; Déhais, P; Van Montagu, M; Rouzé, P

    1999-02-01

    While genomic sequences are accumulating, finding the location of the genes remains a major issue that can be solved only for about a half of them by homology searches. Prediction methods are thus required, but unfortunately are not fully satisfying. Most prediction methods implicitly assume a unique model for genes. This is an oversimplification as demonstrated by the possibility to group coding sequences into several classes in Escherichia coli and other genomes. As no classification existed for Arabidopsis thaliana, we classified genes according to the statistical features of their coding sequences. A clustering algorithm using a codon usage model was developed and applied to coding sequences from A. thaliana, E. coli, and a mixture of both. By using it, Arabidopsis sequences were clustered into two classes. The CU1 and CU2 classes differed essentially by the choice of pyrimidine bases at the codon silent sites: CU2 genes often use C whereas CU1 genes prefer T. This classification discriminated the Arabidopsis genes according to their expressiveness, highly expressed genes being clustered in CU2 and genes expected to have a lower expression, such as the regulatory genes, in CU1. The algorithm separated the sequences of the Escherichia-Arabidopsis mixed data set into five classes according to the species, except for one class. This mixed class contained 89 % Arabidopsis genes from CU1 and 11 % E. coli genes, mostly horizontally transferred. Interestingly, most genes encoding organelle-targeted proteins, except the photosynthetic and photoassimilatory ones, were clustered in CU1. By tailoring the GeneMark CDS prediction algorithm to the observed coding sequence classes, its quality of prediction was greatly improved. Similar improvement can be expected with other prediction systems. PMID:9925779

  5. Sequence of the Ampullariella sp. strain 3876 gene coding for xylose isomerase.

    PubMed Central

    Saari, G C; Kumar, A A; Kawasaki, G H; Insley, M Y; O'Hara, P J

    1987-01-01

    The nucleotide sequence of the gene coding for xylose isomerase from Ampullariella sp. strain 3876, a gram-positive bacterium, has been determined. A clone of a fragment of strain 3876 DNA coding for a xylose isomerase activity was identified by its ability to complement a xylose isomerase-defective Escherichia coli strain. One such complementation positive fragment, 2,922 nucleotides in length, was sequenced in its entirety. There are two open reading frames 1,182 and 1,242 nucleotides in length, on opposite strands of this fragment, each of which could code for a protein the expected size of xylose isomerase. The 1,182-nucleotide open reading frame was identified as the coding sequence for the protein from the sequence analysis of the amino-terminal region and selected internal peptides. The gene initiates with GTG and has a high guanine and cytosine content (70%) and an exceptionally strong preference (97%) for guanine or cytosine in the third position of the codons. The gene codes for a 43,210-dalton polypeptide composed of 393 amino acids. The xylose isomerase from Ampullariella sp. strain 3876 is similar in size to other bacterial xylose isomerases and has limited amino acid sequence homology to the available sequences from E. coli, Bacillus subtilis, and Streptomyces violaceus-ruber. In all cases yet studied, the bacterial gene for xylulose kinase is downstream from the gene for xylose isomerase. We present evidence suggesting that in Ampullariella sp. strain 3876 these genes are similarly arranged. PMID:3027039
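
    The two composition figures quoted in this abstract, overall guanine plus cytosine content and the preference for G or C in the third codon position (often called GC3), can be computed as in the following minimal Python sketch. The function names and the toy sequence are hypothetical; the sequence is not the Ampullariella gene, and the sketch assumes an in-frame coding sequence.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of third-codon-position bases that are G or C.

    Assumes `cds` starts in frame; any trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Hypothetical toy coding sequence starting with GTG (illustration only).
toy_cds = "GTGGCCACGCTGGACTGA"
overall = gc_content(toy_cds)   # 12 of 18 bases are G or C
third = gc3_content(toy_cds)    # 5 of 6 third positions are G or C
```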

  6. Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein‐Coding Regions

    PubMed Central

    Lelieveld, Stefan H.; Spielmann, Malte; Mundlos, Stefan; Veltman, Joris A.

    2015-01-01

    For next‐generation sequencing technologies, sufficient base‐pair coverage is the foremost requirement for the reliable detection of genomic variants. We investigated whether whole‐genome sequencing (WGS) platforms offer improved coverage of coding regions compared with whole‐exome sequencing (WES) platforms, and compared single‐base coverage for a large set of exome and genome samples. We find that WES platforms have improved considerably in recent years, but at comparable sequencing depth, WGS outperforms WES in terms of covered coding regions. At higher sequencing depth (95x–160x), WES successfully captures 95% of the coding regions with a minimal coverage of 20x, compared with 98% for WGS at 87‐fold coverage. Three different assessments of sequence coverage bias showed consistent biases for WES but not for WGS. We found no clear differences for the technologies concerning their ability to achieve complete coverage of 2,759 clinically relevant genes. We show that WES performs comparably to WGS in terms of covered bases if sequenced at two to three times higher coverage. This does, however, come at the cost of substantially more sequencing biases in WES approaches. Our findings will guide laboratories to make an informed decision on which sequencing platform and coverage to choose. PMID:25973577
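
    The headline metric in this abstract, the share of coding bases reaching a minimal coverage of 20x, might be computed along the lines of the short Python sketch below. The function name `fraction_covered` and the per-base depth values are hypothetical; the abstract does not specify how the authors derived their figures.

```python
def fraction_covered(depths, min_depth: int = 20) -> float:
    """Fraction of positions whose read depth is at least `min_depth`."""
    depths = list(depths)
    if not depths:
        raise ValueError("no positions supplied")
    return sum(d >= min_depth for d in depths) / len(depths)


# Hypothetical per-base depths over a small coding region (illustration only).
toy_depths = [0, 5, 19, 20, 25, 40, 87, 100, 12, 33]
covered_20x = fraction_covered(toy_depths)           # 6 of 10 positions
covered_10x = fraction_covered(toy_depths, min_depth=10)  # 8 of 10 positions
```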

  7. Packet error probabilities in direct sequence spread spectrum packet radio networks with BCH codes

    NASA Astrophysics Data System (ADS)

    Georgiopoulos, Michael

    The author computes an upper bound on the packet error probability induced in direct-sequence spread-spectrum networks, when BCH codes are used for the encoding of the packets. The bound, which is introduced here, is valid independently of whether signals arrive with equal or unequal powers at the receiver site. Furthermore, it has a simple form and is easy to compute. In addition, it is valid for other classes of forward error correction codes (e.g., convolutional codes). However, numerical results are presented for BCH codes only.

  8. A minimal sequence code for switching protein structure and function.

    PubMed

    Alexander, Patrick A; He, Yanan; Chen, Yihong; Orban, John; Bryan, Philip N

    2009-12-15

    We present here a structural and mechanistic description of how a protein changes its fold and function, mutation by mutation. Our approach was to create 2 proteins that (i) are stably folded into 2 different folds, (ii) have 2 different functions, and (iii) are very similar in sequence. In this simplified sequence space we explore the mutational path from one fold to another. We show that an IgG-binding, 4beta+alpha fold can be transformed into an albumin-binding, 3-alpha fold via a mutational pathway in which neither function nor native structure is completely lost. The stabilities of all mutants along the pathway are evaluated, key high-resolution structures are determined by NMR, and an explanation of the switching mechanism is provided. We show that the conformational switch from 4beta+alpha to 3-alpha structure can occur via a single amino acid substitution. On one side of the switch point, the 4beta+alpha fold is >90% populated (pH 7.2, 20 degrees C). A single mutation switches the conformation to the 3-alpha fold, which is >90% populated (pH 7.2, 20 degrees C). We further show that a bifunctional protein exists at the switch point with affinity for both IgG and albumin. PMID:19923431

  9. Evaluation of correlation property of linear-frequency-modulated signals coded by maximum-length sequences

    NASA Astrophysics Data System (ADS)

    Yamanaka, Kota; Hirata, Shinnosuke; Hachiya, Hiroyuki

    2016-07-01

    Ultrasonic distance measurement for obstacles has been recently applied in automobiles. The pulse–echo method based on the transmission of an ultrasonic pulse and time-of-flight (TOF) determination of the reflected echo is one of the typical methods of ultrasonic distance measurement. Improvement of the signal-to-noise ratio (SNR) of the echo and the avoidance of crosstalk between ultrasonic sensors in the pulse–echo method are required in automotive measurement. The SNR of the reflected echo and the resolution of the TOF are improved by the employment of pulse compression using a maximum-length sequence (M-sequence), which is one of the binary pseudorandom sequences generated from a linear feedback shift register (LFSR). Crosstalk is avoided by using transmitted signals coded by different M-sequences generated from different LFSRs. In the case of lower-order M-sequences, however, the number of measurement channels corresponding to the pattern of the LFSR is not enough. In this paper, pulse compression using linear-frequency-modulated (LFM) signals coded by M-sequences has been proposed. The coding of LFM signals by the same M-sequence can produce different transmitted signals and increase the number of measurement channels. In the proposed method, however, the truncation noise in autocorrelation functions and the interference noise in cross-correlation functions degrade the SNRs of received echoes. Therefore, autocorrelation properties and cross-correlation properties in all patterns of combinations of coded LFM signals are evaluated.
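
    To make the M-sequence terminology concrete, the following Python sketch generates a maximum-length sequence from a 4-stage linear feedback shift register and evaluates its periodic autocorrelation. The recurrence used (corresponding to the polynomial x^4 + x + 1), the all-ones seed, and the function names are assumptions chosen for illustration. The sketch covers only the binary M-sequence itself, not the coding of LFM signals or the truncation and interference noise analyzed in the paper.

```python
def m_sequence(order: int = 4, taps=(0, 1)):
    """Generate one period of an M-sequence from a Fibonacci-style LFSR.

    With order=4 and taps=(0, 1) the recurrence is a[n+4] = a[n] XOR a[n+1],
    which corresponds to the primitive polynomial x^4 + x + 1 (assumed here).
    """
    state = [1] * order              # any non-zero seed works
    bits = []
    for _ in range(2 ** order - 1):  # period of a maximum-length sequence
        bits.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return bits


def circular_autocorrelation(bits):
    """Periodic autocorrelation after mapping bits 0/1 to +1/-1."""
    s = [1 - 2 * b for b in bits]
    n = len(s)
    return [sum(s[i] * s[(i + k) % n] for i in range(n)) for k in range(n)]


seq = m_sequence()
acf = circular_autocorrelation(seq)
# For an M-sequence the peak equals the period and every other lag equals -1.
```

    The sharp peak with a flat off-peak level is what makes such sequences attractive for pulse compression; sequences generated from different LFSR configurations are what the abstract refers to when it discusses separate measurement channels.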

  10. Severe accident source term characteristics for selected Peach Bottom sequences predicted by the MELCOR Code

    SciTech Connect

    Carbajo, J.J.

    1993-09-01

    The purpose of this report is to compare in-containment source terms developed for NUREG-1159, which used the Source Term Code Package (STCP), with those generated by MELCOR to identify significant differences. For this comparison, two short-term depressurized station blackout sequences (with a dry cavity and with a flooded cavity) and a Loss-of-Coolant Accident (LOCA) concurrent with complete loss of the Emergency Core Cooling System (ECCS) were analyzed for the Peach Bottom Atomic Power Station (a BWR-4 with a Mark I containment). The results indicate that for the sequences analyzed, the two codes predict similar total in-containment release fractions for each of the element groups. However, the MELCOR/CORBH Package predicts significantly longer times for vessel failure and reduced energy of the released material for the station blackout sequences (when compared to the STCP results). MELCOR also calculated smaller releases into the environment than STCP for the station blackout sequences.

  11. Coding-complete sequencing classifies parrot bornavirus 5 into a novel virus species.

    PubMed

    Marton, Szilvia; Bányai, Krisztián; Gál, János; Ihász, Katalin; Kugler, Renáta; Lengyel, György; Jakab, Ferenc; Bakonyi, Tamás; Farkas, Szilvia L

    2015-11-01

    In this study, we determined the sequence of the coding region of an avian bornavirus detected in a blue-and-yellow macaw (Ara ararauna) with pathological/histopathological changes characteristic of proventricular dilatation disease. The genomic organization of the macaw bornavirus is similar to that of other bornaviruses, and its nucleotide sequence is nearly identical to the available partial parrot bornavirus 5 (PaBV-5) sequences. Phylogenetic analysis showed that these strains formed a monophyletic group distinct from other mammalian and avian bornaviruses and in calculations performed with matrix protein coding sequences, the PaBV-5 and PaBV-6 genotypes formed a common cluster, suggesting that according to the recently accepted classification system for bornaviruses, these two genotypes may belong to a new species, provisionally named Psittaciform 2 bornavirus. PMID:26282234

  12. Is there an error correcting code in the base sequence in DNA?

    PubMed Central

    Liebovitch, L S; Tao, Y; Todorov, A T; Levine, L

    1996-01-01

    Modern methods of encoding information into digital form include error check digits that are functions of the other information digits. When digital information is transmitted, the values of the error check digits can be computed from the information digits to determine whether the information has been received accurately. These error correcting codes make it possible to detect and correct common errors in transmission. The sequence of bases in DNA is also a digital code consisting of four symbols: A, C, G, and T. Does DNA also contain an error correcting code? Such a code would allow repair enzymes to protect the fidelity of nonreplicating DNA and increase the accuracy of replication. If a linear block error correcting code is present in DNA then some bases would be a linear function of the other bases in each set of bases. We developed an efficient procedure to determine whether such an error correcting code is present in the base sequence. We illustrate the use of this procedure by using it to analyze the lac operon and the gene for cytochrome c. These genes do not appear to contain such a simple error correcting code. PMID:8874027
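
    The notion that "some bases would be a linear function of the other bases in each set of bases" can be illustrated with a toy check in Python. The numeric mapping of A, C, G, T to 0 through 3, the use of arithmetic modulo 4, the single check base per block, and the function names are all assumptions made for this sketch; the abstract does not state the arithmetic or the procedure the authors actually used.

```python
# Hypothetical numeric encoding of the four bases (an assumption).
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def check_symbol(info_bases: str, coeffs) -> int:
    """Linear function of the information bases, computed modulo 4."""
    return sum(c * BASE_TO_INT[b] for c, b in zip(coeffs, info_bases)) % 4


def fraction_consistent(seq: str, coeffs) -> float:
    """Fraction of non-overlapping blocks whose last base matches the check.

    Each block holds len(coeffs) information bases followed by one check base.
    """
    k = len(coeffs)
    n = k + 1
    blocks = [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]
    if not blocks:
        raise ValueError("sequence shorter than one block")
    hits = sum(check_symbol(b[:k], coeffs) == BASE_TO_INT[b[k]] for b in blocks)
    return hits / len(blocks)


# Toy sequences built so that the check base equals the sum of three bases mod 4.
coded = "ACGTGGTT"       # ACG -> T, GGT -> T
corrupted = "ACGTGGTA"   # second check base altered
all_match = fraction_consistent(coded, (1, 1, 1))
half_match = fraction_consistent(corrupted, (1, 1, 1))
```

    In a sequence carrying such a code, the matching fraction would be close to 1 for the right coefficients, whereas an uncoded sequence would match only at about the chance level.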

  13. Purifying selection shapes the coincident SNP distribution of primate coding sequences.

    PubMed

    Chen, Chia-Ying; Hung, Li-Yuan; Wu, Chan-Shuo; Chuang, Trees-Juen

    2016-01-01

    Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon primarily corresponds with non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNPO/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNPO/E is much higher at zero-fold than at nonzero-fold degenerate sites; such a difference is due to an elevation of coSNPO/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, or sequencing techniques; and hold in broad generality across primates. We find that this discrepancy cannot be fully explained by sequence contexts, shared ancestral polymorphisms, SNP density, and recombination rate, and that coSNPO/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNPO/E independently, and coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These results suggest that coSNPs may represent a "signature" during primate protein evolution. PMID:27255481

  14. POLYMORPHISM IN THE CODING REGION SEQUENCE OF GDF8 GENE IN INDIAN SHEEP.

    PubMed

    Pothuraju, M; Mishra, S K; Kumar, S N; Mohamed, N F; Kataria, R S; Yadav, D K; Arora, R

    2015-11-01

    The present study was undertaken to identify polymorphism in the coding sequence of the GDF8 gene across indigenous meat type sheep breeds. A 1647 bp sequence was generated, encompassing 208 bp of the 5'UTR, 1128 bp of coding region (exon 1, 2 and 3) as well as 311 bp of 3'UTR. The sheep and goat GDF8 gene sequences were observed to be highly conserved as compared to cattle, buffalo, horse and pig. Several nucleotide variations were observed across the coding sequence of the GDF8 gene in Indian sheep. Three polymorphic sites were identified in the 5'UTR, one in exon 1 and one in the exon 2 regions. Both SNPs in the exonic region were found to be non-synonymous. The mutations c.539T > G and c.821T > A discovered in this study in exon 1 and exon 2, respectively, have not been previously reported. The information generated provides a preliminary indication of the functional diversity present in Indian sheep at the coding region of the GDF8 gene. The novel as well as the previously reported SNPs discovered in the Indian sheep warrant further analysis to see whether they affect the phenotype. Future studies will need to establish the effect of the reported SNPs on the expression of the GDF8 gene in the Indian sheep population. PMID:26845859

  15. Purifying selection shapes the coincident SNP distribution of primate coding sequences

    PubMed Central

    Chen, Chia-Ying; Hung, Li-Yuan; Wu, Chan-Shuo; Chuang, Trees-Juen

    2016-01-01

    Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon primarily corresponds with non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNPO/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNPO/E is much higher at zero-fold than at nonzero-fold degenerate sites; such a difference is due to an elevation of coSNPO/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, or sequencing techniques; and hold in broad generality across primates. We find that this discrepancy cannot be fully explained by sequence contexts, shared ancestral polymorphisms, SNP density, and recombination rate, and that coSNPO/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNPO/E independently, and coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These results suggest that coSNPs may represent a “signature” during primate protein evolution. PMID:27255481

  16. Complete coding sequence of Zika virus from Martinique outbreak in 2015.

    PubMed

    Piorkowski, G; Richard, P; Baronti, C; Gallian, P; Charrel, R; Leparc-Goffart, I; de Lamballerie, X

    2016-05-01

    Zika virus is an Aedes-borne Flavivirus causing fever, arthralgia, myalgia, and rash, associated with Guillain-Barré syndrome and suspected to induce microcephaly in the fetus. We report here the complete coding sequence of the first characterized Caribbean Zika virus strain, isolated from a patient from Martinique in December 2015. PMID:27274849

  17. Complete Coding Sequences of Six Toscana Virus Strains Isolated from Human Patients in France

    PubMed Central

    Leparc-Goffart, Isabelle; Piorkowski, Geraldine; Coutard, Bruno; Papageorgiou, Nicolas; De Lamballerie, Xavier

    2016-01-01

    Toscana virus (TOSV) is an arthropod-borne phlebovirus belonging to the Sandfly fever Naples virus species (genus Phlebovirus, family Bunyaviridae). Here, we report the complete coding sequences of six TOSV strains isolated from human patients having acquired the infection in southeastern France during a 12-year period. PMID:27231377

  18. First Complete Coding Sequence of a Spanish Isolate of Swine Vesicular Disease Virus

    PubMed Central

    Vázquez-Calvo, Ángela; Saiz, Juan-Carlos; Martín-Acebes, Miguel A.

    2016-01-01

    Swine vesicular disease virus (SVDV) is a porcine pathogen and a member of the Enterovirus genus within the Picornaviridae family. The SVDV genome is composed of a single-stranded RNA molecule of positive polarity. Here, we report the first complete sequence of the coding region of a Spanish SVDV isolate (SPA/1/'93). PMID:26941157

  19. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Kanda, Kojun; Pflug, James M.; Sproul, John S.; Dasenko, Mark A.; Maddison, David R.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  20. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  1. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences.

    PubMed

    Pang, Erli; Wu, Xiaomei; Lin, Kui

    2016-06-01

    Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution. PMID:26833483

  2. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
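
    The n-gram entropy measurement mentioned in this abstract can be illustrated with a minimal sketch. The function names and the redundancy normalization (one minus the entropy divided by its maximum for a four-letter alphabet) are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def ngram_entropy(seq, n):
    """Shannon entropy, in bits, of the overlapping n-gram distribution of seq."""
    if n < 1 or len(seq) < n:
        raise ValueError("need n >= 1 and a sequence of at least n symbols")
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n, alphabet_size=4):
    """Assumed normalization: 1 - H_n / (n * log2(alphabet_size))."""
    return 1.0 - ngram_entropy(seq, n) / (n * math.log2(alphabet_size))


def zipf_ranking(seq, n):
    """n-gram counts sorted by decreasing frequency, for a rank-frequency plot."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return counts.most_common()
```

    With this normalization, a sequence that uses all four bases equally often has zero 1-gram redundancy, while a homopolymer has redundancy one.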

  3. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.

  4. Key for protein coding sequences identification: computer analysis of codon strategy.

    PubMed Central

    Rodier, F; Gabarro-Arpa, J; Ehrlich, R; Reiss, C

    1982-01-01

    The signal qualifying an AUG or GUG as an initiator in mRNAs processed by E. coli ribosomes is not found to be a systematic, literal homology sequence. In contrast, stability analysis reveals that initiators always occur within nucleic acid domains of low stability, for which a high A/U content is observed. Since no amino acid selection pressure can be detected at N-termini of the proteins, the A/U enrichment results from a biased usage of the code degeneracy. A computer analysis is presented which allows easy detection of the codon strategy. N-terminal codons carry rather systematically A or U in third position, which suggests a mechanism for translation initiation and helps to detect protein coding sequences in sequenced DNA. PMID:7038623
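
    The observation that N-terminal codons tend to carry A or U in the third position suggests a simple statistic, sketched below. The function name and the default window of 10 codons are hypothetical choices for illustration; the abstract does not specify how the original computer analysis was implemented.

```python
def third_position_au_fraction(cds, n_codons=10):
    """Fraction of the first n_codons codons after the initiator
    that have A or U (T in DNA) at the third position."""
    seq = cds.upper().replace("T", "U")
    # Skip the initiator codon (positions 0-2) and keep complete codons only.
    codons = [seq[i:i + 3] for i in range(3, len(seq) - 2, 3)][:n_codons]
    if not codons:
        raise ValueError("sequence has no complete codon after the initiator")
    return sum(codon[2] in "AU" for codon in codons) / len(codons)
```

    A candidate reading frame whose first codons score well above the value seen in the other two frames would be consistent with the enrichment described above, although a threshold would have to be calibrated on known genes.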

  5. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    SciTech Connect

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth; Arkin, Adam P.; Deutschbauer, Adam

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes

  6. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE PAGESBeta

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth; Arkin, Adam P.; et al

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are

  7. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

    PubMed

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-08-01

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. PMID:26253316

  8. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape

    PubMed Central

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-01-01

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates’ conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water–land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods’ enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. PMID:26253316

  9. Large-scale coding sequence change underlies the evolution of postdevelopmental novelty in honey bees.

    PubMed

    Jasper, William Cameron; Linksvayer, Timothy A; Atallah, Joel; Friedman, Daniel; Chiu, Joanna C; Johnson, Brian R

    2015-02-01

    Whether coding or regulatory sequence change is more important to the evolution of phenotypic novelty is one of biology's major unresolved questions. The field of evo-devo has shown that in early development changes to regulatory regions are the dominant mode of genetic change, but whether this extends to the evolution of novel phenotypes in the adult organism is unclear. Here, we conduct ten RNA-Seq experiments across both novel and conserved tissues in the honey bee to determine to what extent postdevelopmental novelty is based on changes to the coding regions of genes. We make several discoveries. First, we show that with respect to novel physiological functions in the adult animal, positively selected tissue-specific genes of high expression underlie novelty by conferring specialized cellular functions. Such genes are often, but not always taxonomically restricted genes (TRGs). We further show that positively selected genes, whether TRGs or conserved genes, are the least connected genes within gene expression networks. Overall, this work suggests that the evo-devo paradigm is limited, and that the evolution of novelty, postdevelopment, follows additional rules. Specifically, evo-devo stresses that high network connectedness (repeated use of the same gene in many contexts) constrains coding sequence change as it would lead to negative pleiotropic effects. Here, we show that in the adult animal, the converse is true: Genes with low network connectedness (TRGs and tissue-specific conserved genes) underlie novel phenotypes by rapidly changing coding sequence to perform new-specialized functions. PMID:25351750

  10. Widespread Differential Expression of Coding Region and 3' UTR Sequences in Neurons and Other Tissues.

    PubMed

    Kocabas, Arif; Duarte, Terence; Kumar, Saranya; Hynes, Mary A

    2015-12-16

    Mature messenger RNAs (mRNAs) consist of coding sequence (CDS) and 5' and 3' UTRs, typically expected to show similar abundance within a given neuron. Examining mRNA from defined neurons, we unexpectedly show extremely common unbalanced expression of cognate 3' UTR and CDS sequences; many genes show high 3' UTR relative to CDS, others show high CDS to 3' UTR. In situ hybridization (19 of 19 genes) shows a broad range of 3' UTR-to-CDS expression ratios across neurons and tissues. Ratios may be spatially graded or change with developmental age but are consistent across animals. Further, for two genes examined, a 3' UTR-to-CDS ratio above a particular threshold in any given neuron correlated with reduced or undetectable protein expression. Our findings raise questions about the role of isolated 3' UTR sequences in regulation of protein expression and highlight the importance of separately examining 3' UTR and CDS sequences in gene expression analyses. PMID:26687222

  11. SinEx DB: a database for single exon coding sequences in mammalian genomes

    PubMed Central

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F.; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S.

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl PMID:27278816

  12. The influence of protein coding sequences on protein folding rates of all-β proteins.

    PubMed

    Li, Rui Fang; Li, Hong

    2011-06-01

    It is currently believed that the protein folding rate is related to the protein's structure and amino acid sequence. However, few studies have examined whether the protein folding rate is influenced by the corresponding mRNA sequence. In this paper, we analyzed the possible relationship between the protein folding rates and the corresponding mRNA sequences. The content of guanine and cytosine (GC content) of palindromes in the protein coding sequence was introduced as a new parameter and added to Gromiha's model for predicting protein folding rates to examine its effect on the protein folding process. The multiple linear regression analysis and jack-knife test show that the new parameter is significant. The linear correlation coefficient between the experimental and the predicted values of the protein folding rates increased significantly from 0.96 to 0.99, and the population variance decreased from 0.50 to 0.24 compared with Gromiha's results. The results show that the GC content of palindromes in the corresponding protein coding sequence influences the protein folding rate. Further analysis indicates that this effect comes mostly from synonymous codon usage and from the information in the palindrome structure itself, but not from the translation information from codons to amino acids. PMID:21613670
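
    As a rough illustration of the "GC content of palindromes" parameter, the sketch below scans a coding sequence for reverse-complement palindromes of a fixed length and reports the GC fraction of the bases they cover. The abstract does not say how palindromes were defined or which lengths were used, so the reverse-complement definition, the window length of 6, and the function names are all assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_rc_palindrome(segment):
    """True if the segment equals its own reverse complement (assumed definition)."""
    return segment == segment.translate(_COMPLEMENT)[::-1]


def palindrome_gc_content(seq, length=6):
    """GC fraction over all positions covered by palindromic windows.

    Returns None when the sequence contains no palindrome of this length.
    """
    seq = seq.upper().replace("U", "T")
    covered = set()
    for start in range(len(seq) - length + 1):
        if is_rc_palindrome(seq[start:start + length]):
            covered.update(range(start, start + length))
    if not covered:
        return None
    return sum(seq[i] in "GC" for i in covered) / len(covered)
```

    In a regression such as the one described, a value of this kind would enter as one additional explanatory variable per protein alongside the existing model parameters.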

  13. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. PMID:27278816

  14. Sequence Prediction With Sparse Distributed Hyperdimensional Coding Applied to the Analysis of Mobile Phone Use Patterns.

    PubMed

    Rasanen, Okko J; Saarinen, Jukka P

    2016-09-01

    Modeling and prediction of temporal sequences is central to many signal processing and machine learning applications. Prediction based on sequence history is typically performed using parametric models, such as fixed-order Markov chains (n-grams), approximations of high-order Markov processes, such as mixed-order Markov models or mixtures of lagged bigram models, or with other machine learning techniques. This paper presents a method for sequence prediction based on sparse hyperdimensional coding of the sequence structure and describes how higher order temporal structures can be utilized in sparse coding in a balanced manner. The method is purely incremental, allowing real-time online learning and prediction with limited computational resources. Experiments with prediction of mobile phone use patterns, including the prediction of the next launched application, the next GPS location of the user, and the next artist played with the phone media player, reveal that the proposed method is able to capture the relevant variable-order structure from the sequences. In comparison with the n-grams and the mixed-order Markov models, the sparse hyperdimensional predictor clearly outperforms its peers in terms of unweighted average recall and achieves an equal level of weighted average recall as the mixed-order Markov chain but without the batch training of the mixed-order model. PMID:26285224
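
    For orientation, the fixed-order Markov chain (n-gram) baseline that the abstract compares against can be sketched as follows. This is not the sparse hyperdimensional method proposed in the paper; the class name, the default order, and the example event labels are hypothetical.

```python
from collections import Counter, defaultdict


class NGramPredictor:
    """Fixed-order Markov baseline: predict the most frequent successor
    of the last `order` symbols. Updates are incremental."""

    def __init__(self, order=2):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        self.counts = defaultdict(Counter)

    def update(self, history, next_symbol):
        """Record one observed transition from the recent history."""
        context = tuple(history[-self.order:])
        self.counts[context][next_symbol] += 1

    def fit(self, sequence):
        """Feed a whole sequence through update(), one step at a time."""
        for i in range(self.order, len(sequence)):
            self.update(sequence[i - self.order:i], sequence[i])

    def predict(self, history):
        """Most frequent successor of the current context, or None if unseen."""
        successors = self.counts.get(tuple(history[-self.order:]))
        if not successors:
            return None
        return successors.most_common(1)[0][0]
```

    Returning None for an unseen context marks the weakness of a fixed order: a mixed-order or variable-order model would back off to a shorter context instead.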

  15. Inference of Episodic Changes in Natural Selection Acting on Protein Coding Sequences via CODEML.

    PubMed

    Bielawski, Joseph P; Baker, Jennifer L; Mingrone, Joseph

    2016-01-01

    This unit provides protocols for using the CODEML program from the PAML package to make inferences about episodic natural selection in protein-coding sequences. The protocols cover inference tasks such as maximum likelihood estimation of selection intensity, testing the hypothesis of episodic positive selection, and identifying sites with a history of episodic evolution. We provide protocols for using the rich set of models implemented in CODEML to assess robustness, and for using bootstrapping to assess if the requirements for reliable statistical inference have been met. An example dataset is used to illustrate how the protocols are used with real protein-coding sequences. The workflow of this design, through automation, is readily extendable to a larger-scale evolutionary survey. © 2016 by John Wiley & Sons, Inc. PMID:27322407

  16. CodHonEditor: Spreadsheets for Codon Optimization and Editing of Protein Coding Sequences.

    PubMed

    Takai, Kazuyuki

    2016-05-01

    Gene synthesis is becoming more important with the growing availability of low-cost commercial services. The coding sequences are often "optimized" with respect to the relative synonymous codon usage (RSCU) before synthesis, a step that is generally included in the commercial services. However, the codon optimization processes differ among providers and are often hidden from the users. Here, the d'Hondt method, which is widely adopted as a method for determining the number of seats for each party in proportional-representation public elections, is applied to RSCU fitting. This allowed me to make a set of electronic spreadsheets for manual design of protein coding sequences for expression in Escherichia coli, with which users can see the process of codon optimization and can manually edit the codons after the automatic optimization. The spreadsheets may also be useful for molecular biology education. PMID:27002987
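
    The d'Hondt method named in this abstract is a standard highest-averages allocation, and its application to codon choice can be sketched as below. Treating synonymous codons as "parties", their target usage as "votes", and the occurrences of an amino acid as "seats" is an interpretation of the abstract; the function name and the usage values in the example are hypothetical and are not taken from the spreadsheets.

```python
def dhondt_allocate(weights, seats):
    """Distribute `seats` among the keys of `weights` by the d'Hondt rule:
    each seat goes to the key with the largest weight / (seats_won + 1)."""
    allocation = {key: 0 for key in weights}
    for _ in range(seats):
        winner = max(weights, key=lambda k: weights[k] / (allocation[k] + 1))
        allocation[winner] += 1
    return allocation


# Hypothetical target usage for four synonymous codons of one amino acid,
# and a protein in which that amino acid occurs five times.
example_usage = {"GCG": 0.36, "GCC": 0.27, "GCA": 0.21, "GCU": 0.16}
example_allocation = dhondt_allocate(example_usage, seats=5)
```

    For these illustrative values the allocation is two GCG codons and one each of GCC, GCA and GCU, which approximates the target proportions with whole codon counts.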

  17. Relating incomplete data and incomplete theory

    SciTech Connect

    Binetruy, P.; Kane, G.L.; Wang, Ting T.; Nelson, Brent D.; Wang, L.-T.

    2004-11-01

    Assuming string theorists will not soon provide a compelling case for the primary theory underlying particle physics, the field will proceed as it has historically: with data stimulating and testing ideas. Ideally the soft supersymmetry breaking Lagrangian will be measured and its patterns will point to the underlying theory. But there are two new problems. First a matter of principle: the theory may be simplest at distance scales and in numbers of dimensions where direct experiments are not possible. Second a practical problem: in the foreseeable future (with mainly hadron collider data) too few observables can be measured to lead to direct connections between experiment and theory. In this paper we discuss and study these issues and consider ways to circumvent the problems, studying models to test methods. We propose a semiquantitative method for focusing and sharpening thinking when trying to relate incomplete data to incomplete theory, as will probably be necessary.

  18. Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

    PubMed Central

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

    2015-01-01

    ABSTRACT Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644

  19. Picture quality measurement based on block visibility in discrete cosine transform-coded video sequences

    NASA Astrophysics Data System (ADS)

    Coudoux, Francois-Xavier; Gazalet, Marc G.; Derviaux, Christian; Corlay, Patrick

    2001-04-01

    In this paper, we present a perceptual measure that predicts the visibility of the well-known blocking effect in discrete cosine transform coded image sequences. The main objective of this work is to use the results of the measure derived for adaptive video postprocessing, in order to significantly improve the visual quality of the decoded video sequences at the receiver. The proposed measure is based on a visual model accounting for both the spatial and temporal properties of the human visual system. The input of the visual model is the distorted sequence only. Psychovisual experiments have been carried out to determine the eye sensitivity to blocking artifacts, by varying a number of visually significant parameters: background level, spatial, and temporal activities in the surrounding image. Results obtained for the measurement of the visibility thresholds enable us to estimate the model parameters. The visual model is finally applied to real coded video sequences. The comparison of measurement results with subjective tests shows that the proposed perceptual measure has a good correlation with subjective evaluation.

  20. Mutations analysis of C1 inhibitor coding sequence gene among Portuguese patients with hereditary angioedema.

    PubMed

    Martinho, A; Mendes, J; Simões, O; Nunes, R; Gomes, J; Dias Castro, E; Leiria-Pinto, P; Ferreira, M B; Pereira, C; Castel-Branco, M G; Pais, L

    2013-04-01

    Mutations that modify the amino acid sequence of C1-INH (except Val458Met) are associated with HAE. More than 200 different mutations scattered across the entire C1-INH gene have been reported. The main objective of this study was to report the mutational findings in a HAE cohort of 138 Portuguese patients followed in specialized consultation all over the country. DNA was extracted from peripheral blood with QiaSymphony BioRobot (QIAGEN Portugal). The sequence reactions were performed by using a DNA sequencing kit (Big Dye terminator cycle sequencing v1.1/v3.1 from Applied Biosystems) and sequencing products were immediately submitted to direct sequencing on an Applied Biosystems 3130 DNA Analyser. DNA sequences were analyzed at four different stages. Raw data and sequence alignments of all 8 exons and intron-exon boundaries were performed for each patient individually with SeqScape software and using SERPING1 gene NG_009625 of 24,300 bp (12-March-2011) as reference sequence. Sequence comparisons among patients and controls were performed with software CodonCode Aligner v.3.7 from CodonCode Corp and with Geneious 4.5 from Biomatters Lda. A total of 94 point mutations were observed among patients, and 67% of them were located on exon 8. In addition, we noticed one previously undescribed stop codon at position c.1459 C>T in three different patients. Translation termination was also found on exons 3 and 7, as a result of mutations at positions c.481A>T and c.1174C>T. In this population, the prevalence of the missense mutation p.Arg444Cys was 39 out of 42. Mutational analysis revealed 22 different pathogenic mutations, of which 64% were not described in the HAE database. Although identification of disease causing mutations is not necessary to establish HAE diagnosis, studies on gene expression and characterization of rearrangements in SERPING1 gene are suggested in order to get new insights on function and genetic tests of C1 inhibitor. PMID:23123409

  1. The Cipher Code of Simple Sequence Repeats in “Vampire Pathogens”

    PubMed Central

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W.; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-01-01

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, microbes with red blood cell tropism, the “vampire pathogens” (VP), succeed in matching scarce nutrients and surviving strong immunity reactions. Here, we found that VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in their genomes, in both coding and non-coding regions, than non-vampire pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser), and counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 were significantly higher in VP than in N_VP. Furthermore, significant differences (P < 0.001) in the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation. PMID:26215592
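    To make the notion of an (AG)n dimeric repeat concrete, the following minimal sketch locates runs of a dinucleotide unit repeated at least a given number of times in a DNA string. It is an illustration only: the function name, the default threshold of three repeats, and the example sequence are assumptions, and the abstract does not say which tool the authors used.

```python
import re


def find_di_ssrs(sequence, unit="AG", min_repeats=3):
    """Return (start_index, repeat_count) for each run of `unit`.

    Only runs with at least `min_repeats` consecutive copies are reported.
    Indices are 0-based positions in `sequence`.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_repeats))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        hits.append((match.start(), len(match.group()) // len(unit)))
    return hits


# Hypothetical sequence: one (AG)3 run, one (AG)2 run (too short), one (AG)4 run.
example = "TTAGAGAGCCAGAGTTAGAGAGAG"
ssr_hits = find_di_ssrs(example)
```

    Counting such hits separately over coding and non-coding intervals would give per-region totals of the kind compared in the abstract.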

  2. A direct sequence spread spectrum code acquisition circuit for wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Ghaisari, Jafar; Ferdosi, Arash

    2011-06-01

    Narrow band (NB), spread spectrum (SS), and ultra wide band (UWB) are three physical layer bandwidth types used in wireless sensor networks (WSN). SS and UWB technologies have many advantages over NB, which make them preferable for WSN. Synchronisation of different nodes in a WSN is an important task that is necessary to improve cooperation and lifetime of nodes. Code acquisition is the main step of a node's time synchronisation. In this article, a pseudo noise code generator and a code acquisition circuit are proposed, designed and tested using direct sequence SS technique. To investigate the properties of the designed circuits, simulations are carried out via Xilinx Foundation Series software in the real mode. The results demonstrate excellent performance of the proposed algorithms and circuits in all realistic conditions. The code acquisition circuit introduces an adaptive testing window for the single dwell serial search method. The code acquisition circuit is a clock phase free approach, thus the clock coherency step is cancelled. Moreover, the clock phase difference between transmitter and receiver nodes has little effect on the acquisition time, and thus on the synchronisation time.
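    The general idea of pseudo noise code generation and serial-search acquisition can be sketched in software as follows. This is not the authors' circuit: it assumes a length-31 maximal-length sequence from a 5-stage shift register, a noise-free received signal, and a fixed detection threshold in place of the adaptive testing window described above. All names and parameter values are hypothetical.

```python
def m_sequence(n_chips=31):
    """Generate one period of a 5-stage maximal-length PN sequence.

    Uses the recurrence s[k] = s[k-3] XOR s[k-5], which corresponds to the
    primitive polynomial x^5 + x^2 + 1. Bits are mapped to chips +1 / -1.
    """
    bits = [1, 0, 0, 0, 0]  # any non-zero initial register state works
    while len(bits) < n_chips:
        bits.append(bits[-3] ^ bits[-5])
    return [1 - 2 * b for b in bits]


def circular_correlation(received, code, shift):
    """Correlate the received chips with the local code delayed by `shift`."""
    n = len(code)
    return sum(received[i] * code[(i - shift) % n] for i in range(n))


def serial_search(received, code, threshold):
    """Test code phases one at a time; return the first one above threshold."""
    for shift in range(len(code)):
        if circular_correlation(received, code, shift) >= threshold:
            return shift
    return None  # no phase passed the test


code = m_sequence()
true_delay = 12  # hypothetical transmitter/receiver code phase offset
received = [code[(i - true_delay) % len(code)] for i in range(len(code))]
acquired = serial_search(received, code, threshold=20)
```

    Because a maximal-length sequence has a periodic autocorrelation of 31 at the aligned phase and -1 at every other phase, a single threshold test per phase is enough in this noise-free setting. A real receiver would have to choose its threshold and dwell time with noise in mind.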

  3. Decomposition of incomplete fusion

    SciTech Connect

    Sobotka, L.B.; Sarantities, D.G.; Stracener, D.W.; Majka, Z.; Abenante, V.; Semkow, T.M.; Hensley, D.C.; Beene, J.R.; Halbert, M.L.

    1989-01-01

    The velocity distribution of fusion-like products formed in the reaction 701 MeV ²⁸Si+¹⁰⁰Mo is decomposed into 26 incomplete fusion channels. The momentum deficit of the residue per nonevaporative mass unit is approximately equal to the beam momentum per nucleon. The yields of the incomplete fusion channels correlate with the Q-value for projectile fragmentation rather than that for incomplete fusion. The backward angle multiplicities of light particles and heavy ions increase with momentum transfer; however, the heavy ion multiplicities also depend on the extent of the fragmentation of the incomplete fusion channel. These data indicate that at fixed linear momentum transfer, increased fragmentation of the unfused component is related to a reduced transferred angular momentum. 22 refs., 6 figs., 1 tab.

  4. The untranslated side of hair and skin mammalian pigmentation: Beyond coding sequences.

    PubMed

    Rouzaud, Francois; Oulmouden, Ahmad; Kos, Lidia

    2010-05-01

    For several decades, tremendous advances in studying skin and hair pigmentation of mammals have been made using Mendelian genetics principles. A number of loci and their associated traits have been extensively examined, crossings performed, and phenotypes well documented. Continuously improving PCR techniques allowed the molecular cloning and sequencing of the first pigmentation genes at the end of the 20th century, a period followed by an intense effort to detect and describe polymorphisms in the coding regions and correlate allelic combinations with the observed melanogenic phenotypes. However, a number of phenotypes and biological events could not be elucidated solely by analysis of the coding regions of genes. Messenger RNA isolation, characterization and quantification techniques allowed groups to move ahead and investigate molecular mechanisms whose secrets lay within the noncoding regions of pigmentation gene transcripts such as MC1R, ASIP, or Mitf. The untranslated elements contain specific nucleotide sequences and structures that dramatically influence the mRNA half-life and processing, thus impacting protein translation and melanin production. As we are progressively uncovering the complex processes regulating melanocyte biology, unraveling complete mRNA structures and understanding mechanisms beyond coding regions has become critical. PMID:20222017

  5. MIMO Radar System for Respiratory Monitoring Using Tx and Rx Modulation with M-Sequence Codes

    NASA Astrophysics Data System (ADS)

    Miwa, Takashi; Ogiwara, Shun; Yamakoshi, Yoshiki

    The importance of respiratory monitoring systems during sleep has increased due to early diagnosis of sleep apnea syndrome (SAS) in the home. This paper presents a simple respiratory monitoring system suitable for home use having 3D ranging of targets. The range resolution and azimuth resolution are obtained by a stepped frequency transmitting signal and MIMO arrays with preferred pair M-sequence codes doubly modulating in transmission and reception, respectively. Due to the use of these codes, Gold sequence codes corresponding to all the antenna combinations are equivalently modulated in the receiver. The signal to interchannel interference ratio of the reconstructed image is evaluated by numerical simulations. The results of experiments on a developed prototype 3D-MIMO radar system show that this system can extract only the motion of respiration of a human subject 2 m away from a metallic rotatable reflector. Moreover, it is found that this system can successfully measure the respiration information of sleeping human subjects for 96.6 percent of the whole measurement time except for instances of large posture change.

  6. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

    PubMed Central

    Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10⁻⁷), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts: rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10⁻⁷); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10⁻⁷ and OR = 1.09, P = 7.4 × 10⁻⁸); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10⁻⁹), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10⁻⁶). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10⁻⁴) and DNA mismatch repair genes (P = 6.1 × 10⁻⁴) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  7. Structure of the gene coding for the sequence-specific DNA-methyltransferase of the B. subtilis phage SPR.

    PubMed Central

    Pósfai, G; Baldauf, F; Erdei, S; Pósfai, J; Venetianer, P; Kiss, A

    1984-01-01

    The nucleotide sequence of the gene coding for the 5'-GGCC and 5'-CCGG specific DNA methyltransferase of the Bacillus subtilis phage SPR was determined by the Maxam-Gilbert procedure. Transcriptional and translational signals of the sequence were assigned with the help of S1 mapping and translation in E. coli minicells. The gene codes for a 49 kd polypeptide. The amino acid sequence of the SPR methylase shows regions of homology with the sequence of the 5'-GGCC-specific BspRI modification methylase. PMID:6096817

  8. Variation in the number of nucleoli and incomplete homogenization of 18S ribosomal DNA sequences in leaf cells of the cultivated Oriental ginseng (Panax ginseng Meyer)

    PubMed Central

    Chelomina, Galina N.; Rozhkovan, Konstantin V.; Voronova, Anastasia N.; Burundukova, Olga L.; Muzarok, Tamara I.; Zhuravlev, Yuri N.

    2015-01-01

    Background Wild ginseng, Panax ginseng Meyer, is an endangered species of medicinal plants. In the present study, we analyzed variations within the ribosomal DNA (rDNA) cluster to gain insight into the genetic diversity of the Oriental ginseng, P. ginseng, under artificial cultivation. Methods The roots of wild P. ginseng plants were sampled from a nonprotected natural population of the Russian Far East. The slides were prepared from leaf tissues using the squash technique for cytogenetic analysis. The 18S rDNA sequences were cloned and sequenced. The distribution of nucleotide diversity, recombination events, and interspecific phylogenies for the total 18S rDNA sequence data set was also examined. Results In mesophyll cells, mononucleolar nuclei were estimated to be dominant (75.7%), while the remaining nuclei contained two to four nucleoli. Among the analyzed 18S rDNA clones, 20% were identical to the 18S rDNA sequence of P. ginseng from Japan, and the other clones differed by one to six substitutions. The nucleotide polymorphism was more pronounced at positions 440–640 bp, and was distributed in variable regions, expansion segments, and conservative elements of the core structure. The phylogenetic analysis confirmed conspecificity of ginseng plants cultivated in different regions, with two fixed mutations between P. ginseng and other species. Conclusion This study identified evidence of intragenomic nucleotide polymorphism in the 18S rDNA sequences of P. ginseng. These data suggest that, in cultivated plants, the observed genome instability may influence the synthesis of biologically active compounds, which are widely used in traditional medicine. PMID:27158239

  9. Detection of almond allergen coding sequences in processed foods by real time PCR.

    PubMed

    Prieto, Nuria; Iniesto, Elisa; Burbano, Carmen; Cabanillas, Beatriz; Pedrosa, Mercedes M; Rovira, Mercè; Rodríguez, Julia; Muzquiz, Mercedes; Crespo, Jesus F; Cuadrado, Carmen; Linacero, Rosario

    2014-06-18

    The aim of this work was to develop and analytically validate a quantitative RT-PCR method, using novel primer sets designed on Pru du 1, Pru du 3, Pru du 4, and Pru du 6 allergen-coding sequences, and contrast the sensitivity and specificity of these probes. The temperature and/or pressure processing influence on the ability to detect these almond allergen targets was also analyzed. All primers allowed a specific and accurate amplification of these sequences. The specificity was assessed by amplifying DNA from almond, different Prunus species and other common plant food ingredients. The detection limit was 1 ppm in unprocessed almond kernels. The method's robustness and sensitivity were confirmed using spiked samples. Thermal treatment under pressure (autoclave) reduced yield and amplifiability of almond DNA; however, high-hydrostatic pressure treatments did not produce such effects. Compared with ELISA assay outcomes, this RT-PCR showed higher sensitivity to detect almond traces in commercial foodstuffs. PMID:24857239

  10. Serotype-specific glycoprotein of simian 11 rotavirus: coding assignment and gene sequence.

    PubMed Central

    Both, G W; Mattick, J S; Bellamy, A R

    1983-01-01

    Cloned DNA copies of the double-stranded RNA genomic segments of simian 11 rotavirus have been used to determine the coding assignment for VP7, the type-specific antigen of this virus. Translation of hybrid-selected mRNAs in an in vitro system supplemented with canine pancreatic microsomes permitted VP7 to be assigned to segment 9 and the two nonstructural viral proteins NCVP4 and NCVP3, to segments 7 and 8, respectively. Hybridization of cloned DNA probes for segments 7-9 with the corresponding segments of human rotavirus Wa confirmed these assignments. The complete nucleotide sequence of gene 9 has been determined. The deduced amino acid sequence reveals VP7 to be 326 amino acids in length with two NH2-terminal hydrophobic regions and a single glycosylation site at residues 69-71. PMID:6304692

  11. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

    PubMed Central

    2010-01-01

    HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840

  12. Cloning and nucleotide sequence of the gene coding for citrate synthase from a thermotolerant Bacillus sp

    SciTech Connect

    Schendel, F.J.; August, P.R.; Anderson, C.R.; Flickinger, M.C.; Hanson, R.S.

    1992-01-01

    Acetate salts are emerging as potentially attractive bulk chemicals for a variety of environmental applications, for example, as catalysts to facilitate combustion of high-sulfur coal by electrical utilities and as the biodegradable noncorrosive highway deicing salt calcium magnesium acetate. The structural gene coding for citrate synthase from the gram-positive soil isolate Bacillus sp. strain C4 (ATCC 55182) capable of secreting acetic acid at pH 5.0 to 7.0 in the presence of dolime has been cloned from a genomic library by complementation of an Escherichia coli auxotrophic mutant lacking citrate synthase. The nucleotide sequence of the entire 3.1-kb HindIII fragment has been determined, and one major open reading frame was found coding for citrate synthase (ctsA). Citrate synthase from Bacillus sp. strain C4 was found to be a dimer (M_r, 84,500) with a subunit with an M_r of 42,000. The N-terminal sequence was found to be identical with that predicted from the gene sequence. The kinetics were best fit to a bisubstrate enzyme with an ordered mechanism. Bacillus sp. strain C4 citrate synthase was not activated by potassium chloride and was not inhibited by NADH, ATP, ADP, or AMP at levels up to 1 mM. The predicted amino acid sequence was compared with that of the E. coli, Acinetobacter anitratum, Pseudomonas aeruginosa, Rickettsia prowazekii, porcine heart, and Saccharomyces cerevisiae cytoplasmic and mitochondrial enzymes.

  13. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

    PubMed

    Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

    2014-02-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775

  14. A Full-Genomic Sequence-Verified Protein-Coding Gene Collection for Francisella tularensis

    PubMed Central

    Murthy, Tal; Rolfs, Andreas; Hu, Yanhui; Shi, Zhenwei; Raphael, Jacob; Moreira, Donna; Kelley, Fontina; McCarron, Seamus; Jepson, Daniel; Taycher, Elena; Zuo, Dongmei; Mohr, Stephanie E.; Fernandez, Mauricio; Brizuela, Leonardo; LaBaer, Joshua

    2007-01-01

    The rapid development of new technologies for the high throughput (HT) study of proteins has increased the demand for comprehensive plasmid clone resources that support protein expression. These clones must be full-length, sequence-verified and in a flexible format. The generation of these resources requires automated pipelines supported by software management systems. Although the availability of clone resources is growing, current collections are either not complete or not fully sequence-verified. We report an automated pipeline, supported by several software applications that enabled the construction of the first comprehensive sequence-verified plasmid clone resource for more than 96% of protein coding sequences of the genome of F. tularensis, a highly virulent human pathogen and the causative agent of tularemia. This clone resource was applied to a HT protein purification pipeline successfully producing recombinant proteins for 72% of the genes. These methods and resources represent significant technological steps towards exploiting the genomic information of F. tularensis in discovery applications. PMID:17593976

  15. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    PubMed Central

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775

  16. Complete coding sequence and molecular epidemiological analysis of Sindbis virus isolates from mosquitoes and humans, Finland.

    PubMed

    Sane, Jussi; Kurkela, Satu; Putkuri, Niina; Huhtamo, Eili; Vaheri, Antti; Vapalahti, Olli

    2012-09-01

    Sindbis virus (SINV) is an arthropod-borne alphavirus, which causes rash-arthritis, particularly in Finland. SINV is transmitted by mosquitoes in Finland but thus far no virus has been isolated from mosquitoes. In this study, we report the isolation of the first SINV strain from mosquitoes in Finland and its full-length protein-coding sequence. We furthermore describe the full-length coding sequence of six SINV strains previously isolated from humans in Finland and from a mosquito in Russia. The strain isolated from mosquitoes (Ilomantsi-2005M) was very closely related to all the other Northern European SINV strains. We found 9 aa positions, five of which are in the C terminus of the nsP3 protein, to be distinctive signatures for the Northern European strains that may be associated with vector or host species adaptation. Phylogenetic analyses further indicate that SINV circulates locally in endemic regions in Northern Europe and that novel strains are not frequently being introduced. PMID:22647374

  17. Comparative Sequence Analysis of the Non-Protein-Coding Mitochondrial DNA of Inbred Rat Strains

    PubMed Central

    Abhyankar, Avinash; Park, Hee-Bok; Tonolo, Giancarlo; Luthman, Holger

    2009-01-01

    The proper function of mammalian mitochondria necessitates a coordinated expression of both nuclear and mitochondrial genes, most likely due to the co-evolution of nuclear and mitochondrial genomes. The non-protein coding regions of mitochondrial DNA (mtDNA) including the D-loop, tRNA and rRNA genes form a major component of this regulated expression unit. Here we present comparative analyses of the non-protein-coding regions from 27 Rattus norvegicus mtDNA sequences. There were two variable positions in 12S rRNA, 20 in 16S rRNA, eight within the tRNA genes and 13 in the D-loop. Only one of the three neutrality tests used demonstrated statistically significant evidence for selection in 16S rRNA and tRNA-Cys. Based on our analyses of conserved sequences, we propose that some of the variable nucleotide positions identified in 16S rRNA and tRNA-Cys, and the D-loop might be important for mitochondrial function and its regulation. PMID:19997590

  18. Human phosphoribosylformylglycineamide amidotransferase (FGARAT): regional mapping, complete coding sequence, isolation of a functional genomic clone, and DNA sequence analysis.

    PubMed

    Patterson, D; Bleskan, J; Gardiner, K; Bowersox, J

    1999-11-01

    Purines play essential roles in many cellular functions, including DNA replication, transcription, intra- and extra-cellular signaling, energy metabolism, and as coenzymes for many biochemical reactions. The de-novo synthesis of purines requires 10 enzymatic steps for the production of inosine monophosphate (IMP). Defects in purine metabolism are associated with human diseases. Further, many anticancer agents function as inhibitors of the de-novo biosynthetic pathway. Genes or cDNAs for most of the enzymes comprising this pathway have been isolated from humans or other mammals. One notable exception is the phosphoribosylformylglycineamide amidotransferase (FGARAT) gene, which encodes the fourth step of this pathway. This gene has been cloned from numerous microorganisms and from Drosophila melanogaster and C. elegans. We report here the identification of a human cDNA containing the coding region of the FGARAT mRNA and the isolation of a P1 clone that contains an intact human FGARAT gene. The P1 clone corrects the purine auxotrophy and protein deficiency of Chinese hamster ovary (CHO) cell mutants (AdeB) deficient in both the activity and the protein for FGARAT. The P1 clone was used to regionally map the FGARAT gene to chromosome region 17p13, a location consistent with our prior assignment of this gene to chromosome 17. A comparison of the DNA sequence of the human FGARAT and FGARAT DNA sequence from 17 other organisms is reported. The isolation of this gene means that DNA clones for all the 10 steps of IMP synthesis have been isolated from humans or other mammals. PMID:10548741

  19. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

    PubMed Central

    Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343

  20. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach

    SciTech Connect

    Uberbacher, E.C.; Mural, R.J. (Univ. of Tennessee, Oak Ridge)

    1991-12-15

    Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. The authors describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, the authors' method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the coding recognition module identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which the authors are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.

  1. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach.

    PubMed Central

    Uberbacher, E C; Mural, R J

    1991-01-01

    Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the "coding recognition module" identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which we are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts. PMID:1763041
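
    The two performance figures quoted in this abstract can be read as a sensitivity of 90% and a precision above 80% (fewer than one false positive per five exons indicated). The sketch below only illustrates that arithmetic; the counts are hypothetical and are not taken from the paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real coding exons that the module indicates."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of indicated exons that are real coding exons."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts chosen only to match the quoted rates:
# 100 real exons, 90 found, and 22 spurious calls among 112 indicated.
tp, fn, fp = 90, 10, 22
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"false positives per indicated exon = {fp / (tp + fp):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
```

    With these made-up counts, 22 of 112 indicated exons are false, which is just under one in five.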

  2. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influence asthma and atopic phenotypes in diverse populations.
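
    The conservation cutoff quoted above (more than 100 bp at more than 70 percent identity in a human-mouse comparison) can be illustrated with a sliding-window scan over a pairwise alignment. This is a minimal sketch, not the authors' pipeline: the function names, the treatment of gap columns as mismatches, and the toy sequences are all assumptions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns with identical, non-gap bases."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def conserved_windows(aln_a: str, aln_b: str,
                      window: int = 100, min_identity: float = 70.0) -> list[int]:
    """Start offsets of alignment windows whose identity exceeds the cutoff."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        start
        for start in range(len(aln_a) - window + 1)
        if percent_identity(aln_a[start:start + window],
                            aln_b[start:start + window]) > min_identity
    ]


# Toy alignment with a short window so the result is easy to check by hand.
print(conserved_windows("AAAAAAAAAA", "AAAAATTTTT", window=5))
```

    Overlapping qualifying windows would normally be merged into a single element before counting; that step is left out here.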

  3. Rat hepatic glutaminase: identification of the full coding sequence and characterization of a functional promoter.

    PubMed Central

    Chung-Bok, M I; Vincent, N; Jhala, U; Watford, M

    1997-01-01

    Glutamine catabolism in mammalian liver is catalysed by a unique isoenzyme of phosphate-activated glutaminase. The full coding and 5' untranslated sequence for rat hepatic glutaminase was isolated by screening lambda ZAP cDNA libraries and a Charon 4a rat genomic library. The sequence produces a mRNA 2225 nt in length, encoding a polypeptide of 535 amino acid residues with a calculated molecular mass of 59.2 kDa. The deduced amino acid sequence of rat liver glutaminase shows 86% similarity to that of rat kidney glutaminase and 65% similarity to a putative glutaminase from Caenorhabditis elegans. A genomic clone to rat liver glutaminase was isolated that contains 3.5 kb of the gene and 7.5 kb of the 5' flanking region. The 1 kb immediately upstream of the hepatic glutaminase gene (from -1022 to +48) showed functional promoter activity in HepG2 hepatoma cells. This promoter region did not respond to treatment with cAMP, but was highly responsive (10-fold stimulation) to the synthetic glucocorticoid dexamethasone. Subsequent 5' deletion analysis indicated that the promoter region between -103 and +48 was sufficient for basal promoter activity. This region does not contain an identifiable TATA element, indicating that transcription of the glutaminase gene is driven by a TATA-less promoter. The region responsive to glucocorticoids was mapped to -252 to -103 relative to the transcription start site. PMID:9164856

  4. A transcriptional regulatory element in the coding sequence of the human Bcl-2 gene

    PubMed Central

    Lang, Georgina; Gombert, Wendy M; Gould, Hannah J

    2005-01-01

    We investigated the protein-binding sites in a DNAse I hypersensitive site associated with bcl-2 gene expression in human B cells. We mapped this hypersensitive site to the coding sequence of exon 2 of the bcl-2 gene in the bcl-2-expressing REH B-cell line. Electrophoretic mobility shift assays (EMSAs) with extracts from REH cells revealed three previously unrecognized B-Myb-binding sites in this sequence. The protein was identified as B-Myb by using a specific antibody and EMSAs. Accordingly, the levels of B-Myb and bcl-2 proteins, and of Myb EMSA activity, were correlated over a wide range of cell lines, representing different stages of B-cell development. Transfection of REH cells with antisense B-myb down-regulated EMSA activity and the level of bcl-2, and led to the apoptosis of REH cells. Transfection of the bcl-2-non-expressing RPMI 8226 cell line with a B-Myb expression vector induced B-Myb EMSA activity and the expression of bcl-2. Reporter assays indicated that the HSS8 sequence containing the three B-Myb sites may act as an enhancer when it is linked to the bcl-2 gene promoter. Interaction of B-Myb with HSS8 may enhance bcl-2 gene expression by co-operating with positive regulatory elements (e.g. previously identified B-Myb response elements) or silencing negative response elements in the bcl-2 gene promoter. PMID:15606792

  5. A molecular code dictates sequence-specific DNA recognition by homeodomains.

    PubMed Central

    Damante, G; Pellizzari, L; Esposito, G; Fogolari, F; Viglino, P; Fabbro, D; Tell, G; Formisano, S; Di Lauro, R

    1996-01-01

    Most homeodomains bind to DNA sequences containing the motif 5'-TAAT-3'. The homeodomain of thyroid transcription factor 1 (TTF-1HD) binds to sequences containing a 5'-CAAG-3' core motif, delineating a new mechanism for differential DNA recognition by homeodomains. We investigated the molecular basis of the DNA binding specificity of TTF-1HD by both structural and functional approaches. As already suggested by the three-dimensional structure of TTF-1HD, the DNA binding specificities of the TTF-1, Antennapedia and Engrailed homeodomains, either wild-type or mutants, indicated that the amino acid residue in position 54 is involved in the recognition of the nucleotide at the 3' end of the core motif 5'-NAAN-3'. The nucleotide at the 5' position of this core sequence is recognized by the amino acids located in position 6, 7 and 8 of the TTF-1 and Antennapedia homeodomains. These data, together with previous suggestions on the role of amino acids in position 50, indicate that the DNA binding specificity of homeodomains can be determined by a combinatorial molecular code. We also show that some specific combinations of the key amino acid residues involved in DNA recognition do not follow a simple, additive rule. PMID:8890172
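
    The two core motifs named in this abstract, 5'-TAAT-3' and 5'-CAAG-3', are both instances of 5'-NAAN-3'. A minimal sketch of locating them in a sequence follows. The dictionary labels, the function name and the example sequence are hypothetical, and only the given strand is scanned.

```python
import re

# Core motifs quoted in the abstract; the labels are illustrative only.
CORE_MOTIFS = {"typical homeodomain": "TAAT", "TTF-1HD": "CAAG"}


def find_core_sites(dna: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each core motif on the given strand."""
    seq = dna.upper()
    # A zero-width lookahead reports overlapping occurrences as well.
    return {
        label: [m.start() for m in re.finditer(f"(?={motif})", seq)]
        for label, motif in CORE_MOTIFS.items()
    }


print(find_core_sites("GGTAATCCAAGTTAAT"))
```

    A fuller scan would also search the reverse complement, since binding sites can lie on either strand.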

  6. CoRAL: predicting non-coding RNAs from small RNA-sequencing data.

    PubMed

    Leung, Yuk Yee; Ryvkin, Paul; Ungar, Lyle H; Gregory, Brian D; Wang, Li-San

    2013-08-01

    The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms. PMID:23700308

  7. Two Lamprey Hedgehog Genes Share Non-Coding Regulatory Sequences and Expression Patterns with Gnathostome Hedgehogs

    PubMed Central

    Ekker, Marc; Hadzhiev, Yavor; Müller, Ferenc; Casane, Didier; Magdelenat, Ghislaine; Rétaux, Sylvie

    2010-01-01

    Hedgehog (Hh) genes play major roles in animal development and studies of their evolution, expression and function point to major differences among chordates. Here we focused on Hh genes in lampreys in order to characterize the evolution of Hh signalling at the emergence of vertebrates. Screening of a cosmid library of the river lamprey Lampetra fluviatilis and searching the preliminary genome assembly of the sea lamprey Petromyzon marinus indicate that lampreys have two Hh genes, named Hha and Hhb. Phylogenetic analyses suggest that Hha and Hhb are lamprey-specific paralogs closely related to Sonic/Indian Hh genes. Expression analysis indicates that Hha and Hhb are expressed in a Sonic Hh-like pattern. The two transcripts are expressed in largely overlapping but not identical domains in the lamprey embryonic brain, including a newly-described expression domain in the nasohypophyseal placode. Global alignments of genomic sequences and local alignment with known gnathostome regulatory motifs show that lamprey Hhs share conserved non-coding elements (CNE) with gnathostome Hhs albeit with sequences that have significantly diverged and dispersed. Functional assays using zebrafish embryos demonstrate gnathostome-like midline enhancer activity for CNEs contained in intron2. We conclude that lamprey Hh genes are gnathostome Shh-like in terms of expression and regulation. In addition, they show some lamprey-specific features, including duplication and structural (but not functional) changes in the intronic/regulatory sequences. PMID:20967201

  8. The importance of being genomic: Non-coding and coding sequences suggest different models of toxin multi-gene family evolution.

    PubMed

    Malhotra, Anita; Creer, Simon; Harris, John B; Thorpe, Roger S

    2015-12-01

    Studies of multi-gene protein families, including many toxins, are crucial for understanding the role of gene duplication in generating protein diversity in general. However, many evolutionary analyses of gene families are based on coding sequences, and do not take into account many potentially confounding evolutionary factors, such as recombination and convergence due to selection. We illustrate this using snake venom gene sequences from the Phospholipase A2 (PLA2) subfamily. Novel gene sequences from 20 species of understudied Asian pitvipers were analyzed alongside available genomic PLA2 sequences from another four crotaline and several viperine species. In contrast to previous analyses of this toxin family based on cDNA sequences, we find that duplication events are concentrated at the tips of the tree, suggesting that major functions such as presynaptic neurotoxicity have evolved convergently multiple times in pitvipers. We provide evidence that this discrepancy is due to differing evolutionary patterns between introns and exons. The effects of several well-known sources of bias on the phylogeny were small, compared to the effect of analyses based on different partitions of the gene (whole gene sequence, non-coding regions, cDNA sequence). Switches of function were found to be largely associated with strong selection, and with duplication events. Use of coding sequences for phylogeny estimation potentially produces incorrect inferences about the action of selection on individual lineages and sites. Our results have major implications for phylogenomic methods of functional inference as well as for our understanding of the evolution of multigene families. PMID:26359851

  9. EzEditor: a versatile sequence alignment editor for both rRNA- and protein-coding genes.

    PubMed

    Jeon, Yoon-Seong; Lee, Kihyun; Park, Sang-Cheol; Kim, Bong-Soo; Cho, Yong-Joon; Ha, Sung-Min; Chun, Jongsik

    2014-02-01

    EzEditor is a Java-based molecular sequence editor allowing manipulation of both DNA and protein sequence alignments for phylogenetic analysis. It has multiple features optimized to connect initial computer-generated multiple alignment and subsequent phylogenetic analysis by providing manual editing with reference to biological information specific to the genes under consideration. It provides various functionalities for editing rRNA alignments using secondary structure information. In addition, it supports simultaneous editing of both DNA sequences and their translated protein sequences for protein-coding genes. EzEditor is, to our knowledge, the first sequence editing software designed for both rRNA- and protein-coding genes with the visualization of biologically relevant information and should be useful in molecular phylogenetic studies. EzEditor is based on Java, can be run on all major computer operating systems and is freely available from http://sw.ezbiocloud.net/ezeditor/. PMID:24425826

  10. Expression in bacteria of gB-glycoprotein-coding sequences of Herpes simplex virus type 2.

    PubMed

    Person, S; Warner, S C; Bzik, D J; Debroy, C; Fox, B A

    1985-01-01

    A plasmid with an insert that encodes the glycoprotein B(gB) gene of Herpes simplex virus type 2 (HSV-2) has been isolated. DNA sequences coding for a portion of the HSV-2 gB peptide were cloned into a bacterial lacZ alpha expression vector and used to transform Escherichia coli. Upon induction of lacZpo-promoted transcription, some of the bacteria became filamentous and produced inclusion bodies containing a large amount of a 65-kDal peptide that was shown to be precipitated by broad-spectrum antibodies to HSV-2 and HSV-1. The HSV-2 insert of one of these clones specifies amino acid residues corresponding to 135 through 629 of the gB of HSV-1 [Bzik et al., Virology 133 (1984) 301-314]. PMID:2412940

  11. The influence of viral coding sequences on pestivirus IRES activity reveals further parallels with translation initiation in prokaryotes.

    PubMed Central

    Fletcher, Simon P; Ali, Iraj K; Kaminski, Ann; Digard, Paul; Jackson, Richard J

    2002-01-01

    Classical swine fever virus (CSFV) is a member of the pestivirus family, which shares many features in common with hepatitis C virus (HCV). It is shown here that CSFV has an exceptionally efficient cis-acting internal ribosome entry segment (IRES), which, like that of HCV, is strongly influenced by the sequences immediately downstream of the initiation codon, and is optimal with viral coding sequences in this position. Constructs that retained 17 or more codons of viral coding sequence exhibited full IRES activity, but with only 12 codons, activity was approximately 66% of maximum in vitro (though close to maximum in transfected BHK cells), whereas with just 3 codons or fewer, the activity was only approximately 15% of maximum. The minimal coding region elements required for high activity were exchanged between HCV and CSFV. Although maximum activity was observed in each case with the homologous combination of coding region and 5' UTR, the heterologous combinations were sufficiently active to rule out a highly specific functional interplay between the 5' UTR and coding sequences. On the other hand, inversion of the coding sequences resulted in low IRES activity, particularly with the HCV coding sequences. RNA structure probing showed that the efficiency of internal initiation of these chimeric constructs correlated most closely with the degree of single-strandedness of the region around and immediately downstream of the initiation codon. The low activity IRESs could not be rescued by addition of supplementary eIF4A (the initiation factor with ATP-dependent RNA helicase activity). The extreme sensitivity to secondary structure around the initiation codon is likely to be due to the fact that the eIF4F complex (which has eIF4A as one of its subunits) is not required for and does not participate in initiation on these IRESs. PMID:12515388

  12. Natural Selection on Coding and Noncoding DNA Sequences Is Associated with Virulence Genes in a Plant Pathogenic Fungus

    PubMed Central

    Rech, Gabriel E.; Sanz-Martín, José M.; Anisimova, Maria; Sukno, Serenella A.; Thon, Michael R.

    2014-01-01

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5′ untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen. PMID:25193312

  13. An alternative strategy to generate coding sequence of macrophage migration inhibitory factor-2 of Wuchereria bancrofti

    PubMed Central

    Chauhan, Nikhil; Hoti, S.L.

    2016-01-01

    Background & objectives: Different developmental stages of Wuchereria bancrofti, the major causal organism of lymphatic filariasis (LF), are difficult to obtain. Beside this limitation, to obtain complete coding sequence (CDS) of a gene one has to isolate mRNA and perform subsequent cDNA synthesis which is laborious and not successful at times. In this study, an alternative strategy employing polymerase chain reaction (PCR) was optimized and validated, to generate CDS of Macrophage migration Inhibitory Factor-2 (wbMIF-2), a gene expressed in the transition stage between L3 to L4. Methods: The genomic DNA of W. bancrofti microfilariae was extracted and used to amplify the full length wbMIF-2 gene (4.275 kb). This amplified product was used as a template for amplifying the exons separately, using the overlapping primers, which were then assembled through another round of PCR. Results: A simple strategy was developed based on PCR, which is used routinely in molecular biology laboratories. The amplified CDS of 363 bp of wbMIF-2 generated using genomic DNA splicing technique was devoid of any intronic sequence. Interpretation & conclusions: The cDNA of wbMIF-2 gene was successfully amplified from genomic DNA of microfilarial stage of W. bancrofti thus circumventing the use of inaccessible L3-L4 transitional stage of this parasite. This strategy is useful for generating CDS of genes from parasites that have restricted availability. PMID:27121522
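
    The strategy described above amplifies each exon from genomic DNA and joins the products so that the result carries no intronic sequence. An in silico analogue of that joining step is sketched below, assuming the exon coordinates are already known. The toy locus and coordinates are invented and are not the wbMIF-2 sequence.

```python
def splice_exons(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon slices (0-based, end-exclusive) in genomic order."""
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exon intervals overlap")
    return "".join(genomic[start:end] for start, end in ordered)


# Toy locus: uppercase marks exons, lowercase marks introns (made-up sequence).
toy_gene = "ATGGCC" + "gtaagtttcag" + "AAGTTC" + "gtatgtccag" + "GGATAA"
toy_exons = [(0, 6), (17, 23), (33, 39)]

cds = splice_exons(toy_gene, toy_exons)
print(cds)
print("length is a multiple of 3:", len(cds) % 3 == 0)
```

    Checking that the joined length is a multiple of three is a quick sanity test that no intronic bases slipped into the coding sequence.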

  14. Detection by real time PCR of walnut allergen coding sequences in processed foods.

    PubMed

    Linacero, Rosario; Ballesteros, Isabel; Sanchiz, Africa; Prieto, Nuria; Iniesto, Elisa; Martinez, Yolanda; Pedrosa, Mercedes M; Muzquiz, Mercedes; Cabanillas, Beatriz; Rovira, Mercè; Burbano, Carmen; Cuadrado, Carmen

    2016-07-01

    A quantitative real-time PCR (RT-PCR) method, employing novel primer sets designed on Jug r 1, Jug r 3, and Jug r 4 allergen-coding sequences, was set up and validated. Its specificity, sensitivity, and applicability were evaluated. The DNA extraction method based on CTAB-phenol-chloroform was best for walnut. RT-PCR allowed a specific and accurate amplification of allergen sequence, and the limit of detection was 2.5 pg of walnut DNA. The method sensitivity and robustness were confirmed with spiked samples, and Jug r 3 primers detected up to 100 mg/kg of raw walnut (LOD 0.01%, LOQ 0.05%). Thermal treatment combined with pressure (autoclaving) reduced yield and amplification (integrity and quality) of walnut DNA. High hydrostatic pressure (HHP) did not produce any effect on the walnut DNA amplification. This RT-PCR method showed greater sensitivity and reliability in the detection of walnut traces in commercial foodstuffs compared with ELISA assays. PMID:26920302
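
    The abstract gives the detection limit both as a mass ratio (100 mg/kg) and as a percentage (0.01%). The small sketch below shows that the two figures agree, assuming the percentages are weight per weight. The function names are illustrative.

```python
def mg_per_kg_to_percent(mg_per_kg: float) -> float:
    """Convert a mass ratio in mg/kg (parts per million) to percent w/w."""
    # 1 kg = 1,000,000 mg, so 1% corresponds to 10,000 mg/kg.
    return mg_per_kg / 10_000


def percent_to_mg_per_kg(percent: float) -> float:
    """Convert percent w/w back to mg/kg."""
    return percent * 10_000


print(mg_per_kg_to_percent(100))   # the quoted LOD, expressed in percent
print(percent_to_mg_per_kg(0.05))  # the quoted LOQ, expressed in mg/kg
```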

  15. Sequence-Based Analysis Uncovers an Abundance of Non-Coding RNA in the Total Transcriptome of Mycobacterium tuberculosis

    PubMed Central

    Arnvig, Kristine B.; Comas, Iñaki; Thomson, Nicholas R.; Houghton, Joanna; Boshoff, Helena I.; Croucher, Nicholas J.; Rose, Graham; Perkins, Timothy T.; Parkhill, Julian; Dougan, Gordon; Young, Douglas B.

    2011-01-01

    RNA sequencing provides a new perspective on the genome of Mycobacterium tuberculosis by revealing an extensive presence of non-coding RNA, including long 5’ and 3’ untranslated regions, antisense transcripts, and intergenic small RNA (sRNA) molecules. More than a quarter of all sequence reads mapping outside of ribosomal RNA genes represent non-coding RNA, and the density of reads mapping to intergenic regions was more than two-fold higher than that mapping to annotated coding sequences. Selected sRNAs were found at increased abundance in stationary phase cultures and accumulated to remarkably high levels in the lungs of chronically infected mice, indicating a potential contribution to pathogenesis. The ability of tubercle bacilli to adapt to changing environments within the host is critical to their ability to cause disease and to persist during drug treatment; it is likely that novel post-transcriptional regulatory networks will play an important role in these adaptive responses. PMID:22072964

  16. FOURTH SEMINAR TO THE MEMORY OF D.N. KLYSHKO: Algebraic solution of the synthesis problem for coded sequences

    NASA Astrophysics Data System (ADS)

    Leukhin, Anatolii N.

    2005-08-01

    The algebraic solution of a 'complex' problem of synthesis of phase-coded (PC) sequences with the zero level of side lobes of the cyclic autocorrelation function (ACF) is proposed. It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems, which can be solved, as the number factorisation problem, by algebraic methods by using the theory of Galois fields and groups.

  17. Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.

    PubMed

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834
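
    GC3, the statistic this study is built on, is the fraction of G or C bases at third-codon positions of a coding sequence. A minimal sketch of the calculation follows, assuming an in-frame sequence that starts at the first codon position. The function name and test sequence are illustrative, and the stop codon is not treated specially.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among third-codon positions of an in-frame CDS."""
    seq = cds.upper()
    # Keep only complete codons so a trailing partial codon is ignored.
    usable = len(seq) - len(seq) % 3
    thirds = seq[2:usable:3]
    if not thirds:
        raise ValueError("sequence has no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


# Hypothetical CDS fragment: ATG GCA AAG TTT has third positions G, A, G, T.
print(gc3("ATGGCAAAGTTT"))
```

    Computing this per gene and comparing the spread of values across a genome gives the kind of GC3 distribution the abstract refers to.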

  18. Biased Gene Conversion and GC-Content Evolution in the Coding Sequences of Reptiles and Vertebrates

    PubMed Central

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content are largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834

  19. Biosynthesis of riboflavin: cloning, sequencing, mapping, and expression of the gene coding for GTP cyclohydrolase II in Escherichia coli.

    PubMed Central

    Richter, G; Ritz, H; Katzenmeier, G; Volk, R; Kohnle, A; Lottspeich, F; Allendorf, D; Bacher, A

    1993-01-01

    GTP cyclohydrolase II catalyzes the first committed step in the biosynthesis of riboflavin. The gene coding for this enzyme in Escherichia coli has been cloned by marker rescue. Sequencing indicated an open reading frame of 588 bp coding for a 21.8-kDa peptide of 196 amino acids. The gene was mapped to a position at 28.2 min on the E. coli chromosome and is identical with ribA. GTP cyclohydrolase II was overexpressed in a recombinant strain carrying a plasmid with the cloned gene. The enzyme was purified to homogeneity from the recombinant strain. The N-terminal sequence determined by Edman degradation was identical to the predicted sequence. The sequence is homologous to the 3' part of the central open reading frame in the riboflavin operon of Bacillus subtilis. PMID:8320220

  20. Molecular cloning and sequencing of mRNAs coding for minor adult globin polypeptides of Xenopus laevis.

    PubMed Central

    Knöchel, W; Meyerhof, W; Hummel, S; Grundmann, U

    1983-01-01

    Globin mRNA was isolated from immature red blood cells of an adult Xenopus laevis female. mRNA/cDNA hybrids were integrated in the Pst I cleavage site of pBR 322 by G/C tailing, and cloned in Escherichia coli strain HB 101. By restriction site analysis as well as hybridization behaviour we identified two clones coding for minor adult alpha and beta globin chains. Nucleotide sequence analysis and derived amino acid sequences are presented. PMID:6298748

  1. A Histone Deacetylase Adjusts Transcription Kinetics at Coding Sequences during Candida albicans Morphogenesis

    PubMed Central

    Hnisz, Denes; Bardet, Anaïs F.; Nobile, Clarissa J.; Petryshyn, Andriy; Glaser, Walter; Schöck, Ulrike; Stark, Alexander; Kuchler, Karl

    2012-01-01

    Despite their classical role as transcriptional repressors, several histone deacetylases, including the baker's yeast Set3/Hos2 complex (Set3C), facilitate gene expression. In the dimorphic human pathogen Candida albicans, the homologue of the Set3C inhibits the yeast-to-filament transition, but the precise molecular details of this function have remained elusive. Here, we use a combination of ChIP–Seq and RNA–Seq to show that the Set3C acts as a transcriptional co-factor of metabolic and morphogenesis-related genes in C. albicans. Binding of the Set3C correlates with gene expression during fungal morphogenesis; yet, surprisingly, deletion of SET3 leaves the steady-state expression level of most genes unchanged, both during exponential yeast-phase growth and during the yeast-filament transition. Fine temporal resolution of transcription in cells undergoing this transition revealed that the Set3C modulates transient expression changes of key morphogenesis-related genes. These include a transcription factor cluster comprising NRG1, EFG1, BRG1, and TEC1, which form a regulatory circuit controlling hyphal differentiation. Set3C appears to restrict the factors by modulating their transcription kinetics, and the hyperfilamentous phenotype of SET3-deficient cells can be reverted by mutating the circuit factors. These results indicate that the chromatin status at coding regions represents a dynamic platform influencing transcription kinetics. Moreover, we suggest that transcription at the coding sequence can be transiently decoupled from potentially conflicting promoter information in dynamic environments. PMID:23236295

  2. Classifier assessment and feature selection for recognizing short coding sequences of human genes.

    PubMed

    Song, Kai; Zhang, Ze; Tong, Tuo-Peng; Wu, Fang

    2012-03-01

    With the ever-increasing pace of genome sequencing, there is a great need for fast and accurate computational tools to automatically identify genes in these genomes. Although great progress has been made in the development of gene-finding algorithms during the past decades, there is still room for further improvement. In particular, the issue of recognizing short exons in eukaryotes is still not solved satisfactorily. This article is devoted to assessing various linear and kernel-based classification algorithms and selecting the best combination of Z-curve features to address this issue. Eight state-of-the-art linear and kernel-based supervised pattern recognition techniques were used to identify the short (21-192 bp) coding sequences of human genes. By measuring the prediction accuracy, the tradeoff between sensitivity and specificity, and the time consumption, the partial least squares (PLS) and kernel partial least squares (KPLS) algorithms were verified to be the best linear and kernel-based classifiers, respectively. A surprising result was that, by making good use of the interpretability of the PLS and the Z-curve methods, 93 Z-curve features were shown to be the best feature combination. Using them, the average recognition accuracy was improved by as much as 7.7% by means of KPLS when compared with what was obtained by the Fisher discriminant analysis using 189 Z-curve variables (Gao and Zhang, 2004). The code used is freely available for the following approaches (implemented in MATLAB and supported on Linux and MS Windows): (1) SVM: http://www.support-vector-machines.org/SVM_soft.html. (2) GP: http://www.gaussianprocess.org. (3) KPLS and KFDA: Taylor, J.S., and Cristianini, N. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK. (4) PLS: Wise, B.M., and Gallagher, N.B. 2011. PLS-Toolbox for use with MATLAB: ver 1.5.2. Eigenvector Technologies, Manson, WA. Supplementary Material for this article is

  3. An Incomplete Paradigm

    ERIC Educational Resources Information Center

    Boulding, Kenneth E.

    1978-01-01

    Examines the role of sociobiology in explaining human behavior. Recommends that sociobiologists consider both biogenetics (DNA and information coded in the genes) and noogenetics (process by which learned structures are transmitted from one generation to the next). (Author/DB)

  4. Mice carrying a complete deletion of the talin2 coding sequence are viable and fertile

    SciTech Connect

    Debrand, Emmanuel; Conti, Francesco J.; Bate, Neil; Spence, Lorraine; Mazzeo, Daniela; Pritchard, Catrin A.; Monkley, Susan J.; Critchley, David R.

    2012-09-21

    Highlights: (1) Mice lacking talin2 are viable and fertile with only a mildly dystrophic phenotype. (2) Talin2 null fibroblasts show no major defects in proliferation, adhesion or migration. (3) Maintaining a colony of talin2 null mice is difficult, indicating an underlying defect. Abstract: Mice homozygous for several Tln2 gene targeted alleles are viable and fertile. Here we show that although the expression of talin2 protein is drastically reduced in muscle from these mice, other tissues continue to express talin2 albeit at reduced levels. We therefore generated a Tln2 allele lacking the entire coding sequence (Tln2^cd). Tln2^cd/cd mice were viable and fertile, and the genotypes of Tln2^cd/+ intercrosses were at the expected Mendelian ratio. Tln2^cd/cd mice showed no major difference in body mass or the weight of the major organs compared to wild-type, although they displayed a mildly dystrophic phenotype. Moreover, Tln2^cd/cd mouse embryo fibroblasts showed no obvious defects in cell adhesion, migration or proliferation. However, the number of Tln2^cd/cd pups surviving to adulthood was variable, suggesting that such mice have an underlying defect.

  5. Evolutionary divergence and limits of conserved non-coding sequence detection in plant genomes

    PubMed Central

    Reineke, Anna R.; Bornberg-Bauer, Erich; Gu, Jenny

    2011-01-01

    The discovery of regulatory motifs embedded in upstream regions of plants is a particularly challenging bioinformatics task. Previous studies have shown that motifs in plants are short compared with those found in vertebrates. Furthermore, plant genomes have undergone several diversification mechanisms such as genome duplication events which impact the evolution of regulatory motifs. In this article, a systematic phylogenomic comparison of upstream regions is conducted to further identify features of the plant regulatory genomes, the component of genomes regulating gene expression, to enable future de novo discoveries. The findings highlight differences in upstream region properties between major plant groups and the effects of divergence times and duplication events. First, clear differences in upstream region evolution can be detected between monocots and dicots, thus suggesting that a separation of these groups should be made when searching for novel regulatory motifs, particularly since universal motifs such as the TATA box are rare. Second, investigating the decay rate of significantly aligned regions suggests that a divergence time of ∼100 mya sets a limit for reliable conserved non-coding sequence (CNS) detection. Insights presented here will set a framework to help identify embedded motifs of functional relevance by understanding the limits of bioinformatics detection for CNSs. PMID:21470961

  6. Conserved Non-Coding Sequences are Associated with Rates of mRNA Decay in Arabidopsis

    PubMed Central

    Spangler, Jacob B.; Feltus, Frank Alex

    2013-01-01

    Steady-state mRNA levels are tightly regulated through a combination of transcriptional and post-transcriptional control mechanisms. The discovery of cis-acting DNA elements that encode these control mechanisms is of high importance. We have investigated the influence of conserved non-coding sequences (CNSs), DNA patterns retained after an ancient whole genome duplication event, on the breadth of gene expression and the rates of mRNA decay in Arabidopsis thaliana. The absence of CNSs near α duplicate genes was associated with a decrease in breadth of gene expression and slower mRNA decay rates, while the presence of CNSs near α duplicates was associated with an increase in breadth of gene expression and faster mRNA decay rates. The observed difference in mRNA decay rate was fastest in genes with CNSs in both non-transcribed and transcribed regions, albeit through an unknown mechanism. This study supports the notion that some Arabidopsis CNSs regulate the steady-state mRNA levels through post-transcriptional control mechanisms and that CNSs also play a role in controlling the breadth of gene expression. PMID:23675377

  7. Source coherence impairments in a direct detection direct sequence optical code-division multiple-access system.

    PubMed

    Fsaifes, Ihsan; Lepers, Catherine; Lourdiane, Mounia; Gallion, Philippe; Beugin, Vincent; Guignard, Philippe

    2007-02-01

    We demonstrate that direct sequence optical code-division multiple-access (DS-OCDMA) encoders and decoders using sampled fiber Bragg gratings (S-FBGs) behave as multipath interferometers. In that case, chip pulses of the prime sequence codes generated by spreading in time-coherent data pulses can result from multiple reflections in the interferometers that can superimpose within a chip time duration. We show that the autocorrelation function has to be considered as the sum of complex amplitudes of the combined chip as the laser source coherence time is much greater than the integration time of the photodetector. To reduce the sensitivity of the DS-OCDMA system to the coherence time of the laser source, we analyze the use of sparse and nonperiodic quadratic congruence and extended quadratic congruence codes. PMID:17230236
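
    To make the two code families named above concrete, the sketch below generates a prime sequence code and a quadratic congruence code as binary chip sequences. It assumes the standard placement rules from the optical CDMA literature (one pulse per block of p chips, at offset i*j mod p for prime sequence codes and i*j*(j+1)/2 mod p for quadratic congruence codes, with p an odd prime); the function names are hypothetical, and extended quadratic congruence codes are not covered.

```python
def prime_sequence_code(p, i):
    """Length p*p chip sequence with a pulse at offset (i*j) mod p in block j."""
    chips = [0] * (p * p)
    for j in range(p):
        chips[j * p + (i * j) % p] = 1
    return chips


def quadratic_congruence_code(p, i):
    """Length p*p chip sequence with a pulse at offset i*j*(j+1)/2 mod p in block j."""
    chips = [0] * (p * p)
    for j in range(p):
        # j*(j+1) is always even, so the integer division is exact.
        chips[j * p + (i * j * (j + 1) // 2) % p] = 1
    return chips


def pulse_positions(chips):
    """Indices of the chips that carry a pulse."""
    return [n for n, c in enumerate(chips) if c]


print(pulse_positions(prime_sequence_code(5, 2)))
print(pulse_positions(quadratic_congruence_code(5, 1)))
```

    In the prime sequence code the gap between successive pulses is nearly constant, whereas the quadratic placement makes the gaps vary from block to block, which is one way to read "nonperiodic" in the abstract.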

  8. Deep sequencing of the tobacco mitochondrial transcriptome reveals expressed ORFs and numerous editing sites outside coding regions

    PubMed Central

    2014-01-01

    Background The purpose of this study was to sequence and assemble the tobacco mitochondrial transcriptome and obtain a genomic-level view of steady-state RNA abundance. Plant mitochondrial genomes have a small number of protein coding genes with large and variably sized intergenic spaces. In the tobacco mitogenome these intergenic spaces contain numerous open reading frames (ORFs) with no clear function. Results The assembled transcriptome revealed distinct monocistronic and polycistronic transcripts along with large intergenic spaces with little to no detectable RNA. Eighteen of the 117 ORFs were found to have steady-state RNA amounts above background in both deep-sequencing and qRT-PCR experiments and ten of those were found to be polysome associated. In addition, the assembled transcriptome enabled a full mitogenome screen of RNA C→U editing sites. Six hundred and thirty five potential edits were found with 557 occurring within protein-coding genes, five in tRNA genes, and 73 in non-coding regions. These sites were found in every protein-coding transcript in the tobacco mitogenome. Conclusion These results suggest that a small number of the ORFs within the tobacco mitogenome may produce functional proteins and that RNA editing occurs in coding and non-coding regions of mitochondrial transcripts. PMID:24433288

  9. Detection of spurious interruptions of protein-coding regions in cloned cDNA sequences by GeneMark analysis.

    PubMed

    Hirosawa, M; Ishikawa, K; Nagase, T; Ohara, O

    2000-09-01

    cDNA is an artificial copy of mRNA and, therefore, no cDNA can be completely free from suspicion of cloning errors. Because overlooking these cloning errors results in serious misinterpretation of cDNA sequences, development of an alerting system targeting spurious sequences in cloned cDNAs is an urgent requirement for massive cDNA sequence analysis. We describe here the application of a modified GeneMark program, originally designed for prokaryotic gene finding, for detection of artifacts in cDNA clones. This program serves to provide a warning when any spurious split of protein-coding regions is detected through statistical analysis of cDNA sequences based on Markov models. In this study, 817 cDNA sequences deposited in public databases by us were subjected to analysis using this alerting system to assess its sensitivity and specificity. The results indicated that any spurious split of protein-coding regions in cloned cDNAs could be sensitively detected and systematically revised by means of this system after the experimental validation of the alerts. Furthermore, this study offered us, for the first time, statistical data regarding the rates and types of errors causing protein-coding splits in cloned cDNAs obtained by conventional cloning methods. PMID:10984451

  10. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

    PubMed Central

    Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

    2016-01-01

    Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175

  11. A novel all-optical label processing based on multiple optical orthogonal codes sequences for optical packet switching networks

    NASA Astrophysics Data System (ADS)

    Zhang, Chongfu; Qiu, Kun; Xu, Bo; Ling, Yun

    2008-05-01

    This paper proposes an all-optical label processing scheme that uses the multiple optical orthogonal codes sequences (MOOCS)-based optical label for optical packet switching (OPS) (MOOCS-OPS) networks. In this scheme, each MOOCS is a permutation or combination of the multiple optical orthogonal codes (MOOC) selected from the multiple-groups optical orthogonal codes (MGOOC). Following a comparison of different optical label processing (OLP) schemes, the principles of MOOCS-OPS network are given and analyzed. Firstly, theoretical analyses are used to prove that MOOCS is able to greatly enlarge the number of available optical labels when compared to the previous single optical orthogonal code (SOOC) for OPS (SOOC-OPS) network. Then, the key units of the MOOCS-based optical label packets, including optical packet generation, optical label erasing, optical label extraction and optical label rewriting etc., are given and studied. These results are used to verify that the proposed MOOCS-OPS scheme is feasible.

  12. Design and performance of Huffman sequences in medical ultrasound coded excitation.

    PubMed

    Polpetta, Alessandro; Banelli, Paolo

    2012-04-01

    This paper deals with coded-excitation techniques for ultrasound medical echography. Specifically, linear Huffman coding is proposed as an alternative approach to other widely established techniques, such as complementary Golay coding and linear frequency modulation. The code design is guided by an optimization procedure that boosts the signal-to-noise ratio gain (GSNR) and, interestingly, also makes the code robust in pulsed-Doppler applications. The paper capitalizes on a thorough analytical model that can be used to design any linear coded-excitation system. This model highlights that the performance in frequency-dependent attenuating media mostly depends on the pulse-shaping waveform when the codes are characterized by almost ideal (i.e., Kronecker delta) autocorrelation. In this framework, different pulse shapers and different code lengths are considered to identify coded signals that optimize the contrast resolution at the output of the receiver pulse compression. Computer simulations confirm that the proposed Huffman codes are particularly effective, and that there are scenarios in which they may be preferable to the other established approaches, both in attenuating and non-attenuating media. Specifically, for a single scatterer at 150 mm in a 0.7-dB/(MHz·cm) attenuating medium, the proposed Huffman design achieves a main-to-side lobe ratio (MSR) equal to 65 dB, whereas tapered linear frequency modulation and classical complementary Golay codes achieve 35 and 45 dB, respectively. PMID:22547275
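
    The "Kronecker delta" autocorrelation mentioned above can be illustrated with the complementary Golay codes that the paper uses as a baseline. The sketch below is not the proposed Huffman design: it builds a Golay pair by the standard concatenation recursion and checks that the two aperiodic autocorrelations sum to a single peak. The dB definition of the main-to-side lobe ratio (20*log10 of peak over largest side lobe) is an assumption made for illustration.

```python
import math


def golay_pair(m):
    """Complementary Golay pair of length 2**m via a -> a|b, b -> a|(-b)."""
    a, b = [1], [1]
    for _ in range(m):
        a, b = a + b, a + [-x for x in b]
    return a, b


def aperiodic_acf(seq):
    """Aperiodic autocorrelation for non-negative lags 0 .. len(seq)-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + lag] for i in range(n - lag)) for lag in range(n)]


def main_to_side_lobe_db(acf):
    """MSR in dB, assumed here to be 20*log10(peak / largest side lobe)."""
    side = max((abs(x) for x in acf[1:]), default=0)
    return math.inf if side == 0 else 20 * math.log10(abs(acf[0]) / side)


a, b = golay_pair(3)
summed = [x + y for x, y in zip(aperiodic_acf(a), aperiodic_acf(b))]
print(summed)                                  # peak at lag 0, zeros elsewhere
print(main_to_side_lobe_db(aperiodic_acf(a)))  # a single code alone has side lobes
```

    The cancellation relies on transmitting both codes of the pair and summing the compressed outputs, which is why a single-transmission code with low side lobes can be attractive.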

  13. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing

    PubMed Central

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-01-01

    Motivation: Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns, called read mapping profiles, can be detected that are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. Results: We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5′-end processing and 3′-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. Availability and Implementation: The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA

  14. Cloning and nucleotide sequence of the genes coding for the Sau96I restriction and modification enzymes.

    PubMed Central

    Szilák, L; Venetianer, P; Kiss, A

    1990-01-01

    The genes coding for the GGNCC specific Sau96I restriction and modification enzymes were cloned and expressed in E. coli. The DNA sequence predicts a 430 amino acid protein (Mr: 49,252) for the methyltransferase and a 261 amino acid protein (Mr: 30,486) for the endonuclease. No protein sequence similarity was detected between the Sau96I methyltransferase and endonuclease. The methyltransferase contains the sequence elements characteristic for m5C-methyltransferases. In addition to this, M.Sau96I shows similarity, also in the variable region, with one m5C-methyltransferase (M.SinI) which has closely related recognition specificity (GGA/TCC). M.Sau96I methylates the internal cytosine within the GGNCC recognition sequence. The Sau96I endonuclease appears to act as a monomer. PMID:2204026

  15. [A comparison of the knockout efficiencies of two codon-optimized Cas9 coding sequences in zebrafish embryos].

    PubMed

    Fenghua, Zhang; Houpeng, Wang; Siyu, Huang; Feng, Xiong; Zuoyan, Zhu; Yonghua, Sun

    2016-02-01

    Recent years have witnessed the rapid development of the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein (CRISPR/Cas9) system. In order to realize gene knockout with high efficiency and specificity in zebrafish, several labs have synthesized distinct Cas9 cDNA sequences which were cloned into different vectors. In this study, we chose two commonly used zebrafish-codon-optimized Cas9 coding sequences (zCas9_bz, zCas9_wc) from two different labs, and utilized them to knock out seven genes in zebrafish embryos, including the exogenous egfp and six endogenous genes (chd, hbegfa, th, eef1a1b, tyr and tcf7l1a). We compared the knockout efficiencies resulting from the two zCas9 coding sequences, by direct sequencing of PCR products, colony sequencing and phenotypic analysis. The results showed that the knockout efficiency of zCas9_wc was higher than that of zCas9_bz in all conditions. PMID:26907778

  16. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation

    PubMed Central

    McLysaght, Aoife; Guerzoni, Daniele

    2015-01-01

    The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. PMID:26323763

  17. Sequence of the nifD gene coding for the α subunit of dinitrogenase from the cyanobacterium Anabaena

    PubMed Central

    Lammers, Peter J.; Haselkorn, Robert

    1983-01-01

    The nucleotide sequence of nifD, the structural gene for the α subunit of dinitrogenase from Anabaena 7120, has been determined. The coding sequence contains 1,440 nucleotides, which predict an amino acid sequence of 480 residues and Mr of 54,283. The predicted sequence contains eight cysteines, of which five are conserved with respect to adjoining sequences and position relative to the α subunits of dinitrogenase from Azotobacter, Clostridium, and Klebsiella. Because there are also five conserved cysteines in the β subunit of Anabaena dinitrogenase [Mazur, B. J. & Chiu, C.-F. (1982) Proc. Natl. Acad. Sci. USA 79, 6782-6786], the number of cysteine residues participating as ligands to FeS clusters is likely to be 20 per α2β2 tetramer. This number is sufficient to accommodate the known four Fe4S4 clusters, leaving at least four cysteines to be shared among the two FeMo cofactors and the more poorly characterized two-iron center. Although the α- and β-subunit gene sequences are not recognizably homologous, their secondary structures, predicted from the sequences, indicate similar domains around three of the conserved cysteine residues. PMID:16593347

  18. Cortical and subcortical contributions to sequence retrieval: Schematic coding of temporal context in the neocortical recollection network.

    PubMed

    Hsieh, Liang-Tien; Ranganath, Charan

    2015-11-01

    Episodic memory entails the ability to remember what happened when. Although the available evidence indicates that the hippocampus plays a role in structuring serial order information during retrieval of event sequences, information processed in the hippocampus must be conveyed to other cortical and subcortical areas in order to guide behavior. However, the extent to which other brain regions contribute to the temporal organization of episodic memory remains unclear. Here, we examined multivoxel activity pattern changes during retrieval of learned and random object sequences, focusing on a neocortical "core recollection network" that includes the medial prefrontal cortex, retrosplenial cortex, and angular gyrus, as well as on striatal areas including the caudate nucleus and putamen that have been implicated in processing of sequence information. The results demonstrate that regions of the core recollection network carry information about temporal positions within object sequences, irrespective of object information. This schematic coding of temporal information is in contrast to the putamen, which carried information specific to objects in learned sequences, and the caudate, which carried information about objects, irrespective of sequence context. Our results suggest a role for the cortical recollection network in the representation of temporal structure of events during episodic retrieval, and highlight the possible mechanisms by which the striatal areas may contribute to this process. More broadly, the results indicate that temporal sequence retrieval is a useful paradigm for dissecting the contributions of specific brain regions to episodic memory. PMID:26209802

  19. Long Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing

    PubMed Central

    Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona

    2014-01-01

    The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow on RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected controls brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia

  20. Genomic Locations of Conserved Noncoding Sequences and Their Proximal Protein-Coding Genes in Mammalian Expression Dynamics.

    PubMed

    Babarinde, Isaac Adeyemi; Saitou, Naruya

    2016-07-01

    Experimental studies have found the involvement of certain conserved noncoding sequences (CNSs) in the regulation of the proximal protein-coding genes in mammals. However, reported cases of long range enhancer activities and inter-chromosomal regulation suggest that proximity of CNSs to protein-coding genes might not be important for regulation. To test the importance of the CNS genomic location, we extracted the CNSs conserved between chicken and four mammalian species (human, mouse, dog, and cattle). These CNSs were confirmed to be under purifying selection. The intergenic CNSs are often found in clusters in gene deserts, where protein-coding genes are in paucity. The distribution pattern, ChIP-Seq, and RNA-Seq data suggested that the CNSs are more likely to be regulatory elements and not corresponding to long intergenic noncoding RNAs. Physical distances between CNS and their nearest protein coding genes were well conserved between human and mouse genomes, and CNS-flanking genes were often found in evolutionarily conserved genomic neighborhoods. ChIP-Seq signal and gene expression patterns also suggested that CNSs regulate nearby genes. Interestingly, genes with more CNSs have more evolutionarily conserved expression than those with fewer CNSs. These computationally obtained results suggest that the genomic locations of CNSs are important for their regulatory functions. In fact, various kinds of evolutionary constraints may be acting to maintain the genomic locations of CNSs and protein-coding genes in mammals to ensure proper regulation. PMID:27017584

  1. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium (MGSAC). The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacchus-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
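
    The abstract above reports that the contig N50 doubled but does not define the measure. As a minimal sketch, assuming the common definition (the contig length at which the running total of lengths, taken from longest to shortest, first reaches half of all assembled bases), it could be computed as below. The function name and the toy lengths are hypothetical and are not marmoset data.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the cumulative sum first reaches
    at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy contig lengths in bp, invented for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

    Merging short contigs into longer ones raises this value without changing the total number of bases, which is why N50 is often quoted alongside the mapped fraction when assemblies are compared.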

  2. Estimation of incomplete multinomial data

    NASA Technical Reports Server (NTRS)

    Credeur, K. R.

    1980-01-01

    Program estimates cell probabilities for data observed to fall in one of two or more categories when exact category cannot be determined. Data are assumed to be randomly incomplete. Estimation minimizes risk of quadratic loss. Program should be useful in projects where multinomial data is analyzed, but where observations are sometimes incomplete. Program is in FORTRAN IV and Assembler for batch execution on CYBER 173.

  3. Functional Divergence of APETALA1 and FRUITFULL is due to Changes in both Regulation and Coding Sequence.

    PubMed

    McCarthy, Elizabeth W; Mohamed, Abeer; Litt, Amy

    2015-01-01

    Gene duplications are prevalent in plants, and functional divergence subsequent to duplication may be linked with the occurrence of novel phenotypes in plant evolution. Here, we examine the functional divergence of Arabidopsis thaliana APETALA1 (AP1) and FRUITFULL (FUL), which arose via a duplication correlated with the origin of the core eudicots. Both AP1 and FUL play a role in floral meristem identity, but AP1 is required for the formation of sepals and petals whereas FUL is involved in cauline leaf and fruit development. AP1 and FUL are expressed in mutually exclusive domains but also differ in sequence, with unique conserved motifs in the C-terminal domains of the proteins that suggest functional differentiation. To determine whether the functional divergence of AP1 and FUL is due to changes in regulation or changes in coding sequence, we performed promoter swap experiments, in which FUL was expressed in the AP1 domain in the ap1 mutant and vice versa. Our results show that FUL can partially substitute for AP1, and AP1 can partially substitute for FUL; thus, the functional divergence between AP1 and FUL is due to changes in both regulation and coding sequence. We also mutated AP1 and FUL conserved motifs to determine if they are required for protein function and tested the ability of these mutated proteins to interact in yeast with known partners. We found that these motifs appear to play at best a minor role in protein function and dimerization capability, despite being strongly conserved. Our results suggest that the functional differentiation of these two paralogous key transcriptional regulators involves both differences in regulation and in sequence; however, sequence changes in the form of unique conserved motifs do not explain the differences observed. PMID:26697035

  4. Functional Divergence of APETALA1 and FRUITFULL is due to Changes in both Regulation and Coding Sequence

    PubMed Central

    McCarthy, Elizabeth W.; Mohamed, Abeer; Litt, Amy

    2015-01-01

    Gene duplications are prevalent in plants, and functional divergence subsequent to duplication may be linked with the occurrence of novel phenotypes in plant evolution. Here, we examine the functional divergence of Arabidopsis thaliana APETALA1 (AP1) and FRUITFULL (FUL), which arose via a duplication correlated with the origin of the core eudicots. Both AP1 and FUL play a role in floral meristem identity, but AP1 is required for the formation of sepals and petals whereas FUL is involved in cauline leaf and fruit development. AP1 and FUL are expressed in mutually exclusive domains but also differ in sequence, with unique conserved motifs in the C-terminal domains of the proteins that suggest functional differentiation. To determine whether the functional divergence of AP1 and FUL is due to changes in regulation or changes in coding sequence, we performed promoter swap experiments, in which FUL was expressed in the AP1 domain in the ap1 mutant and vice versa. Our results show that FUL can partially substitute for AP1, and AP1 can partially substitute for FUL; thus, the functional divergence between AP1 and FUL is due to changes in both regulation and coding sequence. We also mutated AP1 and FUL conserved motifs to determine if they are required for protein function and tested the ability of these mutated proteins to interact in yeast with known partners. We found that these motifs appear to play at best a minor role in protein function and dimerization capability, despite being strongly conserved. Our results suggest that the functional differentiation of these two paralogous key transcriptional regulators involves both differences in regulation and in sequence; however, sequence changes in the form of unique conserved motifs do not explain the differences observed. PMID:26697035

  5. DNA sequence variation in a non-coding region of low recombination on the human X chromosome.

    PubMed

    Kaessmann, H; Heissig, F; von Haeseler, A; Pääbo, S

    1999-05-01

    DNA sequence variation has become a major source of insight regarding the origin and history of our species as well as an important tool for the identification of allelic variants associated with disease. Comparative sequencing of DNA has to date focused mainly on mitochondrial (mt) DNA, which due to its apparent lack of recombination and high evolutionary rate lends itself well to the study of human evolution. These advantages also entail limitations. For example, the high mutation rate of mtDNA results in multiple substitutions that make phylogenetic analysis difficult and, because mtDNA is maternally inherited, it reflects only the history of females. For the history of males, the non-recombining part of the paternally inherited Y chromosome can be studied. The extent of variation on the Y chromosome is so low that variation at particular sites known to be polymorphic rather than entire sequences are typically determined. It is currently unclear how some forms of analysis (such as the coalescent) should be applied to such data. Furthermore, the lack of recombination means that selection at any locus affects all 59 Mb of DNA. To gauge the extent and pattern of point substitutional variation in non-coding parts of the human genome, we have sequenced 10 kb of non-coding DNA in a region of low recombination at Xq13.3. Analysis of this sequence in 69 individuals representing all major linguistic groups reveals the highest overall diversity in Africa, whereas deep divergences also exist in Asia. The time elapsed since the most recent common ancestor (MRCA) is 535,000+/-119,000 years. We expect this type of nuclear locus to provide more answers about the genetic origin and history of humans. PMID:10319866

  6. β-Glucosidase coding sequences and protein from Orpinomyces PC-2

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong; Ximenes, Eduardo A.

    2001-02-06

    Provided is a novel β-glucosidase from Orpinomyces sp. PC2, nucleotide sequences encoding the mature protein and the precursor protein, and methods for recombinant production of this β-glucosidase.

  7. The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins

    PubMed Central

    de Leon, Miguel Ponce; de Miranda, Antonio Basilio; Alvarez-Valin, Fernando; Carels, Nicolas

    2014-01-01

    For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional

  8. The complete coding region sequence of river buffalo (Bubalus bubalis) SRY gene.

    PubMed

    Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe

    2004-02-01

    The Y-linked SRY gene is responsible for testis determination in mammals. Mutations in this gene can lead to XY Gonadal Dysgenesis, an abnormal sexual phenotype described in humans, cattle, horses and river buffalo. We report here the complete river buffalo SRY sequence in order to enable the genetic diagnosis of this disease. The SRY sequence was also used to confirm the evolutionary divergence time between cattle and river buffalo 10 million years ago. PMID:15354359

  9. Characterization and differential expression analysis of artichoke phenylalanine ammonia-lyase-coding sequences.

    PubMed

    De Paolis, Angelo; Pignone, Domenico; Morgese, Anita; Sonnante, Gabriella

    2008-01-01

    Sequences encoding phenylalanine ammonia-lyase were isolated from artichoke, by using a sequence homology strategy, by screening a genomic library and by 3'-rapid amplification of cDNA ends (RACE) technology. These analyses and Southern blots suggested that, in artichoke, phenylalanine ammonia-lyase (PAL) is encoded by a small gene family. The sequences isolated from genomic DNA possess two exons and one intron at the conserved position as in most plant pal genes characterized to date. The 3'-RACE analysis also indicated that each member of the artichoke pal gene family was present as a pool of transcripts, differing in the length of the 3'-untranslated region. The deduced amino acid sequences were highly similar to those of PAL from lettuce and sunflower. One of the artichoke pal genes was completely sequenced, and its 5' upstream region contained TATA, CAAT box and cis regulatory elements identified in other phenylpropanoid pathway genes as playing a role in UV and elicitor induction. The expression of three of the identified artichoke pal sequences was evaluated in different plant parts, in developmental stages and after wounding, using gene-specific primers/probe combinations in real-time polymerase chain reaction assays. The three putative genes were differentially expressed in the plant parts analysed and were developmentally regulated. Moreover, after leaf mechanical injury, all of them were differentially regulated. The possible involvement of the single pal genes in different physiological processes is discussed. PMID:18251868

  10. Sequence Variability in Viral Genome Non-coding Regions Likely Contribute to Observed Differences in Viral Replication Amongst MARV Strains

    PubMed Central

    Alonso, Jesus A.; Patterson, Jean L.

    2013-01-01

    The Marburg viruses Musoke (MARV-Mus) and Angola (MARV-Ang) have highly similar genomic sequences. Analysis of viral replication using various assays consistently identified MARV-Ang as the faster replicating virus. Non-coding genomic regions of negative sense RNA viruses are known to play a role in viral gene expression. A comparison of the six non-coding regions using bicistronic minigenomes revealed that the first two non-coding regions (NP / VP35 and VP35 / VP40) differed significantly in their transcriptional regulation. Deletion mutation analysis of the MARV-Mus NP / VP35 region further revealed that the MARV polymerase (L) is able to initiate production of the downstream gene without the presence of highly conserved regulatory signals. Bicistronic minigenome assays also identified the VP30 mRNA 5′ untranslated region as an rZAP-targeted RNA motif. Overall, our studies indicate that the high variation of MARV non-coding regions may play a significant role in observed differences in transcription and/or replication. PMID:23510675

  11. Next-Generation Sequencing of Protein-Coding and Long Non-protein-Coding RNAs in Two Types of Exosomes Derived from Human Whole Saliva.

    PubMed

    Ogawa, Yuko; Tsujimoto, Masafumi; Yanoshita, Ryohei

    2016-01-01

    Exosomes are small extracellular vesicles containing microRNAs and mRNAs that are produced by various types of cells. We previously used ultrafiltration and size-exclusion chromatography to isolate two types of human salivary exosomes (exosomes I, II) that are different in size and proteomes. We showed that salivary exosomes contain large repertoires of small RNAs. However, precise information regarding long RNAs in salivary exosomes has not been fully determined. In this study, we investigated the compositions of protein-coding RNAs (pcRNAs) and long non-protein-coding RNAs (lncRNAs) of exosome I, exosome II and whole saliva (WS) by next-generation sequencing technology. Although 11% of all RNAs were commonly detected among the three samples, the compositions of reads mapping to known RNAs were similar. The most abundant pcRNA is ribosomal RNA protein, and pcRNAs of some salivary proteins such as S100 calcium-binding protein A8 (protein S100-A8) were present in salivary exosomes. Interestingly, lncRNAs of pseudogenes (presumably, processed pseudogenes) were abundant in exosome I, exosome II and WS. Translationally controlled tumor protein gene, which plays an important role in cell proliferation, cell death and immune responses, was highly expressed as pcRNA and pseudogenes in salivary exosomes. Our results show that salivary exosomes contain various types of RNAs such as pseudogenes and small RNAs, and may mediate intercellular communication by transferring these RNAs to target cells as gene expression regulators. PMID:27582331

  12. Sequence analysis and identification of new variations in the coding sequence of melatonin receptor gene (MTNR1A) of Indian Chokla sheep breed

    PubMed Central

    Saxena, Vijay Kumar; Jha, Bipul Kumar; Meena, Amar Singh; Naqvi, S.M.K.

    2014-01-01

    Melatonin receptor 1A gene is the prime receptor mediating the effect of melatonin at the neuroendocrine level for control of seasonal reproduction in sheep. The aims of this study were to examine the polymorphism pattern of the coding sequence of the MTNR1A gene in Chokla sheep, a breed of the Indian arid tract, and to identify new variations in relation to its aseasonal status. Genomic DNAs of 101 Chokla sheep were collected and an 824 bp coding sequence of Exon II was amplified. RFLP was performed with the enzymes RsaI and MnlI to assess the presence of polymorphism at positions C606T and G612A, respectively. Genotyping revealed significantly higher frequency of M and R alleles than m and r alleles. RR and MM were found to be dominantly present in the group of studied population. Cloning and sequencing of Exon II followed by mutation/polymorphism analysis revealed ten mutations of which three were non-synonymous mutations (G706A, C893A, G931C). G706A leads to substitution of valine by isoleucine Val125I (U14109) in the fifth transmembrane domain. C893A leads to substitution of alanine by aspartic acid in the third extracellular loop. The G931C mutation brings about substitution of alanine by proline in the seventh transmembrane helix and can affect the conformational stability of the molecule. Polyphen-2 analysis revealed that the polymorphism at position 931 is potentially damaging while the mutations at positions 706 and 893 were benign. It is concluded that the G931C mutation of the MTNR1A gene may explain, in part, the importance of melatonin structure integrity in influencing seasonality in sheep. PMID:25606429

  13. ADAR2 affects mRNA coding sequence edits with only modest effects on gene expression or splicing in vivo.

    PubMed

    Dillman, Allissa A; Cookson, Mark R; Galter, Dagmar

    2016-01-01

    Adenosine deaminases bind double stranded RNA and convert adenosine to inosine. Editing creates multiple isoforms of neurotransmitter receptors, such as with Gria2. Adar2 KO mice die of seizures shortly after birth, but if the Gria2 Q/R editing site is mutated to mimic the edited version then the animals are viable. We performed RNA-Seq on frontal cortices of Adar2(-/-) Gria2(R/R) mice and littermates. We found 56 editing sites with significantly diminished editing levels in Adar2 deficient animals with the majority in coding regions. Only two genes and 3 exons showed statistically significant differences in expression levels. This work illustrates that ADAR2 is important in site-specific changes of protein coding sequences but has relatively modest effects on gene expression and splicing in the adult mouse frontal cortex. PMID:26669816

  14. Cloning and nucleotide sequence of the simian rotavirus gene 6 that codes for the major inner capsid protein.

    PubMed Central

    Estes, M K; Mason, B B; Crawford, S; Cohen, J

    1984-01-01

    The nucleotide sequence of the gene that codes for the major inner capsid protein of the simian rotavirus SA11 has been determined. A DNA copy of mRNA from gene 6 was cloned in the E. coli plasmid pBR322. The full-length gene is 1357 nucleotides long with a 5'-noncoding region of 23 nucleotides and a 3'-noncoding region of 140 nucleotides. The gene contains a single, long, open reading-frame of 1194 nucleotides capable of coding for a protein of 397 amino acids with a molecular weight of 44,816. The predicted protein product is relatively proline-rich with a net charge at neutral pH of -3.5. One stretch of 53 amino acids (encoded by nucleotides 327-485) is basic. PMID:6322125
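
    The lengths quoted in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes that the 1194-nucleotide open reading frame includes the stop codon and begins immediately after the 23-nucleotide 5'-noncoding region (at nucleotide 24); neither assumption is stated in the abstract, and the helper names are hypothetical.

```python
def orf_summary(gene_len, utr5_len, utr3_len):
    """Return (orf_nt, codons, amino_acids) for a single uninterrupted ORF.

    Assumption: the ORF spans everything between the two noncoding
    regions and its last codon is a stop codon that is not translated.
    """
    orf_nt = gene_len - utr5_len - utr3_len
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = orf_nt // 3
    return orf_nt, codons, codons - 1


def codon_span(first_nt, last_nt, orf_start):
    """Map an in-frame nucleotide range to 1-based (first, last) codon numbers."""
    offset = first_nt - orf_start
    length = last_nt - first_nt + 1
    if offset % 3 != 0 or length % 3 != 0:
        raise ValueError("range is not aligned to the reading frame")
    first_codon = offset // 3 + 1
    return first_codon, first_codon + length // 3 - 1


# Figures taken from the abstract: 1357 nt gene, 23 nt 5' and 140 nt 3' regions.
print(orf_summary(1357, 23, 140))
# Basic stretch encoded by nucleotides 327-485, ORF assumed to start at nt 24.
print(codon_span(327, 485, 24))
```

    Under these assumptions the numbers agree with the abstract: 23 + 1194 + 140 = 1357 nucleotides, 1194 / 3 = 398 codons (397 amino acids plus a stop), and nucleotides 327-485 cover 159 bases, or 53 codons.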

  15. Construction and Analysis of a Novel 2-D Optical Orthogonal Codes Based on Modified One-coincidence Sequence

    NASA Astrophysics Data System (ADS)

    Ji, Jianhua; Wang, Yanfen; Wang, Ke; Xu, Ming; Zhang, Zhipeng; Yang, Shuwen

    2013-09-01

    A new two-dimensional OOC (optical orthogonal codes) named PC/MOCS is constructed, using PC (prime code) for time spreading and MOCS (modified one-coincidence sequence) for wavelength hopping. Compared with PC/PC, the number of wavelengths for PC/MOCS is not limited to a prime number. Compared with PC/OCS, the length of MOCS need not be expanded to the same length of PC. PC/MOCS can be constructed flexibly, and also can use available wavelengths effectively. Theoretical analysis shows that PC/MOCS can reduce the bit error rate (BER) of OCDMA system, and can support more users than PC/PC and PC/OCS.

  16. Bean yellow mosaic, clover yellow vein, and pea mosaic are distinct potyviruses: evidence from coat protein gene sequences and molecular hybridization involving the 3' non-coding regions.

    PubMed

    Tracy, S L; Frenkel, M J; Gough, K H; Hanna, P J; Shukla, D D

    1992-01-01

    The sequences of the 3' 1019 nucleotides of the genome of an atypical strain of bean yellow mosaic virus (BYMV-S) and of the 3' 1018 nucleotides of the clover yellow vein virus (CYVV-B) genome have been determined. These sequences contain the complete coding region of the viral coat protein followed by a 3' non-coding region of 173 and 178 nucleotides for BYMV-S and CYVV-B, respectively. When the deduced amino acid sequences of the coat protein coding regions were compared, a sequence identity of 77% was found between the two viruses, and optimal alignment of the 3' untranslated regions of BYMV-S and CYVV-B gave a 65% identity. However, the degree of homology of the amino acid sequences of coat proteins of BYMV-S with the published sequences for three other strains of BYMV ranged from 88% to 94%, while the sequence homology of the 3' untranslated regions between the four strains of BYMV ranged between 86% and 95%. Amplified DNA probes corresponding to the 3' non-coding regions of BYMV-S and CYVV-B showed strong hybridization only with the strains of their respective viruses and not with strains of other potyviruses, including pea mosaic virus (PMV). The relatively low sequence identities between the BYMV-S and CYVV-B coat proteins and their 3' non-coding regions, together with the hybridization results, indicate that BYMV, CYVV, and PMV are distinct potyviruses. PMID:1731696
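
    The identity percentages in the abstract above come from pairwise comparison of aligned sequences. As a minimal sketch, assuming identity is counted over aligned columns in which neither sequence has a gap (other denominators are also in use, and the abstract does not say which was applied), the calculation could look like this. The function name and the toy sequences are hypothetical, not BYMV-S or CYVV-B data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two already-aligned sequences.

    Assumption: columns containing a gap in either sequence are skipped,
    and identity is matches / compared columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns do not count toward the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment invented for illustration only.
print(percent_identity("ACGT-ACGTA", "ACGAAAC-TA"))
```

    The same function applies to aligned amino acid sequences, which is how a coat protein figure and a 3' untranslated region figure can be reported side by side.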

  17. The bioinformatics of nucleotide sequence coding for proteins requiring metal coenzymes and proteins embedded with metals

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Cheung, E.; Holden, T.; Sullivan, R.; Nguyen, A.; Lieberman, D.; Cheung, T.

    2015-09-01

    All metallo-proteins need post-translation metal incorporation. In fact, the isotope ratio of Fe, Cu, and Zn in physiology and oncology have emerged as an important tool. The nickel containing F430 is the prosthetic group of the enzyme methyl coenzyme M reductase which catalyzes the release of methane in the final step of methano-genesis, a prime energy metabolism candidate for life exploration space mission in the solar system. The 3.5 Gyr early life sulfite reductase as a life switch energy metabolism had Fe-Mo clusters. The nitrogenase for nitrogen fixation 3 billion years ago had Mo. The early life arsenite oxidase needed for anoxygenic photosynthesis energy metabolism 2.8 billion years ago had Mo and Fe. The selection pressure in metal incorporation inside a protein would be quantifiable in terms of the related nucleotide sequence complexity with fractal dimension and entropy values. Simulation model showed that the studied metal-required energy metabolism sequences had at least ten times more selection pressure relatively in comparison to the horizontal transferred sequences in Mealybug, guided by the outcome histogram of the correlation R-sq values. The metal energy metabolism sequence group was compared to the circadian clock KaiC sequence group using magnesium atomic level bond shifting mechanism in the protein, and the simulation model would suggest a much higher selection pressure for the energy life switch sequence group. The possibility of using Kepler 444 as an example of ancient life in Galaxy with the associated exoplanets has been proposed and is further discussed in this report. Examples of arsenic metal bonding shift probed by Synchrotron-based X-ray spectroscopy data and Zn controlled FOXP2 regulated pathways in human and chimp brain studied tissue samples are studied in relationship to the sequence bioinformatics. The analysis results suggest that relatively large metal bonding shift amount is associated with low probability correlation R

  18. Coding Local and Global Binary Visual Features Extracted From Video Sequences.

    PubMed

    Baroffio, Luca; Canclini, Antonio; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

    2015-11-01

    Binary local features represent an effective alternative to real-valued descriptors, leading to comparable results for many visual analysis tasks while being characterized by significantly lower computational complexity and memory requirements. When dealing with large collections, a more compact representation based on global features is often preferred, which can be obtained from local features by means of, e.g., the bag-of-visual word model. Several applications, including, for example, visual sensor networks and mobile augmented reality, require visual features to be transmitted over a bandwidth-limited network, thus calling for coding techniques that aim at reducing the required bit budget while attaining a target level of efficiency. In this paper, we investigate a coding scheme tailored to both local and global binary features, which aims at exploiting both spatial and temporal redundancy by means of intra- and inter-frame coding. In this respect, the proposed coding scheme can conveniently be adopted to support the analyze-then-compress (ATC) paradigm. That is, visual features are extracted from the acquired content, encoded at remote nodes, and finally transmitted to a central controller that performs the visual analysis. This is in contrast with the traditional approach, in which visual content is acquired at a node, compressed and then sent to a central unit for further processing, according to the compress-then-analyze (CTA) paradigm. In this paper, we experimentally compare the ATC and the CTA by means of rate-efficiency curves in the context of two different visual analysis tasks: 1) homography estimation and 2) content-based retrieval. Our results show that the novel ATC paradigm based on the proposed coding primitives can be competitive with the CTA, especially in bandwidth limited scenarios. PMID:26080384

  19. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  20. Short Time-Scale Sensory Coding in S1 during Discrimination of Whisker Vibrotactile Sequences.

    PubMed

    McGuire, Leah M; Telian, Gregory; Laboy-Juárez, Keven J; Miyashita, Toshio; Lee, Daniel J; Smith, Katherine A; Feldman, Daniel E

    2016-08-01

    Rodent whisker input consists of dense microvibration sequences that are often temporally integrated for perceptual discrimination. Whether primary somatosensory cortex (S1) participates in temporal integration is unknown. We trained rats to discriminate whisker impulse sequences that varied in single-impulse kinematics (5-20-ms time scale) and mean speed (150-ms time scale). Rats appeared to use the integrated feature, mean speed, to guide discrimination in this task, consistent with similar prior studies. Despite this, 52% of S1 units, including 73% of units in L4 and L2/3, encoded sequences at fast time scales (≤20 ms, mostly 5-10 ms), accurately reflecting single impulse kinematics. 17% of units, mostly in L5, showed weaker impulse responses and a slow firing rate increase during sequences. However, these units did not effectively integrate whisker impulses, but instead combined weak impulse responses with a distinct, slow signal correlated to behavioral choice. A neural decoder could identify sequences from fast unit spike trains and behavioral choice from slow units. Thus, S1 encoded fast time scale whisker input without substantial temporal integration across whisker impulses. PMID:27574970

  1. Short Time-Scale Sensory Coding in S1 during Discrimination of Whisker Vibrotactile Sequences

    PubMed Central

    Miyashita, Toshio; Lee, Daniel J.; Smith, Katherine A.; Feldman, Daniel E.

    2016-01-01

    Rodent whisker input consists of dense microvibration sequences that are often temporally integrated for perceptual discrimination. Whether primary somatosensory cortex (S1) participates in temporal integration is unknown. We trained rats to discriminate whisker impulse sequences that varied in single-impulse kinematics (5–20-ms time scale) and mean speed (150-ms time scale). Rats appeared to use the integrated feature, mean speed, to guide discrimination in this task, consistent with similar prior studies. Despite this, 52% of S1 units, including 73% of units in L4 and L2/3, encoded sequences at fast time scales (≤20 ms, mostly 5–10 ms), accurately reflecting single impulse kinematics. 17% of units, mostly in L5, showed weaker impulse responses and a slow firing rate increase during sequences. However, these units did not effectively integrate whisker impulses, but instead combined weak impulse responses with a distinct, slow signal correlated to behavioral choice. A neural decoder could identify sequences from fast unit spike trains and behavioral choice from slow units. Thus, S1 encoded fast time scale whisker input without substantial temporal integration across whisker impulses. PMID:27574970

  2. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is at the core of biological activities, and significant attention has been given to identification of various factors contributing to gene expression at genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal to investigate how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together...

  3. Cracking the Code of Human Diseases Using Next-Generation Sequencing: Applications, Challenges, and Perspectives

    PubMed Central

    Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; D'Argenio, Valeria

    2015-01-01

    Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics. PMID:26665001

  4. Cracking the Code of Human Diseases Using Next-Generation Sequencing: Applications, Challenges, and Perspectives.

    PubMed

    Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; Salvatore, Francesco; D'Argenio, Valeria

    2015-01-01

    Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics. PMID:26665001

  5. Molecular differentiation of Nosema apis and Nosema ceranae based on species-specific sequence differences in a protein coding gene.

    PubMed

    Gisder, Sebastian; Genersch, Elke

    2013-05-01

    Nosema apis and Nosema ceranae are two microsporidian pathogens of the European honey bee, Apis mellifera. There is evidence that N. ceranae is more virulent than N. apis subject to environmental factors like climate. This makes N. ceranae one of the suspects in the increasing colony losses recently observed in many regions of the world. Correct differentiation between N. apis and N. ceranae is important and best accomplished by molecular methods. So far only protocols based on species-specific sequence differences in the 16S rRNA gene are available. However, recent studies indicated that these methods may lead to confusing results due to polymorphisms in and recombination between the multi-copy 16S rRNA genes. To solve this problem and to provide a reliable molecular tool for the differentiation between the two bee pathogenic microsporidia we here present and evaluate a duplex-PCR protocol based on species-specific sequence differences in the highly conserved gene coding for the DNA-dependent RNA polymerase II largest subunit. A total of 102 honey bee samples were analyzed by the novel PCR protocol and the results were compared with the results of the originally published PCR-RFLP analysis and two recently published differentiation protocols, based on 16S rRNA sequence differences. Although the novel PCR protocol proved to be as reliable as the 16S rRNA gene based PCR-RFLP it was superior to simple 16S rRNA based PCR protocols which tended to overestimate the rate of N. ceranae infections. Therefore, we propose that species-specific sequence differences of highly conserved protein coding genes should become the preferred molecular tool for differentiation of Nosema spp. PMID:23352902

  6. cDNA sequence coding for the alpha'-chain of the third complement component in the African lungfish.

    PubMed

    Sato, A; Sültmann, H; Mayer, W E; Figueroa, F; Tichy, H; Klein, J

    1999-04-01

    cDNA clones coding for almost the entire C3 alpha-chain of the African lungfish (Protopterus aethiopicus), a representative of the Sarcopterygii (lobe-finned fishes), were sequenced and characterized. From the sequence it is deduced that the lungfish C3 molecule is probably a disulphide-bonded alpha:beta dimer similar to that of the C3 components of other jawed vertebrates. The deduced sequence contains conserved sites presumably recognized by proteolytic enzymes (e.g. factor I) involved in the activation and inactivation of the component. It also contains the conserved thioester region and the putative site for binding properdin. However, the sites for interaction with complement receptor 2 and factor H are poorly conserved. Either complement receptor 2 and factor H are not present in the lungfish or they bind to different residues at the same or a different site than mammalian complement receptor 2 and factor H. The C3 alpha-chain sequences faithfully reflect the phylogenetic relationships among vertebrate classes and can therefore be used to help to resolve the long-standing controversy concerning the origin of the tetrapods. PMID:10219761

  7. Cloning and DNA sequence of the gene coding for Clostridium thermocellum cellulase Ss (CelS), a major cellulosome component.

    PubMed Central

    Wang, W K; Kruus, K; Wu, J H

    1993-01-01

    Clostridium thermocellum ATCC 27405 produces an extracellular cellulase system capable of hydrolyzing crystalline cellulose. The enzyme system involves a multicomponent protein aggregate (the cellulosome) with a total molecular weight in the millions, impeding mechanistic studies. However, two major components of the aggregate, SS (M(r) = 82,000) and SL (M(r) = 250,000), which act synergistically to hydrolyze crystalline cellulose, have been identified (J. H. D. Wu, W. H. Orme-Johnson, and A. L. Demain, Biochemistry 27:1703-1709, 1988). To further study this synergism, we cloned and sequenced the gene (celS) coding for the SS (CelS) protein by using a degenerate, inosine-containing oligonucleotide probe whose sequence was derived from the N-terminal amino acid sequence of the CelS protein. The open reading frame of celS consisted of 2,241 bp encoding 741 amino acid residues. It encoded the N-terminal amino acid sequence and two internal peptide sequences determined for the native CelS protein. A putative ribosome binding site was identified at the 5' end of the gene. A putative signal peptide of 27 amino acid residues was adjacent to the N terminus of the CelS protein. The predicted molecular weight of the secreted protein was 80,670. The celS gene contained a conserved reiterated sequence encoding 24 amino acid residues found in proteins encoded by many other clostridial cel or xyn genes. A palindromic structure was found downstream from the open reading frame. The celS gene is unique among the known cel genes of C. thermocellum. However, it is highly homologous to the partial open reading frame found in C. cellulolyticum and in Caldocellum saccharolyticum, indicating that these genes belong to a new family of cel genes. PMID:8444792

  8. Analysis of mutations in the entire coding sequence of the factor VIII gene

    SciTech Connect

    Bidichandani, S.I.; Lanyon, W.G.; Connor, J.M.

    1994-09-01

    Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.

  9. Cloning and DNA sequence of the gene coding for Bacillus stearothermophilus T-6 xylanase.

    PubMed Central

    Gat, O; Lapidot, A; Alchanati, I; Regueros, C; Shoham, Y

    1994-01-01

    Bacillus stearothermophilus T-6 produces an extracellular thermostable xylanase. Affinity-purified polyclonal serum raised against the enzyme was used to screen a genomic library of B. stearothermophilus T-6 constructed in lambda-EMBL3. Two positive phages were isolated, both containing similar 13-kb inserts, and their lysates exhibited xylanase activity. A 3,696-bp SalI-BamHI fragment containing the xylanase gene was subcloned in Escherichia coli and subsequently sequenced. The open reading frame of xylanase T-6 consists of 1,236 bp. On the basis of sequence similarity, two possible -10 and -35 regions, a ribosome-binding site at the 5' end of the gene and a potential transcriptional termination motif at the 3' end of the gene, were identified. From the previously known N-terminal amino acid sequence of xylanase T-6 and the possible ribosome-binding site, a putative 28-amino-acid signal peptide was deduced. The mature xylanase T-6 consists of 379 amino acids with a calculated molecular weight and pI of 43,808 and 6.88, respectively. Multiple alignment of beta-glycanase amino acid sequences revealed highly conserved regions. Northern (RNA) blot analysis indicated that the xylanase T-6 transcript is about 1.4 kb and that the induction of this enzyme synthesis by xylose is on the transcriptional level. PMID:8031084

  10. Molecular phylogenetic analysis in Hammondia-like organisms based on partial Hsp70 coding sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The 70-kDa heat shock protein (Hsp70) is considered one of the most conserved proteins in all domains of life from Archaea to eukaryotes. Hammondia heydorni, H. hammondi, Toxoplasma gondii, Neospora hughesi and N. caninum (Hammondia-like organisms) are closely related tissue cyst-forming c...

  11. Integration of Expressed Sequence Tag Data Flanking Predicted RNA Secondary Structures Facilitates Novel Non-Coding RNA Discovery

    PubMed Central

    Krzyzanowski, Paul M.; Price, Feodor D.; Muro, Enrique M.; Rudnicki, Michael A.; Andrade-Navarro, Miguel A.

    2011-01-01

    Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking ESTs data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286

  12. Integration of expressed sequence tag data flanking predicted RNA secondary structures facilitates novel non-coding RNA discovery.

    PubMed

    Krzyzanowski, Paul M; Price, Feodor D; Muro, Enrique M; Rudnicki, Michael A; Andrade-Navarro, Miguel A

    2011-01-01

    Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking ESTs data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286

  13. Tn9 and IS1 inserts in a ribosomal ribonucleic acid operon of Escherichia coli are incompletely polar.

    PubMed Central

    Brewster, J M; Morgan, E A

    1981-01-01

    Transcription is known to be coupled to translation in many or all bacterial operons which code for proteins. In these operons, nonsense codons which prevent normal translation often result in premature termination of transcription (polarity). However, efficient transcription of ribosomal ribonucleic acid operons (rrn operons) occurs, although rrn transcripts are not translated. It therefore seemed possible that insertion sequences and transposable elements which are polar in protein-coding operons might not be polar in rrn operons. Previously, it has been shown (E. A. Morgan, Cell 21:257-265, 1980) that Tn10 is incompletely polar in the rrnX operon. Here we show that the transposon Tn9 and the insertion sequence IS1 are also incompletely polar in rrnX. In normal cells expression of sequences distal to the insertions can be detected by genetic methods. In ultraviolet-irradiated cells expression of distal sequences is about 80% of that observed in uninterrupted rrnX operons. These observations provide evidence that ribonucleic acid polymerase molecules beginning at rrnX promoters can read through Tn9 and IS1 and that, at least in ultraviolet-irradiated cells, read-through is very efficient. PMID:6171559

  14. Complete Coding Sequences of One H9 and Three H7 Low-Pathogenic Influenza Viruses Circulating in Wild Birds in Belgium, 2009 to 2012

    PubMed Central

    Rosseel, Toon; Marché, Sylvie; Steensels, Mieke; Vangeluwe, Didier; Linden, Annick; van den Berg, Thierry; Lambrecht, Bénédicte

    2016-01-01

    The complete coding sequences of four avian influenza A viruses (two H7N7, one H7N1, and one H9N2) circulating in wild waterfowl in Belgium from 2009 to 2012 were determined using Illumina sequencing. All viral genome segments represent viruses circulating in the Eurasian wild bird population. PMID:27284153

  15. Complete Coding Sequences of One H9 and Three H7 Low-Pathogenic Influenza Viruses Circulating in Wild Birds in Belgium, 2009 to 2012.

    PubMed

    Van Borm, Steven; Rosseel, Toon; Marché, Sylvie; Steensels, Mieke; Vangeluwe, Didier; Linden, Annick; van den Berg, Thierry; Lambrecht, Bénédicte

    2016-01-01

    The complete coding sequences of four avian influenza A viruses (two H7N7, one H7N1, and one H9N2) circulating in wild waterfowl in Belgium from 2009 to 2012 were determined using Illumina sequencing. All viral genome segments represent viruses circulating in the Eurasian wild bird population. PMID:27284153

  16. DNA sequencing and bar-coding using solid-state nanopores.

    PubMed

    Atas, Evrim; Singer, Alon; Meller, Amit

    2012-12-01

    Nanopores have emerged as a prominent single-molecule analytic tool with particular promise for genomic applications. In this review, we discuss two potential applications of the nanopore sensors: First, we present a nanopore-based single-molecule DNA sequencing method that utilizes optical detection for massively parallel throughput. Second, we describe a method by which nanopores can be used as single-molecule genotyping tools. For DNA sequencing, the distinction among the four types of DNA nucleobases is achieved by employing a biochemical procedure for DNA expansion. In this approach, each nucleobase in each DNA strand is converted into one of four predefined unique 16-mers in a process that preserves the nucleobase sequence. The resulting converted strands are then hybridized to a library of four molecular beacons, each carrying a unique fluorophore tag, that are perfect complements to the 16-mers used for conversion. Solid-state nanopores are then used to sequentially remove these beacons, one after the other, leading to a series of photon bursts in four colors that can be optically detected. Single-molecule genotyping is achieved by tagging the DNA fragments with γ-modified synthetic peptide nucleic acid probes coupled to an electronic characterization of the complexes using solid-state nanopores. This method can be used to identify and differentiate genes with a high level of sequence similarity at the single-molecule level, but different pathology or response to treatment. We will illustrate this method by differentiating the pol gene for two highly similar human immunodeficiency virus subtypes, paving the way for a novel diagnostics platform for viral classification. PMID:23109189
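
    The conversion-and-readout scheme described in this abstract can be pictured with a small string-level sketch. It is only an illustration under stated assumptions: the four 16-mer code words and the four colour names below are placeholders invented for the example, not the sequences or fluorophores used in the review, and the biochemical conversion, beacon hybridization and nanopore stripping steps are represented by plain string operations.

```python
# Hypothetical code words: one distinct 16-mer per nucleobase (placeholders only).
CODEBOOK = {"A": "AC" * 8, "C": "AG" * 8, "G": "CT" * 8, "T": "GT" * 8}
# Hypothetical colour of the beacon that binds each code word.
COLOR = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}


def expand(sequence):
    """Replace every nucleobase by its 16-mer, preserving the base order."""
    return "".join(CODEBOOK[base] for base in sequence)


def read_bursts(expanded):
    """Walk the expanded strand 16 nt at a time and report one colour per block,
    mimicking beacons being removed one after the other."""
    word_to_base = {word: base for base, word in CODEBOOK.items()}
    blocks = [expanded[i:i + 16] for i in range(0, len(expanded), 16)]
    return [COLOR[word_to_base[block]] for block in blocks]


def call_bases(bursts):
    """Translate the ordered colour bursts back into a nucleobase sequence."""
    color_to_base = {color: base for base, color in COLOR.items()}
    return "".join(color_to_base[color] for color in bursts)


original = "GATTACA"
expanded = expand(original)          # 7 bases become 7 x 16 = 112 nt
bursts = read_bursts(expanded)       # one colour per original base
recovered = call_bases(bursts)
```

    Because each base maps to exactly one code word and each code word to one colour, the order of colour bursts carries the same information as the original base order, which is the property the abstract relies on.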

  17. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences

    PubMed Central

    Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

    2016-01-01

    Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. PMID:27289096

  18. Sequencing rare and common APOL1 coding variants to determine kidney disease risk.

    PubMed

    Limou, Sophie; Nelson, George W; Lecordier, Laurence; An, Ping; O'hUigin, Colm S; David, Victor A; Binns-Roemer, Elizabeth A; Guiblet, Wilfried M; Oleksyk, Taras K; Pays, Etienne; Kopp, Jeffrey B; Winkler, Cheryl A

    2015-10-01

    A third of African Americans with sporadic focal segmental glomerulosclerosis (FSGS) or HIV-associated nephropathy (HIVAN) do not carry APOL1 renal risk genotypes. This raises the possibility that other APOL1 variants may contribute to kidney disease. To address this question, we sequenced all APOL1 exons in 1437 Americans of African and European descent, including 464 patients with biopsy-proven FSGS/HIVAN. Testing for association with 33 common and rare variants with FSGS/HIVAN revealed no association independent of strong recessive G1 and G2 effects. Seeking additional variants that might have been under selection by pathogens and could represent candidates for kidney disease risk, we also sequenced an additional 1112 individuals representing 53 global populations. Except for G1 and G2, none of the 7 common codon-altering variants showed evidence of selection or could restore lysis against trypanosomes causing human African trypanosomiasis. Thus, only APOL1 G1 and G2 confer renal risk, and other common and rare APOL1 missense variants, including the archaic G3 haplotype, do not contribute to sporadic FSGS and HIVAN in the US population. Hence, in most potential clinical or screening applications, our study suggests that sequencing APOL1 exons is unlikely to bring additional information compared to genotyping only APOL1 G1 and G2 risk alleles. PMID:25993319

  19. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences.

    PubMed

    Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

    2016-01-01

    Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. PMID:27289096

  20. Cloning and sequencing of a gene coding for an actin binding protein of Saccharomyces exiguus.

    PubMed

    Lange, U; Steiner, S; Grolig, F; Wagner, G; Philippsen, P

    1994-03-01

    The actin binding protein Abp1p of the yeast Saccharomyces cerevisiae is thought to be involved in the spatial organisation of cell surface growth. It contains a potential actin binding domain and an SH3 region, a common motif of many signal transduction proteins [1]. We have cloned and sequenced an ABP1 homologous gene of Saccharomyces exiguus, a yeast which is only distantly related to S. cerevisiae. The protein encoded by this gene is slightly larger than the respective S. cerevisiae protein (617 versus 592 amino acids). The two genes are 67.4% identical and the deduced amino acid sequences share an overall identity of 59.8%. The most conserved regions are the 148 N-terminal amino acids containing the potential actin binding site and the 58 C-terminal amino acids including the SH3 domain. In addition, both proteins contain a repeated motif of unknown function which is rich in glutamic acids with the sequence EEEEEEEAPAPSLPSR in the S. exiguus Abp1p. PMID:8110838

  1. MiR-10a* up-regulates coxsackievirus B3 biosynthesis by targeting the 3D-coding sequence

    PubMed Central

    Tong, Lei; Lin, Lexun; Wu, Shuo; Guo, Zhiwei; Wang, Tianying; Qin, Ying; Wang, Ruixue; Zhong, Xiaoyan; Wu, Xia; Wang, Yan; Luan, Tian; Wang, Qiang; Li, Yunxia; Chen, Xiaofeng; Zhang, Fengmin; Zhao, Wenran; Zhong, Zhaohua

    2013-01-01

    MicroRNAs (miRNAs) are small non-coding RNAs that can posttranscriptionally regulate gene expression by targeting messenger RNAs. During miRNA biogenesis, the star strand (miRNA*) is generally degraded to a low level in the cells. However, certain miRNA* species are abundantly expressed and can be recruited into the silencing complex to regulate gene expression. Most miRNAs function as suppressive regulators of gene expression. Group B coxsackieviruses (CVB) are the major pathogens of human viral myocarditis and dilated cardiomyopathy. CVB genome is a positive-sense, single-stranded RNA. Our previous study showed that miR-342-5p can suppress CVB biogenesis by targeting its 2C-coding sequence. In this study, we found that the miR-10a duplex could significantly up-regulate the biosynthesis of CVB type 3 (CVB3). Further study showed that it was the miR-10a star strand (miR-10a*) that augmented CVB3 biosynthesis. Site-directed mutagenesis showed that the miR-10a* target was located in the nt6818–nt6941 sequence of the viral 3D-coding region. MiR-10a* was detectable in the cardiac tissues of suckling Balb/c mice, suggesting that miR-10a* may impact CVB3 replication during its cardiac infection. Taken together, these data for the first time show that miRNA* can positively modulate gene expression. MiR-10a* might be involved in the CVB3 cardiac pathogenesis. PMID:23389951

  2. A Unified Mathematical Framework for Coding Time, Space, and Sequences in the Hippocampal Region

    PubMed Central

    MacDonald, Christopher J.; Tiganj, Zoran; Shankar, Karthik H.; Du, Qian; Hasselmo, Michael E.; Eichenbaum, Howard

    2014-01-01

    The medial temporal lobe (MTL) is believed to support episodic memory, vivid recollection of a specific event situated in a particular place at a particular time. There is ample neurophysiological evidence that the MTL computes location in allocentric space and more recent evidence that the MTL also codes for time. Space and time represent a similar computational challenge; both are variables that cannot be simply calculated from the immediately available sensory information. We introduce a simple mathematical framework that computes functions of both spatial location and time as special cases of a more general computation. In this framework, experience unfolding in time is encoded via a set of leaky integrators. These leaky integrators encode the Laplace transform of their input. The information contained in the transform can be recovered using an approximation to the inverse Laplace transform. In the temporal domain, the resulting representation reconstructs the temporal history. By integrating movements, the equations give rise to a representation of the path taken to arrive at the present location. By modulating the transform with information about allocentric velocity, the equations code for position of a landmark. Simulated cells show a close correspondence to neurons observed in various regions for all three cases. In the temporal domain, novel secondary analyses of hippocampal time cells verified several qualitative predictions of the model. An integrated representation of spatiotemporal context can be computed by taking conjunctions of these elemental inputs, leading to a correspondence with conjunctive neural representations observed in dorsal CA1. PMID:24672015
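
    The encoding step of this abstract (a bank of leaky integrators whose states hold the Laplace transform of the input history) can be sketched numerically. This is a minimal illustration, not the authors' model code: the decay rates, the pulse input and the time step are hypothetical values chosen for the example, and the approximate inverse transform mentioned in the abstract is not implemented here.

```python
import math


def leaky_integrators(signal, rates, dt):
    """Drive one leaky integrator per decay rate s with the same input.

    Each unit obeys dF/dt = -s * F + f(t), so its state at time t equals the
    integral of f(tau) * exp(-s * (t - tau)) over the past, i.e. the Laplace
    transform of the input history evaluated at s.
    """
    state = [0.0] * len(rates)
    for f_t in signal:
        # Exact update over one step of length dt with the input held constant.
        state = [
            F * math.exp(-s * dt) + f_t * (1.0 - math.exp(-s * dt)) / s
            for F, s in zip(state, rates)
        ]
    return state


dt = 0.001                          # time step in seconds (assumed)
pulse = [1.0] * 10 + [0.0] * 990    # 10 ms input pulse, then 990 ms of silence
rates = [0.5, 1.0, 2.0, 4.0]        # hypothetical decay rates s, in 1/s
final = leaky_integrators(pulse, rates, dt)
```

    After the pulse ends, each unit simply decays as exp(-s * elapsed time), so the pattern of activity across units with different s changes systematically with how long ago the input occurred; that is the sense in which the set of integrators carries information about elapsed time.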

  3. Acoustic radiation force impulse (ARFI) imaging of zebrafish embryo by high-frequency coded excitation sequence.

    PubMed

    Park, Jinhyoung; Lee, Jungwoo; Lau, Sien Ting; Lee, Changyang; Huang, Ying; Lien, Ching-Ling; Kirk Shung, K

    2012-04-01

    Acoustic radiation force impulse (ARFI) imaging has been developed as a non-invasive method for quantitative illustration of tissue stiffness or displacement. Conventional ARFI imaging (2-10 MHz) has been implemented in commercial scanners for illustrating elastic properties of several organs. The image resolution, however, is too coarse to study mechanical properties of micro-sized objects such as cells. This article thus presents a high-frequency coded excitation ARFI technique, with the ultimate goal of displaying elastic characteristics of cellular structures. Tissue mimicking phantoms and zebrafish embryos are imaged with a 100-MHz lithium niobate (LiNbO₃) transducer, by cross-correlating tracked RF echoes with the reference. The phantom results show that the contrast of ARFI image (14 dB) with coded excitation is better than that of the conventional ARFI image (9 dB). The depths of penetration are 2.6 and 2.2 mm, respectively. The stiffness data of the zebrafish demonstrate that the envelope is harder than the embryo region. The temporal displacement change at the embryo and the chorion is as large as 36 and 3.6 μm. Consequently, this high-frequency ARFI approach may serve as a remote palpation imaging tool that reveals viscoelastic properties of small biological samples. PMID:22101757
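
    The displacement-tracking step mentioned in this abstract (cross-correlating tracked RF echoes with a reference) can be sketched as a lag search. This is a generic pulse-echo illustration under stated assumptions, not the authors' processing chain: the toy echo, the 1 GHz sampling rate and the 1540 m/s sound speed are hypothetical values, and the coded-excitation compression step is omitted.

```python
def best_lag(reference, tracked, max_lag):
    """Return the integer sample shift that maximizes the cross-correlation
    between a reference echo and a tracked echo."""
    def score(lag):
        return sum(
            reference[i] * tracked[i + lag]
            for i in range(len(reference))
            if 0 <= i + lag < len(tracked)
        )
    return max(range(-max_lag, max_lag + 1), key=score)


def lag_to_displacement(lag, sampling_rate_hz, sound_speed_m_s):
    """Convert an echo time shift into axial displacement.

    The factor 2 accounts for the round trip of the pulse (out and back).
    """
    return lag * sound_speed_m_s / (2.0 * sampling_rate_hz)


# Toy echoes: the same short wavelet, arriving 3 samples later in the tracked line.
wavelet = [1.0, -2.0, 3.0, -2.0, 1.0]
reference = [0.0] * 8 + wavelet + [0.0] * 7
tracked = [0.0] * 11 + wavelet + [0.0] * 4

lag = best_lag(reference, tracked, max_lag=5)
displacement_m = lag_to_displacement(lag, sampling_rate_hz=1e9, sound_speed_m_s=1540.0)
```

    Integer-sample lags limit the resolution of this sketch; practical trackers usually interpolate around the correlation peak to estimate sub-sample shifts.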

  4. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese

    PubMed Central

    Tang, Clara S.; Zhang, He; Cheung, Chloe Y. Y.; Xu, Ming; Ho, Jenny C. Y.; Zhou, Wei; Cherny, Stacey S.; Zhang, Yan; Holmen, Oddgeir; Au, Ka-Wing; Yu, Haiyi; Xu, Lin; Jia, Jia; Porsch, Robert M.; Sun, Lijie; Xu, Weixian; Zheng, Huiping; Wong, Lai-Yung; Mu, Yiming; Dou, Jingtao; Fong, Carol H. Y.; Wang, Shuyu; Hong, Xueyu; Dong, Liguang; Liao, Yanhua; Wang, Jiansong; Lam, Levina S. M.; Su, Xi; Yan, Hua; Yang, Min-Lee; Chen, Jin; Siu, Chung-Wah; Xie, Gaoqiang; Woo, Yu-Cho; Wu, Yangfeng; Tan, Kathryn C. B.; Hveem, Kristian; Cheung, Bernard M. Y.; Zöllner, Sebastian; Xu, Aimin; Chen, Y. Eugene; Jiang, Chao Qiang; Zhang, Youyi; Lam, Tai-Hing; Ganesh, Santhi K.; Huo, Yong; Sham, Pak C.; Lam, Karen S. L.; Willer, Cristen J.; Tse, Hung-Fat; Gao, Wei

    2015-01-01

    Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P<2.69 × 10⁻⁷), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci (PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser) also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P<0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD. PMID:26690388

  5. Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles.

    PubMed

    Rodrigue, Nicolas; Philippe, Hervé; Lartillot, Nicolas

    2010-03-01

    Modeling the interplay between mutation and selection at the molecular level is key to evolutionary studies. To this end, codon-based evolutionary models have been proposed as pertinent means of studying long-range evolutionary patterns and are widely used. However, these approaches have not yet consolidated results from amino acid level phylogenetic studies showing that selection acting on proteins displays strong site-specific effects, which translate into heterogeneous amino acid propensities across the columns of alignments; related codon-level studies have instead focused on either modeling a single selective context for all codon columns, or a separate selective context for each codon column, with the former strategy deemed too simplistic and the latter deemed overparameterized. Here, we integrate recent developments in nonparametric statistical approaches to propose a probabilistic model that accounts for the heterogeneity of amino acid fitness profiles across the coding positions of a gene. We apply the model to a dozen real protein-coding gene alignments and find it to produce biologically plausible inferences, for instance, as pertaining to site-specific amino acid constraints, as well as distributions of scaled selection coefficients. In their account of mutational features as well as the heterogeneous regimes of selection at the amino acid level, the modeling approaches studied here can form a backdrop for several extensions, accounting for other selective features, for variable population size, or for subtleties of mutational features, all with parameterizations couched within population-genetic theory. PMID:20176949

  6. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese.

    PubMed

    Tang, Clara S; Zhang, He; Cheung, Chloe Y Y; Xu, Ming; Ho, Jenny C Y; Zhou, Wei; Cherny, Stacey S; Zhang, Yan; Holmen, Oddgeir; Au, Ka-Wing; Yu, Haiyi; Xu, Lin; Jia, Jia; Porsch, Robert M; Sun, Lijie; Xu, Weixian; Zheng, Huiping; Wong, Lai-Yung; Mu, Yiming; Dou, Jingtao; Fong, Carol H Y; Wang, Shuyu; Hong, Xueyu; Dong, Liguang; Liao, Yanhua; Wang, Jiansong; Lam, Levina S M; Su, Xi; Yan, Hua; Yang, Min-Lee; Chen, Jin; Siu, Chung-Wah; Xie, Gaoqiang; Woo, Yu-Cho; Wu, Yangfeng; Tan, Kathryn C B; Hveem, Kristian; Cheung, Bernard M Y; Zöllner, Sebastian; Xu, Aimin; Chen, Y Eugene; Jiang, Chao Qiang; Zhang, Youyi; Lam, Tai-Hing; Ganesh, Santhi K; Huo, Yong; Sham, Pak C; Lam, Karen S L; Willer, Cristen J; Tse, Hung-Fat; Gao, Wei

    2015-01-01

    Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P<2.69 × 10⁻⁷), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci (PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser) also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P<0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD. PMID:26690388

  7. Non-Coding RNA: Sequence-Specific Guide for Chromatin Modification and DNA Damage Signaling

    PubMed Central

    Francia, Sofia

    2015-01-01

    Chromatin conformation shapes the environment in which our genome is transcribed into RNA. Transcription is a source of DNA damage, thus it often occurs concomitantly to DNA damage signaling. Growing amounts of evidence suggest that different types of RNAs can, independently from their protein-coding properties, directly affect chromatin conformation, transcription and splicing, as well as promote the activation of the DNA damage response (DDR) and DNA repair. Therefore, transcription paradoxically functions to both threaten and safeguard genome integrity. On the other hand, DNA damage signaling is known to modulate chromatin to suppress transcription of the surrounding genetic unit. It is thus intriguing to understand how transcription can modulate DDR signaling while, in turn, DDR signaling represses transcription of chromatin around the DNA lesion. An unexpected player in this field is the RNA interference (RNAi) machinery, which play roles in transcription, splicing and chromatin modulation in several organisms. Non-coding RNAs (ncRNAs) and several protein factors involved in the RNAi pathway are well known master regulators of chromatin while only recent reports show their involvement in DDR. Here, we discuss the experimental evidence supporting the idea that ncRNAs act at the genomic loci from which they are transcribed to modulate chromatin, DDR signaling and DNA repair. PMID:26617633

  8. Oxytocin receptor gene sequences in owl monkeys and other primates show remarkable interspecific regulatory and protein coding variation.

    PubMed

    Babb, Paul L; Fernandez-Duque, Eduardo; Schurr, Theodore G

    2015-10-01

    The oxytocin (OT) hormone pathway is involved in numerous physiological processes, and one of its receptor genes (OXTR) has been implicated in pair bonding behavior in mammalian lineages. This observation is important for understanding social monogamy in primates, which occurs in only a small subset of taxa, including Azara's owl monkey (Aotus azarae). To examine the potential relationship between social monogamy and OXTR variation, we sequenced its 5' regulatory (4936bp) and coding (1167bp) regions in 25 owl monkeys from the Argentinean Gran Chaco, and examined OXTR sequences from 1092 humans from the 1000 Genomes Project. We also assessed interspecific variation of OXTR in 25 primate and rodent species that represent a set of phylogenetically and behaviorally disparate taxa. Our analysis revealed substantial variation in the putative 5' regulatory region of OXTR, with marked structural differences across primate taxa, particularly for humans and chimpanzees, which exhibited unique patterns of large motifs of dinucleotide A+T repeats upstream of the OXTR 5' UTR. In addition, we observed a large number of amino acid substitutions in the OXTR CDS region among New World primate taxa that distinguish them from Old World primates. Furthermore, primate taxa traditionally defined as socially monogamous (e.g., gibbons, owl monkeys, titi monkeys, and saki monkeys) all exhibited different amino acid motifs for their respective OXTR protein coding sequences. These findings support the notion that monogamy has evolved independently in Old World and New World primates, and that it has done so through different molecular mechanisms, not exclusively through the oxytocin pathway. PMID:26025428

  9. Identification of internal transcribed spacer sequence motifs in truffles: a first step toward their DNA bar coding.

    PubMed

    El Karkouri, Khalid; Murat, Claude; Zampieri, Elisa; Bonfante, Paola

    2007-08-01

    This work presents DNA sequence motifs from the internal transcribed spacer (ITS) of the nuclear rRNA repeat unit which are useful for the identification of five European and Asiatic truffles (Tuber magnatum, T. melanosporum, T. indicum, T. aestivum, and T. mesentericum). Truffles are edible mycorrhizal ascomycetes that show similar morphological characteristics but that have distinct organoleptic and economic values. A total of 36 out of 46 ITS1 or ITS2 sequence motifs have allowed an accurate in silico distinction of the five truffles to be made (i.e., by pattern matching and/or BLAST analysis on downloaded GenBank sequences and directly against GenBank databases). The motifs considered the intraspecific genetic variability of each species, including rare haplotypes, and assigned their respective species from either the ascocarps or ectomycorrhizas. The data indicate that short ITS1 or ITS2 motifs (≤50 bp in size) can be considered promising tools for truffle species identification. A dot blot hybridization analysis of T. magnatum and T. melanosporum compared with other close relatives or distant lineages allowed at least one highly specific motif to be identified for each species. These results were confirmed in a blind test which included new field isolates. The current work has provided a reliable new tool for a truffle oligonucleotide bar code and identification in ecological and evolutionary studies. PMID:17601808

  10. Genomic DNA sequence of a rice gene coding for a pullulanase-type of starch debranching enzyme.

    PubMed

    Francisco, P B; Zhang, Y; Park, S Y; Ogata, N; Yamanouchi, H; Nakamura, Y

    1998-09-01

    A genomic DNA containing a rice (Oryza sativa L., cv. Norin-8) gene coding for a pullulanase-type starch debranching enzyme (EC 3.2.1.41) was sequenced (EMBL/GenBank/DDBJ accession number AB012915). Along the 15,248 bp DNA, the pullulanase gene is split into 26 exons. The four pullulanase consensus regions are positioned in the middle portion of the sequence and are separated by long introns and 1-3 exons. Comparison of the rice cv. Norin-8 pullulanase genomic structure with that of barley pullulanase (limit dextrinase) (F. Lok et al., EMBL/GenBank/DDBJ accession number AF022725) indicates that most of the pullulanase exons are highly conserved. Alignment of the nucleotide bases of rice exon 8 with those of barley exon 8-intron 8-exon 9 fragment suggests that the 85 bp internal sequence of rice exon 8 was originally an intron, a possibility further indicated by the absence in barley and spinach (A. Renz et al., EMBL/GenBank/DDBJ accession number X83969) pullulanases of amino acid residues encoded by the 85 bp fragment. PMID:9748665

  11. Phylogenetic relationships among insect orders based on three nuclear protein-coding gene sequences.

    PubMed

    Ishiwata, Keisuke; Sasaki, Go; Ogawa, Jiro; Miyata, Takashi; Su, Zhi-Hui

    2011-02-01

    Many attempts to resolve the phylogenetic relationships of higher groups of insects have been made based on both morphological and molecular evidence; nonetheless, most of the interordinal relationships of insects remain unclear or are controversial. As a new approach, in this study we sequenced three nuclear genes encoding the catalytic subunit of DNA polymerase delta and the two largest subunits of RNA polymerase II from all insect orders. The predicted amino acid sequences (in total, approx. 3500 amino acid sites) of these proteins were subjected to phylogenetic analyses based on the maximum likelihood and Bayesian analysis methods with various models. The resulting trees strongly support the monophyly of Palaeoptera, Neoptera, Polyneoptera, and Holometabola, while within Polyneoptera, the groupings of Isoptera/"Blattaria"/Mantodea (Superorder Dictyoptera), Dictyoptera/Zoraptera, Dermaptera/Plecoptera, Mantophasmatodea/Grylloblattodea, and Embioptera/Phasmatodea are supported. Although Paraneoptera is not supported as a monophyletic group, the grouping of Phthiraptera/Psocoptera is robustly supported. The interordinal relationships within Holometabola are well resolved and strongly support the order Hymenoptera as the sister lineage to all other holometabolous insects. The other orders of Holometabola are separated into two large groups, and the interordinal relationships of each group are (((Siphonaptera, Mecoptera), Diptera), (Trichoptera, Lepidoptera)) and ((Coleoptera, Strepsiptera), (Neuroptera, Raphidioptera, Megaloptera)). The sister relationship between Strepsiptera and Diptera is significantly rejected by all the statistical tests (AU, KH and wSH), while the affinity between Hymenoptera and Mecopterida is significantly rejected only by the AU and KH tests. Our results show that the use of amino acid sequences of these three nuclear genes is an effective approach for resolving the relationships of higher groups of insects. PMID:21075208

  12. Incomplete

    ERIC Educational Resources Information Center

    Stauffer, Sandra L.

    2011-01-01

    Elizabeth Parker's reflection on her experience as a musician educator working with children in an urban non-profit context is an uncomfortable read for me. In a courageous act, Parker makes public her private misgivings about her past experience and allows scrutiny of them in the form of two public commentaries as well as the private musings of…

  13. Matriculation Research Report: Incomplete Grades; Data & Analysis.

    ERIC Educational Resources Information Center

    Gerda, Joe

    The policy on incomplete grades at California's College of the Canyons states that incompletes may only be given under circumstances beyond students' control and that students must make arrangements with faculty prior to the end of the semester to clear the incomplete. Failure to complete an incomplete may result in an "F" grade. While incompletes…

  14. Identification of small non-coding RNAs in the planarian Dugesia japonica via deep sequencing.

    PubMed

    Qin, Yun-Fei; Zhao, Jin-Mei; Bao, Zhen-Xia; Zhu, Zhao-Yu; Mai, Jia; Huang, Yi-Bo; Li, Jian-Biao; Chen, Ge; Lu, Ping; Chen, San-Jun; Su, Lin-Lin; Fang, Hui-Min; Lu, Ji-Ke; Zhang, Yi-Zhe; Zhang, Shou-Tao

    2012-05-01

    The freshwater planarian flatworm possesses an extraordinary ability to regenerate lost body parts after amputation, making it an ideal model organism for regeneration and stem cell biology. Recently, small RNAs (sRNAs) have attracted increasing attention and have been studied in many contexts, including regeneration and stem cell biology. In the current study, the large-scale cloning and sequencing of sRNAs from intact and regenerating planarian Dugesia japonica are reported. Sequence analysis shows that sRNAs between 18 nt and 40 nt are mainly microRNAs (miRNAs) and piRNAs. In addition, 209 conserved miRNAs and 12 novel miRNAs are identified. Notably, an improved target-screening method, based on the negative correlation between miRNAs and mRNAs, is adopted to improve target prediction accuracy. As for miRNAs, a diverse population of piRNAs and their changes between the two samples are also listed. The present study is the first to report on the important role of sRNAs during regeneration in the planarian Dugesia japonica. PMID:22425900

  15. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences

    PubMed Central

    2013-01-01

    Background Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Results Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. Conclusions mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly. PMID:24451012

  16. An Interpretation of the Ancestral Codon from Miller’s Amino Acids and Nucleotide Correlations in Modern Coding Sequences

    PubMed Central

    Carels, Nicolas; de Leon, Miguel Ponce

    2015-01-01

    Purine bias, which is usually referred to as an “ancestral codon”, is known to result in short-range correlations between nucleotides in coding sequences, and it is common in all species. We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller’s spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins. PMID:25922573

  17. Polymorphism and haplotype structure in River Buffalo (Bubalus bubalis) toll-like receptor 5 (TLR5) coding sequence.

    PubMed

    Jones, B C; Womack, J E

    2012-04-01

    Most of the 160 million river buffalo in the world are in Asia where they are used extensively, both as a food source and for draught power. Only recently have investigations begun exploring the buffalo genome for variation that might influence health and productivity of these economically important animals. This paper describes the sequence variability of the toll-like receptor 5 (TLR5) gene, which recognizes bacterial flagellin and is a key player in the immune system. TLR5 is comprised of a single exon that is 2577 bp and codes 858 amino acids. We examined single-nucleotide polymorphisms (SNPs) located within the coding region. Overall, 17 SNPs were discovered, seven of which are non-synonymous. Our study population yielded four different haplotypes. We examined predicted protein domain structure and found that river buffalo, swamp buffalo, and African Forest buffalo shared the same protein domain structure and are more similar to each other than they are to cattle and American bison, which are similar to each other. PolyPhen 2 analysis revealed one amino acid substitution in the river buffalo population with potential functional significance. PMID:22537062
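    The two sizes quoted in this abstract can be reconciled with a short worked check. This is a minimal sketch that assumes the 2577 bp figure counts the stop codon, which the abstract does not state explicitly; the function name is hypothetical.

```python
def coding_length_bp(n_amino_acids, include_stop=True):
    """Length in base pairs of an ORF encoding n_amino_acids residues.

    Each residue is encoded by one 3-nt codon; the stop codon adds 3 nt
    when it is counted as part of the coding sequence.
    """
    return 3 * n_amino_acids + (3 if include_stop else 0)

tlr5_with_stop = coding_length_bp(858)                         # 858 codons + stop
tlr5_without_stop = coding_length_bp(858, include_stop=False)  # 858 codons only
```

    Under that assumption, 858 codons account for 2574 bp and the stop codon supplies the remaining 3 bp of the reported 2577 bp.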

  18. A base-sequence-modulated Golay code improves the excitation and measurement of ultrasonic guided waves in long bones.

    PubMed

    Song, Xiaojun; Ta, Dean; Wang, Weiqi

    2012-11-01

    Researchers are interested in using ultrasonic guided waves (GWs) to assess long bones. However, GWs suffer high attenuation when they propagate in long bones, resulting in a low SNR. To overcome this limitation, this paper introduces a base-sequence-modulated Golay code (BSGC) to produce larger amplitude and improve the SNR in the ultrasound evaluation of long bones. A 16-bit Golay code was used for excitation in computer simulation. The decoded GWs and the traditional GWs, which were generated by a single pulse, agreed well after decoding the received signals, and the SNR was improved by 26.12 dB. In the experiments using bovine bones, the BSGC excitation produced the amplitudes which were at least 237 times greater than those produced by a single pulse excitation. The BSGC excitation also allowed the GWs to be received over a longer distance between two transducers. The results suggest the BSGC excitation has the potential to measure GWs and assess long bones. PMID:23192823
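    The benefit of Golay-coded excitation rests on a standard property of complementary Golay pairs: the sidelobes of their two autocorrelations cancel when summed, leaving a single peak. The sketch below illustrates only that property for a 16-bit pair, assuming the usual recursive construction; it does not reproduce the base-sequence modulation described in the abstract, and the function name is hypothetical.

```python
import numpy as np

def golay_pair(length):
    """Build a complementary Golay pair whose length is a power of two."""
    a = np.array([1.0])
    b = np.array([1.0])
    while len(a) < length:
        # Standard doubling step: (a, b) -> (a|b, a|-b)
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

a, b = golay_pair(16)

# Sum of the two autocorrelations: sidelobes cancel, leaving a peak of 2 * length.
summed = np.correlate(a, a, mode="full") + np.correlate(b, b, mode="full")
peak_index = len(a) - 1  # zero-lag position in the "full" correlation output
```

    Transmitting both codes and summing the decoded echoes therefore concentrates the energy of a long excitation into one sharp peak, which is the usual rationale for the SNR gain over a single pulse.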

  19. Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression.

    PubMed Central

    Duret, L; Dorkeld, F; Gautier, C

    1993-01-01

    Comparison of nucleotide sequences from different classes of vertebrates that diverged more than 300 million years ago revealed the existence of highly conserved regions (HCRs) with more than 70% similarity over 100 to 1450 nt in non-coding parts of genes. Such conservation is unexpected because it is much longer and stronger than what is necessary for specifying the binding of a regulatory protein. HCRs are relatively frequent, particularly in genes that are essential to cell life. In multigene families, conserved regions are specific to each isotype and are probably involved in the control of their specific pattern of expression. Studying the distribution of HCRs within genes showed that functional constraints are generally much stronger in 3'-non-coding regions than in promoters or introns. The 3'-HCRs are particularly A + T-rich and are always located in the transcribed untranslated regions of genes, which suggests that they are involved in post-transcriptional processes. However, current knowledge of mechanisms that regulate mRNA export, localisation, translation, or degradation is not sufficient to explain the strong functional constraints that we have characterised. PMID:8506129

  20. The non-coding RNA composition of the mitotic chromosome by 5'-tag sequencing.

    PubMed

    Meng, Yicong; Yi, Xianfu; Li, Xinhui; Hu, Chuansheng; Wang, Ju; Bai, Ling; Czajkowsky, Daniel M; Shao, Zhifeng

    2016-06-01

    Mitotic chromosomes are one of the most commonly recognized sub-cellular structures in eukaryotic cells. Yet basic information necessary to understand their structure and assembly, such as their composition, is still lacking. Recent proteomic studies have begun to fill this void, identifying hundreds of RNA-binding proteins bound to mitotic chromosomes. However, by contrast, there are only two RNA species (U3 snRNA and rRNA) that are known to be associated with the mitotic chromosome, suggesting that there are many mitotic chromosome-associated RNAs (mCARs) not yet identified. Here, using a targeted protocol based on 5'-tag sequencing to profile the mammalian mCAR population, we report the identification of 1279 mCARs, the majority of which are ncRNAs, including lncRNAs that exhibit greater conservation across 60 vertebrate species than the entire population of lncRNAs. There is also a significant enrichment of snoRNAs and specific SINE RNAs. Finally, ∼40% of the mCARs are presently unannotated, many of which are as abundant as the annotated mCARs, suggesting that there are also many novel ncRNAs in the mCARs. Overall, the mCARs identified here, together with the previous proteomic and genomic data, constitute the first comprehensive catalogue of the molecular composition of the eukaryotic mitotic chromosomes. PMID:27016738

  1. Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins

    PubMed Central

    Miura, Sayaka; Tate, Stephanie; Kumar, Sudhir

    2015-01-01

    Gene duplication enables functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated, with some studies reporting proteome-wide compensation, whereas others suggest functional compensation among only recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernible paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced the opposite pattern; paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (i.e., functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarities among paralogs. Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins, but rather an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome. PMID:26604664

  2. The non-coding RNA composition of the mitotic chromosome by 5′-tag sequencing

    PubMed Central

    Meng, Yicong; Yi, Xianfu; Li, Xinhui; Hu, Chuansheng; Wang, Ju; Bai, Ling; Czajkowsky, Daniel M.; Shao, Zhifeng

    2016-01-01

    Mitotic chromosomes are one of the most commonly recognized sub-cellular structures in eukaryotic cells. Yet basic information necessary to understand their structure and assembly, such as their composition, is still lacking. Recent proteomic studies have begun to fill this void, identifying hundreds of RNA-binding proteins bound to mitotic chromosomes. However, by contrast, there are only two RNA species (U3 snRNA and rRNA) that are known to be associated with the mitotic chromosome, suggesting that there are many mitotic chromosome-associated RNAs (mCARs) not yet identified. Here, using a targeted protocol based on 5′-tag sequencing to profile the mammalian mCAR population, we report the identification of 1279 mCARs, the majority of which are ncRNAs, including lncRNAs that exhibit greater conservation across 60 vertebrate species than the entire population of lncRNAs. There is also a significant enrichment of snoRNAs and specific SINE RNAs. Finally, ∼40% of the mCARs are presently unannotated, many of which are as abundant as the annotated mCARs, suggesting that there are also many novel ncRNAs in the mCARs. Overall, the mCARs identified here, together with the previous proteomic and genomic data, constitute the first comprehensive catalogue of the molecular composition of the eukaryotic mitotic chromosomes. PMID:27016738

  3. Bipartite geminivirus host adaptation determined cooperatively by coding and noncoding sequences of the genome.

    PubMed

    Petty, I T; Carter, S C; Morra, M R; Jeffrey, J L; Olivey, H E

    2000-11-25

    Bipartite geminiviruses are small, plant-infecting viruses with genomes composed of circular, single-stranded DNA molecules, designated A and B. Although they are closely related genetically, individual bipartite geminiviruses frequently exhibit host-specific adaptation. Two such viruses are bean golden mosaic virus (BGMV) and tomato golden mosaic virus (TGMV), which are well adapted to common bean (Phaseolus vulgaris) and Nicotiana benthamiana, respectively. In previous studies, partial host adaptation was conferred on BGMV-based or TGMV-based hybrid viruses by separately exchanging open reading frames (ORFs) on DNA A or DNA B. Here we analyzed hybrid viruses in which all of the ORFs on both DNAs were exchanged except for AL1, which encodes a protein with strictly virus-specific activity. These hybrid viruses exhibited partial transfer of host-adapted phenotypes. In contrast, exchange of noncoding regions (NCRs) upstream from the AR1 and BR1 ORFs did not confer any host-specific gain of function on hybrid viruses. However, when the exchangeable ORFs and NCRs from TGMV were combined in a single BGMV-based hybrid virus, complete transfer of TGMV-like adaptation to N. benthamiana was achieved. Interestingly, the reciprocal TGMV-based hybrid virus displayed only partial gain of function in bean. This may be, in part, the result of defective virus-specific interactions between TGMV and BGMV sequences present in the hybrid, although a potential role in adaptation to bean for additional regions of the BGMV genome cannot be ruled out. PMID:11080490

  4. Nonsense mutation in the glycoprotein Ibα coding sequence associated with Bernard-Soulier syndrome

    SciTech Connect

    Ware, J.; Russell, S.R.; Vicente, V.; Scharf, R.E.; Tomer, A.; McMillian, R.; Ruggeri, Z.M.

    1990-03-01

    Three distinct gene products, the α and β chains of glycoprotein (GP) Ib and GP IX, constitute the platelet membrane GP Ib-IX complex, a receptor for von Willebrand factor and thrombin involved in platelet adhesion and aggregation. Defective function of the GP Ib-IX complex is the hallmark of a rare congenital bleeding disorder of still undefined pathogenesis, the Bernard-Soulier syndrome. The authors have analyzed the molecular basis of the disease in one patient in whom immunoblotting of solubilized platelets demonstrated absence of normal GP Ibα but presence of a smaller immunoreactive species. The truncated polypeptide was also present, along with normal protein, in platelets from the patient's mother and two of his four children. Genetic characterization identified a nucleotide transition changing the Trp-343 codon (TGG) to a nonsense codon (TGA). Such a mutation explains the origin of the smaller GP Ibα, which by lacking half of the sequence on the carboxyl-terminal side, including the transmembrane domain, cannot be properly inserted in the platelet membrane. Both normal and mutant codons were found in the patient, suggesting that he is a compound heterozygote with a still unidentified defect in the other GP Ibα allele. Nonsense mutation and truncated GP Ibα polypeptide were found to cosegregate in four individuals through three generations and were associated with either Bernard-Soulier syndrome or carrier state phenotype. The molecular abnormality demonstrated in this family provides evidence that defective synthesis of GP Ibα alters the membrane expression of the GP Ib-IX complex and may be responsible for Bernard-Soulier syndrome.

  5. Rate-dependent incompleteness of earthquake catalogs

    NASA Astrophysics Data System (ADS)

    Hainzl, Sebastian

    2016-04-01

    Important information about the earthquake generation process can be gained from instrumental earthquake catalogs, but this requires complete recordings to avoid biased results. The local completeness magnitude Mc is known to depend on general conditions such as the seismographic network and the environmental noise, which generally limit the possibility to detect small events. The detectability can be additionally reduced by an earthquake-induced increase of the noise-level leading to short-term variations of Mc, which cannot be resolved by traditional methods relying on the analysis of the frequency-magnitude distribution. Based on simple assumptions, I propose a new method to estimate such temporal excursions of Mc solely based on the estimation of the earthquake rate resulting in a high temporal resolution of Mc. The approach is shown to be in agreement with the apparent decrease of the estimated Gutenberg-Richter b-value in high-activity phases of recorded data sets and the observed incompleteness periods after mainshocks. Furthermore, an algorithm to estimate temporal changes of Mc is introduced and applied to empirical aftershock and swarm sequences from California and central Europe, indicating that observed b-value fluctuations are often related to rate-dependent incompleteness of the earthquake catalogs.
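    The link between catalog incompleteness and an apparent drop in the b-value can be illustrated with a small simulation. The sketch below does not reproduce the rate-based method of this record; it uses the classical maximum-likelihood (Aki) b-value estimator for continuous magnitudes, and the catalog size, true b-value, completeness magnitude, and the 70% loss of small events are all assumed values chosen for illustration.

```python
import math
import random


def aki_b_value(magnitudes, mc):
    """Maximum-likelihood b-value for continuous magnitudes at or above mc (Aki estimator)."""
    above = [m for m in magnitudes if m >= mc]
    mean_excess = sum(above) / len(above) - mc
    return math.log10(math.e) / mean_excess


rng = random.Random(0)   # fixed seed so the sketch is reproducible
TRUE_B = 1.0             # assumed true b-value
MC = 2.0                 # assumed nominal completeness magnitude
beta = TRUE_B * math.log(10)

# Gutenberg-Richter magnitudes: exponentially distributed above MC.
complete = [MC + rng.expovariate(beta) for _ in range(20000)]

# Assumed incompleteness: 70% of events within 0.5 units above MC go undetected.
incomplete = [m for m in complete if m >= MC + 0.5 or rng.random() < 0.3]

b_complete = aki_b_value(complete, MC)
b_incomplete = aki_b_value(incomplete, MC)
print(round(b_complete, 2), round(b_incomplete, 2))
```

    Because the missing events are the smallest ones, the mean magnitude above MC rises and the estimated b-value falls, even though the underlying distribution is unchanged.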

  6. Detecting selection in the blue crab, Callinectes sapidus, using DNA sequence data from multiple nuclear protein-coding genes.

    PubMed

    Yednock, Bree K; Neigel, Joseph E

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations, indicative of positive selection, were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available. PMID:24896825

  7. Incomplete Kochen-Specker coloring

    SciTech Connect

    Granstroem, Helena

    2007-09-15

    A particular incomplete Kochen-Specker coloring, suggested by Appleby [Stud. Hist. Philos. Mod. Phys. 36, 1 (2005)] in dimension three, is generalized to arbitrary dimension. We investigate its effectivity as a function of dimension, using two different measures. A limit is derived for the fraction of the sphere that can be colored using the generalized Appleby construction as the number of dimensions approaches infinity. The second, and physically more relevant measure of effectivity, is to look at the fraction of properly colored ON bases. Using this measure, we derive a "lower bound for the upper bound" in three and four real dimensions.

  8. Observation of incomplete fusion reactions at l < l_crit

    SciTech Connect

    Yadav, Abhishek; Sharma, Vijay R.; Singh, Devendra P.; Unnati; Singh, B. P.; Prasad, R.; Singh, Pushpendra P.; Bala, Indu; Kumar, R.; Muralithar, S.; Singh, R. P.; Sharma, M. K.

    2014-08-14

    In order to understand the presence of incomplete fusion at low energies, i.e. 4-7 MeV/nucleon, and to study its dependence on various entrance-channel parameters, two types of measurements have been performed for the ¹²C+¹⁵⁹Tb system: (i) excitation functions and (ii) forward recoil ranges. The experimentally measured excitation functions have been analyzed within the framework of compound nucleus decay using the statistical model code PACE4. Analysis of the data suggests the production of xn/pxn-channels via complete fusion, as these are found to be well reproduced by PACE4 predictions, while a significant enhancement in the excitation functions of α-emitting channels has been observed over the theoretical ones, which has been attributed to incomplete fusion processes. Further, the incomplete fusion events observed in the forward recoil range measurements have been explained on the basis of the breakup fusion model, where these events may be attributed to the fusion of ⁸Be and/or ⁴He from the ¹²C projectile with the target nucleus. In the present work, the SUMRULE model calculations are found to strongly underestimate the observed incomplete fusion cross-sections, which indicates that l-values lower than l_crit (the limit of complete fusion) contribute significantly to the incomplete fusion reactions.

  9. Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate.

    PubMed

    Mangold, Elisabeth; Böhmer, Anne C; Ishorst, Nina; Hoebel, Ann-Kathrin; Gültepe, Pinar; Schuenke, Hannah; Klamt, Johanna; Hofmann, Andrea; Gölz, Lina; Raff, Ruth; Tessmann, Peter; Nowak, Stefanie; Reutter, Heiko; Hemprich, Alexander; Kreusch, Thomas; Kramer, Franz-Josef; Braumann, Bert; Reich, Rudolf; Schmidt, Gül; Jäger, Andreas; Reiter, Rudolf; Brosch, Sibylle; Stavusis, Janis; Ishida, Miho; Seselgyte, Rimante; Moore, Gudrun E; Nöthen, Markus M; Borck, Guntram; Aldhorae, Khalid A; Lace, Baiba; Stanier, Philip; Knapp, Michael; Ludwig, Kerstin U

    2016-04-01

    Nonsyndromic cleft lip with/without cleft palate (nsCL/P) and nonsyndromic cleft palate only (nsCPO) are the most frequent subphenotypes of orofacial clefts. A common syndromic form of orofacial clefting is Van der Woude syndrome (VWS) where individuals have CL/P or CPO, often but not always associated with lower lip pits. Recently, ∼5% of VWS-affected individuals were identified with mutations in the grainy head-like 3 gene (GRHL3). To investigate GRHL3 in nonsyndromic clefting, we sequenced its coding region in 576 Europeans with nsCL/P and 96 with nsCPO. Most strikingly, nsCPO-affected individuals had a higher minor allele frequency for rs41268753 (0.099) than control subjects (0.049; p = 1.24 × 10⁻²). This association was replicated in nsCPO/control cohorts from Latvia, Yemen, and the UK (p_combined = 2.63 × 10⁻⁵; OR_allelic = 2.46 [95% CI 1.6-3.7]) and reached genome-wide significance in combination with imputed data from a GWAS in nsCPO triads (p = 2.73 × 10⁻⁹). Notably, rs41268753 is not associated with nsCL/P (p = 0.45). rs41268753 encodes the highly conserved p.Thr454Met (c.1361C>T) (GERP = 5.3), which prediction programs denote as deleterious, has a CADD score of 29.6, and increases protein binding capacity in silico. Sequencing also revealed four novel truncating GRHL3 mutations including two that were de novo in four families, where all nine individuals harboring mutations had nsCPO. This is important for genetic counseling: given that VWS is rare compared to nsCPO, our data suggest that dominant GRHL3 mutations are more likely to cause nonsyndromic than syndromic CPO. Thus, with rare dominant mutations and a common risk variant in the coding region, we have identified an important contribution for GRHL3 in nsCPO. PMID:27018475
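    As a rough worked example, a crude allelic odds ratio can be computed from the two minor allele frequencies quoted for the sequenced cohort (0.099 in nsCPO cases, 0.049 in controls). This is a back-of-the-envelope sketch with a hypothetical function name; it is not the statistic reported in the abstract, whose odds ratio of 2.46 comes from the combined replication cohorts.

```python
def allelic_odds_ratio(maf_cases, maf_controls):
    """Crude allelic odds ratio from minor allele frequencies in cases and controls."""
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# Frequencies for rs41268753 as quoted in the abstract (discovery cohort).
crude_or = allelic_odds_ratio(0.099, 0.049)
print(round(crude_or, 2))
```

    The result is of the same order as the replicated estimate and falls inside its reported 95% confidence interval, although the two figures describe different samples.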

  10. Evolutionary and sequence-based relationships in bacterial AdoMet-dependent non-coding RNA methyltransferases

    PubMed Central

    2014-01-01

    Background RNA post-transcriptional modification is an exciting field of research that has evidenced this editing process as a sophisticated epigenetic mechanism to fine tune the ribosome function and to control gene expression. Although tRNA modifications seem to be more relevant for the ribosome function and cell physiology as a whole, some rRNA modifications have also been seen to play pivotal roles, essentially those located in central ribosome regions. RNA methylation at nucleobases and ribose moieties of nucleotides appears to frequently modulate its chemistry and structure. RNA methyltransferases comprise a superfamily of highly specialized enzymes that accomplish a wide variety of modifications. These enzymes exhibit a poor degree of sequence similarity in spite of using a common reaction cofactor and modifying the same substrate type. Results Relationships and lineages of RNA methyltransferases have been extensively discussed, but no consensus has been reached. To shed light on this topic, we performed amino acid and codon-based sequence analyses to determine phylogenetic relationships and molecular evolution. We found that most Class I RNA MTases are evolutionarily related to protein and cofactor/vitamin biosynthesis methyltransferases. Additionally, we found that at least nine lineages explain the diversity of RNA MTases. We evidenced that RNA methyltransferases have a high content of polar and positively charged amino acids, which coincides with the electrochemistry of their substrates. Conclusions After studying almost 12,000 bacterial genomes and 2,000 patho-pangenomes, we revealed that molecular evolution of Class I methyltransferases matches the different rates of synonymous and non-synonymous substitutions along the coding region. Consequently, evolution on Class I methyltransferases selects against amino acid changes affecting the structure conformation. PMID:25012753

  11. Semantic Borders and Incomplete Understanding.

    PubMed

    Silva-Filho, Waldomiro J; Dazzani, Maria Virgínia

    2016-03-01

    In this article, we explore a fundamental issue of Cultural Psychology, that is our "capacity to make meaning", by investigating a thesis from contemporary philosophical semantics, namely, that there is a decisive relationship between language and rationality. Many philosophers think that for a person to be described as a rational agent he must understand the semantic content and meaning of the words he uses to express his intentional mental states, e.g., his beliefs and thoughts. Our argument seeks to investigate the thesis developed by Tyler Burge, according to which our mastery or understanding of the semantic content of the terms which form our beliefs and thoughts is an "incomplete understanding". To do this, we discuss, on the one hand, the general lines of anti-individualism or semantic externalism and, on the other, criticisms of the Burgean notion of incomplete understanding - one radical and the other moderate. We defend our understanding that the content of our beliefs must be described in the light of the limits and natural contingencies of our cognitive capacities and the normative nature of our rationality. At heart, anti-individualism leads us to think about the fact that we are social creatures, living in contingent situations, with important, but limited, cognitive capacities, and that we receive the main, and most important, portion of our knowledge simply from what others tell us. Finally, we conclude that this discussion may contribute to the current debate about the notion of borders. PMID:26111737

  12. Full-length coding sequence for 12 bovine viral diarrhea virus isolates from persistently infected cattle in a feedyard in Kansas

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We report here the full-length coding sequence of 12 bovine viral diarrhea virus (BVDV) isolates from persistently infected cattle from a feedyard in southwest Kansas, USA. These 12 genomes represent the three major genotypes (BVDV 1a, 1b, and 2a) of BVDV currently circulating in the United States....

  13. Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...

  14. DNA polymorphism in morels: complete sequences of the internal transcribed spacer of genes coding for rRNA in Morchella esculenta (yellow morel) and Morchella conica (black morel).

    PubMed Central

    Wipf, D; Munch, J C; Botton, B; Buscot, F

    1996-01-01

    The internal transcribed spacer (ITS) of the gene coding for rRNA was sequenced in both directions with the gene walking technique in a black morel (Morchella conica) and a yellow morel (M. esculenta) to elucidate the ITS length discrepancy between the two species groups (750-bp ITS in black morels and 1,150-bp ITS in yellow morels). PMID:8795250

  15. Biosynthesis of riboflavin: cloning, sequencing, and expression of the gene coding for 3,4-dihydroxy-2-butanone 4-phosphate synthase of Escherichia coli.

    PubMed Central

    Richter, G; Volk, R; Krieger, C; Lahm, H W; Röthlisberger, U; Bacher, A

    1992-01-01

    3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate and serves as the biosynthetic precursor for the xylene ring of riboflavin. The gene coding for 3,4-dihydroxy-2-butanone 4-phosphate synthase of Escherichia coli has been cloned and sequenced. The gene codes for a protein of 217 amino acid residues with a calculated molecular mass of 23,349.6 Da. The enzyme was purified to near homogeneity from a recombinant E. coli strain and had a specific activity of 1,700 nmol mg⁻¹ h⁻¹. The N-terminal amino acid sequence and the amino acid composition of the protein were in agreement with the deduced sequence. The molecular mass as determined by ion spray mass spectrometry was 23,351 +/- 2 Da, which is in agreement with the predicted mass. The previously reported loci htrP, "luxH-like," and ribB at 66 min of the E. coli chromosome are all identical to the gene coding for 3,4-dihydroxy-2-butanone 4-phosphate synthase, but their role had not been hitherto determined. Sequence homology indicates that gene luxH of Vibrio harveyi and the central open reading frame of the Bacillus subtilis riboflavin operon code for 3,4-dihydroxy-2-butanone 4-phosphate synthase. PMID:1597419

  16. Adenovirus E1A coding sequences that enable ras and pmt oncogenes to transform cultured primary cells.

    PubMed Central

    Zerler, B; Moran, B; Maruyama, K; Moomaw, J; Grodzicker, T; Ruley, H E

    1986-01-01

    Plasmids expressing partial adenovirus early region 1A (E1A) coding sequences were tested for activities which facilitate in vitro establishment (immortalization) of primary baby rat kidney cells and which enable the T24 Harvey ras-related oncogene and the polyomavirus middle T antigen (pmt) gene to transform primary baby rat kidney cells. E1A cDNAs expressing the 289- and 243-amino acid proteins expressed both E1A transforming functions. Mutant hrA, which encodes a 140-amino acid protein derived from the amino-terminal domain shared by the 289- and 243-amino acid proteins, enabled ras (but not pmt) to transform and facilitated in vitro establishment to a low, but detectable, extent. These studies suggest that E1A functions which collaborate with ras oncogenes and those which facilitate establishment are linked. Furthermore, E1A transforming functions are not associated with activities of the 289-amino acid E1A proteins required for efficient transcriptional activation of viral early region promoters. PMID:3022137

  17. Complete mitogenome sequences of four flatfishes (Pleuronectiformes) reveal a novel gene arrangement of L-strand coding genes

    PubMed Central

    2013-01-01

    Background Few mitochondrial gene rearrangements are found in vertebrates and large-scale changes in these genomes occur even less frequently. It is difficult, therefore, to propose a mechanism to account for observed changes in mitogenome structure. Mitochondrial gene rearrangements are usually explained by the recombination model or tandem duplication and random loss model. Results In this study, the complete mitochondrial genomes of four flatfishes, Crossorhombus azureus (blue flounder), Grammatobothus krempfi, Pleuronichthys cornutus, and Platichthys stellatus were determined. A striking finding is that eight genes in the C. azureus mitogenome are located in a novel position, differing from that of available vertebrate mitogenomes. Specifically, the ND6 and seven tRNA genes (the Q, A, C, Y, S1, E, P genes) encoded by the L-strand have been translocated to a position between tRNA-T and tRNA-F though the original order of the genes is maintained. Conclusions These special features are used to suggest a mechanism for C. azureus mitogenome rearrangement. First, a dimeric molecule was formed by two monomers linked head-to-tail, then one of the two sets of promoters lost function and the genes controlled by the disabled promoters became pseudogenes, non-coding sequences, and even were lost from the genome. This study provides a new gene-rearrangement model that accounts for the events of gene-rearrangement in a vertebrate mitogenome. PMID:23962312

  18. Population Genomic Analysis of 962 Whole Genome Sequences of Humans Reveals Natural Selection in Non-Coding Regions

    PubMed Central

    Gazave, Elodie; Chang, Diana; Raj, Srilakshmi; Hunter-Zinck, Haley; Blekhman, Ran; Arbiza, Leonardo; Van Hout, Cris; Morrison, Alanna; Johnson, Andrew D.; Bis, Joshua; Cupples, L. Adrienne; Psaty, Bruce M.; Muzny, Donna; Yu, Jin; Gibbs, Richard A.; Keinan, Alon; Clark, Andrew G.; Boerwinkle, Eric

    2015-01-01

    Whole genome analysis in large samples from a single population is needed to provide adequate power to assess relative strengths of natural selection across different functional components of the genome. In this study, we analyzed next-generation sequencing data from 962 European Americans, and found that as expected approximately 60% of the top 1% of positive selection signals lie in intergenic regions, 33% in intronic regions, and slightly over 1% in coding regions. Several detailed functional annotation categories in intergenic regions showed statistically significant enrichment in positively selected loci when compared to the null distribution of the genomic span of ENCODE categories. There was a significant enrichment of purifying selection signals detected in enhancers, transcription factor binding sites, microRNAs and target sites, but not on lincRNA or piRNAs, suggesting different evolutionary constraints for these domains. Loci in “repressed or low activity regions” and loci near or overlapping the transcription start site were the most significantly over-represented annotations among the top 1% of signals for positive selection. PMID:25807536

  19. Transactivation specificity is conserved among p53 family proteins and depends on a response element sequence code

    PubMed Central

    Ciribilli, Yari; Monti, Paola; Bisio, Alessandra; Nguyen, H. Thien; Ethayathulla, Abdul S.; Ramos, Ana; Foggetti, Giorgia; Menichini, Paola; Menendez, Daniel; Resnick, Michael A.; Viadiu, Hector; Fronza, Gilberto; Inga, Alberto

    2013-01-01

    Structural and biochemical studies have demonstrated that p73, p63 and p53 recognize DNA with identical amino acids and similar binding affinity. Here, measuring transactivation activity for a large number of response elements (REs) in yeast and human cell lines, we show that p53 family proteins also have overlapping transactivation profiles. We identified mutations at conserved amino acids of loops L1 and L3 in the DNA-binding domain that tune the transactivation potential nearly equally in p73, p63 and p53. For example, the mutant S139F in p73 has higher transactivation potential towards selected REs, enhanced DNA-binding cooperativity in vitro and a flexible loop L1 as seen in the crystal structure of the protein–DNA complex. By studying how variations in the RE sequence affect transactivation specificity, we discovered a RE-transactivation code that predicts enhanced transactivation; this correlation is stronger for promoters of genes associated with apoptosis. PMID:23892287

  20. Sequence Evaluation of FGF and FGFR Gene Conserved Non-Coding Elements in Non-Syndromic Cleft Lip and Palate Cases

    PubMed Central

    Riley, Bridget M.; Murray, Jeffrey C.

    2009-01-01

    Non-syndromic cleft lip and palate (NS CLP) is a complex birth defect resulting from multiple genetic and environmental factors. We have previously reported the sequencing of the coding region of genes in the fibroblast growth factor (FGF) signaling pathway, in which missense and nonsense mutations contribute to approximately 5%–6% of NS CLP cases. In this article we report the sequencing of conserved non-coding elements (CNEs) in and around 11 of the FGF and FGFR genes, which identified 55 novel variants. Seven of the variants are highly conserved among ≥8 species and 31 variants alter transcription factor binding sites, 8 of which are important for craniofacial development. Additionally, 15 NS CLP patients had a combination of coding mutations and CNE variants, suggesting that an accumulation of variants in the FGF signaling pathway may contribute to clefting. PMID:17963255

  1. Common sequence motifs coding for higher-plant and prokaryotic O-acetylserine (thiol)-lyases: bacterial origin of a chloroplast transit peptide?

    PubMed

    Rolland, N; Job, D; Douce, R

    1993-08-01

    A comparison of the amino acid sequence of O-acetylserine (thiol)-lyase (EC 4.2.99.8) from Escherichia coli and the isoforms of this enzyme found in the cytosolic and chloroplastic compartments of spinach (Spinacia oleracea) leaf cells allows the essential lysine residue involved in the binding of the pyridoxal 5'-phosphate cofactor to be identified. The results of further sequence comparison of cDNAs coding for these proteins are discussed in the frame of the endosymbiotic theory of chloroplast evolution. The results are compatible with a mechanism in which the chloroplast enzyme originated from the cytosolic enzyme and both plant genes originated from a common prokaryotic ancestor. The comparison also suggests that the 5'-non-coding sequence of the bacterial gene was transferred to the plant cell nucleus and that it has been used to create the N-terminal portions of both plant enzymes, and possibly the transit peptide of the chloroplast enzyme. PMID:7916619

  2. Hfq assists small RNAs in binding to the coding sequence of ompD mRNA and in rearranging its structure

    PubMed Central

    Wroblewska, Zuzanna; Olejniczak, Mikolaj

    2016-01-01

    The bacterial protein Hfq participates in the regulation of translation by small noncoding RNAs (sRNAs). Several mechanisms have been proposed to explain the role of Hfq in the regulation by sRNAs binding to the 5′-untranslated mRNA regions. However, it remains unknown how Hfq affects those sRNAs that target the coding sequence. Here, the contribution of Hfq to the annealing of three sRNAs, RybB, SdsR, and MicC, to the coding sequence of Salmonella ompD mRNA was investigated. Hfq bound to ompD mRNA with tight, subnanomolar affinity. Moreover, Hfq strongly accelerated the rates of annealing of RybB and MicC sRNAs to this mRNA, and it also had a small effect on the annealing of SdsR. The experiments using truncated RNAs revealed that the contributions of Hfq to the annealing of each sRNA were individually adjusted depending on the structures of interacting RNAs. In agreement with that, the mRNA structure probing revealed different structural contexts of each sRNA binding site. Additionally, the annealing of RybB and MicC sRNAs induced specific conformational changes in ompD mRNA consistent with local unfolding of mRNA secondary structure. Finally, the mutation analysis showed that the long AU-rich sequence in the 5′-untranslated mRNA region served as an Hfq binding site essential for the annealing of sRNAs to the coding sequence. Overall, the data showed that the functional specificity of Hfq in the annealing of each sRNA to the ompD mRNA coding sequence was determined by the sequence and structure of the interacting RNAs. PMID:27154968

  3. A computer program for estimation from incomplete multinomial data

    NASA Technical Reports Server (NTRS)

    Credeur, K. R.

    1978-01-01

    Coding is given for maximum likelihood and Bayesian estimation of the vector p of multinomial cell probabilities from incomplete data. Also included is coding to calculate and approximate elements of the posterior mean and covariance matrices. The program is written in FORTRAN 4 language for the Control Data CYBER 170 series digital computer system with network operating system (NOS) 1.1. The program requires approximately 44000 octal locations of core storage. A typical case requires from 72 seconds to 92 seconds on CYBER 175 depending on the value of the prior parameter.
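    The abstract does not say which algorithm the FORTRAN program uses, so the following is only a generic illustration of maximum likelihood estimation from incomplete multinomial data, written as a short expectation-maximization (EM) loop in Python. The function name, the data layout, and the example counts are all assumptions: some observations are fully classified into cells, while others are known only to fall in a subset of cells.

```python
def em_multinomial(full_counts, partial_counts, iterations=200):
    """EM estimate of multinomial cell probabilities from incomplete data.

    full_counts:    counts of fully classified observations, one per cell
    partial_counts: list of (cells, count) pairs, where `cells` is a tuple of
                    cell indices and the observation is known only to lie in it
    """
    k = len(full_counts)
    total = sum(full_counts) + sum(count for _, count in partial_counts)
    p = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(iterations):
        # E-step: share each partially classified count among its candidate cells
        # in proportion to the current probability estimates.
        expected = [float(c) for c in full_counts]
        for cells, count in partial_counts:
            mass = sum(p[i] for i in cells)
            for i in cells:
                expected[i] += count * p[i] / mass
        # M-step: re-estimate probabilities from the completed counts.
        p = [e / total for e in expected]
    return p


# Hypothetical data: 60 fully classified observations in three cells, plus
# 40 observations known only to fall in cell 0 or cell 1.
p_hat = em_multinomial([30, 10, 20], [((0, 1), 40)])
print([round(x, 4) for x in p_hat])
```

    For this simple pattern the answer can be checked by hand: cell 2 receives 20/100 = 0.2, and the remaining 0.8 is split between cells 0 and 1 in the 30:10 ratio of their fully classified counts, giving 0.6 and 0.2.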

  4. Gene control in eukaryotes and the c-value paradox "excess" DNA as an impediment to transcription of coding sequences.

    PubMed

    Zuckerkandl, E

    1976-12-31

    Ways in which control of gene activity may lead to the observed high DNA content per haploid eukaryote genome are examined. It is proposed that deoxyribonucleoprotein (DNP) acts as a barrier to transcription at two distinct structural levels. At the lower level, melting of the nucleosome supercoil (quaternary structure) and of the nucleosomes (tertiary structure) might be brought about by the process of transcription itself. After unwinding the barrier section, the polymerase would eventually reach the structural gene. The transcripts of noncoding sequences, at least as far as their "unique" sequence components are concerned, may thus have filled their main function through the very process of transcription. The possibility of an inverse relationship between the length of the DNP barrier and the rates of transcription of the coding sequences is to some extent supported by available data. Different modes of coordination between the transcription of mRNA and of hnRNA from a single functional unit of gene action (funga) are considered. An analysis of gene control at high structural levels of DNP is made on the basis of other data, in relation to the concepts of eurygenic and stenogenic control. The concept of a euryon is introduced, namely of a set of linked fungas under common eurygenic control. Structure of order higher than quaternary can be inferred to exist in larger chromomeres of polytene chromosomes and in corresponding sections of ordinary chromosomes. Only moderate amounts of highest order interphase euchromatic structure are likely to be able to be accommodated in average chromomeres and none in very thin chromomeres. Puffs are interpreted as the melting of highest order interphase structure, and the absence of puffs during transcription as the absence of this highest order structure in the resting state of the chromomeres. Genes that are constantly active in all tissues may dispense with highest order interphase structure and with the corresponding control

  5. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    PubMed

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly-expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, infer differential disease risks and interaction with S1P₁-targeted therapeutics. PMID:25293589

  6. Massively parallel sequencing of the entire control region and targeted coding region SNPs of degraded mtDNA using a simplified library preparation method.

    PubMed

    Lee, Eun Young; Lee, Hwan Young; Oh, Se Yoon; Jung, Sang-Eun; Yang, In Seok; Lee, Yang-Han; Yang, Woo Ick; Shin, Kyoung-Jin

    2016-05-01

    The application of next-generation sequencing (NGS) to forensic genetics is being explored by an increasing number of laboratories because of the potential of high-throughput sequencing for recovering genetic information from multiple markers and multiple individuals in a single run. A cumbersome and technically challenging library construction process is required for NGS. In this study, we propose a simplified library preparation method for mitochondrial DNA (mtDNA) analysis that involves two rounds of PCR amplification. In the first round of multiplex PCR, six fragments covering the entire mtDNA control region and 22 fragments covering interspersed single nucleotide polymorphisms (SNPs) in the coding region that can be used to determine global haplogroups and East Asian haplogroups were amplified using template-specific primers with read sequences. In the following step, indices and platform-specific sequences for the MiSeq® system (Illumina) were added by PCR. The barcoded library produced using this simplified workflow was successfully sequenced on the MiSeq system using the MiSeq Reagent Nano Kit v2. A total of 0.4 GB of sequences, 80.6% with base quality of >Q30, were obtained from 12 degraded DNA samples and mapped to the revised Cambridge Reference Sequence (rCRS). A relatively even read count was obtained for all amplicons, with an average coverage of 5200× and a less than three-fold read count difference between amplicons per sample. Control region sequences were successfully determined, and all samples were assigned to the relevant haplogroups. In addition, enhanced discrimination was observed by adding coding region SNPs to the control region in in silico analysis. Because the developed multiplex PCR system amplifies small-sized amplicons (<250 bp), NGS analysis using the library preparation method described here allows mtDNA analysis using highly degraded DNA samples. PMID:26844917

  7. Incompletely compacted equilibrated ordinary chondrites

    SciTech Connect

    Sasso, M.R.; Macke, R.J.; Boesenberg, J.S.; Britt, D.T.; Rovers, M.L.; Ebel, D.S.; Friedrich, J.M.

    2010-01-22

    We document the size distributions and locations of voids present within five highly porous equilibrated ordinary chondrites using high-resolution synchrotron X-ray microtomography (μCT) and helium pycnometry. We found total porosities ranging from ~10 to 20% within these chondrites, and with μCT we show that up to 64% of the void space is located within intergranular voids within the rock. Given the low (S1-S2) shock stages of the samples and the large voids between mineral grains, we conclude that these samples experienced unusually low amounts of compaction and shock loading throughout their entire post-accretionary history. With Fe metal and FeS metal abundances and grain size distributions, we show that these chondrites formed naturally with greater than average porosities prior to parent body metamorphism. These materials were not 'fluffed' on their parent body by impact-related regolith gardening or events caused by seismic vibrations. Samples of all three chemical types of ordinary chondrites (LL, L, H) are represented in this study and we conclude that incomplete compaction is common within the asteroid belt.

  8. Nucleotide sequence of the nifH gene coding for nitrogen reductase in the acetic acid bacterium Acetobacter diazotrophicus.

    PubMed

    Franke, I H; Fegan, M; Hayward, A C; Sly, L I

    1998-01-01

    The nifH gene sequence of the nitrogen-fixing bacterium Acetobacter diazotrophicus was determined with the use of the polymerase chain reaction and universal degenerate oligonucleotide primers. The gene shows highest pair-wise similarity to the nifH gene of Azospirillum brasilense. The phylogenetic relationships of the nifH gene sequences were compared with those inferred from 16S rRNA gene sequences. Knowledge of the sequence of the nifH gene contributes to the growing database of nifH gene sequences, and will allow the detection of Acet. diazotrophicus from environmental samples with nifH gene-based primers. PMID:9489028

  9. Cloning and sequence analysis of the coding sequence of β-actin cDNA from the Chinese alligator and suitable internal reference primers from the β-actin gene.

    PubMed

    Zhu, H N; Zhang, S Z; Zhou, Y K; Wang, C L; Wu, X B

    2015-01-01

    β-Actin is an essential component of the cytoskeleton and is stably expressed in various tissues of animals; thus, it is commonly used as an internal reference for gene expression studies. In this study, a 1731-bp fragment of β-actin cDNA from Alligator sinensis was obtained using the homology cloning technique. Sequence analysis showed that this fragment contained the complete coding sequence of the β-actin gene (1128 bp), encoding 375 amino acids. The amino acid sequence of β-actin is highly conserved and its nucleotide sequence is slightly variable. Multiple alignment analyses showed that the nucleotide sequence of the β-actin gene from A. sinensis is very similar to sequences from birds, with 94-95% identity. Ten pairs of primers with different product sizes and different annealing temperatures were screened by PCR amplification, agarose gel electrophoresis, and DNA sequencing, and could be used as internal reference primers in gene expression studies. This study expands our knowledge of β-actin gene phylogenetic evolution and provides a basis for quantitative gene expression studies in A. sinensis. PMID:26505364
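    The two figures quoted above (a 1128-bp coding sequence encoding 375 amino acids) are consistent if the coding sequence is counted with its stop codon: 375 sense codons plus one stop codon is 376 codons, or 1128 nucleotides. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the assumption that the reported length includes the stop codon is an inference from the numbers, not a statement in the abstract.

```python
def cds_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Length in base pairs of a coding sequence for a protein of the given size.

    Each amino acid is encoded by one 3-nucleotide codon; a complete coding
    sequence is conventionally counted with its terminating stop codon.
    """
    if n_amino_acids < 0:
        raise ValueError("n_amino_acids must be non-negative")
    codons = n_amino_acids + (1 if include_stop else 0)
    return codons * 3


print(cds_length_bp(375))                      # 1128, matching the abstract
print(cds_length_bp(375, include_stop=False))  # 1125, sense codons only
```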

  10. Variations in the coding and regulatory sequences of the angiogenin (ANG) gene are not associated to ALS (amyotrophic lateral sclerosis) in the Italian population.

    PubMed

    Corrado, Lucia; Battistini, Stefania; Penco, Silvana; Bergamaschi, Laura; Testa, Lucia; Ricci, Claudia; Giannini, Fabio; Greco, Giuseppe; Patrosso, Maria Cristina; Pileggi, Simona; Causarano, Renzo; Mazzini, Letizia; Momigliano-Richiardi, Patricia; D'Alfonso, Sandra

    2007-07-15

    Potentially causative missense variations in the ANG gene and a positive association with the synonymous rs11701-G substitution were detected mainly in Irish and Scottish ALS patients. We screened 262 Italian SOD1-negative ALS patients (250 sporadic) and 415 matched controls for sequence variations in the coding, 3'/5' UTR and 5' flanking (642 bp) regions of the ANG gene. We identified 53 sequence variations, of which 46 were new, 20 had a minor allele frequency (MAF) ≥0.01 and only three were localised in the coding sequence, namely the missense I46V, identified in one patient and two controls, and the synonymous G86G and T97T corresponding to rs11701 and rs2228653. None of the detected SNPs or of their haplotypic combinations was significantly associated with ALS susceptibility or clinical features. In conclusion, we did not detect the association with rs11701-G or with any other newly detected variation in the ANG regulatory region. Furthermore, we did not identify potentially causal mutations in the coding region. PMID:17462671