Science.gov

Sample records for coding sequence incompleteness

  1. Cellulases and coding sequences

    SciTech Connect

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  2. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  3. Lichenase and coding sequences

    SciTech Connect

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  4. Visual pattern image sequence coding

    NASA Technical Reports Server (NTRS)

    Silsbee, Peter; Bovik, Alan C.; Chen, Dapang

    1990-01-01

    The visual pattern image coding (VPIC) configurable digital image-coding process is capable of coding with visual fidelity comparable to the best available techniques, at compressions which (at 30-40:1) exceed all other technologies. These capabilities are associated with unprecedented coding efficiencies; coding and decoding operations are entirely linear with respect to image size and entail a complexity that is 1-2 orders of magnitude faster than any previous high-compression technique. The visual pattern image sequence coding to which attention is presently given exploits all the advantages of the static VPIC in the reduction of information from an additional, temporal dimension, to achieve unprecedented image sequence coding performance.

  5. Physics and numerics of the tensor code (incomplete preliminary documentation)

    SciTech Connect

    Burton, D.E.; Lettis, L.A. Jr.; Bryan, J.B.; Frary, N.R.

    1982-07-15

    The present TENSOR code is a descendant of a code originally conceived by Maenchen and Sack and later adapted by Cherry. Originally, the code was a two-dimensional Lagrangian explicit finite difference code which solved the equations of continuum mechanics. Since then, implicit and arbitrary Lagrange-Euler (ALE) algorithms have been added. The code has been used principally to solve problems involving the propagation of stress waves through earth materials, and considerable development of rock and soil constitutive relations has been done. The code has been applied extensively to the containment of underground nuclear tests, nuclear and high explosive surface and subsurface cratering, and energy and resource recovery. TENSOR is supported by a substantial array of ancillary routines. The initial conditions are set up by a generator code, TENGEN. ZON is a multipurpose code which can be used for zoning, rezoning, overlaying, and linking from other codes. Linking from some codes is facilitated by another code, RADTEN. TENPLT is a fixed time graphics code which provides a wide variety of plotting options and output devices, and which is capable of producing computer movies by postprocessing problem dumps. Time history graphics are provided by the TIMPLT code from temporal dumps produced during production runs. While TENSOR can be run as a stand-alone controllee, a special controller code, TCON, is available to better interface the code with the LLNL computer system during production jobs. In order to standardize compilation procedures and provide quality control, a special compiler code, BC, is used. A number of equation-of-state generators are available, among them ROC and PMUGEN.

  6. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
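The two descriptors named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `base_count_descriptor` and `codon_table` are hypothetical, and the sketch assumes an uppercase-normalizable DNA string that is already a complete reading frame (length divisible by three).

```python
from collections import Counter
from itertools import product


def base_count_descriptor(seq: str) -> str:
    """Return a compact base-count string in the style 'A76C158G121T74'."""
    counts = Counter(seq.upper())
    # Fixed A, C, G, T order so that descriptors are directly comparable.
    return "".join(f"{base}{counts.get(base, 0)}" for base in "ACGT")


def codon_table(seq: str) -> dict[str, int]:
    """Count each of the 64 codons in a complete reading frame."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("reading frame length must be a multiple of 3")
    # Start every codon at zero so absent codons still appear in the table.
    table = {"".join(codon): 0 for codon in product("ACGT", repeat=3)}
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon in table:  # skip codons containing ambiguity codes such as N
            table[codon] += 1
    return table


if __name__ == "__main__":
    frame = "ATGAAATGA"
    print(base_count_descriptor(frame))
    print({c: n for c, n in codon_table(frame).items() if n})
```

Two database entries whose descriptors are identical would be candidates for the redundancy check the abstract mentions; the descriptors alone do not prove the sequences are identical.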

  7. HIFI: a computer code for projectile fragmentation accompanied by incomplete fusion

    SciTech Connect

    Wu, J.R.

    1980-07-01

    A brief summary of a model proposed to describe projectile fragmentation accompanied by incomplete fusion and the instructions for the use of the computer code HIFI are given. The code HIFI calculates single inclusive spectra, coincident spectra and excitation functions resulting from particle-induced reactions. It is a multipurpose program which can calculate any type of coincident spectra as long as the reaction is assumed to take place in two steps.

  8. SOME CODES WHICH ARE INVARIANT UNDER A DOUBLY-TRANSITIVE PERMUTATION GROUP AND THEIR CONNECTION WITH BALANCED INCOMPLETE BLOCK DESIGNS

    DTIC Science & Technology

    If a binary code is invariant under a doubly-transitive permutation group, then the set of all code words of weight j forms a balanced incomplete...codes are properly arranged, and if the first digit is omitted, then all Reed-Muller codes are cyclic.

  9. Fingerprinting Codes for Internet-Based Live Pay-TV System Using Balanced Incomplete Block Designs

    NASA Astrophysics Data System (ADS)

    Hou, Shuhui; Uehara, Tetsutaro; Satoh, Takashi; Morimura, Yoshitaka; Minoh, Michihiko

    In recent years, with the rapid growth of the Internet as well as the increasing demand for broadband services, live pay-television broadcasting via the Internet has become a promising business. To implement it, distributed content must be protected from illegal copying and redistribution after it is accessed. A fingerprinting system is a useful tool for this. This paper shows that the anti-collusion code has advantages over other existing fingerprinting codes in terms of efficiency and effectiveness for live pay-television broadcasting. Next, this paper presents how to achieve efficient and effective anti-collusion codes based on the unital and the affine plane, which are two known examples of balanced incomplete block design (BIBD). Performance evaluations of anti-collusion codes generated from the unital and the affine plane are also conducted. Their practical explicit constructions are given last.

  10. Nonspatial Sequence Coding in CA1 Neurons

    PubMed Central

    Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

    2016-01-01

    The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal

  11. Hierarchical morphological segmentation for image sequence coding.

    PubMed

    Salembier, P; Pardas, M

    1994-01-01

    This paper deals with a hierarchical morphological segmentation algorithm for image sequence coding. Mathematical morphology is very attractive for this purpose because it efficiently deals with geometrical features such as size, shape, contrast, or connectivity that can be considered as segmentation-oriented features. The algorithm follows a top-down procedure. It first takes into account the global information and produces a coarse segmentation, that is, with a small number of regions. Then, the segmentation quality is improved by introducing regions corresponding to more local information. The algorithm, considering sequences as being functions on a 3-D space, directly segments 3-D regions. A 3-D approach is used to get a segmentation that is stable in time and to directly solve the region correspondence problem. Each segmentation stage relies on four basic steps: simplification, marker extraction, decision, and quality estimation. The simplification removes information from the sequence to make it easier to segment. Morphological filters based on partial reconstruction are proven to be very efficient for this purpose, especially in the case of sequences. The marker extraction identifies the presence of homogeneous 3-D regions. It is based on constrained flat region labeling and morphological contrast extraction. The goal of the decision is to precisely locate the contours of regions detected by the marker extraction. This decision is performed by a modified watershed algorithm. Finally, the quality estimation concentrates on the coding residue, all the information about the 3-D regions that have not been properly segmented and therefore coded. The procedure allows the introduction of the texture and contour coding schemes within the segmentation algorithm. The coding residue is transmitted to the next segmentation stage to improve the segmentation and coding quality. Finally, segmentation and coding examples are presented to show the validity and interest of

  12. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences

    SciTech Connect

    Minovitsky, Simon; Stegmaier, Philip; Kel, Alexander; Kondrashov, Alexey S.; Dubchak, Inna

    2007-02-21

    Background: A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5 percent of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results: We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion: Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. Overall properties of CNSs are affected by patterns in

  13. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  14. High-quality draft genome sequence of the Thermus amyloliquefaciens type strain YIM 77409(T) with an incomplete denitrification pathway.

    PubMed

    Zhou, En-Min; Murugapiran, Senthil K; Mefferd, Chrisabelle C; Liu, Lan; Xian, Wen-Dong; Yin, Yi-Rui; Ming, Hong; Yu, Tian-Tian; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T B K; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Spunde, Alexander; Kyrpides, Nikos; Woyke, Tanja; Li, Wen-Jun; Hedlund, Brian P

    2016-01-01

    Thermus amyloliquefaciens type strain YIM 77409(T) is a thermophilic, Gram-negative, non-motile and rod-shaped bacterium isolated from Niujie Hot Spring in Eryuan County, Yunnan Province, southwest China. In the present study we describe the features of strain YIM 77409(T) together with its genome sequence and annotation. The genome is 2,160,855 bp long and consists of 6 scaffolds with 67.4 % average GC content. A total of 2,313 genes were predicted, comprising 2,257 protein-coding and 56 RNA genes. The genome is predicted to encode a complete glycolysis, pentose phosphate pathway, and tricarboxylic acid cycle. Additionally, a large number of transporters and enzymes for heterotrophy highlight the broad heterotrophic lifestyle of this organism. A denitrification gene cluster included genes predicted to encode enzymes for the sequential reduction of nitrate to nitrous oxide, consistent with the incomplete denitrification phenotype of this strain.

  15. Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences

    DTIC Science & Technology

    2008-07-01

    ...coding bound on the rate of DNA codes is proved. To obtain the bound, we use some ensembles of DNA sequences which are generalizations of the Fibonacci sequences. Dates covered: 6 Jul 08 – 11 Jul 08. Subject terms: DNA Codes, Fibonacci Ensembles, DNA Computing, Code Optimization.

  16. Coding sequence density estimation via topological pressure.

    PubMed

    Koslicki, David; Thompson, Daniel J

    2015-01-01

    We give a new approach to coding sequence (CDS) density estimation in genomic analysis based on the topological pressure, which we develop from a well known concept in ergodic theory. Topological pressure measures the 'weighted information content' of a finite word, and incorporates 64 parameters which can be interpreted as a choice of weight for each nucleotide triplet. We train the parameters so that the topological pressure fits the observed coding sequence density on the human genome, and use this to give ab initio predictions of CDS density over windows of size around 66,000 bp on the genomes of Mus musculus, Rhesus macaque and Drosophila melanogaster. While the differences between these genomes are too great to expect that training on the human genome could predict, for example, the exact locations of genes, we demonstrate that our method gives reasonable estimates for the 'coarse scale' problem of predicting CDS density. Inspired again by ergodic theory, the weightings of the nucleotide triplets obtained from our training procedure are used to define a probability distribution on finite sequences, which can be used to distinguish between intron and exon sequences from the human genome of lengths between 750 and 5,000 bp. At the end of the paper, we explain the theoretical underpinning for our approach, which is the theory of Thermodynamic Formalism from the dynamical systems literature. Mathematica and MATLAB implementations of our method are available at http://sourceforge.net/projects/topologicalpres/ .

  17. Program generator for the Incomplete Cholesky Conjugate Gradient (ICCG) method with a symmetrizing preprocessor. [GENIC code package]

    SciTech Connect

    Kuo-Petravic, G.; Petravic, M.

    1980-03-01

    This paper is an extension of the previous paper, A Program Generator for the Incomplete LU-Decomposition-Conjugate Gradient (ILUCG) Method, which appeared in Computer Physics Communications. In that paper a generator program was presented which produced a code package to solve the system of equations Ax = b, where A is an arbitrary nonsingular matrix, by the ILUCG method. In the present paper an alternative generator program is offered which produces a code package applicable to the case where A is symmetric and positive definite. The numerical algorithm used is the Incomplete Cholesky Conjugate Gradient (ICCG) method of Meijerink and Van der Vorst, which executes approximately twice as fast per iteration as the ILUCG method. In addition, an optional preprocessor is provided to treat the case of a not diagonally dominant nonsymmetric and nonsingular matrix A by solving the equation A^T A x = A^T b.
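For readers unfamiliar with the ICCG idea, the following is a small dense, pure-Python sketch of a conjugate gradient iteration preconditioned with a zero-fill incomplete Cholesky factor, IC(0). It is not the GENIC package described in this record: all function names are hypothetical, the matrix is stored as nested lists for clarity, not speed, and the sketch assumes A is symmetric positive definite and that the incomplete factorization does not break down (no non-positive pivot).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def incomplete_cholesky(A):
    """IC(0): Cholesky factor restricted to the nonzero pattern of A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            if A[i][j] == 0.0:
                continue  # drop fill-in outside the sparsity pattern of A
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def apply_preconditioner(L, r):
    """Solve (L L^T) z = r by forward then backward substitution."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def iccg(A, b, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for SPD A, starting from x = 0."""
    L = incomplete_cholesky(A)
    x = [0.0] * len(b)
    r = list(b)
    z = apply_preconditioner(L, r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_preconditioner(L, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    print(iccg(A, [6.0, 10.0, 8.0]))
```

For a tridiagonal matrix such as the one in the usage example, IC(0) produces no dropped fill, so the preconditioner is the exact Cholesky factor and the iteration converges almost immediately; the benefit of the incomplete factor shows up on larger sparse systems.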

  18. Efficient Quantum Private Communication Based on Dynamic Control Code Sequence

    NASA Astrophysics Data System (ADS)

    Cao, Zheng-Wen; Feng, Xiao-Yi; Peng, Jin-Ye; Zeng, Gui-Hua; Qi, Jin

    2017-04-01

    Based on chaos and quantum properties, we propose a quantum private communication scheme with dynamic control code sequence. The initial sequence is obtained via chaotic systems, and the control code sequence is derived by grouping, XOR and extracting. A shift cycle algorithm is designed to enable the dynamic change of control code sequence. Analysis shows that transmission efficiency could reach 100 % with high dynamics and security.

  19. Efficient Quantum Private Communication Based on Dynamic Control Code Sequence

    NASA Astrophysics Data System (ADS)

    Cao, Zheng-Wen; Feng, Xiao-Yi; Peng, Jin-Ye; Zeng, Gui-Hua; Qi, Jin

    2016-12-01

    Based on chaos and quantum properties, we propose a quantum private communication scheme with dynamic control code sequence. The initial sequence is obtained via chaotic systems, and the control code sequence is derived by grouping, XOR and extracting. A shift cycle algorithm is designed to enable the dynamic change of control code sequence. Analysis shows that transmission efficiency could reach 100 % with high dynamics and security.

  20. High-quality draft genome sequence of the Thermus amyloliquefaciens type strain YIM 77409T with an incomplete denitrification pathway

    DOE PAGES

    Zhou, En-Min; Murugapiran, Senthil K.; Mefferd, Chrisabelle C.; ...

    2016-02-27

    Thermus amyloliquefaciens type strain YIM 77409T is a thermophilic, Gram-negative, non-motile and rod-shaped bacterium isolated from Niujie Hot Spring in Eryuan County, Yunnan Province, southwest China. In the present study we describe the features of strain YIM 77409T together with its genome sequence and annotation. The genome is 2,160,855 bp long and consists of 6 scaffolds with 67.4 % average GC content. A total of 2,313 genes were predicted, comprising 2,257 protein-coding and 56 RNA genes. The genome is predicted to encode a complete glycolysis, pentose phosphate pathway, and tricarboxylic acid cycle. Additionally, a large number of transporters and enzymes for heterotrophy highlight the broad heterotrophic lifestyle of this organism. Furthermore, a denitrification gene cluster included genes predicted to encode enzymes for the sequential reduction of nitrate to nitrous oxide, consistent with the incomplete denitrification phenotype of this strain.

  1. Hybrid ARQ schemes employing coded modulation and sequence combining

    NASA Astrophysics Data System (ADS)

    Deng, Robert H.

    1994-06-01

    We propose and analyze two hybrid automatic-repeat-request (ARQ) schemes employing bandwidth efficient coded modulation and coded sequence combining. In the first scheme, a trellis-coded modulation (TCM) is used to control channel noise; while in the second scheme a concatenated coded modulation is employed. The concatenated coded modulation is formed by cascading a Reed-Solomon (RS) outer code and a coded modulation (BCM) inner code. In both schemes, the coded modulation decoder, by performing sequence combining and soft-decision maximum likelihood decoding, makes full use of the information available in all received sequences corresponding to a given information message. It is shown, by means of analysis as well as computer simulations, that both schemes are capable of providing high throughput efficiencies over a wide range of signal-to-noise ratios. The schemes are suitable for large file transfers over satellite communication links where high throughput and high reliability are required.

  2. Orpinomyces cellulase celf protein and coding sequences

    DOEpatents

    Li, Xin-Liang; Chen, Huizhong; Ljungdahl, Lars G.

    2000-09-05

    A cDNA (1,520 bp), designated celF, consisting of an open reading frame (ORF) encoding a polypeptide (CelF) of 432 amino acids was isolated from a cDNA library of the anaerobic rumen fungus Orpinomyces PC-2 constructed in Escherichia coli. Analysis of the deduced amino acid sequence showed that, starting from the N-terminus, CelF consists of a signal peptide, a cellulose binding domain (CBD) followed by an extremely Asn-rich linker region which separates the CBD and the catalytic domain. The latter is located at the C-terminus. The catalytic domain of CelF is highly homologous to CelA and CelC of Orpinomyces PC-2, to CelA of Neocallimastix patriciarum and also to cellobiohydrolase IIs (CBHIIs) from aerobic fungi. However, like CelA of Neocallimastix patriciarum, CelF does not have the noncatalytic repeated peptide domain (NCRPD) found in CelA and CelC from the same organism. The recombinant protein CelF hydrolyzes cellooligosaccharides in the pattern of CBHII, yielding only cellobiose as product with cellotetraose as the substrate. The genomic celF is interrupted by a 111 bp intron, located within the region coding for the CBD. The intron of celF has features in common with genes from aerobic filamentous fungi.

  3. SEQassembly: A Practical Tools Program for Coding Sequences Splicing

    NASA Astrophysics Data System (ADS)

    Lee, Hongbin; Yang, Hang; Fu, Lei; Qin, Long; Li, Huili; He, Feng; Wang, Bo; Wu, Xiaoming

    A CDS (coding sequence) is the portion of an mRNA sequence that is composed of a number of exon sequence segments. The construction of a CDS is important for in-depth genetic analysis such as genotyping. A program in the MATLAB environment is presented, which can process batches of sample sequences into code segments under the guidance of reference exon models, and splice the code segments from the same sample source into a CDS according to the exon order in a queue file. This program is useful in transcriptional polymorphism detection and gene function study.
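The splicing step described in this abstract can be illustrated with a short Python sketch. The original program is written in MATLAB and its file formats are not given here, so everything below is an assumption for illustration: the function name `splice_cds`, the dictionary of named exon segments, and the list standing in for the exon order from the queue file are all hypothetical.

```python
def splice_cds(segments: dict[str, str], exon_order: list[str]) -> str:
    """Concatenate exon segments of one sample into a CDS, in the given order.

    segments   -- hypothetical mapping of exon name to its sequence segment
    exon_order -- hypothetical exon order, standing in for the queue file
    """
    missing = [name for name in exon_order if name not in segments]
    if missing:
        # Refuse to build a CDS with silent gaps.
        raise KeyError(f"missing exon segments: {missing}")
    return "".join(segments[name].upper() for name in exon_order)


if __name__ == "__main__":
    sample = {"exon2": "gattaa", "exon1": "ATGGCC"}
    print(splice_cds(sample, ["exon1", "exon2"]))
```

Raising an error on a missing exon, instead of skipping it, is a design choice of this sketch: a CDS with a silently dropped exon would shift the reading frame in any downstream analysis.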

  4. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  5. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  6. Full-length HLA-DRB1 coding sequences generated by a hemizygous RNA-SBT approach.

    PubMed

    Gerritsen, K E H; Groeneweg, M; Meertens, C M H; Voorter, C E M; Tilanus, M G J

    2015-11-01

    Currently 1582 HLA-DRB1 alleles have been identified in the IMGT/HLA database (v3.18). Among those alleles, more than 90% have incomplete allele sequences, which complicates the analysis of the functional relevance of polymorphism beyond exon 2. The polymorphic index of each individual exon of the currently known allele sequences, shows that polymorphism is present in all exons, albeit not equally abundant. Full-length HLA-DRB1 RNA sequencing identifies polymorphism of the complete coding region. Here we describe a hemizygous full-length RNA sequence-based typing (SBT) approach based on group-specific HLA-DRB1 amplification and subsequent sequencing. RNA full-length sequences can easily be accessed because of the short amplicon length (801 bp). The RNA-SBT approach was successfully validated on a panel of DRB1 alleles having fully known coding sequences according to the IMGT/HLA database, and cover all serological equivalents. Subsequently, the approach was applied on a panel of 54 alleles with incomplete allele sequences, resulting in full-length coding sequences and the identification of one new and one corrected allele. This study shows the universal applicability of the RNA-based sequencing approach to identify full-length coding sequences and to define the polymorphic content of HLA-DRB1 alleles.

  7. Designedly Incomplete Utterances: A Pedagogical Practice for Eliciting Knowledge Displays in Error Correction Sequences.

    ERIC Educational Resources Information Center

    Koshik, Irene

    2002-01-01

    Uses a conversation analytic framework to analyze a practice used by teachers in one-on-one, second language writing conferences when eliciting self-correction of students' written language errors. This type of turn used to elicit a knowledge display from the student is labeled designedly incomplete utterance (DIU). Teachers use DIUs made up of…

  8. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  9. Three Ingredients for Improved Global Aftershock Forecasts: Tectonic Region, Time-Dependent Catalog Incompleteness, and Inter-Sequence Variability

    NASA Astrophysics Data System (ADS)

    Page, M. T.; Hardebeck, J.; Felzer, K. R.; Michael, A. J.; van der Elst, N.

    2015-12-01

    Following a large earthquake, seismic hazard can be orders of magnitude higher than the long-term average as a result of aftershock triggering. Due to this heightened hazard, there is a demand from emergency managers and the public for rapid, authoritative, and reliable aftershock forecasts. In the past, USGS aftershock forecasts following large, global earthquakes have been released on an ad-hoc basis with inconsistent methods, and in some cases, aftershock parameters adapted from California. To remedy this, we are currently developing an automated aftershock product that will generate more accurate forecasts based on the Reasenberg and Jones (Science, 1989) method. To better capture spatial variations in aftershock productivity and decay, we estimate regional aftershock parameters for sequences within the Garcia et al. (BSSA, 2012) tectonic regions. We find that regional variations for mean aftershock productivity exceed a factor of 10. The Reasenberg and Jones method combines modified-Omori aftershock decay, Utsu productivity scaling, and the Gutenberg-Richter magnitude distribution. We additionally account for a time-dependent magnitude of completeness following large events in the catalog. We generalize the Helmstetter et al. (2005) equation for short-term aftershock incompleteness and solve for incompleteness levels in the global NEIC catalog following large mainshocks. In addition to estimating average sequence parameters within regions, we quantify the inter-sequence parameter variability. This allows for a more complete quantification of the forecast uncertainties and Bayesian updating of the forecast as sequence-specific information becomes available.
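
    As a rough illustration of the Reasenberg and Jones formulation mentioned above, the rate of aftershocks of magnitude M or larger at time t after a mainshock of magnitude Mm is usually written as rate(t, M) = 10^(a + b(Mm - M)) (t + c)^(-p). The sketch below integrates that rate over a time window and converts it to a probability under a Poisson assumption. The parameter values are placeholders chosen for illustration only; they are not the regional estimates from this study, and the time-dependent incompleteness correction is not modeled.

```python
import math

# Placeholder parameters (assumed for illustration; not taken from this abstract).
PARAMS = dict(a=-1.67, b=0.91, c=0.05, p=1.08)

def rj_rate(t, mag, mainshock_mag, a, b, c, p):
    """Rate (events per day) of aftershocks >= mag at t days after the mainshock."""
    return 10 ** (a + b * (mainshock_mag - mag)) * (t + c) ** (-p)

def rj_expected_count(t1, t2, mag, mainshock_mag, a, b, c, p):
    """Closed-form integral of rj_rate between t1 and t2 (in days)."""
    k = 10 ** (a + b * (mainshock_mag - mag))
    if abs(p - 1.0) < 1e-12:
        return k * math.log((t2 + c) / (t1 + c))
    return k * ((t2 + c) ** (1 - p) - (t1 + c) ** (1 - p)) / (1 - p)

def rj_probability(t1, t2, mag, mainshock_mag, **params):
    """Probability of one or more aftershocks >= mag, assuming Poisson occurrence."""
    n = rj_expected_count(t1, t2, mag, mainshock_mag, **params)
    return 1.0 - math.exp(-n)

if __name__ == "__main__":
    # Chance of at least one M>=6 aftershock in days 1-8 after an M7.5 mainshock.
    print(rj_probability(1.0, 8.0, 6.0, 7.5, **PARAMS))
```

    Swapping in region-specific a, b, c and p values is a matter of passing a different parameter dictionary, which is one simple way the regional variation described in the abstract could enter such a calculation.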

  11. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  12. The Coding and Inter-Manual Transfer of Movement Sequences

    PubMed Central

    Shea, Charles H.; Kovacs, Attila J.; Panzer, Stefan

    2011-01-01

    The manuscript reviews recent experiments that use inter-manual transfer and inter-manual practice paradigms to determine the coordinate system (visual–spatial or motor) used in the coding of movement sequences during physical and observational practice. The results indicated that multi-element movement sequences are more effectively coded in visual–spatial coordinates even following extended practice, while very early in practice movement sequences with only a few movement elements and relatively short durations are coded in motor coordinates. Likewise, inter-manual practice of relatively simple movement sequences shows benefits of right and left limb practice that involves the same motor coordinates, while the opposite is true for more complex sequences. The results suggest that the coordinate system used to code the sequence information is linked to both the task characteristics and the control processes used to produce the sequence. These findings have the potential to greatly enhance our understanding of why in some conditions participants, following practice with one limb or observation of one limb practice, can effectively perform the task with the contralateral limb while in other (often similar) conditions they cannot. PMID:21716583

  13. Nucleotide sequence alignment using sparse coding and belief propagation.

    PubMed

    Roozgard, Aminmohammad; Barzigar, Nafise; Wang, Shuang; Jiang, Xiaoqian; Ohno-Machado, Lucila; Cheng, Samuel

    2013-01-01

    Advances in DNA information extraction techniques have led to huge sequenced genomes from organisms spanning the tree of life. This increasing amount of genomic information requires tools for comparison of the nucleotide sequences. In this paper, we propose a novel nucleotide sequence alignment method based on sparse coding and belief propagation to compare the similarity of the nucleotide sequences. We used the neighbors of each nucleotide as features, and then we employed sparse coding to find a set of candidate nucleotides. To select optimum matches, belief propagation was subsequently applied to these candidate nucleotides. Experimental results show that the proposed approach is able to robustly align nucleotide sequences and is competitive to SOAPaligner [1] and BWA [2].

  14. Streamlined Genome Sequence Compression using Distributed Source Coding

    PubMed Central

    Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

    2014-01-01

    We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552

  15. Algebraic solution of the synthesis problem for coded sequences

    SciTech Connect

    Leukhin, Anatolii N

    2005-08-31

    The algebraic solution of a 'complex' problem of synthesis of phase-coded (PC) sequences with the zero level of side lobes of the cyclic autocorrelation function (ACF) is proposed. It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems, which can be solved, as the number factorisation problem, by algebraic methods by using the theory of Galois fields and groups. (fourth seminar to the memory of d.n. klyshko)

  16. Improving mRNA 5' coding sequence determination in the mouse genome.

    PubMed

    Piovesan, Allison; Caracausi, Maria; Pelleri, Maria Chiara; Vitale, Lorenza; Martini, Silvia; Bassani, Chiara; Gurioli, Annalisa; Casadei, Raffaella; Soldà, Giulia; Strippoli, Pierluigi

    2014-04-01

    The incomplete determination of the mRNA 5' end sequence may lead to the incorrect assignment of the first AUG codon and to errors in the prediction of the encoded protein product. Due to the significance of the mouse as a model organism in biomedical research, we performed a systematic identification of coding regions at the 5' end of all known mouse mRNAs, using an automated expressed sequence tag (EST)-based approach which we have previously described. By parsing almost 4 million BLAT alignments we found 351 mouse loci, out of 20,221 analyzed, in which an extension of the mRNA 5' coding region was identified. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing for Apc2 and Mknk2 cDNAs. We also generated a list of 16,330 mouse mRNAs where the presence of an in-frame stop codon upstream of the known start codon indicates completeness of the coding sequence at 5' end in the current form. Systematic searches in the main mouse genome databases and genome browsers showed that 82% of our results are original and have not been identified by their annotation pipelines. Moreover, the same information is not easily derivable from RNA-Seq data, due to short sequence length and laboriousness in building full-length transcript structures. In conclusion, our results improve the determination of full-length 5' coding sequences and might be useful in order to reduce errors when studying mouse gene structure and function in biomedical research.
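
    The completeness criterion described above (an in-frame stop codon upstream of the annotated start codon) can be sketched as a simple scan. This is a simplified illustration, not the authors' EST-based pipeline; the function name, the three status labels and the 0-based start index are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def five_prime_status(mrna, start):
    """Classify the 5' end of an mRNA relative to its annotated start codon.

    mrna  : transcript sequence (DNA or RNA alphabet)
    start : 0-based index of the annotated start codon
    Returns "complete"   if the nearest in-frame upstream signal is a stop codon,
            "extendable" if an in-frame ATG lies upstream with no stop in between,
            "open"       if the frame stays open to the 5' end without either.
    """
    seq = mrna.upper().replace("U", "T")
    if seq[start:start + 3] != "ATG":
        raise ValueError("annotated start does not point at an ATG/AUG codon")
    upstream_atg = False
    # Walk upstream one codon at a time, staying in the start codon's frame.
    for i in range(start - 3, -1, -3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return "extendable" if upstream_atg else "complete"
        if codon == "ATG":
            upstream_atg = True
    return "extendable" if upstream_atg else "open"
```

    An "open" result is the case where additional 5' sequence, for example from overlapping ESTs, would be needed before the first AUG can be assigned with confidence.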

  17. Mixed hidden Markov quantile regression models for longitudinal data with possibly incomplete sequences.

    PubMed

    Marino, Maria Francesca; Tzavidis, Nikos; Alfò, Marco

    2016-01-01

    Quantile regression provides a detailed and robust picture of the distribution of a response variable, conditional on a set of observed covariates. Recently, it has been extended to the analysis of longitudinal continuous outcomes using either time-constant or time-varying random parameters. However, in real-life data, we frequently observe both temporal shocks in the overall trend and individual-specific heterogeneity in model parameters. A benchmark dataset on HIV progression gives a clear example. Here, the evolution of the CD4 log counts exhibits both sudden temporal changes in the overall trend and heterogeneity in the effect of the time since seroconversion on the response dynamics. To accommodate such situations, we propose a quantile regression model, where time-varying and time-constant random coefficients are jointly considered. Since observed data may be incomplete due to early drop-out, we also extend the proposed model in a pattern mixture perspective. We assess the performance of the proposals via a large-scale simulation study and the analysis of the CD4 count data.

  18. Coding of stimulus sequences by population responses in visual cortex

    PubMed Central

    Benucci, Andrea; Ringach, Dario L; Carandini, Matteo

    2009-01-01

    Neuronal populations in sensory cortex represent the time-changing sensory input through a spatiotemporal code. What are the rules that govern this code? We measured membrane potentials and spikes from neuronal populations in cat visual cortex (V1), through voltage-sensitive dyes and electrode arrays. We first characterized the population response to a single orientation. As response amplitude grew, population tuning width remained constant for membrane potential responses and became progressively sharper for spike responses. We then asked how these single-orientation responses combine to code for successive orientations. We found that they combine through simple linear summation. Linearity, however, is violated after stimulus offset, when responses exhibit an unexplained persistence. Thanks to linearity, the interactions between responses to successive stimuli are minimal. We demonstrate that higher cortical areas may reconstruct the stimulus sequence from V1 population responses through a simple instantaneous decoder. In area V1, therefore, spatial and temporal coding operate largely independently. PMID:19749748

  19. RNAcentral: a comprehensive database of non-coding RNA sequences

    PubMed Central

    2017-01-01

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data is provided for free and is available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/. PMID:27794554

  20. RNAcentral: A comprehensive database of non-coding RNA sequences

    DOE PAGES

    Williams, Kelly Porter; Lau, Britney Yan

    2016-10-28

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality.

  1. RNAcentral: A comprehensive database of non-coding RNA sequences

    SciTech Connect

    Williams, Kelly Porter; Lau, Britney Yan

    2016-10-28

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality.

  2. Transcriptome Sequencing Reveals the Character of Incomplete Dosage Compensation across Multiple Tissues in Flycatchers

    PubMed Central

    Uebbing, Severin; Künstner, Axel; Mäkinen, Hannu; Ellegren, Hans

    2013-01-01

    Sex chromosome divergence, which follows the cessation of recombination and degeneration of the sex-limited chromosome, can cause a reduction in expression level for sex-linked genes in the heterozygous sex, unless some mechanism of dosage compensation develops to counter the reduction in gene dose. Because large-scale perturbations in expression levels arising from changes in gene dose might have strong deleterious effects, the evolutionary response should be strong. However, in birds and in at least some other female heterogametic organisms, wholesale sex chromosome dosage compensation does not seem to occur. Using RNA-seq of multiple tissues and individuals, we investigated male and female expression levels of Z-linked and autosomal genes in the collared flycatcher, a bird for which a draft genome sequence has recently been reported. We found that male expression of Z-linked genes was on average 50% higher than female expression, although there was considerable variation in the male-to-female ratio among genes. The ratio for individual genes was well correlated among tissues and there was also a correlation in the extent of compensation between flycatcher and chicken orthologs. The relative excess of male expression was positively correlated with expression breadth, expression level, and number of interacting proteins (protein connectivity), and negatively correlated with variance in expression. These observations lead to a model of compensation occurring on a gene-by-gene basis, supported by an absence of clustering of genes on the Z chromosome with respect to the extent of compensation. Equal mean expression level of autosomal and Z-linked genes in males, and 50% higher expression of autosomal than Z-linked genes in females, is compatible with partial compensation being achieved by hypertranscription from females’ single Z chromosome. A comparison with male-to-female expression ratios in orthologous Z-linked genes of ostriches, where Z–W recombination

  3. Multifractal detrended cross-correlation analysis of coding and non-coding DNA sequences through chaos-game representation

    NASA Astrophysics Data System (ADS)

    Pal, Mayukha; Satish, B.; Srinivas, K.; Rao, P. Madhusudana; Manimaran, P.

    2015-10-01

    We propose a new approach combining the chaos game representation and the two dimensional multifractal detrended cross correlation analysis methods to examine multifractal behavior in power law cross correlation between any pair of nucleotide sequences of unequal lengths. In this work, we analyzed the characteristic behavior of coding and non-coding DNA sequences of eight prokaryotes. The results show the presence of strong multifractal nature between coding and non-coding sequences of all data sets. We found that this integrative approach helps us to consider complete DNA sequences for characterization, and further it may be useful for classification, clustering, identification of class affiliation of nucleotide sequences etc. with high precision.
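
    For readers unfamiliar with the chaos game representation (CGR) used above, the following sketch shows how a nucleotide sequence can be turned into points in the unit square and then into a two-dimensional count matrix, which is the kind of input a two-dimensional fluctuation analysis operates on. The corner assignment (A, C, G, T) and the function names are assumptions following a common convention; the multifractal cross-correlation step itself is not shown.

```python
# Assumed corner convention: A bottom-left, C top-left, G top-right, T bottom-right.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Return the CGR trajectory: each point is halfway to the current base's corner."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N (this breaks k-mer contiguity)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

def cgr_grid(seq, k):
    """Count CGR points in a 2^k x 2^k grid; each cell corresponds to one k-mer."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # The first k-1 points still depend on the starting position, so drop them.
    for x, y in cgr_points(seq)[k - 1:]:
        grid[int(y * size)][int(x * size)] += 1
    return grid
```

    Because the grid size depends only on k and not on sequence length, two sequences of unequal length map to matrices of the same shape, which is what makes a pairwise comparison of the resulting images straightforward.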

  4. Sequence and Structural Analyses for Functional Non-coding RNAs

    NASA Astrophysics Data System (ADS)

    Sakakibara, Yasubumi; Sato, Kengo

    Analysis and detection of functional RNAs are currently important topics in both molecular biology and bioinformatics research. Several computational methods based on stochastic context-free grammars (SCFGs) have been developed for modeling and analysing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNAs and are used for structural alignments of RNA sequences. Such stochastic models, however, are not sufficient to discriminate member sequences of an RNA family from non-members, and hence to detect non-coding RNA regions from genome sequences. Recently, the support vector machine (SVM) and kernel function techniques have been actively studied and proposed as a solution to various problems in bioinformatics. SVMs are trained from positive and negative samples and have strong, accurate discrimination abilities, and hence are more appropriate for the discrimination tasks. A few kernel functions that extend the string kernel to measure the similarity of two RNA sequences from the viewpoint of secondary structures have been proposed. In this article, we give an overview of recent progress in SCFG-based methods for RNA sequence analysis and novel kernel functions tailored to measure the similarity of two RNA sequences and developed for use with support vector machines (SVM) in discriminating members of an RNA family from non-members.

  5. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    PubMed

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  6. Differential direct coding: a compression algorithm for nucleotide sequence data

    PubMed Central

    Vey, Gregory

    2009-01-01

    While modern hardware can provide vast amounts of inexpensive storage for biological databases, the compression of nucleotide sequence data is still of paramount importance in order to facilitate fast search and retrieval operations through a reduction in disk traffic. This issue becomes even more important in light of the recent increase of very large data sets, such as metagenomes. In this article, I propose the Differential Direct Coding algorithm, a general-purpose nucleotide compression protocol that can differentiate between sequence data and auxiliary data by supporting the inclusion of supplementary symbols that are not members of the set of expected nucleotide bases, thereby offering reconciliation between sequence-specific and general-purpose compression strategies. This algorithm permits a sequence to contain a rich lexicon of auxiliary symbols that can represent wildcards, annotation data and special subsequences, such as functional domains or special repeats. In particular, the representation of special subsequences can be incorporated to provide structure-based coding that increases the overall degree of compression. Moreover, supporting a robust set of symbols removes the requirement of wildcard elimination and restoration phases, resulting in a complexity of O(n) for execution time, making this algorithm suitable for very large data sets. Because this algorithm compresses data on the basis of triplets, it is highly amenable to interpretation as a polypeptide at decompression time. Also, an encoded sequence may be further compressed using other existing algorithms, like gzip, thereby maximizing the final degree of compression. Overall, the Differential Direct Coding algorithm can offer a beneficial impact on disk traffic for database queries and other disk-intensive operations. PMID:20157486

  7. The Cipher Code of Simple Sequence Repeats in "Vampire Pathogens".

    PubMed

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-07-28

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, red blood cells tropism microbes, like "vampire pathogens" (VP), succeed in matching scarce nutrients and surviving strong immunity reactions. Here, we found VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non Vampire Pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser) and significantly higher counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 in VP than N_VP. Furthermore, significant differences (P < 0.001) on the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation.
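
    Counting (AG)n dimeric repeats of the kind discussed above can be sketched with a regular expression. The minimum repeat number, the function name and the choice to report only maximal non-overlapping runs are assumptions for illustration; the abstract does not state the exact counting rules used in the study.

```python
import re

def find_dimer_ssrs(seq, unit="AG", min_repeats=3):
    """Return (start, repeat_count) for each maximal run of `unit` with >= min_repeats copies."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_repeats))
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(seq.upper())]

if __name__ == "__main__":
    example = "TTAGAGAGTTAGAGCCAGAGAGAG"
    # The middle AGAG run has only two copies and is therefore not reported.
    print(find_dimer_ssrs(example))
```

    Running the same function with unit set to "GA" or "TC" would give counts for the other dimer classes named in the abstract, and dividing by genome or region length would give a density that can be compared between groups.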

  8. High-quality draft genome sequence of the Thermus amyloliquefaciens type strain YIM 77409T with an incomplete denitrification pathway

    SciTech Connect

    Zhou, En -Min; Murugapiran, Senthil K.; Mefferd, Chrisabelle C.; Liu, Lan; Xian, Wen -Dong; Yin, Yi -Rui; Ming, Hong; Yu, Tian -Tian; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T. B. K.; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Spunde, Alexander; Kyrpides, Nikos; Woyke, Tanja; Li, Wen -Jun; Hedlund, Brian P.

    2016-02-27

    Thermus amyloliquefaciens type strain YIM 77409T is a thermophilic, Gram-negative, non-motile and rod-shaped bacterium isolated from Niujie Hot Spring in Eryuan County, Yunnan Province, southwest China. In the present study we describe the features of strain YIM 77409T together with its genome sequence and annotation. The genome is 2,160,855 bp long and consists of 6 scaffolds with 67.4 % average GC content. A total of 2,313 genes were predicted, comprising 2,257 protein-coding and 56 RNA genes. The genome is predicted to encode a complete glycolysis, pentose phosphate pathway, and tricarboxylic acid cycle. Additionally, a large number of transporters and enzymes for heterotrophy highlight the broad heterotrophic lifestyle of this organism. Furthermore, a denitrification gene cluster included genes predicted to encode enzymes for the sequential reduction of nitrate to nitrous oxide, consistent with the incomplete denitrification phenotype of this strain.

  9. Coding Deficits in Noise-Induced Hidden Hearing Loss May Stem from Incomplete Repair of Ribbon Synapses in the Cochlea

    PubMed Central

    Shi, Lijuan; Chang, Yin; Li, Xiaowei; Aiken, Steven J.; Liu, Lijie; Wang, Jian

    2016-01-01

    Recent evidence has shown that noise-induced damage to the synapse between inner hair cells (IHCs) and type I afferent auditory nerve fibers (ANFs) may occur in the absence of permanent threshold shift (PTS), and that synapses connecting IHCs with low spontaneous rate (SR) ANFs are disproportionately affected. Due to the functional importance of low-SR ANF units for temporal processing and signal coding in noisy backgrounds, deficits in cochlear coding associated with noise-induced damage may result in significant difficulties with temporal processing and hearing in noise (i.e., “hidden hearing loss”). However, significant noise-induced coding deficits have not been reported at the single unit level following the loss of low-SR units. We have found evidence to suggest that some aspects of neural coding are not significantly changed with the initial loss of low-SR ANFs, and that further coding deficits arise in association with the subsequent reestablishment of the synapses. This suggests that synaptopathy in hidden hearing loss may be the result of insufficient repair of disrupted synapses, and not simply due to the loss of low-SR units. These coding deficits include decreases in driven spike rate for intensity coding as well as several aspects of temporal coding: spike latency, peak-to-sustained spike ratio and the recovery of spike rate as a function of click-interval. PMID:27252621

  10. Prevalence of transcription promoters within archaeal operons and coding sequences

    PubMed Central

    Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S

    2009-01-01

    Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ∼64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein–DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3′ ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes—events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements. PMID:19536208

  11. Genetic algorithms with permutation coding for multiple sequence alignment.

    PubMed

    Ben Othman, Mohamed Tahar; Abdel-Azim, Gamil

    2013-08-01

    Multiple sequence alignment (MSA) is one of the most intensively researched topics in bioinformatics. It is known to be an NP-complete problem and is considered one of the most important and daunting tasks in computational biology. Accordingly, a large number of heuristic algorithms have been proposed to find an optimal alignment. Among these heuristic algorithms are genetic algorithms (GA). The GA has two major weaknesses: it is time consuming and can become trapped in local minima. One of the significant aspects of the GA process in MSA is to maximize the similarities between sequences by adding and shuffling the gaps of the Solution Coding (SC). Several ways of SC have been introduced. One of them is the Permutation Coding (PC). We propose a hybrid algorithm based on genetic algorithms (GAs) with a PC and the 2-opt algorithm. The PC helps to code the MSA solution, which maximizes the gain of resources, reliability and diversity of the GA. The use of the PC opens the area by applying all functions over permutations for MSA. Thus, we suggest an algorithm to calculate the scoring function for multiple alignments based on PC, which is used as the fitness function. The time complexity of the GA is reduced by using this algorithm. Our GA is implemented with different selection strategies and different crossovers. The probability of crossover and mutation is set as one strategy. Relevant patents have been probed in the topic.

  12. Ribosomal S27a coding sequences upstream of ubiquitin coding sequences in the genome of a pestivirus.

    PubMed

    Becher, P; Orlich, M; Thiel, H J

    1998-11-01

    Molecular characterization of cytopathogenic (cp) bovine viral diarrhea virus (BVDV) strain CP Rit, a temperature-sensitive strain widely used for vaccination, revealed that the viral genomic RNA is about 15.2 kb long, which is about 2.9 kb longer than the one of noncytopathogenic (noncp) BVDV strains. Molecular cloning and nucleotide sequencing of parts of the genome resulted in the identification of a duplication of the genomic region encoding nonstructural proteins NS3, NS4A, and part of NS4B. In addition, a nonviral sequence was found directly upstream of the second copy of the NS3 gene. The 3' part of this inserted sequence encodes an N-terminally truncated ubiquitin monomer. This is remarkable since all described cp BVDV strains with ubiquitin coding sequences contain at least one complete ubiquitin monomer. The 5' region of the nonviral sequence did not show any homology to cellular sequences identified thus far in cp BVDV strains. Databank searches revealed that this second cellular insertion encodes part of ribosomal protein S27a. Further analyses included molecular cloning and nucleotide sequencing of the cellular recombination partner. Sequence comparisons strongly suggest that the S27a and the ubiquitin coding sequences found in the genome of CP Rit were both derived from a bovine mRNA encoding a hybrid protein with the structure NH2-ubiquitin-S27a-COOH. Polyprotein processing in the genomic region encoding the N-terminal part of NS4B, the two cellular insertions, and NS3 was studied by a transient-expression assay. The respective analyses showed that the S27a-derived polypeptide, together with the truncated ubiquitin, served as processing signal to yield NS3, whereas the truncated ubiquitin alone was not capable of mediating the cleavage. 
    Since the expression of NS3 is strictly correlated with the cp phenotype of BVDV, the altered genome organization leading to expression of NS3 most probably represents the genetic basis of cytopathogenicity of CP Rit.

  13. Code-Time Diversity for Direct Sequence Spread Spectrum Systems

    PubMed Central

    Hassan, A. Y.

    2014-01-01

    Time diversity is achieved in direct sequence spread spectrum by receiving different faded delayed copies of the transmitted symbols from different uncorrelated channel paths when the transmission signal bandwidth is greater than the coherence bandwidth of the channel. In this paper, a new time diversity scheme, called code-time diversity, is proposed for spread spectrum systems. In this scheme, N spreading codes are used to transmit one data symbol over N successive symbol intervals. The diversity order of the proposed scheme equals the number of spreading codes used, N, multiplied by the number of uncorrelated channel paths, L. The paper presents the transmitted signal model. Two demodulator structures are proposed based on the received signal models for Rayleigh flat and frequency-selective fading channels. The probability of error of the proposed diversity scheme is also calculated for the same two fading channels. Finally, simulation results are presented and compared with those of maximal ratio combiner (MRC) and multiple-input and multiple-output (MIMO) systems. PMID:24982925

  14. Current status and new features of the Consensus Coding Sequence database

    PubMed Central

    Farrell, Catherine M.; O’Leary, Nuala A.; Harte, Rachel A.; Loveland, Jane E.; Wilming, Laurens G.; Wallin, Craig; Diekhans, Mark; Barrell, Daniel; Searle, Stephen M. J.; Aken, Bronwen; Hiatt, Susan M.; Frankish, Adam; Suner, Marie-Marthe; Rajput, Bhanu; Steward, Charles A.; Brown, Garth R.; Bennett, Ruth; Murphy, Michael; Wu, Wendy; Kay, Mike P.; Hart, Jennifer; Rajan, Jeena; Weber, Janet; Snow, Catherine; Riddick, Lillian D.; Hunt, Toby; Webb, David; Thomas, Mark; Tamez, Pamela; Rangwala, Sanjida H.; McGarvey, Kelly M.; Pujar, Shashikant; Shkeda, Andrei; Mudge, Jonathan M.; Gonzalez, Jose M.; Gilbert, James G. R.; Trevanion, Stephen J.; Baertsch, Robert; Harrow, Jennifer L.; Hubbard, Tim; Ostell, James M.; Haussler, David; Pruitt, Kim D.

    2014-01-01

    The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets. PMID:24217909

  15. Current status and new features of the Consensus Coding Sequence database.

    PubMed

    Farrell, Catherine M; O'Leary, Nuala A; Harte, Rachel A; Loveland, Jane E; Wilming, Laurens G; Wallin, Craig; Diekhans, Mark; Barrell, Daniel; Searle, Stephen M J; Aken, Bronwen; Hiatt, Susan M; Frankish, Adam; Suner, Marie-Marthe; Rajput, Bhanu; Steward, Charles A; Brown, Garth R; Bennett, Ruth; Murphy, Michael; Wu, Wendy; Kay, Mike P; Hart, Jennifer; Rajan, Jeena; Weber, Janet; Snow, Catherine; Riddick, Lillian D; Hunt, Toby; Webb, David; Thomas, Mark; Tamez, Pamela; Rangwala, Sanjida H; McGarvey, Kelly M; Pujar, Shashikant; Shkeda, Andrei; Mudge, Jonathan M; Gonzalez, Jose M; Gilbert, James G R; Trevanion, Stephen J; Baertsch, Robert; Harrow, Jennifer L; Hubbard, Tim; Ostell, James M; Haussler, David; Pruitt, Kim D

    2014-01-01

    The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.

  16. Optimal coding of vectorcardiographic sequences using spatial prediction.

    PubMed

    Augustyniak, Piotr

    2007-05-01

    This paper discusses principles, implementation details, and advantages of a sequence coding algorithm applied to the compression of vectorcardiograms (VCG). The main novelty of the proposed method is the automatic management of distortion distribution controlled by the local signal contents in both technical and medical aspects. As in clinical practice, the VCG loops representing P, QRS, and T waves in three-dimensional (3-D) space are considered here as three simultaneous sequences of objects. Because of the similarity of neighboring loops, encoding the values of the prediction error significantly reduces the data set volume. The residual values are de-correlated with the discrete cosine transform (DCT) and truncated at a certain energy threshold. The presented method is based on the irregular temporal distribution of medical data in the signal and takes advantage of variable sampling frequency for automatically detected VCG loops. The features of the proposed algorithm are confirmed by the results of a numerical experiment carried out for a wide range of real records. The average data reduction ratio reaches a value of 8.15 while the percent root-mean-square difference (PRD) distortion ratio for the most important sections of the signal does not exceed 1.1%.
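
    The two figures of merit quoted above can be illustrated with a short sketch. It assumes the common non-mean-removed definition of PRD, 100 * sqrt(sum((x - y)^2) / sum(x^2)); the abstract does not say which PRD variant was used, and the function names are hypothetical.

```python
import math


def prd(original, reconstructed):
    """Percent root-mean-square difference (assumed non-mean-removed variant)."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    signal_energy = sum(x ** 2 for x in original)
    if signal_energy == 0:
        raise ValueError("original signal has zero energy")
    return 100.0 * math.sqrt(error_energy / signal_energy)


def data_reduction_ratio(original_bits, compressed_bits):
    """Ratio of original to compressed size, e.g. 8.15 means 8.15:1."""
    return original_bits / compressed_bits


print(prd([3.0, 4.0], [3.0, 3.0]))
print(data_reduction_ratio(8150, 1000))
```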

  17. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    NASA Astrophysics Data System (ADS)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  18. Image sequence coding using 3D scene models

    NASA Astrophysics Data System (ADS)

    Girod, Bernd

    1994-09-01

    The implicit and explicit use of 3D models for image sequence coding is discussed. For implicit use, a 3D model can be incorporated into motion compensating prediction. A scheme that estimates the displacement vector field with a rigid body motion constraint by recovering epipolar lines from an unconstrained displacement estimate and then repeating block matching along the epipolar line is proposed. Experimental results show that an improved displacement vector field can be obtained with a rigid body motion constraint. As an example for explicit use, various results with a facial animation model for videotelephony are discussed. A 13 × 16 B-spline mask can be adapted automatically to individual faces and is used to generate facial expressions based on FACS. A depth-from-defocus range camera suitable for real-time facial motion tracking is described. Finally, the real-time facial animation system "Traugott" is presented that has been used to generate several hours of broadcast video. Experiments suggest that a videophone system based on facial animation might require a transmission bitrate of 1 kbit/s or below.

  19. In search of coding and non-coding regions of DNA sequences based on balanced estimation of diffusion entropy.

    PubMed

    Zhang, Jin; Zhang, Wenqing; Yang, Huijie

    2016-01-01

    Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but these are limited by species-dependence and the need for adequate training sets. The elements in DNA coding regions are known to be distributed in a quasi-random way, while those in non-coding regions have typical similar structures. For short sequences, these statistical characteristics cannot be extracted correctly and cannot even be detected. This paper introduces a new way to solve the problem: balanced estimation of diffusion entropy (BEDE).

  20. Coding and 3' non-coding nucleotide sequence of chalcone synthase mRNA and assignment of amino acid sequence of the enzyme

    PubMed Central

    Reimold, Ursula; Kröger, Manfred; Kreuzaler, Fritz; Hahlbrock, Klaus

    1983-01-01

    The nucleotide sequence of an almost complete cDNA copy of chalcone synthase mRNA from cultured parsley cells (Petroselinum hortense) has been determined. The cDNA copy comprised the complete coding sequence for chalcone synthase, a short A-rich stretch of the 5' non-coding region and the complete 3' non-coding region including a poly(A) tail. The amino acid sequence deduced from the nucleotide sequence of the cDNA is consistent with a partial N-terminal sequence analysis, the total amino acid composition, the cyanogen bromide cleavage pattern, and the apparent mol. wt. of the subunit of the purified enzyme. PMID:16453477

  1. Non-extensive trends in the size distribution of coding and non-coding DNA sequences in the human genome

    NASA Astrophysics Data System (ADS)

    Oikonomou, Th.; Provata, A.

    2006-03-01

    We study the primary DNA structure of four of the most completely sequenced human chromosomes (including chromosome 19, which is the most dense in coding), using non-extensive statistics. We show that the exponents governing the spatial decay of the coding size distributions vary between 5.2 ≤ r ≤ 5.7 for the short scales and 1.45 ≤ q ≤ 1.50 for the large scales. In contrast, the exponents governing the spatial decay of the non-coding size distributions in these four chromosomes take the values 2.4 ≤ r ≤ 3.2 for the short scales and 1.50 ≤ q ≤ 1.72 for the large scales. These results, in particular the values of the tail exponent q, indicate the existence of correlations in the coding and non-coding size distributions, with a tendency for higher correlations in the non-coding DNA.

  2. Diversity in isochore structure among cold-blooded vertebrates based on GC content of coding and non-coding sequences.

    PubMed

    Fortes, Gloria G; Bouza, Carmen; Martínez, Paulino; Sánchez, Laura

    2007-03-01

    To review the general view of the different compositional structure of warm- and cold-blooded vertebrate genomes, we made use of the increasing number of genetic sequences, including coding (exon) and non-coding (intron) regions, that have been deposited in the databases in recent years. The nucleotide distributions of the third codon positions (GC3) were analyzed in 1510 coding sequences (CDS) of fish, 1414 CDS of amphibians and 320 CDS of reptiles. Also, the relationship between the GC content of 74, 56 and 25 CDS of fish, amphibians and reptiles, respectively, and that of their corresponding introns (GCI) was considered. In accordance with recent data, sequence analysis showed the presence of very GC3-rich CDS in these poikilotherm vertebrates. However, very high diversity in compositional patterns among different orders of fish, amphibians and reptiles was found. Significant positive correlations between GC3 and GCI were also confirmed for the genes analyzed. Nevertheless, introns turned out to be poorer in GC than their corresponding CDS, this difference being larger than in the human genome. Because of the limited number of available sequences including exons and introns, we must be cautious about the results derived from them. However, the indication of higher GC richness of coding sequences than of their corresponding introns could help to explain the discrepancy between sequence analysis and the ultracentrifugation studies in cold-blooded vertebrates, which did not predict the existence of GC-rich isochores.
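
    The GC3 and GCI quantities used in this abstract are simple base-composition fractions. The sketch below shows one way to compute them; the function names are hypothetical, and the input is assumed to be an in-frame coding sequence without ambiguity codes.

```python
def gc_content(seq):
    """Fraction of G or C bases in a nucleotide sequence (e.g. an intron)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of three")
    third_positions = cds[2::3]  # bases at indices 2, 5, 8, ...
    return gc_content(third_positions)


# Codons ATG, GCC, AAA have third bases G, C, A.
print(gc3("ATGGCCAAA"))
print(gc_content("ATGC"))
```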

  3. A convolutional code-based sequence analysis model and its application.

    PubMed

    Liu, Xiao; Geng, Xiaoli

    2013-04-16

    A new approach for encoding DNA sequences as input for DNA sequence analysis is proposed using the error correction coding theory of communication engineering. The encoder was designed as a convolutional code model whose generator matrix is designed based on the degeneracy of codons, with a codon treated in the model as an informational unit. The utility of the proposed model was demonstrated through the analysis of twelve prokaryote and nine eukaryote DNA sequences having different GC contents. Distinct differences in code distances were observed near the initiation and termination sites in the open reading frame, which provided a well-regulated characterization of the DNA sequences. Clearly distinguished period-3 features appeared in the coding regions, and the characteristic average code distances of the analyzed sequences were approximately proportional to their GC contents, particularly in the selected prokaryotic organisms, presenting the potential utility as an added taxonomic characteristic for use in studying the relationships of living organisms.

  4. MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons.

    PubMed

    Ranwez, Vincent; Harispe, Sébastien; Delsuc, Frédéric; Douzery, Emmanuel J P

    2011-01-01

    Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translation before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source Java executable with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.
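
    For orientation, here is a minimal sketch of the classical Needleman-Wunsch recurrence that the abstract uses as its complexity baseline. It is not the MACSE algorithm: it has no notion of codons, frameshifts or stop codons. It also computes only the optimal global score, keeping two rows in memory rather than the full traceback matrix, and the scoring values are assumptions.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ATG", "AG"))
```

    The running time is proportional to len(a) * len(b), which is the cost the abstract says the frameshift-aware method retains.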

  5. OrfPredictor: predicting protein-coding regions in EST-derived sequences

    PubMed Central

    Min, Xiang Jia; Butler, Gregory; Storms, Reginald; Tsang, Adrian

    2005-01-01

    OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in BLASTX alignments, otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The output is the predicted peptide sequences in the FASTA format, and a definition line that includes the query ID, the translation reading frame and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly, for large-scale EST projects. OrfPredictor is available at . PMID:15980561

  6. Polymorphism, shared functions and convergent evolution of genes with sequences coding for polyalanine domains.

    PubMed

    Lavoie, Hugo; Debeane, Francois; Trinh, Quoc-Dien; Turcotte, Jean-Francois; Corbeil-Girard, Louis-Philippe; Dicaire, Marie-Josée; Saint-Denis, Anik; Pagé, Martin; Rouleau, Guy A; Brais, Bernard

    2003-11-15

    Mutations causing expansions of polyalanine domains are responsible for nine hereditary diseases. Other GC-rich sequences coding for some polyalanine domains were found to be polymorphic in human. These observations prompted us to identify all sequences in the human genome coding for polyalanine stretches longer than four alanines and establish their degree of polymorphism. We identified 494 annotated human proteins containing 604 polyalanine domains. Thirty-two percent (31/98) of tested sequences coding for more than seven alanines were polymorphic. The length of the polyalanine-coding sequence and its GCG or GCC repeat content are the major predictors of polymorphism. GCG codons are over-represented in human polyalanine coding sequences. Our data suggest that GCG and GCC codons play a key role in polyalanine-coding sequence appearance and polymorphism. The grouping by shared function of polyalanine-containing proteins in Homo sapiens, Drosophila melanogaster and Caenorhabditis elegans shows that the majority are involved in transcriptional regulation. Phylogenetic analyses of HOX, GATA and EVX protein families demonstrate that polyalanine domains arose independently in different members of these families, suggesting that convergent molecular evolution may have played a role. Finally polyalanine domains in vertebrates are conserved between mammals and are rarer and shorter in Gallus gallus and Danio rerio. Together our results show that the polymorphic nature of sequences coding for polyalanine domains makes them prime candidates for mutations in hereditary diseases and suggests that they have appeared in many different protein families through convergent evolution.
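
    The screening criterion described above (polyalanine stretches longer than four alanines) can be sketched as a regular-expression scan over a protein sequence in one-letter code. The function name and the example peptide are hypothetical; the study's actual pipeline is not described in the abstract.

```python
import re


def polyalanine_tracts(protein, min_len=5):
    """Return (start_index, length) for each run of at least min_len alanines."""
    pattern = re.compile(r"A{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(protein.upper())]


# Hypothetical peptide with runs of 5, 4 and 7 alanines; only 5 and 7 qualify.
peptide = "M" + "A" * 5 + "GS" + "A" * 4 + "K" + "A" * 7 + "L"
print(polyalanine_tracts(peptide))
```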

  7. Nucleotide sequence from the coding region of rabbit β-globin messenger RNA

    PubMed Central

    Proudfoot, N.J.

    1976-01-01

    A sequence of 89 nucleotides from rabbit β-globin mRNA has been determined and is shown to code for residues 107 to 137 of the β-globin protein. In addition, a sequence heterogeneity has been identified within this 89 nucleotide long sequence which corresponds to a known polymorphic variant of rabbit β-globin. PMID:61580

  8. RNA sequencing of transcriptomes in human brain regions: protein-coding and non-coding RNAs, isoforms and alleles.

    PubMed

    Webb, Amy; Papp, Audrey C; Curtis, Amanda; Newman, Leslie C; Pietrzak, Maciej; Seweryn, Michal; Handelman, Samuel K; Rempala, Grzegorz A; Wang, Daqing; Graziosa, Erica; Tyndale, Rachel F; Lerman, Caryn; Kelsoe, John R; Mash, Deborah C; Sadee, Wolfgang

    2015-11-23

    We used RNA sequencing to analyze transcript profiles of ten autopsy brain regions from ten subjects. RNA sequencing techniques were designed to detect both coding and non-coding RNA, splice isoform composition, and allelic expression. Brain regions were selected from five subjects with a documented history of smoking and five non-smokers. Paired-end RNA sequencing was performed on SOLiD instruments to a depth of >40 million reads, using linearly amplified, ribosomally depleted RNA. Sequencing libraries were prepared with both poly-dT and random hexamer primers to detect all RNA classes, including long non-coding (lncRNA), intronic and intergenic transcripts, and transcripts lacking poly-A tails, providing additional data not previously available. The study was designed to generate a database of the complete transcriptomes in brain region for gene network analyses and discovery of regulatory variants. Of 20,318 protein coding and 18,080 lncRNA genes annotated from GENCODE and lncipedia, 12 thousand protein coding and 2 thousand lncRNA transcripts were detectable at a conservative threshold. Of the aligned reads, 52 % were exonic, 34 % intronic and 14 % intergenic. A majority of protein coding genes (65 %) was expressed in all regions, whereas ncRNAs displayed a more restricted distribution. Profiles of RNA isoforms varied across brain regions and subjects at multiple gene loci, with neurexin 3 (NRXN3) a prominent example. Allelic RNA ratios deviating from unity were identified in > 400 genes, detectable in both protein-coding and non-coding genes, indicating the presence of cis-acting regulatory variants. Mathematical modeling was used to identify RNAs stably expressed in all brain regions (serving as potential markers for normalizing expression levels), linked to basic cellular functions. An initial analysis of differential expression analysis between smokers and nonsmokers implicated a number of genes, several previously associated with nicotine exposure. RNA

  9. Sequences encoding identical peptides for the analysis and manipulation of coding DNA

    PubMed Central

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for e.g. heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible, which indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression. PMID:23861567
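
    As a rough illustration of the two statistics compared in this abstract, the sketch below tallies codon usage and the dinucleotides that span codon boundaries for two sequences encoding the same dipeptide. Reading "intercodon dinucleotide" as the third base of one codon plus the first base of the next is an assumption, as are the function names and the toy sequences.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def intercodon_dinucleotides(cds):
    """Count dinucleotides formed by base 3 of a codon and base 1 of the next."""
    cds = cds.upper()
    return Counter(cds[i + 2:i + 4] for i in range(0, len(cds) - 3, 3))


# Two toy sequences that both encode Ala-Lys (GCT/GCC = Ala, AAA/AAG = Lys).
for seq in ("GCTAAA", "GCCAAG"):
    print(seq, dict(codon_usage(seq)), dict(intercodon_dinucleotides(seq)))
```

    Because the peptide is identical, any difference in these counts reflects the DNA itself rather than the encoded protein, which is the property the abstract sets out to exploit.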

  10. Sequences encoding identical peptides for the analysis and manipulation of coding DNA.

    PubMed

    Sánchez, Joaquín

    2013-01-01

    The use of sequences encoding identical peptides (SEIP) for the in silico analysis of coding DNA from different species has not been reported; the study of such sequences could directly reveal properties of coding DNA that are independent of peptide sequences. For practical purposes SEIP might also be manipulated for e.g. heterologous protein expression. We extracted 1,551 SEIP from human and E. coli and 2,631 SEIP from human and D. melanogaster. We then analyzed codon usage and intercodon dinucleotide tendencies and found differences in both, with more conspicuous disparities between human and E. coli than between human and D. melanogaster. We also briefly manipulated SEIP to find out if they could be used to create new coding sequences. We hence attempted replacement of human by E. coli codons via dicodon exchange but found that full replacement was not possible, which indicated robust species-specific dicodon tendencies. To test another form of codon replacement we isolated SEIP from human and the jellyfish green fluorescent protein (GFP) and we then re-constructed the GFP coding DNA with human tetra-peptide-coding sequences. Results provide proof-of-principle that SEIP may be used to reveal differences in the properties of coding DNA and to reconstruct in pieces a protein coding DNA with sequences from a different organism; the latter might be exploited in heterologous protein expression.

  11. Revisiting the Physico-Chemical Hypothesis of Code Origin: An Analysis Based on Code-Sequence Coevolution in a Finite Population

    NASA Astrophysics Data System (ADS)

    Bandhu, Ashutosh Vishwa; Aggarwal, Neha; Sengupta, Supratim

    2013-12-01

    The origin of the genetic code marked a major transition from a plausible RNA world to the world of DNA and proteins and is an important milestone in our understanding of the origin of life. We examine the efficacy of the physico-chemical hypothesis of code origin by carrying out simulations of code-sequence coevolution in finite populations in stages, leading first to the emergence of ten amino acid code(s) and subsequently to 14 amino acid code(s). We explore two different scenarios of primordial code evolution. In one scenario, competition occurs between populations of equilibrated code-sequence sets, while in the other scenario, new codes compete with existing codes as they are gradually introduced into the population with a finite probability. In either case, we find that natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. The code whose structure is most consistent with the standard genetic code is often not among the codes that have a high fixation probability. However, we find that the composition of the code population affects the code fixation probability. A physico-chemically optimized code gets fixed with a significantly higher probability if it competes against a set of randomly generated codes. Our results suggest that physico-chemical optimization may not be the sole driving force in ensuring the emergence of the standard genetic code.

  12. Indoor Mobile Positioning Based on Lidar Data and Coded Sequence Pattern

    NASA Astrophysics Data System (ADS)

    Wang, Z.; Dong, B.; Chen, D.

    2016-10-01

    This paper proposes a coded sequence pattern for automatic matching of LiDAR point data. The methods include SIFT features, Otsu segmentation and fast Hough transformation for the identification, positioning and interpretation of the coded sequence patterns, and the POSIT model for fast computation of the translation and rotation parameters of the LiDAR point data, so as to achieve fast matching of LiDAR point data and automatic 3D mapping of indoor shafts and tunnels.

  13. Coherent direct sequence optical code multiple access encoding-decoding efficiency versus wavelength detuning.

    PubMed

    Pastor, D; Amaya, W; García-Olcina, R; Sales, S

    2007-07-01

    We present a simple theoretical model of and the experimental verification for vanishing of the autocorrelation peak due to wavelength detuning on the coding-decoding process of coherent direct sequence optical code multiple access systems based on a superstructured fiber Bragg grating. Moreover, the detuning vanishing effect has been explored to take advantage of this effect and to provide an additional degree of multiplexing and/or optical code tuning.

  14. The Coding and Effector Transfer of Movement Sequences

    ERIC Educational Resources Information Center

    Kovacs, Attila J.; Muhlbauer, Thomas; Shea, Charles H.

    2009-01-01

    Three experiments utilizing a 14-element arm movement sequence were designed to determine if reinstating the visual-spatial coordinates, which require movements to the same spatial locations utilized during acquisition, results in better effector transfer than reinstating the motor coordinates, which require the same pattern of homologous muscle…

  15. RNA-DNA sequence differences spell genetic code ambiguities

    PubMed Central

    Nielsen, Michael L.

    2011-01-01

    A recent paper in Science by Li et al. (2011) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes, termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression, but the study has been criticized. PMID:22567189

  16. Nanopore Sequencing: Electrical Measurements of the Code of Life

    PubMed Central

    Timp, Winston; Mirsaidov, Utkur M.; Wang, Deqiang; Comer, Jeff; Aksimentiev, Aleksei; Timp, Gregory

    2011-01-01

    Sequencing a single molecule of deoxyribonucleic acid (DNA) using a nanopore is a revolutionary concept because it combines the potential for long read lengths (>5 kbp) with high speed (1 bp/10 ns), while obviating the need for costly amplification procedures due to the exquisite single molecule sensitivity. The prospects for implementing this concept seem bright. The cost savings from the removal of required reagents, coupled with the speed of nanopore sequencing places the $1000 genome within grasp. However, challenges remain: high fidelity reads demand stringent control over both the molecular configuration in the pore and the translocation kinetics. The molecular configuration determines how the ions passing through the pore come into contact with the nucleotides, while the translocation kinetics affect the time interval in which the same nucleotides are held in the constriction as the data is acquired. Proteins like α-hemolysin and its mutants offer exquisitely precise self-assembled nanopores and have demonstrated the facility for discriminating individual nucleotides, but it is currently difficult to design protein structure ab initio, which frustrates tailoring a pore for sequencing genomic DNA. Nanopores in solid-state membranes have been proposed as an alternative because of the flexibility in fabrication and ease of integration into a sequencing platform. Preliminary results have shown that with careful control of the dimensions of the pore and the shape of the electric field, control of DNA translocation through the pore is possible. Furthermore, discrimination between different base pairs of DNA may be feasible. Thus, a nanopore promises inexpensive, reliable, high-throughput sequencing, which could thrust genomic science into personal medicine. PMID:21572978

  17. Three ingredients for improved global aftershock forecasts: Tectonic region, time-dependent catalog incompleteness, and inter-sequence variability

    USGS Publications Warehouse

    Page, Morgan T.; Van Der Elst, Nicholas; Hardebeck, Jeanne L.; Felzer, Karen; Michael, Andrew J.

    2016-01-01

    Following a large earthquake, seismic hazard can be orders of magnitude higher than the long‐term average as a result of aftershock triggering. Because of this heightened hazard, emergency managers and the public demand rapid, authoritative, and reliable aftershock forecasts. In the past, U.S. Geological Survey (USGS) aftershock forecasts following large global earthquakes have been released on an ad hoc basis with inconsistent methods, and in some cases aftershock parameters adapted from California. To remedy this, the USGS is currently developing an automated aftershock product based on the Reasenberg and Jones (1989) method that will generate more accurate forecasts. To better capture spatial variations in aftershock productivity and decay, we estimate regional aftershock parameters for sequences within the García et al. (2012) tectonic regions. We find that regional variations for mean aftershock productivity reach almost a factor of 10. We also develop a method to account for the time‐dependent magnitude of completeness following large events in the catalog. In addition to estimating average sequence parameters within regions, we develop an inverse method to estimate the intersequence parameter variability. This allows for a more complete quantification of the forecast uncertainties and Bayesian updating of the forecast as sequence‐specific information becomes available.
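
    The Reasenberg and Jones (1989) method named in this abstract models the rate of aftershocks of magnitude M or larger at time t after a mainshock of magnitude Mm as 10^(a + b(Mm - M)) (t + c)^(-p). The sketch below integrates that rate over a forecast window. The function names are hypothetical, and the parameter values in the usage note are the generic California values commonly quoted for this model, used here only as illustrative assumptions; they are not the regional parameters estimated in the study.

```python
import math


def aftershock_rate(t_days, mag_min, mainshock_mag, a, b, c, p):
    """Rate (events/day) of aftershocks with magnitude >= mag_min at time t_days."""
    return 10 ** (a + b * (mainshock_mag - mag_min)) * (t_days + c) ** (-p)


def expected_count(t_start, t_end, mag_min, mainshock_mag, a, b, c, p):
    """Closed-form integral of aftershock_rate between t_start and t_end (days)."""
    productivity = 10 ** (a + b * (mainshock_mag - mag_min))
    if abs(p - 1.0) < 1e-12:
        # Special case p == 1: the time integral is logarithmic.
        time_integral = math.log((t_end + c) / (t_start + c))
    else:
        time_integral = ((t_end + c) ** (1 - p) - (t_start + c) ** (1 - p)) / (1 - p)
    return productivity * time_integral


def probability_one_or_more(count):
    """Poisson probability of at least one event given the expected count."""
    return 1.0 - math.exp(-count)
```

    For example, `expected_count(0.0, 7.0, 5.0, 7.0, a=-1.67, b=0.91, c=0.05, p=1.08)` gives the expected number of M >= 5 aftershocks in the first week after an M 7 mainshock under those assumed parameters, and `probability_one_or_more` converts that count into a forecast probability.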

  18. Detection of multiple, novel reverse transcriptase coding sequences in human nucleic acids: relation to primate retroviruses

    SciTech Connect

    Shih, A.; Misra, R.; Rush, M.G.

    1989-01-01

    A variety of chemically synthesized oligonucleotides designed on the basis of amino acid and/or nucleotide sequence data were used to detect a large number of novel reverse transcriptase coding sequences in human and mouse DNAs. Procedures involving Southern blotting, library screening, and the polymerase chain reaction were all used to detect such sequences; the polymerase chain reaction was the most rapid and productive approach. In the polymerase chain reaction, oligonucleotide mixtures based on consensus sequence homologies to reverse transcriptase coding sequences and unique oligonucleotides containing perfect homology to the coding sequences of human T-cell leukemia virus types I and II were both effective in amplifying reverse transcriptase-related DNA. It is shown that human DNA contains a wide spectrum of retrovirus-related reverse transcriptase coding sequences, including some that are clearly related to human T-cell leukemia virus types I and II, some that are related to the L-1 family of long interspersed nucleotide sequences, and others that are related to previously described human endogenous proviral DNAs. In addition, human T-cell leukemia virus type I-related sequences appear to be transcribed in both normal human T cells and in a cell line derived from a human teratocarcinoma.

  19. Correcting sequencing errors in DNA coding regions using a dynamic programming approach.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1995-04-01

    This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing errors addressed are insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. This would permit improved sequencing efficiency and reduce genome sequencing costs. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. We have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant and also intend to use this as a testbed for further development of sequencing error-correction technology. Preliminary test results have shown the usefulness of this algorithm and also exhibited some of its weakness, providing possible directions for further improvement. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false message on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using standard GRAIL II method (version 1.2).(ABSTRACT TRUNCATED AT 250 WORDS)

  20. The primordial sequence, ribosomes, and the genetic code.

    NASA Technical Reports Server (NTRS)

    Fox, S. W.; Yuki, A.; Waehneldt, T. V.; Lacey, J. C., Jr.

    1971-01-01

    Experimental investigation of the key question of the origin of life concerning the chronological order in the primordial sequence of nucleic acid, protein, and cell. It is pointed out that, when viewed against the background of experiments on the selective reaction of basic homopolyamino acids with mononucleotides (Lacey and Pruitt, 1969; Woese, 1968), the experiments made help to establish a basis for understanding how information originally flowed from proteins to nucleic acids.

  2. Correcting sequencing errors in DNA coding regions using a dynamic programming approach

    SciTech Connect

    Xu, Y.; Mural, R.J.; Uberbacher, E.C.

    1994-12-01

    This paper presents an algorithm for detecting and 'correcting' sequencing errors that occur in DNA coding regions. The types of sequencing error addressed include insertions and deletions (indels) of DNA bases. The goal is to provide a capability which makes single-pass or low-redundancy sequence data more informative, reducing the need for high-redundancy sequencing for gene identification and characterization purposes. The algorithm detects sequencing errors by discovering changes in the statistically preferred reading frame within a putative coding region and then inserts a number of 'neutral' bases at a perceived reading frame transition point to make the putative exon candidate frame consistent. The authors have implemented the algorithm as a front-end subsystem of the GRAIL DNA sequence analysis system to construct a version which is very error tolerant and also intend to use this as a testbed for further development of sequencing error-correction technology. On a test set consisting of 68 human DNA sequences with 1% randomly generated indels in coding regions, the algorithm detected and corrected 76% of the indels. The average distance between the position of an indel and the predicted one was 9.4 bases. With this subsystem in place, GRAIL correctly predicted 89% of the coding messages with 10% false message on the 'corrected' sequences, compared to 69% correctly predicted coding messages and 11% falsely predicted messages on the 'corrupted' sequences using standard GRAIL II method. The method uses a dynamic programming algorithm, and runs in time and space linear in the size of the input sequence.
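
    The idea of locating a change in the preferred reading frame with dynamic programming can be illustrated with a small sketch. This is not the GRAIL implementation: the per-position frame scores, the fixed switch penalty, and the function names are all assumptions made for illustration. The sketch only finds candidate transition points, in time linear in the number of positions; it does not insert the 'neutral' bases.

```python
def best_frame_path(scores, switch_penalty):
    """Assign a reading frame (0, 1, 2) to every position.

    scores[i][f] is a hypothetical coding score for position i in frame f.
    The path maximizes the summed score minus switch_penalty per frame change.
    """
    if not scores:
        return []
    best = list(scores[0])  # best[f]: best total score of a path ending in frame f
    back = []               # back[i-1][f]: frame used at position i-1 on that path
    for i in range(1, len(scores)):
        top = max(range(3), key=lambda f: best[f])
        new_best, pointers = [], []
        for f in range(3):
            stay = best[f]
            switch = best[top] - switch_penalty
            if stay >= switch:
                new_best.append(stay + scores[i][f])
                pointers.append(f)
            else:
                new_best.append(switch + scores[i][f])
                pointers.append(top)
        back.append(pointers)
        best = new_best
    # Trace back from the best final frame.
    frame = max(range(3), key=lambda f: best[f])
    path = [frame]
    for pointers in reversed(back):
        frame = pointers[frame]
        path.append(frame)
    path.reverse()
    return path


def frame_transitions(path):
    """Positions where the preferred frame changes (candidate indel sites)."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

    With toy scores that favor frame 0 for four positions and frame 1 for the next four, `frame_transitions(best_frame_path(scores, 1.5))` reports a single transition at position 4, which is where a corrector of this kind would consider inserting padding bases.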

  3. Divergence of conserved non-coding sequences: rate estimates and relative rate tests.

    PubMed

    Wagner, Günter P; Fried, Claudia; Prohaska, Sonja J; Stadler, Peter F

    2004-11-01

    In many eukaryotic genomes only a small fraction of the DNA codes for proteins, but the non-protein coding DNA harbors important genetic elements directing the development and the physiology of the organisms, like promoters, enhancers, insulators, and micro-RNA genes. The molecular evolution of these genetic elements is difficult to study because their functional significance is hard to deduce from sequence information alone. Here we propose an approach to the study of the rate of evolution of functional non-coding sequences at a macro-evolutionary scale. We identify functionally important non-coding sequences as Conserved Non-Coding Nucleotide (CNCN) sequences from the comparison of two outgroup species. The CNCN sequences so identified are then compared to their homologous sequences in a pair of ingroup species, and we monitor the degree of modification these sequences suffered in the two ingroup lineages. We propose a method to test for rate differences in the modification of CNCN sequences among the two ingroup lineages, as well as a method to estimate their rate of modification. We apply this method to the full sequences of the HoxA clusters from six gnathostome species: a shark, Heterodontus francisci; a basal ray finned fish, Polypterus senegalus; the amphibian, Xenopus tropicalis; as well as three mammalian species, human, rat and mouse. The results show that the evolutionary rate of CNCN sequences is not distinguishable among the three mammalian lineages, while the Xenopus lineage has a significantly increased rate of evolution. Furthermore the estimates of the rate parameters suggest that in the stem lineage of mammals the rate of CNCN sequence evolution was more than twice the rate observed within the placental amniotes clade, suggesting a high rate of evolution of cis-regulatory elements during the origin of amniotes and mammals. We conclude that the proposed methods can be used for testing hypotheses about the rate and pattern of evolution of putative

  4. Identification of a Polyketide Synthase Coding Sequence Specific for Anatoxin-a-Producing Oscillatoria Cyanobacteria

    PubMed Central

    Cadel-Six, Sabrina; Iteman, Isabelle; Peyraud-Thomas, Caroline; Mann, Stéphane; Ploux, Olivier; Méjean, Annick

    2009-01-01

    We report the identification of a sequence from the genome of Oscillatoria sp. strain PCC 6506 coding for a polyketide synthase. Using 50 axenic cyanobacteria, we found this sequence only in the genomes of Oscillatoria strains producing anatoxin-a or homoanatoxin-a, indicating its likely involvement in the biosynthesis of these toxins. PMID:19447947

  5. Protection of the genome and central protein-coding sequences by non-coding DNA against DNA damage from radiation.

    PubMed

    Qiu, Guo-Hua

    2015-01-01

    Non-coding DNA comprises a very large proportion of the total genomic content in higher organisms, but its function remains largely unclear. Non-coding DNA sequences constitute the majority of peripheral heterochromatin, which has been hypothesized to be the genome's 'bodyguard' against DNA damage from chemicals and radiation for almost four decades. The bodyguard protective function of peripheral heterochromatin in genome defense has been strengthened by the results from numerous recent studies, which are summarized in this review. These data have suggested that cells and/or organisms with a higher level of heterochromatin and more non-coding DNA sequences, including longer telomeric DNA and rDNAs, exhibit a lower frequency of DNA damage, higher radioresistance and longer lifespan after IR exposure. In addition, the majority of heterochromatin is peripherally located in the three-dimensional structure of genome organization. Therefore, the peripheral heterochromatin with non-coding DNA could play a protective role in genome defense against DNA damage from ionizing radiation by both absorbing the radicals from water radiolysis in the cytosol and reducing the energy of IR. However, the bodyguard protection by heterochromatin has been challenged by the observation that DNA damage is less frequently detected in peripheral heterochromatin than in euchromatin, which is inconsistent with the expectation and simulation results. Previous studies have also shown that the DNA damage in peripheral heterochromatin is rarely repaired and moves more quickly, broadly and outwardly to approach the nuclear pore complex (NPC). Additionally, it has been shown that extrachromosomal circular DNAs (eccDNAs) are formed in the nucleus, highly detectable in the cytoplasm (particularly under stress conditions) and shuttle between the nucleus and the cytoplasm. Based on these studies, this review speculates that the sites of DNA damage in peripheral heterochromatin could occur more

  6. Muscle coding sequences and their regulation during myogenesis: cloning of muscle actin cDNA probes.

    PubMed

    Minty, A; Caravatti, M; Robert, B; Cohen, A; Daubas, P; Weydert, A; Gros, F; Buckingham, M

    1981-01-01

    For a number of years our group has been mainly interested in the regulation of muscle gene expression during myogenesis. Using primary cultures and cell lines we have tried to find out whether the coding sequences for muscle proteins are already present in an unexpressed form or if there is a transcriptional switch at the onset of differentiation. Metabolic studies on pulse-labelled RNA, together with translation and molecular hybridization experiments have given a certain number of indications. More recently the development of genetic engineering techniques has made it possible to answer these questions directly with probes which are complementary to specific muscle coding sequences. We have identified a plasmid which contains a coding sequence for muscle actin. Other recombinant plasmids are being characterized. Such plasmids, used as probes, will permit us to study the organization and expression of the genes coding for the contractile proteins in muscle cells.

  7. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  8. Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein-Coding Regions.

    PubMed

    Lelieveld, Stefan H; Spielmann, Malte; Mundlos, Stefan; Veltman, Joris A; Gilissen, Christian

    2015-08-01

    For next-generation sequencing technologies, sufficient base-pair coverage is the foremost requirement for the reliable detection of genomic variants. We investigated whether whole-genome sequencing (WGS) platforms offer improved coverage of coding regions compared with whole-exome sequencing (WES) platforms, and compared single-base coverage for a large set of exome and genome samples. We find that WES platforms have improved considerably in the last years, but at comparable sequencing depth, WGS outperforms WES in terms of covered coding regions. At higher sequencing depth (95x-160x), WES successfully captures 95% of the coding regions with a minimal coverage of 20x, compared with 98% for WGS at 87-fold coverage. Three different assessments of sequence coverage bias showed consistent biases for WES but not for WGS. We found no clear differences for the technologies concerning their ability to achieve complete coverage of 2,759 clinically relevant genes. We show that WES performs comparable to WGS in terms of covered bases if sequenced at two to three times higher coverage. This does, however, go at the cost of substantially more sequencing biases in WES approaches. Our findings will guide laboratories to make an informed decision on which sequencing platform and coverage to choose.
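
    The coverage figures quoted in this abstract (for example, 95% of coding regions at a minimal coverage of 20x) are breadth-of-coverage values: the fraction of target bases whose read depth reaches a threshold. A minimal sketch of that calculation follows; the function name and the list-of-depths input are assumptions for illustration and do not reproduce the study's pipeline.

```python
def breadth_of_coverage(depths, min_depth=20):
    """Fraction of positions whose read depth is at least min_depth.

    depths: per-base read depths over the coding regions of interest.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)
```

    For per-base depths `[0, 5, 19, 20, 21, 100, 35, 20]`, five of eight positions reach 20x, so the breadth at 20x is 0.625.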

  9. Comparison of Exome and Genome Sequencing Technologies for the Complete Capture of Protein‐Coding Regions

    PubMed Central

    Lelieveld, Stefan H.; Spielmann, Malte; Mundlos, Stefan; Veltman, Joris A.

    2015-01-01

    For next‐generation sequencing technologies, sufficient base‐pair coverage is the foremost requirement for the reliable detection of genomic variants. We investigated whether whole‐genome sequencing (WGS) platforms offer improved coverage of coding regions compared with whole‐exome sequencing (WES) platforms, and compared single‐base coverage for a large set of exome and genome samples. We find that WES platforms have improved considerably in the last years, but at comparable sequencing depth, WGS outperforms WES in terms of covered coding regions. At higher sequencing depth (95x–160x), WES successfully captures 95% of the coding regions with a minimal coverage of 20x, compared with 98% for WGS at 87‐fold coverage. Three different assessments of sequence coverage bias showed consistent biases for WES but not for WGS. We found no clear differences for the technologies concerning their ability to achieve complete coverage of 2,759 clinically relevant genes. We show that WES performs comparable to WGS in terms of covered bases if sequenced at two to three times higher coverage. This does, however, go at the cost of substantially more sequencing biases in WES approaches. Our findings will guide laboratories to make an informed decision on which sequencing platform and coverage to choose. PMID:25973577

  10. Evaluation of correlation property of linear-frequency-modulated signals coded by maximum-length sequences

    NASA Astrophysics Data System (ADS)

    Yamanaka, Kota; Hirata, Shinnosuke; Hachiya, Hiroyuki

    2016-07-01

    Ultrasonic distance measurement for obstacles has been recently applied in automobiles. The pulse-echo method based on the transmission of an ultrasonic pulse and time-of-flight (TOF) determination of the reflected echo is one of the typical methods of ultrasonic distance measurement. Improvement of the signal-to-noise ratio (SNR) of the echo and the avoidance of crosstalk between ultrasonic sensors in the pulse-echo method are required in automotive measurement. The SNR of the reflected echo and the resolution of the TOF are improved by the employment of pulse compression using a maximum-length sequence (M-sequence), which is one of the binary pseudorandom sequences generated from a linear feedback shift register (LFSR). Crosstalk is avoided by using transmitted signals coded by different M-sequences generated from different LFSRs. In the case of lower-order M-sequences, however, the number of measurement channels corresponding to the pattern of the LFSR is not enough. In this paper, pulse compression using linear-frequency-modulated (LFM) signals coded by M-sequences has been proposed. The coding of LFM signals by the same M-sequence can produce different transmitted signals and increase the number of measurement channels. In the proposed method, however, the truncation noise in autocorrelation functions and the interference noise in cross-correlation functions degrade the SNRs of received echoes. Therefore, autocorrelation properties and cross-correlation properties in all patterns of combinations of coded LFM signals are evaluated.
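
    The correlation property that makes M-sequences attractive for pulse compression can be checked with a short sketch. It generates one M-sequence from a 4-stage LFSR and computes its circular autocorrelation after mapping bits to +1/-1. The tap choice (the recurrence a[n] = a[n-3] XOR a[n-4]) and the function names are assumptions for illustration; the LFM coding studied in the paper is not modeled here.

```python
def m_sequence(taps, order):
    """Generate one period (2**order - 1 bits) of an LFSR sequence.

    taps: delays t such that the new bit is the XOR of a[n - t] over all t.
    The taps must correspond to a primitive polynomial for maximum length.
    """
    state = [1] * order  # any nonzero start state works
    bits = []
    for _ in range(2 ** order - 1):
        bits.append(state[0])       # emit the oldest bit
        new_bit = 0
        for t in taps:
            new_bit ^= state[-t]    # state[-t] holds a[n - t]
        state = state[1:] + [new_bit]
    return bits


def circular_autocorrelation(bits):
    """Circular autocorrelation of the sequence mapped to +1 / -1."""
    signal = [1 - 2 * b for b in bits]
    n = len(signal)
    return [sum(signal[i] * signal[(i + lag) % n] for i in range(n))
            for lag in range(n)]
```

    For `m_sequence((3, 4), 4)` the autocorrelation equals 15 at zero lag and -1 at every other lag, the two-valued shape that gives a sharp compressed pulse. Sequences from different LFSRs of the same order have low but nonzero cross-correlation, which is the crosstalk issue the abstract describes.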

  11. Incomplete invention of drugs.

    PubMed

    Hisa, Tomoyuki

    2007-02-01

    Scientists seldom know the differences between "rejected invention", "non-invention", "incomplete invention", "invention yet to be completed" and "defective invention". The Japanese Supreme Court appointed me as a specialist member (Article 92-2, Code of Civil Procedure) of the intellectual property division for medical and biological patents. Herein, I introduce scientists to these differences and explain which of them are patentable. To avoid having their noblesse oblige taken for granted by clever business administrations, scientists must know the borderline between the patentable and the non-patentable.

  12. PrimeIndel: four-prime-number genetic code for indel decryption and sequence read alignment.

    PubMed

    Lam, Ching-Wan

    2014-09-25

    To decrypt a doubly heterozygous sequence (DHS) in order to define the indel mutation for mutation reporting, an algorithm recursively searching the overlapped nucleotide using an offset of nucleotide positions can decrypt the indel without using a reference sequence. However, as genetic code is letter-based, special computer programs are required to run the decryption algorithm. The previous text-based algorithm was converted to a number-based algorithm by expressing DNA sequence from a 4-letter genetic code to a 4-prime-number genetic code, i.e., converting A, C, G, T to 2, 3, 5, and 7. This algorithm based on prime-number genetic code is called PrimeIndel and is executable by spreadsheet. Using prime number coded DNA sequence, the overlapped nucleotide between any 2 positions of the DHS is represented by the greatest common divisor (GCD) of the multiplication product of 2 prime numbers. This algorithm can also be used for aligning multiple overlapping sequence reads by in-silico DHS formation. The indel size of the in-silico formed DHS indicates the positions in the paired sequences for correct alignment. DHSs were successfully decrypted by the prime number-based algorithm and sequence reads were aligned correctly. DNA sequence expressed in prime numbers can be used for the decryption of DHS and the alignment of sequence reads using a well-known mathematical function GCD of a spreadsheet program. PrimeIndel is a useful tool for mutation reporting in clinical laboratories. The software is downloadable from http://www.patho.hku.hk/staff/list/cwlam.htm. Copyright © 2014 Elsevier B.V. All rights reserved.
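
    The prime-number coding described here can be illustrated outside a spreadsheet. In the sketch below each position of a doubly heterozygous sequence is the product of the primes of its two superimposed bases, and the GCD of the products at two positions separated by the indel size recovers the shared base. This is a simplified illustration, not the PrimeIndel software: the function names, the toy alleles, and the rule of accepting the smallest offset whose GCDs all exceed 1 are assumptions.

```python
from math import gcd

# A, C, G, T mapped to the first four primes, as in the abstract.
PRIME_CODE = {"A": 2, "C": 3, "G": 5, "T": 7}


def dhs_products(allele1, allele2):
    """Per-position product of the primes of two superimposed alleles."""
    return [PRIME_CODE[x] * PRIME_CODE[y] for x, y in zip(allele1, allele2)]


def shared_factors(products, offset):
    """GCD of the products at positions i and i + offset, for every valid i."""
    return [gcd(products[i], products[i + offset])
            for i in range(len(products) - offset)]


def find_indel_size(products, max_offset):
    """Smallest offset at which every pair of positions shares a base, or None."""
    for offset in range(1, max_offset + 1):
        if all(g > 1 for g in shared_factors(products, offset)):
            return offset
    return None
```

    With the toy allele `"ACGTTGCA"` superimposed on its copy carrying a two-base deletion at the start (`"GTTGCA"`), `find_indel_size` returns 2 and `shared_factors(..., 2)` returns `[5, 7, 7, 5]`, i.e. G, T, T, G. A GCD equal to a product of two primes would signal an ambiguous position that this sketch does not resolve.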

  13. The nucleotide sequence of the human int-1 mammary oncogene; evolutionary conservation of coding and non-coding sequences.

    PubMed Central

    van Ooyen, A; Kwee, V; Nusse, R

    1985-01-01

    The mouse mammary tumor virus can induce mammary tumors in mice by proviral activation of an evolutionarily conserved cellular oncogene called int-1. Here we present the nucleotide sequence of the human homologue of int-1, and compare it with the mouse gene. Like the mouse gene, the human homologue contains a reading frame of 370 amino acids, with only four substitutions. The amino acid changes are all in the hydrophobic leader domain of the int-1 encoded protein, and do not significantly alter its hydropathic index. The conservation between the mouse and the human int-1 genes is not restricted to exons; extensive parts of the introns are also homologous. Thus, int-1 ranks among the most conserved genes known, a property shared with other oncogenes. PMID:2998762

  14. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes

    PubMed Central

    Pruitt, Kim D.; Harrow, Jennifer; Harte, Rachel A.; Wallin, Craig; Diekhans, Mark; Maglott, Donna R.; Searle, Steve; Farrell, Catherine M.; Loveland, Jane E.; Ruef, Barbara J.; Hart, Elizabeth; Suner, Marie-Marthe; Landrum, Melissa J.; Aken, Bronwen; Ayling, Sarah; Baertsch, Robert; Fernandez-Banet, Julio; Cherry, Joshua L.; Curwen, Val; DiCuccio, Michael; Kellis, Manolis; Lee, Jennifer; Lin, Michael F.; Schuster, Michael; Shkeda, Andrew; Amid, Clara; Brown, Garth; Dukhanina, Oksana; Frankish, Adam; Hart, Jennifer; Maidak, Bonnie L.; Mudge, Jonathan; Murphy, Michael R.; Murphy, Terence; Rajan, Jeena; Rajput, Bhanu; Riddick, Lillian D.; Snow, Catherine; Steward, Charles; Webb, David; Weber, Janet A.; Wilming, Laurens; Wu, Wenyu; Birney, Ewan; Haussler, David; Hubbard, Tim; Ostell, James; Durbin, Richard; Lipman, David

    2009-01-01

    Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions. PMID:19498102

  15. Identification of a conserved sequence in the non-coding regions of many human genes.

    PubMed Central

    Donehower, L A; Slagle, B L; Wilde, M; Darlington, G; Butel, J S

    1989-01-01

    We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The sequence element was usually found once or twice in a gene, either in an intron or in the 5' or 3' flanking regions. It did not share any similarities with known short interspersed nucleotide elements (SINEs) or presently known gene regulatory elements. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. It is possible that the element may have a cis-acting functional role in the genome. PMID:2536922

  16. A lossless compression method for medical image sequences using JPEG-LS and interframe coding.

    PubMed

    Miaou, Shaou-Gang; Ke, Fu-Sheng; Chen, Shu-Ching

    2009-09-01

    Hospitals and medical centers produce an enormous amount of digital medical images every day, especially in the form of image sequences, which requires considerable storage space. One solution could be the application of lossless compression. Among available methods, JPEG-LS has excellent coding performance. However, it only compresses a single picture with intracoding and does not utilize the interframe correlation among pictures. Therefore, this paper proposes a method that combines the JPEG-LS and an interframe coding with motion vectors to enhance the compression performance of using JPEG-LS alone. Since the interframe correlation between two adjacent images in a medical image sequence is usually not as high as that in a general video image sequence, the interframe coding is activated only when the interframe correlation is high enough. With six capsule endoscope image sequences under test, the proposed method achieves average compression gains of 13.3% and 26.3% over the methods of using JPEG-LS and JPEG2000 alone, respectively. Similarly, for an MRI image sequence, coding gains of 77.5% and 86.5% are correspondingly obtained.
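
    The rule of activating interframe coding only when adjacent frames are sufficiently correlated can be sketched as a mode decision. The sketch below is a simplification under stated assumptions: it uses the Pearson correlation of flattened pixel values, a hypothetical threshold of 0.9, and a plain frame difference in place of the motion-vector prediction used in the paper. The output would then be passed to a lossless coder such as JPEG-LS, which is not implemented here.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length pixel lists (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def choose_mode(previous, current, threshold=0.9):
    """Return ("inter", residual) when frames correlate strongly, else ("intra", frame)."""
    if previous is None or pearson(previous, current) < threshold:
        return "intra", list(current)
    # Lossless residual: the decoder adds the previous frame back.
    return "inter", [c - p for c, p in zip(current, previous)]
```

    A frame that is nearly a brightness-shifted copy of its predecessor yields a small residual in "inter" mode, whereas an unrelated frame falls back to "intra" mode and is coded on its own.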

  17. Coding-complete sequencing classifies parrot bornavirus 5 into a novel virus species.

    PubMed

    Marton, Szilvia; Bányai, Krisztián; Gál, János; Ihász, Katalin; Kugler, Renáta; Lengyel, György; Jakab, Ferenc; Bakonyi, Tamás; Farkas, Szilvia L

    2015-11-01

    In this study, we determined the sequence of the coding region of an avian bornavirus detected in a blue-and-yellow macaw (Ara ararauna) with pathological/histopathological changes characteristic of proventricular dilatation disease. The genomic organization of the macaw bornavirus is similar to that of other bornaviruses, and its nucleotide sequence is nearly identical to the available partial parrot bornavirus 5 (PaBV-5) sequences. Phylogenetic analysis showed that these strains formed a monophyletic group distinct from other mammalian and avian bornaviruses and in calculations performed with matrix protein coding sequences, the PaBV-5 and PaBV-6 genotypes formed a common cluster, suggesting that according to the recently accepted classification system for bornaviruses, these two genotypes may belong to a new species, provisionally named Psittaciform 2 bornavirus.

  18. Synthetic neomycin-kanamycin phosphotransferase, type II coding sequence for gene targeting in mammalian cells.

    PubMed

    Jin, Seung-Gi; Mann, Jeffrey R

    2005-07-01

    The bacterial neomycin-kanamycin phosphotransferase, type II enzyme is encoded by the neo gene and confers resistance to aminoglycoside drugs such as neomycin and kanamycin (used for bacterial selection) and G418 (used for eukaryotic cell selection). Although widely used in gene targeting in mouse embryonic stem cells, the neo coding sequence contains numerous cryptic splice sites and has a high CpG content. At least the former can cause unwanted effects in cis at the targeted locus. We describe a synthetic sequence, sneo, which encodes the same protein as that encoded by neo. This synthetic sequence has no predicted splice sites in either strand, low CpG content, and increased mammalian codon usage. In mouse embryonic stem cells, the expressibility of sneo is similar to that of neo. The use of sneo in gene targeting experiments should substantially reduce the probability of unwanted effects in cis due to splicing, and perhaps CpG methylation, within the coding sequence of the selectable marker.

  19. Severe accident source term characteristics for selected Peach Bottom sequences predicted by the MELCOR Code

    SciTech Connect

    Carbajo, J.J.

    1993-09-01

    The purpose of this report is to compare in-containment source terms developed for NUREG-1159, which used the Source Term Code Package (STCP), with those generated by MELCOR to identify significant differences. For this comparison, two short-term depressurized station blackout sequences (with a dry cavity and with a flooded cavity) and a Loss-of-Coolant Accident (LOCA) concurrent with complete loss of the Emergency Core Cooling System (ECCS) were analyzed for the Peach Bottom Atomic Power Station (a BWR-4 with a Mark I containment). The results indicate that for the sequences analyzed, the two codes predict similar total in-containment release fractions for each of the element groups. However, the MELCOR/CORBH Package predicts significantly longer times for vessel failure and reduced energy of the released material for the station blackout sequences (when compared to the STCP results). MELCOR also calculated smaller releases into the environment than STCP for the station blackout sequences.

  20. A distributed coding approach for stereo sequences in the tree structured Haar transform domain

    NASA Astrophysics Data System (ADS)

    Cancellaro, M.; Carli, M.; Neri, A.

    2009-02-01

    In this contribution, a novel method for distributed video coding for stereo sequences is proposed. The system encodes independently the left and right frames of the stereoscopic sequence. The decoder exploits the side information to achieve the best reconstruction of the correlated video streams. In particular, a syndrome coder approach based on a lifted Tree Structured Haar wavelet scheme has been adopted. The experimental results show the effectiveness of the proposed scheme.

  1. OCPAT: an online codon-preserved alignment tool for evolutionary genomic analysis of protein coding sequences

    PubMed Central

    Liu, Guozhen; Uddin, Monica; Islam, Munirul; Goodman, Morris; Grossman, Lawrence I; Romero, Roberto; Wildman, Derek E

    2007-01-01

    Background Rapidly accumulating genome sequence data from multiple species offer powerful opportunities for the detection of DNA sequence evolution. Phylogenetic tree construction and codon-based tests for natural selection are the prevailing tools used to detect functionally important evolutionary change in protein coding sequences. These analyses often require multiple DNA sequence alignments that maintain the correct reading frame for each collection of putative orthologous sequences. Since this feature is not available in most alignment tools, codon reading frames often must be checked manually before evolutionary analyses can commence. Results Here we report an online codon-preserved alignment tool (OCPAT) that generates multiple sequence alignments automatically from the coding sequences of any list of human gene IDs and their putative orthologs from genomes of other vertebrate tetrapods. OCPAT is programmed to extract putative orthologous genes from genomes and to align the orthologs with the reading frame maintained in all species. OCPAT also optimizes the alignment by trimming the most variable alignment regions at the 5' and 3' ends of each gene. The resulting output of alignments is returned in several formats, which facilitates further molecular evolutionary analyses by appropriate available software. Alignments are generally robust and reliable, retaining the correct reading frame. The tool can serve as the first step for comparative genomic analyses of protein-coding gene sequences including phylogenetic tree reconstruction and detection of natural selection. We aligned 20,658 human RefSeq mRNAs using OCPAT. Most alignments are missing sequence(s) from at least one species; however, functional annotation clustering of the ~1700 transcripts that were alignable to all species shows that genes involved in multi-subunit protein complexes are highly conserved. Conclusion The OCPAT program facilitates large-scale evolutionary and phylogenetic analyses of

  2. Widespread position-specific conservation of synonymous rare codons within coding sequences

    PubMed Central

    Steele, Aaron; Carmichael, Rory; Rodriguez, Anabel; Specht, Alicia T.; Ngo, Kim; Emrich, Scott

    2017-01-01

    Synonymous rare codons are considered to be sub-optimal for gene expression because they are translated more slowly than common codons. Yet surprisingly, many protein coding sequences include large clusters of synonymous rare codons. Rare codons at the 5’ terminus of coding sequences have been shown to increase translational efficiency. Although a general functional role for synonymous rare codons farther within coding sequences has not yet been established, several recent reports have identified rare-to-common synonymous codon substitutions that impair folding of the encoded protein. Here we test the hypothesis that although the usage frequencies of synonymous codons change from organism to organism, codon rarity will be conserved at specific positions in a set of homologous coding sequences, for example to tune translation rate without altering a protein sequence. Such conservation of rarity–rather than specific codon identity–could coordinate co-translational folding of the encoded protein. We demonstrate that many rare codon cluster positions are indeed conserved within homologous coding sequences across diverse eukaryotic, bacterial, and archaeal species, suggesting they result from positive selection and have a functional role. Most conserved rare codon clusters occur within rather than between conserved protein domains, challenging the view that their primary function is to facilitate co-translational folding after synthesis of an autonomous structural unit. Instead, many conserved rare codon clusters separate smaller protein structural motifs within structural domains. These smaller motifs typically fold faster than an entire domain, on a time scale more consistent with translation rate modulation by synonymous codon usage. While proteins with conserved rare codon clusters are structurally and functionally diverse, they are enriched in functions associated with organism growth and development, suggesting an important role for synonymous codon usage in

  3. Identification of a conserved sequence in the non-coding regions of many human genes

    SciTech Connect

    Donehower, L.A.; Slagle, B.L.; Wilde, M.; Darlington, G.; Butel, J.S.

    1989-01-25

    The authors have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. It is possible that the element may have a cis-acting functional role in the genome.

  4. Complete coding sequences of European brown hare syndrome virus (EBHSV) strains isolated in 1982 in Sweden.

    PubMed

    Lopes, Ana M; Gavier-Widén, Dolores; Le Gall-Reculé, Ghislaine; Esteves, Pedro J; Abrantes, Joana

    2013-10-01

    European brown hare syndrome (EBHS) is characterised by high mortality of European brown hares (Lepus europaeus) and mountain hares (Lepus timidus). European brown hare syndrome virus (EBHSV) and the closely related rabbit haemorrhagic disease virus (RHDV) comprise the genus Lagovirus, family Caliciviridae. In contrast to RHDV, which is well studied, with more than 30 complete genome sequences available, the only complete genome sequence available for EBHSV was obtained from a strain isolated in 1989 in France. EBHS was originally diagnosed in Sweden in 1980. Here, we report the complete coding sequences of two EBHSV strains isolated from European brown hares that died with liver lesions characteristic of EBHS in Sweden in 1982. These sequences represent the oldest complete coding sequences of EBHSV isolated from the original area of virus diagnosis. The genomic organisation is similar to that of the published French sequence. Comparison with this sequence revealed several nucleotide substitutions, corresponding to 6 % divergence. At the amino acid level, the Swedish strains are 2 % different from the French strain. Most amino acid substitutions were located within the major capsid protein VP60, but when considering the amino acid sequence length of each protein, VP10 is the protein with the highest percentage of amino acid differences. The same result was obtained when Swedish strains were compared. This evolutionary pattern has not been described previously for members of the genus Lagovirus.

  5. Coding patient emotional cues and concerns in medical consultations: the Verona coding definitions of emotional sequences (VR-CoDES).

    PubMed

    Zimmermann, Christa; Del Piccolo, Lidia; Bensing, Jozien; Bergvik, Svein; De Haes, Hanneke; Eide, Hilde; Fletcher, Ian; Goss, Claudia; Heaven, Cathy; Humphris, Gerry; Kim, Young-Mi; Langewitz, Wolf; Meeuwesen, Ludwien; Nuebling, Matthias; Rimondini, Michela; Salmon, Peter; van Dulmen, Sandra; Wissow, Larry; Zandbelt, Linda; Finset, Arnstein

    2011-02-01

    To present the Verona Coding Definitions of Emotional Sequences (VR-CoDES CC), a consensus based system for coding patient expressions of emotional distress in medical consultations, defined as Cues or Concerns. The system was developed by an international group of communication researchers. First, consensus was reached in different steps. Second, a reliability study was conducted on 20 psychiatric consultations. A Cue is defined as a verbal or non-verbal hint which suggests an underlying unpleasant emotion that lacks clarity. A Concern is defined as a clear and unambiguous expression of an unpleasant current or recent emotion that is explicitly verbalized with or without a stated issue of importance. The conceptual framework sets precise criteria for cues and concerns and for whom (health provider or patient) elicits the cue/concern. Inter-rater reliability proved satisfactory (agreement 81.5%, Cohen's Kappa 0.70). The VR-CoDES CC will facilitate comparative research on provider-patient communication sequences in which patients express emotional distress. The VR-CoDES CC may be used to help clinicians in recognizing or facilitating cues and concerns, thereby improving the recognition of patients' emotional distress, the therapeutic alliance and quality of care for these patients. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
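
    The inter-rater figures quoted above (percent agreement and Cohen's Kappa) can be illustrated with a minimal sketch. The function name and the ten example codes below are hypothetical and are not data from the study; the formula is the standard kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance from each rater's marginal frequencies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of codes")
    n = len(rater_a)
    # Observed agreement: share of units that both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for ten utterances (illustration only, not study data).
rater_a = ["cue", "cue", "concern", "none", "cue",
           "none", "concern", "none", "cue", "none"]
rater_b = ["cue", "concern", "concern", "none", "cue",
           "none", "concern", "cue", "cue", "none"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

    If the reported agreement (81.5%) and kappa (0.70) describe the same set of coded units, rearranging the formula gives an implied chance agreement of (0.815 - 0.70) / (1 - 0.70), or about 0.38.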

  6. Genetic characterization of three novel chicken parvovirus strains based on analysis of their coding sequences.

    PubMed

    Koo, Bon-Sang; Lee, Hae-Rim; Jeon, Eun-Ok; Han, Moo-Sung; Min, Kyeong-Cheol; Lee, Seung-Baek; Bae, Yeon-Ji; Cho, Sun-Hyung; Mo, Jong-Suk; Kwon, Hyuk Moo; Sung, Haan Woo; Kim, Jong-Nyeo; Mo, In-Pil

    2015-01-01

    Chicken parvovirus (ChPV) is one of the causative agents of viral enteritis. Recently, the genome of the ABU-P1 strain of ChPV was fully sequenced and determined to have a distinct genomic composition compared with that of vertebrate parvoviruses. However, no comparative sequence analysis of coding regions of ChPVs was possible because of the lack of other sequence information. In this study, we obtained the nucleotide sequences of all genomic coding regions of three ChPVs by polymerase chain reaction using 13 primer sets, and deduced the amino acid sequences from the nucleotide sequences. The non-structural protein 1 (NS1) gene of the three ChPVs showed 95.0 to 95.5% nucleotide sequence identity and 96.5 to 98.1% amino acid sequence identity to those of NS1 from the ABU-P1 strain, respectively, and even higher nucleotide and amino acid similarities to one another. The viral proteins (VP) gene was more divergent between the three ChPV Korean strains and ABU-P1, with 88.1 to 88.3% nucleotide identity and 93.0% amino acid identity. Analysis of the putative tertiary structure of the ChPV VP2 protein showed that variable regions with less than 80% nucleotide similarity between the three Korean strains and ABU-P1 occurred in large loops of the VP2 protein believed to be involved in antigenicity, pathogenicity, and tissue tropism in other parvoviruses. Based on our analysis of full-length coding sequences, we discovered greater variation in ChPV strains than reported previously, especially in partial regions of the VP2 protein.

  7. Complete Coding Genome Sequence of a Putative Novel Teschovirus Serotype 12 Strain

    PubMed Central

    Jiménez-Clavero, M. A.

    2016-01-01

    Porcine teschoviruses are ubiquitous and prevalent viruses generally harmless to their hosts, the suids. Here, we report the first complete coding genome sequence of a putative new serotype of porcine teschovirus (PTV-12), strain CC25, isolated from fecal material from a healthy pig in Spain. PMID:26966207

  8. Complete Coding Sequences of Six Toscana Virus Strains Isolated from Human Patients in France

    PubMed Central

    Leparc-Goffart, Isabelle; Piorkowski, Geraldine; Coutard, Bruno; Papageorgiou, Nicolas; De Lamballerie, Xavier

    2016-01-01

    Toscana virus (TOSV) is an arthropod-borne phlebovirus belonging to the Sandfly fever Naples virus species (genus Phlebovirus, family Bunyaviridae). Here, we report the complete coding sequences of six TOSV strains isolated from human patients having acquired the infection in southeastern France during a 12-year period. PMID:27231377

  9. POLYMORPHISM IN THE CODING REGION SEQUENCE OF GDF8 GENE IN INDIAN SHEEP.

    PubMed

    Pothuraju, M; Mishra, S K; Kumar, S N; Mohamed, N F; Kataria, R S; Yadav, D K; Arora, R

    2015-11-01

    The present study was undertaken to identify polymorphism in the coding sequence of the GDF8 gene across indigenous meat-type sheep breeds. A 1647 bp sequence was generated, encompassing 208 bp of the 5'UTR, 1128 bp of coding region (exons 1, 2 and 3) as well as 311 bp of the 3'UTR. The sheep and goat GDF8 gene sequences were observed to be highly conserved as compared to cattle, buffalo, horse and pig. Several nucleotide variations were observed across the coding sequence of the GDF8 gene in Indian sheep. Three polymorphic sites were identified in the 5'UTR, one in exon 1 and one in exon 2. Both SNPs in the exonic region were found to be non-synonymous. The mutations c.539T > G and c.821T > A discovered in this study in exon 1 and exon 2, respectively, have not been previously reported. The information generated provides a preliminary indication of the functional diversity present in Indian sheep at the coding region of the GDF8 gene. The novel as well as the previously reported SNPs discovered in the Indian sheep warrant further analysis to see whether they affect the phenotype. Future studies will need to establish the effect of the reported SNPs on the expression of the GDF8 gene in the Indian sheep population.
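
    As a rough illustration of how a coding-DNA coordinate such as c.539 or c.821 maps onto a codon, and how a substitution can be classed as synonymous or not, here is a minimal sketch. The helper names are hypothetical, the standard genetic code is assumed, and the example codons are invented, because the abstract does not give the reference codons.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_of(cds_pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    if cds_pos < 1:
        raise ValueError("coding-sequence positions are 1-based")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def is_synonymous(codon, pos_in_codon, alt_base):
    """True if replacing the base at pos_in_codon (1..3) keeps the amino acid."""
    mutated = codon[:pos_in_codon - 1] + alt_base + codon[pos_in_codon:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


print(codon_of(539))   # second base of codon 180
print(codon_of(821))   # second base of codon 274
# Hypothetical reference codons, for illustration only:
print(is_synonymous("CTG", 2, "G"))  # Leu -> Arg, non-synonymous
print(is_synonymous("CTT", 3, "C"))  # Leu -> Leu, synonymous
```

    Both reported variants fall on the second base of their codons, which is in line with the abstract's statement that both exonic SNPs are non-synonymous.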

  10. First Complete Coding Sequence of a Spanish Isolate of Swine Vesicular Disease Virus

    PubMed Central

    Vázquez-Calvo, Ángela; Saiz, Juan-Carlos; Martín-Acebes, Miguel A.

    2016-01-01

    Swine vesicular disease virus (SVDV) is a porcine pathogen and a member of the Enterovirus genus within the Picornaviridae family. The SVDV genome is composed of a single-stranded RNA molecule of positive polarity. Here, we report the first complete sequence of the coding region of a Spanish SVDV isolate (SPA/1/'93). PMID:26941157

  11. Complete Coding Sequence of Zika Virus from a French Polynesia Outbreak in 2013

    PubMed Central

    Piorkowski, Géraldine; Charrel, Rémi N.; Boubis, Laetitia; Leparc-Goffart, Isabelle; de Lamballerie, Xavier

    2014-01-01

    Zika virus is an arthropod-borne Flavivirus member of the Spondweni serocomplex, transmitted by Aedes mosquitoes. We report here the complete coding sequence of a Zika virus strain belonging to the Asian lineage, isolated from an infected patient returning from French Polynesia, an epidemic area in 2013/2014. PMID:24903869

  12. Sigma: multiple alignment of weakly-conserved non-coding DNA sequence.

    PubMed

    Siddharthan, Rahul

    2006-03-16

    Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. Comparative tests of Sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior. By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.

  13. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes

    PubMed Central

    Lin, Michael F.; Kheradpour, Pouya; Washietl, Stefan; Parker, Brian J.; Pedersen, Jakob S.; Kellis, Manolis

    2011-01-01

    The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes—especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape. PMID:21994248

  14. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  15. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method for hiding data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance robustness and enlarge hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.

  16. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
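
    The two tests named above can be sketched roughly as follows. This is not the authors' code: the function names are hypothetical, overlapping n-tuples are an assumption, and the redundancy is taken here as 1 - H_n / (n log2 4), one common normalization that may differ in detail from the paper's definition.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count overlapping n-tuples ("words") in a sequence."""
    if len(seq) < n:
        raise ValueError("sequence is shorter than n")
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_ranking(seq, n):
    """Word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(ngram_counts(seq, n).values(), reverse=True)


def ngram_entropy(seq, n):
    """Shannon entropy, in bits, of the observed n-tuple distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n, alphabet_size=4):
    """Assumed definition: 1 - H_n / (n * log2(alphabet_size))."""
    return 1 - ngram_entropy(seq, n) / (n * math.log2(alphabet_size))


toy = "ACGTACGTTTTTACGA"  # invented sequence, illustration only
print(zipf_ranking(toy, 2)[:3])
print(ngram_entropy(toy, 2), ngram_redundancy(toy, 2))
```

    A Zipf plot is then the log of each count against the log of its rank; a straight line would indicate power-law behavior.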

  17. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.

  20. Translational resistivity/conductivity of coding sequences during exponential growth of Escherichia coli.

    PubMed

    Takai, Kazuyuki

    2017-01-21

    Codon adaptation index (CAI) has been widely used for prediction of expression of recombinant genes in Escherichia coli and other organisms. However, CAI has no mechanistic basis that rationalizes its application to estimation of translational efficiency. Here, I propose a model based on which we could consider how codon usage is related to the level of expression during exponential growth of bacteria. In this model, translation of a gene is considered as an analog of electric current, and an analog of electric resistance corresponding to each gene is considered. "Translational resistance" is dependent on the steady-state concentration and the sequence of the mRNA species, and "translational resistivity" is dependent only on the mRNA sequence. The latter is the sum of two parts: one is the resistivity for the elongation reaction (coding sequence resistivity), and the other comes from all of the other steps of the decoding reaction. This electric circuit model clearly shows that some conditions should be met for codon composition of a coding sequence to correlate well with its expression level. On the other hand, I calculated relative frequency of each of the 61 sense codon triplets translated during exponential growth of E. coli from a proteomic dataset covering over 2600 proteins. A tentative method for estimating relative coding sequence resistivity based on the data is presented.
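
    For reference, the conventional CAI calculation that this abstract contrasts its model with can be sketched as below: each codon gets a relative adaptiveness w (its count in a reference set divided by the count of the most used synonymous codon), and CAI is the geometric mean of w over the codons of a gene. The function names and the reference counts are hypothetical, and skipping codons absent from the reference is a simplifying assumption. This sketch does not implement the translational resistivity model itself.

```python
import math
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def relative_adaptiveness(ref_counts):
    """w = codon count / count of the most frequent synonymous codon."""
    best = defaultdict(int)
    for codon, n in ref_counts.items():
        amino = CODON_TABLE[codon]
        best[amino] = max(best[amino], n)
    return {codon: n / best[CODON_TABLE[codon]]
            for codon, n in ref_counts.items() if n > 0}


def cai(cds, weights):
    """Geometric mean of w over the codons of cds that have a weight."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon of the sequence has a reference weight")
    return math.exp(sum(logs) / len(logs))


# Invented reference counts for three amino acids (illustration only).
reference = {"CTG": 50, "CTA": 5, "AAA": 30, "AAG": 10, "GAA": 40, "GAG": 20}
weights = relative_adaptiveness(reference)
print(cai("CTGAAAGAG", weights))
```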

  1. Key for protein coding sequences identification: computer analysis of codon strategy.

    PubMed Central

    Rodier, F; Gabarro-Arpa, J; Ehrlich, R; Reiss, C

    1982-01-01

    The signal qualifying an AUG or GUG as an initiator in mRNAs processed by E. coli ribosomes is not found to be a systematic, literal homology sequence. In contrast, stability analysis reveals that initiators always occur within nucleic acid domains of low stability, for which a high A/U content is observed. Since no amino acid selection pressure can be detected at N-termini of the proteins, the A/U enrichment results from a biased usage of the code degeneracy. A computer analysis is presented which allows easy detection of the codon strategy. N-terminal codons carry rather systematically A or U in third position, which suggests a mechanism for translation initiation and helps to detect protein coding sequences in sequenced DNA. PMID:7038623

  2. Statistical analysis of nucleotide runs in coding and noncoding DNA sequences.

    PubMed

    Sprizhitsky YuA; Nechipurenko YuD; Alexandrov, A A; Volkenstein, M V

    1988-10-01

    A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. There are considerable differences of run distributions in DNA sequences of procaryotes, invertebrates and vertebrates. There is an abundance of short runs (1-2 nucleotides long) in the coding sequences and there is a deficiency of such runs in the noncoding regions. However, some interesting exceptions from this rule exist for the run distribution of adenine in procaryotes and for the arrangement of purine-pyrimidine runs in eucaryotes. The similarity in the distributions of such runs in the coding and noncoding regions may be due to some structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) of three to six nucleotides occur predominantly in noncoding DNA regions in eucaryotes, especially in vertebrates.
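
    A run-length tally of the kind described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only; the function names are hypothetical and the abstract does not specify how the authors counted runs. Here a run is taken to be a maximal stretch of identical symbols.

```python
from collections import Counter
from itertools import groupby

# Map each base to its chemical class: purine (R) or pyrimidine (Y).
_PURINE_PYRIMIDINE = str.maketrans("AGCT", "RRYY")

def run_length_distribution(seq):
    """Map (symbol, run_length) to the number of maximal runs of that kind."""
    return Counter((symbol, len(list(group))) for symbol, group in groupby(seq))

def purine_pyrimidine_runs(seq):
    """Run-length distribution after recoding A/G as R and C/T as Y."""
    return run_length_distribution(seq.upper().translate(_PURINE_PYRIMIDINE))
```

    Comparing the resulting counters for annotated coding and noncoding segments would give the kind of per-length comparison the abstract reports, for example for guanine runs of three to six nucleotides.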

  3. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE PAGES

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are

  4. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    SciTech Connect

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth; Arkin, Adam P.; Deutschbauer, Adam

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes

  5. Structural annotation of equine protein-coding genes determined by mRNA sequencing.

    PubMed

    Coleman, S J; Zeng, Z; Wang, K; Luo, S; Khrebtukova, I; Mienaltowski, M J; Schroth, G P; Liu, J; MacLeod, J N

    2010-12-01

    The horse, like the majority of animal species, has a limited amount of species-specific expressed sequence data available in public databases. As a result, structural models for the majority of genes defined in the equine genome are predictions based on ab initio sequence analysis or the projection of gene structures from other mammalian species. The current study used Illumina-based sequencing of messenger RNA (RNA-seq) to help refine structural annotation of equine protein-coding genes and for a preliminary assessment of gene expression patterns. Sequencing of mRNA from eight equine tissues generated 293,758,105 sequence tags of 35 bases each, equalling 10.28 Gbp of total sequence data. The tag alignments represent approximately 207× coverage of the equine mRNA transcriptome and confirmed transcriptional activity for roughly 90% of the protein-coding gene structures predicted by Ensembl and NCBI. Tag coverage was sufficient to refine the structural annotation for 11,356 of these predicted genes, while also identifying an additional 456 transcripts with exon/intron features that are not listed by either Ensembl or NCBI. Genomic locus data and intervals for the protein-coding genes predicted by the Ensembl and NCBI annotation pipelines were combined with 75,116 RNA-seq-derived transcriptional units to generate a consensus equine protein-coding gene set of 20,302 defined loci. Gene ontology annotation was used to compare the functional and structural categories of genes expressed in either a tissue-restricted pattern or broadly across all tissue samples. © 2010 The Authors, Journal compilation © 2010 Stichting International Foundation for Animal Genetics.
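
    The sequencing yield quoted in this abstract can be checked with simple arithmetic. The short Python sketch below takes the tag count as 293,758,105 and the tag length as 35 bases; the variable names are illustrative only.

```python
# Figures read from the abstract above.
sequence_tags = 293_758_105   # number of sequence tags
tag_length = 35               # bases per tag

total_bases = sequence_tags * tag_length
total_gbp = total_bases / 1e9  # gigabase pairs

print(f"{total_bases:,} bases = {total_gbp:.2f} Gbp")
```

    The product is 10,281,533,675 bases, which rounds to the 10.28 Gbp stated in the abstract.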

  6. The complete mitochondrial genome sequence of the hydrothermal vent galatheid crab Shinkaia crosnieri (Crustacea: Decapoda: Anomura): A novel arrangement and incomplete tRNA suite

    PubMed Central

    Yang, Jin-Shu; Yang, Wei-Jun

    2008-01-01

    Background Metazoan mitochondrial genomes usually consist of the same 37 genes. Such genes contain useful information for phylogenetic analyses and evolution modelling. Although complete mitochondrial genomes have been determined for over 1,000 animals to date, hydrothermal vent species have, thus far, remained excluded due to the scarcity of collected specimens. Results The mitochondrial genome of the hydrothermal vent galatheid crab Shinkaia crosnieri is 15,182 bp in length, and is composed of 13 protein-coding genes, two ribosomal RNA genes and only 18 transfer RNA genes. The total AT content of the genome, as is typical for decapods, is 72.9%. We identified a non-coding control region of 327 bp according to its location and AT-richness. This is the smallest control region discovered in crustaceans so far. A mechanism of cytoplasmic tRNA import was addressed to compensate for the four missing tRNAs. The S. crosnieri mitogenome exhibits a novel arrangement of mitochondrial genes. We investigated the mitochondrial gene orders and found that at least six rearrangements from the ancestral pancrustacean (crustacean + hexapod) pattern have happened successively. The codon usage, nucleotide composition and bias show no substantial difference with other decapods. Phylogenetic analyses using the concatenated nucleotide and amino acid sequences of the 13 protein-coding genes prove consistent with the previous classification based upon their morphology. Conclusion The present study will supply considerable data of use for both genomic and evolutionary research on hydrothermal vent ecosystems. The mitochondrial genetic characteristics of decapods are sustained in this case of S. crosnieri despite the absence of several tRNAs and a number of dramatic rearrangements. Our results may provide evidence for the immigrating hypothesis about how vent species originate. PMID:18510775

  7. The complete mitochondrial genome sequence of the hydrothermal vent galatheid crab Shinkaia crosnieri (Crustacea: Decapoda: Anomura): a novel arrangement and incomplete tRNA suite.

    PubMed

    Yang, Jin-Shu; Nagasawa, Hiromichi; Fujiwara, Yoshihiro; Tsuchida, Shinji; Yang, Wei-Jun

    2008-05-30

    Metazoan mitochondrial genomes usually consist of the same 37 genes. Such genes contain useful information for phylogenetic analyses and evolution modelling. Although complete mitochondrial genomes have been determined for over 1,000 animals to date, hydrothermal vent species have, thus far, remained excluded due to the scarcity of collected specimens. The mitochondrial genome of the hydrothermal vent galatheid crab Shinkaia crosnieri is 15,182 bp in length, and is composed of 13 protein-coding genes, two ribosomal RNA genes and only 18 transfer RNA genes. The total AT content of the genome, as is typical for decapods, is 72.9%. We identified a non-coding control region of 327 bp according to its location and AT-richness. This is the smallest control region discovered in crustaceans so far. A mechanism of cytoplasmic tRNA import was addressed to compensate for the four missing tRNAs. The S. crosnieri mitogenome exhibits a novel arrangement of mitochondrial genes. We investigated the mitochondrial gene orders and found that at least six rearrangements from the ancestral pancrustacean (crustacean + hexapod) pattern have happened successively. The codon usage, nucleotide composition and bias show no substantial difference with other decapods. Phylogenetic analyses using the concatenated nucleotide and amino acid sequences of the 13 protein-coding genes prove consistent with the previous classification based upon their morphology. The present study will supply considerable data of use for both genomic and evolutionary research on hydrothermal vent ecosystems. The mitochondrial genetic characteristics of decapods are sustained in this case of S. crosnieri despite the absence of several tRNAs and a number of dramatic rearrangements. Our results may provide evidence for the immigrating hypothesis about how vent species originate.

  8. Large-scale coding sequence change underlies the evolution of postdevelopmental novelty in honey bees.

    PubMed

    Jasper, William Cameron; Linksvayer, Timothy A; Atallah, Joel; Friedman, Daniel; Chiu, Joanna C; Johnson, Brian R

    2015-02-01

    Whether coding or regulatory sequence change is more important to the evolution of phenotypic novelty is one of biology's major unresolved questions. The field of evo-devo has shown that in early development changes to regulatory regions are the dominant mode of genetic change, but whether this extends to the evolution of novel phenotypes in the adult organism is unclear. Here, we conduct ten RNA-Seq experiments across both novel and conserved tissues in the honey bee to determine to what extent postdevelopmental novelty is based on changes to the coding regions of genes. We make several discoveries. First, we show that with respect to novel physiological functions in the adult animal, positively selected tissue-specific genes of high expression underlie novelty by conferring specialized cellular functions. Such genes are often, but not always taxonomically restricted genes (TRGs). We further show that positively selected genes, whether TRGs or conserved genes, are the least connected genes within gene expression networks. Overall, this work suggests that the evo-devo paradigm is limited, and that the evolution of novelty, postdevelopment, follows additional rules. Specifically, evo-devo stresses that high network connectedness (repeated use of the same gene in many contexts) constrains coding sequence change as it would lead to negative pleiotropic effects. Here, we show that in the adult animal, the converse is true: Genes with low network connectedness (TRGs and tissue-specific conserved genes) underlie novel phenotypes by rapidly changing coding sequence to perform new-specialized functions.

  9. Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences

    PubMed Central

    Ivanov, Ivaylo P.; Firth, Andrew E.; Michel, Audrey M.; Atkins, John F.; Baranov, Pavel V.

    2011-01-01

    In eukaryotes, it is generally assumed that translation initiation occurs at the AUG codon closest to the messenger RNA 5′ cap. However, in certain cases, initiation can occur at codons differing from AUG by a single nucleotide, especially the codons CUG, UUG, GUG, ACG, AUA and AUU. While non-AUG initiation has been experimentally verified for a handful of human genes, the full extent to which this phenomenon is utilized—both for increased coding capacity and potentially also for novel regulatory mechanisms—remains unclear. To address this issue, and hence to improve the quality of existing coding sequence annotations, we developed a methodology based on phylogenetic analysis of predicted 5′ untranslated regions from orthologous genes. We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data. PMID:21266472

  10. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

    PubMed

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-08-06

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for

  11. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape

    PubMed Central

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-01-01

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates’ conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water–land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods’ enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. PMID:26253316

  12. Error probability bounds for trellis coded modulation over sequence dependent channels

    NASA Astrophysics Data System (ADS)

    Oka, Ikuo; Biglieri, Ezio

    1989-04-01

    A technique for obtaining an upper bound to the error event probability of trellis-coded modulation in sequence-dependent channels is derived. The technique is based on the transfer function of a state diagram which has N + 1 nodes and whose branch labels are N x N error matrices. Some methods for simplifying the computation of bit error probability at the price of a looser bound are proposed. Numerical results show the applicability of the techniques presented here to trellis-coded 16-QAM with two-symbol intersymbol interference.

  13. Golay sequences coded coherent optical OFDM for long-haul transmission

    NASA Astrophysics Data System (ADS)

    Qin, Cui; Ma, Xiangrong; Hua, Tao; Zhao, Jing; Yu, Huilong; Zhang, Jian

    2017-09-01

    We propose to use binary Golay sequences in coherent optical orthogonal frequency division multiplexing (CO-OFDM) to improve the long-haul transmission performance. The Golay sequences are generated by binary Reed-Muller codes, which have low peak-to-average power ratio and certain error correction capability. A low-complexity decoding algorithm for the Golay sequences is then proposed to recover the signal. Under same spectral efficiency, the QPSK modulated OFDM with binary Golay sequences coding with and without discrete Fourier transform (DFT) spreading (DFTS-QPSK-GOFDM and QPSK-GOFDM) are compared with the normal BPSK modulated OFDM with and without DFT spreading (DFTS-BPSK-OFDM and BPSK-OFDM) after long-haul transmission. At a 7% forward error correction code threshold (Q2 factor of 8.5 dB), it is shown that DFTS-QPSK-GOFDM outperforms DFTS-BPSK-OFDM by extending the transmission distance by 29% and 18%, in non-dispersion managed and dispersion managed links, respectively.
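
    The defining property of a Golay complementary pair can be shown with a small Python sketch. Note the substitution: the abstract generates its sequences from binary Reed-Muller codes, whereas the sketch below uses the standard recursive concatenation construction, chosen only because it is short. The function names are hypothetical.

```python
def golay_pair(m):
    """Return a complementary pair (a, b) of length 2**m with entries +1/-1.

    Uses the recursive concatenation rule a' = a|b, b' = a|(-b).
    """
    a, b = [1], [1]
    for _ in range(m):
        a, b = a + b, a + [-x for x in b]
    return a, b

def aperiodic_autocorr(x, lag):
    """Aperiodic autocorrelation of x at a non-negative lag."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

def complementary_sums(a, b):
    """Sum of the two autocorrelations at every lag from 0 to len(a) - 1."""
    return [aperiodic_autocorr(a, k) + aperiodic_autocorr(b, k)
            for k in range(len(a))]
```

    For a complementary pair the summed autocorrelation vanishes at every nonzero lag, which is the property usually cited as the reason such sequences give a low peak-to-average power ratio in OFDM.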

  14. Coding and decoding libraries of sequence-defined functional copolymers synthesized via photoligation.

    PubMed

    Zydziak, Nicolas; Konrad, Waldemar; Feist, Florian; Afonin, Sergii; Weidner, Steffen; Barner-Kowollik, Christopher

    2016-11-30

    Designing artificial macromolecules with absolute sequence order represents a considerable challenge. Here we report an advanced light-induced avenue to monodisperse sequence-defined functional linear macromolecules up to decamers via a unique photochemical approach. The versatility of the synthetic strategy-combining sequential and modular concepts-enables the synthesis of perfect macromolecules varying in chemical constitution and topology. Specific functions are placed at arbitrary positions along the chain via the successive addition of monomer units and blocks, leading to a library of functional homopolymers, alternating copolymers and block copolymers. The in-depth characterization of each sequence-defined chain confirms the precision nature of the macromolecules. Decoding of the functional information contained in the molecular structure is achieved via tandem mass spectrometry without recourse to their synthetic history, showing that the sequence information can be read. We submit that the presented photochemical strategy is a viable and advanced concept for coding individual monomer units along a macromolecular chain.

  15. Coding and decoding libraries of sequence-defined functional copolymers synthesized via photoligation

    NASA Astrophysics Data System (ADS)

    Zydziak, Nicolas; Konrad, Waldemar; Feist, Florian; Afonin, Sergii; Weidner, Steffen; Barner-Kowollik, Christopher

    2016-11-01

    Designing artificial macromolecules with absolute sequence order represents a considerable challenge. Here we report an advanced light-induced avenue to monodisperse sequence-defined functional linear macromolecules up to decamers via a unique photochemical approach. The versatility of the synthetic strategy--combining sequential and modular concepts--enables the synthesis of perfect macromolecules varying in chemical constitution and topology. Specific functions are placed at arbitrary positions along the chain via the successive addition of monomer units and blocks, leading to a library of functional homopolymers, alternating copolymers and block copolymers. The in-depth characterization of each sequence-defined chain confirms the precision nature of the macromolecules. Decoding of the functional information contained in the molecular structure is achieved via tandem mass spectrometry without recourse to their synthetic history, showing that the sequence information can be read. We submit that the presented photochemical strategy is a viable and advanced concept for coding individual monomer units along a macromolecular chain.

  16. Non-coding RNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer

    PubMed Central

    Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.

    2010-01-01

    Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640

  17. SinEx DB: a database for single exon coding sequences in mammalian genomes.

    PubMed

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl.

  18. SinEx DB: a database for single exon coding sequences in mammalian genomes

    PubMed Central

    Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F.; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S.

    2016-01-01

    Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl PMID:27278816

  19. Coupled enhancer and coding sequence evolution of a homeobox gene shaped leaf diversity

    PubMed Central

    Vuolo, Francesco; Mentink, Remco A.; Hajheidari, Mohsen; Bailey, C. Donovan; Filatov, Dmitry A.; Tsiantis, Miltos

    2016-01-01

    Here we investigate mechanisms underlying the diversification of biological forms using crucifer leaf shape as an example. We show that evolution of an enhancer element in the homeobox gene REDUCED COMPLEXITY (RCO) altered leaf shape by changing gene expression from the distal leaf blade to its base. A single amino acid substitution evolved together with this regulatory change, which reduced RCO protein stability, preventing pleiotropic effects caused by its altered gene expression. We detected hallmarks of positive selection in these evolved regulatory and coding sequence variants and showed that modulating RCO activity can improve plant physiological performance. Therefore, interplay between enhancer and coding sequence evolution created a potentially adaptive path for morphological evolution. PMID:27852629

  20. Incorporation of the influenza A virus NA segment into virions does not require cognate non-coding sequences

    PubMed Central

    Crescenzo-Chaigne, Bernadette; Barbezange, Cyril V. S.; Léandri, Stéphane; Roquin, Camille; Berthault, Camille; van der Werf, Sylvie

    2017-01-01

    For each influenza virus genome segment, the coding sequence is flanked by non-coding (NC) regions comprising shared, conserved sequences and specific, non-conserved sequences. The latter and adjacent parts of the coding sequence are involved in genome packaging, but the precise role of the non-conserved NC sequences is still unclear. The aim of this study is to better understand the role of the non-conserved non-coding sequences in the incorporation of the viral segments into virions. The NA-segment NC sequences were systematically replaced by those of the seven other segments. Recombinant viruses harbouring two segments with identical NC sequences were successfully rescued. Virus growth kinetics and serial passages were performed, and incorporation of the viral segments was tested by real-time RT-PCR. An initial virus growth deficiency correlated to a specific defect in NA segment incorporation. Upon serial passages, growth properties were restored. Sequencing revealed that the replacing 5′NC sequence length drove the type of mutations obtained. With sequences longer than the original, point mutations in the coding region with or without substitutions in the 3′NC region were detected. With shorter sequences, insertions were observed in the 5′NC region. Restoration of viral fitness was linked to restoration of the NA segment incorporation. PMID:28240311

  1. First Complete Coding Sequence of a Spanish Isolate of Swine Vesicular Disease Virus.

    PubMed

    Vázquez-Calvo, Ángela; Saiz, Juan-Carlos; Sobrino, Francisco; Martín-Acebes, Miguel A

    2016-03-03

    Swine vesicular disease virus (SVDV) is a porcine pathogen and a member of the Enterovirus genus within the Picornaviridae family. The SVDV genome is composed of a single-stranded RNA molecule of positive polarity. Here, we report the first complete sequence of the coding region of a Spanish SVDV isolate (SPA/1/'93). Copyright © 2016 Vázquez-Calvo et al.

  2. Mutation analysis of the coding sequence of the MECP2 gene in infantile autism.

    PubMed

    Beyer, Kim S; Blasi, Francesca; Bacchelli, Elena; Klauck, Sabine M; Maestrini, Elena; Poustka, Annemarie

    2002-10-01

    Mutations in the coding region of the methyl-CpG-binding protein 2 (MECP2) gene cause Rett syndrome and have also been reported in a number of X-linked mental retardation syndromes. Furthermore, such mutations have recently been described in a few autistic patients. In this study, a large sample of individuals with autism was screened in order to elucidate systematically whether specific mutations in MECP2 play a role in autism. The mutation analysis of the coding sequence of the gene was performed by denaturing high-pressure liquid chromatography and direct sequencing. Taken together, 14 sequence variants were identified in 152 autistic patients from 134 German families and 50 unrelated patients from the International Molecular Genetic Study of Autism Consortium affected relative-pair sample. Eleven of these variants were excluded for having an aetiological role as they were either silent mutations, did not cosegregate with autism in the pedigrees of the patients or represented known polymorphisms. The relevance of the three remaining mutations towards the aetiology of autism could not be ruled out, although they were not localised within functional domains of MeCP2 and may be rare polymorphisms. Taking into account the large size of our sample, we conclude that mutations in the coding region of MECP2 do not play a major role in autism susceptibility. Therefore, infantile autism and Rett syndrome probably represent two distinct entities at the molecular genetic level.

  3. Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

    PubMed Central

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

    2015-01-01

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644

  4. The Cipher Code of Simple Sequence Repeats in “Vampire Pathogens”

    PubMed Central

    Zou, Geng; Bello-Orti, Bernardo; Aragon, Virginia; Tucker, Alexander W.; Luo, Rui; Ren, Pinxing; Bi, Dingren; Zhou, Rui; Jin, Hui

    2015-01-01

    Blood inside mammals is a forbidden area for the majority of prokaryotic microbes; however, microbes with red blood cell tropism, like “vampire pathogens” (VP), succeed in matching scarce nutrients and surviving strong immune reactions. Here, we found that VP of Mycoplasma, Rhizobiales, and Rickettsiales showed significantly higher counts of (AG)n dimeric simple sequence repeats (Di-SSRs) in the genomes, coding and non-coding regions than non-vampire pathogens (N_VP). Regression analysis indicated a significant correlation between GC content and the span of (AG)n-Di-SSR variation. Gene Ontology (GO) terms with abundance of (AG)3-Di-SSRs shared by the VP strains were associated with purine nucleotide metabolism (FDR < 0.01), indicating an adaptation to the limited availability of purine and nucleotide precursors in blood. Di-amino acids coded by (AG)n-Di-SSRs included all three six-fold code amino acids (Arg, Leu and Ser), and counts of Di-amino acids coded by (AG)3, (GA)3, and (TC)3 were significantly higher in VP than in N_VP. Furthermore, significant differences (P < 0.001) in the numbers of triplexes formed from (AG)n-Di-SSRs between VP and N_VP in Mycoplasma suggested the potential role of (AG)n-Di-SSRs in gene regulation. PMID:26215592
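
    As a rough illustration of the kind of count described in this abstract, the following Python sketch locates runs of a dinucleotide repeat such as (AG)n in a sequence. The function name `find_di_ssrs`, the default minimum of three repeats, and the toy sequence are assumptions made for illustration; this is not the authors' pipeline.

```python
import re

def find_di_ssrs(seq, unit="AG", min_repeats=3):
    """Return (start, repeat_count) for each run of `unit` repeated
    at least `min_repeats` times (leftmost, greedy matching)."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_repeats))
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(seq.upper())]

# Toy sequence (hypothetical): one (AG)3 run, plus an (AG)2 run that is too short.
hits = find_di_ssrs("TTAGAGAGCCAGAG")
```

    Comparing such counts between groups of genomes would additionally require normalising by genome or region length, which this sketch leaves out.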

  5. A direct sequence spread spectrum code acquisition circuit for wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Ghaisari, Jafar; Ferdosi, Arash

    2011-06-01

    Narrow band (NB), spread spectrum (SS), and ultra wide band (UWB) are three physical layer bandwidth types used in wireless sensor networks (WSN). SS and UWB technologies have many advantages over NB, which make them preferable for WSN. Synchronisation of different nodes in a WSN is an important task that is necessary to improve cooperation and lifetime of nodes. Code acquisition is the main step of a node's time synchronisation. In this article, a pseudo noise code generator and a code acquisition circuit are proposed, designed and tested using the direct sequence SS technique. To investigate the properties of the designed circuits, simulations are carried out via Xilinx Foundation Series software in the real mode. The results demonstrate excellent performance of the proposed algorithms and circuits in all realistic conditions. The code acquisition circuit uses an adaptive testing window for the single dwell serial search method. It is a clock phase free approach, so the clock coherency step is eliminated. Moreover, the clock phase difference between transmitter and receiver nodes has little effect on the acquisition and thus on the synchronisation time.
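
    The idea of code acquisition by serial search can be sketched in a few lines of Python: the receiver tries each cyclic shift of its local pseudo noise code and stops at the first shift whose correlation with the received chips crosses a threshold. This is a plain fixed-threshold serial search, not the adaptive testing window proposed in the article; the 15-chip code, the threshold, and the function names are assumptions chosen for illustration.

```python
# A 15-chip maximal-length (m-)sequence used as the local PN code (assumed example).
PN_CODE = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0]

def correlate(a, b):
    """Agreements minus disagreements between two equal-length chip lists."""
    return sum(1 if x == y else -1 for x, y in zip(a, b))

def serial_search(received, code, threshold):
    """Test cyclic shifts one at a time; return the first shift that passes."""
    for shift in range(len(code)):
        local = code[shift:] + code[:shift]
        if correlate(received, local) >= threshold:
            return shift
    return None  # no shift reached the threshold

# Received chips: the code delayed by 6 chips, with one chip corrupted by noise.
received = PN_CODE[6:] + PN_CODE[:6]
received[3] ^= 1
found_shift = serial_search(received, PN_CODE, threshold=11)
```

    Because an m-sequence correlates weakly with shifted copies of itself, the single corrupted chip lowers the correct peak only slightly and leaves the wrong shifts far below the threshold.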

  6. A novel DNA sequence similarity calculation based on simplified pulse-coupled neural network and Huffman coding

    NASA Astrophysics Data System (ADS)

    Jin, Xin; Nie, Rencan; Zhou, Dongming; Yao, Shaowen; Chen, Yanyan; Yu, Jiefu; Wang, Quan

    2016-11-01

    A novel method for the calculation of DNA sequence similarity is proposed based on simplified pulse-coupled neural network (S-PCNN) and Huffman coding. In this study, we propose a coding method based on Huffman coding, where the triplet code was used as a code bit to transform DNA sequence into numerical sequence. The proposed method uses the firing characters of S-PCNN neurons in DNA sequence to extract features. Besides, the proposed method can deal with different lengths of DNA sequences. First, according to the characteristics of S-PCNN and the DNA primary sequence, the latter is encoded using Huffman coding method, and then using the former, the oscillation time sequence (OTS) of the encoded DNA sequence is extracted. Simultaneously, relevant features are obtained, and finally the similarities or dissimilarities of the DNA sequences are determined by Euclidean distance. In order to verify the accuracy of this method, different data sets were used for testing. The experimental results show that the proposed method is effective.
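
    To make the Huffman step more concrete, the following Python sketch builds a Huffman code over the non-overlapping triplets of a DNA sequence and encodes the sequence as a bit string. Reading the sequence in non-overlapping triplets from position 0, and all names used here, are assumptions; the S-PCNN feature extraction and the Euclidean distance comparison described in the abstract are not reproduced.

```python
import heapq
from collections import Counter
from itertools import count

def triplets(seq):
    """Split a sequence into non-overlapping triplets, dropping any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def huffman_codes(symbols):
    """Build a prefix-free code in which frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()                      # keeps heap entries comparable
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

def encode(seq):
    """Return (bit string, code table) for the triplets of `seq`."""
    syms = triplets(seq)
    codes = huffman_codes(syms)
    return "".join(codes[s] for s in syms), codes

bits, codes = encode("ATGATGATGCCCATGGGA")  # toy sequence (hypothetical)
```

    In this toy input the triplet ATG occurs four times and therefore receives a one-bit code, while the two rare triplets receive two bits each.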

  7. Nucleosomal signatures impose nucleosome positioning in coding and noncoding sequences in the genome

    PubMed Central

    González, Sara; García, Alicia; Vázquez, Enrique; Serrano, Rebeca; Sánchez, Mar; Quintales, Luis; Antequera, Francisco

    2016-01-01

    In the yeast genome, a large proportion of nucleosomes occupy well-defined and stable positions. While the contribution of chromatin remodelers and DNA binding proteins to maintain this organization is well established, the relevance of the DNA sequence to nucleosome positioning in the genome remains controversial. Through quantitative analysis of nucleosome positioning, we show that sequence changes distort the nucleosomal pattern at the level of individual nucleosomes in three species of Schizosaccharomyces and in Saccharomyces cerevisiae. This effect is equally detected in transcribed and nontranscribed regions, suggesting the existence of sequence elements that contribute to positioning. To identify such elements, we incorporated information from nucleosomal signatures into artificial synthetic DNA molecules and found that they generated regular nucleosomal arrays indistinguishable from those of endogenous sequences. Strikingly, this information is species-specific and can be combined with coding information through the use of synonymous codons such that genes from one species can be engineered to adopt the nucleosomal organization of another. These findings open the possibility of designing coding and noncoding DNA molecules capable of directing their own nucleosomal organization. PMID:27662899

  8. The signal sequence coding region promotes nuclear export of mRNA.

    PubMed

    Palazzo, Alexander F; Springer, Michael; Shibata, Yoko; Lee, Chung-Sheng; Dias, Anusha P; Rapoport, Tom A

    2007-12-01

    In eukaryotic cells, most mRNAs are exported from the nucleus by the transcription export (TREX) complex, which is loaded onto mRNAs after their splicing and capping. We have studied in mammalian cells the nuclear export of mRNAs that code for secretory proteins, which are targeted to the endoplasmic reticulum membrane by hydrophobic signal sequences. The mRNAs were injected into the nucleus or synthesized from injected or transfected DNA, and their export was followed by fluorescent in situ hybridization. We made the surprising observation that the signal sequence coding region (SSCR) can serve as a nuclear export signal of an mRNA that lacks an intron or functional cap. Even the export of an intron-containing natural mRNA was enhanced by its SSCR. Like conventional export, the SSCR-dependent pathway required the factor TAP, but depletion of the TREX components had only moderate effects. The SSCR export signal appears to be characterized in vertebrates by a low content of adenines, as demonstrated by genome-wide sequence analysis and by the inhibitory effect of silent adenine mutations in SSCRs. The discovery of an SSCR-mediated pathway explains the previously noted amino acid bias in signal sequences and suggests a link between nuclear export and membrane targeting of mRNAs.

  9. Sequence and developmental expression of mRNA coding for a gap junction protein in Xenopus

    PubMed Central

    1988-01-01

    Cloned complementary DNAs representing the complete coding sequence for an embryonic gap junction protein in the frog Xenopus laevis have been isolated and sequenced. The cDNAs hybridize with an RNA of 1.5 kb that is first detected in gastrulating embryos and accumulates throughout gastrulation and neurulation. By the tailbud stage, the highest abundance of the transcript is found in the region containing ventroposterior endoderm and the rudiment of the liver. In the adult, transcripts are present in the lungs, alimentary tract organs, and kidneys, but are not detected in the brain, heart, body wall and skeletal muscles, spleen, or ovary. The gene encoding this embryonic gap junction protein is present in only one or a few copies in the frog genome. In vitro translation of RNA synthesized from the cDNA template produces a 30-kD protein, as predicted by the coding sequence. This product has extensive sequence similarity to mammalian gap junction proteins in its putative transmembrane and extracellular domains, but has diverged substantially in two of its intracellular domains. PMID:2843548

  10. Episodic sequence memory is supported by a theta-gamma phase code

    PubMed Central

    Heusser, Andrew C.; Poeppel, David; Ezzyat, Youssef; Davachi, Lila

    2016-01-01

    The meaning we derive from our experiences is not a simple static extraction of the elements, but is largely based on the order in which those elements occur. Models propose that sequence encoding is supported by interactions between high and low frequency oscillations, such that elements within an experience are represented by neural cell assemblies firing at higher frequencies (i.e. gamma) and sequential order is coded by the specific timing of firing with respect to a lower frequency oscillation (i.e. theta). During episodic sequence memory formation in humans, we provide evidence that items in different sequence positions exhibit relatively greater gamma power along distinct phases of a theta oscillation. Furthermore, this segregation is related to successful temporal order memory. These results provide compelling evidence that memory for order, a core component of an episodic memory, capitalizes on the ubiquitous physiological mechanism of theta-gamma phase-amplitude coupling. PMID:27571010

  11. Short pulse acquisition by low sampling rate with phase-coded sequence in lidar system

    NASA Astrophysics Data System (ADS)

    Wu, Long; Xu, Jiajia; Lv, Wentao; Yang, Xiaocheng

    2016-11-01

    The requirement of high range resolution results in impractical collection of every returned laser pulse due to the limited response speed of imaging detectors. This paper proposes a phase coded sequence acquisition method for signal preprocessing. The system employs an m-sequence with N bits for demonstration with the detector controlled to accumulate N+1 bits of the echo signals to deduce one single returned laser pulse. An indoor experiment achieved 2 μs resolution with the sampling period of 28 μs by employing a 15-bit m-sequence. This method shows the potential to improve the detection capabilities of narrow laser pulses with the detectors at a low frame rate, especially for the imaging lidar systems. Meanwhile, the lidar system is able to improve the range resolution with available detectors of restricted performance.
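
    For readers unfamiliar with m-sequences, the following Python sketch generates a 15-bit m-sequence with a 4-stage linear feedback shift register and checks the property that makes such sequences useful for coded acquisition: the periodic autocorrelation is 15 at zero shift and -1 at every other shift. The tap positions, the initial state, and the function names are assumptions for illustration; the accumulation and decoding scheme of the paper is not reproduced.

```python
def m_sequence(taps=(4, 3), nbits=4):
    """Generate one period of an m-sequence from a Fibonacci LFSR.

    `taps` are 1-based register stages XORed to form the feedback bit.
    """
    state = [1] * nbits                 # any non-zero start state works
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])           # output the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1] # shift the register by one stage
    return out

def periodic_autocorrelation(bits, shift):
    """Correlate the +/-1 version of `bits` with its cyclic shift."""
    n = len(bits)
    return sum((1 - 2 * bits[i]) * (1 - 2 * bits[(i + shift) % n])
               for i in range(n))

seq = m_sequence()
peaks = [periodic_autocorrelation(seq, s) for s in range(len(seq))]
```

    The single sharp peak in `peaks` is what allows a returned pulse to be located even when individual samples are coarse.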

  12. Mutations analysis of C1 inhibitor coding sequence gene among Portuguese patients with hereditary angioedema.

    PubMed

    Martinho, A; Mendes, J; Simões, O; Nunes, R; Gomes, J; Dias Castro, E; Leiria-Pinto, P; Ferreira, M B; Pereira, C; Castel-Branco, M G; Pais, L

    2013-04-01

    Mutations that modify the amino acid sequence of C1-INH (except Val458Met) are associated with HAE. More than 200 different mutations scattered across the entire C1-INH gene have been reported. The main objective of this study was to report the mutational findings in a HAE cohort of 138 Portuguese patients followed in specialized consultation all over the country. DNA was extracted from peripheral blood with QiaSymphony BioRobot (QIAGEN Portugal). The sequence reactions were performed by using a DNA sequencing kit (Big Dye terminator cycle sequencing v1.1/v3.1 from Applied Biosystems) and sequencing products were immediately submitted to direct sequencing on an Applied Biosystems 3130 DNA Analyser. DNA sequences were analyzed at four different stages. Raw data and sequence alignments of all 8 exons and intron-exon boundaries were performed for each patient individually with SeqScape software and using SERPING1 gene NG_009625 of 24,300 bp (12-March-2011) as reference sequence. Sequence comparisons among patients and controls were performed with software CodonCode Aligner v.3.7 from CodonCode Corp and with Geneious 4.5 from Biomatters Lda. A total of 94 point mutations were observed among patients, and 67% of them were located on exon 8. In addition, we noticed one previously undescribed stop codon at position c.1459 C>T in three different patients. Translation termination was also found on exons 3 and 7, as a result of mutations at positions c.481A>T and c.1174C>T. In this population, the prevalence of the missense mutation p.Arg444Cys was 39 out of 42. Mutational analysis revealed 22 different pathogenic mutations, of which 64% were not described in the HAE database. Although identification of disease-causing mutations is not necessary to establish HAE diagnosis, studies on gene expression and characterization of rearrangements in the SERPING1 gene are suggested in order to get new insights on function and genetic tests of C1 inhibitor. Copyright © 2012 Elsevier Ltd. All rights reserved.

  13. The sequence coding and search system: An approach for constructing and analyzing event sequences at commercial nuclear power plants

    SciTech Connect

    Mays, G.T.

    1989-04-01

    The US Nuclear Regulatory Commission (NRC) has recognized the importance of the collection, assessment, and feedback of operating experience data from commercial nuclear power plants and has centralized these activities in the Office for Analysis and Evaluation of Operational Data (AEOD). Such data is essential for performing safety and reliability analyses, especially analyses of trends and patterns to identify undesirable changes in plant performance at the earliest opportunity to implement corrective measures to preclude the occurrence of a more serious event. One of NRC's principal tools for collecting and evaluating operating experience data is the Sequence Coding and Search System (SCSS). The SCSS consists of a methodology for structuring event sequences and the requisite computer system to store and search the data. The source information for SCSS is the Licensee Event Report (LER), which is a legally required document. This paper describes the objective of SCSS, the information it contains, and the format and approach for constructing SCSS event sequences. Examples are presented demonstrating the use of SCSS to support the analysis of LER data. The SCSS contains over 30,000 LERs describing events from 1980 through the present. Insights gained from working with a complex data system from the initial developmental stage to the point of a mature operating system are highlighted.

  14. Independent theta phase coding accounts for CA1 population sequences and enables flexible remapping

    PubMed Central

    Chadwick, Angus; van Rossum, Mark CW; Nolan, Matthew F

    2015-01-01

    Hippocampal place cells encode an animal's past, current, and future location through sequences of action potentials generated within each cycle of the network theta rhythm. These sequential representations have been suggested to result from temporally coordinated synaptic interactions within and between cell assemblies. Instead, we find through simulations and analysis of experimental data that rate and phase coding in independent neurons is sufficient to explain the organization of CA1 population activity during theta states. We show that CA1 population activity can be described as an evolving traveling wave that exhibits phase coding, rate coding, spike sequences and that generates an emergent population theta rhythm. We identify measures of global remapping and intracellular theta dynamics as critical for distinguishing mechanisms for pacemaking and coordination of sequential population activity. Our analysis suggests that, unlike synaptically coupled assemblies, independent neurons flexibly generate sequential population activity within the duration of a single theta cycle. DOI: http://dx.doi.org/10.7554/eLife.03542.001 PMID:25643396

  15. MIMO Radar System for Respiratory Monitoring Using Tx and Rx Modulation with M-Sequence Codes

    NASA Astrophysics Data System (ADS)

    Miwa, Takashi; Ogiwara, Shun; Yamakoshi, Yoshiki

    The importance of respiratory monitoring systems during sleep has increased due to early diagnosis of sleep apnea syndrome (SAS) in the home. This paper presents a simple respiratory monitoring system suitable for home use having 3D ranging of targets. The range resolution and azimuth resolution are obtained by a stepped frequency transmitting signal and MIMO arrays with preferred pair M-sequence codes doubly modulating in transmission and reception, respectively. Due to the use of these codes, Gold sequence codes corresponding to all the antenna combinations are equivalently modulated in the receiver. The signal to interchannel interference ratio of the reconstructed image is evaluated by numerical simulations. The results of experiments on a developed prototype 3D-MIMO radar system show that this system can extract only the motion of respiration of a human subject 2 m apart from a metallic rotatable reflector. Moreover, it is found that this system can successfully measure the respiration information of sleeping human subjects for 96.6 percent of the whole measurement time except for instances of large posture change.

  16. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

    PubMed Central

    Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10(-7)), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10(-7)); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10(-7) and OR = 1.09, P = 7.4 × 10(-8)); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10(-9)), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10(-6)). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10(-4)) and DNA mismatch repair genes (P = 6.1 × 10(-4)) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  17. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer.

    PubMed

    Timofeeva, Maria N; Kinnersley, Ben; Farrington, Susan M; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J; Harris, Sarah E; Northwood, Emma L; Barrett, Jennifer H; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G; Houlston, Richard S

    2015-11-10

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10(-7)), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10(-7)); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10(-7) and OR = 1.09, P = 7.4 × 10(-8)); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10(-9)), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10(-6)). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10(-4)) and DNA mismatch repair genes (P = 6.1 × 10(-4)) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC.

  18. cDNA sequence of human transforming gene hst and identification of the coding sequence required for transforming activity

    SciTech Connect

    Taira, M.; Yoshida, T.; Miyagawa, K.; Sakamoto, H.; Terada, M.; Sugimura, T.

    1987-05-01

    The hst gene was originally identified as a transforming gene in DNAs from human stomach cancers and from a noncancerous portion of stomach mucosa by DNA-mediated transfection assay using NIH3T3 cells. cDNA clones of hst were isolated from the cDNA library constructed from poly(A)+ RNA of a secondary transformant induced by the DNA from a stomach cancer. The sequence analysis of the hst cDNA revealed the presence of two open reading frames. When this cDNA was inserted into an expression vector containing the simian virus 40 promoter, it efficiently induced the transformation of NIH3T3 cells upon transfection. It was found that one of the reading frames, which coded for 206 amino acids, was responsible for the transforming activity.
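
    As a small illustration of what identifying open reading frames in a cDNA involves, the following Python sketch scans the three forward reading frames for stretches that begin with ATG and end at the first in-frame stop codon. The function name, the minimum length, and the toy sequence are assumptions; this is a generic scan, not the analysis performed in the study, and it ignores the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    `end` is exclusive and includes the stop codon; `min_codons`
    counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until a stop codon or the sequence end.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break               # no in-frame stop: give up on this frame
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                i = j + 3               # resume scanning after the stop codon
            else:
                i += 3
    return orfs

toy = "CCATGAAATAGGG"                   # hypothetical sequence with one short ORF
orfs = find_orfs(toy)
```

    An ORF coding for 206 amino acids, as reported in the abstract, would span 206 codons plus a stop codon, that is 621 nucleotides.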

  19. Inference of mutation parameters and selective constraint in mammalian coding sequences by approximate Bayesian computation.

    PubMed

    Keightley, Peter D; Eöry, Lél; Halligan, Daniel L; Kirkpatrick, Mark

    2011-04-01

    We develop an inference method that uses approximate Bayesian computation (ABC) to simultaneously estimate mutational parameters and selective constraint on the basis of nucleotide divergence for protein-coding genes between pairs of species. Our simulations explicitly model CpG hypermutability and transition vs. transversion mutational biases along with negative and positive selection operating on synonymous and nonsynonymous sites. We evaluate the method by simulations in which true mean parameter values are known and show that it produces reasonably unbiased parameter estimates as long as sequences are not too short and sequence divergence is not too low. We show that the use of quadratic regression within ABC offers an improvement over linear regression, but that weighted regression has little impact on the efficiency of the procedure. We apply the method to estimate mutational and selective constraint parameters in data sets of protein-coding genes extracted from the genome sequences of primates, murids, and carnivores. Estimates of CpG hypermutability are substantially higher in primates than murids and carnivores. Nonsynonymous site selective constraint is substantially higher in murids and carnivores than primates, and autosomal nonsynonymous constraint is higher than X-chromosome constraint in all taxa. We detect significant selective constraint at synonymous sites in primates, carnivores, and murid rodents. Synonymous site selective constraint is weakest in murids, a surprising result, considering that murid effective population sizes are likely to be considerably higher than the other two taxa.
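
    The basic logic of ABC can be shown with a deliberately simple Python sketch: draw a parameter from a prior, simulate data, and keep the draw only if the simulated summary statistic is close to the observed one. This is plain rejection ABC on a single divergence parameter, without the regression adjustment or the mutation and selection model described in the abstract; the uniform prior, the tolerance, the toy data, and all names are assumptions.

```python
import random

def simulate_substitutions(divergence, n_sites, rng):
    """Simulate how many of `n_sites` differ, each with probability `divergence`."""
    return sum(rng.random() < divergence for _ in range(n_sites))

def abc_rejection(observed, n_sites, n_draws=5000, tolerance=2, seed=0):
    """Return prior draws whose simulated count lies within `tolerance` of `observed`."""
    rng = random.Random(seed)              # fixed seed for reproducibility
    accepted = []
    for _ in range(n_draws):
        d = rng.uniform(0.0, 0.5)          # assumed flat prior on divergence
        simulated = simulate_substitutions(d, n_sites, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(d)
    return accepted

# Toy data (hypothetical): 20 differences observed at 200 aligned sites.
posterior = abc_rejection(observed=20, n_sites=200)
posterior_mean = sum(posterior) / len(posterior)
```

    The accepted draws approximate the posterior distribution of the divergence; a tighter tolerance gives a better approximation at the cost of fewer accepted draws.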

  20. A local multiple alignment method for detection of non-coding RNA sequences.

    PubMed

    Tabei, Yasuo; Asai, Kiyoshi

    2009-06-15

    Non-coding RNAs (ncRNAs) show a unique evolutionary process in which the substitutions of distant bases are correlated in order to conserve the secondary structure of the ncRNA molecule. Therefore, the multiple alignment method for the detection of ncRNAs should take into account both the primary sequence and the secondary structure. Recently, there has been intense focus on multiple alignment investigations for the detection of ncRNAs; however, most of the proposed methods are designed for global multiple alignments. For this reason, these methods are not appropriate to identify locally conserved ncRNAs among genomic sequences. A more efficient local multiple alignment method for the detection of ncRNAs is required. We propose a new local multiple alignment method for the detection of ncRNAs. This method uses a local multiple alignment construction procedure inspired by ProDA, which is a local multiple aligner program for protein sequences with repeated and shuffled elements. To align sequences based on secondary structure information, we propose a new alignment model which incorporates secondary structure features. We define the conditional probability of an alignment via a conditional random field and use a gamma-centroid estimator to align sequences. The locally aligned subsequences are clustered into blocks of approximately globally alignable subsequences between pairwise alignments. Finally, these blocks are multiply aligned via MXSCARNA. In benchmark experiments, we demonstrate the high ability of the implemented software, SCARNA_LM, for local multiple alignment for the detection of ncRNAs. The C++ source code for SCARNA_LM and its experimental datasets are available at http://www.ncrna.org/software/scarna_lm/download. Supplementary data are available at Bioinformatics online.

  1. An improved and validated RNA HLA class I SBT approach for obtaining full length coding sequences.

    PubMed

    Gerritsen, K E H; Olieslagers, T I; Groeneweg, M; Voorter, C E M; Tilanus, M G J

    2014-11-01

    The functional relevance of human leukocyte antigen (HLA) class I allele polymorphism beyond exons 2 and 3 is difficult to address because more than 70% of the HLA class I alleles are defined by exons 2 and 3 sequences only. For routine application on clinical samples we improved and validated the HLA sequence-based typing (SBT) approach based on RNA templates, using either a single locus-specific or two overlapping group-specific polymerase chain reaction (PCR) amplifications, with three forward and three reverse sequencing reactions for full length sequencing. Locus-specific HLA typing with RNA SBT of a reference panel, representing the major antigen groups, showed identical results compared to DNA SBT typing. Alleles encountered with unknown exons in the IMGT/HLA database and three samples, two with Null and one with a Low expressed allele, have been addressed by the group-specific RNA SBT approach to obtain full length coding sequences. This RNA SBT approach has proven its value in our routine full length definition of alleles.

  2. Cloning and sequencing of the gene coding for the large subunit of methylamine dehydrogenase from Thiobacillus versutus.

    PubMed Central

    Huitema, F; van Beeumen, J; van Driessche, G; Duine, J A; Canters, G W

    1993-01-01

    The gene that codes for the alpha-subunit of methylamine dehydrogenase from Thiobacillus versutus, madA, was cloned and sequenced. It codes for a protein of 395 amino acids preceded by a leader sequence of 31 amino acids. The derived amino acid sequence was confirmed by partial amino acid sequencing. The start of the mature protein could not be determined by direct sequencing, since the N terminus appeared to be blocked. Instead, it was determined by electrospray mass spectrometry. Confirmation of the results was obtained by sequencing the N terminus after pyroglutamate aminopeptidase digestion. The sequence is homologous to the Paracoccus denitrificans nucleotide sequence. A second open reading frame, called open reading frame 3, is located immediately downstream of madA. PMID:8407797

  3. Rare coding mutations identified by sequencing of Alzheimer disease genome‐wide association studies loci

    PubMed Central

    Vardarajan, Badri N.; Ghani, Mahdi; Kahn, Amanda; Sheikh, Stephanie; Sato, Christine; Barral, Sandra; Lee, Joseph H.; Cheng, Rong; Reitz, Christiane; Lantigua, Rafael; Reyes‐Dumeyer, Dolly; Medrano, Martin; Jimenez‐Velazquez, Ivonne Z.; Rogaeva, Ekaterina; St George‐Hyslop, Peter

    2015-01-01

    Objective To detect rare coding variants underlying loci detected by genome‐wide association studies (GWAS) of late onset Alzheimer disease (LOAD). Methods We conducted targeted sequencing of ABCA7, BIN1, CD2AP, CLU, CR1, EPHA1, MS4A4A/MS4A6A, and PICALM in 3 independent LOAD cohorts: 176 patients from 124 Caribbean Hispanics families, 120 patients and 33 unaffected individuals from the 129 National Institute on Aging LOAD Family Study; and 263 unrelated Canadian individuals of European ancestry (210 sporadic patients and 53 controls). Rare coding variants found in at least 2 data sets were genotyped in independent groups of ancestry‐matched controls. Additionally, the Exome Aggregation Consortium was used as a reference data set for population‐based allele frequencies. Results Overall we detected a statistically significant 3.1‐fold enrichment of the nonsynonymous mutations in the Caucasian LOAD cases compared with controls (p = 0.002) and no difference in synonymous variants. A stop‐gain mutation in ABCA7 (E1679X) and missense mutation in CD2AP (K633R) were highly significant in Caucasian LOAD cases, and mutations in EPHA1 (P460L) and BIN1 (K358R) were significant in Caribbean Hispanic families with LOAD. The EPHA1 variant segregated completely in an extended Caribbean Hispanic family and was also nominally significant in the Caucasians. Additionally, BIN1 (K358R) segregated in 2 of the 6 Caribbean Hispanic families where the mutations were discovered. Interpretation Targeted sequencing of confirmed GWAS loci revealed an excess burden of deleterious coding mutations in LOAD, with the greatest burden observed in ABCA7 and BIN1. Identifying coding variants in LOAD will facilitate the creation of tractable models for investigation of disease‐related mechanisms and potential therapies. Ann Neurol 2015;78:487–498 PMID:26101835

  4. Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

    PubMed Central

    Hemberg, Martin; Gray, Jesse M.; Cloonan, Nicole; Kuersten, Scott; Grimmond, Sean; Greenberg, Michael E.; Kreiman, Gabriel

    2012-01-01

    More than 98% of a typical vertebrate genome does not code for proteins. Although non-coding regions are sprinkled with short (<200 bp) islands of evolutionarily conserved sequences, the function of most of these unannotated conserved islands remains unknown. One possibility is that unannotated conserved islands could encode non-coding RNAs (ncRNAs); alternatively, unannotated conserved islands could serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers. Here we assess these possibilities by comparing unannotated conserved islands in the human and mouse genomes to transcribed regions and to RFBSs, relying on a detailed case study of one human and one mouse cell type. We define transcribed regions by applying a novel transcript-calling algorithm to RNA-Seq data obtained from total cellular RNA, and we define RFBSs using ChIP-Seq and DNAse-hypersensitivity assays. We find that unannotated conserved islands are four times more likely to coincide with RFBSs than with unannotated ncRNAs. Thousands of conserved RFBSs can be categorized as insulators based on the presence of CTCF or as enhancers based on the presence of p300/CBP and H3K4me1. While many unannotated conserved RFBSs are transcriptionally active to some extent, the transcripts produced tend to be unspliced, non-polyadenylated and expressed at levels 10 to 100-fold lower than annotated coding or ncRNAs. Extending these findings across multiple cell types and tissues, we propose that most conserved non-coding genomic DNA in vertebrate genomes corresponds to promoter-distal regulatory elements. PMID:22684627
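
    The comparison described in this abstract comes down to asking what fraction of conserved islands coincides with each annotation class. A minimal sketch of that overlap count follows; the interval representation (chromosome, start, end; half-open coordinates) and the function names are assumptions made here for illustration, not the authors' pipeline.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(islands, features):
    """Fraction of islands that overlap at least one feature interval."""
    if not islands:
        return 0.0
    hits = sum(1 for isl in islands if any(overlaps(isl, f) for f in features))
    return hits / len(islands)
```

    Running `fraction_overlapping` once with binding-site intervals and once with transcribed-region intervals gives the two fractions whose ratio the abstract reports. The quadratic scan is adequate for a small example; genome-scale data would call for sorted intervals or an interval tree.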

  5. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes.

    PubMed

    Röske, Kerstin; Foecking, Mark F; Yooseph, Shibu; Glass, John I; Calcutt, Michael J; Wise, Kim S

    2010-07-13

    -genomic shuffling. We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches.

  6. RT-PCR amplification of the complete NF1 coding sequence

    SciTech Connect

    Ming Hong Shen; Meena Upadhyaya

    1994-09-01

    Neurofibromatosis type 1 (NF1) is a common autosomal dominant disorder. The NF1 gene is a large gene, 350kb in size, with at least 51 exons. It has proved hard to detect mutations in the gene by examining genomic DNA due to the high mutation rate and the large size of the gene. Since the cloning of the gene, only 45 causative mutations have been reported from over 500 unrelated NF1 patients screened. The coding sequence of the NF1 gene is approximately 3% of the genomic sequence; it will therefore be easier to search for unknown mutations by the study of mRNA. We describe a simple RT-PCR-based strategy to amplify the total coding sequence of the NF1 transcript from peripheral blood lymphocyte RNA. This strategy involves an initial cDNA synthesis step utilizing a set of random hexamers, followed by two consecutive rounds of PCR amplifications. The first round of amplification was performed using four NF1-specific nested primer pairs. This amplification allows the construction of overlapping fragments which span a 8694 bp cDNA sequence of the gene. For mutation analysis, the amplified products or their digests were subjected to electrophoresis on Hydrolink gels. Two disease-causing mutations, a 3 bp deletion in exon 17 and a 10 bp deletion in exon 44, originally detected in the genomic DNA from two unrelated NF1 patients, have been confirmed at the RNA level. The combination of this strategy with other established techniques such as SSCP, chemical cleavage of mismatch, protein truncation test (PTT) and quantitative PCR should greatly facilitate mutation and expression analyses in the NF1 gene.

  7. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    PubMed Central

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Christina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Dozor, Allen; Drumm, Mitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775
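
    The gene-level "burden" idea in this abstract can be sketched in a few lines: collapse the rare variants of one gene into a single per-individual count, then ask how the trait changes with that count. The code below is a simplified illustration under stated assumptions (genotypes coded as 0/1/2 alternate-allele counts, a frequency cutoff on the in-sample alternate-allele frequency, and a plain least-squares slope with no covariates); it is not the statistical test used in the study.

```python
def burden_scores(genotypes, max_alt_freq=0.01):
    """Per-individual count of alternate alleles at rare variants of one gene.

    genotypes: one row per individual, one column per variant, entries 0, 1 or 2.
    A variant counts as rare if its in-sample alternate-allele frequency is
    above 0 and at most max_alt_freq (the cutoff is an illustrative assumption).
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    rare = []
    for j in range(n_variants):
        alt_freq = sum(row[j] for row in genotypes) / (2 * n)
        if 0 < alt_freq <= max_alt_freq:
            rare.append(j)
    return [sum(row[j] for j in rare) for row in genotypes]

def regression_slope(x, y):
    """Least-squares slope of y on x (change in trait per unit of burden)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx if sxx > 0 else 0.0
```

    Passing the output of `burden_scores` together with a list of LDL-C values to `regression_slope` gives a crude per-gene effect estimate. A real analysis would adjust for covariates and ancestry and would attach a significance test.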

  8. Multiplex iterative plasmid engineering for combinatorial optimization of metabolic pathways and diversification of protein coding sequences.

    PubMed

    Li, Yifan; Gu, Qun; Lin, Zhenquan; Wang, Zhiwen; Chen, Tao; Zhao, Xueming

    2013-11-15

    Engineering complex biological systems typically requires combinatorial optimization to achieve the desired functionality. Here, we present Multiplex Iterative Plasmid Engineering (MIPE), which is a highly efficient and customized method for combinatorial diversification of plasmid sequences. MIPE exploits ssDNA mediated λ Red recombineering for the introduction of mutations, allowing it to target several sites simultaneously and generate libraries of up to 10^7 sequences in one reaction. We also describe "restriction digestion mediated co-selection (RD CoS)", which enables MIPE to produce enhanced recombineering efficiencies with greatly simplified coselection procedures. To demonstrate this approach, we applied MIPE to fine-tune gene expression level in the 5-gene riboflavin biosynthetic pathway and successfully isolated a clone with 2.67-fold improved production in less than a week. We further demonstrated the ability of MIPE for highly multiplexed diversification of protein coding sequence by simultaneously targeting 23 codons scattered along the 750 bp sequence. We anticipate this method to benefit the optimization of diverse biological systems in synthetic biology and metabolic engineering.

  9. Verona Coding Definitions of Emotional Sequences (VR-CoDES): Conceptual framework and future directions.

    PubMed

    Piccolo, Lidia Del; Finset, Arnstein; Mellblom, Anneli V; Figueiredo-Braga, Margarida; Korsvold, Live; Zhou, Yuefang; Zimmermann, Christa; Humphris, Gerald

    2017-06-21

    To discuss the theoretical and empirical framework of VR-CoDES and potential future direction in research based on the coding system. The paper is based on selective review of papers relevant to the construction and application of VR-CoDES. VR-CoDES system is rooted in patient-centered and biopsychosocial model of healthcare consultations and on a functional approach to emotion theory. According to the VR-CoDES, emotional interaction is studied in terms of sequences consisting of an eliciting event, an emotional expression by the patient and the immediate response by the clinician. The rationale for the emphasis on sequences, on detailed classification of cues and concerns, and on the choices of explicit vs. non-explicit responses and providing vs. reducing room for further disclosure, as basic categories of the clinician responses, is described. Results from research on VR-CoDES may help raise awareness of emotional sequences. Future directions in applying VR-CoDES in research may include studies on predicting patient and clinician behavior within the consultation, qualitative analyses of longer sequences including several VR-CoDES triads, and studies of effects of emotional communication on health outcomes. VR-CoDES may be applied to develop interventions to promote good handling of patients' emotions in healthcare encounters. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

    PubMed

    Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

    2014-02-06

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.

  11. Coding and decoding libraries of sequence-defined functional copolymers synthesized via photoligation

    PubMed Central

    Zydziak, Nicolas; Konrad, Waldemar; Feist, Florian; Afonin, Sergii; Weidner, Steffen; Barner-Kowollik, Christopher

    2016-01-01

    Designing artificial macromolecules with absolute sequence order represents a considerable challenge. Here we report an advanced light-induced avenue to monodisperse sequence-defined functional linear macromolecules up to decamers via a unique photochemical approach. The versatility of the synthetic strategy—combining sequential and modular concepts—enables the synthesis of perfect macromolecules varying in chemical constitution and topology. Specific functions are placed at arbitrary positions along the chain via the successive addition of monomer units and blocks, leading to a library of functional homopolymers, alternating copolymers and block copolymers. The in-depth characterization of each sequence-defined chain confirms the precision nature of the macromolecules. Decoding of the functional information contained in the molecular structure is achieved via tandem mass spectrometry without recourse to their synthetic history, showing that the sequence information can be read. We submit that the presented photochemical strategy is a viable and advanced concept for coding individual monomer units along a macromolecular chain. PMID:27901024

  12. Polarity Effects in the hisG Gene of Salmonella Require a Site within the Coding Sequence

    PubMed Central

    Ciampi, M. S.; Roth, J. R.

    1988-01-01

    A single site in the middle of the coding sequence of the hisG gene of Salmonella is required for most of the polar effect of mutations in this gene. Nonsense and insertion mutations mapping upstream of this point in the hisG gene all have strong polar effects on expression of downstream genes in the operon; mutations mapping promoter distal to this site have little or no polar effect. Two previously known hisG mutations, mapping in the region of the polarity site, abolish the polarity effect of insertion mutations mapping upstream of this region. New polarity site mutations have been selected which have lost the polar effect of upstream nonsense mutations. All mutations abolishing the function of the site are small deletions; three are identical, 28-bp deletions which have arisen independently. A fourth mutation is a deletion of 16 base pairs internal to the larger deletion. Several point mutations within this 16-bp region have no effect on the function of the polarity site. We believe that a small number of polarity sites of this type are responsible for polarity in all genes. The site in the hisG gene is more easily detected than most because it appears to be the only such site in the hisG gene and because it maps in the center of the coding sequence. PMID:3282985

  13. The correlation of protein hydropathy with the base composition of coding sequences.

    PubMed

    D'Onofrio, G; Jabbari, K; Musto, H; Bernardi, G

    1999-09-30

    The "universal correlation" (D'Onofrio, G., Bernardi, G., 1992. A universal compositional correlation among codon positions. Gene 110, 81-88.) that holds between and or ( values are the average values of the coding sequences of each genome analyzed) at both the inter- and intra-genomic level, was re-analyzed on a vastly larger dataset. The results showed a slight, but significant, difference in the vs. correlations exhibited by prokaryotes and eukaryotes. This finding prompted an analysis of the correlation between and the amino acid frequencies in the encoded proteins, which has shown that positive correlations exist between values of coding sequences and the hydropathy of the corresponding proteins. These correlations are due to the fact that hydrophobic and amphipathic amino acids increase, whereas hydrophilic amino acids decrease with increasing values. Hydropathy values of prokaryotic proteins are systematically higher than those of eukaryotes, but the slopes of the regression lines are identical. The lower hydrophobicity of eukaryotic proteins is due to differences in the amino acid composition. In particular, the twofold higher cysteine (and disulfide bond) level of eukaryotic proteins compared to prokaryotic proteins most probably compensates for their lower hydrophobicity. This supports the viewpoint that hydrophobicity plays a structural and functional role as far as protein stability is concerned.

  14. Optimization of Mutation Pressure in Relation to Properties of Protein-Coding Sequences in Bacterial Genomes

    PubMed Central

    Błażej, Paweł; Miasojedow, Błażej; Grabińska, Małgorzata; Mackiewicz, Paweł

    2015-01-01

    Most mutations are deleterious and require energetically costly repairs. Therefore, it seems that any minimization of mutation rate is beneficial. On the other hand, mutations generate genetic diversity indispensable for evolution and adaptation of organisms to changing environmental conditions. Thus, it is expected that a spontaneous mutational pressure should be an optimal compromise between these two extremes. In order to study the optimization of the pressure, we compared mutational transition probability matrices from bacterial genomes with artificial matrices fulfilling the same general features as the real ones, e.g., the stationary distribution and the speed of convergence to the stationarity. The artificial matrices were optimized on real protein-coding sequences based on Evolutionary Strategies approach to minimize or maximize the probability of non-synonymous substitutions and costs of amino acid replacements depending on their physicochemical properties. The results show that the empirical matrices have a tendency to minimize the effects of mutations rather than maximize their costs on the amino acid level. They were also similar to the optimized artificial matrices in the nucleotide substitution pattern, especially the high transitions/transversions ratio. We observed no substantial differences between the effects of mutational matrices on protein-coding sequences in genomes under study in respect of differently replicated DNA strands, mutational cost types and properties of the referenced artificial matrices. The findings indicate that the empirical mutational matrices are rather adapted to minimize mutational costs in the studied organisms in comparison to other matrices with similar mathematical constraints. PMID:26121655
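
    Two of the matrix properties named in this abstract, the stationary distribution and the transition/transversion ratio, can be computed directly from a 4x4 nucleotide transition probability matrix. The sketch below assumes a row-stochastic matrix indexed in the order A, C, G, T and uses plain power iteration; the function names and the convergence tolerance are illustrative choices, not taken from the paper.

```python
NUCLEOTIDES = "ACGT"
# Transitions are purine<->purine (A<->G) and pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def stationary_distribution(matrix, tol=1e-13, max_iter=100000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of pi <- pi * P.
        nxt = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi

def ts_tv_ratio(matrix):
    """Expected transitions per expected transversion at stationarity."""
    pi = stationary_distribution(matrix)
    ts = tv = 0.0
    for i, a in enumerate(NUCLEOTIDES):
        for j, b in enumerate(NUCLEOTIDES):
            if i == j:
                continue  # no substitution
            if (a, b) in TRANSITIONS:
                ts += pi[i] * matrix[i][j]
            else:
                tv += pi[i] * matrix[i][j]
    return ts / tv
```

    For a symmetric matrix in which each nucleotide has one transition of probability 0.04 and two transversions of probability 0.01 each, the stationary distribution is uniform and the ratio is 0.04 / 0.02 = 2.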

  15. A probabilistic coding based quantum genetic algorithm for multiple sequence alignment.

    PubMed

    Huo, Hongwei; Xie, Qiaoluan; Shen, Xubang; Stojkovic, Vojislav

    2008-01-01

    This paper presents an original Quantum Genetic algorithm for Multiple sequence ALIGNment (QGMALIGN) that combines a genetic algorithm and a quantum algorithm. A quantum probabilistic coding is designed for representing the multiple sequence alignment. A quantum rotation gate is used as a mutation operator to guide the quantum state evolution. Six genetic operators are designed on the coding basis to improve the solution during the evolutionary process. The features of implicit parallelism and state superposition in quantum mechanics and the global search capability of the genetic algorithm are exploited to get efficient computation. A set of well-known test cases from BAliBASE2.0 is used as reference to evaluate the efficiency of the QGMALIGN optimization. The QGMALIGN results have been compared with the results of the most popular methods (CLUSTALX, SAGA, DIALIGN, SB_PIMA, and QGMALIGN). The results show that QGMALIGN performs well on the biological data presented. The addition of genetic operators to the quantum algorithm lowers the overall running time.

  16. RNA Sequencing and Co-expressed Long Non-coding RNA in Modern and Wild Wheats.

    PubMed

    Cagirici, Halise Busra; Alptekin, Burcu; Budak, Hikmet

    2017-09-06

    There is an urgent need for the improvement of drought-tolerant bread and durum wheat. The huge and complex genome of bread wheat (BBAADD genome) stands as a vital obstruction for understanding the molecular mechanism underlying drought tolerance. However, tetraploid wheat (Triticum turgidum ssp., BBAA genome) is an ancestor of modern bread wheat and offers an important model for studying the drought response due to its less complex genome. Additionally, several wild relatives of tetraploid wheat have already shown a significant drought tolerance. We sequenced the root transcriptome of three tetraploid wheat varieties with varying stress tolerance profiles, and built a differential expression library of their transcripts under control and drought conditions. More than 5,000 differentially expressed transcripts were identified from each genotype. Functional characterization of transcripts specific to the drought-tolerant genotype revealed their association with osmolyte production and secondary metabolite pathways. Comparative analysis of differentially expressed genes and their non-coding RNA partners, long noncoding RNAs and microRNAs, provided valuable insight into gene expression regulation in response to drought stress. LncRNAs as well as coding transcripts share similar structural features in different tetraploid species; yet, lncRNAs slightly differ from coding transcripts. Several miRNA-lncRNA target pairs were detected as differentially expressed in drought stress. Overall, this study suggested an important pool of transcripts whose manipulation confers better performance on wheat varieties under drought stress.

  17. Adaptive three-dimensional motion-compensated wavelet transform for image sequence coding

    NASA Astrophysics Data System (ADS)

    Leduc, Jean-Pierre

    1994-09-01

    This paper describes a 3D spatio-temporal coding algorithm for the bit-rate compression of digital-image sequences. The coding scheme is based on several specific features, namely a motion representation with a four-parameter affine model, a motion-adapted temporal wavelet decomposition along the motion trajectories, and a signal-adapted spatial wavelet transform. The motion estimation is performed on the basis of four-parameter affine transformation models, also called similitudes. This transformation takes into account translations, rotations and scalings. The temporal wavelet filter bank exploits bi-orthogonal linear-phase dyadic decompositions. The 2D spatial decomposition is based on dyadic signal-adaptive filter banks with either para-unitary or bi-orthogonal bases. The adaptive filtering is carried out according to a performance criterion to be optimized under constraints in order to eventually maximize the compression ratio at the expense of graceful degradations of the subjective image quality. The major principles of the present technique are, in the analysis process, to extract and to separate the motion contained in the sequences from the spatio-temporal redundancy and, in the compression process, to take account of the rate-distortion function on the basis of the spatio-temporal psycho-visual properties to achieve the most graceful degradations. To complete this description of the coding scheme, the compression procedure is composed of scalar quantizers, which exploit the spatio-temporal 3D psycho-visual properties of the Human Visual System, and of entropy coders, which finalize the bit-rate compression.
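
    A four-parameter similitude combines one uniform scale, one rotation angle and a two-component translation. The sketch below shows one common parameterization; the function name `similitude` and the parameter names are assumptions for illustration, and the paper may parameterize its motion model differently.

```python
import math


def similitude(x, y, scale, theta, tx, ty):
    """Map point (x, y) through a four-parameter similitude.

    Parameters (all hypothetical names): uniform scale, rotation angle theta
    in radians, and translation (tx, ty). The point is rotated and scaled
    about the origin, then translated.
    """
    c, s = math.cos(theta), math.sin(theta)
    x_new = scale * (c * x - s * y) + tx
    y_new = scale * (s * x + c * y) + ty
    return x_new, y_new


if __name__ == "__main__":
    # Rotate (1, 0) by 90 degrees, double its length, then shift by (3, 4).
    print(similitude(1.0, 0.0, scale=2.0, theta=math.pi / 2, tx=3.0, ty=4.0))
```

    A general affine model would need six parameters; restricting it to four removes shear and anisotropic scaling, which matches the abstract's list of translations, rotations and scalings.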

  18. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influences asthma and atopic phenotypes in diverse populations.

  19. Both Maintenance and Avoidance of RNA-Binding Protein Interactions Constrain Coding Sequence Evolution.

    PubMed

    Savisaar, Rosina; Hurst, Laurence D

    2017-05-01

    While the principal force directing coding sequence (CDS) evolution is selection on protein function, to ensure correct gene expression CDSs must also maintain interactions with RNA-binding proteins (RBPs). Understanding how our genes are shaped by these RNA-level pressures is necessary for diagnostics and for improving transgenes. However, the evolutionary impact of the need to maintain RBP interactions remains unresolved. Are coding sequences constrained by the need to specify RBP binding motifs? If so, what proportion of mutations are affected? Might sequence evolution also be constrained by the need not to specify motifs that might attract unwanted binding, for instance because it would interfere with exon definition? Here, we have scanned human CDSs for motifs that have been experimentally determined to be recognized by RBPs. We observe two sets of motifs: those that are enriched over nucleotide-controlled null and those that are depleted. Importantly, the depleted set is enriched for motifs recognized by non-CDS binding RBPs. Supporting the functional relevance of our observations, we find that motifs that are more enriched are also slower-evolving. The net effect of this selection to preserve is a reduction in the overall rate of synonymous evolution of 2-3% in both primates and rodents. Stronger motif depletion, on the other hand, is associated with stronger selection against motif gain in evolution. The challenge faced by our CDSs is therefore not only one of attracting the right RBPs but also of avoiding the wrong ones, all while also evolving under selection pressures related to protein structure. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  20. Code-Switching to Know a TL Equivalent of an L1 Word: Request-Provision-Acknowledgement (RPA) Sequence

    ERIC Educational Resources Information Center

    Lucero, Edgar

    2011-01-01

    This article focuses on the learner's use of Code-switching to learn the TL (Target Language) equivalent of an L1 word. The interactional pattern that this situation creates defines the Request-Provision-Acknowledgement (RPA) sequence. The article explains each of the turns of the sequence under the combination of the Ethnomethodological…

  1. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses.

    PubMed

    Turco, Gina; Schnable, James C; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize.

  2. Two lamprey Hedgehog genes share non-coding regulatory sequences and expression patterns with gnathostome Hedgehogs.

    PubMed

    Kano, Shungo; Xiao, Jin-Hua; Osório, Joana; Ekker, Marc; Hadzhiev, Yavor; Müller, Ferenc; Casane, Didier; Magdelenat, Ghislaine; Rétaux, Sylvie

    2010-10-13

    Hedgehog (Hh) genes play major roles in animal development and studies of their evolution, expression and function point to major differences among chordates. Here we focused on Hh genes in lampreys in order to characterize the evolution of Hh signalling at the emergence of vertebrates. Screening of a cosmid library of the river lamprey Lampetra fluviatilis and searching the preliminary genome assembly of the sea lamprey Petromyzon marinus indicate that lampreys have two Hh genes, named Hha and Hhb. Phylogenetic analyses suggest that Hha and Hhb are lamprey-specific paralogs closely related to Sonic/Indian Hh genes. Expression analysis indicates that Hha and Hhb are expressed in a Sonic Hh-like pattern. The two transcripts are expressed in largely overlapping but not identical domains in the lamprey embryonic brain, including a newly-described expression domain in the nasohypophyseal placode. Global alignments of genomic sequences and local alignment with known gnathostome regulatory motifs show that lamprey Hhs share conserved non-coding elements (CNE) with gnathostome Hhs albeit with sequences that have significantly diverged and dispersed. Functional assays using zebrafish embryos demonstrate gnathostome-like midline enhancer activity for CNEs contained in intron2. We conclude that lamprey Hh genes are gnathostome Shh-like in terms of expression and regulation. In addition, they show some lamprey-specific features, including duplication and structural (but not functional) changes in the intronic/regulatory sequences.

  3. Two Lamprey Hedgehog Genes Share Non-Coding Regulatory Sequences and Expression Patterns with Gnathostome Hedgehogs

    PubMed Central

    Ekker, Marc; Hadzhiev, Yavor; Müller, Ferenc; Casane, Didier; Magdelenat, Ghislaine; Rétaux, Sylvie

    2010-01-01

    Hedgehog (Hh) genes play major roles in animal development and studies of their evolution, expression and function point to major differences among chordates. Here we focused on Hh genes in lampreys in order to characterize the evolution of Hh signalling at the emergence of vertebrates. Screening of a cosmid library of the river lamprey Lampetra fluviatilis and searching the preliminary genome assembly of the sea lamprey Petromyzon marinus indicate that lampreys have two Hh genes, named Hha and Hhb. Phylogenetic analyses suggest that Hha and Hhb are lamprey-specific paralogs closely related to Sonic/Indian Hh genes. Expression analysis indicates that Hha and Hhb are expressed in a Sonic Hh-like pattern. The two transcripts are expressed in largely overlapping but not identical domains in the lamprey embryonic brain, including a newly-described expression domain in the nasohypophyseal placode. Global alignments of genomic sequences and local alignment with known gnathostome regulatory motifs show that lamprey Hhs share conserved non-coding elements (CNE) with gnathostome Hhs albeit with sequences that have significantly diverged and dispersed. Functional assays using zebrafish embryos demonstrate gnathostome-like midline enhancer activity for CNEs contained in intron2. We conclude that lamprey Hh genes are gnathostome Shh-like in terms of expression and regulation. In addition, they show some lamprey-specific features, including duplication and structural (but not functional) changes in the intronic/regulatory sequences. PMID:20967201

  4. A molecular code dictates sequence-specific DNA recognition by homeodomains.

    PubMed Central

    Damante, G; Pellizzari, L; Esposito, G; Fogolari, F; Viglino, P; Fabbro, D; Tell, G; Formisano, S; Di Lauro, R

    1996-01-01

    Most homeodomains bind to DNA sequences containing the motif 5'-TAAT-3'. The homeodomain of thyroid transcription factor 1 (TTF-1HD) binds to sequences containing a 5'-CAAG-3' core motif, delineating a new mechanism for differential DNA recognition by homeodomains. We investigated the molecular basis of the DNA binding specificity of TTF-1HD by both structural and functional approaches. As already suggested by the three-dimensional structure of TTF-1HD, the DNA binding specificities of the TTF-1, Antennapedia and Engrailed homeodomains, either wild-type or mutants, indicated that the amino acid residue in position 54 is involved in the recognition of the nucleotide at the 3' end of the core motif 5'-NAAN-3'. The nucleotide at the 5' position of this core sequence is recognized by the amino acids located in position 6, 7 and 8 of the TTF-1 and Antennapedia homeodomains. These data, together with previous suggestions on the role of amino acids in position 50, indicate that the DNA binding specificity of homeodomains can be determined by a combinatorial molecular code. We also show that some specific combinations of the key amino acid residues involved in DNA recognition do not follow a simple, additive rule. PMID:8890172
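
    The core motif 5'-NAAN-3' described here covers both the common 5'-TAAT-3' site and the 5'-CAAG-3' site bound by TTF-1HD. As a rough illustration only, the sketch below lists every NAAN core on the forward strand of a DNA string and labels the two named variants. The function name `find_naan_cores` and the example sequence are hypothetical, and a plain string match says nothing about actual binding affinity.

```python
import re

# Labels for the two core motifs named in the abstract; any other NAAN core
# is reported as "other".
NAMED_CORES = {"TAAT": "typical homeodomain core", "CAAG": "TTF-1HD core"}


def find_naan_cores(seq):
    """Return (position, core, label) for each 5'-NAAN-3' match in seq.

    Only the given (forward) strand is scanned. A lookahead is used so that
    overlapping matches are all reported. Positions are 0-based.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"(?=([ACGT]AA[ACGT]))", seq):
        core = m.group(1)
        hits.append((m.start(), core, NAMED_CORES.get(core, "other")))
    return hits


if __name__ == "__main__":
    # Made-up sequence containing one TAAT and one CAAG core.
    for hit in find_naan_cores("GGTAATCCCAAGTT"):
        print(hit)
```

    Scanning the reverse complement as well would be a natural extension for double-stranded sites.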

  5. Contributions of the Hfq protein to translation regulation by small noncoding RNAs binding to the mRNA coding sequence.

    PubMed

    Wroblewska, Zuzanna; Olejniczak, Mikolaj

    2016-01-01

    The bacterial Sm-like protein Hfq affects the regulation of translation by small noncoding RNAs (sRNAs). In this way, Hfq participates in the cell adaptation to environmental stress, regulation of cellular metabolism, and bacterial virulence. The majority of known sRNAs bind complementary sequences in the 5'-untranslated mRNA regions. However, recent studies have shown that sRNAs can also target the mRNA coding sequence, even far downstream of the AUG start codon. In this review, we discuss how Hfq contributes to the translation regulation by those sRNAs which bind to the mRNA coding sequence.

  6. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison.

    PubMed

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-07-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features.

  7. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution

    PubMed Central

    Drummond, D. Allan; Wilke, Claus O.

    2009-01-01

    The biological causes of selective pressures on coding-sequence evolution remain controversial, despite the surprising consistency of covariation between common measures of evolutionary change (substitution rates) and gene expression (mRNA levels, codon usage) across taxa. We carry out a unified analysis which reveals these conserved patterns in E. coli, yeast, worm, fly, mouse, and human, and suggests that all trends stem largely from a unified underlying selective pressure. In metazoans, these trends are strongest in tissues composed of neurons, whose structure and lifetime confer extreme sensitivity to protein misfolding. We propose, and demonstrate using a molecular-level evolutionary simulation, that selection against toxicity of misfolded proteins generated by ribosome errors suffices to create all the observed covariation. The mechanistic model of molecular evolution which emerges yields testable biochemical predictions, calls into question use of nonsynonymous-to-synonymous substitution ratios (Ka/Ks) to detect functional selection, and suggests how mistranslation may contribute to neurodegenerative disease. PMID:18662548

  8. Phylogenomic Resolution of the Phylogeny of Laurasiatherian Mammals: Exploring Phylogenetic Signals within Coding and Noncoding Sequences.

    PubMed

    Chen, Meng-Yun; Liang, Dan; Zhang, Peng

    2017-08-01

    The interordinal relationships of Laurasiatherian mammals are currently one of the most controversial questions in mammalian phylogenetics. Previous studies mainly relied on coding sequences (CDS) and seldom used noncoding sequences. Here, by data mining public genome data, we compiled an intron data set of 3,638 genes (all introns from a protein-coding gene are considered as a gene) (19,055,073 bp) and a CDS data set of 10,259 genes (20,994,285 bp), covering all major lineages of Laurasiatheria (except Pholidota). We found that the intron data contained stronger and more congruent phylogenetic signals than the CDS data. In agreement with this observation, concatenation and species-tree analyses of the intron data set yielded well-resolved and identical phylogenies, whereas the CDS data set produced weakly supported and incongruent results. Further analyses showed that the phylogeny inferred from the intron data is highly robust to data subsampling and change in outgroup, but the CDS data produced unstable results under the same conditions. Interestingly, gene tree statistical results showed that the most frequently observed gene tree topologies for the CDS and intron data are identical, suggesting that the major phylogenetic signal within the CDS data is actually congruent with that within the intron data. Our final result of Laurasiatheria phylogeny is (Eulipotyphla,((Chiroptera, Perissodactyla),(Carnivora, Cetartiodactyla))), favoring a close relationship between Chiroptera and Perissodactyla. Our study 1) provides a well-supported phylogenetic framework for Laurasiatheria, representing a step towards ending the long-standing "hard" polytomy and 2) argues that intron within genome data is a promising data resource for resolving rapid radiation events across the tree of life. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  9. A Population Genetics-Phylogenetics Approach to Inferring Natural Selection in Coding Sequences

    PubMed Central

    Wilson, Daniel J.; Hernandez, Ryan D.; Andolfatto, Peter; Przeworski, Molly

    2011-01-01

    Through an analysis of polymorphism within and divergence between species, we can hope to learn about the distribution of selective effects of mutations in the genome, changes in the fitness landscape that occur over time, and the location of sites involved in key adaptations that distinguish modern-day species. We introduce a novel method for the analysis of variation in selection pressures within and between species, spatially along the genome and temporally between lineages. We model codon evolution explicitly using a joint population genetics-phylogenetics approach that we developed for the construction of multiallelic models with mutation, selection, and drift. Our approach has the advantage of performing direct inference on coding sequences, inferring ancestral states probabilistically, utilizing allele frequency information, and generalizing to multiple species. We use a Bayesian sliding window model for intragenic variation in selection coefficients that efficiently combines information across sites and captures spatial clustering within the genome. To demonstrate the utility of the method, we infer selective pressures acting in Drosophila melanogaster and D. simulans from polymorphism and divergence data for 100 X-linked coding regions. PMID:22144911

  10. A quantum-inspired genetic algorithm based on probabilistic coding for multiple sequence alignment.

    PubMed

    Huo, Hong-Wei; Stojkovic, Vojislav; Xie, Qiao-Luan

    2010-02-01

    Quantum parallelism arises from the ability of a quantum memory register to exist in a superposition of base states. Since the number of possible base states is 2^n, where n is the number of qubits in the quantum memory register, one operation on a quantum computer performs what an exponential number of operations on a classical computer performs. The power of quantum algorithms comes from taking advantage of quantum parallelism. Quantum algorithms are exponentially faster than classical algorithms. Genetic optimization algorithms are stochastic search algorithms which are used to search large, nonlinear spaces where expert knowledge is lacking or difficult to encode. QGMALIGN, a probabilistic-coding-based quantum-inspired genetic algorithm for multiple sequence alignment, is presented. A quantum rotation gate is used as a mutation operator to guide the quantum state evolution. Six genetic operators are designed on the coding basis to improve the solution during the evolutionary process. The experimental results show that QGMALIGN can compete with the popular methods, such as CLUSTALX and SAGA, and performs well on the biological data presented. Moreover, the addition of genetic operators to the quantum-inspired algorithm lowers the overall running time.
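
    Two ideas in this abstract lend themselves to a short sketch: the 2^n base states of an n-qubit register, and a rotation gate acting on a qubit's amplitude pair. The code below uses the generic 2x2 rotation commonly seen in quantum-inspired evolutionary algorithms; it is an assumption that QGMALIGN's gate has this form, and the function names `basis_states` and `rotate` are hypothetical. How the rotation angle is chosen in QGMALIGN is not stated in the abstract and is not modeled here.

```python
import math
from itertools import product


def basis_states(n):
    """List all 2**n base states of an n-qubit register as bit strings."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def rotate(alpha, beta, theta):
    """Apply a rotation by theta to the amplitude pair (alpha, beta).

    alpha**2 is the probability of observing 0 and beta**2 of observing 1.
    The rotation preserves alpha**2 + beta**2, so the pair stays normalized.
    """
    c, s = math.cos(theta), math.sin(theta)
    return c * alpha - s * beta, s * alpha + c * beta


if __name__ == "__main__":
    print(len(basis_states(3)), "base states for 3 qubits")
    # Start with equal probability of 0 and 1, then nudge toward 1.
    alpha, beta = 1 / math.sqrt(2), 1 / math.sqrt(2)
    alpha, beta = rotate(alpha, beta, 0.05 * math.pi)
    print("P(0) =", alpha ** 2, "P(1) =", beta ** 2)
```

    In a quantum-inspired algorithm these amplitudes are ordinary floating-point numbers on a classical computer, so the rotation acts as a mutation-like bias on sampling probabilities rather than as a physical quantum operation.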

  11. EzEditor: a versatile sequence alignment editor for both rRNA- and protein-coding genes.

    PubMed

    Jeon, Yoon-Seong; Lee, Kihyun; Park, Sang-Cheol; Kim, Bong-Soo; Cho, Yong-Joon; Ha, Sung-Min; Chun, Jongsik

    2014-02-01

    EzEditor is a Java-based molecular sequence editor allowing manipulation of both DNA and protein sequence alignments for phylogenetic analysis. It has multiple features optimized to connect initial computer-generated multiple alignment and subsequent phylogenetic analysis by providing manual editing with reference to biological information specific to the genes under consideration. It provides various functionalities for editing rRNA alignments using secondary structure information. In addition, it supports simultaneous editing of both DNA sequences and their translated protein sequences for protein-coding genes. EzEditor is, to our knowledge, the first sequence editing software designed for both rRNA- and protein-coding genes with the visualization of biologically relevant information and should be useful in molecular phylogenetic studies. EzEditor is based on Java, can be run on all major computer operating systems and is freely available from http://sw.ezbiocloud.net/ezeditor/.

  12. Natural selection on coding and noncoding DNA sequences is associated with virulence genes in a plant pathogenic fungus.

    PubMed

    Rech, Gabriel E; Sanz-Martín, José M; Anisimova, Maria; Sukno, Serenella A; Thon, Michael R

    2014-09-04

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5' untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen.

  13. Natural Selection on Coding and Noncoding DNA Sequences Is Associated with Virulence Genes in a Plant Pathogenic Fungus

    PubMed Central

    Rech, Gabriel E.; Sanz-Martín, José M.; Anisimova, Maria; Sukno, Serenella A.; Thon, Michael R.

    2014-01-01

    Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5′ untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen. PMID:25193312

  14. Detection by real time PCR of walnut allergen coding sequences in processed foods.

    PubMed

    Linacero, Rosario; Ballesteros, Isabel; Sanchiz, Africa; Prieto, Nuria; Iniesto, Elisa; Martinez, Yolanda; Pedrosa, Mercedes M; Muzquiz, Mercedes; Cabanillas, Beatriz; Rovira, Mercè; Burbano, Carmen; Cuadrado, Carmen

    2016-07-01

    A quantitative real-time PCR (RT-PCR) method, employing novel primer sets designed on Jug r 1, Jug r 3, and Jug r 4 allergen-coding sequences, was set up and validated. Its specificity, sensitivity, and applicability were evaluated. The DNA extraction method based on CTAB-phenol-chloroform was best for walnut. RT-PCR allowed a specific and accurate amplification of allergen sequence, and the limit of detection was 2.5 pg of walnut DNA. The method sensitivity and robustness were confirmed with spiked samples, and Jug r 3 primers detected up to 100 mg/kg of raw walnut (LOD 0.01%, LOQ 0.05%). Thermal treatment combined with pressure (autoclaving) reduced yield and amplification (integrity and quality) of walnut DNA. High hydrostatic pressure (HHP) did not produce any effect on the walnut DNA amplification. This RT-PCR method showed greater sensitivity and reliability in the detection of walnut traces in commercial foodstuffs compared with ELISA assays.

  15. Enhancement of reporter gene detection sensitivity by insertion of specific mini-peptide-coding sequences.

    PubMed

    Cutrera, J; Dibra, D; Xia, X; Li, S

    2010-02-01

    Two important aspects of gene therapy are to increase the level of gene expression and track the gene delivery site and expression, and a sensitive reporter gene may be one of the options for preclinical studies and possibly for human clinical trials. We report the novel concept of increasing the activity of the gene products. With the insertion of the mini-peptide-coding sequence CWDDWLC into the plasmid DNA of a SEAP reporter gene, we observed vast increases in the enzyme activity in vitro in all murine and human cell lines used. In addition, in vivo injection of this CWDDWLC-SEAP-encoding gene resulted in the same increases in reporter gene activity, but these increases did not correspond to alterations in the level of the gene products in the serum. Minor sequence changes in this mini-peptide negate the activity increase of the reporter gene. We report the novel concept of increasing the activity of gene products as another method to improve the reporting sensitivity of reporter genes. This improved reporter gene could complement any improved vector for maximizing the reporter sensitivity. Moreover, this strategy has the potential to be used to discover peptides that improve the activity of therapeutic genes.

  16. Recombination regulator PRDM9 influences the instability of its own coding sequence in humans.

    PubMed

    Jeffreys, Alec J; Cotton, Victoria E; Neumann, Rita; Lam, Kwan-Wood Gabriel

    2013-01-08

    PRDM9 plays a key role in specifying meiotic recombination hotspot locations in humans and mice via recognition of hotspot sequence motifs by a variable tandem-repeat zinc finger domain in the protein. We now explore germ-line instability of this domain in humans. We show that repeat turnover is driven by mitotic and meiotic mutation pathways, the latter frequently resulting in substantial remodeling of zinc fingers. Turnover dynamics predict frequent allele switches in populations with correspondingly fast changes of the recombination landscape, fully consistent with the known rapid evolution of hotspot locations. We found variation in meiotic instability between men that correlated with PRDM9 status. One particular "destabilizer" variant caused hyperinstability not only of itself but also of otherwise-stable alleles in heterozygotes. PRDM9 protein thus appears to regulate the instability of its own coding sequence. However, destabilizer variants are strongly self-limiting in populations and probably have little impact on the evolution of the recombination landscape.

  17. Real Time PCR to detect hazelnut allergen coding sequences in processed foods.

    PubMed

    Iniesto, Elisa; Jiménez, Ana; Prieto, Nuria; Cabanillas, Beatriz; Burbano, Carmen; Pedrosa, Mercedes M; Rodríguez, Julia; Muzquiz, Mercedes; Crespo, Jesús F; Cuadrado, Carmen; Linacero, Rosario

    2013-06-01

    A quantitative RT-PCR method, employing novel primer sets designed on Cor a 9, Cor a 11 and Cor a 13 allergen-coding sequences, has been set up and validated. Its specificity, sensitivity and applicability have been compared. The effect of processing on detectability of these hazelnut targets in complex food matrices was also studied. The DNA extraction method based on CTAB-phenol-chloroform was the best for hazelnut. RT-PCR using primers for Cor a 9, 11 and 13 allowed a specific and accurate amplification of these sequences. The limit of detection was 1 ppm of raw hazelnut. The method sensitivity and robustness were confirmed with spiked samples. Thermal treatments (roasting and autoclaving) reduced the yield and amplifiability of hazelnut DNA; high hydrostatic pressure, however, had no effect. Compared with an ELISA assay, this RT-PCR showed higher sensitivity in detecting hazelnut traces in commercial foodstuffs. The RT-PCR method described is the most sensitive of those reported for the detection of hazelnut traces in processed foods.

  18. Enhancement of Reporter Gene Detection Sensitivity by Insertion of Specific Mini-Peptide-Coding Sequences

    PubMed Central

    Cutrera, Jeffry; Dibra, Denada; Xia, Xueqing; Li, Shulin

    2009-01-01

    Two important aspects for gene therapy are to increase the level of gene expression and track the gene delivery site and expression, and a sensitive reporter gene may be one of the options for preclinical studies and possibly for human clinical trials. We report the novel concept of increasing the activity of the gene products. With the insertion of the mini-peptide-coding sequence CWDDWLC into the plasmid DNA of a SEAP reporter gene, we observed vast increases in the enzyme activity in vitro in all murine and human cell lines used. Also, in vivo injection of this CWDDWLC-SEAP encoding gene resulted in the same increases in reporter gene activity, but these increases did not correspond to alterations in the level of the gene products in the serum. Minor sequence changes in this mini-peptide negate the activity increase of the reporter gene. We report the novel concept of increasing the activity of gene products as another method to improve the reporting sensitivity of reporter genes. This improved reporter gene could complement any improved vector for maximizing the reporter sensitivity. Also, this strategy has the potential to be used to discover peptides that improve the activity of therapeutic genes. PMID:19713998

  19. FOURTH SEMINAR TO THE MEMORY OF D.N. KLYSHKO: Algebraic solution of the synthesis problem for coded sequences

    NASA Astrophysics Data System (ADS)

    Leukhin, Anatolii N.

    2005-08-01

    The algebraic solution of a 'complex' problem of synthesis of phase-coded (PC) sequences with the zero level of side lobes of the cyclic autocorrelation function (ACF) is proposed. It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems, which can be solved, as the number factorisation problem, by algebraic methods by using the theory of Galois fields and groups.
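
    To make "zero level of side lobes of the cyclic autocorrelation function" concrete, the sketch below checks the property numerically for a Zadoff-Chu sequence. This is a well-known polyphase example chosen here for illustration; it is not the algebraic construction proposed in the paper, and the function names are hypothetical.

```python
import cmath
from math import gcd, pi


def zadoff_chu(length, root=1):
    """Zadoff-Chu sequence of odd length with a root coprime to the length."""
    if length % 2 == 0 or gcd(root, length) != 1:
        raise ValueError("length must be odd and coprime to root")
    return [cmath.exp(-1j * pi * root * n * (n + 1) / length)
            for n in range(length)]


def cyclic_acf(seq):
    """Cyclic (periodic) autocorrelation for every shift k = 0 .. N-1."""
    n = len(seq)
    return [sum(seq[i] * seq[(i + k) % n].conjugate() for i in range(n))
            for k in range(n)]


acf = cyclic_acf(zadoff_chu(7))
peak = abs(acf[0])                             # main lobe at zero shift
max_sidelobe = max(abs(v) for v in acf[1:])    # largest nonzero-shift value
```

    For this sequence the main lobe equals the sequence length, and every side lobe vanishes up to floating-point rounding.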

  20. [Learning of reproduction of random sequences by the right and the left hand movements: coding of positions or movements].

    PubMed

    Bobrova, E V; Liakhovetskiĭ, V A; Skopin, G N

    2012-01-01

    Positional and movement errors during reproduction of memorized sequences of six random hand movements were analyzed. The task was performed by two groups of subjects: for six days with one hand (right/left) and for the next six days with the other hand (left/right). Mean accuracy errors decreased during learning only in the group that began with the right hand. The number of transposition errors depended on the type of error, positional or movement. Subjects transposed the positions of the right hand more often when it was the hand that began the task, and transposed the movements of the left hand more often when it began the task. The results support the hypothesis of two types of movement coding, positional and vector coding (coding of positions or of changes in position), specific to the right and the left hemispheres, and suggest that learning to reproduce movement sequences relies on vector coding.

  1. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents

    PubMed Central

    Liu, Sophia S.; Hockenberry, Adam J.; Lancichinetti, Andrea; Jewett, Michael C.

    2016-01-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems. PMID:27835644
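
    The maximum-entropy idea can be illustrated with a minimal sketch, assuming a fixed amino acid sequence and a single constraint on GC content. Under those assumptions the maximum-entropy choice among synonymous codons weights each codon by exp(lam * GC count). This is not the NullSeq package or its API: `random_cds`, `lam` and the other names are hypothetical, and the step that tunes `lam` to hit a target GC content is omitted.

```python
import math
import random

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def gc_count(codon):
    return sum(base in "GC" for base in codon)


def random_cds(protein, lam, rng):
    """Sample synonymous codons with probability proportional to exp(lam * GC)."""
    codons = []
    for residue in protein:
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == residue]
        weights = [math.exp(lam * gc_count(c)) for c in synonyms]
        codons.append(rng.choices(synonyms, weights=weights, k=1)[0])
    return "".join(codons)


def translate(cds):
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def gc_fraction(seq):
    return sum(base in "GC" for base in seq) / len(seq)


protein = "MKLVSAGRPT" * 5
rng = random.Random(0)
gc_rich = random_cds(protein, lam=10.0, rng=rng)    # strongly favors G/C codons
at_rich = random_cds(protein, lam=-10.0, rng=rng)   # strongly favors A/T codons
```

    Setting `lam = 0` gives a uniform choice among synonymous codons. Both sampled sequences translate back to the same protein and differ only in GC content.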

  2. Human beta-hexosaminidase alpha chain: coding sequence and homology with the beta chain.

    PubMed Central

    Myerowitz, R; Piekarz, R; Neufeld, E F; Shows, T B; Suzuki, K

    1985-01-01

    We have isolated a cDNA clone, p beta H alpha-5, from an adult human liver library that contains the entire coding sequence of the alpha chain of beta-hexosaminidase. The cDNA insert of p beta H alpha-5 is 1944 base pairs long and contains a 168-base-pair 5' untranslated region, a 186-base-pair 3' untranslated region, and an open reading frame of 1587 base pairs corresponding to 529 amino acids (Mr, 60,697). The first 17-22 amino acids satisfy the requirements of a signal sequence. A striking sequence homology with a published partial amino acid sequence for the beta chain [O'Dowd, B. F., Quan, F., Willard, H. F., Lamhonwah, A. M., Korneluk, R. G., Lowden, J. A., Gravel, R. A. & Mahuran, D. J. (1985) Proc. Natl. Acad. Sci. USA 82, 1184-1188] suggests that both chains may have evolved from a common ancestor. A shorter alpha-chain cDNA was found to hybridize to the long arm of chromosome 15, the known location for the alpha-chain gene. In addition, we isolated another alpha-chain cDNA clone, p beta H alpha-4, from a simian virus 40-transformed human fibroblast library that contained an extra 453-base-pair piece at its 3' end. A probe consisting of this additional sequence hybridized exclusively to a single mRNA species (2.6 kilobases) in mRNA preparations from cultured human fibroblasts. In contrast, p beta H alpha-5 hybridized to both a 2.1-kilobase major and a 2.6-kilobase minor mRNA species in these same mRNA preparations, indicating the presence of two distinct alpha-chain mRNA species differing at the 3' end. Fibroblasts from an Ashkenazi Jewish patient with classic Tay-Sachs disease were deficient in both species of mRNA, confirming their genetic relationship. PMID:2933746
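
    The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the figures given above; the variable names are illustrative.

```python
insert_bp = 1944   # total cDNA insert length
utr5_bp = 168      # 5' untranslated region
utr3_bp = 186      # 3' untranslated region
orf_bp = 1587      # open reading frame as quoted

# Three nucleotides per codon: 1587 / 3 should give the quoted 529 residues.
amino_acids, leftover_in_frame = divmod(orf_bp, 3)

# Base pairs of the insert not accounted for by the UTRs and the quoted ORF.
unaccounted_bp = insert_bp - utr5_bp - utr3_bp - orf_bp
```

    The open reading frame divides evenly into 529 codons. The remaining 3 base pairs would be consistent with a stop codon counted outside the quoted 1587 bp, although the abstract does not say so explicitly.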

  3. Biased gene conversion and GC-content evolution in the coding sequences of reptiles and vertebrates.

    PubMed

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2014-12-19

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content are largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins.
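
    GC3, the statistic analyzed in this study, is the fraction of G or C bases at the third position of each codon. A minimal sketch of that calculation for one in-frame coding sequence follows; the function name and the toy sequence are illustrative, not taken from the paper.

```python
def gc3(cds):
    """Fraction of G/C at third-codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    third_positions = cds[2:usable:3]     # bases at indices 2, 5, 8, ...
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Codons ATG GCC AAA TAA have third bases G, C, A, A.
example_gc3 = gc3("ATGGCCAAATAA")
```

    For the toy sequence, two of the four third positions are G or C, so GC3 is 0.5.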

  4. Biased Gene Conversion and GC-Content Evolution in the Coding Sequences of Reptiles and Vertebrates

    PubMed Central

    Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas

    2015-01-01

    Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrate species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamics of coding sequence GC-content are largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834

  5. Sequence-Based Analysis Uncovers an Abundance of Non-Coding RNA in the Total Transcriptome of Mycobacterium tuberculosis

    PubMed Central

    Arnvig, Kristine B.; Comas, Iñaki; Thomson, Nicholas R.; Houghton, Joanna; Boshoff, Helena I.; Croucher, Nicholas J.; Rose, Graham; Perkins, Timothy T.; Parkhill, Julian; Dougan, Gordon; Young, Douglas B.

    2011-01-01

    RNA sequencing provides a new perspective on the genome of Mycobacterium tuberculosis by revealing an extensive presence of non-coding RNA, including long 5’ and 3’ untranslated regions, antisense transcripts, and intergenic small RNA (sRNA) molecules. More than a quarter of all sequence reads mapping outside of ribosomal RNA genes represent non-coding RNA, and the density of reads mapping to intergenic regions was more than two-fold higher than that mapping to annotated coding sequences. Selected sRNAs were found at increased abundance in stationary phase cultures and accumulated to remarkably high levels in the lungs of chronically infected mice, indicating a potential contribution to pathogenesis. The ability of tubercle bacilli to adapt to changing environments within the host is critical to their ability to cause disease and to persist during drug treatment; it is likely that novel post-transcriptional regulatory networks will play an important role in these adaptive responses. PMID:22072964

  6. A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N(1)-methyladenosine modification.

    PubMed

    Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P

    2017-03-01

    Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N(1)-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N(1)-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  7. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
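
    The distinction between codon-sized indels and frameshifts can be illustrated with a deliberately simplified sketch that only looks at gap lengths in an existing pairwise alignment. RAMICS itself uses profile hidden Markov models; nothing below reproduces its algorithm or interface, and `classify_gaps` is a hypothetical helper.

```python
import re


def classify_gaps(aligned_ref, aligned_read):
    """Label each gap run as codon-sized (length divisible by 3) or frameshift."""
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned sequences must have equal length")
    events = []
    # A gap in the reference row is an insertion in the read;
    # a gap in the read row is a deletion from the read.
    for kind, row in (("insertion", aligned_ref), ("deletion", aligned_read)):
        for match in re.finditer(r"-+", row):
            length = match.end() - match.start()
            label = "codon-sized" if length % 3 == 0 else "frameshift"
            events.append((match.start(), kind, length, label))
    return sorted(events)


events = classify_gaps("ATGGCC---AAATAA",
                       "ATGGCCGGGAAA-AA")
```

    Here the three-base insertion preserves the reading frame, while the single-base deletion shifts it. A fuller treatment would also check where each gap falls relative to codon boundaries and account for sequencing error profiles.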

  8. Variation in the number of nucleoli and incomplete homogenization of 18S ribosomal DNA sequences in leaf cells of the cultivated Oriental ginseng (Panax ginseng Meyer)

    PubMed Central

    Chelomina, Galina N.; Rozhkovan, Konstantin V.; Voronova, Anastasia N.; Burundukova, Olga L.; Muzarok, Tamara I.; Zhuravlev, Yuri N.

    2015-01-01

    Background: Wild ginseng, Panax ginseng Meyer, is an endangered species of medicinal plants. In the present study, we analyzed variations within the ribosomal DNA (rDNA) cluster to gain insight into the genetic diversity of the Oriental ginseng, P. ginseng, under artificial cultivation. Methods: The roots of wild P. ginseng plants were sampled from a nonprotected natural population of the Russian Far East. The slides were prepared from leaf tissues using the squash technique for cytogenetic analysis. The 18S rDNA sequences were cloned and sequenced. The distribution of nucleotide diversity, recombination events, and interspecific phylogenies for the total 18S rDNA sequence data set was also examined. Results: In mesophyll cells, mononucleolar nuclei were estimated to be dominant (75.7%), while the remaining nuclei contained two to four nucleoli. Among the analyzed 18S rDNA clones, 20% were identical to the 18S rDNA sequence of P. ginseng from Japan, and the other clones differed by one to six substitutions. The nucleotide polymorphism was more pronounced at positions 440–640 bp, and was distributed in variable regions, expansion segments, and conservative elements of the core structure. The phylogenetic analysis confirmed conspecificity of ginseng plants cultivated in different regions, with two fixed mutations between P. ginseng and other species. Conclusion: This study found evidence of intragenomic nucleotide polymorphism in the 18S rDNA sequences of P. ginseng. These data suggest that, in cultivated plants, the observed genome instability may influence the synthesis of biologically active compounds, which are widely used in traditional medicine. PMID:27158239

  9. Capsid coding sequences of foot-and-mouth disease viruses are determinants of pathogenicity in pigs.

    PubMed

    Lohse, Louise; Jackson, Terry; Bøtner, Anette; Belsham, Graham J

    2012-05-24

    The surface-exposed capsid proteins, VP1, VP2 and VP3, of foot-and-mouth disease virus (FMDV) determine its antigenicity and the ability of the virus to interact with host-cell receptors. Hence, modification of these structural proteins may alter the properties of the virus. In the present study we compared the pathogenicity of different FMDVs in young pigs. In total 32 pigs, 7 weeks old, were exposed to virus, either by direct inoculation or through contact with inoculated pigs, using cell culture adapted (O1K B64), chimeric (O1K/A-TUR and O1K/O-UKG) or field strain (O-UKG/34/2001) viruses. The O1K B64 virus and the two chimeric viruses are identical to each other except for the capsid coding region. Animals exposed to O1K B64 did not exhibit signs of disease, while pigs exposed to each of the other viruses showed typical clinical signs of foot-and-mouth disease (FMD). All pigs infected with the O1K/O-UKG chimera or the field strain (O-UKG/34/2001) developed fulminant disease. Furthermore, 3 of 4 in-contact pigs exposed to the O1K/O-UKG virus died in the acute phase of infection, likely from myocardial infection. However, in the group exposed to the O1K/A-TUR chimeric virus, only 1 pig showed symptoms of disease within the time frame of the experiment (10 days). All pigs that developed clinical disease showed a high level of viral RNA in serum, and infected pigs that survived the acute phase of infection developed a serotype-specific antibody response. It is concluded that the capsid coding sequences are determinants of FMDV pathogenicity in pigs.

  10. Substantia nigra pars reticulata neurons code initiation of a serial pattern: implications for natural action sequences and sequential disorders.

    PubMed

    Meyer-Luehmann, Melanie; Thompson, Jeffrey F; Berridge, Kent C; Aldridge, J Wayne

    2002-10-01

    Sequences of movements are initiated abnormally in neurological disorders involving basal ganglia dysfunction, such as Parkinson's disease or Tourette's syndrome. The substantia nigra pars reticulata (SNpr) is one of the two primary output structures of the basal ganglia. However, little is known about how substantia nigra mediates the initiation of normal movement sequences. We studied its role in coding initiation of a sequentially stereotyped but natural movement sequence by recording neuronal activity in SNpr during behavioural performance of 'syntactic grooming chains'. These are rule-governed sequences of up to 25 grooming movements emitted in four predictable (syntactic) phases, which occur spontaneously during grooming behaviour by rats and other rodents. Our results show that neuronal activation in central SNpr codes the onset of this entire rule-governed sequential pattern of grooming actions, not elemental grooming movements. We conclude that the context of sequential pattern may be more important than the elemental motor parameters in determining SNpr neuronal activation.

  11. The vicilin gene family of pea (Pisum sativum L.): a complete cDNA coding sequence for preprovicilin.

    PubMed Central

    Lycett, G W; Delauney, A J; Gatehouse, J A; Gilroy, J; Croy, R R; Boulter, D

    1983-01-01

    A cDNA plasmid bank has been constructed using mRNA from developing pea seeds and three cDNAs coding for vicilin polypeptides have been selected. These cDNAs have been sequenced and between them cover the whole of the coding sequence plus part of the 5' and 3' untranslated regions. Comparison with amino acid sequence data from the protein indicates that vicilin is synthesised as preprovicilin with subsequent removal of a signal peptide and a C-terminal peptide as well as post translational endo-proteolytic cleavage. The cDNAs represent two different classes of vicilin genes whilst amino acid data show that there are at least three major classes of vicilin polypeptide. The vicilin sequences show extensive homology with conglycinin and phaseolin except in the regions of the internal proteolytic cleavages. The evolutionary significance of this relationship is discussed. PMID:6687941

  12. Classifier assessment and feature selection for recognizing short coding sequences of human genes.

    PubMed

    Song, Kai; Zhang, Ze; Tong, Tuo-Peng; Wu, Fang

    2012-03-01

    With the ever-increasing pace of genome sequencing, there is a great need for fast and accurate computational tools to automatically identify genes in these genomes. Although great progress has been made in the development of gene-finding algorithms during the past decades, there is still room for further improvement. In particular, the issue of recognizing short exons in eukaryotes is still not solved satisfactorily. This article is devoted to assessing various linear and kernel-based classification algorithms and selecting the best combination of Z-curve features for further improvement on this issue. Eight state-of-the-art linear and kernel-based supervised pattern recognition techniques were used to identify the short (21-192 bp) coding sequences of human genes. By measuring the prediction accuracy, the tradeoff between sensitivity and specificity, and the time consumption, the partial least squares (PLS) and kernel partial least squares (KPLS) algorithms were found to be the best linear and kernel-based classifiers, respectively. A surprising result was that, by making good use of the interpretability of the PLS and the Z-curve methods, 93 Z-curve features proved to be the best feature combination. Using them, the average recognition accuracy was improved by as much as 7.7% by means of KPLS when compared with what was obtained by the Fisher discriminant analysis using 189 Z-curve variables (Gao and Zhang, 2004). The codes used are freely available from the following sources (implemented in MATLAB and supported on Linux and MS Windows): (1) SVM: http://www.support-vector-machines.org/SVM_soft.html. (2) GP: http://www.gaussianprocess.org. (3) KPLS and KFDA: Taylor, J.S., and Cristianini, N. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK. (4) PLS: Wise, B.M., and Gallagher, N.B. 2011. PLS-Toolbox for use with MATLAB: ver 1.5.2. Eigenvector Technologies, Manson, WA. Supplementary Material for this article is
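
    For orientation, the simplest family of Z-curve features can be sketched as follows: for each of the three codon positions, the base frequencies a, c, g, t are combined into x = (a + g) - (c + t), y = (a + c) - (g + t) and z = (a + t) - (g + c), giving nine values per sequence. This is a minimal illustration of the standard Z-curve transform only; it does not reproduce the 93-feature set selected in the paper, and the function name is hypothetical.

```python
def zcurve_phase_features(seq):
    """Nine phase-specific Z-curve features (x, y, z for each codon position)."""
    seq = seq.upper()
    features = []
    for phase in range(3):
        column = seq[phase::3]            # every base at this codon position
        n = len(column) or 1              # avoid division by zero
        a, c, g, t = (column.count(base) / n for base in "ACGT")
        features += [
            (a + g) - (c + t),   # purine vs pyrimidine
            (a + c) - (g + t),   # amino vs keto
            (a + t) - (g + c),   # weak vs strong hydrogen bonding
        ]
    return features


features = zcurve_phase_features("ATGATGATG")
```

    In the toy input every first position is A, every second is T and every third is G, so each component takes an extreme value of +1 or -1.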

  13. [Transposition errors during learning to reproduce a sequence by the right- and the left-hand movements: simulation of positional and movement coding].

    PubMed

    Liakhovetskiĭ, V A; Bobrova, E V; Skopin, G N

    2012-01-01

    Transposition errors during the reproduction of a hand movement sequence provide important information on the internal representation of this sequence in motor working memory. Analysis of such errors showed that learning to reproduce sequences of left-hand movements improves the system of positional coding (coding of positions), while learning of right-hand movements improves the system of vector coding (coding of movements). Learning of right-hand movements after left-hand performance involved the system of positional coding "imposed" by the left hand. Learning of left-hand movements after right-hand performance activated the system of vector coding. Transposition errors during learning to reproduce movement sequences can be explained by a neural network using either vector coding or both vector and positional coding.

  14. Empirical Transition Probability Indexing Sparse-Coding Belief Propagation (ETPI-SCoBeP) Genome Sequence Alignment

    PubMed Central

    Roozgard, Aminmohammad; Barzigar, Nafise; Wang, Shuang; Jiang, Xiaoqian; Cheng, Samuel

    2014-01-01

    The advance in human genome sequencing technology has significantly reduced the cost of data generation and overwhelms the computing capability of sequence analysis. Efficiency, efficacy, and scalability remain challenging in sequence alignment, which is an important and foundational operation for genome data analysis. In this paper, we propose a two-stage approach to tackle this problem. In the preprocessing step, we match blocks of reference and target sequences based on the similarities between their empirical transition probability distributions using belief propagation. We then conduct a refined match using our recently published sparse-coding belief propagation (SCoBeP) technique. Our experimental results demonstrated robustness in nucleotide sequence alignment, and our results are competitive to those of the SOAP aligner and the BWA algorithm. Moreover, compared to SCoBeP alignment, the proposed technique can handle sequences of much longer lengths. PMID:25983537
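
    The preprocessing step compares blocks by their empirical transition probability distributions. A minimal sketch of that ingredient is shown below, assuming first-order (base-to-next-base) transitions and an L1 distance as a stand-in similarity measure. The belief-propagation matching and the SCoBeP refinement are not reproduced, and all names are hypothetical.

```python
from collections import Counter


def transition_probs(block, alphabet="ACGT"):
    """Empirical first-order transition probabilities P(next base | base)."""
    counts = Counter(zip(block, block[1:]))
    probs = {}
    for a in alphabet:
        row_total = sum(counts[(a, b)] for b in alphabet)
        for b in alphabet:
            probs[(a, b)] = counts[(a, b)] / row_total if row_total else 0.0
    return probs


def l1_distance(p, q):
    """Sum of absolute differences between two transition tables."""
    return sum(abs(p[key] - q[key]) for key in p)


reference_block = transition_probs("ACACAC")
same_pattern = transition_probs("CACACA")
different = transition_probs("AAAAAA")

close = l1_distance(reference_block, same_pattern)
far = l1_distance(reference_block, different)
```

    Blocks with the same alternating pattern have identical transition tables and a distance of zero, whereas the homopolymer block differs in three table entries.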

  15. Mice carrying a complete deletion of the talin2 coding sequence are viable and fertile

    SciTech Connect

    Debrand, Emmanuel; Conti, Francesco J.; Bate, Neil; Spence, Lorraine; Mazzeo, Daniela; Pritchard, Catrin A.; Monkley, Susan J.; Critchley, David R.

    2012-09-21

    Highlights: ▶ Mice lacking talin2 are viable and fertile with only a mildly dystrophic phenotype. ▶ Talin2 null fibroblasts show no major defects in proliferation, adhesion or migration. ▶ Maintaining a colony of talin2 null mice is difficult, indicating an underlying defect. Abstract: Mice homozygous for several Tln2 gene targeted alleles are viable and fertile. Here we show that although the expression of talin2 protein is drastically reduced in muscle from these mice, other tissues continue to express talin2 albeit at reduced levels. We therefore generated a Tln2 allele lacking the entire coding sequence (Tln2^cd). Tln2^cd/cd mice were viable and fertile, and the genotypes of Tln2^cd/+ intercrosses were at the expected Mendelian ratio. Tln2^cd/cd mice showed no major difference in body mass or the weight of the major organs compared to wild-type, although they displayed a mildly dystrophic phenotype. Moreover, Tln2^cd/cd mouse embryo fibroblasts showed no obvious defects in cell adhesion, migration or proliferation. However, the number of Tln2^cd/cd pups surviving to adulthood was variable, suggesting that such mice have an underlying defect.

  16. Cymbidium Ringspot Tombusvirus Coat Protein Coding Sequence Acts as an Avirulent RNA

    PubMed Central

    Szittya, György; Burgyán, József

    2001-01-01

    Avirulent genes either directly or indirectly produce elicitors that are recognized by specific receptors of plant resistance genes, leading to the induction of host defense responses such as hypersensitive reaction (HR). HR is characterized by the development of a necrotic lesion at the site of infection which results in confinement of the invader to this area. Artificial chimeras and mutants of cymbidium ringspot (CymRSV) and the pepper isolate of tomato bushy stunt (TBSV-P) tombusviruses were used to determine viral factors involved in the HR resistance phenotype of Datura stramonium upon infection with CymRSV. A series of constructs carrying deletions and frameshifts of the CymRSV coat protein (CP) undoubtedly clarified that an 860-nucleotide (nt)-long RNA sequence in the CymRSV CP coding region (between nt 2666 and 3526) is the elicitor of a very rapid HR-like response of D. stramonium which limits the virus spread. This finding provides the first evidence that an untranslatable RNA can trigger an HR-like resistance response in virus-infected plants. The effectiveness of the resistance response might indicate that other nonhost resistance could also be due to RNA-mediated HR. It is an appealing explanation that RNA-mediated HR has evolved as an alternative defense strategy against RNA viruses. PMID:11160744

  17. PACCMIT/PACCMIT-CDS: identifying microRNA targets in 3′ UTRs and coding sequences

    PubMed Central

    Šulc, Miroslav; Marín, Ray M.; Robins, Harlan S.; Vaníček, Jiří

    2015-01-01

    The purpose of the proposed web server, publicly available at http://paccmit.epfl.ch, is to provide a user-friendly interface to two algorithms for predicting messenger RNA (mRNA) molecules regulated by microRNAs: (i) PACCMIT (Prediction of ACcessible and/or Conserved MIcroRNA Targets), which identifies primarily mRNA transcripts targeted in their 3′ untranslated regions (3′ UTRs), and (ii) PACCMIT-CDS, designed to find mRNAs targeted within their coding sequences (CDSs). While PACCMIT belongs among the accurate algorithms for predicting conserved microRNA targets in the 3′ UTRs, the main contribution of the web server is 2-fold: PACCMIT provides an accurate tool for predicting targets also of weakly conserved or non-conserved microRNAs, whereas PACCMIT-CDS addresses the lack of similar portals adapted specifically for targets in CDS. The web server asks the user for microRNAs and mRNAs to be analyzed, accesses the precomputed P-values for all microRNA–mRNA pairs from a database for all mRNAs and microRNAs in a given species, ranks the predicted microRNA–mRNA pairs, evaluates their significance according to the false discovery rate and finally displays the predictions in a tabular form. The results are also available for download in several standard formats. PMID:25948580
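
    The ranking and false-discovery-rate step described in this abstract can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which the abstract does not name. The function name, the example microRNA-mRNA pairs and the P-values are hypothetical, and this is not the PACCMIT server's own code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that each q-value is the minimum
    # of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical microRNA-mRNA pairs with made-up P-values (assumption).
pairs = [("miR-A", "gene1"), ("miR-A", "gene2"), ("miR-B", "gene1"), ("miR-B", "gene3")]
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)

# Rank the pairs from most to least significant and print a small table.
ranked = sorted(zip(pairs, pvals, qvals), key=lambda row: row[1])
for (mirna, mrna), p, q in ranked:
    print(f"{mirna}\t{mrna}\tP={p:.3g}\tFDR={q:.3g}")
```

    Sorting by P-value gives the ranking, and the adjusted values give a per-pair significance estimate that can be thresholded at a chosen false discovery rate.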

  18. Unstable microsatellite repeats facilitate rapid evolution of coding and regulatory sequences.

    PubMed

    Jansen, A; Gemayel, R; Verstrepen, K J

    2012-01-01

    Tandem repeats are intrinsically highly variable sequences since repeat units are often lost or gained during replication or following unequal recombination events. Because of their low complexity and their instability, these repeats, which are also called satellite repeats, are often considered to be useless 'junk' DNA. However, recent findings show that tandem repeats are frequently found within promoters of stress-induced genes and within the coding regions of genes encoding cell-surface and regulatory proteins. Interestingly, frequent changes in these repeats often confer phenotypic variability. Examples include variation in the microbial cell surface, rapid tuning of internal molecular clocks in flies, and enhanced morphological plasticity in mammals. This suggests that instead of being useless junk DNA, some variable tandem repeats are useful functional elements that confer 'evolvability', facilitating swift evolution and rapid adaptation to changing environments. Since changes in repeats are frequent and reversible, repeats provide a unique type of mutation that bridges the gap between rare genetic mutations, such as single nucleotide polymorphisms, and highly unstable but reversible epigenetic inheritance.

  19. Multiple Distinct Splicing Enhancers in the Protein-Coding Sequences of a Constitutively Spliced Pre-mRNA

    PubMed Central

    Schaal, Thomas D.; Maniatis, Tom

    1999-01-01

    We have identified multiple distinct splicing enhancer elements within protein-coding sequences of the constitutively spliced human β-globin pre-mRNA. Each of these highly conserved sequences is sufficient to activate the splicing of a heterologous enhancer-dependent pre-mRNA. One of these enhancers is activated by and binds to the SR protein SC35, whereas at least two others are activated by the SR protein SF2/ASF. A single base mutation within another enhancer element inactivates the enhancer but does not change the encoded amino acid. Thus, overlapping protein coding and RNA recognition elements may be coselected during evolution. These studies provide the first direct evidence that SR protein-specific splicing enhancers are located within the coding regions of constitutively spliced pre-mRNAs. We propose that these enhancers function as multisite splicing enhancers to specify 3′ splice-site selection. PMID:9858550

  20. ICRPfinder: a fast pattern design algorithm for coding sequences and its application in finding potential restriction enzyme recognition sites

    PubMed Central

    Li, Chao; Li, Yuhua; Zhang, Xiangmin; Stafford, Phillip; Dinu, Valentin

    2009-01-01

    Background: Restriction enzymes can produce easily definable segments from DNA sequences by using a variety of cut patterns. There are, however, no software tools that can aid in gene building -- that is, modifying wild-type DNA sequences to express the same wild-type amino acid sequences but with enhanced codons, specific cut sites, unique post-translational modifications, and other engineered-in components for recombinant applications. A fast DNA pattern design algorithm, ICRPfinder, is provided in this paper and applied to find or create potential recognition sites in target coding sequences. Results: ICRPfinder is applied to find or create restriction enzyme recognition sites by introducing silent mutations. The algorithm is shown capable of mapping existing cut-sites but importantly it also can generate specified new unique cut-sites within a specified region that are guaranteed not to be present elsewhere in the DNA sequence. Conclusion: ICRPfinder is a powerful tool for finding or creating specific DNA patterns in a given target coding sequence. ICRPfinder finds or creates patterns, which can include restriction enzyme recognition sites, without changing the translated protein sequence. ICRPfinder is a browser-based JavaScript application and it can run on any platform, in on-line or off-line mode. PMID:19747395
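
    The idea of creating a recognition site through silent mutations can be sketched in a few lines. This is a brute-force illustration under stated assumptions, not ICRPfinder's algorithm: it assumes an in-frame, upper-case DNA coding sequence and the standard genetic code, and the function names are hypothetical. It also does not check that the new site is unique in the sequence, which the abstract says ICRPfinder guarantees.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in T, C, A, G order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def silent_site_positions(cds, site):
    """List (position, mutated_sequence) pairs where `site` can be written
    into `cds` without changing the encoded protein."""
    protein = translate(cds)
    hits = []
    for pos in range(len(cds) - len(site) + 1):
        if cds[pos:pos + len(site)] == site:
            continue  # site already exists here; nothing to create
        candidate = cds[:pos] + site + cds[pos + len(site):]
        if translate(candidate) == protein:
            hits.append((pos, candidate))
    return hits


# Toy example: try to create an EcoRI site (GAATTC) in a made-up sequence.
print(silent_site_positions("ATGGAGTTTTAA", "GAATTC"))
```

    Overwriting the site at each offset and comparing translations is equivalent to asking whether some set of synonymous codon changes yields the site at that offset, because bases outside the site never need to change.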

  1. A novel all-optical label processing based on multiple optical orthogonal codes sequences for optical packet switching networks

    NASA Astrophysics Data System (ADS)

    Zhang, Chongfu; Qiu, Kun; Xu, Bo; Ling, Yun

    2008-05-01

    This paper proposes an all-optical label processing scheme that uses the multiple optical orthogonal codes sequences (MOOCS)-based optical label for optical packet switching (OPS) (MOOCS-OPS) networks. In this scheme, each MOOCS is a permutation or combination of the multiple optical orthogonal codes (MOOC) selected from the multiple-groups optical orthogonal codes (MGOOC). Following a comparison of different optical label processing (OLP) schemes, the principles of MOOCS-OPS network are given and analyzed. Firstly, theoretical analyses are used to prove that MOOCS is able to greatly enlarge the number of available optical labels when compared to the previous single optical orthogonal code (SOOC) for OPS (SOOC-OPS) network. Then, the key units of the MOOCS-based optical label packets, including optical packet generation, optical label erasing, optical label extraction and optical label rewriting etc., are given and studied. These results are used to verify that the proposed MOOCS-OPS scheme is feasible.

  2. Reasoning from Incomplete Knowledge.

    ERIC Educational Resources Information Center

    Collins, Allan M.; And Others

    People use a variety of plausible, but uncertain inferences to answer questions about which their knowledge is incomplete. Such inferential thinking and reasoning is being incorporated into the SCHOLAR computer-assisted instruction (CAI) system. Socratic tutorial techniques in CAI systems such as SCHOLAR are described, and examples of their…

  3. Nucleotide sequence of the capsid protein gene and 3' non-coding region of papaya mosaic virus RNA.

    PubMed

    Abouhaidar, M G

    1988-01-01

    The nucleotide sequences of cDNA clones corresponding to the 3' OH end of papaya mosaic virus RNA have been determined. The 3'-terminal sequence obtained was 900 nucleotides in length, excluding the poly(A) tail, and contained an open reading frame capable of giving rise to a protein of 214 amino acid residues with an Mr of 22930. This protein was identified as the viral capsid protein. The 3' non-coding region of PMV genome RNA was about 121 nucleotides long [excluding the poly(A) tail] and homologous to the complementary sequence of the non-coding region at the 5' end of PMV RNA. A long open reading frame was also found in the predicted 5' end region of the negative strand.
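
    Identifying an open reading frame of the kind reported in this abstract can be sketched as a scan of the three reading frames of a strand. This is a minimal illustration, not the method used in the paper: it assumes upper-case DNA, an ATG start and the three standard stop codons, and the function names and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on this strand,
    or None if there is none; `end` is exclusive and includes the stop codon."""
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon since the previous stop
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


toy = "CCATGAAATTTGGGTAGCC"  # made-up sequence for illustration
orf = longest_orf(toy)
print(orf, (orf[1] - orf[0]) // 3 - 1, "residues")
print(longest_orf(reverse_complement(toy)))  # scan the complementary strand too
```

    Under this convention, a 214-residue protein corresponds to an ORF spanning 3 × 215 = 645 nucleotides once the stop codon is counted.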

  4. Applying the Verona coding definitions of emotional sequences (VR-CoDES) to code medical students' written responses to written case scenarios: Some methodological and practical considerations.

    PubMed

    Ortwein, Heiderose; Benz, Alexander; Carl, Petra; Huwendiek, Sören; Pander, Tanja; Kiessling, Claudia

    2017-02-01

    To investigate whether the Verona Coding Definitions of Emotional Sequences to code health providers' responses (VR-CoDES-P) can be used for assessment of medical students' responses to patients' cues and concerns provided in written case vignettes. Student responses in direct speech to patient cues and concerns were analysed in 21 different case scenarios using VR-CoDES-P. A total of 977 student responses were available for coding, and 857 responses were codable with the VR-CoDES-P. In 74.6% of responses, the students used either a "reducing space" statement only or a "providing space" statement immediately followed by a "reducing space" statement. Overall, the most frequent response was explicit information advice (ERIa) followed by content exploring (EPCEx) and content acknowledgement (EPCAc). VR-CoDES-P were applicable to written responses of medical students when they were phrased in direct speech. The application of VR-CoDES-P is reliable and feasible when using the differentiation of "providing" and "reducing space" responses. Communication strategies described by students in non-direct speech were difficult to code and produced many missings. VR-CoDES-P are useful for analysis of medical students' written responses when focusing on emotional issues. Students need precise instructions for their response in the given test format. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  5. Code optimization of the subroutine to remove near identical matches in the sequence database homology search tool PSI-BLAST.

    PubMed

    Aspnäs, Mats; Mattila, Kimmo; Osowski, Kristoffer; Westerholm, Jan

    2010-06-01

    A central task in protein sequence characterization is the use of a sequence database homology search tool to find similar protein sequences in other individuals or species. PSI-BLAST is a widely used module of the BLAST package that calculates a position-specific score matrix from the best matching sequences and performs iterated searches using a method to avoid many similar sequences for the score. For some queries and parameter settings, PSI-BLAST may find many similar high-scoring matches, and therefore up to 80% of the total run time may be spent in this procedure. In this article, we present code optimizations that improve the cache utilization and the overall performance of this procedure. Measurements show that, for queries where the number of similar matches is high, the optimized PSI-BLAST program may be as much as 2.9 times faster than the original program.

  6. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

    PubMed Central

    Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

    2016-01-01

    Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175

  7. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation

    PubMed Central

    McLysaght, Aoife; Guerzoni, Daniele

    2015-01-01

    The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. PMID:26323763

  8. Cloning and nucleotide sequence of the gene coding for aspartokinase II from a thermophilic methylotrophic Bacillus sp.

    PubMed Central

    Schendel, F J; Flickinger, M C

    1992-01-01

    The structural gene coding for the lysine-sensitive aspartokinase II of the methylotrophic thermotolerant Bacillus sp. strain MGA3 was cloned from a genomic library by complementation of an Escherichia coli auxotrophic mutant lacking all three aspartokinase isozymes. The nucleotide sequence of the entire 2.2-kb PstI fragment was determined, and a single open reading frame coding for the aspartokinase II enzyme was found. Aspartokinase II was shown to be an alpha 2 beta 2 tetramer (M(r) 122,000) with the beta subunit (M(r) 18,000) encoded within the alpha subunit (M(r) 45,000) in the same reading frame. The enzyme was purified, and the N-terminal sequences of the alpha and beta subunits were identical with those predicted from the gene sequences. The predicted amino acid sequence was 76% identical with the sequence of the Bacillus subtilis aspartokinase II. The transcription initiation site was located approximately 350 bp upstream of the translation start site, and putative promoter regions at -10 (TATGCT) and -35 (ATGACA) were identified. A 300-nucleotide intervening sequence between the transcription initiation and translational start sites suggests a possible attenuation mechanism for the regulation of transcription of this enzyme in the presence of lysine. PMID:1444390

  9. The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

    NASA Astrophysics Data System (ADS)

    Štambuk, Nikola

    The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.

  10. Resolving arthropod phylogeny: exploring phylogenetic signal within 41 kb of protein-coding nuclear gene sequence.

    PubMed

    Regier, Jerome C; Shultz, Jeffrey W; Ganley, Austen R D; Hussey, April; Shi, Diane; Ball, Bernard; Zwick, Andreas; Stajich, Jason E; Cummings, Michael P; Martin, Joel W; Cunningham, Clifford W

    2008-12-01

    This study attempts to resolve relationships among and within the four basal arthropod lineages (Pancrustacea, Myriapoda, Euchelicerata, Pycnogonida) and to assess the widespread expectation that remaining phylogenetic problems will yield to increasing amounts of sequence data. Sixty-eight regions of 62 protein-coding nuclear genes (approximately 41 kilobases (kb)/taxon) were sequenced for 12 taxonomically diverse arthropod taxa and a tardigrade outgroup. Parsimony, likelihood, and Bayesian analyses of total nucleotide data generally strongly supported the monophyly of each of the basal lineages represented by more than one species. Other relationships within the Arthropoda were also supported, with support levels depending on method of analysis and inclusion/exclusion of synonymous changes. Removing third codon positions, where the assumption of base compositional homogeneity was rejected, altered the results. Removing the final class of synonymous mutations--first codon positions encoding leucine and arginine, which were also compositionally heterogeneous--yielded a data set that was consistent with a hypothesis of base compositional homogeneity. Furthermore, under such a data-exclusion regime, all 68 gene regions individually were consistent with base compositional homogeneity. Restricting likelihood analyses to nonsynonymous change recovered trees with strong support for the basal lineages but not for other groups that were variably supported with more inclusive data sets. In a further effort to increase phylogenetic signal, three types of data exploration were undertaken. (1) Individual genes were ranked by their average rate of nonsynonymous change, and three rate categories were assigned--fast, intermediate, and slow. Then, bootstrap analysis of each gene was performed separately to see which taxonomic groups received strong support. Five taxonomic groups were strongly supported independently by two or more genes, and these genes mostly belonged to the slow
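
    The data-exclusion step mentioned in this abstract, removing third codon positions, can be sketched as follows. This is only an illustration under stated assumptions: the sequences are assumed to be in-frame and gap-free, the taxon names and sequences are invented, and the composition function merely reports base frequencies rather than performing the homogeneity test used in the study.

```python
from collections import Counter


def drop_third_positions(cds):
    """Keep codon positions 1 and 2 of an in-frame coding sequence."""
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


def base_composition(seq):
    """Return the frequency of each of A, C, G and T, ignoring other symbols."""
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"} if total else {}


# Hypothetical two-taxon alignment (assumption, for illustration only).
alignment = {"taxon_A": "ATGGCCTAA", "taxon_B": "ATGGCTTAG"}
reduced = {name: drop_third_positions(seq) for name, seq in alignment.items()}

for name, seq in reduced.items():
    print(name, seq, base_composition(seq))
```

    In this toy case the two sequences differ only at third positions, so they become identical once those positions are removed.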

  11. Characterization of EBV Promoters and Coding Regions by Sequencing PCR-Amplified DNA Fragments.

    PubMed

    Szenthe, Kalman; Bánáti, Ferenc

    2017-01-01

    DNA sequencing approaches originally developed in two directions, the chemical degradation method and the chain-termination method. The latter became more widespread, and a huge amount of sequencing data, including whole genome sequences, accumulated based on the use of capillary sequencer systems and the application of a modified chain-termination method which proved to be relatively easy, fast, and reliable. In addition, relatively long sequences of up to 1000 bp could be obtained in a single read with high per-base accuracy. Although the recent appearance of next-generation DNA sequencing (NGS) technologies enabled high-throughput and low cost analysis of DNA, the modified chain-terminating methods are still often applied in research. In the following, we shall present the application of capillary sequencing for the sequence characterization of viral genomes in the case of partial and whole genome sequencing, and demonstrate it on the BARF1 promoter of Epstein Barr virus (EBV).

  12. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829

  13. Nucleotide sequence of the melA gene, coding for alpha-galactosidase in Escherichia coli K-12.

    PubMed Central

    Liljeström, P L; Liljeström, P

    1987-01-01

    Melibiose uptake and hydrolysis in E. coli are performed by the MelB and MelA proteins, respectively. We report the cloning and sequencing of the melA gene. The nucleotide sequence data showed that melA codes for a 450 amino acid long protein with a molecular weight of 50.6 kDa. The sequence data also supported the assumption that the mel locus forms an operon with melA in proximal position. A comparison of MelA with alpha-galactosidase proteins from yeast and human origin showed that these proteins have only limited homology, the yeast and human proteins being more related. However, regions common to all three proteins were found indicating sequences that might comprise the active site of alpha-galactosidase. PMID:3031590

  14. Melting temperature highlights functionally important RNA structure and sequence elements in yeast mRNA coding regions.

    PubMed

    Qi, Fei; Frishman, Dmitrij

    2017-03-07

    Secondary structure elements in the coding regions of mRNAs play an important role in gene expression and regulation, but distinguishing functional from non-functional structures remains challenging. Here we investigate the dependence of sequence-structure relationships in the coding regions on temperature based on the recent PARTE data by Wan et al. Our main finding is that the regions with high and low thermostability (high Tm and low Tm regions) are under evolutionary pressure to preserve RNA secondary structure and primary sequence, respectively. Sequences of low Tm regions display a higher degree of evolutionary conservation compared to high Tm regions. Low Tm regions are under strong synonymous constraint, while high Tm regions are not. These findings imply that high Tm regions contain thermo-stable functionally important RNA structures, which impose relaxed evolutionary constraint on sequence as long as the base-pairing patterns remain intact. By contrast, low thermostability regions contain single-stranded functionally important conserved RNA sequence elements accessible for binding by other molecules. We also find that theoretically predicted structures of paralogous mRNA pairs become more similar with growing temperature, while experimentally measured structures tend to diverge, which implies that the melting pathways of RNA structures cannot be fully captured by current computational approaches.

  15. Long Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing

    PubMed Central

    Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona

    2014-01-01

    The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow on RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia

  16. Rare and Coding Region Genetic Variants Associated With Risk of Ischemic Stroke: The NHLBI Exome Sequence Project.

    PubMed

    Auer, Paul L; Nalls, Mike; Meschia, James F; Worrall, Bradford B; Longstreth, W T; Seshadri, Sudha; Kooperberg, Charles; Burger, Kathleen M; Carlson, Christopher S; Carty, Cara L; Chen, Wei-Min; Cupples, L Adrienne; DeStefano, Anita L; Fornage, Myriam; Hardy, John; Hsu, Li; Jackson, Rebecca D; Jarvik, Gail P; Kim, Daniel S; Lakshminarayan, Kamakshi; Lange, Leslie A; Manichaikul, Ani; Quinlan, Aaron R; Singleton, Andrew B; Thornton, Timothy A; Nickerson, Deborah A; Peters, Ulrike; Rich, Stephen S

    2015-07-01

    Stroke is the second leading cause of death and the third leading cause of years of life lost. Genetic factors contribute to stroke prevalence, and candidate gene and genome-wide association studies (GWAS) have identified variants associated with ischemic stroke risk. These variants often have small effects without obvious biological significance. Exome sequencing may discover predicted protein-altering variants with a potentially large effect on ischemic stroke risk. To investigate the contribution of rare and common genetic variants to ischemic stroke risk by targeting the protein-coding regions of the human genome. The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) analyzed approximately 6000 participants from numerous cohorts of European and African ancestry. For discovery, 365 cases of ischemic stroke (small-vessel and large-vessel subtypes) and 809 European ancestry controls were sequenced; for replication, 47 affected sibpairs concordant for stroke subtype and an African American case-control series were sequenced, with 1672 cases and 4509 European ancestry controls genotyped. The ESP's exome sequencing and genotyping started on January 1, 2010, and continued through June 30, 2012. Analyses were conducted on the full data set between July 12, 2012, and July 13, 2013. Discovery of new variants or genes contributing to ischemic stroke risk and subtype (primary analysis) and determination of support for protein-coding variants contributing to risk in previously published candidate genes (secondary analysis). We identified 2 novel genes associated with an increased risk of ischemic stroke: a protein-coding variant in PDE4DIP (rs1778155; odds ratio, 2.15; P = 2.63 × 10^-8) with an intracellular signal transduction mechanism and in ACOT4 (rs35724886; odds ratio, 2.04; P = 1.24 × 10^-7) with a fatty acid metabolism; confirmation of PDE4DIP was observed in affected sibpair families with large-vessel stroke

  17. Application of the verona coding definitions of emotional sequences (VR-CoDES) on a pediatric data set.

    PubMed

    Vatne, Torun M; Finset, Arnstein; Ørnes, Knut; Ruland, Cornelia M

    2010-09-01

    Adult patients present concerns as defined in the Verona Coding Definitions of Emotional Sequences (VR-CoDES), but we do not know how children express their concerns during medical consultations. This study aimed to evaluate the applicability of VR-CoDES to pediatric oncology consultations. Twenty-eight pediatric consultations were coded with the Verona Coding Definitions of Emotional Sequences (VR-CoDES), and the material was also qualitatively analyzed for descriptive purposes. Five consultations were randomly selected for reliability testing and descriptive statistics were computed. Perfect inter-rater reliability for concerns and moderate reliability for cues were obtained. Cues and/or concerns were present in over half of the consultations. Cues were more frequent than concerns, with the majority of cues being verbal hints to hidden concerns or non-verbal cues. Intensity of expressions, limitations in vocabulary, commonality of statements, and complexity of the setting complicated the use of VR-CoDES. Child-specific cues; use of the imperative, cues about past experiences, and use of onomatopoeia were observed. Children with cancer express concerns during medical consultations. VR-CoDES is a reliable tool for coding concerns in pediatric data sets. For future applications in pediatric settings an appendix should be developed to incorporate the child-specific traits. Copyright (c) 2010 Elsevier Ireland Ltd. All rights reserved.

  18. Analysis of the multi-copied genes and the impact of the redundant protein coding sequences on gene annotation in prokaryotic genomes.

    PubMed

    Yu, Jia-Feng; Chen, Qing-Li; Ren, Jing; Yang, Yan-Ling; Wang, Ji-Hua; Sun, Xiao

    2015-07-07

    The important roles of duplicated genes in evolutionary processes have been recognized in bacteria, archaebacteria and eukaryotes, while there has been very little study of the multi-copied protein coding genes that share sequence identity of 100%. In this paper, the multi-copied protein coding genes in a number of prokaryotic genomes are first comprehensively analyzed. The results show that 0-15.93% of the protein coding genes in each genome are multi-copied genes and 0-16.49% of the protein coding genes in each genome are highly similar, with sequence identity ≥ 80%. Function and COG (Clusters of Orthologous Groups of proteins) analysis shows that 64.64% of multi-copied genes concentrate on the function of transposase and 86.28% of the COG assigned multi-copied genes concentrate on the COG code of 'L'. Furthermore, the impact of redundant protein coding sequences on the gene prediction results is studied. The results show that the problem of protein coding sequence redundancy cannot be ignored, and that the consistency of the gene annotation results before and after excluding the redundant sequences is negatively related to the degree of sequence redundancy of the protein coding sequences in the training set.
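
    Detecting multi-copied genes of the kind analyzed here can be sketched by grouping coding sequences that are exactly identical. This is a minimal illustration and not the authors' pipeline: it assumes that "100% identity" means an exact, full-length match of the nucleotide sequence, and the gene identifiers and sequences are invented.

```python
from collections import defaultdict


def multi_copied_genes(cds_by_gene):
    """Group gene IDs whose coding sequences are exactly identical and
    return only the groups with more than one member."""
    groups = defaultdict(list)
    for gene_id, seq in cds_by_gene.items():
        groups[seq.upper()].append(gene_id)  # case-insensitive exact match
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical toy genome (assumption, for illustration only).
genome = {
    "g1": "ATGAAATAA",
    "g2": "ATGCCCTAA",
    "g3": "atgaaataa",
    "g4": "ATGGGGTAA",
}
copies = multi_copied_genes(genome)
fraction = sum(len(group) for group in copies) / len(genome)
print(copies, f"{fraction:.2%} of genes are multi-copied")
```

    The resulting fraction is the kind of per-genome percentage the abstract reports. Finding the highly similar genes (identity ≥ 80%) would instead require pairwise alignment, which this sketch does not attempt.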

  19. ENAM Mutations with Incomplete Penetrance

    PubMed Central

    Seymen, F.; Lee, K.-E.; Koruyucu, M.; Gencay, K.; Bayram, M.; Tuna, E.B.; Lee, Z.H.; Kim, J.-W.

    2014-01-01

    Amelogenesis imperfecta (AI) is a genetic disease affecting tooth enamel formation. AI can be an isolated entity or a phenotype of syndromes. To date, more than 10 genes have been associated with various forms of AI. We have identified 2 unrelated Turkish families with hypoplastic AI and performed mutational analysis. Whole-exome sequencing identified 2 novel heterozygous nonsense mutations in the ENAM gene (c.454G>T p.Glu152* in family 1, c.358C>T p.Gln120* in family 2) in the probands. Affected individuals were heterozygous for the mutation in each family. Segregation analysis within each family revealed individuals with incomplete penetrance or an extremely mild enamel phenotype, in spite of carrying the same mutation as the other affected individuals. We believe that these findings will broaden our understanding of the clinical phenotype of AI caused by ENAM mutations. PMID:25143514

  20. 5'-coding sequence of the nasA gene of Azotobacter vinelandii is required for efficient expression.

    PubMed

    Wang, Baomin; Wang, Yumei; Kennedy, Christina

    2014-10-01

    The operon nasACBH in Azotobacter vinelandii encodes nitrate and nitrite reductases that sequentially reduce nitrate to nitrite and to ammonium for nitrogen assimilation into organic molecules. Our previous analyses showed that nasACBH expression is subject to antitermination regulation that occurs upstream of the nasA gene in response to the availability of nitrate and nitrite. In this study, we continued expression analyses of the nasA gene and observed that the nasA 5'-coding sequence plays an important role in gene expression, as demonstrated by the fact that deletions caused over sixfold reduction in the expression of the lacZ reporter gene. Further analysis suggests that the nasA 5'-coding sequence promotes gene expression in a way that is not associated with weakened transcript folding around the translational initiation region or codon usage bias. The findings from this study imply that there exists potential to improve gene expression in A. vinelandii by optimizing 5'-coding sequences. © 2014 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  1. Systematic analyses of the cancer genome: lessons learned from sequencing most of the annotated human protein-coding genes.

    PubMed

    Sjöblom, Tobias

    2008-01-01

    The availability of a reference human genome sequence has enabled unbiased mutational analyses of tumor genomes to identify the mutated genes that cause cancer. This review discusses recent insights from such analyses of protein-coding genes in breast and colorectal cancers. Mutational analyses of approximately 18,000 human protein-coding genes in breast and colorectal cancers have identified 280 candidate cancer genes. These include known cancer genes, but most had not previously been linked to cancer. There are few frequently mutated cancer genes among hundreds of less frequently mutated candidate cancer genes, and the compendium of mutated genes differs among tumors of the same tissue origin. Recent work has shown the feasibility of coding cancer genome sequencing, and new technologies promise to facilitate these mutational analyses. Whereas cancer genetics can identify candidate genes in a rapid and scalable fashion, careful functional studies of mutated genes are required for ultimate proof of cancer gene status and translation into clinical utility. The rapid progress of cancer genetics has yielded novel diagnostic and therapeutic modalities, and cancer genome sequencing will accelerate this development to the benefit of cancer patients.

  2. Dual-frequency tissue harmonic suppression using phase-coded pulse sequence: proof of concept using a phantom.

    PubMed

    Shen, Che-Chou; Wang, Hui-Ting

    2013-03-01

    The presence of tissue harmonic generation during acoustic propagation is one major limitation in nonlinear detection of microbubble contrast agents. However, conventional solutions for tissue harmonic suppression are not applicable in dual-frequency (DF) harmonic imaging. In DF harmonic imaging, the second harmonic signal at second harmonic (2f(0)) frequency and the inter-modulation harmonic signal at fundamental (f(0)) frequency are simultaneously generated for imaging and both need to be suppressed to improve contrast-to-tissue ratio (CTR). In this study, a novel phase-coded pulse sequence is developed to accomplish DF tissue harmonic suppression. Phase-coded pulse sequence utilizes multiple firings with equidistant transmit phase for harmonic cancellation in the sum of respective echoes. For the f(0) transmit component, the transmit phase comes from the equidistant set of {-2π/3, 0, 2π/3} to suppress the second harmonic signal at 2f(0) frequency. Moreover, in order to provide the inter-modulation harmonic suppression at f(0) frequency, the 2f(0) transmit phase has to be particularly manipulated for the corresponding f(0) transmit phase. The proposed three-pulse sequence can remove not only the second-order harmonic signal but also other higher-order counterparts at both f(0) and 2f(0) frequencies. Measurements were performed at f(0) equal to 2.25 MHz and using hydrophone in water and contrast agents in tissue phantom. Experimental results indicate that the sequence reduces the tissue harmonic magnitude by about 20 dB along the entire axial depths and the corresponding CTR improves at both frequencies. In DF harmonic imaging, the proposed phase-coded sequence can effectively remove the tissue harmonic background at both f(0) and 2f(0) frequencies for improvement of contrast detection. Copyright © 2012 Elsevier B.V. All rights reserved.
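
    The cancellation behind the equidistant phase set {-2π/3, 0, 2π/3} can be sketched numerically. The example assumes an idealized model in which the n-th order harmonic of an echo carries n times the transmit phase; under that assumption the second-harmonic terms of the three firings sum to zero. The function name is hypothetical, and the handling of the 2f(0) transmit phase described in the abstract is not modelled here.

```python
import cmath
import math

# Equidistant transmit phases for the f0 component, as quoted in the abstract.
PHASES = (-2 * math.pi / 3, 0.0, 2 * math.pi / 3)


def summed_harmonic(order, phases=PHASES):
    """Sum of unit phasors exp(i * order * phase) over all firings.

    Idealized assumption: the harmonic of the given order inherits
    `order` times the transmit phase and has equal amplitude per firing.
    """
    return sum(cmath.exp(1j * order * p) for p in phases)


second = summed_harmonic(2)  # cancels (up to rounding error)
third = summed_harmonic(3)   # adds coherently in this idealized model
```

    Analytically the second-order sum is 1 + 2cos(4π/3) = 0, which the numerical result reproduces to rounding error.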

  3. Range sidelobe elimination in maximal sequence phase coded C.W. radars

    NASA Astrophysics Data System (ADS)

    Metaxas, D. G.; Aitchison, C. S.

    Elimination of the range sidelobe of the autocorrelation of the periodic binary maximal sequence has been achieved. A new property of the m-sequence is defined. As a result the ambiguity function of an m-sequence PSK CW radar coincides with the ambiguity function of the equivalent pulse radar as far as the range and velocity resolution are concerned. The signal to noise deterioration due to the post-correlation implementation of the new property is insignificant.
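
    For context, the sidelobe in question is the constant off-peak value of the periodic autocorrelation of a maximal-length sequence. The sketch below generates a length-7 m-sequence from the recurrence a[n+3] = a[n+1] XOR a[n] (primitive polynomial x^3 + x + 1) and computes its periodic autocorrelation in ±1 form. It illustrates the standard property only; the elimination method reported in the record is not described in the abstract and is not reproduced here. Function names are hypothetical.

```python
def m_sequence(degree, taps, seed=None):
    """One period of a binary sequence from a linear recurrence.

    Each new bit is a[n + degree] = XOR of a[n + t] for t in `taps`.
    With degree=3 and taps=(0, 1) this is the recurrence of x^3 + x + 1.
    """
    seq = list(seed) if seed else [1] + [0] * (degree - 1)
    period = 2 ** degree - 1
    while len(seq) < period:
        n = len(seq) - degree
        bit = 0
        for t in taps:
            bit ^= seq[n + t]
        seq.append(bit)
    return seq


def periodic_autocorrelation(bits):
    """Periodic autocorrelation of the +/-1 version of a binary sequence."""
    s = [1 - 2 * b for b in bits]  # map 0 -> +1, 1 -> -1
    n = len(s)
    return [sum(s[i] * s[(i + lag) % n] for i in range(n)) for lag in range(n)]


seq = m_sequence(3, (0, 1))
acf = periodic_autocorrelation(seq)
```

    The peak equals the sequence length (7) and every other lag gives -1, which is the residual range sidelobe.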

  4. An Incomplete Paradigm

    ERIC Educational Resources Information Center

    Boulding, Kenneth E.

    1978-01-01

    Examines the role of sociobiology in explaining human behavior. Recommends that sociobiologists consider both biogenetics (DNA and information coded in the genes) and noogenetics (process by which learned structures are transmitted from one generation to the next). (Author/DB)

  6. [Influence of "prehistory" of sequential movements of the right and the left hand on reproduction: coding of positions, movements and sequence structure].

    PubMed

    Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R

    2011-01-01

    The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.

  7. DNA sequence-based "bar codes" for tracking the origins of expressed sequence tags from a maize cDNA library constructed using multiple mRNA sources.

    PubMed

    Qiu, Fang; Guo, Ling; Wen, Tsui-Jung; Liu, Feng; Ashlock, Daniel A; Schnable, Patrick S

    2003-10-01

    To enhance gene discovery, expressed sequence tag (EST) projects often make use of cDNA libraries produced using diverse mixtures of mRNAs. As such, expression data are lost because the origins of the resulting ESTs cannot be determined. Alternatively, multiple libraries can be prepared, each from a more restricted source of mRNAs. Although this approach allows the origins of ESTs to be determined, it requires the production of multiple libraries. A hybrid approach is reported here. A cDNA library was prepared using 21 different pools of maize (Zea mays) mRNAs. DNA sequence "bar codes" were added during first-strand cDNA synthesis to uniquely identify the mRNA source pool from which individual cDNAs were derived. Using a decoding algorithm that included error correction, it was possible to identify the source mRNA pool of more than 97% of the ESTs. The frequency at which a bar code is represented in an EST contig should be proportional to the abundance of the corresponding mRNA in the source pool. Consistent with this, all ESTs derived from several genes (zein and adh1) that are known to be exclusively expressed in kernels or preferentially expressed under anaerobic conditions, respectively, were exclusively tagged with bar codes associated with mRNA pools prepared from kernel and anaerobically treated seedlings, respectively. Hence, by allowing for the retention of expression data, the bar coding of cDNA libraries can enhance the value of EST projects.
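
    A decoding step with error correction, of the general kind mentioned above, can be sketched as nearest-barcode assignment by Hamming distance. The barcodes, pool names, and one-mismatch tolerance below are invented for illustration and are not the ones used in the study, whose algorithm the abstract does not specify.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def decode_barcode(observed, barcodes, max_mismatches=1):
    """Return the pool whose barcode uniquely matches within the tolerance.

    Returns None when no barcode, or more than one barcode, lies within
    `max_mismatches` of the observed tag.
    """
    hits = [pool for pool, code in barcodes.items()
            if hamming(observed, code) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical barcodes; every pair differs at all six positions.
POOLS = {
    "kernel": "ACGTAC",
    "anaerobic_seedling": "TGCATG",
    "root": "GATCCA",
}
```

    Because the toy barcodes are far apart, a read with a single sequencing error is still assigned to its pool, while a tag far from every barcode is left unassigned.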

  8. Isolation and sequence analysis of Clpg1, a gene coding for an endopolygalacturonase of the phytopathogenic fungus Colletotrichum lindemuthianum.

    PubMed

    Centis, S; Dumas, B; Fournier, J; Marolda, M; Esquerré-Tugayé, M T

    1996-04-17

    Oligodeoxyribonucleotide primers designed from the N-terminal amino acid (aa) sequence of the endopolygalacturonase (EndoPG) of Colletotrichum lindemuthianum (Cl) race beta and from an internal sequence conserved among different fungal EndoPG were used in a polymerase chain reaction (PCR) to amplify genomic related sequences of the fungus. A 542-bp fragment, designated pgA, was obtained and used as a probe to screen a partial genomic library of Cl. Among the positive clones, one was further analyzed. Nucleotide sequencing of this clone revealed an ORF encoding a 363-amino-acid (aa) polypeptide beginning with a signal peptide of 26 aa interrupted by an intron of 70 bp, and showing a high degree of homology to ten fungal EndoPG sequences. Consensus sequences were identified in the 5' non-coding region. This genomic clone was thereafter designated Clpg1. Southern analysis, performed with a Clpg1-specific probe, showed that this gene is present as a single copy in the Cl genome.

  9. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
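
    The N50 statistic mentioned above is the contig length at which, going from the longest contig downward, the running total first reaches half of the total assembly length. A minimal sketch follows; the function name and the example lengths are illustrative, not taken from the marmoset assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative sum first reaches at least
    half of the total length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative contig lengths (arbitrary units).
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

    In the example the total is 54, and the three longest contigs (10 + 9 + 8 = 27) reach half of it, so the N50 is 8.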

  10. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  11. An ultra-sparse code underlies the generation of neural sequences in a songbird

    NASA Astrophysics Data System (ADS)

    Hahnloser, Richard H. R.; Kozhevnikov, Alexay A.; Fee, Michale S.

    2002-09-01

    Sequences of motor activity are encoded in many vertebrate brains by complex spatio-temporal patterns of neural activity; however, the neural circuit mechanisms underlying the generation of these pre-motor patterns are poorly understood. In songbirds, one prominent site of pre-motor activity is the forebrain robust nucleus of the archistriatum (RA), which generates stereotyped sequences of spike bursts during song and recapitulates these sequences during sleep. We show that the stereotyped sequences in RA are driven from nucleus HVC (high vocal centre), the principal pre-motor input to RA. Recordings of identified HVC neurons in sleeping and singing birds show that individual HVC neurons projecting onto RA neurons produce bursts sparsely, at a single, precise time during the RA sequence. These HVC neurons burst sequentially with respect to one another. We suggest that at each time in the RA sequence, the ensemble of active RA neurons is driven by a subpopulation of RA-projecting HVC neurons that is active only at that time. As a population, these HVC neurons may form an explicit representation of time in the sequence. Such a sparse representation, a temporal analogue of the `grandmother cell' concept for object recognition, eliminates the problem of temporal interference during sequence generation and learning attributed to more distributed representations.

  12. Complete Coding Sequence of Usutu Virus Strain Gracula religiosa/U1609393/Belgium/2016 Obtained from the Brain Tissue of an Infected Captive Common Hill Myna (Gracula religiosa)

    PubMed Central

    Lambrecht, Bénédicte; Vandenbussche, Frank; Steensels, Mieke

    2017-01-01

    The complete and annotated coding sequence and partial noncoding sequence of an Usutu virus genome were sequenced from RNA extracted from a clinical brain tissue sample obtained from a common hill myna (Gracula religiosa), demonstrating close homology with Usutu viruses circulating in Europe. PMID:28336592

  13. Next-Generation Sequencing of Protein-Coding and Long Non-protein-Coding RNAs in Two Types of Exosomes Derived from Human Whole Saliva.

    PubMed

    Ogawa, Yuko; Tsujimoto, Masafumi; Yanoshita, Ryohei

    2016-01-01

    Exosomes are small extracellular vesicles containing microRNAs and mRNAs that are produced by various types of cells. We previously used ultrafiltration and size-exclusion chromatography to isolate two types of human salivary exosomes (exosomes I, II) that are different in size and proteomes. We showed that salivary exosomes contain large repertoires of small RNAs. However, precise information regarding long RNAs in salivary exosomes has not been fully determined. In this study, we investigated the compositions of protein-coding RNAs (pcRNAs) and long non-protein-coding RNAs (lncRNAs) of exosome I, exosome II and whole saliva (WS) by next-generation sequencing technology. Although 11% of all RNAs were commonly detected among the three samples, the compositions of reads mapping to known RNAs were similar. The most abundant pcRNA is ribosomal RNA protein, and pcRNAs of some salivary proteins such as S100 calcium-binding protein A8 (protein S100-A8) were present in salivary exosomes. Interestingly, lncRNAs of pseudogenes (presumably, processed pseudogenes) were abundant in exosome I, exosome II and WS. Translationally controlled tumor protein gene, which plays an important role in cell proliferation, cell death and immune responses, was highly expressed as pcRNA and pseudogenes in salivary exosomes. Our results show that salivary exosomes contain various types of RNAs such as pseudogenes and small RNAs, and may mediate intercellular communication by transferring these RNAs to target cells as gene expression regulators.

  14. Targeted next-generation sequencing and non-coding RNA expression analysis of clear cell papillary renal cell carcinoma suggests distinct pathological mechanisms from other renal tumour subtypes.

    PubMed

    Lawrie, Charles H; Larrea, Erika; Larrinaga, Gorka; Goicoechea, Ibai; Arestin, María; Fernandez-Mercado, Marta; Hes, Ondrej; Cáceres, Francisco; Manterola, Lorea; López, José I

    2014-01-01

    Clear cell tubulopapillary renal cell carcinoma (CCPRCC) is a recently described rare renal malignancy that displays characteristic gross, microscopic and immunohistochemical differences from other renal tumour types. However, CCPRCC remains a very poorly understood entity. We therefore sought to elucidate some of the molecular mechanisms involved in this neoplasm by carrying out targeted next-generation sequencing (NGS) to identify associated mutations, and in addition examined the expression of non-coding (nc) RNAs. We identified multiple somatic mutations in CCPRCC cases, including a recurrent [3/14 cases (21%)] non-synonymous T992I mutation in the MET proto-oncogene, a gene associated with epithelial-to-mesenchymal transition (EMT). Using a microarray approach, we found that the expression of mature (n = 1105) and pre-miRNAs (n = 1105), as well as snoRNA and scaRNAs (n = 2214), in CCPRCC cases differed from that of clear cell renal cell carcinoma (CCRCC) or papillary renal cell carcinoma (PRCC) tumours. Surprisingly, and unlike other renal tumour subtypes, we found that all five members of the miR-200 family were over-expressed in CCPRCC cases. As these miRNAs are intimately involved with EMT, we stained CCPRCC cases for E-cadherin, vimentin and β-catenin and found that the tumour cells of all cases were positive for all three markers, a combination rarely reported in other renal tumours that could have diagnostic implications. Taken together with the mutational analysis, these data suggest that EMT in CCPRCC tumour cells is incomplete or blocked, consistent with the indolent clinical course typical of this malignancy. In summary, as well as describing a novel pathological mechanism in renal carcinomas, this study adds to the mounting evidence that CCPRCC should be formally considered a distinct entity. Microarray data have been deposited in the GEO database [GEO accession number (GSE51554)]. Copyright © 2013 Pathological Society of Great Britain and Ireland

  15. Bar-coded, multiplexed sequencing of targeted DNA regions using the Illumina Genome Analyzer.

    PubMed

    Szelinger, Szabolcs; Kurdoglu, Ahmet; Craig, David W

    2011-01-01

    To date, genome-wide association (GWA) studies, in which thousands of markers throughout the genome are simultaneously genotyped, have identified hundreds of loci underlying disease susceptibility. These regions typically span 5-100 kb, and resequencing efforts to identify potential functional variants within these loci represent the next logical step in the genetic characterization pipeline. Next-generation DNA sequencing technologies are, in principle, well-suited for this task, yet despite the massive sequencing capability afforded by these platforms, the present-day reality is that it remains difficult, time-consuming, and expensive to resequence large numbers of samples across moderately sized genomic regions. To address this obstacle, we developed a generalized framework for multiplexed resequencing of targeted regions of the human genome on the Illumina Genome Analyzer using degenerate, indexed DNA sequence barcodes ligated to fragmented DNA prior to sequencing. Using this method, the DNA of multiple individuals can be simultaneously sequenced at several regions. We find that achieving adequate coverage is one of the most important factors in the design of an experiment, but other key considerations include whether the objective is to discover genetic variants for genotyping later by a separate method, to genotype all identified variants by sequencing, or to exhaustively identify all common and rare variants in the region. Given the massive bandwidth of next-generation sequencing technologies and their low inherent throughput in terms of sequencing arrays per week, multiplexed sequencing using the barcoding approach offers a clear mechanism for focusing bandwidth to a smaller region across many more individuals or samples.

  16. HLA-F coding and regulatory segments variability determined by massively parallel sequencing procedures in a Brazilian population sample.

    PubMed

    Lima, Thálitta Hetamaro Ayala; Buttura, Renato Vidal; Donadi, Eduardo Antônio; Veiga-Castelli, Luciana Caricati; Mendes-Junior, Celso Teixeira; Castelli, Erick C

    2016-10-01

    Human Leucocyte Antigen F (HLA-F) is a non-classical HLA class I gene distinguished from its classical counterparts by low allelic polymorphism and distinctive expression patterns. Its exact function remains unknown. It is believed that HLA-F has tolerogenic and immune modulatory properties. Currently, there is little information regarding the HLA-F allelic variation among human populations, and the available studies have evaluated only a fraction of the HLA-F gene segment and/or have searched for known alleles only. Here we present a strategy to evaluate the complete HLA-F variability, including its 5' upstream, coding and 3' downstream segments, by using massively parallel sequencing procedures. HLA-F variability was surveyed in 196 individuals from the Brazilian Southeast. The results indicate that the HLA-F gene is indeed conserved at the protein level: thirty coding haplotypes or coding alleles were detected, encoding only four different HLA-F full-length protein molecules. Moreover, the same protein molecule is encoded by 82.45% of all coding alleles detected in this Brazilian population sample. However, the HLA-F nucleotide and haplotype variability is much higher than previously known, both in Brazilians and in the 1000 Genomes Project data. This protein conservation is probably a consequence of the key role of HLA-F in the immune system physiology.

  17. Hybridization Capture-Based Next-Generation Sequencing to Evaluate Coding Sequence and Deep Intronic Mutations in the NF1 Gene

    PubMed Central

    Cunha, Karin Soares; Oliveira, Nathalia Silva; Fausto, Anna Karoline; de Souza, Carolina Cruz; Gros, Audrey; Bandres, Thomas; Idrissi, Yamina; Merlio, Jean-Philippe; de Moura Neto, Rodrigo Soares; Silva, Rosane; Geller, Mauro; Cappellen, David

    2016-01-01

    Neurofibromatosis 1 (NF1) is one of the most common genetic disorders and is caused by mutations in the NF1 gene. NF1 gene mutational analysis presents a considerable challenge because of its large size, existence of highly homologous pseudogenes located throughout the human genome, absence of mutational hotspots, and diversity of mutations types, including deep intronic splicing mutations. We aimed to evaluate the use of hybridization capture-based next-generation sequencing to screen coding and noncoding NF1 regions. Hybridization capture-based next-generation sequencing, with genomic DNA as starting material, was used to sequence the whole NF1 gene (exons and introns) from 11 unrelated individuals and 1 relative, who all had NF1. All of them met the NF1 clinical diagnostic criteria. We showed a mutation detection rate of 91% (10 out of 11). We identified eight recurrent and two novel mutations, which were all confirmed by Sanger methodology. In the Sanger sequencing confirmation, we also included another three relatives with NF1. Splicing alterations accounted for 50% of the mutations. One of them was caused by a deep intronic mutation (c.1260 + 1604A > G). Frameshift truncation and missense mutations corresponded to 30% and 20% of the pathogenic variants, respectively. In conclusion, we show the use of a simple and fast approach to screen, at once, the entire NF1 gene (exons and introns) for different types of pathogenic variations, including the deep intronic splicing mutations. PMID:27999334

  18. Hybridization Capture-Based Next-Generation Sequencing to Evaluate Coding Sequence and Deep Intronic Mutations in the NF1 Gene.

    PubMed

    Cunha, Karin Soares; Oliveira, Nathalia Silva; Fausto, Anna Karoline; de Souza, Carolina Cruz; Gros, Audrey; Bandres, Thomas; Idrissi, Yamina; Merlio, Jean-Philippe; de Moura Neto, Rodrigo Soares; Silva, Rosane; Geller, Mauro; Cappellen, David

    2016-12-17

    Neurofibromatosis 1 (NF1) is one of the most common genetic disorders and is caused by mutations in the NF1 gene. NF1 gene mutational analysis presents a considerable challenge because of its large size, existence of highly homologous pseudogenes located throughout the human genome, absence of mutational hotspots, and diversity of mutations types, including deep intronic splicing mutations. We aimed to evaluate the use of hybridization capture-based next-generation sequencing to screen coding and noncoding NF1 regions. Hybridization capture-based next-generation sequencing, with genomic DNA as starting material, was used to sequence the whole NF1 gene (exons and introns) from 11 unrelated individuals and 1 relative, who all had NF1. All of them met the NF1 clinical diagnostic criteria. We showed a mutation detection rate of 91% (10 out of 11). We identified eight recurrent and two novel mutations, which were all confirmed by Sanger methodology. In the Sanger sequencing confirmation, we also included another three relatives with NF1. Splicing alterations accounted for 50% of the mutations. One of them was caused by a deep intronic mutation (c.1260 + 1604A > G). Frameshift truncation and missense mutations corresponded to 30% and 20% of the pathogenic variants, respectively. In conclusion, we show the use of a simple and fast approach to screen, at once, the entire NF1 gene (exons and introns) for different types of pathogenic variations, including the deep intronic splicing mutations.

  19. β-glucosidase coding sequences and protein from Orpinomyces PC-2

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong; Ximenes, Eduardo A.

    2001-02-06

    Provided is a novel β-glucosidase from Orpinomyces sp. PC2, nucleotide sequences encoding the mature protein and the precursor protein, and methods for recombinant production of this β-glucosidase.

  20. The complete coding region sequence of river buffalo (Bubalus bubalis) SRY gene.

    PubMed

    Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe

    2004-02-01

    The Y-linked SRY gene is responsible for testis determination in mammals. Mutations in this gene can lead to XY Gonadal Dysgenesis, an abnormal sexual phenotype described in humans, cattle, horses and river buffalo. We report here the complete river buffalo SRY sequence in order to enable the genetic diagnosis of this disease. The SRY sequence was also used to confirm the evolutionary divergence time between cattle and river buffalo 10 million years ago.

  1. Functional Divergence of APETALA1 and FRUITFULL is due to Changes in both Regulation and Coding Sequence

    PubMed Central

    McCarthy, Elizabeth W.; Mohamed, Abeer; Litt, Amy

    2015-01-01

    Gene duplications are prevalent in plants, and functional divergence subsequent to duplication may be linked with the occurrence of novel phenotypes in plant evolution. Here, we examine the functional divergence of Arabidopsis thaliana APETALA1 (AP1) and FRUITFULL (FUL), which arose via a duplication correlated with the origin of the core eudicots. Both AP1 and FUL play a role in floral meristem identity, but AP1 is required for the formation of sepals and petals whereas FUL is involved in cauline leaf and fruit development. AP1 and FUL are expressed in mutually exclusive domains but also differ in sequence, with unique conserved motifs in the C-terminal domains of the proteins that suggest functional differentiation. To determine whether the functional divergence of AP1 and FUL is due to changes in regulation or changes in coding sequence, we performed promoter swap experiments, in which FUL was expressed in the AP1 domain in the ap1 mutant and vice versa. Our results show that FUL can partially substitute for AP1, and AP1 can partially substitute for FUL; thus, the functional divergence between AP1 and FUL is due to changes in both regulation and coding sequence. We also mutated AP1 and FUL conserved motifs to determine if they are required for protein function and tested the ability of these mutated proteins to interact in yeast with known partners. We found that these motifs appear to play at best a minor role in protein function and dimerization capability, despite being strongly conserved. Our results suggest that the functional differentiation of these two paralogous key transcriptional regulators involves both differences in regulation and in sequence; however, sequence changes in the form of unique conserved motifs do not explain the differences observed. PMID:26697035

  2. [Evolution of the genetic code and earliest proteins. Reconstruction from the current sequences].

    PubMed

    Trifonov, E N

    2002-01-01

    One would expect that present-day protein sequences have changed many times during their evolution, at every point, so that there is no chance to recognize in the sequences any traces of their ancient organization. This turns out not to be true. Massive analysis of complete genomes of bacteria allows one to derive, according to very specific predictions, distinct features of very early sequences and to outline the history of protein evolution. Modern proteins appear to have evolved from short peptides of mixed sequences of two alphabet types. They were then closed to sequences of optimal size from which modern folds/domains and multidomain proteins were formed. The reconstruction of amino acid and codon chronology is described. A specific idea on the nature and evolutionary significance of gene splicing is suggested. Gene splicing, while obeying the rules of basic structural organization of proteins, offers accessibility to regions of sequence space that could not be reached by mutational changes typical for prokaryotes.

  3. Prediction and identification of sequences coding for orphan enzymes using genomic and metagenomic neighbours

    PubMed Central

    Yamada, Takuji; Waller, Alison S; Raes, Jeroen; Zelezniak, Aleksej; Perchat, Nadia; Perret, Alain; Salanoubat, Marcel; Patil, Kiran R; Weissenbach, Jean; Bork, Peer

    2012-01-01

    Despite the current wealth of sequencing data, one-third of all biochemically characterized metabolic enzymes lack a corresponding gene or protein sequence, and as such can be considered orphan enzymes. They represent a major gap between our molecular and biochemical knowledge, and consequently are not amenable to modern systemic analyses. As 555 of these orphan enzymes have metabolic pathway neighbours, we developed a global framework that utilizes the pathway and (meta)genomic neighbour information to assign candidate sequences to orphan enzymes. For 131 orphan enzymes (37% of those for which (meta)genomic neighbours are available), we associate sequences to them using scoring parameters with an estimated accuracy of 70%, implying functional annotation of 16 345 gene sequences in numerous (meta)genomes. As a case in point, two of these candidate sequences were experimentally validated to encode the predicted activity. In addition, we augmented the currently available genome-scale metabolic models with these new sequence–function associations and were able to expand the models by on average 8%, with a considerable change in the flux connectivity patterns and improved essentiality prediction. PMID:22569339

  4. The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

    NASA Astrophysics Data System (ADS)

    Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen

    2017-01-01

    Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape the epigenetic landscape through interactions with chromatin modifying proteins. However, mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to have physical interactions with Polycomb repressive complex 2 (PRC2), and systematically investigated the sequence features of these lncRNAs by developing a new computational pipeline for sequence composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Through that, PRC2-binding lncRNAs were found to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be utilized to distinguish them from the others with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.

  5. The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

    PubMed Central

    Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen

    2017-01-01

    Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape the epigenetic landscape through interactions with chromatin modifying proteins. However, mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to have physical interactions with Polycomb repressive complex 2 (PRC2), and systematically investigated the sequence features of these lncRNAs by developing a new computational pipeline for sequence composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Through that, PRC2-binding lncRNAs were found to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be utilized to distinguish them from the others with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance. PMID:28139710
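
    The abstract above treats each sequence as a series of transitions between adjacent nucleotides. The authors' pipeline is not described in this record, so the following is only a minimal sketch of one way such transition features could be computed. The function name `transition_frequencies`, the ACGU alphabet, and the choice to skip ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumed alphabet: RNA bases; DNA input is converted by mapping T to U.
NUCLEOTIDES = "ACGU"


def transition_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each adjacent-nucleotide transition.

    Hypothetical helper, not the published pipeline: every overlapping
    pair of neighbouring bases counts as one transition.
    """
    seq = seq.upper().replace("T", "U")
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    # Ignore transitions that involve ambiguous symbols such as N.
    counts = Counter(p for p in pairs if all(b in NUCLEOTIDES for b in p))
    total = sum(counts.values())
    # Always return all 16 transitions so feature vectors are comparable.
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a, b in product(NUCLEOTIDES, repeat=2)
    }


if __name__ == "__main__":
    features = transition_frequencies("ACGUACGGAUC")
    for transition, freq in sorted(features.items()):
        if freq > 0:
            print(transition, round(freq, 3))
```

    A fixed-length vector of 16 frequencies per sequence is one simple input a classifier could use to separate binding from non-binding transcripts; whether the published method uses this exact representation is not stated here.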

  6. The bioinformatics of nucleotide sequence coding for proteins requiring metal coenzymes and proteins embedded with metals

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Cheung, E.; Holden, T.; Sullivan, R.; Nguyen, A.; Lieberman, D.; Cheung, T.

    2015-09-01

    All metallo-proteins need post-translation metal incorporation. In fact, the isotope ratios of Fe, Cu, and Zn in physiology and oncology have emerged as an important tool. The nickel containing F430 is the prosthetic group of the enzyme methyl coenzyme M reductase which catalyzes the release of methane in the final step of methanogenesis, a prime energy metabolism candidate for life-exploration space missions in the solar system. The 3.5 Gyr early life sulfite reductase as a life switch energy metabolism had Fe-Mo clusters. The nitrogenase for nitrogen fixation 3 billion years ago had Mo. The early life arsenite oxidase needed for anoxygenic photosynthesis energy metabolism 2.8 billion years ago had Mo and Fe. The selection pressure in metal incorporation inside a protein would be quantifiable in terms of the related nucleotide sequence complexity with fractal dimension and entropy values. A simulation model showed that the studied metal-required energy metabolism sequences had at least ten times more selection pressure in comparison to the horizontally transferred sequences in Mealybug, guided by the outcome histogram of the correlation R-sq values. The metal energy metabolism sequence group was compared to the circadian clock KaiC sequence group using magnesium atomic level bond shifting mechanism in the protein, and the simulation model would suggest a much higher selection pressure for the energy life switch sequence group. The possibility of using Kepler 444 as an example of ancient life in Galaxy with the associated exoplanets has been proposed and is further discussed in this report. Examples of arsenic metal bonding shift probed by Synchrotron-based X-ray spectroscopy data and Zn controlled FOXP2 regulated pathways in human and chimp brain studied tissue samples are studied in relationship to the sequence bioinformatics. The analysis results suggest that relatively large metal bonding shift amount is associated with low probability correlation R

  7. DNA methylation of miRNA coding sequences putatively associated with childhood obesity.

    PubMed

    Mansego, M L; Garcia-Lacarte, M; Milagro, F I; Marti, A; Martinez, J A

    2017-02-01

    Epigenetic mechanisms may be involved in obesity onset and its consequences. The aim of the present study was to evaluate whether DNA methylation status in microRNA (miRNA) coding regions is associated with childhood obesity. DNA isolated from white blood cells of 24 children (identification sample: 12 obese and 12 non-obese) from the Grupo Navarro de Obesidad Infantil study was hybridized in a 450 K methylation microarray. Several CpGs whose DNA methylation levels were statistically different between obese and non-obese were validated by MassArray® in 95 children (validation sample) from the same study. Microarray analysis identified 16 differentially methylated CpGs between both groups (6 hypermethylated and 10 hypomethylated). DNA methylation levels in miR-1203, miR-412 and miR-216A coding regions significantly correlated with body mass index standard deviation score (BMI-SDS) and explained up to 40% of the variation of BMI-SDS. The network analysis identified 19 well-defined obesity-relevant biological pathways from the KEGG database. MassArray® validation identified three regions located in or near miR-1203, miR-412 and miR-216A coding regions differentially methylated between obese and non-obese children. The current work identified three CpG sites located in coding regions of three miRNAs (miR-1203, miR-412 and miR-216A) that were differentially methylated between obese and non-obese children, suggesting a role of miRNA epigenetic regulation in childhood obesity. © 2016 World Obesity Federation.

  8. Short Time-Scale Sensory Coding in S1 during Discrimination of Whisker Vibrotactile Sequences

    PubMed Central

    Miyashita, Toshio; Lee, Daniel J.; Smith, Katherine A.; Feldman, Daniel E.

    2016-01-01

    Rodent whisker input consists of dense microvibration sequences that are often temporally integrated for perceptual discrimination. Whether primary somatosensory cortex (S1) participates in temporal integration is unknown. We trained rats to discriminate whisker impulse sequences that varied in single-impulse kinematics (5–20-ms time scale) and mean speed (150-ms time scale). Rats appeared to use the integrated feature, mean speed, to guide discrimination in this task, consistent with similar prior studies. Despite this, 52% of S1 units, including 73% of units in L4 and L2/3, encoded sequences at fast time scales (≤20 ms, mostly 5–10 ms), accurately reflecting single impulse kinematics. 17% of units, mostly in L5, showed weaker impulse responses and a slow firing rate increase during sequences. However, these units did not effectively integrate whisker impulses, but instead combined weak impulse responses with a distinct, slow signal correlated to behavioral choice. A neural decoder could identify sequences from fast unit spike trains and behavioral choice from slow units. Thus, S1 encoded fast time scale whisker input without substantial temporal integration across whisker impulses. PMID:27574970

  9. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  10. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence.

    PubMed

    Gordon, Kacy L; Arthur, Robert K; Ruvinsky, Ilya

    2015-05-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements.

  11. Short Time-Scale Sensory Coding in S1 during Discrimination of Whisker Vibrotactile Sequences.

    PubMed

    McGuire, Leah M; Telian, Gregory; Laboy-Juárez, Keven J; Miyashita, Toshio; Lee, Daniel J; Smith, Katherine A; Feldman, Daniel E

    2016-08-01

    Rodent whisker input consists of dense microvibration sequences that are often temporally integrated for perceptual discrimination. Whether primary somatosensory cortex (S1) participates in temporal integration is unknown. We trained rats to discriminate whisker impulse sequences that varied in single-impulse kinematics (5-20-ms time scale) and mean speed (150-ms time scale). Rats appeared to use the integrated feature, mean speed, to guide discrimination in this task, consistent with similar prior studies. Despite this, 52% of S1 units, including 73% of units in L4 and L2/3, encoded sequences at fast time scales (≤20 ms, mostly 5-10 ms), accurately reflecting single impulse kinematics. 17% of units, mostly in L5, showed weaker impulse responses and a slow firing rate increase during sequences. However, these units did not effectively integrate whisker impulses, but instead combined weak impulse responses with a distinct, slow signal correlated to behavioral choice. A neural decoder could identify sequences from fast unit spike trains and behavioral choice from slow units. Thus, S1 encoded fast time scale whisker input without substantial temporal integration across whisker impulses.

  12. Incomplete periacetabular acetabuloplasty

    PubMed Central

    2014-01-01

    Background: Residual acetabular dysplasia is one of the most common complications after treatment for developmental dysplasia of the hip. The acetabular growth response after reduction of a dislocated hip varies. The options are to wait and add a redirectional osteotomy as a secondary procedure at an older age, or to perform a primary acetabuloplasty at the time of the open reduction to stimulate acetabular development. We present the early results of such a procedure, open reduction and an incomplete periacetabular acetabuloplasty, as a one-stop procedure for developmental dysplasia of the hip. Patients and methods: We retrospectively reviewed the results obtained with 55 hips (in 48 patients, 43 of them girls) treated between September 2004 and February 2011. This cohort included late presentations and failures of nonoperative treatment and excluded unsuccessful previous surgical treatment (including closed reductions), neuromuscular disease, and other teratological conditions. Patients were treated once the ossific nucleus was present or when they reached one year of age. 31 cases were late presentations while 17 represented failures of nonoperative treatment. The mean age of the patients at surgery was 1.3 (0.6–2.6) years. The mean follow-up period was 4 (2–8) years. According to the IHDI classification, 1 was grade I, 9 were grade II, 13 were grade III, and 32 were grade IV. Results: The mean acetabular index fell from 38 (23–49) preoperatively to 21 (10–27) at the last follow-up. There were no infections, nerve palsies, or graft extrusions. None of the cases required secondary surgery for residual acetabular dysplasia. 8 patients developed avascular necrosis (AVN) of grade II or more. The incidence of AVN was significantly associated with previous, failed nonoperative treatment. 1 patient developed coxa magna requiring shelf arthroplasty 4 years after the index procedure and 1 patient with lateral growth arrest required medial screw epiphysiodesis

  13. Cracking the Code of Human Diseases Using Next-Generation Sequencing: Applications, Challenges, and Perspectives

    PubMed Central

    Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; D'Argenio, Valeria

    2015-01-01

    Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics. PMID:26665001

  14. Next-gen sequencing identifies non-coding variation disrupting miRNA-binding sites in neurological disorders.

    PubMed

    Devanna, P; Chen, X S; Ho, J; Gajewski, D; Smith, S D; Gialluisi, A; Francks, C; Fisher, S E; Newbury, D F; Vernes, S C

    2017-03-14

    Understanding the genetic factors underlying neurodevelopmental and neuropsychiatric disorders is a major challenge given their prevalence and potential severity for quality of life. While large-scale genomic screens have made major advances in this area, for many disorders the genetic underpinnings are complex and poorly understood. To date the field has focused predominantly on protein coding variation, but given the importance of tightly controlled gene expression for normal brain development and disorder, variation that affects non-coding regulatory regions of the genome is likely to play an important role in these phenotypes. Herein we show the importance of 3 prime untranslated region (3'UTR) non-coding regulatory variants across neurodevelopmental and neuropsychiatric disorders. We devised a pipeline for identifying and functionally validating putatively pathogenic variants from next generation sequencing (NGS) data. We applied this pipeline to a cohort of children with severe specific language impairment (SLI) and identified a functional, SLI-associated variant affecting gene regulation in cells and post-mortem human brain. This variant and the affected gene (ARHGEF39) represent new putative risk factors for SLI. Furthermore, we identified 3'UTR regulatory variants across autism, schizophrenia and bipolar disorder NGS cohorts demonstrating their impact on neurodevelopmental and neuropsychiatric disorders. Our findings show the importance of investigating non-coding regulatory variants when determining risk factors contributing to neurodevelopmental and neuropsychiatric disorders. In the future, integration of such regulatory variation with protein coding changes will be essential for uncovering the genetic causes of complex neurological disorders and the fundamental mechanisms underlying health and disease. Molecular Psychiatry advance online publication, 14 March 2017; doi:10.1038/mp.2017.30.

  15. Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects

    PubMed Central

    Zou, James; Valiant, Gregory; Valiant, Paul; Karczewski, Konrad; Chan, Siu On; Samocha, Kaitlin; Lek, Monkol; Sunyaev, Shamil; Daly, Mark; MacArthur, Daniel G.

    2016-01-01

    As new proposals aim to sequence ever larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not been observed yet in the current cohorts. Our results quantified the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we find that we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation. PMID:27796292
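
    The record above does not describe how UnseenEst works, so the sketch below does not attempt to reproduce it. It illustrates the same kind of question (how many not-yet-observed variants would a larger cohort reveal?) with the classical Good-Toulmin estimator, a different and much simpler technique swapped in for illustration. The function name and the toy allele counts are assumptions.

```python
from collections import Counter


def good_toulmin_new_variants(allele_counts, t):
    """Expected number of previously unseen variants after growing the
    sample by a further fraction t of its current size (0 < t <= 1).

    Classical Good-Toulmin series, NOT the UnseenEst algorithm:
        U(t) = -sum_i (-t)**i * F_i
    where F_i is the number of variants observed exactly i times.
    """
    if not 0 < t <= 1:
        # The unsmoothed series becomes unstable beyond a doubling.
        raise ValueError("t must satisfy 0 < t <= 1")
    # Fingerprint: how many variants were seen once, twice, and so on.
    fingerprint = Counter(c for c in allele_counts if c > 0)
    return -sum(((-t) ** i) * n for i, n in fingerprint.items())


if __name__ == "__main__":
    # Toy data (made up): 10 singletons, 3 doubletons, one variant seen 5 times.
    toy_counts = [1] * 10 + [2] * 3 + [5]
    print(good_toulmin_new_variants(toy_counts, t=0.5))
```

    The estimate is driven mostly by singletons and doubletons, which is why rare-variant counts matter so much when projecting the yield of larger sequencing cohorts.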

  16. Molecular differentiation of Nosema apis and Nosema ceranae based on species-specific sequence differences in a protein coding gene.

    PubMed

    Gisder, Sebastian; Genersch, Elke

    2013-05-01

    Nosema apis and Nosema ceranae are two microsporidian pathogens of the European honey bee, Apis mellifera. There is evidence that N. ceranae is more virulent than N. apis subject to environmental factors like climate. This makes N. ceranae one of the suspects in the increasing colony losses recently observed in many regions of the world. Correct differentiation between N. apis and N. ceranae is important and best accomplished by molecular methods. So far only protocols based on species-specific sequence differences in the 16S rRNA gene are available. However, recent studies indicated that these methods may lead to confusing results due to polymorphisms in and recombination between the multi-copy 16S rRNA genes. To solve this problem and to provide a reliable molecular tool for the differentiation between the two bee pathogenic microsporidia we here present and evaluate a duplex-PCR protocol based on species-specific sequence differences in the highly conserved gene coding for the DNA-dependent RNA polymerase II largest subunit. A total of 102 honey bee samples were analyzed by the novel PCR protocol and the results were compared with the results of the originally published PCR-RFLP analysis and two recently published differentiation protocols, based on 16S rRNA sequence differences. Although the novel PCR protocol proved to be as reliable as the 16S rRNA gene based PCR-RFLP it was superior to simple 16S rRNA based PCR protocols which tended to overestimate the rate of N. ceranae infections. Therefore, we propose that species-specific sequence differences of highly conserved protein coding genes should become the preferred molecular tool for differentiation of Nosema spp. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Sequencing of HLA class II genes based on the conserved diversity of the non-coding regions: sequencing based typing of HLA-DRB genes.

    PubMed

    Kotsch, K; Wehling, J; Blasczyk, R

    1999-05-01

    In this paper, we present a novel sequencing based typing strategy for the HLA-DRB1, 3, 4 and 5 loci. The new approach is based on a group-specific amplification from intron 1 to intron 2 according to the serologically-defined antigens. For this purpose, we have determined the 3' 500 bp-fragment of intron 1 and the 5' 340 bp-fragment of intron 2 of all serological antigens and their most frequent subtypes. We discovered a remarkably conserved diversity characterized by lineage-specific sequence motifs. This lineage-specificity of non-coding motifs in the 1st and 2nd intron offered the possibility to establish a clear serology-related amplification strategy. The method allows the complete analysis of the 2nd exon and the definition of the cis/trans linkage of sequence motifs by intron-mediated polymerase chain reaction (PCR)-based separation of the haplotypes in nearly all serologically heterozygous samples. In particular, the non-coding variabilities between the DR52-associated DRB1 groups made their independent amplification possible. Thus, compared to the standard procedures using exon-based amplification primers, the groups DR3, DR12, some DR13 alleles (1301, 1302) and the DR14 group could be amplified by specific primer mixes. The DR8 could be amplified with an individual primer mix not co-amplifying the DR12. The DR11 and DR13 did not show any individual motif in intron 1 or intron 2. In order to achieve a separate amplification, they had to be amplified by multispecific primer mixes (DR3/11/13/14; DR3/11/13 or DR11/13/14) excluding the other haplotype. Thus, exclusively the alleles in rare DR11,13 heterozygosities without a DRB1*1301 or 1302 could not be amplified separately. Fourteen primer mixes are used to amplify the specificities DR1-14, and 6 primer mixes for the specificities DR51-53. The sequence homology of the 3' end of intron 1 facilitated the application of only three different sequencing primers for all DRB alleles.

  18. Sequence diversity and positive selection at the Duffy-binding protein genes of Plasmodium knowlesi and P. cynomolgi: Analysis of the complete coding sequences of Thai isolates.

    PubMed

    Putaporntip, Chaturong; Kuamsab, Napaporn; Jongwutiwes, Somchai

    2016-10-01

    Plasmodium knowlesi and P. cynomolgi are simian malaria parasites capable of causing symptomatic human infections. The interaction between the Duffy binding protein alpha on P. knowlesi merozoite and the Duffy-antigen receptor for chemokine (DARC) on human and macaque erythrocyte membrane is prerequisite for establishment of blood stage infection whereas DARC is not required for erythrocyte invasion by P. cynomolgi. To gain insights into the evolution of the PkDBP gene family comprising PkDBPα, PkDBPβ and PkDBPγ, and a member of the DBP gene family of P. cynomolgi (PcyDBP1), the complete coding sequences of these genes were analyzed from Thai field isolates and compared with the publicly available DBP sequences of P. vivax (PvDBP). The complete coding sequences of PkDBPα (n=11), PkDBPβ (n=11), PkDBPγ (n=10) and PcyDBP1 (n=11) were obtained from direct sequencing of the PCR products. Nucleotide diversity of DBP is highly variable across malaria species. PcyDBP1 displayed the greatest level of nucleotide diversity while all PkDBP gene members exhibited comparable levels of diversity. Positive selection occurred in domains I, II and IV of PvDBP and in domain V of PcyDBP1. Although deviation from neutrality was not detected in domain II of PkDBPα, a signature of positive selection was identified in the putative DARC binding site in this domain. The DBP gene families seem to have arisen following the model of concerted evolution because paralogs rather than orthologs are clustered in the phylogenetic tree. The presence of identical or closely related repeats exclusive for the PkDBP gene family suggests that duplication of gene members postdated their divergence from the ancestral PcyDBP and PvDBP lineages. Intragenic recombination was detected in all DBP genes of these malaria species. Despite the limited number of isolates, P. knowlesi from Thailand shared phylogenetically related domain II sequences of both PkDBPα and PkDBPγ with those from Peninsular

  19. Molecular phylogenetic analysis in Hammondia-like organisms based on partial Hsp70 coding sequences

    USDA-ARS's Scientific Manuscript database

    The 70-kDa heat shock proteins (Hsp70) are considered among the most conserved proteins in all domains of life, from Archaea to eukaryotes. Hammondia heydorni, H. hammondi, Toxoplasma gondii, Neospora hughesi and N. caninum (Hammondia-like organisms) are closely related tissue cyst-forming c...

  20. Analysis of mutations in the entire coding sequence of the factor VIII gene

    SciTech Connect

    Bidichadani, S.I.; Lanyon, W.G.; Connor, J.M.

    1994-09-01

    Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.

  1. Full-length coding sequences of three major histocompatibility complex class I-related chain A alleles, MICA*019, MICA*027 and MICA*045, identified by sequence-based typing in Chinese individuals.

    PubMed

    Xu, Y P; Gao, S Q; Tao, H

    2015-10-01

    Full-length coding sequences of three major histocompatibility complex class I-related chain A alleles, MICA*019, MICA*027 and MICA*045. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  2. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is the core for biological activities, and significant attention has been given to identification of various factors contributing to gene expression at genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal to investigate how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together...

  3. Integration of Expressed Sequence Tag Data Flanking Predicted RNA Secondary Structures Facilitates Novel Non-Coding RNA Discovery

    PubMed Central

    Krzyzanowski, Paul M.; Price, Feodor D.; Muro, Enrique M.; Rudnicki, Michael A.; Andrade-Navarro, Miguel A.

    2011-01-01

    Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking EST data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286
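
    The key criterion in the abstract above is that supporting ESTs lie adjacent to a predicted miRNA without overlapping it. The record gives no coordinates, distances, or file formats, so the following is a rough sketch of that interval test only. The function name `flanking_ests`, the closed-interval convention, and the `max_gap` distance of 100 nt are hypothetical choices, and all intervals are assumed to be on the same chromosome and strand.

```python
def flanking_ests(hairpin, ests, max_gap=100):
    """Split ESTs into upstream and downstream flanks of a predicted hairpin.

    hairpin: (start, end) of the predicted RNA structure, inclusive.
    ests:    iterable of (start, end) EST alignments, inclusive.
    max_gap: assumed maximum distance (nt) for an EST to count as adjacent.
    ESTs that overlap the hairpin, or lie too far away, are discarded.
    """
    h_start, h_end = hairpin
    upstream, downstream = [], []
    for start, end in ests:
        if end < h_start and h_start - end <= max_gap:
            upstream.append((start, end))      # ends just before the hairpin
        elif start > h_end and start - h_end <= max_gap:
            downstream.append((start, end))    # begins just after the hairpin
    return upstream, downstream


if __name__ == "__main__":
    hairpin = (1000, 1080)
    ests = [(900, 990), (950, 1010), (1090, 1200), (2000, 2100)]
    up, down = flanking_ests(hairpin, ests)
    print("upstream:", up)
    print("downstream:", down)
```

    In this toy input the second EST overlaps the hairpin and the fourth is too distant, so neither would count as flanking support under these assumed settings.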

  4. Polarity effects in the hisG gene of salmonella require a site within the coding sequence.

    PubMed

    Ciampi, M S; Roth, J R

    1988-02-01

    A single site in the middle of the coding sequence of the hisG gene of Salmonella is required for most of the polar effect of mutations in this gene. Nonsense and insertion mutations mapping upstream of this point in the hisG gene all have strong polar effects on expression of downstream genes in the operon; mutations mapping promoter-distal to this site have little or no polar effect. Two previously known hisG mutations, mapping in the region of the polarity site, abolish the polarity effect of insertion mutations mapping upstream of this region. New polarity site mutations have been selected which have lost the polar effect of upstream nonsense mutations. All mutations abolishing the function of the site are small deletions; three are identical 28-bp deletions which have arisen independently. A fourth mutation is a deletion of 16 base pairs internal to the larger deletion. Several point mutations within this 16-bp region have no effect on the function of the polarity site. We believe that a small number of polarity sites of this type are responsible for polarity in all genes. The site in the hisG gene is more easily detected than most because it appears to be the only such site in the hisG gene and because it maps in the center of the coding sequence.

  5. The genomic nucleotide sequences of two differentially expressed actin-coding genes from the sea star Pisaster ochraceus.

    PubMed

    Kowbel, D J; Smith, M J

    1989-04-30

    The genomic sequences of two differentially expressed actin genes from the sea star Pisaster ochraceus are reported. The cytoplasmic actin gene (Cy) is expressed in eggs and early development. The muscle actin gene (M) is expressed in tube feet and testes. Both genes contain an 1125-nucleotide coding region interrupted by three introns at codons 41, 121 and 204. Gene M contains two additional introns at codons 150 and 267. The intron position at codon 150, although present in higher vertebrate actins, has not been reported in actin genes from invertebrates. The M gene coding region has 89.5% nucleotide homology to the Cy gene, and differs from the Cy actin gene in 13 of 375 amino acids (aa), 11 of which are found in the C-terminal half of the gene. The C-terminal half of the M gene contains a significant number of muscle isotype codons. Even though there is only 1 aa change in the first 150 codons, there have been limited substitutions at many four-fold degenerate sites which may indicate selection pressure upon the secondary structure of the mRNA and/or a biased codon usage. Variant CCAAT, TATA, and poly(A)-addition signals have been identified in the 5' and 3' flanking regions. The presence of 5' and 3' splice junction sequences in the 5' flanking region of the Cy gene suggests the potential for an intron there.

  6. A unified mathematical framework for coding time, space, and sequences in the hippocampal region.

    PubMed

    Howard, Marc W; MacDonald, Christopher J; Tiganj, Zoran; Shankar, Karthik H; Du, Qian; Hasselmo, Michael E; Eichenbaum, Howard

    2014-03-26

    The medial temporal lobe (MTL) is believed to support episodic memory, vivid recollection of a specific event situated in a particular place at a particular time. There is ample neurophysiological evidence that the MTL computes location in allocentric space and more recent evidence that the MTL also codes for time. Space and time represent a similar computational challenge; both are variables that cannot be simply calculated from the immediately available sensory information. We introduce a simple mathematical framework that computes functions of both spatial location and time as special cases of a more general computation. In this framework, experience unfolding in time is encoded via a set of leaky integrators. These leaky integrators encode the Laplace transform of their input. The information contained in the transform can be recovered using an approximation to the inverse Laplace transform. In the temporal domain, the resulting representation reconstructs the temporal history. By integrating movements, the equations give rise to a representation of the path taken to arrive at the present location. By modulating the transform with information about allocentric velocity, the equations code for position of a landmark. Simulated cells show a close correspondence to neurons observed in various regions for all three cases. In the temporal domain, novel secondary analyses of hippocampal time cells verified several qualitative predictions of the model. An integrated representation of spatiotemporal context can be computed by taking conjunctions of these elemental inputs, leading to a correspondence with conjunctive neural representations observed in dorsal CA1.
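
    As an illustrative sketch only (not the authors' implementation), the leaky-integrator idea in this abstract can be written as dF/dt = -s F(s,t) + f(t) for each decay rate s. Its solution, F(s,t) = ∫ f(t') exp(-s (t - t')) dt', is the Laplace transform of the input history. The function name, the rate values, the step size, and the single-pulse input below are all assumptions chosen for illustration.

```python
import math

def run_leaky_integrators(signal, rates, dt):
    """Advance a bank of leaky integrators, dF/dt = -s*F + f(t), over a sampled input.

    Each step applies the exact exponential decay over dt and then adds the
    input sample as a rectangle of width dt. Returns the final F(s, T) per rate.
    """
    state = [0.0] * len(rates)
    for f_t in signal:
        state = [F * math.exp(-s * dt) + f_t * dt for F, s in zip(state, rates)]
    return state

# Illustrative values (assumptions): three decay rates and a unit-area pulse at t = 0.
dt = 0.01
rates = [0.5, 1.0, 2.0]
signal = [1.0 / dt] + [0.0] * 200   # pulse, then 2.0 time units with no input
final_state = run_leaky_integrators(signal, rates, dt)

# For a unit pulse that occurred `elapsed` time units ago,
# each integrator should hold exp(-s * elapsed).
elapsed = 200 * dt
expected = [math.exp(-s * elapsed) for s in rates]
```

    In this toy case the pattern of values across s is the transform of a delayed pulse, so the elapsed time is in principle recoverable from the bank of integrators, which is the role the abstract assigns to the approximate inverse transform.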

  7. Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles

    PubMed Central

    Rodrigue, Nicolas; Philippe, Hervé; Lartillot, Nicolas

    2010-01-01

    Modeling the interplay between mutation and selection at the molecular level is key to evolutionary studies. To this end, codon-based evolutionary models have been proposed as pertinent means of studying long-range evolutionary patterns and are widely used. However, these approaches have not yet consolidated results from amino acid level phylogenetic studies showing that selection acting on proteins displays strong site-specific effects, which translate into heterogeneous amino acid propensities across the columns of alignments; related codon-level studies have instead focused on either modeling a single selective context for all codon columns, or a separate selective context for each codon column, with the former strategy deemed too simplistic and the latter deemed overparameterized. Here, we integrate recent developments in nonparametric statistical approaches to propose a probabilistic model that accounts for the heterogeneity of amino acid fitness profiles across the coding positions of a gene. We apply the model to a dozen real protein-coding gene alignments and find it to produce biologically plausible inferences, for instance, as pertaining to site-specific amino acid constraints, as well as distributions of scaled selection coefficients. In their account of mutational features as well as the heterogeneous regimes of selection at the amino acid level, the modeling approaches studied here can form a backdrop for several extensions, accounting for other selective features, for variable population size, or for subtleties of mutational features, all with parameterizations couched within population-genetic theory. PMID:20176949

  8. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese

    PubMed Central

    Tang, Clara S.; Zhang, He; Cheung, Chloe Y. Y.; Xu, Ming; Ho, Jenny C. Y.; Zhou, Wei; Cherny, Stacey S.; Zhang, Yan; Holmen, Oddgeir; Au, Ka-Wing; Yu, Haiyi; Xu, Lin; Jia, Jia; Porsch, Robert M.; Sun, Lijie; Xu, Weixian; Zheng, Huiping; Wong, Lai-Yung; Mu, Yiming; Dou, Jingtao; Fong, Carol H. Y.; Wang, Shuyu; Hong, Xueyu; Dong, Liguang; Liao, Yanhua; Wang, Jiansong; Lam, Levina S. M.; Su, Xi; Yan, Hua; Yang, Min-Lee; Chen, Jin; Siu, Chung-Wah; Xie, Gaoqiang; Woo, Yu-Cho; Wu, Yangfeng; Tan, Kathryn C. B.; Hveem, Kristian; Cheung, Bernard M. Y.; Zöllner, Sebastian; Xu, Aimin; Eugene Chen, Y; Jiang, Chao Qiang; Zhang, Youyi; Lam, Tai-Hing; Ganesh, Santhi K.; Huo, Yong; Sham, Pak C.; Lam, Karen S. L.; Willer, Cristen J.; Tse, Hung-Fat; Gao, Wei

    2015-01-01

    Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P < 2.69 × 10⁻⁷), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci (PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser) also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P < 0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD. PMID:26690388

  9. A Unified Mathematical Framework for Coding Time, Space, and Sequences in the Hippocampal Region

    PubMed Central

    MacDonald, Christopher J.; Tiganj, Zoran; Shankar, Karthik H.; Du, Qian; Hasselmo, Michael E.; Eichenbaum, Howard

    2014-01-01

    The medial temporal lobe (MTL) is believed to support episodic memory, vivid recollection of a specific event situated in a particular place at a particular time. There is ample neurophysiological evidence that the MTL computes location in allocentric space and more recent evidence that the MTL also codes for time. Space and time represent a similar computational challenge; both are variables that cannot be simply calculated from the immediately available sensory information. We introduce a simple mathematical framework that computes functions of both spatial location and time as special cases of a more general computation. In this framework, experience unfolding in time is encoded via a set of leaky integrators. These leaky integrators encode the Laplace transform of their input. The information contained in the transform can be recovered using an approximation to the inverse Laplace transform. In the temporal domain, the resulting representation reconstructs the temporal history. By integrating movements, the equations give rise to a representation of the path taken to arrive at the present location. By modulating the transform with information about allocentric velocity, the equations code for position of a landmark. Simulated cells show a close correspondence to neurons observed in various regions for all three cases. In the temporal domain, novel secondary analyses of hippocampal time cells verified several qualitative predictions of the model. An integrated representation of spatiotemporal context can be computed by taking conjunctions of these elemental inputs, leading to a correspondence with conjunctive neural representations observed in dorsal CA1. PMID:24672015

  10. Acoustic radiation force impulse (ARFI) imaging of zebrafish embryo by high-frequency coded excitation sequence.

    PubMed

    Park, Jinhyoung; Lee, Jungwoo; Lau, Sien Ting; Lee, Changyang; Huang, Ying; Lien, Ching-Ling; Kirk Shung, K

    2012-04-01

    Acoustic radiation force impulse (ARFI) imaging has been developed as a non-invasive method for quantitative illustration of tissue stiffness or displacement. Conventional ARFI imaging (2-10 MHz) has been implemented in commercial scanners for illustrating elastic properties of several organs. The image resolution, however, is too coarse to study mechanical properties of micro-sized objects such as cells. This article thus presents a high-frequency coded excitation ARFI technique, with the ultimate goal of displaying elastic characteristics of cellular structures. Tissue mimicking phantoms and zebrafish embryos are imaged with a 100-MHz lithium niobate (LiNbO₃) transducer, by cross-correlating tracked RF echoes with the reference. The phantom results show that the contrast of ARFI image (14 dB) with coded excitation is better than that of the conventional ARFI image (9 dB). The depths of penetration are 2.6 and 2.2 mm, respectively. The stiffness data of the zebrafish demonstrate that the envelope is harder than the embryo region. The temporal displacement change at the embryo and the chorion is as large as 36 and 3.6 μm. Consequently, this high-frequency ARFI approach may serve as a remote palpation imaging tool that reveals viscoelastic properties of small biological samples.

  11. Natural image sequences constrain dynamic receptive fields and imply a sparse code.

    PubMed

    Häusler, Chris; Susemihl, Alex; Nawrot, Martin P

    2013-11-06

    In their natural environment, animals experience a complex and dynamic visual scenery. Under such natural stimulus conditions, neurons in the visual cortex employ a spatially and temporally sparse code. For the input scenario of natural still images, previous work demonstrated that unsupervised feature learning combined with the constraint of sparse coding can predict physiologically measured receptive fields of simple cells in the primary visual cortex. This convincingly indicated that the mammalian visual system is adapted to the natural spatial input statistics. Here, we extend this approach to the time domain in order to predict dynamic receptive fields that can account for both spatial and temporal sparse activation in biological neurons. We rely on temporal restricted Boltzmann machines and suggest a novel temporal autoencoding training procedure. When tested on a dynamic multi-variate benchmark dataset this method outperformed existing models of this class. Learning features on a large dataset of natural movies allowed us to model spatio-temporal receptive fields for single neurons. They resemble temporally smooth transformations of previously obtained static receptive fields and are thus consistent with existing theories. A neuronal spike response model demonstrates how the dynamic receptive field facilitates temporal and population sparseness. We discuss the potential mechanisms and benefits of a spatially and temporally sparse representation of natural visual input.

  12. Composition and phylogenetic analysis of vitellogenin coding sequences in the Indonesian coelacanth Latimeria menadoensis.

    PubMed

    Canapa, Adriana; Olmo, Ettore; Forconi, Mariko; Pallavicini, Alberto; Makapedua, Monica Daisy; Biscotti, Maria Assunta; Barucca, Marco

    2012-07-01

    The coelacanth Latimeria menadoensis, a living fossil, occupies a key phylogenetic position to explore the changes that have affected the genomes of the aquatic vertebrates that colonized dry land. This is the first study to isolate and analyze L. menadoensis mRNA. Three different vitellogenin transcripts were identified and their inferred amino acid sequences compared to those of other known vertebrates. The phylogenetic data suggest that the evolutionary history of this gene family in coelacanths was characterized by a different duplication event than those which occurred in teleosts, amniotes, and amphibia. Comparison of the three sequences highlighted differences in functional sites. Moreover, despite the presence of conserved sites compared with the other oviparous vertebrates, some sites were seen to have changed, others to be similar only to those of teleosts, and others still to resemble only those of tetrapods.

  13. DNA sequencing and bar-coding using solid-state nanopores.

    PubMed

    Atas, Evrim; Singer, Alon; Meller, Amit

    2012-12-01

    Nanopores have emerged as a prominent single-molecule analytic tool with particular promise for genomic applications. In this review, we discuss two potential applications of the nanopore sensors: First, we present a nanopore-based single-molecule DNA sequencing method that utilizes optical detection for massively parallel throughput. Second, we describe a method by which nanopores can be used as single-molecule genotyping tools. For DNA sequencing, the distinction among the four types of DNA nucleobases is achieved by employing a biochemical procedure for DNA expansion. In this approach, each nucleobase in each DNA strand is converted into one of four predefined unique 16-mers in a process that preserves the nucleobase sequence. The resulting converted strands are then hybridized to a library of four molecular beacons, each carrying a unique fluorophore tag, that are perfect complements to the 16-mers used for conversion. Solid-state nanopores are then used to sequentially remove these beacons, one after the other, leading to a series of photon bursts in four colors that can be optically detected. Single-molecule genotyping is achieved by tagging the DNA fragments with γ-modified synthetic peptide nucleic acid probes coupled to an electronic characterization of the complexes using solid-state nanopores. This method can be used to identify and differentiate genes with a high level of sequence similarity at the single-molecule level, but different pathology or response to treatment. We will illustrate this method by differentiating the pol gene for two highly similar human immunodeficiency virus subtypes, paving the way for a novel diagnostics platform for viral classification. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
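
    A toy sketch of the conversion-and-readout logic described in this abstract, under clearly stated assumptions: the four 16-mers and the color names below are invented placeholders, not the sequences or fluorophores used by the authors, and the sequential removal of beacons at the nanopore is replaced by a simple left-to-right scan in blocks of 16.

```python
# Hypothetical 16-mers, one per nucleobase (placeholders, not real probe designs).
BASE_TO_16MER = {
    "A": "AAAACCCCGGGGTTTT",
    "C": "CCCCAAAATTTTGGGG",
    "G": "GGGGTTTTAAAACCCC",
    "T": "TTTTGGGGCCCCAAAA",
}

# Hypothetical color of the beacon that pairs with each base's 16-mer.
BASE_TO_COLOR = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}

MER_TO_COLOR = {BASE_TO_16MER[b]: BASE_TO_COLOR[b] for b in BASE_TO_16MER}
COLOR_TO_BASE = {color: base for base, color in BASE_TO_COLOR.items()}

def expand(sequence):
    """Replace every base with its 16-mer, preserving the base order."""
    return "".join(BASE_TO_16MER[base] for base in sequence)

def read_colors(expanded):
    """Scan the expanded strand in 16-nt blocks and report one color per block."""
    return [MER_TO_COLOR[expanded[i:i + 16]] for i in range(0, len(expanded), 16)]

def decode(colors):
    """Map the observed color series back to the original base sequence."""
    return "".join(COLOR_TO_BASE[color] for color in colors)

original = "GATTACA"
expanded = expand(original)
colors = read_colors(expanded)
recovered = decode(colors)
```

    Because each base maps to exactly one 16-mer and each 16-mer to exactly one color, the color series carries the same information as the original sequence, which is the property the abstract relies on for optical readout.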

  14. A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate Iroquois cluster gene deserts

    PubMed Central

    de la Calle-Mustienes, Elisa; Feijóo, Cármen Gloria; Manzanares, Miguel; Tena, Juan J.; Rodríguez-Seguel, Elisa; Letizia, Annalisa; Allende, Miguel L.; Gómez-Skarmeta, José Luis

    2005-01-01

    Recent studies of the genome architecture of vertebrates have uncovered two unforeseen aspects of its organization. First, large regions of the genome, called gene deserts, are devoid of protein-coding sequences and have no obvious biological role. Second, comparative genomics has highlighted the existence of an array of highly conserved non-coding regions (HCNRs) in all vertebrates. Most surprisingly, these structural features are strongly associated with genes that have essential functions during development. Among these, the vertebrate Iroquois (Irx) genes stand out on both fronts. Mammalian Irx genes are organized in two clusters (IrxA and IrxB) that span >1 Mb each with no other genes interspersed. Additionally, a large number of HCNRs exist within Irx clusters. We have systematically examined the enhancer activity of HCNRs from the IrxB cluster using transgenic Xenopus and zebrafish embryos. Most of these HCNRs are active in subdomains of endogenous Irx expression, and some are candidates to contain shared enhancers of neighboring genes, which could explain the evolutionary conservation of Irx clusters. Furthermore, HCNRs present in tetrapod IrxB but not in fish may be responsible for novel Irx expression domains that appeared after their divergence. Finally, we have performed a more detailed analysis on two IrxB ultraconserved non-coding regions (UCRs) duplicated in IrxA clusters in similar relative positions. These four regions share a core region highly conserved among all of them and drive expression in similar domains. However, inter-species conserved sequences surrounding the core, specific for each of these UCRs, are able to modulate their expression. PMID:16024824

  15. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences

    PubMed Central

    Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

    2016-01-01

    Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. PMID:27289096

  16. Drosophila melanogaster paramyosin: developmental pattern, mapping and properties deduced from its complete coding sequence.

    PubMed

    Vinós, J; Maroto, M; Garesse, R; Marco, R; Cervera, M

    1992-02-01

    Several cDNA clones encoding the complete Drosophila paramyosin sequence, including two potential polyadenylation sites, have been obtained. Southern analysis and in situ hybridization to polytene chromosomes indicate that in Drosophila the paramyosin gene is single copy, located on the left arm of the third chromosome at region 66D14. Northern analyses show predominantly two different RNAs which are the products of the choice between the two alternative polyadenylation sites. The two species begin to be synthesized around 10 h of development when embryonic muscles are formed, expression peaking at the end of embryogenesis. The protein is first expressed at germ band shortening in association with muscle precursor cells. A second maximum of paramyosin RNA expression occurs at late pupal stages when the higher molecular weight form becomes more abundant. In young adults this species becomes the main transcript detected. The 102 kDa polypeptide sequence is highly similar to that of Caenorhabditis elegans paramyosin. The protein has a central alpha-helical coiled-coil rod, organized in 29 groups of four typical seven-residue repeats and flanked by two short non-alpha-helical regions. Several leucine zippers are located on the hydrophobic face of the alpha-helix in paramyosin which, together with disulfide bonds between cysteines, are probably involved in the stabilization of the dimer. The structural and functional properties of Drosophila paramyosin deduced from the sequence are compared with those of known invertebrate myosins and paramyosins.

  17. Rigid body motion analysis system for off-line processing of time-coded video sequences

    NASA Astrophysics Data System (ADS)

    Snow, Walter L.; Shortis, Mark R.

    1995-09-01

    Photogrammetry affords the only noncontact means of providing unambiguous six-degree-of-freedom estimates for rigid body motion analysis. Video technology enables convenient off-the-shelf capability for obtaining and storing image data at frame (30 Hz) or field (60 Hz) rates. Videometry combines these technologies with frame capture capability accessible to PCs to allow unavailable measurements critical to the study of rigid body dynamics. To effectively utilize this capability, however, some means of editing, post processing, and sorting substantial amounts of time coded video data is required. This paper discusses a prototype motion analysis system built around PC and video disk technology, which is proving useful in exploring applications of these concepts to rigid body tracking and deformation analysis. Calibration issues and user interactive software development associated with this project will be discussed, as will examples of measurement projects and data reduction.

  18. Oxytocin receptor gene sequences in owl monkeys and other primates show remarkable interspecific regulatory and protein coding variation.

    PubMed

    Babb, Paul L; Fernandez-Duque, Eduardo; Schurr, Theodore G

    2015-10-01

    The oxytocin (OT) hormone pathway is involved in numerous physiological processes, and one of its receptor genes (OXTR) has been implicated in pair bonding behavior in mammalian lineages. This observation is important for understanding social monogamy in primates, which occurs in only a small subset of taxa, including Azara's owl monkey (Aotus azarae). To examine the potential relationship between social monogamy and OXTR variation, we sequenced its 5' regulatory (4936bp) and coding (1167bp) regions in 25 owl monkeys from the Argentinean Gran Chaco, and examined OXTR sequences from 1092 humans from the 1000 Genomes Project. We also assessed interspecific variation of OXTR in 25 primate and rodent species that represent a set of phylogenetically and behaviorally disparate taxa. Our analysis revealed substantial variation in the putative 5' regulatory region of OXTR, with marked structural differences across primate taxa, particularly for humans and chimpanzees, which exhibited unique patterns of large motifs of dinucleotide A+T repeats upstream of the OXTR 5' UTR. In addition, we observed a large number of amino acid substitutions in the OXTR CDS region among New World primate taxa that distinguish them from Old World primates. Furthermore, primate taxa traditionally defined as socially monogamous (e.g., gibbons, owl monkeys, titi monkeys, and saki monkeys) all exhibited different amino acid motifs for their respective OXTR protein coding sequences. These findings support the notion that monogamy has evolved independently in Old World and New World primates, and that it has done so through different molecular mechanisms, not exclusively through the oxytocin pathway. Copyright © 2015 Elsevier Inc. All rights reserved.

  19. Complete Coding Sequences of One H9 and Three H7 Low-Pathogenic Influenza Viruses Circulating in Wild Birds in Belgium, 2009 to 2012

    PubMed Central

    Rosseel, Toon; Marché, Sylvie; Steensels, Mieke; Vangeluwe, Didier; Linden, Annick; van den Berg, Thierry; Lambrecht, Bénédicte

    2016-01-01

    The complete coding sequences of four avian influenza A viruses (two H7N7, one H7N1, and one H9N2) circulating in wild waterfowl in Belgium from 2009 to 2012 were determined using Illumina sequencing. All viral genome segments represent viruses circulating in the Eurasian wild bird population. PMID:27284153

  20. Functional consequences of B-repeat sequence variation in the staphylococcal biofilm protein Aap: deciphering the assembly code.

    PubMed

    Shelton, Catherine L; Conrady, Deborah G; Herr, Andrew B

    2017-02-01

    Staphylococcus epidermidis is an opportunistic pathogen that can form robust biofilms that render the bacteria resistant to antibiotic action and immune responses. Intercellular adhesion in S. epidermidis biofilms is mediated by the cell wall-associated accumulation-associated protein (Aap), via zinc-mediated self-assembly of its B-repeat region. This region contains up to 17 nearly identical sequence repeats, with each repeat assumed to be functionally equivalent. However, Aap B-repeats exist as two subtypes, defined by a cluster of consensus or variant amino acids. These variable residues are positioned near the zinc-binding (and dimerization) site and the stability determinant for the B-repeat fold. We have characterized four B-repeat constructs to assess the functional relevance of the two Aap B-repeat subtypes. Analytical ultracentrifugation experiments demonstrated that constructs with the variant sequence show reduced or absent Zn(2+)-induced dimerization. Likewise, circular dichroism thermal denaturation experiments showed that the variant sequence could significantly stabilize the fold, depending on its location within the construct. Crystal structures of three of the constructs revealed that the side chains from the variant sequence form an extensive bonding network that can stabilize the fold. Furthermore, altered distribution of charged residues between consensus and variant sequences changes the electrostatic potential in the vicinity of the Zn(2+)-binding site, providing a mechanistic explanation for the loss of zinc-induced dimerization in the variant constructs. These data suggest an assembly code that defines preferred oligomerization modes of the B-repeat region of Aap and a slip-grip model for initial contact followed by firm intercellular adhesion during biofilm formation. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.

  1. Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences.

    PubMed

    Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya

    2016-07-12

    Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. Utility of selected non-coding chloroplast DNA sequences for lineage assessment of Musa interspecific hybrids.

    PubMed

    Swangpol, Sasivimon; Volkaert, Hugo; Sotto, Rachel C; Seelanan, Tosak

    2007-07-31

    Single-copy chloroplast loci are used widely to infer phylogenetic relationships at different taxonomic levels among various groups of plants. To test the utility of chloroplast loci and to provide additional data applicable to hybrid evolution in Musa, we sequenced two introns, rpl16 and ndhA, and two intergenic spacers, psaA-ycf3 and petA-psbJ-psbL-psbF, and combined these data. Using these four regions, Musa acuminata Colla (A)- and M. balbisiana Colla (B)-containing genomes were clearly distinguished. Some triploid interspecific hybrids contain A-type chloroplasts (the AAB/ABB) while others contain B-type chloroplasts (the BBA/BBB). The chloroplasts of all cultivars in the 'Namwa' (BBA) group came from the same wild maternal origin, but the specific parents are still unrevealed. Although average sequence divergences in each region were small (less than 2%), we propose that the petA-psbJ intergenic spacer could be developed for diversity assessment within each genome. This segment contains three single nucleotide polymorphisms (SNPs) and two indels which could distinguish diversity within the A genome, whereas this same region also contains one SNP and an indel which could categorize the B genome. However, an inverted repeat region which could form a hairpin structure was detected in this spacer and thus was omitted from the analyses due to its incongruence with other regions. Until it is thoroughly identified in other members of the Musaceae and the Zingiberales clade, caution is advised in using this inverted repeat as a phylogenetic marker in these taxa.

  3. Characterization of genomic sequence coding for bromelain inhibitors in pineapple and expression of its recombinant isoform.

    PubMed

    Sawano, Yoriko; Muramatsu, Tomonari; Hatano, Ken-ichi; Nagata, Koji; Tanokura, Masaru

    2002-08-02

    Bromelain inhibitor (BI) is a cysteine proteinase inhibitor isolated from pineapple stem (Reddy, M. N., Keim, P. S., Heinrikson, R. L., and Kézdy, F. J. (1975) J. Biol. Chem. 250, 1741-1750). It consists of eight isoinhibitors, and each isoinhibitor has a two-chain structure. In this study, the genomic DNA has been cloned and found to encode a precursor protein with 246 amino acids (M(r) = approximately 27,500) containing three isoinhibitor domains (BI-III, -VI, and -VII) that are 93% identical to one another in amino acid sequences. The gene structure indicated that these isoinhibitors are produced by removal of the N-terminal pre-peptide (19 residues), 3 interchain peptides (each 5 residues), 2 interdomain peptides (each 19 residues), and the C-terminal pro-peptide (18 residues). Moreover, all the amino acid sequences of bromelain isoinhibitors could be explained by removal of one or two amino acids from BI-III, -VI, and -VII with exopeptidases. A recombinant single-chain BI-VI that included the interchain peptide showed the same bromelain inhibitory activity as the native BI-VI, whereas the form without the interchain peptide showed no inhibitory activity. These results indicate that the interchain peptide plays an important role in the folding process of the mature isoinhibitors.

  4. Phylogeny, character evolution, and biogeography of Cuscuta (dodders; Convolvulaceae) inferred from coding plastid and nuclear sequences.

    PubMed

    García, Miguel A; Costea, Mihai; Kuzmina, Maria; Stefanović, Saša

    2014-04-01

    The parasitic genus Cuscuta, containing some 200 species circumscribed traditionally in three subgenera, is nearly cosmopolitan, occurring in a wide range of habitats and hosts. Previous molecular studies, on subgenera Grammica and Cuscuta, delimited major clades within these groups. However, the sequences used were unalignable among subgenera, preventing the phylogenetic comparison across the genus. We conducted a broad phylogenetic study using rbcL and nrLSU sequences covering the morphological, physiological, and geographical diversity of Cuscuta. We used parsimony methods to reconstruct ancestral states for taxonomically important characters. Biogeographical inferences were obtained using statistical and Bayesian approaches. Four well-supported major clades are resolved. Two of them correspond to subgenera Monogynella and Grammica. Subgenus Cuscuta is paraphyletic, with section Pachystigma sister to subgenus Grammica. Previously described cases of strongly supported discordance between plastid and nuclear phylogenies, interpreted as reticulation events, are confirmed here and three new cases are detected. Dehiscent fruits and globose stigmas are inferred as ancestral character states, whereas the ancestral style number is ambiguous. Biogeographical reconstructions suggest an Old World origin for the genus and subsequent spread to the Americas as a consequence of one long-distance dispersal. Hybridization may play an important yet underestimated role in the evolution of Cuscuta. Our results disagree with scenarios of evolution (polarity) previously proposed for several taxonomically important morphological characters, and with their usage and significance. While several cases of long-distance dispersal are inferred, vicariance or dispersal to adjacent areas emerges as the dominant biogeographical pattern.

  5. Flanking sequence specificity determines coding microsatellite heteroduplex and mutation rates with defective DNA mismatch repair (MMR).

    PubMed

    Chung, H; Lopez, C G; Young, D J; Lai, J F; Holmstrom, J; Ream-Robinson, D; Cabrera, B L; Carethers, J M

    2010-04-15

    The activin type II receptor (ACVR2) contains two identical microsatellites in exons 3 and 10, but only the exon 10 microsatellite is frameshifted in mismatch repair (MMR)-defective colonic tumors. The reason for this selectivity is not known. We hypothesized that ACVR2 frameshifts were influenced by DNA sequences surrounding the microsatellite. We constructed plasmids in which exons 3 or 10 of ACVR2 were cloned +1 bp out of frame of enhanced green fluorescent protein (EGFP), allowing -1 bp frameshift to express EGFP. Plasmids were stably transfected into MMR-deficient cells, and subsequent non-fluorescent cells were sorted, cultured and harvested for mutation analysis. We swapped DNA sequences flanking the exon 3 and 10 microsatellites to test our hypothesis. Native ACVR2 exon 3 and 10 microsatellites underwent heteroduplex formation (A(7)/T(8)) in hMLH1(-/-) cells, but only exon 10 microsatellites fully mutated (A(7)/T(7)) in both hMLH1(-/-) and hMSH6(-/-) backgrounds, showing selectivity for exon 10 frameshifts and inability of exon 3 heteroduplexes to fully mutate. Substituting nucleotides flanking the exon 3 microsatellite for nucleotides flanking the exon 10 microsatellite significantly reduced heteroduplex and full mutation in hMLH1(-/-) cells. When the exon 3 microsatellite was flanked by nucleotides normally surrounding the exon 10 microsatellite, fully mutant exon 3 frameshifts appeared. Mutation selectivity for ACVR2 lies partly with flanking nucleotides surrounding each microsatellite.

  6. Porcine PPARGC1A (peroxisome proliferative activated receptor gamma coactivator 1A): coding sequence, genomic organization, polymorphisms and mapping.

    PubMed

    Jacobs, K; Rohrer, G; Van Poucke, M; Piumi, F; Yerle, M; Barthenschlager, H; Mattheeuws, M; Van Zeveren, A; Peelman, L J

    2006-01-01

    We report here the characterisation of porcine PPARGC1A. Primers based on human PPARGC1A were used to isolate two porcine BAC clones. Porcine coding sequences of PPARGC1A were sequenced together with the splice site regions and the 5' and 3' regions. Using direct sequencing nine SNPs were found. Allele frequencies were determined in unrelated animals of five different pig breeds. In the MARC Meishan-White Composite resource population, the polymorphism in exon 9 was significantly associated with leaf fat weight. PPARGC1A has been mapped by FISH to SSC8p21. A (CA)n microsatellite (SGU0001) has been localised near marker SWR1101 on chromosome 8 by RH mapping and at the same position as marker KS195 (32.5 cM) by linkage mapping. The AseI (nt857, Asn/Asn489) polymorphism in exon 8 was used to perform linkage analysis in the Hohenheim pedigrees and located the gene in the same genomic region. Transcription of the gene was detected in adipose, muscle, kidney, liver, brain, heart and adrenal gland tissues, which is in agreement with the function of PPARGC1A in adaptive thermogenesis. Copyright 2006 S. Karger AG, Basel.

  7. Cloning, sequencing, and expression of the gene coding for an antigenic 120-kilodalton protein of Rickettsia conorii.

    PubMed Central

    Schuenke, K W; Walker, D H

    1994-01-01

    Several high-molecular-mass (above 100 kDa) antigens are recognized by sera from humans infected with spotted fever group rickettsiae and may be important stimulators of the host immune response. Molecular cloning techniques were used to make genomic Rickettsia conorii (Malish 7 strain) libraries in expression vector lambda gt11. The 120-kDa R. conorii antigen was identified by monospecific antibodies to the recombinant protein expressed on construct lambda 4-7. The entire gene DNA sequence was obtained by using this construct and two other overlapping constructs. An open reading frame of 3,068 bp with a calculated molecular mass of approximately 112 kDa was identified. Promoters and a ribosome-binding site were identified on the basis of their DNA sequence homology to other rickettsial genes and their relative positions in the sequence. The DNA coding region shares no significant homology with other spotted fever group rickettsial antigen genes (i.e., the R. rickettsii 190-, 135-, and 17-kDa antigen-encoding genes). The PCR technique was used to amplify the gene from eight species of spotted fever group rickettsiae. A 75-kDa portion of the 120-kDa antigen was overexpressed in and purified from Escherichia coli. This polypeptide was recognized by antirickettsial antibodies and may be a useful diagnostic reagent for spotted fever group rickettsioses. PMID:8112862

  8. Identification of large intergenic non-coding RNAs in bovine muscle using next-generation transcriptomic sequencing.

    PubMed

    Billerey, Coline; Boussaha, Mekki; Esquerré, Diane; Rebours, Emmanuelle; Djari, Anis; Meersseman, Cédric; Klopp, Christophe; Gautheret, Daniel; Rocha, Dominique

    2014-06-19

    The advent of large-scale gene expression technologies has helped to reveal the existence, in eukaryotic cells, of thousands of non-coding transcripts whose function and significance remain poorly understood. Among these non-coding transcripts, long non-coding RNAs (lncRNAs) are the least well-studied but are emerging as key regulators of diverse cellular processes. In the present study, we performed a survey in bovine Longissimus thoraci of lincRNAs (long intergenic non-coding RNAs not overlapping protein-coding transcripts). To our knowledge, this represents the first such study in bovine muscle. To identify lincRNAs, we used paired-end RNA sequencing (RNA-Seq) to explore the transcriptomes of Longissimus thoraci from nine Limousin bull calves. Approximately 14-45 million paired-end reads were obtained per library. A total of 30,548 different transcripts were identified. Using a computational pipeline, we defined a stringent set of 584 different lincRNAs with 418 lincRNAs found in all nine muscle samples. Bovine lincRNAs share characteristics seen in their mammalian counterparts: relatively short transcript and gene lengths, low exon number and significantly lower expression, compared to protein-encoding genes. Because our study identified, for the first time, lincRNAs in nine different samples of the same tissue, it is possible to analyse the inter-individual variability of the gene expression level of the identified lincRNAs. Interestingly, there was a significant difference when we compared the expression variation of the 418 lincRNAs with the 10,775 known selected protein-encoding genes found in all muscle samples. In addition, we found 2,083 pairs of lincRNA/protein-encoding genes showing a highly significant correlated expression. Fourteen lincRNAs were selected and 13 were validated by RT-PCR. Some of the lincRNAs expressed in muscle are located within quantitative trait loci for meat quality traits. Our study provides a glimpse into the linc

  9. Deconstruction of archaeal genome depict strategic consensus in core pathways coding sequence assembly.

    PubMed

    Pal, Ayon; Banerjee, Rachana; Mondal, Uttam K; Mukhopadhyay, Subhasis; Bothra, Asim K

    2015-01-01

    A comprehensive in silico analysis of 71 species representing the different taxonomic classes and physiological genre of the domain Archaea was performed. These organisms differed in their physiological attributes, particularly oxygen tolerance and energy metabolism. We explored the diversity and similarity in the codon usage pattern in the genes and genomes of these organisms, emphasizing on their core cellular pathways. Our thrust was to figure out whether there is any underlying similarity in the design of core pathways within these organisms. Analyses of codon utilization pattern, construction of hierarchical linear models of codon usage, expression pattern and codon pair preference pointed to the fact that, in the archaea there is a trend towards biased use of synonymous codons in the core cellular pathways and the Nc-plots appeared to display the physiological variations present within the different species. Our analyses revealed that aerobic species of archaea possessed a larger degree of freedom in regulating expression levels than could be accounted for by codon usage bias alone. This feature might be a consequence of their enhanced metabolic activities as a result of their adaptation to the relatively O2-rich environment. Species of archaea, which are related from the taxonomical viewpoint, were found to have striking similarities in their ORF structuring pattern. In the anaerobic species of archaea, codon bias was found to be a major determinant of gene expression. We have also detected a significant difference in the codon pair usage pattern between the whole genome and the genes related to vital cellular pathways, and it was not only species-specific but pathway specific too. This hints towards the structuring of ORFs with better decoding accuracy during translation. Finally, a codon-pathway interaction in shaping the codon design of pathways was observed where the transcription pathway exhibited a significantly different coding frequency signature.
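
    The abstract above refers to biased use of synonymous codons but does not list every statistic the authors computed. As a rough illustration of how such bias can be quantified, the sketch below computes relative synonymous codon usage (RSCU), a widely used measure that is not named in the abstract. The function name `rscu`, the use of the standard genetic code, and the toy sequence are assumptions made for this example only.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (assumed here; '*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return RSCU values for every codon whose amino acid occurs in cds.

    RSCU = observed count / mean count over the synonymous codon family,
    so 1.0 means no bias and values above 1.0 mean a preferred codon.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa in set(AMINO):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical toy sequence: Ala x4 (GCT x3, GCA x1) and Lys x2 (AAA, AAG).
toy_rscu = rscu("GCTGCTGCTGCAAAAAAG")
```

    In the toy sequence, GCT is used three times out of four alanine codons, so its RSCU is 3.0, while the two lysine codons are used equally and each score 1.0.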

  10. Deconstruction of Archaeal Genome Depict Strategic Consensus in Core Pathways Coding Sequence Assembly

    PubMed Central

    Pal, Ayon; Banerjee, Rachana; Mondal, Uttam K.; Mukhopadhyay, Subhasis; Bothra, Asim K.

    2015-01-01

    A comprehensive in silico analysis of 71 species representing the different taxonomic classes and physiological genre of the domain Archaea was performed. These organisms differed in their physiological attributes, particularly oxygen tolerance and energy metabolism. We explored the diversity and similarity in the codon usage pattern in the genes and genomes of these organisms, emphasizing on their core cellular pathways. Our thrust was to figure out whether there is any underlying similarity in the design of core pathways within these organisms. Analyses of codon utilization pattern, construction of hierarchical linear models of codon usage, expression pattern and codon pair preference pointed to the fact that, in the archaea there is a trend towards biased use of synonymous codons in the core cellular pathways and the Nc-plots appeared to display the physiological variations present within the different species. Our analyses revealed that aerobic species of archaea possessed a larger degree of freedom in regulating expression levels than could be accounted for by codon usage bias alone. This feature might be a consequence of their enhanced metabolic activities as a result of their adaptation to the relatively O2-rich environment. Species of archaea, which are related from the taxonomical viewpoint, were found to have striking similarities in their ORF structuring pattern. In the anaerobic species of archaea, codon bias was found to be a major determinant of gene expression. We have also detected a significant difference in the codon pair usage pattern between the whole genome and the genes related to vital cellular pathways, and it was not only species-specific but pathway specific too. This hints towards the structuring of ORFs with better decoding accuracy during translation. Finally, a codon-pathway interaction in shaping the codon design of pathways was observed where the transcription pathway exhibited a significantly different coding frequency signature.

  11. Deletion of 5'-coding sequences of the cellular p53 gene in mouse erythroleukemia: a novel mechanism of oncogene regulation.

    PubMed Central

    Rovinski, B; Munroe, D; Peacock, J; Mowat, M; Bernstein, A; Benchimol, S

    1987-01-01

    The p53 gene is rearranged in an erythroleukemic cell line (DP15-2) transformed by Friend retrovirus. Here, we characterize the mutation and identify a deletion of approximately 3.0 kilobases that removes exon 2 coding sequences. The gene is expressed in DP15-2 cells and results in synthesis of a 44,000-dalton protein that is missing the N-terminal amino acid residues of p53. The truncated protein is unusually stable and accumulates to high levels intracellularly. Moreover, it appears to have undergone a change in conformation as revealed by epitope mapping studies. This study represents the first description of an altered p53 gene product arising by mutation during neoplastic progression and identifies a region in the p53 protein molecule that plays a role in determining p53 stability in vivo. PMID:3547084

  12. An Interpretation of the Ancestral Codon from Miller’s Amino Acids and Nucleotide Correlations in Modern Coding Sequences

    PubMed Central

    Carels, Nicolas; de Leon, Miguel Ponce

    2015-01-01

    Purine bias, which is usually referred to as an “ancestral codon”, is known to result in short-range correlations between nucleotides in coding sequences, and it is common in all species. We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller’s spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins. PMID:25922573
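
    For readers unfamiliar with the notation, RNY and RWY are codon patterns written in IUPAC nucleotide codes (R = A or G, W = A or T, Y = C or T, N = any base). The sketch below only shows how a coding sequence can be scored against such a pattern; it is not the authors' correlation analysis, and the helper names and the toy sequence are made up for illustration.

```python
# IUPAC symbols needed for the two patterns discussed in the abstract.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "W": "AT",    # weak (two hydrogen bonds)
    "N": "ACGT",  # any base
}


def split_codons(seq):
    """Split a sequence into in-frame codons, dropping a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def matches(codon, pattern):
    """True if each base of the codon is allowed by the pattern symbol at that position."""
    return all(base in IUPAC[symbol] for base, symbol in zip(codon, pattern))


def pattern_fraction(seq, pattern):
    """Fraction of in-frame codons that match a three-letter IUPAC pattern."""
    codons = split_codons(seq)
    if not codons:
        return 0.0
    return sum(matches(c, pattern) for c in codons) / len(codons)


# Hypothetical toy sequence: GAT GCC AAG GTT ACT GAC
toy = "GATGCCAAGGTTACTGAC"
rny = pattern_fraction(toy, "RNY")  # 5 of 6 codons match
rwy = pattern_fraction(toy, "RWY")  # 3 of 6 codons match
```

    Because W is a subset of N, every RWY codon is also an RNY codon, so the RWY fraction can never exceed the RNY fraction for the same sequence.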

  13. Phenotypes of murine leukemia virus-induced tumors: influence of 3' viral coding sequences.

    PubMed Central

    Ott, D E; Keller, J; Sill, K; Rein, A

    1992-01-01

    Murine leukemia viruses (MuLVs) induce leukemias and lymphomas in mice. We have used fluorescence-activated cell sorter analysis to determine the hematopoietic phenotypes of tumor cells induced by a number of MuLVs. Tumor cells induced by ecotropic Moloney, amphotropic 4070A, and 10A1 MuLVs and by two chimeric MuLVs, Mo(4070A) and Mo(10A1), were examined with antibodies to 13 lineage-specific cell surface markers found on myeloid cell, T-cell, and B-cell lineages. The chimeric Mo(4070A) and Mo(10A1) MuLVs, consisting of Moloney MuLV with the carboxy half of the Pol region and nearly all of the Env region of 4070A and 10A1, respectively, were constructed to examine the possible influence of these sequences on Moloney MuLV-induced tumor cell phenotypes. In some instances, these phenotypic analyses were supplemented by Southern blot analysis for lymphoid cell-specific genomic DNA rearrangements at the immunoglobulin heavy-chain, the T-cell receptor gamma, and the T-cell receptor beta loci. The results of our analysis showed that Moloney MuLV, 4070A, Mo(4070A), and Mo(10A1) induced mostly T-cell tumors. Moloney MuLV and Mo(4070A) induced a wide variety of T-cell phenotypes, ranging from immature to mature phenotypes, while 4070A induced mostly prothymocyte and double-negative (CD4- CD8-) T-cell tumors. The tumor phenotypes obtained with 10A1 and Mo(10A1) were each less variable than those obtained with the other MuLVs tested. 10A1 uniformly induced a tumor consisting of lineage marker-negative cells that lack lymphoid cell-specific DNA rearrangements and histologically appear to be early undifferentiated erythroid cell-like precursors. The Mo(10A1) chimera consistently induced an intermediate T-cell tumor. The chimeric constructions demonstrated that while 4070A 3' pol and env sequences apparently did not influence the observed tumor cell phenotypes, the 10A1 half of pol and env had a strong effect on the phenotypes induced by Mo(10A1) that resulted in a phenotypic

  14. A base-sequence-modulated Golay code improves the excitation and measurement of ultrasonic guided waves in long bones.

    PubMed

    Song, Xiaojun; Ta, Dean; Wang, Weiqi

    2012-11-01

    Researchers are interested in using ultrasonic guided waves (GWs) to assess long bones. However, GWs suffer high attenuation when they propagate in long bones, resulting in a low SNR. To overcome this limitation, this paper introduces a base-sequence-modulated Golay code (BSGC) to produce larger amplitude and improve the SNR in the ultrasound evaluation of long bones. A 16-bit Golay code was used for excitation in computer simulation. The decoded GWs and the traditional GWs, which were generated by a single pulse, agreed well after decoding the received signals, and the SNR was improved by 26.12 dB. In the experiments using bovine bones, the BSGC excitation produced the amplitudes which were at least 237 times greater than those produced by a single pulse excitation. The BSGC excitation also allowed the GWs to be received over a longer distance between two transducers. The results suggest the BSGC excitation has the potential to measure GWs and assess long bones.
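
    The abstract does not describe how the base-sequence modulation is applied, so the sketch below illustrates only the generic property that makes Golay coded excitation useful: for a complementary pair, the two autocorrelations sum to a single peak with no range sidelobes. The recursive pair construction and the function names are assumptions chosen for illustration, not the authors' BSGC implementation.

```python
import numpy as np


def golay_pair(n_bits):
    """Build a complementary Golay pair of +1/-1 chips; n_bits must be a power of two."""
    if n_bits < 1 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of two")
    a = np.array([1])
    b = np.array([1])
    while len(a) < n_bits:
        # Standard doubling step: (a, b) -> (a|b, a|-b) keeps the pair complementary.
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b


def summed_autocorrelation(a, b):
    """Sum of the full autocorrelations of the two codes of a pair."""
    return np.correlate(a, a, mode="full") + np.correlate(b, b, mode="full")


code_a, code_b = golay_pair(16)
summed = summed_autocorrelation(code_a, code_b)
peak_index = len(code_a) - 1  # zero-lag position in the 'full' output
```

    For the 16-chip pair, the summed autocorrelation is zero at every lag except zero lag, where it equals 32 (twice the code length). In a pulse-echo setting this is why decoding can recover a single-pulse-like response while transmitting more energy.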

  15. Molecular systematics of the Brassicaceae: evidence from coding plastidic matK and nuclear Chs sequences.

    PubMed

    Koch, M; Haubold, B; Mitchell-Olds, T

    2001-03-01

    Phylogenetic relationships were inferred using nucleotide sequence variation of the nuclear-encoded chalcone synthase gene (Chs) and the chloroplast gene matK for members of five tribes from the family Brassicaceae to analyze tribal and subtribal structures. Phylogenetic trees from individual data sets are mostly in congruence with the results from a combined matK-Chs analysis with a total of 2721 base pairs, but with greater resolution and higher statistical support for deeper branching patterns. The analysis indicates that tribes Lepidieae, Arabideae, and Sisymbrieae are not monophyletic. Among taxa under study four different lineages each were detected in tribes Arabideae and Lepidieae, interspersed with taxa from tribes Sisymbrieae, Hesperideae, and Brassiceae. It is concluded that tribe Brassiceae might be the only monophyletic group of the traditional tribes. From our data we estimated several divergence times for different lineages among cruciferous plants: 5.8 mya (million years ago) for the Arabidopsis-Cardaminopsis split, 20 mya for the Brassica-Arabidopsis split, and ∼40 mya for the age of the deepest split between the most basal crucifer Aethionema and remaining cruciferous taxa.

  16. Variations in opsin coding sequences cause x-linked cone dysfunction syndrome with myopia and dichromacy.

    PubMed

    McClements, Michelle; Davies, Wayne I L; Michaelides, Michel; Young, Terri; Neitz, Maureen; MacLaren, Robert E; Moore, Anthony T; Hunt, David M

    2013-02-15

    To determine the role of variant L opsin haplotypes in seven families with Bornholm Eye Disease (BED), a cone dysfunction syndrome with dichromacy and myopia. Analysis of the opsin genes within the L/M opsin array at Xq28 included cloning and sequencing of an exon 3-5 gene fragment, long range PCR to establish gene order, and quantitative PCR to establish gene copy number. In vitro expression of normal and variant opsins was performed to examine cellular trafficking and spectral sensitivity of pigments. All except one of the BED families possessed L opsin genes that contained a rare exon 3 haplotype. The exception was a family with the deleterious Cys203Arg substitution. Two rare exon 3 haplotypes were found and, where determined, these variant opsin genes were in the first position in the array. In vitro expression in transfected cultured neuronal cells showed that the variant opsins formed functional pigments, which trafficked to the cell membranes. The variant opsins were, however, less stable than wild type. It is concluded that the variant L opsin haplotypes underlie BED. The reduction in the amount of variant opsin produced in vitro compared with wild type indicates a possible disease mechanism. Alternatively, the recently identified defective splicing of exon 3 of the variant opsin transcript may be involved. Both mechanisms explain the presence of dichromacy and cone dystrophy. Abnormal pigment may also underlie the myopia that is invariably present in BED subjects.

  17. Variations in Opsin Coding Sequences Cause X-Linked Cone Dysfunction Syndrome with Myopia and Dichromacy

    PubMed Central

    McClements, Michelle; Davies, Wayne I. L.; Michaelides, Michel; Young, Terri; Neitz, Maureen; MacLaren, Robert E.; Moore, Anthony T.; Hunt, David M.

    2013-01-01

    Purpose. To determine the role of variant L opsin haplotypes in seven families with Bornholm Eye Disease (BED), a cone dysfunction syndrome with dichromacy and myopia. Methods. Analysis of the opsin genes within the L/M opsin array at Xq28 included cloning and sequencing of an exon 3-5 gene fragment, long range PCR to establish gene order, and quantitative PCR to establish gene copy number. In vitro expression of normal and variant opsins was performed to examine cellular trafficking and spectral sensitivity of pigments. Results. All except one of the BED families possessed L opsin genes that contained a rare exon 3 haplotype. The exception was a family with the deleterious Cys203Arg substitution. Two rare exon 3 haplotypes were found and, where determined, these variant opsin genes were in the first position in the array. In vitro expression in transfected cultured neuronal cells showed that the variant opsins formed functional pigments, which trafficked to the cell membranes. The variant opsins were, however, less stable than wild type. Conclusions. It is concluded that the variant L opsin haplotypes underlie BED. The reduction in the amount of variant opsin produced in vitro compared with wild type indicates a possible disease mechanism. Alternatively, the recently identified defective splicing of exon 3 of the variant opsin transcript may be involved. Both mechanisms explain the presence of dichromacy and cone dystrophy. Abnormal pigment may also underlie the myopia that is invariably present in BED subjects. PMID:23322568

  18. Structural sequences are conserved in the genes coding for the alpha, alpha' and beta-subunits of the soybean 7S seed storage protein.

    PubMed Central

    Schuler, M A; Ladin, B F; Pollaco, J C; Freyer, G; Beachy, R N

    1982-01-01

    Cloned DNAs encoding four different proteins have been isolated from recombinant cDNA libraries constructed with Glycine max seed mRNAs. Two cloned DNAs code for the alpha and alpha'-subunits of the 7S seed storage protein (conglycinin). The other cloned cDNAs code for proteins which are synthesized in vitro as 68,000 d., 60,000 d. or 53,000 d. polypeptides. Hybrid selection experiments indicate that, under low stringency hybridization conditions, all four cDNAs hybridize with mRNAs for the alpha and alpha'-subunits and the 68,000 d., 60,000 d. and 53,000 d. in vitro translation products. Within three of the mRNAs, there is a conserved sequence of 155 nucleotides which is responsible for this hybridization. The conserved nucleotides in the alpha and alpha'-subunit cDNAs and the 68,000 d. polypeptide cDNAs span both coding and noncoding sequences. The differences in the coding nucleotides outside the conserved region are extensive. This suggests that selective pressure to maintain the 155 conserved nucleotides has been influenced by the structure of the seed mRNA. RNA blot hybridizations demonstrate that mRNA encoding the other major subunit (beta) of the 7S seed storage protein also shares sequence homology with the conserved 155 nucleotide sequence of the alpha and alpha'-subunit mRNAs, but not with other coding sequences. PMID:6897678

  19. The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

    PubMed

    Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

    2016-01-01

    Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensable for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA, pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species.

  20. Nonsense mutation in the glycoprotein Ibα coding sequence associated with Bernard-Soulier syndrome

    SciTech Connect

    Ware, J.; Russell, S.R.; Vicente, V.; Scharf, R.E.; Tomer, A.; McMillian, R.; Ruggeri, Z.M.

    1990-03-01

    Three distinct gene products, the α and β chains of glycoprotein (GP) Ib and GP IX, constitute the platelet membrane GP Ib-IX complex, a receptor for von Willebrand factor and thrombin involved in platelet adhesion and aggregation. Defective function of the GP Ib-IX complex is the hallmark of a rare congenital bleeding disorder of still undefined pathogenesis, the Bernard-Soulier syndrome. The authors have analyzed the molecular basis of the disease in one patient in whom immunoblotting of solubilized platelets demonstrated absence of normal GP Ibα but presence of a smaller immunoreactive species. The truncated polypeptide was also present, along with normal protein, in platelets from the patient's mother and two of his four children. Genetic characterization identified a nucleotide transition changing the Trp-343 codon (TGG) to a nonsense codon (TGA). Such a mutation explains the origin of the smaller GP Ibα, which by lacking half of the sequence on the carboxyl-terminal side, including the transmembrane domain, cannot be properly inserted in the platelet membrane. Both normal and mutant codons were found in the patient, suggesting that he is a compound heterozygote with a still unidentified defect in the other GP Ibα allele. Nonsense mutation and truncated GP Ibα polypeptide were found to cosegregate in four individuals through three generations and were associated with either Bernard-Soulier syndrome or carrier state phenotype. The molecular abnormality demonstrated in this family provides evidence that defective synthesis of GP Ibα alters the membrane expression of the GP Ib-IX complex and may be responsible for Bernard-Soulier syndrome.
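
    The codon change reported above (TGG to TGA) can be checked mechanically: it is a single G-to-A substitution at the third codon position, a purine-to-purine change and therefore a transition, and it converts a tryptophan codon into a stop codon. The sketch below performs that classification; the function names are hypothetical and the stop-codon set assumes the standard genetic code.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def substitution_type(ref_base, alt_base):
    """Classify a single-base change as a transition or a transversion."""
    pair = {ref_base, alt_base}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def describe_point_mutation(ref_codon, alt_codon):
    """Describe a single-nucleotide change between two codons."""
    diffs = [
        (i, r, a)
        for i, (r, a) in enumerate(zip(ref_codon, alt_codon))
        if r != a
    ]
    if len(ref_codon) != 3 or len(alt_codon) != 3 or len(diffs) != 1:
        raise ValueError("expected two codons differing at exactly one position")
    index, ref_base, alt_base = diffs[0]
    return {
        "codon_position": index + 1,  # 1-based position within the codon
        "change": f"{ref_base}>{alt_base}",
        "type": substitution_type(ref_base, alt_base),
        "creates_stop": alt_codon in STOP_CODONS and ref_codon not in STOP_CODONS,
    }


# The Trp-343 codon change described in the abstract.
result = describe_point_mutation("TGG", "TGA")
```

    Here `result` reports a G>A transition at codon position 3 that creates a stop codon, which matches the abstract's description of a nonsense mutation.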

  1. OrthoMaM v8: a database of orthologous exons and coding sequences for comparative genomics in mammals.

    PubMed

    Douzery, Emmanuel J P; Scornavacca, Celine; Romiguier, Jonathan; Belkhir, Khalid; Galtier, Nicolas; Delsuc, Frédéric; Ranwez, Vincent

    2014-07-01

    Comparative genomic studies extensively rely on alignments of orthologous sequences. Yet, selecting, gathering, and aligning orthologous exons and protein-coding sequences (CDS) that are relevant for a given evolutionary analysis can be a difficult and time-consuming task. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of orthologous genes in mammalian genomes using a phylogenetic framework. Since its first release in 2007, OrthoMaM has regularly evolved, not only to include newly available genomes but also to incorporate up-to-date software in its analytic pipeline. This eighth release integrates the 40 complete mammalian genomes available in Ensembl v73 and provides alignments, phylogenies, evolutionary descriptor information, and functional annotations for 13,404 single-copy orthologous CDS and 6,953 long exons. The graphical interface allows users to easily explore OrthoMaM to identify markers with specific characteristics (e.g., taxa availability, alignment size, %G+C, evolutionary rate, chromosome location). It hence provides an efficient solution to sample preprocessed markers adapted to user-specific needs. OrthoMaM has proven to be a valuable resource for researchers interested in mammalian phylogenomics and evolutionary genomics, and has served as a source of benchmark empirical data sets in several methodological studies. OrthoMaM is available for browsing, query and complete or filtered downloads at http://www.orthomam.univ-montp2.fr/.

  2. Genomic integration of the full-length dystrophin coding sequence in Duchenne muscular dystrophy induced pluripotent stem cells.

    PubMed

    Farruggio, Alfonso P; Bhakta, Mital S; du Bois, Haley; Ma, Julia; P Calos, Michele

    2017-04-01

    Plasmid vectors that express the full-length human dystrophin coding sequence in human cells were developed. Dystrophin, the protein mutated in Duchenne muscular dystrophy, is extraordinarily large, providing challenges for cloning and plasmid production in Escherichia coli. The authors expressed dystrophin from the strong, widely expressed CAG promoter, along with co-transcribed luciferase and mCherry marker genes useful for tracking plasmid expression. Introns were added at the 3' and 5' ends of the dystrophin sequence to prevent translation in E. coli, resulting in improved plasmid yield. Stability and yield were further improved by employing a lower-copy number plasmid origin of replication. The dystrophin plasmids also carried an attB site recognized by phage phiC31 integrase, enabling the plasmids to be integrated into the human genome at preferred locations by phiC31 integrase. The authors demonstrated single-copy integration of plasmid DNA into the genome and production of human dystrophin in the human 293 cell line, as well as in induced pluripotent stem cells derived from a patient with Duchenne muscular dystrophy. Plasmid-mediated dystrophin expression was also demonstrated in mouse muscle. The dystrophin expression plasmids described here will be useful in cell and gene therapy studies aimed at ameliorating Duchenne muscular dystrophy.

  3. The analysis of incomplete data.

    NASA Technical Reports Server (NTRS)

    Hartley, H. O.; Hocking, R. R.

    1971-01-01

    In this paper, we attempt to provide a simple taxonomy for incomplete-data problems and at the same time develop unified methods of analysis. The emphasis is on techniques which are natural extensions of the complete-data analysis and which will handle rather general classes of incomplete-data problems as opposed to custom-made techniques for special problems. The principle of estimation is either maximum likelihood or is at least based on maximum likelihood.

  4. Detecting selection in the blue crab, Callinectes sapidus, using DNA sequence data from multiple nuclear protein-coding genes.

    PubMed

    Yednock, Bree K; Neigel, Joseph E

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations, indicative of positive selection, were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available.

  5. Detecting Selection in the Blue Crab, Callinectes sapidus, Using DNA Sequence Data from Multiple Nuclear Protein-Coding Genes

    PubMed Central

    Yednock, Bree K.; Neigel, Joseph E.

    2014-01-01

    The identification of genes involved in the adaptive evolution of non-model organisms with uncharacterized genomes constitutes a major challenge. This study employed a rigorous and targeted candidate gene approach to test for positive selection on protein-coding genes of the blue crab, Callinectes sapidus. Four genes with putative roles in physiological adaptation to environmental stress were chosen as candidates. A fifth gene not expected to play a role in environmental adaptation was used as a control. Large samples (n>800) of DNA sequences from C. sapidus were used in tests of selective neutrality based on sequence polymorphisms. In combination with these, sequences from the congener C. similis were used in neutrality tests based on interspecific divergence. In multiple tests, significant departures from neutral expectations, indicative of positive selection, were found for the candidate gene trehalose 6-phosphate synthase (tps). These departures could not be explained by any of the historical population expansion or bottleneck scenarios that were evaluated in coalescent simulations. Evidence was also found for balancing selection at ATP-synthase subunit 9 (atps) using a maximum likelihood version of the Hudson, Kreitman, and Aguadé test, and positive selection favoring amino acid replacements within ATP/ADP translocase (ant) was detected using the McDonald-Kreitman test. In contrast, test statistics for the control gene, ribosomal protein L12 (rpl), which presumably has experienced the same demographic effects as the candidate loci, were not significantly different from neutral expectations and could readily be explained by demographic effects. Together, these findings demonstrate the utility of the candidate gene approach for investigating adaptation at the molecular level in a marine invertebrate for which extensive genomic resources are not available. PMID:24896825
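    For readers unfamiliar with the McDonald-Kreitman test mentioned above, the following is a minimal sketch of its usual form: a 2x2 table of nonsynonymous and synonymous changes, split into fixed differences between species and polymorphisms within a species, summarized by a neutrality index and a Fisher exact test. The counts are hypothetical and are not data from the blue crab study; the function names are assumptions made for illustration.

```python
from math import comb


def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds); values below 1 suggest an excess of
    nonsynonymous fixed differences, as expected under positive selection."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]],
    summing all tables no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts (illustration only).
dn, ds = 12, 10   # fixed differences: nonsynonymous, synonymous
pn, ps = 3, 20    # polymorphisms: nonsynonymous, synonymous

ni = neutrality_index(dn, ds, pn, ps)
p_value = fisher_exact_two_sided(dn, ds, pn, ps)
print(f"NI = {ni:.3f}, Fisher exact p = {p_value:.4f}")
```

    The Fisher test is written out with the standard library so the sketch stays self-contained; a statistics package would normally be used instead.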

  6. A common class of transcripts with 5′-intron depletion, distinct early coding sequence features, and N1-methyladenosine modification

    PubMed Central

    Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P.; Palazzo, Alexander F.

    2017-01-01

    Introns are found in 5′ untranslated regions (5′UTRs) for 35% of all human transcripts. These 5′UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5′UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5′UTR intron status, we developed a classifier that can predict 5′UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5′ proximal-intron-minus-like-coding regions (“5IM” transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5′ cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5′ proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5′ proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. PMID:27994090

  7. Toward a Catalog of Human Genes and Proteins: Sequencing and Analysis of 500 Novel Complete Protein Coding Human cDNAs

    PubMed Central

    Wiemann, Stefan; Weil, Bernd; Wellenreuther, Ruth; Gassenhuber, Johannes; Glassl, Sabine; Ansorge, Wilhelm; Böcher, Michael; Blöcker, Helmut; Bauersachs, Stefan; Blum, Helmut; Lauber, Jürgen; Düsterhöft, Andreas; Beyer, Andreas; Köhrer, Karl; Strack, Normann; Mewes, Hans-Werner; Ottenwälder, Birgit; Obermaier, Brigitte; Tampe, Jens; Heubner, Dagmar; Wambutt, Rolf; Korn, Bernhard; Klein, Michaela; Poustka, Annemarie

    2001-01-01

    With the complete human genomic sequence being unraveled, the focus will shift to gene identification and to the functional analysis of gene products. The generation of a set of cDNAs, both sequences and physical clones, which contains the complete and noninterrupted protein coding regions of all human genes will provide the indispensable tools for the systematic and comprehensive analysis of protein function to eventually understand the molecular basis of man. Here we report the sequencing and analysis of 500 novel human cDNAs containing the complete protein coding frame. Assignment to functional categories was possible for 52% (259) of the encoded proteins, the remaining fraction having no similarities with known proteins. By aligning the cDNA sequences with the sequences of the finished chromosomes 21 and 22 we identified a number of genes that either had been completely missed in the analysis of the genomic sequences or had been wrongly predicted. Three of these genes appear to be present in several copies. We conclude that full-length cDNA sequencing continues to be crucial also for the accurate identification of genes. The set of 500 novel cDNAs, and another 1000 full-coding cDNAs of known transcripts we have identified, adds up to cDNA representations covering 2%–5% of all human genes. We thus substantially contribute to the generation of a gene catalog, consisting of both full-coding cDNA sequences and clones, which should be made freely available and will become an invaluable tool for detailed functional studies. [The sequence data described in this paper have been submitted to the EMBL database under the accession nos. given in Table 2.] PMID:11230166

  8. Intraclonal diversity in follicular lymphoma analyzed by quantitative ultra-deep sequencing of non-coding regions

    PubMed Central

    Spence, Janice M.; Abumoussa, Andrew; Spence, John P.; Burack, W. Richard

    2014-01-01

    Cancers are characterized by genomic instability and the resulting intra-clonal diversity is a prerequisite for tumor evolution. Therefore, metrics of tumor heterogeneity may prove to be clinically meaningful. Intra-clonal heterogeneity in follicular lymphoma (FL) is apparent from studies of somatic hypermutation (SHM) caused by Activation Induced Deaminase (AID) in IGH. Aberrant SHM (aSHM), defined as AID activity outside of the IG loci, predominantly targets non-coding regions causing numerous “passenger” mutations but has the potential to generate rare significant “driver” mutations. The quantitative relationship between SHM and aSHM has not been defined. To measure SHM and aSHM, ultradeep sequencing (>20,000 fold coverage) was performed on IGH (∼1650nt) and 9 other non-coding regions potentially targeted by AID (combined 9411nt), including the 5′UTR of BCL2. Single nucleotide variants (SNV) were found in 12/12 FL specimens (median 136 SHM and 53 aSHM). The aSHM SNVs were associated with AID-motifs (p<0.0001). The number of SNVs at BCL2 varied widely among specimens and correlated with the number of SNVs at 8 other potential aSHM sites. In contrast SHM at IGH was not predictive of aSHM. Tumor heterogeneity is apparent from SNVs at low variant allele frequencies (VAF); the relative number of SNVs with VAF<5% varied with clinical grade indicating that tumor heterogeneity based on aSHM reflects a clinically meaningful parameter. These data suggest that genome-wide aSHM may be estimated from aSHM of BCL2 but not SHM of IGH. The results demonstrate a practical approach to the quantification of intra-tumoral genetic heterogeneity for clinical specimens. PMID:25311808

  9. C.U.R.R.F. (Codon Usage regarding Restriction Finder): a free Java(®)-based tool to detect potential restriction sites in both coding and non-coding DNA sequences.

    PubMed

    Gatter, Michael; Gatter, Thomas; Matthäus, Falk

    2012-10-01

    The synthesis of complete genes is becoming a more and more popular approach in heterologous gene expression. Reasons for this are the decreasing prices and the numerous advantages in comparison to classic molecular cloning methods. Two of these advantages are the possibility to adapt the codon usage to the host organism and the option to introduce restriction enzyme target sites of choice. C.U.R.R.F. (Codon Usage regarding Restriction Finder) is a free Java(®)-based software program which is able to detect possible restriction sites in both coding and non-coding DNA sequences by introducing multiple silent or non-silent mutations, respectively. The deviation of an alternative sequence containing a desired restriction motif from the sequence with the optimal codon usage is considered during the search of potential restriction sites in coding DNA and mRNA sequences as well as protein sequences. C.U.R.R.F. is available at http://www.zvm.tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_mathematik_und_naturwissenschaften/fachrichtung_biologie/mikrobiologie/allgemeine_mikrobiologie/currf.
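    The idea of introducing a restriction site through silent mutations can be sketched with a brute-force search. This is not the C.U.R.R.F. algorithm (which also weighs codon usage); it is a simplified illustration that assumes the standard genetic code, an in-frame coding sequence, and a hypothetical toy input, and it only reports positions where writing the site into the sequence leaves the encoded protein unchanged.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def silent_site_positions(cds, site):
    """Return (offset, new_sequence) pairs where overwriting the window at
    offset with `site` changes the DNA but not the encoded protein."""
    protein = translate(cds)
    hits = []
    for i in range(len(cds) - len(site) + 1):
        candidate = cds[:i] + site + cds[i + len(site):]
        if candidate != cds and translate(candidate) == protein:
            hits.append((i, candidate))
    return hits


# Hypothetical toy sequence encoding Met-Glu-Phe-stop; EcoRI recognizes GAATTC.
toy_cds = "ATGGAGTTTTAA"
hits = silent_site_positions(toy_cds, "GAATTC")
print(translate(toy_cds), hits)
```

    In the toy input, Glu-Phe is encoded as GAG TTT; the synonymous codons GAA TTC spell the EcoRI site, so a single offset qualifies.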

  10. Mutations in the TSC2 gene: analysis of the complete coding sequence using the protein truncation test (PTT).

    PubMed

    van Bakel, I; Sepp, T; Ward, S; Yates, J R; Green, A J

    1997-09-01

    Mutations in the TSC2 gene on chromosome 16p13.3 are responsible for approximately 50% of familial tuberous sclerosis (TSC). The gene has 41 small exons spanning 45 kb of genomic DNA and encoding a 5.5 kb mRNA. Large germline deletions of TSC2 occur in <5% of cases, and a number of small intragenic mutations have been described. We analysed mRNA from 18 unrelated cases of TSC for TSC2 mutations using the protein truncation test (PTT). Three cases were predicted to be TSC2 mutations on the basis of linkage analysis or because a hamartoma from the patient showed loss of heterozygosity for 16p13.3 markers. Three overlapping PCR products, covering the complete coding sequence of mRNA, were generated from lymphoblastoid cell lines, translated into 35S-methionine labelled protein, and analysed by SDS-PAGE. PCR products showing PTT shifts were directly sequenced, and mutations confirmed by restriction enzyme digestion where possible. Six PTT shifts were identified. Five of these were caused by mutations predicted to produce a truncated protein: (i) a sporadic case showed a 32 bp deletion in exon 11, and a mutant mRNA without exon 11 was produced; the normal exon 10 was also spliced out; (ii) a sporadic case had a 1 bp deletion in exon 12 (1634delT); (iii) a TSC2-linked mother and daughter pair had a G-->T transversion in exon 23 (G2715T) introducing a cryptic splice site causing a 29 bp truncation of mRNA from exon 23; (iv) a sporadic case showed a 2 bp deletion in exon 36; (v) a sporadic case showed a 1 bp insertion disrupting the donor splice site of exon 37 (5007+2insA), resulting in the use of an upstream exonic cryptic splice site to cause a 29 bp truncation of mRNA from exon 37. In one case, the PTT shift was explained by in-frame splicing out of exon 10, in the presence of a normal exon 10 genomic sequence. Alternative splicing of exon 10 of the TSC2 gene may be a normal variant. Three 3rd base substitution polymorphisms were also detected during direct sequencing

  11. Cloning, sequencing, and expression of the apa gene coding for the Mycobacterium tuberculosis 45/47-kilodalton secreted antigen complex.

    PubMed Central

    Laqueyrerie, A; Militzer, P; Romain, F; Eiglmeier, K; Cole, S; Marchal, G

    1995-01-01

    Effective protection against a virulent challenge with Mycobacterium tuberculosis is induced mainly by previous immunization with living attenuated mycobacteria, and it has been hypothesized that secreted proteins serve as major targets in the specific immune response. To identify and purify molecules present in culture medium filtrate which are dominant antigens during effective vaccination, a two-step selection procedure was used to select antigens able to interact with T lymphocytes and/or antibodies induced by immunization with living bacteria and to counterselect antigens interacting with the immune effectors induced by immunization with dead bacteria. A Mycobacterium bovis BCG 45/47-kDa antigen complex, present in BCG culture filtrate, has been previously identified and isolated (F. Romain, A. Laqueyrerie, P. Militzer, P. Pescher, P. Chavarot, M. Lagranderie, G. Auregan, M. Gheorghiu, and G. Marchal, Infect. Immun. 61:742-750, 1993). Since the cognate antibodies recognize the very same antigens present in M. tuberculosis culture medium filtrates, a project was undertaken to clone, express, and sequence the corresponding gene of M. tuberculosis. An M. tuberculosis shuttle cosmid library was transferred in Mycobacterium smegmatis and screened with a competitive enzyme-linked immunosorbent assay to detect the clones expressing the proteins. A clone containing a 40-kb DNA insert was selected, and by means of subcloning in Escherichia coli, a 2-kb fragment that coded for the molecules was identified. An open reading frame in the 2,061-nucleotide sequence codes for a secreted protein with a consensus signal peptide of 39 amino acids and a predicted molecular mass of 28,779 Da. The gene was referred to as apa because of the high percentages of proline (21.7%) and alanine (19%) in the purified protein. Southern hybridization analysis of digested total genomic DNA from M. tuberculosis (reference strains H37Rv and H37Ra) indicated that the apa gene was present as a

  12. Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate

    PubMed Central

    Mangold, Elisabeth; Böhmer, Anne C.; Ishorst, Nina; Hoebel, Ann-Kathrin; Gültepe, Pinar; Schuenke, Hannah; Klamt, Johanna; Hofmann, Andrea; Gölz, Lina; Raff, Ruth; Tessmann, Peter; Nowak, Stefanie; Reutter, Heiko; Hemprich, Alexander; Kreusch, Thomas; Kramer, Franz-Josef; Braumann, Bert; Reich, Rudolf; Schmidt, Gül; Jäger, Andreas; Reiter, Rudolf; Brosch, Sibylle; Stavusis, Janis; Ishida, Miho; Seselgyte, Rimante; Moore, Gudrun E.; Nöthen, Markus M.; Borck, Guntram; Aldhorae, Khalid A.; Lace, Baiba; Stanier, Philip; Knapp, Michael; Ludwig, Kerstin U.

    2016-01-01

    Nonsyndromic cleft lip with/without cleft palate (nsCL/P) and nonsyndromic cleft palate only (nsCPO) are the most frequent subphenotypes of orofacial clefts. A common syndromic form of orofacial clefting is Van der Woude syndrome (VWS) where individuals have CL/P or CPO, often but not always associated with lower lip pits. Recently, ∼5% of VWS-affected individuals were identified with mutations in the grainy head-like 3 gene (GRHL3). To investigate GRHL3 in nonsyndromic clefting, we sequenced its coding region in 576 Europeans with nsCL/P and 96 with nsCPO. Most strikingly, nsCPO-affected individuals had a higher minor allele frequency for rs41268753 (0.099) than control subjects (0.049; p = 1.24 × 10^−2). This association was replicated in nsCPO/control cohorts from Latvia, Yemen, and the UK (p_combined = 2.63 × 10^−5; OR_allelic = 2.46 [95% CI 1.6–3.7]) and reached genome-wide significance in combination with imputed data from a GWAS in nsCPO triads (p = 2.73 × 10^−9). Notably, rs41268753 is not associated with nsCL/P (p = 0.45). rs41268753 encodes the highly conserved p.Thr454Met (c.1361C>T) (GERP = 5.3), which prediction programs denote as deleterious, has a CADD score of 29.6, and increases protein binding capacity in silico. Sequencing also revealed four novel truncating GRHL3 mutations including two that were de novo in four families, where all nine individuals harboring mutations had nsCPO. This is important for genetic counseling: given that VWS is rare compared to nsCPO, our data suggest that dominant GRHL3 mutations are more likely to cause nonsyndromic than syndromic CPO. Thus, with rare dominant mutations and a common risk variant in the coding region, we have identified an important contribution for GRHL3 in nsCPO. PMID:27018475

  13. Sequences throughout the basic beta-1,3-glucanase mRNA coding region are targets for homology dependent post-transcriptional gene silencing.

    PubMed

    Jacobs; Sanders; Bots; Andriessen; Van Eldik GJ; Litière; Van Montagu M; Cornelissen

    1999-10-01

    In the transgenic tobacco line T17, plants homozygous for the gn1 transgene display developmentally regulated post-transcriptional silencing of basic beta-1,3-glucanase genes. Previously, it has been shown that silencing involves a markedly increased turnover of silencing-target glucanase mRNAs. A two-component viral reporter system facilitated a quantitative comparison of the relative silencing efficiencies of various sequences derived from the gn1 transgene. The results show that target sites for the silencing mechanism are present throughout the coding region of the gn1 mRNA. Similar-sized coding region sequences along the entire gn1 mRNA display a similar susceptibility to the silencing mechanism. The susceptibility to silencing increases as the coding region elements increase in size. Relative to internal sequences, the 5' and 3' terminal regions of the gn1 mRNA are inefficient targets for the silencing machinery. Importantly, sequences of the gn1 transgene that are not part of the mature gn1 mRNA are not recognized by the silencing machinery when expressed in chimeric viral RNAs. These results show that the glucanase silencing mechanism in T17 plants is primarily directed against gn1 mRNA-internal sequences and that terminal sequences of the gn1 mRNA are relatively unaffected by the silencing mechanism.

  14. RNA sequencing identifies novel non-coding RNA and exon-specific effects associated with cigarette smoking.

    PubMed

    Parker, Margaret M; Chase, Robert P; Lamb, Andrew; Reyes, Alejandro; Saferali, Aabida; Yun, Jeong H; Himes, Blanca E; Silverman, Edwin K; Hersh, Craig P; Castaldi, Peter J

    2017-10-06

    Cigarette smoking is the leading modifiable risk factor for disease and death worldwide. Previous studies quantifying gene-level expression have documented the effect of smoking on mRNA levels. Using RNA sequencing, it is possible to analyze the impact of smoking on complex regulatory phenomena (e.g. alternative splicing, differential isoform usage) leading to a more detailed understanding of the biology underlying smoking-related disease. We used whole-blood RNA sequencing to describe gene and exon-level expression differences between 229 current and 286 former smokers in the COPDGene study. We performed differential gene expression and differential exon usage analyses using the voom/limma and DEXseq R packages. Samples from current and former smokers were compared while controlling for age, gender, race, lifetime smoke exposure, cell counts, and technical covariates. At an adjusted p-value <0.05, 171 genes were differentially expressed between current and former smokers. Differentially expressed genes included 7 long non-coding RNAs that have not been previously associated with smoking: LINC00599, LINC01362, LINC00824, LINC01624, RP11-563D10.1, RP11-98G13.1, AC004791.2. Secondary analysis of acute smoking (having smoked within 2-h) revealed 5 of the 171 smoking genes demonstrated an acute response above the baseline effect of chronic smoking. Exon-level analyses identified 9 exons from 8 genes with significant differential usage by smoking status, suggesting smoking-induced changes in isoform expression. Transcriptomic changes at the gene and exon levels from whole blood can refine our understanding of the molecular mechanisms underlying the response to smoking.

  15. Accuracy and Power of Statistical Methods for Detecting Adaptive Evolution in Protein Coding Sequences and for Identifying Positively Selected Sites

    PubMed Central

    Wong, Wendy S. W.; Yang, Ziheng; Goldman, Nick; Nielsen, Rasmus

    2004-01-01

    The parsimony method of Suzuki and Gojobori (1999) and the maximum likelihood method developed from the work of Nielsen and Yang (1998) are two widely used methods for detecting positive selection in homologous protein coding sequences. Both methods consider an excess of nonsynonymous (replacement) substitutions as evidence for positive selection. Previously published simulation studies comparing the performance of the two methods show contradictory results. Here we conduct a more thorough simulation study to cover and extend the parameter space used in previous studies. We also reanalyzed an HLA data set that was previously proposed to cause problems when analyzed using the maximum likelihood method. Our new simulations and a reanalysis of the HLA data demonstrate that the maximum likelihood method has good power and accuracy in detecting positive selection over a wide range of parameter values. Previous studies reporting poor performance of the method appear to be due to numerical problems in the optimization algorithms and did not reflect the true performance of the method. The parsimony method has a very low rate of false positives but very little power for detecting positive selection or identifying positively selected sites. PMID:15514074

  16. TRENDS (Transport and Retention of Nuclides in Dominant Sequences): A code for modeling iodine behavior in containment during severe accidents

    SciTech Connect

    Weber, C.F.; Beahm, E.C.; Kress, T.S.; Daish, S.R.; Shockley, W.E.

    1989-01-01

    The ultimate aim of a description of iodine behavior in severe LWR accidents is a time-dependent accounting of iodine species released into containment and to the environment. Factors involved in the behavior of iodine can be conveniently divided into four general categories: (1) initial release into containment, (2) interaction of iodine species in containment not directly involving water pools, (3) interaction of iodine species in, or with, water pools, and (4) interaction with special systems such as ice condensers or gas treatment systems. To fill the large gaps in knowledge and to provide a means for assaying the iodine source term, this program has proceeded along two paths: (1) Experimental studies of the chemical behavior of iodine under containment conditions. (2) Development of TRENDS (Transport and Retention of Nuclides in Dominant Sequences), a computer code for modeling the behavior of iodine in containment and its release from containment. The main body of this report consists of a description of TRENDS. These two parts to the program are complementary in that models within TRENDS use data that were produced in the experimental program; therefore, these models are supported by experimental evidence that was obtained under conditions expected in severe accidents. 7 refs., 1 fig., 2 tabs.

  17. Mechanisms of Antisense Transcription Initiation from the 3′ End of the GAL10 Coding Sequence In Vivo

    PubMed Central

    Malik, Shivani; Durairaj, Geetha

    2013-01-01

    In spite of the important regulatory functions of antisense transcripts in gene expression, it remains unknown how antisense transcription is initiated. Recent studies implicated RNA polymerase II in initiation of antisense transcription. However, how RNA polymerase II is targeted to initiate antisense transcription has not been elucidated. Here, we have analyzed the association of RNA polymerase II with the antisense initiation site at the 3′ end of the GAL10 coding sequence in dextrose-containing growth medium that induces antisense transcription. We find that RNA polymerase II is targeted to the antisense initiation site at GAL10 by Reb1p activator as well as general transcription factors (e.g., TFIID, TFIIB, and Mediator) for antisense transcription initiation. Intriguingly, while GAL10 antisense transcription is dependent on TFIID, its sense transcription does not require TFIID. Further, the Gal4p activator that promotes GAL10 sense transcription is dispensable for antisense transcription. Moreover, the proteasome that facilitates GAL10 sense transcription does not control its antisense transcription. Taken together, our results reveal that GAL10 sense and antisense transcriptions are regulated differently and shed much light on the mechanisms of antisense transcription initiation. PMID:23836882

  18. Why there is more to protein evolution than protein function: splicing, nucleosomes and dual-coding sequence.

    PubMed

    Warnecke, Tobias; Weber, Claudia C; Hurst, Laurence D

    2009-08-01

    There is considerable variation in the rate at which different proteins evolve. Why is this? Classically, it has been considered that the density of functionally important sites must predict rates of protein evolution. Likewise, amino acid choice is usually assumed to reflect optimal protein function. In the present article, we briefly review evidence suggesting that this protein function-centred view is too simplistic. In particular, we concentrate on how selection acting during the protein's production history can also affect protein evolutionary rates and amino acid choice. Exploring the role of selection at the DNA and RNA level, we specifically address how the need (i) to specify exonic splice enhancer motifs in pre-mRNA, and (ii) to ensure nucleosome positioning on DNA have an impact on amino acid choice and rates of evolution. For both, we review evidence that sequence affected by more than one coding demand is particularly constrained. Strikingly, in mammals, splicing-related constraints are quantitatively as important as expression parameters in predicting rates of protein evolution. These results indicate that there is substantially more to protein evolution than protein functional constraints.

  19. Transactivation specificity is conserved among p53 family proteins and depends on a response element sequence code

    PubMed Central

    Ciribilli, Yari; Monti, Paola; Bisio, Alessandra; Nguyen, H. Thien; Ethayathulla, Abdul S.; Ramos, Ana; Foggetti, Giorgia; Menichini, Paola; Menendez, Daniel; Resnick, Michael A.; Viadiu, Hector; Fronza, Gilberto; Inga, Alberto

    2013-01-01

    Structural and biochemical studies have demonstrated that p73, p63 and p53 recognize DNA with identical amino acids and similar binding affinity. Here, measuring transactivation activity for a large number of response elements (REs) in yeast and human cell lines, we show that p53 family proteins also have overlapping transactivation profiles. We identified mutations at conserved amino acids of loops L1 and L3 in the DNA-binding domain that tune the transactivation potential nearly equally in p73, p63 and p53. For example, the mutant S139F in p73 has higher transactivation potential towards selected REs, enhanced DNA-binding cooperativity in vitro and a flexible loop L1 as seen in the crystal structure of the protein–DNA complex. By studying how variations in the RE sequence affect transactivation specificity, we discovered a RE-transactivation code that predicts enhanced transactivation; this correlation is stronger for promoters of genes associated with apoptosis. PMID:23892287

  20. Cloning, nucleotide sequence, and expression of the Brucella melitensis omp31 gene coding for an immunogenic major outer membrane protein.

    PubMed Central

    Vizcaíno, N; Cloeckaert, A; Zygmunt, M S; Dubray, G

    1996-01-01

    The gene coding for the major outer membrane protein (OMP) of 31 to 34 kDa, now designated Omp31, of Brucella melitensis 16M was cloned and sequenced. A B. melitensis 16M genomic library was constructed in lambda GEM-12 XhoI half-site arms, and recombinant phages expressing omp31 were identified by using the anti-Omp31 monoclonal antibody (MAb) A59/10F09/G10. Subcloning of insert DNA from a positive phage into pGEM-7Zf allowed the selection of a plasmid bearing a 4.4-kb EcoRI fragment that seemed to contain the entire omp31 gene under control of its own promoter. omp31 was localized within a region of the EcoRI insert of approximately 1.1 kb. Sequencing of this region revealed an open reading frame of 720 bp encoding a protein of 240 amino acids and a predicted molecular mass of 25,307 Da. Cleavage of the first 19 amino acids, showing typical features of signal peptides for protein export, leaves a mature protein of 221 amino acids with a predicted molecular mass of 23,412 Da. The predicted amino acid sequence of B. melitensis 16M Omp31 showed 35.2% identity with the RopB OMP of Rhizobium leguminosarum bv. viciae 248 and 34.3% identity with Omp25 of B. abortus 544. As in Brucella spp., Omp31 was located in the outer membrane of recombinant Escherichia coli, but its reported peptidoglycan association in Brucella cells was not detected in E. coli. The ability of Omp31 to form oligomers resistant to sodium dodecyl sulfate denaturation at low temperatures, a characteristic described for several bacterial porins, was observed in both B. melitensis and recombinant E. coli. The epitope recognized by the anti-Omp31 MAb A59/10F09/G10, for which a protective activity has been suggested, has been delimited to a region of 36 amino acids of Omp31 covering the most hydrophilic part of the protein. The availability of recombinant Omp31 and the identification of the antigenic determinant recognized by MAb A59/10F09/G10 will allow the evaluation of their potential protective

  1. DNA polymorphism in morels: complete sequences of the internal transcribed spacer of genes coding for rRNA in Morchella esculenta (yellow morel) and Morchella conica (black morel).

    PubMed

    Wipf, D; Munch, J C; Botton, B; Buscot, F

    1996-09-01

    The internal transcribed spacer (ITS) of the gene coding for rRNA was sequenced in both directions with the gene walking technique in a black morel (Morchella conica) and a yellow morel (M. esculenta) to elucidate the ITS length discrepancy between the two species groups (750-bp ITS in black morels and 1,150-bp ITS in yellow morels).

  2. Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

    USDA-ARS's Scientific Manuscript database

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...

  3. Full-length coding sequence for 12 bovine viral diarrhea virus isolates from persistently infected cattle in a feedyard in Kansas

    USDA-ARS's Scientific Manuscript database

    We report here the full-length coding sequence of 12 bovine viral diarrhea virus (BVDV) isolates from persistently infected cattle from a feedyard in southwest Kansas, USA. These 12 genomes represent the three major genotypes (BVDV 1a, 1b, and 2a) of BVDV currently circulating in the United States....

  4. Sequence evaluation of FGF and FGFR gene conserved non-coding elements in non-syndromic cleft lip and palate cases.

    PubMed

    Riley, Bridget M; Murray, Jeffrey C

    2007-12-15

    Non-syndromic cleft lip and palate (NS CLP) is a complex birth defect resulting from multiple genetic and environmental factors. We have previously reported the sequencing of the coding region of genes in the fibroblast growth factor (FGF) signaling pathway, in which missense and non-sense mutations contribute to approximately 5%-6% of NS CLP cases. In this article we report the sequencing of conserved non-coding elements (CNEs) in and around 11 of the FGF and FGFR genes, which identified 55 novel variants. Seven of the variants are highly conserved among ≥8 species and 31 variants alter transcription factor binding sites, 8 of which are important for craniofacial development. Additionally, 15 NS CLP patients had a combination of coding mutations and CNE variants, suggesting that an accumulation of variants in the FGF signaling pathway may contribute to clefting. © 2007 Wiley-Liss, Inc.

  5. Incomplete

    ERIC Educational Resources Information Center

    Stauffer, Sandra L.

    2011-01-01

    Elizabeth Parker's reflection on her experience as a musician educator working with children in an urban non-profit context is an uncomfortable read for me. In a courageous act, Parker makes public her private misgivings about her past experience and allows scrutiny of them in the form of two public commentaries as well as the private musings of…

  6. Hfq assists small RNAs in binding to the coding sequence of ompD mRNA and in rearranging its structure

    PubMed Central

    Wroblewska, Zuzanna; Olejniczak, Mikolaj

    2016-01-01

    The bacterial protein Hfq participates in the regulation of translation by small noncoding RNAs (sRNAs). Several mechanisms have been proposed to explain the role of Hfq in the regulation by sRNAs binding to the 5′-untranslated mRNA regions. However, it remains unknown how Hfq affects those sRNAs that target the coding sequence. Here, the contribution of Hfq to the annealing of three sRNAs, RybB, SdsR, and MicC, to the coding sequence of Salmonella ompD mRNA was investigated. Hfq bound to ompD mRNA with tight, subnanomolar affinity. Moreover, Hfq strongly accelerated the rates of annealing of RybB and MicC sRNAs to this mRNA, and it also had a small effect on the annealing of SdsR. The experiments using truncated RNAs revealed that the contributions of Hfq to the annealing of each sRNA were individually adjusted depending on the structures of interacting RNAs. In agreement with that, the mRNA structure probing revealed different structural contexts of each sRNA binding site. Additionally, the annealing of RybB and MicC sRNAs induced specific conformational changes in ompD mRNA consistent with local unfolding of mRNA secondary structure. Finally, the mutation analysis showed that the long AU-rich sequence in the 5′-untranslated mRNA region served as an Hfq binding site essential for the annealing of sRNAs to the coding sequence. Overall, the data showed that the functional specificity of Hfq in the annealing of each sRNA to the ompD mRNA coding sequence was determined by the sequence and structure of the interacting RNAs. PMID:27154968

  7. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    PubMed

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly-expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, infer differential disease risks and interaction with S1P₁-targeted therapeutics.

  8. Deep sequencing of RNA from immune cell-derived vesicles uncovers the selective incorporation of small non-coding RNA biotypes with potential regulatory functions

    PubMed Central

    Nolte-’t Hoen, Esther N. M.; Buermans, Henk P. J.; Waasdorp, Maaike; Stoorvogel, Willem; Wauben, Marca H. M.; ’t Hoen, Peter A. C.

    2012-01-01

    Cells release RNA-carrying vesicles and membrane-free RNA/protein complexes into the extracellular milieu. Horizontal vesicle-mediated transfer of such shuttle RNA between cells allows dissemination of genetically encoded messages, which may modify the function of target cells. Other studies used array analysis to establish the presence of microRNAs and mRNA in cell-derived vesicles from many sources. Here, we used an unbiased approach by deep sequencing of small RNA released by immune cells. We found a large variety of small non-coding RNA species representing pervasive transcripts or RNA cleavage products overlapping with protein coding regions, repeat sequences or structural RNAs. Many of these RNAs were enriched relative to cellular RNA, indicating that cells destine specific RNAs for extracellular release. Among the most abundant small RNAs in shuttle RNA were sequences derived from vault RNA, Y-RNA and specific tRNAs. Many of the highly abundant small non-coding transcripts in shuttle RNA are evolutionarily well conserved and have previously been associated with gene regulatory functions. These findings allude to a wider range of biological effects that could be mediated by shuttle RNA than previously expected. Moreover, the data present leads for unraveling how cells modify the function of other cells via transfer of specific non-coding RNA species. PMID:22821563

  9. Construction of the coding sequence of the transcription variant 2 of the human Renalase gene and its expression in the prokaryotic system.

    PubMed

    Fedchenko, Valerii I; Kaloshin, Alexei A; Mezhevikina, Lyudmila M; Buneeva, Olga A; Medvedev, Alexei E

    2013-06-19

    Renalase is a recently discovered protein, involved in regulation of blood pressure in humans and animals. Although several splice variants of human renalase mRNA transcripts have been recognized, only one protein product, hRenalase1, has been found so far. In this study, we have used polymerase chain reaction (PCR)-based amplification of individual exons of the renalase gene and their joining for construction of full-length hRenalase2 coding sequence followed by expression of hRenalase2 as a polyHis recombinant protein in Escherichia coli cells. To date this is the first report on synthesis and purification of hRenalase2. Applicability of this approach was verified by constructing hRenalase1 coding sequence, its sequencing and expression in E. coli cells. hRenalase1 was used for generation of polyclonal antiserum in sheep. Western blot analysis has shown that polyclonal anti-renalase1 antibodies effectively interact with the hRenalase2 protein. The latter suggests that some functions and expression patterns of hRenalase1 documented by antibody-based data may be attributed to the presence of hRenalase2. The realized approach may be also used for construction of coding sequences of various (especially weakly expressible) genes, their transcript variants, etc.

  10. Incomplete penetrance in mitochondrial optic neuropathies.

    PubMed

    Caporali, Leonardo; Maresca, Alessandra; Capristo, Mariantonietta; Del Dotto, Valentina; Tagliavini, Francesca; Valentino, Maria Lucia; La Morgia, Chiara; Carelli, Valerio

    2017-07-14

    Incomplete penetrance characterizes the two most frequent inherited optic neuropathies, Leber's Hereditary Optic Neuropathy (LHON) and dominant optic atrophy (DOA), due to genetic errors in the mitochondrial DNA (mtDNA) and the nuclear DNA (nDNA), respectively. For LHON, compelling evidence has accumulated on the complex interplay of mtDNA haplogroups and environmental interacting factors, whereas the nDNA remains essentially non-informative. However, a compensatory mechanism of activated mitochondrial biogenesis and increased mtDNA copy number, possibly driven by a permissive nDNA background, is documented in LHON; when successful it keeps the mutation carriers unaffected, but in some individuals it might be hampered by tobacco smoking or other environmental factors, resulting in disease onset. In females, mitochondrial biogenesis is promoted and maintained within the compensatory range by estrogens, partially explaining the gender bias in LHON. Concerning DOA, none of the above mechanisms has been fully explored, thus mtDNA haplogroups, environmental factors such as tobacco and alcohol, and further nDNA variants may all participate as protective factors or, on the contrary, favor disease expression and severity. Next generation sequencing, complemented by transcriptomics and proteomics, may provide some answers in the near future, even if the multifactorial model that seems to apply to incomplete penetrance in mitochondrial optic neuropathies remains problematic, and careful stratification of patients will play a key role for data interpretation. The deep understanding of which factors impinge on incomplete penetrance may shed light on the pathogenic mechanisms leading to optic nerve atrophy, on their possible compensation and, thus, on development of therapeutic strategies. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  11. Profile Likelihood and Incomplete Data.

    PubMed

    Zhang, Zhiwei

    2010-04-01

    According to the law of likelihood, statistical evidence is represented by likelihood functions and its strength measured by likelihood ratios. This point of view has led to a likelihood paradigm for interpreting statistical evidence, which carefully distinguishes evidence about a parameter from error probabilities and personal belief. Like other paradigms of statistics, the likelihood paradigm faces challenges when data are observed incompletely, due to non-response or censoring, for instance. Standard methods to generate likelihood functions in such circumstances generally require assumptions about the mechanism that governs the incomplete observation of data, assumptions that usually rely on external information and cannot be validated with the observed data. Without reliable external information, the use of untestable assumptions driven by convenience could potentially compromise the interpretability of the resulting likelihood as an objective representation of the observed evidence. This paper proposes a profile likelihood approach for representing and interpreting statistical evidence with incomplete data without imposing untestable assumptions. The proposed approach is based on partial identification and is illustrated with several statistical problems involving missing data or censored data. Numerical examples based on real data are presented to demonstrate the feasibility of the approach.
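
    The notion of partial identification mentioned above can be illustrated with a minimal sketch. This is not the paper's profile likelihood procedure; it only computes worst-case bounds for a proportion when some binary outcomes are unobserved, making no assumption about why they are missing. The function name and the example counts are hypothetical.

```python
from fractions import Fraction


def worst_case_bounds(successes, failures, missing):
    """Bounds on a proportion when `missing` binary outcomes are unobserved.

    No assumption is made about the missingness mechanism: the lower bound
    treats every missing outcome as a failure, the upper bound as a success.
    """
    total = successes + failures + missing
    if total == 0:
        raise ValueError("at least one unit is required")
    lower = Fraction(successes, total)
    upper = Fraction(successes + missing, total)
    return lower, upper


# Hypothetical counts: 30 observed successes, 50 observed failures, 20 missing.
lower, upper = worst_case_bounds(30, 50, 20)
print(f"proportion lies between {float(lower):.2f} and {float(upper):.2f}")
```

    The width of the interval equals the missing fraction, so the bounds tighten only as fewer observations are missing, which is the price of avoiding untestable assumptions.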

  12. Incomplete intestinal absorption of fructose.

    PubMed Central

    Kneepkens, C M; Vonk, R J; Fernandes, J

    1984-01-01

    Intestinal D-fructose absorption in 31 children was investigated using measurements of breath hydrogen. Twenty five children had no abdominal symptoms and six had functional bowel disorders. After ingestion of fructose (2 g/kg bodyweight), 22 children (71%) showed a breath hydrogen increase of more than 10 ppm over basal values, indicating incomplete absorption: the increase averaged 53 ppm, range 12 to 250 ppm. Four of these children experienced abdominal symptoms. Three of the six children with bowel disorders showed incomplete absorption. Seven children were tested again with an equal amount of glucose, and in three of them also of galactose, added to the fructose. The mean maximum breath hydrogen increases were 5 and 10 ppm, respectively, compared with 103 ppm after fructose alone. In one boy several tests were performed with various sugars; fructose was the only sugar incompletely absorbed, and the effect of glucose on fructose absorption was shown to be dependent on the amount added. It is concluded that children have a limited absorptive capacity for fructose. We speculate that the enhancing effect of glucose and galactose on fructose absorption may be due to activation of the fructose carrier. Apple juice in particular contains fructose in excess of glucose and could lead to abdominal symptoms in susceptible children. PMID:6476870

  13. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats

    PubMed Central

    Wicker, Thomas; Narechania, Apurva; Sabot, Francois; Stein, Joshua; Vu, Giang TH; Graner, Andreas; Ware, Doreen; Stein, Nils

    2008-01-01

    Background: Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low cost. Based on the corresponding sequence reads a Mathematically Defined Repeat (MDR) index can be generated to map repetitive regions in genomic sequences. Results: We have generated 574 Mbp of Illumina/Solexa sequences from barley total genomic DNA, representing about 10% of a genome equivalent. From these sequences we generated an MDR index which was then used to identify and mark repetitive regions in the barley genome. Comparison of the MDR plots with expert repeat annotation drawing on the information already available for known repetitive elements revealed a significant correspondence between the two methods. MDR-based annotation allowed for the identification of dozens of novel repeat sequences, though, which were not recognised by hand-annotation. The MDR data was also used to identify gene-containing regions by masking of repetitive sequences in eight de-novo sequenced bacterial artificial chromosome (BAC) clones. For half of the identified candidate gene islands indeed gene sequences could be identified. MDR data were only of limited use, when mapped on genomic sequences from the closely related species Triticum monococcum as only a fraction of the repetitive sequences was recognised. Conclusion: An MDR index for barley, which was obtained by whole-genome Illumina/Solexa sequencing, proved as efficient in repeat identification as manual expert annotation. Circumventing the labour-intensive step of producing a specific repeat library for expert annotation, an MDR index provides an elegant and efficient resource for the identification of repetitive and low-copy (i.e. potentially gene-containing sequences) regions in uncharacterised genomic sequences. The restriction that a particular MDR index can not be used
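
    A repeat index of this kind can be pictured as a table of k-mer counts built from the shotgun reads and then used to score each position of a genomic sequence. The sketch below assumes that picture; the value of k, the masking threshold, the function names and the toy sequences are illustrative assumptions and are not taken from the abstract.

```python
from collections import Counter


def build_kmer_index(reads, k):
    """Count every k-mer that occurs in the sampled reads."""
    index = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            index[read[i:i + k]] += 1
    return index


def repeat_profile(sequence, index, k):
    """For each k-mer start position in `sequence`, look up its count in the index."""
    return [index[sequence[i:i + k]] for i in range(len(sequence) - k + 1)]


def mask_repeats(sequence, index, k, threshold):
    """Lower-case every base covered by a k-mer seen at least `threshold` times."""
    masked = list(sequence)
    for i, count in enumerate(repeat_profile(sequence, index, k)):
        if count >= threshold:
            for j in range(i, i + k):
                masked[j] = masked[j].lower()
    return "".join(masked)


# Toy reads and target sequence (hypothetical data, far shorter than real reads).
reads = ["ACGTACGT", "CGTACGTA", "TTGACCAG"]
index = build_kmer_index(reads, k=4)
print(repeat_profile("ACGTTTGA", index, k=4))
print(mask_repeats("ACGTTTGA", index, k=4, threshold=2))
```

    Positions left in upper case after masking are the low-copy candidates in which gene-containing regions would then be sought.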

  14. Massively parallel sequencing of the entire control region and targeted coding region SNPs of degraded mtDNA using a simplified library preparation method.

    PubMed

    Lee, Eun Young; Lee, Hwan Young; Oh, Se Yoon; Jung, Sang-Eun; Yang, In Seok; Lee, Yang-Han; Yang, Woo Ick; Shin, Kyoung-Jin

    2016-05-01

    The application of next-generation sequencing (NGS) to forensic genetics is being explored by an increasing number of laboratories because of the potential of high-throughput sequencing for recovering genetic information from multiple markers and multiple individuals in a single run. A cumbersome and technically challenging library construction process is required for NGS. In this study, we propose a simplified library preparation method for mitochondrial DNA (mtDNA) analysis that involves two rounds of PCR amplification. In the first round of multiplex PCR, six fragments covering the entire mtDNA control region and 22 fragments covering interspersed single nucleotide polymorphisms (SNPs) in the coding region that can be used to determine global haplogroups and East Asian haplogroups were amplified using template-specific primers with read sequences. In the following step, indices and platform-specific sequences for the MiSeq® system (Illumina) were added by PCR. The barcoded library produced using this simplified workflow was successfully sequenced on the MiSeq system using the MiSeq Reagent Nano Kit v2. A total of 0.4 GB of sequences, 80.6% with base quality of >Q30, were obtained from 12 degraded DNA samples and mapped to the revised Cambridge Reference Sequence (rCRS). A relatively even read count was obtained for all amplicons, with an average coverage of 5200× and a less than three-fold read count difference between amplicons per sample. Control region sequences were successfully determined, and all samples were assigned to the relevant haplogroups. In addition, enhanced discrimination was observed by adding coding region SNPs to the control region in in silico analysis. Because the developed multiplex PCR system amplifies small-sized amplicons (<250 bp), NGS analysis using the library preparation method described here allows mtDNA analysis using highly degraded DNA samples. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  15. A statistical approach for distinguishing hybridization and incomplete lineage sorting.

    PubMed

    Joly, Simon; McLenachan, Patricia A; Lockhart, Peter J

    2009-08-01

    The extent and evolutionary significance of hybridization is difficult to evaluate because of the difficulty in distinguishing hybridization from incomplete lineage sorting. Here we present a novel parametric approach for statistically distinguishing hybridization from incomplete lineage sorting based on minimum genetic distances of a nonrecombining locus. It is based on the idea that the expected minimum genetic distance between sequences from two species is smaller for some hybridization events than for incomplete lineage sorting scenarios. When applied to empirical data sets, distributions can be generated for the minimum interspecies distances expected under incomplete lineage sorting using coalescent simulations. If the observed distance between sequences from two species is smaller than its predicted distribution, incomplete lineage sorting can be rejected and hybridization inferred. We demonstrate the power of the method using simulations and illustrate its application on New Zealand alpine buttercups (Ranunculus). The method is robust and complements existing approaches. Thus it should allow biologists to assess with greater accuracy the importance of hybridization in evolution.
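
    The decision rule described above can be sketched as a one-sided comparison of the observed minimum distance with the minimum distances simulated under incomplete lineage sorting. The sketch assumes the simulated distances are already available (for instance from a coalescent simulator, which is not shown); the function names, the significance level and the numbers are hypothetical.

```python
def ils_p_value(observed_min_distance, simulated_min_distances):
    """Fraction of ILS simulations whose minimum distance is <= the observed one."""
    if not simulated_min_distances:
        raise ValueError("at least one simulated distance is required")
    as_small = sum(1 for d in simulated_min_distances if d <= observed_min_distance)
    return as_small / len(simulated_min_distances)


def infer_hybridization(observed_min_distance, simulated_min_distances, alpha=0.05):
    """Reject ILS (and infer hybridization) when the observed distance is unusually small."""
    return ils_p_value(observed_min_distance, simulated_min_distances) < alpha


# Hypothetical minimum interspecies distances (numbers of differing sites)
# from 100 simulations under incomplete lineage sorting.
simulated = list(range(10, 110))

print(infer_hybridization(4, simulated))   # smaller than every simulated value
print(infer_hybridization(50, simulated))  # well inside the simulated range
```

    The test is one-sided because only distances smaller than expected under incomplete lineage sorting point towards hybridization.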

  16. Molecular cloning and sequence analysis of the gene coding for the 57kDa soluble antigen of the salmonid fish pathogen Renibacterium salmoninarum

    USGS Publications Warehouse

    Chien, Maw-Sheng; Gilbert, Teresa L.; Huang, Chienjin; Landolt, Marsha L.; O'Hara, Patrick J.; Winton, James R.

    1992-01-01

    The complete sequence coding for the 57-kDa major soluble antigen of the salmonid fish pathogen, Renibacterium salmoninarum, was determined. The gene contained an open reading frame of 1671 nucleotides coding for a protein of 557 amino acids with a calculated Mr value of 57190. The first 26 amino acids constituted a signal peptide. The deduced sequence for amino acid residues 27–61 was in agreement with the 35 N-terminal amino acid residues determined by microsequencing, suggesting the protein is synthesized as a 557-amino acid precursor and processed to produce a mature protein of Mr 54505. Two regions of the protein contained imperfect direct repeats. The first region contained two copies of an 81-residue repeat, the second contained five copies of an unrelated 25-residue repeat. Also, a perfect inverted repeat (including three in-frame UAA stop codons) was observed at the carboxyl-terminus of the gene.
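
    The lengths quoted above are mutually consistent, as a short arithmetic check shows. The sketch uses only numbers given in the abstract (1671 nucleotides, 557 amino acids, a 26-residue signal peptide, residues 27–61); the helper names are hypothetical, and the mature-protein length is derived here rather than stated in the abstract.

```python
def codons_in_orf(nucleotides):
    """Number of codons in an open reading frame of the given length."""
    if nucleotides % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return nucleotides // 3


def mature_length(precursor_residues, signal_peptide_residues):
    """Residues remaining after the signal peptide is cleaved off."""
    return precursor_residues - signal_peptide_residues


# Assumption: the 1671 nt figure excludes the stop codon, since 1671 / 3 = 557.
precursor = codons_in_orf(1671)
mature = mature_length(precursor, 26)
n_terminal_span = 61 - 27 + 1  # residues 27 to 61, inclusive

print(precursor, mature, n_terminal_span)
```

    The mature protein therefore starts at residue 27, matching the position at which the microsequenced N-terminal residues align.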

  17. Combinatorial variation in coding and promoter sequences of genes at the Tri locus in Pisum sativum accounts for variation in trypsin inhibitor activity in seeds.

    PubMed

    Page, D; Aubert, G; Duc, G; Welham, T; Domoney, C

    2002-05-01

    Cultivars of Pisum sativum that differ with respect to the quantitative expression of trypsin/chymotrypsin inhibitor proteins in seeds have been examined in terms of the structure of the corresponding genes. The patterns of divergence in the promoter and coding sequences are described, and the divergence among these exploited for the development of facile DNA-based assays to distinguish genotypes. Quantitative effects on gene expression may be attributed to the overall gene complement and to particular promoter/coding sequence combinations, as well as to the existence of distinct active-site variants that ultimately influence protein activity. Electronic supplementary material to this paper can be obtained by using the Springer LINK server located at http://dx.doi.org/10.1007/s00438-002-0667-4.

  18. Application of MELCOR Code to a French PWR 900 MWe Severe Accident Sequence and Evaluation of Models Performance Focusing on In-Vessel Thermal Hydraulic Results

    SciTech Connect

    De Rosa, Felice

    2006-07-01

    In the ambit of the Severe Accident Network of Excellence Project (SARNET), funded by the European Union, 6. FISA (Fission Safety) Programme, one of the main tasks is the development and validation of the European Accident Source Term Evaluation Code (ASTEC Code). One of the reference codes used to compare ASTEC results, coming from experimental and Reactor Plant applications, is MELCOR. ENEA is a SARNET member and also an ASTEC and MELCOR user. During the first 18 months of this project, we performed a series of MELCOR and ASTEC calculations referring to a French PWR 900 MWe and to the accident sequence of 'Loss of Steam Generator (SG) Feedwater' (known as H2 sequence in the French classification). H2 is an accident sequence substantially equivalent to a Station Blackout scenario, like a TMLB accident, with the only difference that in H2 sequence the scram is forced to occur with a delay of 28 seconds. The main events during the accident sequence are a loss of normal and auxiliary SG feedwater (0 s), followed by a scram when the water level in SG is equal to or less than 0.7 m (after 28 seconds). There is also a main coolant pumps trip when ΔTsat < 10 °C, a total opening of the three relief valves when Tric (core maximal outlet temperature) is above 603 K (330 °C) and accumulators isolation when primary pressure goes below 1.5 MPa (15 bar). Among many other points, it is worth noting that this was the first time that a MELCOR 1.8.5 input deck was available for a French PWR 900. The main ENEA effort in this period was devoted to prepare the MELCOR input deck using the code version v.1.8.5 (build QZ Oct 2000 with the latest patch 185003 Oct 2001). The input deck, completely new, was prepared taking into account structure, data and same conditions as those found inside ASTEC input decks. The main goal of the work presented in this paper is to put in evidence where and when MELCOR provides good enough results and why, in some cases mainly referring to its

  19. Rare, Low-Frequency, and Common Variants in the Protein-Coding Sequence of Biological Candidate Genes from GWASs Contribute to Risk of Rheumatoid Arthritis

    PubMed Central

    Diogo, Dorothée; Kurreeman, Fina; Stahl, Eli A.; Liao, Katherine P.; Gupta, Namrata; Greenberg, Jeffrey D.; Rivas, Manuel A.; Hickey, Brendan; Flannick, Jason; Thomson, Brian; Guiducci, Candace; Ripke, Stephan; Adzhubey, Ivan; Barton, Anne; Kremer, Joel M.; Alfredsson, Lars; Sunyaev, Shamil; Martin, Javier; Zhernakova, Alexandra; Bowes, John; Eyre, Steve; Siminovitch, Katherine A.; Gregersen, Peter K.; Worthington, Jane; Klareskog, Lars; Padyukov, Leonid; Raychaudhuri, Soumya; Plenge, Robert M.

    2013-01-01

    The extent to which variants in the protein-coding sequence of genes contribute to risk of rheumatoid arthritis (RA) is unknown. In this study, we addressed this issue by deep exon sequencing and large-scale genotyping of 25 biological candidate genes located within RA risk loci discovered by genome-wide association studies (GWASs). First, we assessed the contribution of rare coding variants in the 25 genes to the risk of RA in a pooled sequencing study of 500 RA cases and 650 controls of European ancestry. We observed an accumulation of rare nonsynonymous variants exclusive to RA cases in IL2RA and IL2RB (burden test: p = 0.007 and p = 0.018, respectively). Next, we assessed the aggregate contribution of low-frequency and common coding variants to the risk of RA by dense genotyping of the 25 gene loci in 10,609 RA cases and 35,605 controls. We observed a strong enrichment of coding variants with a nominal signal of association with RA (p < 0.05) after adjusting for the best signal of association at the loci (p_enrichment = 6.4 × 10⁻⁴). For one locus containing CD2, we found that a missense variant, rs699738 (c.798C>A [p.His266Gln]), and a noncoding variant, rs624988, reside on distinct haplotypes and independently contribute to the risk of RA (p = 4.6 × 10⁻⁶). Overall, our results indicate that variants (distributed across the allele-frequency spectrum) within the protein-coding portion of a subset of biological candidate genes identified by GWASs contribute to the risk of RA. Further, we have demonstrated that very large sample sizes will be required for comprehensively identifying the independent alleles contributing to the missing heritability of RA. PMID:23261300

  20. Characterization of the FoxL2 proximal promoter and coding sequence from the common snapping turtle (Chelydra serpentina).

    PubMed

    Guo, Lei; Rhen, Turk

    2017-10-01

    Sex is determined by temperature during embryogenesis in snapping turtles, Chelydra serpentina. Previous studies in this species show that dihydrotestosterone (DHT) induces ovarian development at temperatures that normally produce males or mixed sex ratios. The feminizing effect of DHT is associated with increased expression of FoxL2, suggesting that androgens regulate transcription of FoxL2. To test this hypothesis, we cloned the proximal promoter (1.6 kb) and coding sequence for snapping turtle FoxL2 (tFoxL2) in frame with mCherry to produce a fluorescent reporter. The tFoxL2-mCherry fusion plasmid or mCherry control plasmid were stably transfected into mouse KK1 granulosa cells. These cells were then treated with 0, 1, 10, or 100 nM DHT to assess androgen effects on tFoxL2-mCherry expression. In contrast to the main hypothesis, DHT did not alter expression of the tFoxL2-mCherry reporter. However, normal serum increased expression of tFoxL2-mCherry when compared to charcoal-stripped serum, indicating that the cloned region of tFoxL2 contains cis regulatory elements. We also used the tFoxL2-mCherry plasmid as an expression vector to test the hypothesis that DHT and tFoxL2 interact to regulate expression of endogenous genes in granulosa cells. While tFoxL2-mCherry and DHT had independent effects on mouse FoxL2, FshR, GnRHR, and StAR expression, tFoxL2-mCherry potentiated low concentration DHT effects on mouse aromatase expression. Further studies will be required to determine whether synergistic regulation of aromatase by DHT and FoxL2 also occurs in turtle gonads during the sex-determining period, which would explain the feminizing effect of DHT in this species. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Identification of a cDNA clone that contains the complete coding sequence for a 140-kD rat NCAM polypeptide

    PubMed Central

    1987-01-01

    Neural cell adhesion molecules (NCAMs) are cell surface glycoproteins that appear to mediate cell-cell adhesion. In vertebrates NCAMs exist in at least three different polypeptide forms of apparent molecular masses 180, 140, and 120 kD. The 180- and 140-kD forms span the plasma membrane whereas the 120-kD form lacks a transmembrane region. In this study, we report the isolation of NCAM clones from an adult rat brain cDNA library. Sequence analysis indicated that the longest isolate, pR18, contains a 2,574 nucleotide open reading frame flanked by 208 bases of 5' and 409 bases of 3' untranslated sequence. The predicted polypeptide encoded by clone pR18 contains a single membrane-spanning region and a small cytoplasmic domain (120 amino acids), suggesting that it codes for a full-length 140-kD NCAM form. In Northern analysis, probes derived from 5' sequences of pR18, which presumably code for extracellular portions of the molecule, hybridized to five discrete mRNA size classes (7.4, 6.7, 5.2, 4.3, and 2.9 kb) in adult rat brain but not to liver or muscle RNA. However, the 5.2- and 2.9-kb mRNA size classes did not hybridize to either a large restriction fragment or three oligonucleotides derived from the putative transmembrane coding region and regions that lie 3' to it. The 3' probes did hybridize to the 7.4-, 6.7-, and 4.3-kb message size classes. These combined results indicate that clone pR18 is derived from either the 7.4-, 6.7-, or 4.3-kb adult rat brain RNA size class. Comparison with chicken and mouse NCAM cDNA sequences suggests that pR18 represents the amino acid coding region of the 6.7- or 4.3-kb mRNA. The isolation of pR18, the first cDNA that contains the complete coding sequence of an NCAM polypeptide, unambiguously demonstrates the predicted linear amino acid sequence of this probable rat 140-kD polypeptide. This cDNA also contains a 30-base pair segment not found in NCAM cDNAs isolated from other species. The significance of this segment and other

  2. Cloning, genetic analysis, and nucleotide sequence of a determinant coding for a 19-kilodalton peptidoglycan-associated protein (Ppl) of Legionella pneumophila.

    PubMed Central

    Ludwig, B; Schmid, A; Marre, R; Hacker, J

    1991-01-01

    A genomic library of Legionella pneumophila, the causative agent of Legionnaires disease in humans, was constructed in Escherichia coli K-12, and the recombinant clones were screened by immuno-colony blots with an antiserum raised against heat-killed L. pneumophila. Twenty-three clones coding for a Legionella-specific protein of 19 kDa were isolated. The 19-kDa protein, which represents an outer membrane protein, was found to be associated with the peptidoglycan layer both in L. pneumophila and in the recombinant E. coli clones. This was shown by electrophoresis and Western immunoblot analysis of bacterial cell membrane fractions with a monospecific polyclonal 19-kDa protein-specific antiserum. The protein was termed peptidoglycan-associated protein of L. pneumophila (Ppl). The corresponding genetic determinant, ppl, was subcloned on a 1.8-kb ClaI fragment. DNA sequence studies revealed that two open reading frames, pplA and pplB, coding for putative proteins of 18.9 and 16.8 kDa, respectively, were located on the ClaI fragment. Exonuclease III digestion studies confirmed that pplA is the gene coding for the peptidoglycan-associated 19-kDa protein of L. pneumophila. The amino acid sequence of PplA exhibits a high degree of homology to the sequences of the Pal lipoproteins of E. coli K-12 and Haemophilus influenzae. PMID:1855972

  3. Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome

    PubMed Central

    Philippe, Nicolas; Bou Samra, Elias; Boureux, Anthony; Mancheron, Alban; Rufflé, Florence; Bai, Qiang; De Vos, John; Rivals, Eric; Commes, Thérèse

    2014-01-01

    Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct. PMID:24357408

  4. Patient validation of cues and concerns identified according to Verona coding definitions of emotional sequences (VR-CoDES): a video- and interview-based approach.

    PubMed

    Eide, Hilde; Eide, Tom; Rustøen, Tone; Finset, Arnstein

    2011-02-01

    A challenging but main task for clinicians is to identify patients' concerns related to their medical conditions. The study aim was to validate a new coding scheme for identifying patients' cues and concerns. 12 videotaped consultations between nurses and pain patients were coded according to the Verona Coding Scheme for Emotional Sequences (VR-CoDES). During a metainterview each patient watched his/her own video interview with the researcher to confirm or disconfirm the identified cues and concerns. A directive or an open format was applied. Quantitative and qualitative data analyses were performed. Patients' confirmation in relation to the coding gave a sensitivity of 0.95 and specificity of 0.99 in the directive format and a sensitivity of 0.99 and specificity of 0.70 applying the open format. Through a qualitative analysis 83% of researcher-identified cues and concerns were validated. 17% were not confirmed or uncertain. The VR-CoDES seems to capture what are experienced as real concerns to patients, and proves to be a coding scheme with a high degree of ecological validity. The VR-CoDES provides a valid framework for detecting patients' cues and concerns, and should be explored as a training tool to develop clinicians' empathic accuracy. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
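
    The sensitivity and specificity figures in the record above come from comparing researcher coding with patient confirmation. As a minimal sketch of that arithmetic, the function below computes both from the four cells of a 2x2 table. The function name, the mapping of "confirmed by the patient" to a true positive, and the counts are illustrative assumptions, not data from the study.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from 2x2 confusion-table counts.

    tp: events coded as cue/concern and confirmed by the patient (assumed mapping)
    fn: events the patient reports as a concern but the coder missed
    tn: events neither coded nor reported as a concern
    fp: events coded as cue/concern but not confirmed
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of real concerns that were coded
    specificity = tn / (tn + fp)  # share of non-concerns correctly left uncoded
    return sensitivity, specificity


# Hypothetical counts, chosen only to illustrate the calculation.
sens, spec = sensitivity_specificity(tp=95, fn=5, tn=99, fp=1)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    With these made-up counts the script prints sensitivity=0.95, specificity=0.99.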

  5. Development of the Verona coding definitions of emotional sequences to code health providers' responses (VR-CoDES-P) to patient cues and concerns.

    PubMed

    Del Piccolo, Lidia; de Haes, Hanneke; Heaven, Cathy; Jansen, Jesse; Verheul, William; Bensing, Jozien; Bergvik, Svein; Deveugele, Myriam; Eide, Hilde; Fletcher, Ian; Goss, Claudia; Humphris, Gerry; Kim, Young-Mi; Langewitz, Wolf; Mazzi, Maria Angela; Mjaaland, Trond; Moretti, Francesca; Nübling, Matthias; Rimondini, Michela; Salmon, Peter; Sibbern, Tonje; Skre, Ingunn; van Dulmen, Sandra; Wissow, Larry; Young, Bridget; Zandbelt, Linda; Zimmermann, Christa; Finset, Arnstein

    2011-02-01

    To present a method to classify health provider responses to patient cues and concerns according to the VR-CoDES-CC (Del Piccolo et al. (2009) [2] and Zimmermann et al. (submitted for publication) [3]). The system permits sequence analysis and a detailed description of how providers handle patient's expressions of emotion. The Verona-CoDES-P system has been developed based on consensus views within the "Verona Network of Sequence Analysis". The different phases of the creation process are described in detail. A reliability study has been conducted on 20 interviews from a convenience sample of 104 psychiatric consultations. The VR-CoDES-P has two main classes of provider responses, corresponding to the degree of explicitness (yes/no) and space (yes/no) that is given by the health provider to each cue/concern expressed by the patient. The system can be further subdivided into 17 individual categories. Statistical analyses showed that the VR-CoDES-P is reliable (agreement 92.86%, Cohen's kappa 0.90 (±0.04) p<0.0001). Once validity and reliability are tested in different settings, the system should be applied to investigate the relationship between provider responses to patients' expression of emotions and outcome variables. Research employing the VR-CoDES-P should be applied to develop research-based approaches to maximize appropriate responses to patients' indirect and overt expressions of emotional needs. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
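
    The reliability result above pairs raw agreement with Cohen's kappa, which corrects agreement for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each rater's marginal label frequencies. The sketch below applies that formula to two lists of labels. The function name and the toy labels are assumptions for illustration and are not taken from the VR-CoDES-P study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codings of four provider responses.
a = ["explicit", "explicit", "non-explicit", "non-explicit"]
b = ["explicit", "explicit", "non-explicit", "explicit"]
print(cohens_kappa(a, b))  # observed 0.75, expected 0.50 -> kappa 0.5
```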

  6. Item Calibration in Incomplete Testing Designs

    ERIC Educational Resources Information Center

    Eggen, Theo J. H. M.; Verhelst, Norman D.

    2011-01-01

    This study discusses the justifiability of item parameter estimation in incomplete testing designs in item response theory. Marginal maximum likelihood (MML) as well as conditional maximum likelihood (CML) procedures are considered in three commonly used incomplete designs: random incomplete, multistage testing and targeted testing designs.…

  7. Semantic Borders and Incomplete Understanding.

    PubMed

    Silva-Filho, Waldomiro J; Dazzani, Maria Virgínia

    2016-03-01

    In this article, we explore a fundamental issue of Cultural Psychology, that is our "capacity to make meaning", by investigating a thesis from contemporary philosophical semantics, namely, that there is a decisive relationship between language and rationality. Many philosophers think that for a person to be described as a rational agent he must understand the semantic content and meaning of the words he uses to express his intentional mental states, e.g., his beliefs and thoughts. Our argument seeks to investigate the thesis developed by Tyler Burge, according to which our mastery or understanding of the semantic content of the terms which form our beliefs and thoughts is an "incomplete understanding". To do this, we discuss, on the one hand, the general lines of anti-individualism or semantic externalism and, on the other, criticisms of the Burgean notion of incomplete understanding - one radical and the other moderate. We defend our understanding that the content of our beliefs must be described in the light of the limits and natural contingencies of our cognitive capacities and the normative nature of our rationality. At heart, anti-individualism leads us to think about the fact that we are social creatures, living in contingent situations, with important, but limited, cognitive capacities, and that we receive the main, and most important, portion of our knowledge simply from what others tell us. Finally, we conclude that this discussion may contribute to the current debate about the notion of borders.

  8. Simulated data supporting inbreeding rate estimates from incomplete pedigrees

    USGS Publications Warehouse

    Miller, Mark P.

    2017-01-01

    This data release includes: (1) The data from simulations used to illustrate the behavior of inbreeding rate estimators. Estimating inbreeding rates is particularly difficult for natural populations because parentage information for many individuals may be incomplete. Our analyses illustrate the behavior of a newly-described inbreeding rate estimator that outperforms previously described approaches in the scientific literature. (2) Python source code ("analytical expressions", "computer simulations", and "empricial data set") that can be used to analyze these data.

  9. A bacterial genetic screen identifies functional coding sequences of the insect mariner transposable element Famar1 amplified from the genome of the earwig, Forficula auricularia.

    PubMed Central

    Barry, Elizabeth G; Witherspoon, David J; Lampe, David J

    2004-01-01

    Transposons of the mariner family are widespread in animal genomes and have apparently infected them by horizontal transfer. Most species carry only old defective copies of particular mariner transposons that have diverged greatly from their active horizontally transferred ancestor, while a few contain young, very similar, and active copies. We report here the use of a whole-genome screen in bacteria to isolate somewhat diverged Famar1 copies from the European earwig, Forficula auricularia, that encode functional transposases. Functional and nonfunctional coding sequences of Famar1 and nonfunctional copies of Ammar1 from the European honey bee, Apis mellifera, were sequenced to examine their molecular evolution. No selection for sequence conservation was detected in any clade of a tree derived from these sequences, not even on branches leading to functional copies. This agrees with the current model for mariner transposon evolution that expects neutral evolution within particular hosts, with selection for function occurring only upon horizontal transfer to a new host. Our results further suggest that mariners are not finely tuned genetic entities and that a greater amount of sequence diversification than had previously been appreciated can occur in functional copies in a single host lineage. Finally, this method of isolating active copies can be used to isolate other novel active transposons without resorting to reconstruction of ancestral sequences. PMID:15020471

  10. Sequence Diversity of the oprI Gene, Coding for Major Outer Membrane Lipoprotein I, among rRNA Group I Pseudomonads

    PubMed Central

    De Vos, Daniel; Bouton, Christiane; Sarniguet, Alain; De Vos, Paul; Vauterin, Marc; Cornelis, Pierre

    1998-01-01

    The sequence of oprI, the gene coding for the major outer membrane lipoprotein I, was determined by PCR sequencing for representatives of 17 species of rRNA group I pseudomonads, with a special emphasis on Pseudomonas aeruginosa and Pseudomonas fluorescens. Within the P. aeruginosa species, oprI sequences for 25 independent isolates were found to be identical, except for one silent substitution at position 96. The oprI sequences diverged more for the other rRNA group I pseudomonads (85 to 91% similarity with P. aeruginosa oprI). An accumulation of silent and also (but to a much lesser extent) nonsilent substitutions in the different sequences was found. A clustering according to the respective presence and/or positions of the HaeIII, PvuII, and SphI sites could also be obtained. A sequence cluster analysis showed a rather widespread distribution of P. fluorescens isolates. All other rRNA group I pseudomonads clustered in a manner that was in agreement with other studies, showing that the oprI gene can be useful as a complementary phylogenetic marker for classification of rRNA group I pseudomonads. PMID:9851998
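
    The record above groups sequences by the presence and positions of HaeIII, PvuII, and SphI sites. A minimal sketch of that kind of site profile follows, using the standard recognition sequences of the three enzymes (GGCC, CAGCTG, GCATGC). The function names and the toy sequence are illustrative assumptions; the sequence is not an oprI sequence.

```python
# Recognition sequences are palindromic, so scanning one strand is enough.
RECOGNITION_SITES = {"HaeIII": "GGCC", "PvuII": "CAGCTG", "SphI": "GCATGC"}


def site_positions(sequence, site):
    """Return 1-based start positions of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        # Step by one base so overlapping occurrences are not missed.
        start = sequence.find(site, start + 1)
    return positions


def site_profile(sequence):
    """Map each enzyme to the list of positions where its site occurs."""
    return {enzyme: site_positions(sequence, site)
            for enzyme, site in RECOGNITION_SITES.items()}


# Hypothetical sequence for illustration only.
toy = "ATGGCCAAACAGCTGTTTGGCCTAA"
print(site_profile(toy))
# {'HaeIII': [3, 19], 'PvuII': [10], 'SphI': []}
```

    Sequences with identical profiles would fall into the same group; comparing position lists rather than simple presence or absence is one possible design choice.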

  11. Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics’ GemCode Sequencing Data

    PubMed Central

    Coombe, Lauren; Jackman, Shaun D.; Yang, Chen; Vandervalk, Benjamin P.; Moore, Richard A.; Pleasance, Stephen; Coope, Robin J.; Bohlmann, Joerg; Holt, Robert A.; Jones, Steven J. M.; Birol, Inanc

    2016-01-01

    The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads, to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis- P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from that of the chloroplast, which resulted in the simplification of the de Bruijn graphs used at the various stages of assembly. PMID:27632164

  12. A computer program for estimation from incomplete multinomial data

    NASA Technical Reports Server (NTRS)

    Credeur, K. R.

    1978-01-01

    Coding is given for maximum likelihood and Bayesian estimation of the vector p of multinomial cell probabilities from incomplete data. Also included is coding to calculate and approximate elements of the posterior mean and covariance matrices. The program is written in FORTRAN 4 language for the Control Data CYBER 170 series digital computer system with network operating system (NOS) 1.1. The program requires approximately 44000 octal locations of core storage. A typical case requires from 72 seconds to 92 seconds on CYBER 175 depending on the value of the prior parameter.

  13. Cost-effective screening of DNMT3A coding sequence identifies somatic mutation in pediatric T-cell acute lymphoblastic leukemia.

    PubMed

    Szarzyńska-Zawadzka, Bronisława; Kosmalska, Maria; Sędek, Łukasz; Sonsala, Alicja; Twardoch, Magdalena; Kowalczyk, Jerzy R; Szczepański, Tomasz; Witt, Michał; Dawidowska, Małgorzata

    2017-09-14

    In pediatric T-cell acute lymphoblastic leukemia (T-ALL) risk assignment schemes preclude reliable prediction of outcome and thus new prognostic factors are needed. Mutations in DNMT3A are candidate prognostic and classification markers in adults with acute myeloid leukemia (AML) and T-ALL and thus were considered as candidate prognostic markers in pediatric T-ALL. DNMT3A mutational status was investigated in 74 pediatric T-ALL samples collected at diagnosis. We applied high resolution melt (HRM) analysis and Sanger sequencing to study the hotspot position (R882) within the catalytic MTase domain and exons coding for other functional domains of the protein, known to be mutated in the wide spectrum of hematological malignancies. We demonstrate a low frequency of mutations in DNMT3A coding sequence in pediatric T-ALL (1.4%, n=1/74). We identified a missense mutation, p.Ala644Thr, which has not been described previously in pediatric T-ALL, but is recurrent in adults with T-ALL and AML. The low frequency of DNMT3A mutations in pediatric T-ALL is in striking contrast to adult T-ALL and necessitates a search for other candidate prognostic markers. The combined Sanger sequencing-HRM approach offers a cost-effective option for genotyping DNMT3A coding sequence, with potential clinical application in other hematological malignancies. This article is protected by copyright. All rights reserved.

  14. Identification of a Coding Sequence and Structure Modeling of a Glycine-Rich RNA-Binding Protein (CmGRP1) from Chelidonium majus L.

    PubMed

    Nawrot, Robert; Tomaszewski, Lukasz; Czerwoniec, Anna; Goździcka-Józefiak, Anna

    2013-01-01

    The family of glycine-rich plant proteins (GRPs) is a large and complex group of proteins that share, as a common feature, the presence of glycine-rich domains arranged in (Gly)n-X repeats that are suggested to be involved in protein-protein interactions, RNA binding, and nucleolar targeting. These proteins are implicated in several independent physiological processes. Some are components of cell walls of many higher plants, while others are involved in molecular responses to environmental stress, and mediated by post-transcriptional regulatory mechanisms. The goals of this study are to identify the coding sequence of a novel glycine-rich RNA-binding protein from Chelidonium majus and to propose its structural model. DNA fragments obtained using degenerate PCR primers showed high sequence identities with glycine-rich RNA-binding protein coding sequences from different plant species. A 439-bp nucleotide sequence is identified coding for a novel polypeptide composed of 146 amino acids, designated as CmGRP1 (C. majus glycine-rich protein 1), with a calculated MW of 14,931 Da (NCBI GenBank accession no. HM173636). Using NCBI CDD and GeneSilico MetaServer, a single conserved domain, the RNA recognition motif (RRM), was detected in CmGRP1. The C-terminal region of CmGRP1 is a glycine-rich motif (GGGGxxGxGGGxxG), and it is predicted to be disordered. Based on a 1fxl crystal structure, a 3D model of CmGRP1 is proposed. CmGRP1 can be classified as a class IVa plant GRP, implicated to play a role in plant defense.

  15. Major breeding plumage color differences of male ruffs (Philomachus pugnax) are not associated with coding sequence variation in the MC1R gene.

    PubMed

    Farrell, Lindsay L; Küpper, Clemens; Burke, Terry; Lank, David B

    2015-01-01

    Sequence variation in the melanocortin-1 receptor (MC1R) gene explains color morph variation in several species of birds and mammals. Ruffs (Philomachus pugnax) exhibit major dark/light color differences in melanin-based male breeding plumage which is closely associated with alternative reproductive behavior. A previous study identified a microsatellite marker (Ppu020) near the MC1R locus associated with the presence/absence of ornamental plumage. We investigated whether coding sequence variation in the MC1R gene explains major dark/light plumage color variation and/or the presence/absence of ornamental plumage in ruffs. Among 821bp of the MC1R coding region from 44 male ruffs we found 3 single nucleotide polymorphisms, representing 1 nonsynonymous and 2 synonymous amino acid substitutions. None were associated with major dark/light color differences or the presence/absence of ornamental plumage. At all amino acid sites known to be functionally important in other avian species with dark/light plumage color variation, ruffs were either monomorphic or the shared polymorphism did not coincide with color morph. Neither ornamental plumage color differences nor the presence/absence of ornamental plumage in ruffs are likely to be caused entirely by amino acid variation within the coding regions of the MC1R locus. Regulatory elements and structural variation at other loci may be involved in melanin expression and contribute to the extreme plumage polymorphism observed in this species.
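
    The record above distinguishes synonymous from nonsynonymous single nucleotide polymorphisms in a coding region. The sketch below shows how that classification can be made with the standard genetic code: translate the codon before and after the base change and compare the amino acids. The function name and the example codons are illustrative assumptions and are not taken from the ruff MC1R data.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(codon, position_in_codon, alt_base):
    """Classify a single-base change as 'synonymous' or 'nonsynonymous'.

    position_in_codon is 1, 2, or 3; alt_base is the substituted base.
    """
    codon = codon.upper()
    alt_base = alt_base.upper()
    if len(codon) != 3 or position_in_codon not in (1, 2, 3):
        raise ValueError("expected a 3-base codon and a position of 1, 2, or 3")
    i = position_in_codon - 1
    mutated = codon[:i] + alt_base + codon[i + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "nonsynonymous"


# Hypothetical changes: CTG -> CTA keeps Leu; GTG -> ATG swaps Val for Met.
print(classify_snp("CTG", 3, "A"))  # synonymous
print(classify_snp("GTG", 1, "A"))  # nonsynonymous
```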

  16. Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks.

    PubMed

    Peloso, Gina M; Auer, Paul L; Bis, Joshua C; Voorman, Arend; Morrison, Alanna C; Stitziel, Nathan O; Brody, Jennifer A; Khetarpal, Sumeet A; Crosby, Jacy R; Fornage, Myriam; Isaacs, Aaron; Jakobsdottir, Johanna; Feitosa, Mary F; Davies, Gail; Huffman, Jennifer E; Manichaikul, Ani; Davis, Brian; Lohman, Kurt; Joon, Aron Y; Smith, Albert V; Grove, Megan L; Zanoni, Paolo; Redon, Valeska; Demissie, Serkalem; Lawson, Kim; Peters, Ulrike; Carlson, Christopher; Jackson, Rebecca D; Ryckman, Kelli K; Mackey, Rachel H; Robinson, Jennifer G; Siscovick, David S; Schreiner, Pamela J; Mychaleckyj, Josyf C; Pankow, James S; Hofman, Albert; Uitterlinden, Andre G; Harris, Tamara B; Taylor, Kent D; Stafford, Jeanette M; Reynolds, Lindsay M; Marioni, Riccardo E; Dehghan, Abbas; Franco, Oscar H; Patel, Aniruddh P; Lu, Yingchang; Hindy, George; Gottesman, Omri; Bottinger, Erwin P; Melander, Olle; Orho-Melander, Marju; Loos, Ruth J F; Duga, Stefano; Merlini, Piera Angelica; Farrall, Martin; Goel, Anuj; Asselta, Rosanna; Girelli, Domenico; Martinelli, Nicola; Shah, Svati H; Kraus, William E; Li, Mingyao; Rader, Daniel J; Reilly, Muredach P; McPherson, Ruth; Watkins, Hugh; Ardissino, Diego; Zhang, Qunyuan; Wang, Judy; Tsai, Michael Y; Taylor, Herman A; Correa, Adolfo; Griswold, Michael E; Lange, Leslie A; Starr, John M; Rudan, Igor; Eiriksdottir, Gudny; Launer, Lenore J; Ordovas, Jose M; Levy, Daniel; Chen, Y-D Ida; Reiner, Alexander P; Hayward, Caroline; Polasek, Ozren; Deary, Ian J; Borecki, Ingrid B; Liu, Yongmei; Gudnason, Vilmundur; Wilson, James G; van Duijn, Cornelia M; Kooperberg, Charles; Rich, Stephen S; Psaty, Bruce M; Rotter, Jerome I; O'Donnell, Christopher J; Rice, Kenneth; Boerwinkle, Eric; Kathiresan, Sekar; Cupples, L Adrienne

    2014-02-06

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncertain whether the PCSK9 example represents a paradigm or an isolated exception. We used the "Exome Array" to genotype >200,000 low-frequency and rare coding sequence variants across the genome in 56,538 individuals (42,208 European ancestry [EA] and 14,330 African ancestry [AA]) and tested these variants for association with LDL-C, high-density lipoprotein cholesterol (HDL-C), and triglycerides. Although we did not identify new genes associated with LDL-C, we did identify four low-frequency (frequencies between 0.1% and 2%) variants (ANGPTL8 rs145464906 [c.361C>T; p.Gln121*], PAFAH1B2 rs186808413 [c.482C>T; p.Ser161Leu], COL18A1 rs114139997 [c.331G>A; p.Gly111Arg], and PCSK7 rs142953140 [c.1511G>A; p.Arg504His]) with large effects on HDL-C and/or triglycerides. None of these four variants was associated with risk for CHD, suggesting that examples of low-frequency coding variants with robust effects on both lipids and CHD will be limited.
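
    Each variant in the record above is given as a coding-sequence change (c.) paired with a protein change (p.). Because coding positions are counted from the A of the start codon, position n falls in codon (n - 1) // 3 + 1, so the two descriptions can be checked against each other. The sketch below does that for the four variants quoted in the abstract. The regular expression and function names are illustrative assumptions that handle only this simple substitution pattern, not the full variant nomenclature.

```python
import re

# Matches simple pairs such as "c.361C>T; p.Gln121*" (assumed pattern).
VARIANT_PAIR = re.compile(r"c\.(\d+)[ACGT]>[ACGT]; p\.[A-Za-z]{3}(\d+)")


def codon_number(cds_position):
    """Codon that contains a 1-based coding-sequence position."""
    return (cds_position - 1) // 3 + 1


def position_in_codon(cds_position):
    """Position (1, 2, or 3) of the base within its codon."""
    return (cds_position - 1) % 3 + 1


def is_consistent(description):
    """True if the c. position falls in the codon named by the p. change."""
    match = VARIANT_PAIR.search(description)
    if match is None:
        raise ValueError(f"unrecognized variant description: {description!r}")
    cds_position, residue = int(match.group(1)), int(match.group(2))
    return codon_number(cds_position) == residue


variants = [
    "c.361C>T; p.Gln121*",
    "c.482C>T; p.Ser161Leu",
    "c.331G>A; p.Gly111Arg",
    "c.1511G>A; p.Arg504His",
]
for v in variants:
    print(v, is_consistent(v))
```

    All four pairs pass the check: c.361 and c.331 are first bases of codons 121 and 111, and c.482 and c.1511 are second bases of codons 161 and 504.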

  17. Next-gen sequencing identifies non-coding variation disrupting miRNA binding sites in neurological disorders

    PubMed Central

    Devanna, Paolo; Chen, Xiaowei Sylvia; Ho, Joses; Gajewski, Dario; Smith, Shelley D.; Gialluisi, Alessandro; Francks, Clyde; Fisher, Simon E.; Newbury, Dianne; Vernes, Sonja C.

    2017-01-01

    Understanding the genetic factors underlying neurodevelopmental and neuropsychiatric disorders is a major challenge given their prevalence and potential severity for quality of life. While large scale genomic screens have made major advances in this area, for many disorders the genetic underpinnings are complex and poorly understood. To date the field has focused predominantly on protein coding variation, but given the importance of tightly controlled gene expression for normal brain development and disorder, variation that affects non-coding regulatory regions of the genome is likely to play an important role in these phenotypes. Herein we show the importance of 3’UTR non-coding regulatory variants across neurodevelopmental and neuropsychiatric disorders. We devised a pipeline for identifying and functionally validating putatively pathogenic variants from NGS data. We applied this pipeline to a cohort of children with severe specific language impairment (SLI) and identified a functional, SLI-associated variant affecting gene regulation in cells and post-mortem human brain. This variant, and the affected gene (ARHGEF39), represent new putative risk factors for SLI. Furthermore, we identified 3’UTR regulatory variants across autism, schizophrenia and bipolar disorder NGS cohorts demonstrating their impact on neurodevelopmental and neuropsychiatric disorders. Our findings show the importance of investigating non-coding regulatory variants when determining risk factors contributing to neurodevelopmental and neuropsychiatric disorders. In the future, integration of such regulatory variation with protein coding changes will be essential for uncovering the genetic causes of complex neurological disorders and the fundamental mechanisms underlying health and disease. PMID:28289279

  18. Performance analysis of hybrid ARQ protocols in a slotted direct-sequence code-division multiple-access network - Jamming analysis

    NASA Astrophysics Data System (ADS)

    Hanratty, Joseph M.; Stuber, Gordon L.

    1990-05-01

    An examination is made of the performance of type-I hybrid ARQ (automatic repeat request) protocols in a slotted direct-sequence CDMA (code-division multiple access) network operating in a hostile jamming environment. The network consists of an arbitrary number of transceivers arranged in a paired-off topology. The traffic arrival process is derived by means of a Markov model. Throughput-delay expressions are derived in terms of the channel cutoff rate and capacity. The effects of jammer state information are discussed. Network design parameters are identified and their dependency on system parameters is examined in detail. It is shown that, for a given population size, traffic intensity, and bit energy/jammer noise ratio, there is an optimal probability of retransmission, code rate, and processing gain that maximizes network performance in the presence of worst-case pulse jamming.

  19. An ultraconserved Hox-Pbx responsive element resides in the coding sequence of Hoxa2 and is active in rhombomere 4.

    PubMed

    Lampe, Xavier; Samad, Omar Abdel; Guiguen, Allan; Matis, Christelle; Remacle, Sophie; Picard, Jacques J; Rijli, Filippo M; Rezsohazy, René

    2008-06-01

    The Hoxa2 gene has a fundamental role in vertebrate craniofacial and hindbrain patterning. Segmental control of Hoxa2 expression is crucial to its function and several studies have highlighted transcriptional regulatory elements governing its activity in distinct rhombomeres. Here, we identify a putative Hox-Pbx responsive cis-regulatory sequence, which resides in the coding sequence of Hoxa2 and is an important component of Hoxa2 regulation in rhombomere (r) 4. By using cell transfection and chromatin immunoprecipitation (ChIP) assays, we show that this regulatory sequence is responsive to paralogue group 1 and 2 Hox proteins and to their Pbx co-factors. Importantly, we also show that the Hox-Pbx element cooperates with a previously reported Hoxa2 r4 intronic enhancer and that its integrity is required to drive specific reporter gene expression in r4 upon electroporation in the chick embryo hindbrain. Thus, both intronic as well as exonic regulatory sequences are involved in Hoxa2 segmental regulation in the developing r4. Finally, we found that the Hox-Pbx exonic element is embedded in a larger 205-bp long ultraconserved genomic element (UCE) shared by all vertebrate genomes. In this respect, our data further support the idea that extreme conservation of UCE sequences may be the result of multiple superposed functional and evolutionary constraints.

  20. An ultraconserved Hox–Pbx responsive element resides in the coding sequence of Hoxa2 and is active in rhombomere 4

    PubMed Central

    Lampe, Xavier; Samad, Omar Abdel; Guiguen, Allan; Matis, Christelle; Remacle, Sophie; Picard, Jacques J.; Rezsohazy, René

    2008-01-01

    The Hoxa2 gene has a fundamental role in vertebrate craniofacial and hindbrain patterning. Segmental control of Hoxa2 expression is crucial to its function and several studies have highlighted transcriptional regulatory elements governing its activity in distinct rhombomeres. Here, we identify a putative Hox–Pbx responsive cis-regulatory sequence, which resides in the coding sequence of Hoxa2 and is an important component of Hoxa2 regulation in rhombomere (r) 4. By using cell transfection and chromatin immunoprecipitation (ChIP) assays, we show that this regulatory sequence is responsive to paralogue group 1 and 2 Hox proteins and to their Pbx co-factors. Importantly, we also show that the Hox–Pbx element cooperates with a previously reported Hoxa2 r4 intronic enhancer and that its integrity is required to drive specific reporter gene expression in r4 upon electroporation in the chick embryo hindbrain. Thus, both intronic as well as exonic regulatory sequences are involved in Hoxa2 segmental regulation in the developing r4. Finally, we found that the Hox–Pbx exonic element is embedded in a larger 205-bp long ultraconserved genomic element (UCE) shared by all vertebrate genomes. In this respect, our data further support the idea that extreme conservation of UCE sequences may be the result of multiple superposed functional and evolutionary constraints. PMID:18417536

  1. A Sabin 3-Derived Poliovirus Recombinant Contained a Sequence Homologous with Indigenous Human Enterovirus Species C in the Viral Polymerase Coding Region†

    PubMed Central

    Arita, Minetaro; Zhu, Shuang-Li; Yoshida, Hiromu; Yoneyama, Tetsuo; Miyamura, Tatsuo; Shimizu, Hiroyuki

    2005-01-01

    Outbreaks of poliomyelitis caused by circulating vaccine-derived polioviruses (cVDPVs) have been reported in areas where indigenous wild polioviruses (PVs) were eliminated by vaccination. Most of these cVDPVs contained unidentified sequences in the nonstructural protein coding region which were considered to be derived from human enterovirus species C (HEV-C) by recombination. In this study, we report isolation of a Sabin 3-derived PV recombinant (Cambodia-02) from an acute flaccid paralysis (AFP) case in Cambodia in 2002. We attempted to identify the putative recombination counterpart of Cambodia-02 by sequence analysis of nonpolio enterovirus isolates from AFP cases in Cambodia from 1999 to 2003. Based on the previously estimated evolution rates of PVs, the recombination event resulting in Cambodia-02 was estimated to have occurred within 6 months after the administration of oral PV vaccine (99.3% nucleotide identity in VP1 region). The 2BC and the 3Dpol coding regions of Cambodia-02 were grouped into the genetic cluster of indigenous coxsackie A virus type 17 (CAV17) (the highest [87.1%] nucleotide identity) and the cluster of indigenous CAV13-CAV18 (the highest [94.9%] nucleotide identity) by the phylogenic analysis of the HEV-C isolates in 2002, respectively. CAV13-CAV18 and CAV17 were the dominant HEV-C serotypes in 2002 but not in 2001 and in 2003. We found a putative recombination between CAV13-CAV18 and CAV17 in the 3CDpro coding region of a CAV17 isolate. These results suggested that a part of the 3Dpol coding region of PV3(Cambodia-02) was derived from a HEV-C strain genetically related to indigenous CAV13-CAV18 strains in 2002 in Cambodia. PMID:16188967

  2. Computational performance of SequenceL coding of the lattice Boltzmann method for multi-particle flow simulations

    NASA Astrophysics Data System (ADS)

    Başağaoğlu, Hakan; Blount, Justin; Blount, Jarred; Nelson, Bryant; Succi, Sauro; Westhart, Phil M.; Harwell, John R.

    2017-04-01

    This paper reports, for the first time, the computational performance of SequenceL for mesoscale simulations of large numbers of particles in a microfluidic device via the lattice-Boltzmann method. The performance of SequenceL simulations was assessed against the optimized serial and parallelized (via OpenMP directives) FORTRAN90 simulations. At present, OpenMP directives were not included in inter-particle and particle-wall repulsive (steric) interaction calculations due to difficulties that arose from inter-iteration dependencies between consecutive iterations of the do-loops. SequenceL simulations, on the other hand, relied on built-in automatic parallelism. Under these conditions, numerical simulations revealed that the parallelized FORTRAN90 outran the performance of SequenceL by a factor of 2.5 or more when the number of particles was 100 or less. SequenceL, however, outran the performance of the parallelized FORTRAN90 by a factor of 1.3 when the number of particles was 300. Our results show that when the number of particles increased by 30-fold, the computational time of SequenceL simulations increased linearly by a factor of 1.5, as compared to a 3.2-fold increase in serial and a 7.7-fold increase in parallelized FORTRAN90 simulations. Considering SequenceL's efficient built-in parallelism that led to a relatively small increase in computational time with increased number of particles, it could be a promising programming language for computationally-efficient mesoscale simulations of large numbers of particles in microfluidic experiments.

  3. Allowing for missing outcome data and incomplete uptake of randomised interventions, with application to an Internet-based alcohol trial.

    PubMed

    White, Ian R; Kalaitzaki, Eleftheria; Thompson, Simon G

    2011-11-30

    Missing outcome data and incomplete uptake of randomised interventions are common problems, which complicate the analysis and interpretation of randomised controlled trials, and are rarely addressed well in practice. To promote the implementation of recent methodological developments, we describe sequences of randomisation-based analyses that can be used to explore both issues. We illustrate these in an Internet-based trial evaluating the use of a new interactive website for those seeking help to reduce their alcohol consumption, in which the primary outcome was available for less than half of the participants and uptake of the intervention was limited. For missing outcome data, we first employ data on intermediate outcomes and intervention use to make a missing at random assumption more plausible, with analyses based on general estimating equations, mixed models and multiple imputation. We then use data on the ease of obtaining outcome data and sensitivity analyses to explore departures from the missing at random assumption. For incomplete uptake of randomised interventions, we estimate structural mean models by using instrumental variable methods. In the alcohol trial, there is no evidence of benefit unless rather extreme assumptions are made about the missing data nor an important benefit in more extensive users of the intervention. These findings considerably aid the interpretation of the trial's results. More generally, the analyses proposed are applicable to many trials with missing outcome data or incomplete intervention uptake. To facilitate use by others, Stata code is provided for all methods.

  4. Diversity of coding sequences and gene structures of the antifungal peptide mytimycin (MytM) from the Mediterranean mussel, Mytilus galloprovincialis.

    PubMed

    Sonthi, Molruedee; Toubiana, Mylène; Pallavicini, Alberto; Venier, Paola; Roch, Philippe

    2011-10-01

    Knowledge on antifungal biomolecules is limited compared to antibacterial peptides. A strictly antifungal peptide from the blue mussel, Mytilus edulis, named mytimycin (MytM), was reported in 1996 as a partial NH2-terminal 33-amino-acid sequence. Using back-translations of the previous sequence, MytM-related nucleotide sequences were identified from a normalized Mytilus galloprovincialis expressed sequence tag library. Primers designed from a consensus sequence have been used to obtain a fragment of 560 nucleotides, including the complete coding sequence of 456 nucleotides. The precursor consists of a signal peptide of 23 amino acids, followed by MytM of 54 amino acids (6.2-6.3 kDa, 12 cysteines) and a C-terminal extension of 75 amino acids. Only two major amino acid precursor sequences emerged, one shared by M. galloprovincialis from Venice and Vigo, the other belonging to M. galloprovincialis from Palavas, with nine amino acid differences between the two MytM. Predicted disulfide bonds suggested the presence of two constrained domains joined by the amino acid track NIFG. Intriguingly, a conserved canonical EF-hand motif was present in the C-terminal extension of the precursor. The MytM gene was found to be interrupted by two introns. Intron 2 existed in two forms, a long one (1,112 nucleotides) and a short one (716 nucleotides), the latter resulting from the removal of the central part of the long one. Both the short (GenBank FJ804479) and the long (GenBank FJ804478) genes are simultaneously present in the mussel genome.
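    The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below only uses the numbers given above; the variable names are illustrative, and the abstract does not say whether the 456-nucleotide figure counts a stop codon, so the comparison is a consistency check rather than a statement about the actual sequence.

```python
# Lengths taken from the abstract (amino acids).
SIGNAL_PEPTIDE_AA = 23
MYTM_AA = 54
C_TERMINAL_EXTENSION_AA = 75

# Three nucleotides encode one amino acid.
NT_PER_CODON = 3

precursor_aa = SIGNAL_PEPTIDE_AA + MYTM_AA + C_TERMINAL_EXTENSION_AA
coding_nt = precursor_aa * NT_PER_CODON

# Intron 2 lengths from the abstract (nucleotides).
INTRON2_LONG_NT = 1112
INTRON2_SHORT_NT = 716

# Size of the central part missing from the short form.
removed_nt = INTRON2_LONG_NT - INTRON2_SHORT_NT

print(f"precursor: {precursor_aa} aa, coding: {coding_nt} nt")
print(f"central part removed from intron 2: {removed_nt} nt")
```

    Under these figures the three precursor segments account for the full 456-nucleotide coding sequence reported, and the two intron 2 forms differ by 396 nucleotides.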

  5. Identification of an androgen-repressed mRNA in rat ventral prostate as coding for sulphated glycoprotein 2 by cDNA cloning and sequence analysis.

    PubMed Central

    Bettuzzi, S; Hiipakka, R A; Gilna, P; Liao, S T

    1989-01-01

    The concentrations of a small number of mRNAs in the rat ventral prostate increase after castration and then decrease upon androgen treatment. Since the repression of specific gene expression may be important in the regulation of organ growth, we have cloned a cDNA for an androgen-repressed mRNA, the concentration of which increased 17-fold 4 days after castration, and this increase was reversed rapidly by androgen treatment. By sequence analysis the androgen-repressed mRNA was identified as that coding for sulphated glycoprotein 2. PMID:2920020

  6. DNA codes

    SciTech Connect

    Torney, D. C.

    2001-01-01

    We have begun to characterize a variety of codes, motivated by potential implementation as (quaternary) DNA n-sequences, with letters denoted A, C, G, and T. The first codes we studied are the most reminiscent of conventional group codes. For these codes, Hamming similarity was generalized so that the score for matched letters takes more than one value, depending upon which letters are matched [2]. These codes consist of n-sequences satisfying an upper bound on the similarities, summed over the letter positions, of distinct codewords. We chose similarity 2 for matches of letters A and T and 3 for matches of the letters C and G, providing a rough approximation to double-strand bond energies in DNA. An inherent novelty of DNA codes is 'reverse complementation'. The latter may be defined, as follows, not only for alphabets of size four, but, more generally, for any even-size alphabet. All that is required is a matching of the letters of the alphabet: a partition into pairs. Then, the reverse complement of a codeword is obtained by reversing the order of its letters and replacing each letter by its match. For DNA, the matching is AT/CG because these are the Watson-Crick bonding pairs. Reversal arises because two DNA sequences form a double strand with opposite relative orientations. Thus, as will be described in detail, because in vitro decoding involves the formation of double-stranded DNA from two codewords, it is reasonable to assume - for universal applicability - that the reverse complement of any codeword is also a codeword. In particular, self-reverse complementary codewords are expressly forbidden in reverse-complement codes. Thus, an appropriate distance between all pairs of codewords must, when large, effectively prohibit binding between the respective codewords: to form a double strand. Only reverse-complement pairs of codewords should be able to bind. For most applications, a DNA code is to be bi-partitioned, such that the reverse-complementary pairs are separated
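    The two operations described in this abstract, reverse complementation and the weighted Hamming similarity, can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and it assumes that "matched letters" means identical letters at the same position in two codewords of equal length.

```python
# Watson-Crick matching of the alphabet (a partition into pairs): A-T and C-G.
MATCH = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Scores from the abstract: 2 for a matched A or T, 3 for a matched C or G.
SCORE = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(word: str) -> str:
    """Reverse the order of the letters and replace each letter by its match."""
    return "".join(MATCH[letter] for letter in reversed(word))


def similarity(x: str, y: str) -> int:
    """Sum the per-letter scores over positions where x and y carry the same letter.

    Assumption: this is one reading of the generalized Hamming similarity
    described in the abstract.
    """
    if len(x) != len(y):
        raise ValueError("codewords must have equal length")
    return sum(SCORE[a] for a, b in zip(x, y) if a == b)


def is_self_reverse_complementary(word: str) -> bool:
    """True when a codeword equals its own reverse complement."""
    return word == reverse_complement(word)


if __name__ == "__main__":
    print(reverse_complement("AACG"))             # CGTT
    print(similarity("ACGT", "ACGA"))             # 2 + 3 + 3 = 8
    print(is_self_reverse_complementary("ACGT"))  # True
```

    A codeword such as ACGT, which is its own reverse complement, is the kind the abstract says is excluded from reverse-complement codes.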

  7. Nucleotide sequence of the GDH gene coding for the NADP-specific glutamate dehydrogenase of Saccharomyces cerevisiae.

    PubMed

    Nagasu, T; Hall, B D

    1985-01-01

    The isolation of the Saccharomyces cerevisiae gene for NADP-dependent glutamate dehydrogenase (NADP-GDH) by cross hybridization to the Neurospora crassa am gene, known to encode NADP-GDH, is described. Two DNA fragments selected from a yeast genomic library in phage lambda gt11 were shown by restriction analysis to share 2.5 kb of common sequence. A yeast shuttle vector (CV13) carrying either of the cloned fragments complements the gdh- strain of S. cerevisiae and directs substantial overproduction of NADP-GDH. One of the cloned fragments was sequenced, and the deduced amino acid (aa) sequence of the yeast NADP-GDH is 64% homologous to N. crassa, 51% to Escherichia coli and 24% to bovine NADP-GDHs.

  8. Performance Analysis of Direct-Sequence Code-Division Multiple-Access Communications with Asymmetric Quadrature Phase-Shift-Keying Modulation

    NASA Technical Reports Server (NTRS)

    Wang, C.-W.; Stark, W.

    2005-01-01

    This article considers a quaternary direct-sequence code-division multiple-access (DS-CDMA) communication system with asymmetric quadrature phase-shift-keying (AQPSK) modulation for unequal error protection (UEP) capability. Both time synchronous and asynchronous cases are investigated. An expression for the probability distribution of the multiple-access interference is derived. The exact bit-error performance and the approximate performance using a Gaussian approximation and random signature sequences are evaluated by extending the techniques used for uniform quadrature phase-shift-keying (QPSK) and binary phase-shift-keying (BPSK) DS-CDMA systems. Finally, a general system model with unequal user power and the near-far problem is considered and analyzed. The results show that, for a system with UEP capability, the less protected data bits are more sensitive to the near-far effect that occurs in a multiple-access environment than are the more protected bits.

  9. Indirect Reciprocity under Incomplete Observation

    PubMed Central

    Nakamura, Mitsuhiro; Masuda, Naoki

    2011-01-01

    Indirect reciprocity, in which individuals help others with a good reputation but not those with a bad reputation, is a mechanism for cooperation in social dilemma situations when individuals do not repeatedly interact with the same partners. In a relatively large society where indirect reciprocity is relevant, individuals may not know each other's reputation even indirectly. Previous studies investigated the situations where individuals playing the game have to determine the action possibly without knowing others' reputations. Nevertheless, the possibility that observers of the game, who generate the reputation of the interacting players, assign reputations without complete information about them has been neglected. Because an individual acts as an interacting player and as an observer on different occasions if indirect reciprocity is endogenously sustained in a society, the incompleteness of information may affect either role. We examine the game of indirect reciprocity when the reputations of players are not necessarily known to observers and to interacting players. We find that the trustful discriminator, which cooperates with good and unknown players and defects against bad players, realizes cooperative societies under seven social norms. Among the seven social norms, three of the four suspicious norms under which cooperation (defection) to unknown players leads to a good (bad) reputation enable cooperation down to a relatively small observation probability. In contrast, the three trustful norms under which both cooperation and defection to unknown players lead to a good reputation are relatively efficient. PMID:21829335
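    The trustful discriminator described in this abstract is a simple decision rule, sketched below in Python. The names `Reputation`, `perceived` and `trustful_discriminator` are illustrative and not taken from the paper; the sketch covers only the action rule and the effect of a missed observation, not the social norms that assign reputations.

```python
from enum import Enum


class Reputation(Enum):
    GOOD = "good"
    BAD = "bad"
    UNKNOWN = "unknown"  # the recipient's reputation was not observed


def perceived(true_reputation: Reputation, observed: bool) -> Reputation:
    """Reputation as seen by a player: the true value if observed, else UNKNOWN."""
    return true_reputation if observed else Reputation.UNKNOWN


def trustful_discriminator(recipient: Reputation) -> str:
    """Cooperate ('C') with good and unknown recipients, defect ('D') against bad ones."""
    return "D" if recipient is Reputation.BAD else "C"


if __name__ == "__main__":
    for reputation in Reputation:
        print(reputation.value, "->", trustful_discriminator(reputation))
```

    Under this rule a bad player whose reputation goes unobserved still receives cooperation, which is what makes the strategy trustful.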

  10. Incompletely compacted equilibrated ordinary chondrites

    SciTech Connect

    Sasso, M.R.; Macke, R.J.; Boesenberg, J.S.; Britt, D.T.; Rovers, M.L.; Ebel, D.S.; Friedrich, J.M.

    2010-01-22

    We document the size distributions and locations of voids present within five highly porous equilibrated ordinary chondrites using high-resolution synchrotron X-ray microtomography (μCT) and helium pycnometry. We found total porosities ranging from ~10 to 20% within these chondrites, and with μCT we show that up to 64% of the void space is located within intergranular voids within the rock. Given the low (S1-S2) shock stages of the samples and the large voids between mineral grains, we conclude that these samples experienced unusually low amounts of compaction and shock loading throughout their entire post accretionary history. With Fe metal and FeS metal abundances and grain size distributions, we show that these chondrites formed naturally with greater than average porosities prior to parent body metamorphism. These materials were not 'fluffed' on their parent body by impact-related regolith gardening or events caused by seismic vibrations. Samples of all three chemical types of ordinary chondrites (LL, L, H) are represented in this study and we conclude that incomplete compaction is common within the asteroid belt.

  11. [Cloning of full-length coding sequence of tree shrew CD4 and prediction of its molecular characteristics].

    PubMed

    Tian, Wei-Wei; Gao, Yue-Dong; Guo, Yan; Huang, Jing-Fei; Xiao, Chang; Li, Zuo-Sheng; Zhang, Hua-Tang

    2012-02-01

    Tree shrews, an ideal animal model receiving extensive attention in human disease research, require essential research tools, in particular cellular markers and monoclonal antibodies for immunological studies. In this paper, a 1,365-bp full-length CD4 cDNA coding sequence was cloned from total RNA in peripheral blood of tree shrews. The sequence fills two unknown fragment gaps in the predicted tree shrew CD4 cDNA in the GenBank database, and its molecular characteristics were analyzed in comparison with other mammals by using biology software such as Clustal W2.0. The results showed that the extracellular and intracellular domains of the tree shrew CD4 amino acid sequence are conserved. The tree shrew CD4 amino acid sequence showed a close genetic relationship with Homo sapiens and Macaca mulatta. Most regions of the tree shrew CD4 molecule surface showed positive charges, as in humans. However, compared with the human CD4 extracellular domain D1, the tree shrew CD4 D1 surface showed more negative charges and two more N-glycosylation sites, which may affect antibody binding. This study provides a theoretical basis for the preparation and functional studies of CD4 monoclonal antibody.

  12. Cloning, nucleotide sequence, and overexpression of the gene coding for delta 5-3-ketosteroid isomerase from Pseudomonas putida biotype B.

    PubMed

    Kim, S W; Kim, C Y; Benisek, W F; Choi, K Y

    1994-11-01

    The structural gene coding for the delta 5-3-ketosteroid isomerase (KSI) of Pseudomonas putida biotype B has been cloned, and its entire nucleotide sequence has been determined by a dideoxynucleotide chain termination method. A 2.1-kb DNA fragment containing the ksi gene was cloned from a P. putida biotype B genomic library in lambda gt11. The open reading frame of ksi encodes 393 nucleotides, and the amino acid sequence deduced from the nucleotide sequence agrees with the directly determined amino acid sequence (K. Linden and W. F. Benisek, J. Biol. Chem. 261:6454-6460, 1986). A putative purine-rich ribosome binding site was found 8 bp upstream of the ATG start codon. Escherichia coli BL21(DE3) transformed with the pKK-KSI plasmid containing the ksi gene expressed a high level of isomerase activity when induced by isopropyl-beta-D-thiogalactopyranoside. KSI was purified to homogeneity by a simple and rapid procedure utilizing fractional precipitation and an affinity column of deoxycholate-ethylenediamine-agarose as a major chromatographic step. The molecular weight of KSI was 14,535 (calculated, 14,536) as determined by electrospray mass spectrometry. The purified KSI showed a specific activity (39,807 μmol min⁻¹ mg⁻¹) and a Km (60 μM) which are close to those of KSI originally obtained from P. putida biotype B.

  13. Cloning, nucleotide sequence, and overexpression of the gene coding for delta 5-3-ketosteroid isomerase from Pseudomonas putida biotype B.

    PubMed Central

    Kim, S W; Kim, C Y; Benisek, W F; Choi, K Y

    1994-01-01

    The structural gene coding for the delta 5-3-ketosteroid isomerase (KSI) of Pseudomonas putida biotype B has been cloned, and its entire nucleotide sequence has been determined by a dideoxynucleotide chain termination method. A 2.1-kb DNA fragment containing the ksi gene was cloned from a P. putida biotype B genomic library in lambda gt11. The open reading frame of ksi encodes 393 nucleotides, and the amino acid sequence deduced from the nucleotide sequence agrees with the directly determined amino acid sequence (K. Linden and W. F. Benisek, J. Biol. Chem. 261:6454-6460, 1986). A putative purine-rich ribosome binding site was found 8 bp upstream of the ATG start codon. Escherichia coli BL21(DE3) transformed with the pKK-KSI plasmid containing the ksi gene expressed a high level of isomerase activity when induced by isopropyl-beta-D-thiogalactopyranoside. KSI was purified to homogeneity by a simple and rapid procedure utilizing fractional precipitation and an affinity column of deoxycholate-ethylenediamine-agarose as a major chromatographic step. The molecular weight of KSI was 14,535 (calculated, 14,536) as determined by electrospray mass spectrometry. The purified KSI showed a specific activity (39,807 μmol min⁻¹ mg⁻¹) and a Km (60 μM) which are close to those of KSI originally obtained from P. putida biotype B. PMID:7961420

  14. Isolation and sequencing of cDNA clones coding for the catalytic unit of glucose-6-phosphatase from two haplochromine cichlid fishes.

    PubMed

    Nagl, S; Mayer, W E; Klein, J

    1999-01-01

    Complementary DNA clones coding for the catalytic unit of the enzyme glucose-6-phosphatase (G6Pase) were obtained from Haplochromis nubilus and Haplochromis xenognathus, two cichlid fish species from Lake Victoria. The translated sequence of these two cDNAs identifies a polypeptide consisting of 352 amino acid residues and showing a 54.4% similarity to the human form of G6Pase. The amino acid sequences of the two fish species are identical. The comparison of the fish amino acid sequence with the corresponding sequences of rat, mouse, and human G6Pase revealed that the amino acid residues, which are involved in G6Pase catalysis in humans, are also conserved in fish G6Pase. Northern blot analysis showed that G6Pase is expressed at the same level in 6- and 10-day-old fish. A three base pair insertion/deletion polymorphism was found in the 3'-untranslated region of the fish G6Pase gene. The polymorphism will be a useful marker in a phylogenetic study of Lake Victoria cichlids.

  15. Cloning, nucleotide sequence, mutagenesis, and mapping of the Bacillus subtilis pbpD gene, which codes for penicillin-binding protein 4.

    PubMed Central

    Popham, D L; Setlow, P

    1994-01-01

    The gene encoding penicillin-binding protein 4 (PBP 4) of Bacillus subtilis, pbpD, was cloned by two independent methods. PBP 4 was purified, and the amino acid sequence of a cyanogen bromide digestion product was used to design an oligonucleotide probe for identification of the gene. An oligonucleotide probe designed to hybridize to genes encoding class A high-molecular-weight PBPs also identified this gene. DNA sequence analysis of the cloned DNA revealed that (i) the amino acid sequence of PBP 4 was similar to those of other class A high-molecular-weight PBPs and (ii) pbpD appeared to be cotranscribed with a downstream gene (termed orf2) of unknown function. The orf2 gene is followed by an apparent non-protein-coding region which exhibits nucleotide sequence similarity with at least two other regions of the chromosome and which has a high potential for secondary structure formation. Mutations in pbpD resulted in the disappearance of PBP 4 but had no obvious effect on growth, cell division, sporulation, spore heat resistance, or spore germination. Expression of a transcriptional fusion of pbpD to lacZ increased throughout growth, decreased during sporulation, and was induced approximately 45 min into spore germination. A single transcription start site was detected just upstream of pbpD. The pbpD locus was mapped to the 275 to 280 degrees region of the chromosomal genetic map. PMID:7961491

  16. Automation of a primer design and evaluation pipeline for subsequent sequencing of the coding regions of all human Refseq genes

    PubMed Central

    Lai, Daniel; Love, Donald R

    2012-01-01

    Screening for mutations in human disease-causing genes in a molecular diagnostic environment demands simplicity with a view to allowing high throughput approaches. In order to advance these requirements, we have developed and applied a primer design program, termed BatchPD, to achieve the PCR amplification of coding exons of all known human Refseq genes. Primer design, in silico PCR checks and formatted primer information for subsequent web-based interrogation are queried from existing online tools. BatchPD acts as an intermediate to automate queries and results processing and provides exon-specific information that is summarised in a spreadsheet format. PMID:22570517

  17. Development of a universal RT-PCR for amplifying and sequencing the leader and capsid-coding region of foot-and-mouth disease virus.

    PubMed

    Xu, Lizhe; Hurtle, William; Rowland, Jessica M; Casteran, Karissa A; Bucko, Stacey M; Grau, Fred R; Valdazo-González, Begoña; Knowles, Nick J; King, Donald P; Beckham, Tammy R; McIntosh, Michael T

    2013-04-01

    Foot-and-mouth disease (FMD) is a highly infectious viral disease of cloven-hoofed animals with debilitating and devastating consequences for livestock industries throughout the world. Key antigenic determinants of the causative agent, FMD virus (FMDV), reside within the surface-exposed proteins of the viral capsid. Therefore, characterization of the sequence that encodes the capsid (P1) is important for tracking the emergence or spread of FMD and for selection and development of new vaccines. Reliable methods to generate sequence for this region are challenging due to the high inter-serotypic variability between different strains of FMDV. This study describes the development and optimization of a novel, robust and universal RT-PCR method that may be used to amplify and sequence a 3-kilobase (kb) fragment encompassing the leader proteinase (L) and capsid-coding portions (P1) of the FMDV genome. This new RT-PCR method was evaluated in two laboratories using RNA extracted from 134 clinical samples collected from different countries and representing a range of topotypes and lineages within each of the seven FMDV serotypes. Sequence analysis assisted in the reiterative design of primers that are suitable for routine sequencing of these RT-PCR fragments. Using this method, sequence analysis was undertaken for 49 FMD viruses collected from outbreaks in the field. This approach provides a robust tool that can be used for rapid antigenic characterization of FMDV and phylogenetic analyses and has utility for inclusion in laboratory response programs as an aid to vaccine matching or selection in the event of FMD outbreaks.

  18. Variation in seed fatty acid composition and sequence divergence in the FAD2 gene coding region between wild and cultivated sesame.

    PubMed

    Chen, Zhenbang; Tonnis, Brandon; Morris, Brad; Wang, Richard B; Zhang, Amy L; Pinnow, David; Wang, Ming Li

    2014-12-03

    Sesame germplasm harbors genetic diversity which can be useful for sesame improvement in breeding programs. Seven accessions with different levels of oleic acid were selected from the entire USDA sesame germplasm collection (1232 accessions) and planted for morphological observation and re-examination of fatty acid composition. The coding region of the FAD2 gene for fatty acid desaturase (FAD) in these accessions was also sequenced. Cultivated sesame accessions flowered and matured earlier than the wild species. The cultivated sesame seeds contained a significantly higher percentage of oleic acid (40.4%) than the seeds of the wild species (26.1%). Nucleotide polymorphisms were identified in the FAD2 gene coding region between wild and cultivated species. Some nucleotide polymorphisms led to amino acid changes, one of which was located in the enzyme active site and may contribute to the altered fatty acid composition. Based on the morphology observation, chemical analysis, and sequence analysis, it was determined that two accessions were misnamed and need to be reclassified. The results obtained from this study are useful for sesame improvement in molecular breeding programs.

  19. Tumor suppressor miR-375 regulates MYC expression via repression of CIP2A coding sequence through multiple miRNA-mRNA interactions.

    PubMed

    Jung, Hyun Min; Patel, Rushi S; Phillips, Brittany L; Wang, Hai; Cohen, Donald M; Reinhold, William C; Chang, Lung-Ji; Yang, Li-Jun; Chan, Edward K L

    2013-06-01

    MicroRNAs (miRNAs) are small, noncoding RNAs involved in posttranscriptional regulation of protein-coding genes in various biological processes. In our preliminary miRNA microarray analysis, miR-375 was identified as the most underexpressed in human oral tumor versus controls. The purpose of the present study is to examine the function of miR-375 as a candidate tumor suppressor miRNA in oral cancer. Cancerous inhibitor of PP2A (CIP2A), a guardian of oncoprotein MYC, is identified as a candidate miR-375 target based on bioinformatics. Luciferase assay accompanied by target sequence mutagenesis elucidates five functional miR-375-binding sites clustered in the CIP2A coding sequence close to the C-terminal domain. Overexpression of CIP2A is clearly demonstrated in oral cancers, and inverse correlation between miR-375 and CIP2A is observed in the tumors, as well as in NCI-60 cell lines, indicating the potential generalized involvement of the miR-375-CIP2A relationship in many other cancers. Transient transfection of miR-375 in oral cancer cells reduces the expression of CIP2A, resulting in decrease of MYC protein levels and leading to reduced proliferation, colony formation, migration, and invasion. Therefore this study shows that underexpression of tumor suppressor miR-375 could lead to uncontrolled CIP2A expression and extended stability of MYC, which contributes to promoting cancerous phenotypes.

  20. East Asian mtDNA haplogroup determination in Koreans: haplogroup-level coding region SNP analysis and subhaplogroup-level control region sequence analysis.

    PubMed

    Lee, Hwan Young; Yoo, Ji-Eun; Park, Myung Jin; Chung, Ukhee; Kim, Chong-Youl; Shin, Kyoung-Jin

    2006-11-01

    The present study analyzed 21 coding region SNP markers and one deletion motif for the determination of East Asian mitochondrial DNA (mtDNA) haplogroups by designing three multiplex systems which apply single base extension methods. Using two multiplex systems, all 593 Korean mtDNAs were allocated into 15 haplogroups: M, D, D4, D5, G, M7, M8, M9, M10, M11, R, R9, B, A, and N9. As the D4 haplotypes occurred most frequently in Koreans, the third multiplex system was used to further define D4 subhaplogroups: D4a, D4b, D4e, D4g, D4h, and D4j. This method allowed the complementation of coding region information with control region mutation motifs and the resultant findings also suggest reliable control region mutation motifs for the assignment of East Asian mtDNA haplogroups. These three multiplex systems produce good results in degraded samples as they contain small PCR products (101-154 bp) for single base extension reactions. SNP scoring was performed in 101 old skeletal remains using these three systems to prove their utility in degraded samples. The sequence analysis of mtDNA control region with high incidence of haplogroup-specific mutations and the selective scoring of highly informative coding region SNPs using the three multiplex systems are useful tools for most applications involving East Asian mtDNA haplogroup determination and haplogroup-directed stringent quality control.

  1. Simulation of Loss of RHRS Sequences in the Almaraz NPP during Mid-loop Operation using TRACE Code

    SciTech Connect

    Queral, Cesar; Gonzalez, Isaac; Exposito, Antonio

    2006-07-01

    In the framework of different international and national projects sponsored by the Spanish nuclear regulatory body, Consejo de Seguridad Nuclear, and the energy industry of Spain, UNESA, one of the most important objectives is the maintenance and development of Spanish NPP models for different codes, such as RELAP5 and TRACE. In this context, and due to the risk importance of the loss of RHRS at mid-loop conditions, our group has developed a mid-loop model of Almaraz NPP with the TRACE code. During this kind of transient, reflux condensation is one of the cooling mechanisms anticipated in the abnormal operational procedure for loss of RHRS at mid-loop level. In this sense, several simulations of loss of the RHRS are being performed for different plant states, such as primary closed or open (different vent paths were considered), availability of steam generators, power levels, primary inventory and different secondary conditions. These parametric analyses allow us to check the capability of this cooling mechanism at different plant configurations and to apply them to the success criteria of the reflux condensation mechanism. (authors)

  2. Second-generation sequencing of entire mitochondrial coding-regions (∼15.4 kb) holds promise for study of the phylogeny and taxonomy of human body lice and head lice.

    PubMed

    Xiong, H; Campelo, D; Pollack, R J; Raoult, D; Shao, R; Alem, M; Ali, J; Bilcha, K; Barker, S C

    2014-08-01

    The Illumina Hiseq platform was used to sequence the entire mitochondrial coding-regions of 20 body lice, Pediculus humanus Linnaeus, and head lice, P. capitis De Geer (Phthiraptera: Pediculidae), from eight towns and cities in five countries: Ethiopia, France, China, Australia and the U.S.A. These data (∼310 kb) were used to see how much more informative entire mitochondrial coding-region sequences were than partial mitochondrial coding-region sequences, and thus to guide the design of future studies of the phylogeny, origin, evolution and taxonomy of body lice and head lice. Phylogenies were compared from entire coding-region sequences (∼15.4 kb), entire cox1 (∼1.5 kb), partial cox1 (∼700 bp) and partial cytb (∼600 bp) sequences. On the one hand, phylogenies from entire mitochondrial coding-region sequences (∼15.4 kb) were much more informative than phylogenies from entire cox1 sequences (∼1.5 kb) and partial gene sequences (∼600 to ∼700 bp). For example, 19 branches had > 95% bootstrap support in our maximum likelihood tree from the entire mitochondrial coding-regions (∼15.4 kb) whereas the tree from 700 bp cox1 had only two branches with bootstrap support > 95%. Yet, by contrast, partial cytb (∼600 bp) and partial cox1 (∼486 bp) sequences were sufficient to genotype lice to Clade A, B or C. The sequences of the mitochondrial genomes of the P. humanus, P. capitis and P. schaeffi Fahrenholz studied are in NCBI GenBank under the accession numbers KC660761-800, KC685631-6330, KC241882-97, EU219988-95, HM241895-8 and JX080388-407.

  3. Analysis of exome sequencing data sets reveals structural variation in the coding region of ABO in individuals of African ancestry.

    PubMed

    Fox, Keolu; Johnsen, Jill M; Coe, Bradley P; Frazar, Chris D; Reiner, Alexander P; Eichler, Evan E; Nickerson, Deborah A

    2016-11-01

    ABO is a blood group system of high clinical significance due to the prevalence of ABO variation that can cause major, potentially life-threatening, transfusion reactions. Using multiple large-scale next-generation sequence data sets, we demonstrate the application of read-depth approaches to discover previously unsuspected structural variation (SV) in the ABO gene in individuals of African ancestry. Our analysis of SV in the ABO gene across 6432 exomes reveals a partial deletion in the ABO gene in 32 individuals of African ancestry that predicts a novel O allele. Our study demonstrates the power that analyses of large-scale sequencing data, particularly data sets containing underrepresented populations, can provide in identifying novel SVs. © 2016 AABB.
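    The abstract above names read-depth approaches without describing them. The following is a minimal sketch of the general idea only, not the authors' pipeline: per-exon depths for one sample are normalized by the sample median, compared with reference depths, and exons with a low ratio are flagged. The function names, the 0.75 threshold and all depth values are hypothetical.

```python
from statistics import median

def depth_ratios(sample, reference):
    """Per-exon depth ratio after normalizing each profile by its own median."""
    s_med = median(sample)
    r_med = median(reference)
    return [(s / s_med) / (r / r_med) for s, r in zip(sample, reference)]

def flag_deletions(sample, reference, threshold=0.75):
    """Indices of exons whose normalized ratio falls below the threshold.

    A ratio near 0.5 is what a heterozygous deletion would be expected to
    produce; the threshold here is an illustrative assumption.
    """
    ratios = depth_ratios(sample, reference)
    return [i for i, ratio in enumerate(ratios) if ratio < threshold]

# Hypothetical per-exon read depths (not data from the study).
sample = [100, 98, 50, 49, 102]
reference = [100, 100, 100, 100, 100]

flagged = flag_deletions(sample, reference)
print(flagged)
```

    Median normalization is used here so that a deletion spanning a minority of exons does not distort the scaling factor; real tools also correct for GC content and capture efficiency, which this sketch ignores.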

  4. Cloning, sequencing, and expression of the gene coding for the human platelet α₂-adrenergic receptor

    SciTech Connect

    Kobilka, B.K.; Matsui, H.; Kobilka, T.S.; Yang-Feng, T.L.; Francke, U.; Caron, M.G.; Lefkowitz, R.J.; Regan, J.W.

    1987-10-30

    The gene for the human platelet α₂-adrenergic receptor has been cloned with oligonucleotides corresponding to the partial amino acid sequence of the purified receptor. The identity of this gene has been confirmed by the binding of α₂-adrenergic ligands to the cloned receptor expressed in Xenopus laevis oocytes. The deduced amino acid sequence is most similar to the recently cloned human β₂- and β₁-adrenergic receptors; however, similarities to the muscarinic cholinergic receptors are also evident. Two related genes have been identified by low stringency Southern blot analysis. These genes may represent additional α₂-adrenergic receptor subtypes.

  5. A Single Point Mutation within the Coding Sequence of Cholera Toxin B Subunit Will Increase Its Expression Yield

    PubMed Central

    Bakhshi, Bita; Boustanshenas, Mina; Ghorbani, Masoud

    2014-01-01

    Background: Cholera toxin B subunit (CTB) has been extensively considered as an immunogenic and adjuvant protein, but its yield of expression is not satisfactory in many studies. The aim of this study was to compare the expression of native and mutant recombinant CTB (rCTB) in pQE vector. Methods: ctxB fragment from Vibrio cholerae O1 ATCC14035 containing the substitution of mutant ctxB for amino acid S128T was amplified by PCR and cloned in pGEM-T Easy vector. It was then transformed to E. coli Top 10F' and cultured on LB agar plate containing ampicillin. Sequence analysis confirmed the mature ctxB gene sequence and the mutant one in both constructs which were further subcloned to pQE-30 vector. Both constructs were subsequently transformed to E. coli M15 (pREP4) for expression of mature and mutant rCTB. Results: SDS-PAGE analysis showed the maximum expression of rCTB in both systems at 5 hours after induction and Western-blot analysis confirmed the presence of rCTB in blotting membranes. The expression of mutant rCTB was much higher than mature rCTB, which may be the result of serine-to-threonine substitution at position 128 of mature rCTB amino acid sequence created by PCR mutagenesis. The mutant rCTB retained pentameric stability and its ability to bind to anti-cholera toxin IgG antibodies. Conclusion: Point mutation in ctxB sequence resulted in over-expression of rCTB, probably due to the increase of solubility of produced rCTB. Consequently, this expression system can be used to produce rCTB in high yield. PMID:24842138

  6. Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs.

    PubMed

    Ryvkin, Paul; Leung, Yuk Yee; Ungar, Lyle H; Gregory, Brian D; Wang, Li-San

    2014-05-01

    Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs (<50nt) present in the cell, including fragments derived from snoRNAs (small nucleolar RNAs), snRNAs (small nuclear RNAs), scRNAs (small cytoplasmic RNAs), tRNAs (transfer RNAs), and transposon-derived RNAs. Here, we present a user's guide for CoRAL (Classification of RNAs by Analysis of Length), a computational method for discriminating between different classes of RNA using high-throughput small RNA-sequencing data. Not only can CoRAL distinguish between RNA classes with high accuracy, but it also uses features that are relevant to small RNA biogenesis pathways. By doing so, CoRAL can give biologists a glimpse into the characteristics of different RNA processing pathways and how these might differ between tissue types, biological conditions, or even different species. CoRAL is available at http://wanglab.pcbi.upenn.edu/coral/.

  7. [Analysis of the molecular characteristics and cloning of full-length coding sequence of interleukin-2 in tree shrews].

    PubMed

    Huang, Xiao-Yan; Li, Ming-Li; Xu, Juan; Gao, Yue-Dong; Wang, Wen-Guang; Yin, An-Guo; Li, Xiao-Fei; Sun, Xiao-Mei; Xia, Xue-Shan; Dai, Jie-Jie

    2013-04-01

    Although the tree shrew (Tupaia belangeri chinensis) is an excellent animal model for studying the mechanisms of human diseases, few studies have examined interleukin-2 (IL-2), an important immune factor in disease model evaluation. In this study, the 465 bp full-length IL-2 cDNA coding sequence was cloned from the RNA of tree shrew spleen lymphocytes, which were cultivated and stimulated with ConA (concanavalin A). Clustal W 2.0 was used to compare and analyze the sequence and molecular characteristics, and establish the similarity of the overall structure of IL-2 between tree shrews and other mammals. The homology of the IL-2 nucleotide sequence between tree shrews and humans was 93%, and the amino acid homology was 80%. The phylogenetic tree results, derived through the Neighbour-Joining method using MEGA5.0, indicated a close genetic relationship between tree shrews, Homo sapiens, and Macaca mulatta. The three-dimensional structure analysis showed that the surface charges in most regions of IL-2 were similar between tree shrews and humans; however, the N-glycosylation sites and local structures were different, which may affect antibody binding. These results provide a fundamental basis for the future study of IL-2 monoclonal antibody in tree shrews, thereby improving their utility as a model.

  8. Humans and chimpanzees differ in their cellular response to DNA damage and non-coding sequence elements of DNA repair-associated genes.

    PubMed

    Weis, E; Galetzka, D; Herlyn, H; Schneider, E; Haaf, T

    2008-01-01

    Compared to humans, chimpanzees appear to be less susceptible to many types of cancer. Because DNA repair defects lead to accumulation of gene and chromosomal mutations, species differences in DNA repair are one plausible explanation. Here we analyzed the repair kinetics of human and chimpanzee cells after cisplatin treatment and irradiation. Dot blots for the quantification of single-stranded (ss) DNA repair intermediates revealed a biphasic response of human and chimpanzee lymphoblasts to cisplatin-induced damage. The early phase of DNA repair was identical in both species with a peak of ssDNA intermediates at 1 h after DNA damage induction. However, the late phase differed between species. Human cells showed a second peak of ssDNA intermediates at 6 h, chimpanzee cells at 5 h. One of four analyzed DNA repair-associated genes, UBE2A, was differentially expressed in human and chimpanzee cells at 5 h after cisplatin treatment. Immunofluorescent staining of gammaH2AX foci demonstrated equally high numbers of DNA strand breaks in human and chimpanzee cells at 30 min after irradiation and equally low numbers at 2 h. However, at 1 h chimpanzee cells had significantly fewer DNA breaks than human cells. Comparative sequence analyses of approximately 100 DNA repair-associated genes in human and chimpanzee revealed 13% and 32% genes, respectively, with evidence for an accelerated evolution in promoter regions and introns. This is in striking contrast to the 3% of DNA repair-associated genes with positive selection in the coding sequence. Compared to the rhesus macaque as an outgroup, chimpanzees have a higher accelerated evolution in non-coding sequences than humans. The TRF1-interacting, ankyrin-related ADP-ribose polymerase (TNKS) gene showed an accelerated intraspecific evolution among humans. Our results are consistent with the view that chimpanzee cells repair different types of DNA damage faster than human cells, whereas the overall repair capacity is similar in

  9. Analyzing incomplete longitudinal clinical trial data.

    PubMed

    Molenberghs, Geert; Thijs, Herbert; Jansen, Ivy; Beunckens, Caroline; Kenward, Michael G; Mallinckrodt, Craig; Carroll, Raymond J

    2004-07-01

    Using standard missing data taxonomy, due to Rubin and co-workers, and simple algebraic derivations, it is argued that some simple but commonly used methods to handle incomplete longitudinal clinical trial data, such as complete case analyses and methods based on last observation carried forward, require restrictive assumptions and stand on a weaker theoretical foundation than likelihood-based methods developed under the missing at random (MAR) framework. Given the availability of flexible software for analyzing longitudinal sequences of unequal length, implementation of likelihood-based MAR analyses is not limited by computational considerations. While such analyses are valid under the comparatively weak assumption of MAR, the possibility of data missing not at random (MNAR) is difficult to rule out. It is argued, however, that MNAR analyses are, themselves, surrounded with problems and therefore, rather than ignoring MNAR analyses altogether or blindly shifting to them, their optimal place is within sensitivity analysis. The concepts developed here are illustrated using data from three clinical trials, where it is shown that the analysis method may have an impact on the conclusions of the study.
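    The two simple methods named in the abstract above, complete case analysis and last observation carried forward (LOCF), can be illustrated mechanically. This is a minimal sketch with hypothetical toy values, not data from the three trials; it shows only that the two methods can yield different final-visit means from the same incomplete sequences, and it does not implement the likelihood-based MAR analyses the abstract recommends.

```python
def locf(seq):
    """Fill each missing value (None) with the last observed value."""
    filled, last = [], None
    for value in seq:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def complete_cases(subjects):
    """Keep only subjects with no missing visits."""
    return [s for s in subjects if all(v is not None for v in s)]

def final_visit_mean(subjects):
    """Mean of the last visit across subjects."""
    return sum(s[-1] for s in subjects) / len(subjects)

# Hypothetical outcome sequences over four visits; None marks dropout.
subjects = [
    [10, 8, 6, 4],
    [10, 9, None, None],
    [12, None, None, None],
]

cc_mean = final_visit_mean(complete_cases(subjects))
locf_mean = final_visit_mean([locf(s) for s in subjects])
print(cc_mean, locf_mean)
```

    In this toy example the complete case mean uses one subject, while LOCF freezes the two dropouts at their early values; each choice embeds an assumption about the unobserved visits.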

  10. Influence of incomplete fusion on complete fusion at energies above the Coulomb barrier

    NASA Astrophysics Data System (ADS)

    Shuaib, Mohd; Sharma, Vijay R.; Yadav, Abhishek; Sharma, Manoj Kumar; Singh, Pushpendra P.; Singh, Devendra P.; Kumar, R.; Singh, R. P.; Muralithar, S.; Singh, B. P.; Prasad, R.

    2017-10-01

    In the present work, excitation functions of several reaction residues in the system ¹⁹F+¹⁶⁹Tm, populated via the complete and incomplete fusion processes, have been measured using off-line γ-ray spectroscopy. The analysis of excitation functions has been done within the framework of statistical model code PACE4. The excitation functions of residues populated via xn and pxn channels are found to be in good agreement with those estimated by the theoretical model code, which confirms the production of these residues solely via complete fusion process. However, a significant enhancement has been observed in the cross-sections of residues involving α-emitting channels as compared to the theoretical predictions. The observed enhancement in the cross-sections has been attributed to the incomplete fusion processes. In order to have a better insight into the onset and strength of incomplete fusion, the incomplete fusion strength function has been deduced. At present, there is no theoretical model available which can satisfactorily explain the incomplete fusion reaction data at energies ≈4–6 MeV/nucleon. In the present work, the influence of incomplete fusion on complete fusion in the ¹⁹F+¹⁶⁹Tm system has also been studied. The measured cross-section data may be important for the development of reactor technology as well. It has been found that the incomplete fusion strength function strongly depends on the α-Q value of the projectile, which is found to be in good agreement with the existing literature data. The analysis strongly supports the projectile-dependent mass-asymmetry systematics. In order to study the influence of Coulomb effect (Z_P Z_T) on incomplete fusion, the deduced strength function for the present work is compared with the nearby projectile–target combinations. The incomplete fusion strength function is found to increase linearly with Z_P Z_T, indicating a strong influence of Coulomb effect in the incomplete fusion reactions.

  11. Increasing the Yield in Targeted Next-Generation Sequencing by Implicating CNV Analysis, Non-Coding Exons and the Overall Variant Load: The Example of Retinal Dystrophies

    PubMed Central

    Eisenberger, Tobias; Neuhaus, Christine; Khan, Arif O.; Decker, Christian; Preising, Markus N.; Friedburg, Christoph; Bieg, Anika; Gliem, Martin; Issa, Peter Charbel; Holz, Frank G.; Baig, Shahid M.; Hellenbroich, Yorck; Galvez, Alberto; Platzer, Konrad; Wollnik, Bernd; Laddach, Nadja; Ghaffari, Saeed Reza; Rafati, Maryam; Botzenhart, Elke; Tinschert, Sigrid; Börger, Doris; Bohring, Axel; Schreml, Julia; Körtge-Jung, Stefani; Schell-Apacik, Chayim; Bakur, Khadijah; Al-Aama, Jumana Y.; Neuhann, Teresa; Herkenrath, Peter; Nürnberg, Gudrun; Nürnberg, Peter; Davis, John S.; Gal, Andreas; Bergmann, Carsten; Lorenz, Birgit; Bolz, Hanno J.

    2013-01-01

    Retinitis pigmentosa (RP) and Leber congenital amaurosis (LCA) are major causes of blindness. They result from mutations in many genes which has long hampered comprehensive genetic analysis. Recently, targeted next-generation sequencing (NGS) has proven useful to overcome this limitation. To uncover “hidden mutations” such as copy number variations (CNVs) and mutations in non-coding regions, we extended the use of NGS data by quantitative readout for the exons of 55 RP and LCA genes in 126 patients, and by including non-coding 5′ exons. We detected several causative CNVs which were key to the diagnosis in hitherto unsolved constellations, e.g. hemizygous point mutations in consanguineous families, and CNVs complemented apparently monoallelic recessive alleles. Mutations of non-coding exon 1 of EYS revealed its contribution to disease. In view of the high carrier frequency for retinal disease gene mutations in the general population, we considered the overall variant load in each patient to assess if a mutation was causative or reflected accidental carriership in patients with mutations in several genes or with single recessive alleles. For example, truncating mutations in RP1, a gene implicated in both recessive and dominant RP, were causative in biallelic constellations, unrelated to disease when heterozygous on a biallelic mutation background of another gene, or even non-pathogenic if close to the C-terminus. Patients with mutations in several loci were common, but without evidence for di- or oligogenic inheritance. Although the number of targeted genes was low compared to previous studies, the mutation detection rate was highest (70%) which likely results from completeness and depth of coverage, and quantitative data analysis. CNV analysis should routinely be applied in targeted NGS, and mutations in non-coding exons give reason to systematically include 5′-UTRs in disease gene or exome panels. Consideration of all variants is indispensable because even

  12. Increasing the yield in targeted next-generation sequencing by implicating CNV analysis, non-coding exons and the overall variant load: the example of retinal dystrophies.

    PubMed

    Eisenberger, Tobias; Neuhaus, Christine; Khan, Arif O; Decker, Christian; Preising, Markus N; Friedburg, Christoph; Bieg, Anika; Gliem, Martin; Charbel Issa, Peter; Holz, Frank G; Baig, Shahid M; Hellenbroich, Yorck; Galvez, Alberto; Platzer, Konrad; Wollnik, Bernd; Laddach, Nadja; Ghaffari, Saeed Reza; Rafati, Maryam; Botzenhart, Elke; Tinschert, Sigrid; Börger, Doris; Bohring, Axel; Schreml, Julia; Körtge-Jung, Stefani; Schell-Apacik, Chayim; Bakur, Khadijah; Al-Aama, Jumana Y; Neuhann, Teresa; Herkenrath, Peter; Nürnberg, Gudrun; Nürnberg, Peter; Davis, John S; Gal, Andreas; Bergmann, Carsten; Lorenz, Birgit; Bolz, Hanno J

    2013-01-01

    Retinitis pigmentosa (RP) and Leber congenital amaurosis (LCA) are major causes of blindness. They result from mutations in many genes which has long hampered comprehensive genetic analysis. Recently, targeted next-generation sequencing (NGS) has proven useful to overcome this limitation. To uncover "hidden mutations" such as copy number variations (CNVs) and mutations in non-coding regions, we extended the use of NGS data by quantitative readout for the exons of 55 RP and LCA genes in 126 patients, and by including non-coding 5' exons. We detected several causative CNVs which were key to the diagnosis in hitherto unsolved constellations, e.g. hemizygous point mutations in consanguineous families, and CNVs complemented apparently monoallelic recessive alleles. Mutations of non-coding exon 1 of EYS revealed its contribution to disease. In view of the high carrier frequency for retinal disease gene mutations in the general population, we considered the overall variant load in each patient to assess if a mutation was causative or reflected accidental carriership in patients with mutations in several genes or with single recessive alleles. For example, truncating mutations in RP1, a gene implicated in both recessive and dominant RP, were causative in biallelic constellations, unrelated to disease when heterozygous on a biallelic mutation background of another gene, or even non-pathogenic if close to the C-terminus. Patients with mutations in several loci were common, but without evidence for di- or oligogenic inheritance. Although the number of targeted genes was low compared to previous studies, the mutation detection rate was highest (70%) which likely results from completeness and depth of coverage, and quantitative data analysis. CNV analysis should routinely be applied in targeted NGS, and mutations in non-coding exons give reason to systematically include 5'-UTRs in disease gene or exome panels. Consideration of all variants is indispensable because even

  13. Analysis of coding variants identified from exome sequencing resources for association with diabetic and non-diabetic nephropathy in African Americans.

    PubMed

    Cooke Bailey, Jessica N; Palmer, Nicholette D; Ng, Maggie C Y; Bonomo, Jason A; Hicks, Pamela J; Hester, Jessica M; Langefeld, Carl D; Freedman, Barry I; Bowden, Donald W

    2014-06-01

    Prior studies have identified common genetic variants influencing diabetic and non-diabetic nephropathy, diseases which disproportionately affect African Americans. Recently, exome sequencing techniques have facilitated identification of coding variants on a genome-wide basis in large samples. Exonic variants in known or suspected end-stage kidney disease (ESKD) or nephropathy genes can be tested for their ability to identify association either singly or in combination with known associated common variants. Coding variants in genes with prior evidence for association with ESKD or nephropathy were identified in the NHLBI-ESP GO database and genotyped in 5,045 African Americans (3,324 cases with type 2 diabetes associated nephropathy [T2D-ESKD] or non-T2D ESKD, and 1,721 controls) and 1,465 European Americans (568 T2D-ESKD cases and 897 controls). Logistic regression analyses were performed to assess association, with admixture and APOL1 risk status incorporated as covariates. Ten of 31 SNPs were associated in African Americans; four replicated in European Americans. In African Americans, SNPs in OR2L8, OR2AK2, C6orf167 (MMS22L), LIMK2, APOL3, APOL2, and APOL1 were nominally associated (P = 1.8 × 10⁻⁴ to 0.044). Haplotype analysis of common and coding variants increased evidence of association at the OR2L13 and APOL1 loci (P = 6.2 × 10⁻⁵ and 4.6 × 10⁻⁵, respectively). SNPs replicating in European Americans were in OR2AK2, LIMK2, and APOL2 (P = 0.0010 to 0.037). Meta-analyses highlighted four SNPs associated in T2D-ESKD and all-cause ESKD. Results from this study suggest a role for coding variants in the development of diabetic, non-diabetic, and/or all-cause ESKD in African Americans and/or European Americans.

  14. Filling the gaps - the generation of full genomic sequences for 15 common and well-documented HLA class I alleles using next-generation sequencing technology.

    PubMed

    Lind, C; Ferriola, D; Mackiewicz, K; Papazoglou, A; Sasson, A; Monos, D

    2013-03-01

    Many common and well-documented (CWD) HLA alleles have only been partially characterized. The DNA sequence of these incomplete alleles, as published in the IMGT/HLA database, is most often limited to exons that code for the extracellular domains of the mature protein. Here we describe the application of next-generation sequencing technology to obtain full length genomic sequence from a single long-range PCR amplicon for 15 common and well-documented HLA Class I alleles. This technology is well suited to fill in the gaps of the current HLA allele sequence database which is largely incomplete. A more comprehensive catalog of HLA allele sequences would be beneficial in the evaluation of mismatches in transplantation, studies of population genetics, the evolution of HLAs, regulatory mechanisms and HLA expression, and issues related to the genomic organization of the MHC. Copyright © 2012. Published by Elsevier Inc.

  15. RNA sequencing and functional analysis implicate the regulatory role of long non-coding RNAs in tomato fruit ripening.

    PubMed

    Zhu, Benzhong; Yang, Yongfang; Li, Ran; Fu, Daqi; Wen, Liwei; Luo, Yunbo; Zhu, Hongliang

    2015-08-01

    Recently, long non-coding RNAs (lncRNAs) have been shown to play critical regulatory roles in model plants, such as Arabidopsis, rice, and maize. However, the presence of lncRNAs and how they function in fleshy fruit ripening are still largely unknown because fleshy fruit ripening is not present in the above model plants. Tomato is the model system for fruit ripening studies due to its dramatic ripening process. To investigate further the role of lncRNAs in fruit ripening, it is necessary and urgent to discover and identify novel lncRNAs and understand the function of lncRNAs in tomato fruit ripening. Here it is reported that 3679 lncRNAs were discovered from wild-type tomato and ripening mutant fruit. The lncRNAs are transcribed from all tomato chromosomes, 85.1% of which came from intergenic regions. Tomato lncRNAs are shorter and have fewer exons than protein-coding genes, a situation reminiscent of lncRNAs from other model plants. It was also observed that 490 lncRNAs were significantly up-regulated in ripening mutant fruits, and 187 lncRNAs were down-regulated, indicating that lncRNAs could be involved in the regulation of fruit ripening. In line with this, silencing of two novel tomato intergenic lncRNAs, lncRNA1459 and lncRNA1840, resulted in an obvious delay of ripening of wild-type fruit. Overall, the results indicated that lncRNAs might be essential regulators of tomato fruit ripening, which sheds new light on the regulation of fruit ripening. © The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  16. RNA sequencing and functional analysis implicate the regulatory role of long non-coding RNAs in tomato fruit ripening

    PubMed Central

    Zhu, Benzhong; Yang, Yongfang; Li, Ran; Fu, Daqi; Wen, Liwei; Luo, Yunbo; Zhu, Hongliang

    2015-01-01

    Recently, long non-coding RNAs (lncRNAs) have been shown to play critical regulatory roles in model plants, such as Arabidopsis, rice, and maize. However, the presence of lncRNAs and how they function in fleshy fruit ripening are still largely unknown because fleshy fruit ripening is not present in the above model plants. Tomato is the model system for fruit ripening studies due to its dramatic ripening process. To investigate further the role of lncRNAs in fruit ripening, it is necessary and urgent to discover and identify novel lncRNAs and understand the function of lncRNAs in tomato fruit ripening. Here it is reported that 3679 lncRNAs were discovered from wild-type tomato and ripening mutant fruit. The lncRNAs are transcribed from all tomato chromosomes, 85.1% of which came from intergenic regions. Tomato lncRNAs are shorter and have fewer exons than protein-coding genes, a situation reminiscent of lncRNAs from other model plants. It was also observed that 490 lncRNAs were significantly up-regulated in ripening mutant fruits, and 187 lncRNAs were down-regulated, indicating that lncRNAs could be involved in the regulation of fruit ripening. In line with this, silencing of two novel tomato intergenic lncRNAs, lncRNA1459 and lncRNA1840, resulted in an obvious delay of ripening of wild-type fruit. Overall, the results indicated that lncRNAs might be essential regulators of tomato fruit ripening, which sheds new light on the regulation of fruit ripening. PMID:25948705

  17. Single-tube, non-isotopic, multiplex PCR/OLA assay and sequence-coded separation for simultaneous screening of 31 cystic fibrosis mutations

    SciTech Connect

    Brinson, E.C.; Adriano, T.; Bloch, W.

    1994-09-01

    We have developed a rapid, single-tube, non-isotopic assay that screens a patient sample for the presence of 31 cystic fibrosis (CF) mutations. This assay can identify these mutations in a single reaction tube and a single electrophoresis run. Sample preparation is a simple, boil-and-go procedure, completed in less than an hour. The assay is composed of a 15-plex PCR, followed by a 61-plex oligonucleotide ligation assay (OLA), and incorporates a novel detection scheme, Sequence Coded Separation. Initially, the multiplex PCR amplifies 15 relevant segments of the CFTR gene, simultaneously. These PCR amplicons serve as templates for the multiplex OLA, which detects the normal or mutant allele at all loci, simultaneously. Each polymorphic site is interrogated by three oligonucleotide probes, a common probe and two allele-specific probes. Each common probe is tagged with a fluorescent dye, and the competing normal and mutant allelic probes incorporate different, non-nucleotide, mobility modifiers. These modifiers are composed of hexaethylene oxide (HEO) units, incorporated as HEO phosphoramidite monomers during automated DNA synthesis. The OLA is based on both probe hybridization and the ability of DNA ligase to discriminate single base mismatches at the junction between paired probes. Each single tube assay is electrophoresed in a single gel lane of a 4-color fluorescent DNA sequencer (Applied Biosystems, Model 373A). Each of the ligation products is identified by its unique combination of electrophoretic mobility and one of three colors. The fourth color is reserved for the in-lane size standard, used by GENESCAN™ software (Applied Biosystems) to size the OLA electrophoresis products. The Genotyper™ software (Applied Biosystems) decodes these Sequence-Coded-Separation data to create a patient summary report for all loci tested.

  18. Triple trans-splicing adeno-associated virus vectors capable of transferring the coding sequence for full-length dystrophin protein into dystrophic mice.

    PubMed

    Koo, Taeyoung; Popplewell, Linda; Athanasopoulos, Takis; Dickson, George

    2014-02-01

    Recombinant adeno-associated virus (rAAV) vectors have been shown to permit very efficient widespread transgene expression in skeletal muscle after systemic delivery, making these increasingly attractive as vectors for Duchenne muscular dystrophy (DMD) gene therapy. DMD is a severe muscle-wasting disorder caused by DMD gene mutations leading to complete loss of dystrophin protein. One of the major issues associated with delivery of the DMD gene, as a therapeutic approach for DMD, is its large open reading frame (ORF; 11.1 kb). A series of truncated microdystrophin cDNAs (delivered via a single AAV) and minidystrophin cDNAs (delivered via dual-AAV trans-spliced/overlapping reconstitution) have thus been extensively tested in DMD animal models. However, critical rod and hinge domains of dystrophin required for interaction with components of the dystrophin-associated protein complex, such as neuronal nitric oxide synthase, syntrophin, and dystrobrevin, are missing; these dystrophin domains may still need to be incorporated to increase dystrophin functionality and stabilize membrane rigidity. Full-length DMD gene delivery using AAV vectors remains elusive because of the limited single-AAV packaging capacity (4.7 kb). Here we developed a novel method for the delivery of the full-length DMD coding sequence to skeletal muscles in dystrophic mdx mice using a triple-AAV trans-splicing vector system. We report for the first time that three independent AAV vectors carrying "in tandem" sequential exonic parts of the human DMD coding sequence enable the expression of the full-length protein as a result of trans-splicing events cojoining three vectors via their inverted terminal repeat sequences. This method of triple-AAV-mediated trans-splicing could be applicable to the delivery of any large therapeutic gene (≥11 kb ORF) into postmitotic tissues (muscles or neurons) for the treatment of various inherited metabolic and genetic diseases.
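    The packaging arithmetic behind the triple-vector design above can be checked directly. This is a minimal sketch using only the two figures quoted in the abstract (11.1 kb ORF, 4.7 kb single-AAV capacity); the function name is hypothetical, and the result is a lower bound because it ignores promoters, splice signals and inverted terminal repeats, which also occupy packaging space.

```python
import math

def min_vectors(orf_kb: float, capacity_kb: float) -> int:
    """Lower bound on the number of vectors needed to carry an ORF."""
    return math.ceil(orf_kb / capacity_kb)

ORF_KB = 11.1       # DMD open reading frame, as stated in the abstract
CAPACITY_KB = 4.7   # single-AAV packaging capacity, as stated in the abstract

n_vectors = min_vectors(ORF_KB, CAPACITY_KB)
print(n_vectors)
```

    Two vectors would carry at most 9.4 kb, which is below the 11.1 kb ORF, so at least three are required on these figures.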

  19. 32 CFR 651.44 - Incomplete information.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 32 National Defense 4 2013-07-01 2013-07-01 false Incomplete information. 651.44 Section 651.44 National Defense Department of Defense (Continued) DEPARTMENT OF THE ARMY (CONTINUED) ENVIRONMENTAL QUALITY ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) Environmental Impact Statement § 651.44 Incomplete information...

  20. 32 CFR 651.44 - Incomplete information.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 32 National Defense 4 2014-07-01 2013-07-01 true Incomplete information. 651.44 Section 651.44 National Defense Department of Defense (Continued) DEPARTMENT OF THE ARMY (CONTINUED) ENVIRONMENTAL QUALITY ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) Environmental Impact Statement § 651.44 Incomplete information...

  1. 32 CFR 651.44 - Incomplete information.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 4 2012-07-01 2011-07-01 true Incomplete information. 651.44 Section 651.44 National Defense Department of Defense (Continued) DEPARTMENT OF THE ARMY (CONTINUED) ENVIRONMENTAL QUALITY ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) Environmental Impact Statement § 651.44 Incomplete information...

  2. 32 CFR 651.44 - Incomplete information.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 4 2010-07-01 2010-07-01 true Incomplete information. 651.44 Section 651.44 National Defense Department of Defense (Continued) DEPARTMENT OF THE ARMY (CONTINUED) ENVIRONMENTAL QUALITY ENVIRONMENTAL ANALYSIS OF ARMY ACTIONS (AR 200-2) Environmental Impact Statement § 651.44 Incomplete information...

  3. Deep sequencing for de novo construction of a marine fish (Sparus aurata) transcriptome database with a large coverage of protein-coding transcripts.

    PubMed

    Calduch-Giner, Josep A; Bermejo-Nogales, Azucena; Benedito-Palos, Laura; Estensoro, Itziar; Ballester-Lozano, Gabriel; Sitjà-Bobadilla, Ariadna; Pérez-Sánchez, Jaume

    2013-03-15

    The gilthead sea bream (Sparus aurata) is the main fish species cultured in the Mediterranean area and constitutes an interesting model of research. Nevertheless, transcriptomic and genomic data are still scarce for this highly valuable species. A transcriptome database was constructed by de novo assembly of gilthead sea bream sequences derived from public repositories of mRNA and collections of expressed sequence tags together with new high-quality reads from five cDNA 454 normalized libraries of skeletal muscle (1), intestine (1), head kidney (2) and blood (1). Sequencing of the new 454 normalized libraries produced 2,945,914 high-quality reads and the de novo global assembly yielded 125,263 unique sequences with an average length of 727 nt. Blast analysis directed to protein and nucleotide databases annotated 63,880 sequences encoding for 21,384 gene descriptions, that were curated for redundancies and frameshifting at the homopolymer regions of open reading frames, and hosted at http://www.nutrigroup-iats.org/seabreamdb. Among the annotated gene descriptions, 16,177 were mapped in the Ingenuity Pathway Analysis (IPA) database, and 10,899 were eligible for functional analysis with a representation in 341 out of 372 IPA canonical pathways. The high representation of randomly selected stickleback transcripts by Blast search in the nucleotide gilthead sea bream database evidenced its high coverage of protein-coding transcripts. The newly assembled gilthead sea bream transcriptome represents a progress in genomic resources for this species, as it probably contains more than 75% of actively transcribed genes, constituting a valuable tool to assist studies on functional genomics and future genome projects.

  4. Deep sequencing for de novo construction of a marine fish (Sparus aurata) transcriptome database with a large coverage of protein-coding transcripts

    PubMed Central

    2013-01-01

    Background: The gilthead sea bream (Sparus aurata) is the main fish species cultured in the Mediterranean area and constitutes an interesting model of research. Nevertheless, transcriptomic and genomic data are still scarce for this highly valuable species. A transcriptome database was constructed by de novo assembly of gilthead sea bream sequences derived from public repositories of mRNA and collections of expressed sequence tags together with new high-quality reads from five cDNA 454 normalized libraries of skeletal muscle (1), intestine (1), head kidney (2) and blood (1). Results: Sequencing of the new 454 normalized libraries produced 2,945,914 high-quality reads and the de novo global assembly yielded 125,263 unique sequences with an average length of 727 nt. Blast analysis directed to protein and nucleotide databases annotated 63,880 sequences encoding for 21,384 gene descriptions, that were curated for redundancies and frameshifting at the homopolymer regions of open reading frames, and hosted at http://www.nutrigroup-iats.org/seabreamdb. Among the annotated gene descriptions, 16,177 were mapped in the Ingenuity Pathway Analysis (IPA) database, and 10,899 were eligible for functional analysis with a representation in 341 out of 372 IPA canonical pathways. The high representation of randomly selected stickleback transcripts by Blast search in the nucleotide gilthead sea bream database evidenced its high coverage of protein-coding transcripts. Conclusions: The newly assembled gilthead sea bream transcriptome represents a progress in genomic resources for this species, as it probably contains more than 75% of actively transcribed genes, constituting a valuable tool to assist studies on functional genomics and future genome projects. PMID:23497320

  5. The walnut (Juglans regia) genome sequence reveals diversity in genes coding for the biosynthesis of non-structural polyphenols.

    PubMed

    Martínez-García, Pedro J; Crepeau, Marc W; Puiu, Daniela; Gonzalez-Ibeas, Daniel; Whalen, Jeanne; Stevens, Kristian A; Paul, Robin; Butterfield, Timothy S; Britton, Monica T; Reagan, Russell L; Chakraborty, Sandeep; Walawage, Sriema L; Vasquez-Gross, Hans A; Cardeno, Charis; Famula, Randi A; Pratt, Kevin; Kuruganti, Sowmya; Aradhya, Mallikarjuna K; Leslie, Charles A; Dandekar, Abhaya M; Salzberg, Steven L; Wegrzyn, Jill L; Langley, Charles H; Neale, David B

    2016-09-01

    The Persian walnut (Juglans regia L.), a diploid species native to the mountainous regions of Central Asia, is the major walnut species cultivated for nut production and is one of the most widespread tree nut species in the world. The high nutritional value of J. regia nuts is associated with a rich array of polyphenolic compounds, whose complete biosynthetic pathways are still unknown. A J. regia genome sequence was obtained from the cultivar 'Chandler' to discover target genes and additional unknown genes. The 667-Mbp genome was assembled using two different methods (SOAPdenovo2 and MaSuRCA), with an N50 scaffold size of 464 955 bp (based on a genome size of 606 Mbp), 221 640 contigs and a GC content of 37%. Annotation with MAKER-P and other genomic resources yielded 32 498 gene models. Previous studies in walnut relying on tissue-specific methods have only identified a single polyphenol oxidase (PPO) gene (JrPPO1). Enabled by the J. regia genome sequence, a second homolog of PPO (JrPPO2) was discovered. In addition, about 130 genes in the large gallate 1-β-glucosyltransferase (GGT) superfamily were detected. Specifically, two genes, JrGGT1 and JrGGT2, were significantly homologous to the GGT from Quercus robur (QrGGT), which is involved in the synthesis of 1-O-galloyl-β-d-glucose, a precursor for the synthesis of hydrolysable tannins. The reference genome for J. regia provides meaningful insight into the complex pathways required for the synthesis of polyphenols. The walnut genome sequence provides important tools and methods to accelerate breeding and to facilitate the genetic dissection of complex traits.
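
    For readers unfamiliar with the N50 statistic quoted in this record, the following is a minimal Python sketch of how such a value is commonly computed. It is an illustration only, not the authors' pipeline: the function name `n50`, the optional `genome_size` parameter, and the toy scaffold lengths are assumptions made for this example. Supplying an estimated genome size instead of the assembly total mirrors the record's qualifier "based on a genome size of 606 Mbp".

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return the length L such that sequences of length >= L cover
    at least half of the reference size.

    The reference size is the sum of all lengths unless an estimated
    genome size is supplied (hypothetical parameter for this sketch).
    """
    ordered = sorted(lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if running * 2 >= total:
            return length
    return None  # the sequences cover less than half of the reference size


# Toy scaffold lengths (made-up values, not walnut data).
toy = [80, 70, 50, 40, 30, 20]
print(n50(toy))                   # 70
print(n50(toy, genome_size=400))  # 50
```

    With a larger reference size the cumulative sum has to run further down the sorted list, so the reported value can only stay the same or shrink.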

  6. Cloning and nucleotide sequence of the gene coding for enzymatically active fragments of the Bacillus polymyxa beta-amylase.

    PubMed Central

    Kawazu, T; Nakanishi, Y; Uozumi, N; Sasaki, T; Yamagata, H; Tsukagoshi, N; Udaka, S

    1987-01-01

    The gene encoding beta-amylase was cloned from Bacillus polymyxa 72 into Escherichia coli HB101 by inserting HindIII-generated DNA fragments into the HindIII site of pBR322. The 4.8-kilobase insert was shown to direct the synthesis of beta-amylase. A 1.8-kilobase AccI-AccI fragment of the donor strain DNA was sufficient for the beta-amylase synthesis. Homologous DNA was found by Southern blot analysis to be present only in B. polymyxa 72 and not in other bacteria such as E. coli or B. subtilis. B. polymyxa, as well as E. coli harboring the cloned DNA, was found to produce enzymatically active fragments of beta-amylases (70,000, 56,000, or 58,000, and 42,000 daltons), which were detected in situ by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Nucleotide sequence analysis of the cloned 3.1-kilobase DNA revealed that it contains one open reading frame of 2,808 nucleotides without a translational stop codon. The deduced amino acid sequence for these 2,808 nucleotides encoding a secretory precursor of the beta-amylase protein is 936 amino acids including a signal peptide of 33 or 35 residues at its amino-terminal end. The existence of a beta-amylase of larger than 100,000 daltons, which was predicted on the basis of the results of nucleotide sequence analysis of the gene, was confirmed by examining culture supernatants after various cultivation periods. It existed only transiently during cultivation, but the multiform beta-amylases described above existed for a long time. The large beta-amylase (approximately 160,000 daltons) existed for longer in the presence of a protease inhibitor such as chymostatin, suggesting that proteolytic cleavage is the cause of the formation of multiform beta-amylases. PMID:2435707
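
    The arithmetic behind the figures in this record can be checked with a short Python sketch. It assumes only that each codon is three nucleotides; the 110 Da average residue mass is a common rule of thumb introduced here as an assumption, and the function names are hypothetical. It is not taken from the paper.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average residue mass (assumption)


def codons_in_orf(nucleotides: int) -> int:
    """Number of complete codons in an open reading frame of this length."""
    if nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return nucleotides // 3


def approx_mass_da(residues: int) -> int:
    """Very rough polypeptide mass estimate in daltons."""
    return residues * AVG_RESIDUE_MASS_DA


orf_nt = 2808  # ORF length stated in the record (no stop codon)
residues = codons_in_orf(orf_nt)
print(residues)                  # 936
print(approx_mass_da(residues))  # 102960
```

    The rough estimate of about 103,000 daltons for the sequenced portion agrees with the record's prediction of a beta-amylase larger than 100,000 daltons; because the open reading frame lacks a stop codon, the full-length protein may be larger still.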

  7. Cloning and nucleotide sequence of the gene coding for enzymatically active fragments of the Bacillus polymyxa beta-amylase.

    PubMed

    Kawazu, T; Nakanishi, Y; Uozumi, N; Sasaki, T; Yamagata, H; Tsukagoshi, N; Udaka, S

    1987-04-01

    The gene encoding beta-amylase was cloned from Bacillus polymyxa 72 into Escherichia coli HB101 by inserting HindIII-generated DNA fragments into the HindIII site of pBR322. The 4.8-kilobase insert was shown to direct the synthesis of beta-amylase. A 1.8-kilobase AccI-AccI fragment of the donor strain DNA was sufficient for the beta-amylase synthesis. Homologous DNA was found by Southern blot analysis to be present only in B. polymyxa 72 and not in other bacteria such as E. coli or B. subtilis. B. polymyxa, as well as E. coli harboring the cloned DNA, was found to produce enzymatically active fragments of beta-amylases (70,000, 56,000, or 58,000, and 42,000 daltons), which were detected in situ by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Nucleotide sequence analysis of the cloned 3.1-kilobase DNA revealed that it contains one open reading frame of 2,808 nucleotides without a translational stop codon. The deduced amino acid sequence for these 2,808 nucleotides encoding a secretory precursor of the beta-amylase protein is 936 amino acids including a signal peptide of 33 or 35 residues at its amino-terminal end. The existence of a beta-amylase of larger than 100,000 daltons, which was predicted on the basis of the results of nucleotide sequence analysis of the gene, was confirmed by examining culture supernatants after various cultivation periods. It existed only transiently during cultivation, but the multiform beta-amylases described above existed for a long time. The large beta-amylase (approximately 160,000 daltons) existed for longer in the presence of a protease inhibitor such as chymostatin, suggesting that proteolytic cleavage is the cause of the formation of multiform beta-amylases.

  8. Browning in Annona cherimola fruit: role of polyphenol oxidase and characterization of a coding sequence of the enzyme.

    PubMed

    Prieto, Humberto; Utz, Daniella; Castro, Alvaro; Aguirre, Carlos; González-Agüero, Mauricio; Valdés, Héctor; Cifuentes, Nicolas; Defilippi, Bruno G; Zamora, Pablo; Zúñiga, Gustavo; Campos-Vargas, Reinaldo

    2007-10-31

    Cherimoya (Annona cherimola Mill.) fruit is an attractive candidate for food processing applications as fresh cut. However, along with its desirable delicate taste, cherimoya shows a marked susceptibility to browning. This condition is mainly attributed to polyphenol oxidase (PPO) activity. A general lack of knowledge regarding PPO and its role in the oxidative loss of quality in processed cherimoya fruit requires a better understanding of the mechanisms involved. The work carried out included the cloning of a full-length cDNA, an analysis of its properties in the deduced amino acid sequence, and linkage of its mRNA levels with enzyme activity in mature and ripe fruits after wounding. The results showed one gene different at the nucleotide level when compared with previously reported genes, but a well-conserved protein in both functional and structural terms. The cherimoya PPO gene (Ac-ppo, GenBank DQ990911) appears to be present as a single copy in the genome, and its transcripts could be significantly detected in leaves and less abundantly in flowers and fruits. Analysis of wounded matured and ripened fruits revealed an inductive behavior for mRNA levels in the flesh of mature cherimoya after 16 h. Although the highest enzymatic activity was observed on rind, a consistent PPO activity was detected on flesh samples. A lack of correlation between PPO mRNA level and PPO activity was observed, especially in flesh tissue. This is probably due to the presence of monophenolic substrates inducing a lag period, enzyme inhibitors and/or diphenolic substrates causing suicide inactivation, and proenzyme or latent isoforms of PPO. To our knowledge this is the first report of a complete PPO sequence in cherimoya. Furthermore, the gene is highly divergent from known nucleotide sequences but shows a well conserved protein in terms of its function, deduced structure, and physiological role.

  9. Natural haplotypes of FLM non-coding sequences fine-tune flowering time in ambient spring temperatures in Arabidopsis

    PubMed Central

    Lutz, Ulrich; Nussbaumer, Thomas; Spannagl, Manuel; Diener, Julia; Mayer, Klaus FX; Schwechheimer, Claus

    2017-01-01

    Cool ambient temperatures are major cues determining flowering time in spring. The mechanisms promoting or delaying flowering in response to ambient temperature changes are only beginning to be understood. In Arabidopsis thaliana, FLOWERING LOCUS M (FLM) regulates flowering in the ambient temperature range and FLM is transcribed and alternatively spliced in a temperature-dependent manner. We identify polymorphic promoter and intronic sequences required for FLM expression and splicing. In transgenic experiments covering 69% of the available sequence variation in two distinct sites, we show that variation in the abundance of the FLM-β splice form strictly correlates (R² = 0.94) with flowering time over an extended vegetative period. The FLM polymorphisms lead to changes in FLM expression (PRO2+) but may also affect FLM intron 1 splicing (INT6+). This information could serve to buffer the anticipated negative effects on agricultural systems and flowering that may occur during climate change. DOI: http://dx.doi.org/10.7554/eLife.22114.001 PMID:28294941

  10. HIV blocking antibodies following immunisation with chimaeric peptides coding a short N-terminal sequence of the CCR5 receptor

    PubMed Central

    Chain, Benjamin M.; Noursadeghi, Mahdad; Gardener, Michelle; Tsang, Jhen; Wright, Edward

    2008-01-01

    The chemokine receptor CCR5 is required for cellular entry by many strains of HIV, and provides a potential target for molecules, including antibodies, designed to block HIV transmission. This study investigates a novel approach to stimulate antibodies to CCR5. Rabbits were immunised with chimaeric peptides which encode a short fragment of the N-terminal sequence of CCR5, as well as an unrelated T cell epitope from Tetanus toxoid. Immunisation with these chimaeric peptides generates a strong antibody response which is highly focused on the N-terminal CCR5 sequence. The antibody to the chimaeric peptide containing an N-terminal methionine also recognises the full length CCR5 receptor on the cell surface, albeit at higher concentrations. Further comparison of binding to intact CCR5 with binding to CCR5 peptide suggests that the receptor specific antibody generated represents a very small fragment of the total anti-peptide antibody. These findings are consistent with the hypothesis that the N-terminal peptide in the context of the intact receptor has a different structure to that of the synthetic peptide. Finally, the antibody was able to block HIV infection of macrophages in vitro. Thus the results of this study suggest that N-terminal fragments of CCR5 may provide potential immunogens with which to generate blocking antibodies to this receptor, while avoiding the dangers of including T cell auto-epitopes. PMID:18765264

  11. Molecular cloning of the goose ACSL3 and ACSL5 coding domain sequences and their expression characteristics during goose fatty liver development.

    PubMed

    He, H; Liu, H H; Wang, J W; Lv, J; Li, L; Pan, Z X

    2014-01-01

    It has been demonstrated that ACSL3 and ACSL5 play important roles in fat metabolism. To investigate the primary functions of ACSL3 and ACSL5 and to evaluate their expression levels during goose fatty liver development, we cloned the ACSL3 and ACSL5 coding domain sequences (CDSs) of geese using RT-PCR and analyzed their expression characteristics under different conditions using qRT-PCR. The results showed that the goose ACSL3 (JX511975) and ACSL5 (JX511976) sequences have high similarities with the chicken sequences both at the nucleotide and amino acid levels. Both ACSL3 and ACSL5 have high expression levels in goose liver. The expression levels of ACSL3 and ACSL5 in goose liver and hepatocytes can be changed by overfeeding geese and by treatment with unsaturated fatty acids, respectively. Together, these results indicate that ACSL3 and ACSL5 play important roles during fatty liver development. The different expression characteristics of goose ACSL3 and ACSL5 suggest that these two genes may be responsible for specific functions.

  12. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R).

    PubMed

    Derelle, Romain; Kondrashov, Fyodor A; Arkhipov, Vladimir Y; Corbel, Hélène; Frantz, Adrien; Gasparini, Julien; Jacquin, Lisa; Jacob, Gwenaël; Thibault, Sophie; Baudry, Emmanuelle

    2013-08-05

    Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them were associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons.

  13. Coding sequences and levels of expression of Hsc70t are identical in mice with different Orch-1 alleles

    SciTech Connect

    Snoek, M.; Vugt, H. van; Olavesen, M.G.; Milner, C.M.; Campbell, R.D.; Teuscher, C.

    1994-12-31

    Experimental allergic orchitis (EAO) is an autoimmune disease of the testis that is controlled by multiple genes. The use of recombinant mouse strains has defined the map position of the H-2-associated locus controlling disease susceptibility, Orch-1, within the H-2S/H-2D interval. Over the last few years the definition of the structural organization of the C4-H-2D segment and identification of the recombination sites of the various intra-H-2 recombinations has reduced the map position of Orch-1 to the Hsp70.1-G7 interval. Three Hsp70 genes, Hsp70.1, Hsp70.3, and Hsc70t, and the genes G7b and G7a are located in this segment of DNA. In order to investigate whether Hsc70t is a suitable candidate for Orch-1 we have compared the sequence of the gene from a susceptible and a resistant haplotype.

  14. New insights into flavivirus evolution, taxonomy and biogeographic history, extended by analysis of canonical and alternative coding sequences.

    PubMed

    Moureau, Gregory; Cook, Shelley; Lemey, Philippe; Nougairede, Antoine; Forrester, Naomi L; Khasnatinov, Maxim; Charrel, Remi N; Firth, Andrew E; Gould, Ernest A; de Lamballerie, Xavier

    2015-01-01

    To generate the most diverse phylogenetic dataset for the flaviviruses to date, we determined the genomic sequences and phylogenetic relationships of 14 flaviviruses, of which 10 are primarily associated with Culex spp. mosquitoes. We analyze these data, in conjunction with a comprehensive collection of flavivirus genomes, to characterize flavivirus evolutionary and biogeographic history in unprecedented detail and breadth. Based on the presumed introduction of yellow fever virus into the Americas via the transatlantic slave trade, we extrapolated a timescale for a relevant subset of flaviviruses whose evolutionary history shows that different Culex spp.-associated flaviviruses have been introduced from the Old World to the New World on at least five separate occasions, with 2 different sets of factors likely to have contributed to the dispersal of the different viruses. We also discuss the significance of programmed ribosomal frameshifting in a central region of the polyprotein open reading frame in some mosquito-associated flaviviruses.

  15. New Insights into Flavivirus Evolution, Taxonomy and Biogeographic History, Extended by Analysis of Canonical and Alternative Coding Sequences

    PubMed Central

    Moureau, Gregory; Cook, Shelley; Lemey, Philippe; Nougairede, Antoine; Forrester, Naomi L.; Khasnatinov, Maxim; Charrel, Remi N.; Firth, Andrew E.; Gould, Ernest A.; de Lamballerie, Xavier

    2015-01-01

    To generate the most diverse phylogenetic dataset for the flaviviruses to date, we determined the genomic sequences and phylogenetic relationships of 14 flaviviruses, of which 10 are primarily associated with Culex spp. mosquitoes. We analyze these data, in conjunction with a comprehensive collection of flavivirus genomes, to characterize flavivirus evolutionary and biogeographic history in unprecedented detail and breadth. Based on the presumed introduction of yellow fever virus into the Americas via the transatlantic slave trade, we extrapolated a timescale for a relevant subset of flaviviruses whose evolutionary history shows that different Culex spp.-associated flaviviruses have been introduced from the Old World to the New World on at least five separate occasions, with 2 different sets of factors likely to have contributed to the dispersal of the different viruses. We also discuss the significance of programmed ribosomal frameshifting in a central region of the polyprotein open reading frame in some mosquito-associated flaviviruses. PMID:25719412

  16. Sequence Diversity in Coding Regions of Candidate Genes in the Glycoalkaloid Biosynthetic Pathway of Wild Potato Species

    PubMed Central

    Manrique-Carpintero, Norma C.; Tokuhisa, James G.; Ginzberg, Idit; Holliday, Jason A.; Veilleux, Richard E.

    2013-01-01

    Natural variation in five candidate genes of the steroidal glycoalkaloid (SGA) metabolic pathway and whole-genome single nucleotide polymorphism (SNP) genotyping were studied in six wild [Solanum chacoense (chc 80-1), S. commersonii, S. demissum, S. sparsipilum, S. spegazzinii, S. stoloniferum] and cultivated S. tuberosum Group Phureja (phu DH) potato species with contrasting levels of SGAs. Amplicons were sequenced for five candidate genes: 3-hydroxy-3-methylglutaryl coenzyme A reductase 1 and 2 (HMG1, HMG2) and 2,3-squalene epoxidase (SQE) of primary metabolism, and solanidine galactosyltransferase (SGT1) and glucosyltransferase (SGT2) of secondary metabolism. SNPs (n = 337) producing 354 variations were detected within 3.7 kb of sequenced DNA. More polymorphisms were found in introns than exons and in genes of secondary compared to primary metabolism. Although no significant deviation from neutrality was found, dN/dS ratios < 1 and negative values of Tajima’s D test suggested purifying selection and genetic hitchhiking in the gene fragments. In addition, patterns of dN/dS ratios across the SGA pathway suggested constraint by natural selection. Comparison of nucleotide diversity estimates and dN/dS ratios showed stronger selective constraints for genes of primary rather than secondary metabolism. SNPs (n = 24) with an exclusive genotype for either phu DH (low SGA) or chc 80-1 (high SGA) were identified for HMG2, SQE, SGT1 and SGT2. The SolCAP 8303 Illumina Potato SNP chip genotyping revealed eight informative SNPs on six pseudochromosomes, with homozygous and heterozygous genotypes that discriminated high, intermediate and low levels of SGA accumulation. These results can be used to evaluate SGA accumulation in segregating or association mapping populations. PMID:23853090

  17. Deep Sequencing Reveals Predominant Expression of miR-21 Amongst the Small Non-Coding RNAs in Retinal Microvascular Endothelial Cells

    PubMed Central

    Guduric-Fuchs, Jasenka; O'Connor, Anna; Cullen, Angela; Harwood, Laura; Medina, Reinhold J; O'Neill, Christina L; Stitt, Alan W; Curtis, Tim M; Simpson, David A

    2012-01-01

    The retinal vascular endothelium is essential for angiogenesis and is involved in maintaining barrier selectivity and vascular tone. The aim of this study was to identify and quantify microRNAs and other small regulatory non-coding RNAs (ncRNAs) which may regulate these crucial functions. Primary bovine retinal microvascular endothelial cells (RMECs) provide a well-characterized in vitro system for studying angiogenesis. RNA extracted from RMECs was used to prepare a small RNA library for deep sequencing (Illumina Genome Analyzer). A total of 6.8 million reads were mapped to 250 known microRNAs in miRBase (release 16). In many cases, the most frequent isomiR differed from the sequence reported in miRBase. In addition, five novel microRNAs, 13 novel bovine orthologs of known human microRNAs and multiple new members of the miR-2284/2285 family were detected. Several ∼30 nucleotide sno-miRNAs were identified, with the most highly expressed being derived from snoRNA U78. Highly expressed microRNAs previously associated with endothelial cells included miR-126 and miR-378, but the most highly expressed was miR-21, comprising more than one-third of all mapped reads. Inhibition of miR-21 with an LNA inhibitor significantly reduced proliferation, migration, and tube-forming capacity of RMECs. The independence from prior sequence knowledge provided by deep sequencing facilitates analysis of novel microRNAs and other small RNAs. This approach also enables quantitative evaluation of microRNA expression, which has highlighted the predominance of a small number of microRNAs in RMECs. Knockdown of miR-21 suggests a role for this microRNA in regulation of angiogenesis in the retinal microvasculature. J. Cell. Biochem. 113: 2098–2111, 2012. © 2012 Wiley Periodicals, Inc. PMID:22298343

  18. Analysis of five presumptive protein-coding sequences clustered between the primosome genes, 41 and 61, of bacteriophages T4, T2, and T6.

    PubMed Central

    Selick, H E; Stormo, G D; Dyson, R L; Alberts, B M

    1993-01-01

    In bacteriophage T4, there is a strong tendency for genes that encode interacting proteins to be clustered on the chromosome. There is 1.6 kb of DNA between the DNA helicase (gene 41) and the DNA primase (gene 61) genes of this virus. The DNA sequence of this region suggests that it contains five genes, designated as open reading frames (ORFs) 61.1 to 61.5, predicted to encode proteins ranging in size from 5.94 to 22.88 kDa. Are these ORFs actually genes? As one test, we compared the DNA sequence of this region in bacteriophages T2, T4, and T6 and found that ORFs 61.1, 61.3, 61.4, and 61.5 are highly conserved among the three closely related viruses. In contrast, ORF 61.2 is conserved between phages T4 and T6 yet is absent from phage T2, where it is replaced by another ORF, T2 ORF 61.2, which is not found in the T4 and T6 genomes. As a second, independent test for coding sequences, we calculated the codon base position preferences for all ORFs in this region that could encode proteins that contain at least 30 amino acids. Both the T4/T6 and T2 versions of ORF 61.2, as well as the other ORFs, have codon base position preferences that are indistinguishable from those of known T4 genes (coefficients of 0.81 to 0.94); the six other possible ORFs of at least 90 bp in this region are ruled out as genes by this test (coefficients less than zero). Thus, both evolutionary conservation and codon usage patterns lead us to conclude that ORFs 61.1 to 61.5 represent important protein-coding sequences for this family of bacteriophages. Because they are located between the genes that encode the two interacting proteins of the T4 primosome (DNA helicase plus DNA primase), one or more may function in DNA replication by modulating primosome function. PMID:8383243

  19. Reticulate evolution and incomplete lineage sorting among the ponderosa pines.

    PubMed

    Willyard, Ann; Cronn, Richard; Liston, Aaron

    2009-08-01

    Interspecific gene flow via hybridization may play a major role in evolution by creating reticulate rather than hierarchical lineages in plant species. Occasional diploid pine hybrids indicate the potential for introgression, but reticulation is hard to detect because ancestral polymorphism is still shared across many groups of pine species. Nucleotide sequences for 53 accessions from 17 species in subsection Ponderosae (Pinus) provide evidence for reticulate evolution. Two discordant patterns among independent low-copy nuclear gene trees and a chloroplast haplotype are better explained by introgression than incomplete lineage sorting or other causes of incongruence. Conflicting resolution of three monophyletic Pinus coulteri accessions is best explained by ancient introgression followed by a genetic bottleneck. More recent hybridization transferred a chloroplast from P. jeffreyi to a sympatric P. washoensis individual. We conclude that incomplete lineage sorting could account for other examples of non-monophyly, and caution against any analysis based on single-accession or single-locus sampling in Pinus.

  20. Human transforming growth factor type α coding sequence is not a direct-acting oncogene when overexpressed in NIH 3T3 cells

    SciTech Connect

    Finzi, E.; Fleming, T.; Segatto, O.; Pennington, C.Y.; Bringman, T.S.; Derynck, R.; Aaronson, S.A.

    1987-06-01

    A peptide secreted by some tumor cells in vitro imparts anchorage-independent growth to normal rat kidney (NRK) cells and has been termed transforming growth factor type α (TGF-α). To directly investigate the transforming properties of this factor, the human sequence coding for TGF-α was placed under the control of either a metallothionein promoter or a retroviral long terminal repeat. These constructs failed to induce morphological transformation upon transfection of NIH 3T3 cells, whereas viral oncogenes encoding a truncated form of its cognate receptor, the EGF receptor, or another growth factor, sis/platelet-derived growth factor 2, efficiently induced transformed foci. Binding assays were done using (¹²⁵I)-EGF. When NIH 3T3 clonal sublines were selected by transfection of TGF-α expression vectors in the presence of a dominant selectable marker, they were shown to secrete large amounts of TGF-α into the medium, to have downregulated EGF receptors, and to be inhibited in growth by TGF-α monoclonal antibody. These results indicated that secreted TGF-α interacts with its receptor at a cell surface location. Single cell-derived TGF-α-expressing sublines grew to high saturation density in culture. These and other results imply that TGF-α exerts a growth-promoting effect on the entire NIH 3T3 cell population after secretion into the medium but little, if any, effect on the individual cell synthesizing this factor. It is concluded that the normal coding sequence for TGF-α is not a direct-acting oncogene when overexpressed in NIH 3T3 cells.

  1. The human transforming growth factor type alpha coding sequence is not a direct-acting oncogene when overexpressed in NIH 3T3 cells.

    PubMed Central

    Finzi, E; Fleming, T; Segatto, O; Pennington, C Y; Bringman, T S; Derynck, R; Aaronson, S A

    1987-01-01

    A peptide secreted by some tumor cells in vitro imparts anchorage-independent growth to normal rat kidney (NRK) cells and has been termed transforming growth factor type alpha (TGF-alpha). To directly investigate the transforming properties of this factor, the human sequence coding for TGF-alpha was placed under the control of either a metallothionein promoter or a retroviral long terminal repeat. These constructs failed to induce morphological transformation upon transfection of NIH 3T3 cells, whereas viral oncogenes encoding a truncated form of its cognate receptor, the EGF receptor, or another growth factor, sis/platelet-derived growth factor 2, efficiently induced transformed foci. When NIH 3T3 clonal sublines were selected by transfection of TGF-alpha expression vectors in the presence of a dominant selectable marker, they were shown to secrete large amounts of TGF-alpha into the medium, to have downregulated EGF receptors, and to be inhibited in growth by TGF-alpha monoclonal antibody. These results indicated that secreted TGF-alpha interacts with its receptor at a cell surface location. Single cell-derived TGF-alpha-expressing sublines grew to high saturation density in culture. However, when plated as single cells on contact-inhibited monolayers of NIH 3T3 cells, they failed to form colonies, whereas v-sis- and v-erbB-transfected cells formed transformed colonies under the same conditions. Moreover, TGF-alpha-expressing sublines were not tumorigenic in nude mice. These and other results imply that TGF-alpha exerts a growth-promoting effect on the entire NIH 3T3 cell population after secretion into the medium but little, if any, effect on the individual cell synthesizing this factor. It is concluded that the normal coding sequence for TGF-alpha is not a direct-acting oncogene when overexpressed in NIH 3T3 cells. PMID:3035551

  2. Investigation of complete and incomplete fusion in 20Ne + 51V system using recoil range measurement

    NASA Astrophysics Data System (ADS)

    Ali, Sabir; Ahmad, Tauseef; Kumar, Kamal; Rizvi, I. A.; Agarwal, Avinash; Ghugre, S. S.; Sinha, A. K.; Chaubey, A. K.

    2015-01-01

    Recoil range distributions of evaporation residues, populated in the 20Ne + 51V reaction at Elab ≈ 145 MeV, have been studied to determine the degree of momentum transferred through the complete and incomplete fusion reactions. Evaporation residues (ERs) populated through the complete and incomplete fusion reactions have been identified on the basis of their recoil range in the Al catcher medium. Measured recoil ranges of the evaporation residues have been compared with the theoretical values calculated using the code SRIM. Range-integrated cross sections of the observed ERs have been compared with the values predicted by the statistical model code PACE4.

  3. Analysing how negative emotions emerge and are addressed in veterinary consultations, using the Verona Coding Definitions of Emotional Sequences (VR-CoDES).

    PubMed

    Vijfhuizen, Malou; Bok, Harold; Matthew, Susan M; Del Piccolo, Lidia; McArthur, Michelle

    2017-04-01

    To explore the applicability, need for modifications and reliability of the VR-CoDES in a veterinary setting while also gaining a deeper understanding of clients' expressions of negative emotion and how they are addressed by veterinarians. The Verona Coding Definitions of Emotional Sequences for client cues and concerns (VR-CoDES-CC) and health provider responses (VR-CoDES-P) were used to analyse 20 audiotaped veterinary consultations. Inter-rater reliability was established. The applicability of definitions of the VR-CoDES was identified, together with the need for specific modifications to suit veterinary consultations. The VR-CoDES-CC and VR-CoDES-P generally applied to veterinary consultations. Cue and concern reliability was found satisfactory for most types of cues, but not for concerns. Response reliability was satisfactory for explicitness, and for providing and reducing space for further disclosure. Modifications to the original coding system were necessary to accurately reflect the veterinary context and included minor additions to the VR-CoDES-CC. Using minor additions to the VR-CoDES including guilt, reassurance and cost discussions it can be reliably adopted to assess clients' implicit expressions of negative emotion and veterinarians' responses. The modified VR-CoDES could be of great value when combined with existing frameworks used for teaching and researching veterinary communication. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. Variations in endothelin receptor B subtype 2 (EDNRB2) coding sequences and mRNA expression levels in 4 Muscovy duck plumage colour phenotypes.

    PubMed

    Wu, N; Qin, H; Wang, M; Bian, Y; Dong, B; Sun, G; Zhao, W; Chang, G; Xu, Q; Chen, G

    2017-04-01

    1. Endothelin receptor B subtype 2 (EDNRB2) is a paralog of EDNRB, which encodes a 7-transmembrane G-protein coupled receptor. Previous studies reported that EDNRB was essential for melanoblast migration in mammals and ducks. 2. Muscovy ducks have different plumage colour phenotypes. Variations in EDNRB2 coding sequences (CDSs) and mRNA expression levels were investigated in 4 different Muscovy duck plumage colour phenotypes, including black, black mutant, silver and white head. 3. The EDNRB2 gene from Muscovy duck was cloned; it had a length of 6435 bp and encoded 437 amino acids. The coding region was screened and potential single nucleotide polymorphisms were identified. Eight mutations were obtained, including one missense variant (c.64C > T) and 7 synonymous substitutions. The substitutions were associated with plumage colour phenotypes. 4. The EDNRB2 mRNA expression levels were compared between feather pulp from black birds and black mutant birds. The results indicated that EDNRB2 transcripts in feather pulp were significantly higher in black feathers than in white feathers. 5. The results determined the variation of EDNRB2 CDS and mRNA expression in Muscovy ducks of various plumage colours.

  5. Older persons' expressions of emotional cues and concerns during home care visits. Application of the Verona coding definitions of emotional sequences (VR-CoDES) in home care.

    PubMed

    Sundler, Annelie J; Höglander, Jessica; Eklund, Jakob Håkansson; Eide, Hilde; Holmström, Inger K

    2017-02-01

    This study aims to a) explore to what extent older persons express emotional cues and concerns during home care visits; b) describe what cues and concerns these older persons expressed, and c) explore who initiated these cues and concerns. A descriptive and cross-sectional study was conducted. Data consisted of 188 audio recorded home care visits with older persons and registered nurses or nurse assistants, coded with the Verona coding definitions on emotional sequences (VR-CoDES). Emotional expressions of cues and concerns occurred in 95 (51%) of the 188 recorded home care visits. Most frequent were implicit expressions of cues (n=292) rather than explicit concerns (n=24). Utterances with hints to hidden concerns (63.9%, n=202) were most prevalent, followed by vague or unspecific expressions of emotional worries (15.8%, n=50). Most of these were elicited by the nursing staff (63%, n=200). Emotional needs expressed by the older persons receiving home care were mainly communicated implicitly. To be attentive to such vaguely expressed emotions may demand nursing staff to be sensitive and open. The VR-CoDES can be applied on audio recorded home care visits to analyse verbal and emotional communication, and may allow comparative research. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  6. The miniature genome of a carnivorous plant Genlisea aurea contains a low number of genes and short non-coding sequences.

    PubMed

    Leushkin, Evgeny V; Sutormin, Roman A; Nabieva, Elena R; Penin, Aleksey A; Kondrashov, Alexey S; Logacheva, Maria D

    2013-07-15

    Genlisea aurea (Lentibulariaceae) is a carnivorous plant with an unusually small genome (63.6 Mb), one of the smallest known among higher plants. Data on the genome sizes and the phylogeny of Genlisea suggest that this is a derived state within the genus. Thus, G. aurea is an excellent model organism for studying evolutionary mechanisms of genome contraction. Here we report sequencing and de novo draft assembly of the G. aurea genome. The assembly consists of 10,687 contigs with a total length of 43.4 Mb and includes 17,755 complete and partial protein-coding genes. Its comparison with the genome of Mimulus guttatus, another representative of the higher core Lamiales clade, reveals striking differences in gene content and length of non-coding regions. Genome contraction was a complex process, which involved gene loss and reduction of lengths of introns and intergenic regions, but not intron loss. The gene loss is more frequent for the genes that belong to multigenic families, indicating that genetic redundancy is an important prerequisite for genome size reduction.

  7. Cloning, sequence analysis, and expression in Escherichia coli of a gene coding for a beta-mannanase from the extremely thermophilic bacterium "Caldocellum saccharolyticum".

    PubMed Central

    Lüthi, E; Jasmat, N B; Grayling, R A; Love, D R; Bergquist, P L

    1991-01-01

    A lambda recombinant phage expressing beta-mannanase activity in Escherichia coli has been isolated from a genomic library of the extremely thermophilic anaerobe "Caldocellum saccharolyticum." The gene was cloned into pBR322 on a 5-kb BamHI fragment, and its location was obtained by deletion analysis. The sequence of a 2.1-kb fragment containing the mannanase gene has been determined. One open reading frame was found which could code for a protein of Mr 38,904. The mannanase gene (manA) was overexpressed in E. coli by cloning the gene downstream from the lacZ promoter of pUC18. The enzyme was most active at pH 6 and 80 degrees C and degraded locust bean gum, guar gum, Pinus radiata glucomannan, and konjak glucomannan. The noncoding region downstream from the mannanase gene showed strong homology to celB, a gene coding for a cellulase from the same organism, suggesting that the manA gene might have been inserted into its present position on the "C. saccharolyticum" genome by homologous recombination. PMID:2039230

  8. Plasmid- and chromosome-coded aerobactin synthesis in enteric bacteria: insertion sequences flank operon in plasmid-mediated systems.

    PubMed Central

    McDougall, S; Neilands, J B

    1984-01-01

    Large plasmids were detected in three aerobactin-producing enteric bacterial strains (Aerobacter aerogenes 62-I, Salmonella arizona SA1, and S. arizona SL5301) and designated pSMN1, pSMN2, and pSMN3, respectively. Other Salmonella spp., namely, S. arizona SL5302, S. arizona SLS, Salmonella austin, and Salmonella memphis, formed aerobactin but contained no detectable large plasmids. S. arizona SL5283 made no aerobactin. A probe consisting of the aerobactin biosynthetic genes cloned on plasmid pABN5 hybridized to a HindIII digest of pSMN1 but not to digests of pSMN2 or pSMN3. A larger probe, the insert of pABN1 containing the complete aerobactin operon, hybridized to four fragments in HindIII digests of the parent plasmid, pColV-K30. A 2.0-kilobase PvuII fragment responsible for this multiple-hybridization pattern was cloned into vector pUC9 to form pSMN30. The latter was mapped and shown to correspond to either IS1 or to a closely related insertion sequence. PMID:6330037

  9. Sequence analysis of coding DNA fragments of pfcrt and pfmdr-1 genes in Plasmodium falciparum isolates from Odisha, India.

    PubMed

    Sutar, Sasmita Kumari Das; Gupta, Bhavna; Ranjit, Manoranjan; Kar, Shantanu Kumar; Das, Aparup

    2011-02-01

    The global emergence and spread of malaria parasites resistant to antimalarial drugs is the major problem in malaria control. The genetic basis of the parasite's resistance to the antimalarial drug chloroquine (CQ) is well-documented, allowing for the analysis of field isolates of malaria parasites to address evolutionary questions concerning the origin and spread of CQ-resistance. Here, we present DNA sequence analyses of both the second exon of the Plasmodium falciparum CQ-resistance transporter (pfcrt) gene and the 5' end of the P. falciparum multidrug-resistance 1 (pfmdr-1) gene in 40 P. falciparum field isolates collected from eight different localities of Odisha, India. First, we genotyped the samples for the pfcrt K76T and pfmdr-1 N86Y mutations in these two genes, which are the mutations primarily implicated in CQ-resistance. We further analyzed amino acid changes in codons 72-76 of the pfcrt haplotypes. Interestingly, both the K76T and N86Y mutations were found to co-exist in 32 out of the total 40 isolates, which were of either the CVIET or SVMNT haplotype, while the remaining eight isolates were of the CVMNK haplotype. In total, eight nonsynonymous single nucleotide polymorphisms (SNPs) were observed, six in the pfcrt gene and two in the pfmdr-1 gene. One poorly studied SNP in the pfcrt gene (A97T) was found at a high frequency in many P. falciparum samples. Using population genetics to analyze these two gene fragments, we revealed comparatively higher nucleotide diversity in the pfcrt gene than in the pfmdr-1 gene. Furthermore, linkage disequilibrium was found to be tight between closely spaced SNPs of the pfcrt gene. Finally, both the pfcrt and the pfmdr-1 genes were found to evolve under the standard neutral model of molecular evolution.

  10. 7 CFR 763.8 - Incomplete applications.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... days of receipt of an incomplete application, the Agency will provide the seller and buyer written notice of any additional information that must be provided. The seller or buyer, as applicable,...

  11. Identification of the 3' and 5' terminal sequences of the 8 RNA genome segments of European and North American genotypes of infectious salmon anemia virus (an orthomyxovirus) and evidence for quasispecies based on the non-coding sequences of transcripts

    PubMed Central

    2010-01-01

    Background: Infectious salmon anemia (ISA) virus (ISAV) is a pathogen of marine-farmed Atlantic salmon (Salmo salar); a disease first diagnosed in Norway in 1984. This virus, which was first characterized following its isolation in cell culture in 1995, belongs to the family Orthomyxoviridae, genus Isavirus. The Isavirus genome consists of eight single-stranded RNA segments of negative sense, each with one to three open reading frames flanked by 3' and 5' non-coding regions (NCRs). Although the terminal sequences of other members of the family Orthomyxoviridae such as Influenzavirus A have been extensively analyzed, those of Isavirus remain largely unknown, and the few reported are from different ISAV strains and on different ends of the different RNA segments. This paper describes a comprehensive analysis of the 3' and 5' end sequences of the eight RNA segments of ISAV of both European and North American genotypes, and evidence of quasispecies of ISAV based on sequence variation in the untranslated regions (UTRs) of transcripts. Results: Two different ISAV strains and two different RNA preparations were used in this study. ISAV strain ADL-PM 3205 ISAV-07 (ADL-ISAV-07) of European genotype was the source of total RNA extracted from ISAV-infected TO cells, which contained both viral mRNA and cRNA. ISAV strain NBISA01 of North American genotype was the source of vRNA extracted from purified virus. The NCRs of each segment were identified by sequencing cDNA prepared by three different methods, 5' RACE (Rapid amplification of cDNA ends), 3' RACE, and RNA ligation mediated PCR. Sequence analysis of five clones each derived from one RT-PCR product from each NCR of ISAV transcripts of segments 1 to 8 revealed significant heterogeneity among the clones of the same segment end, providing unequivocal evidence for presence of intra-segment ISAV quasispecies. Both RNA preparations (mRNA/cRNA and vRNA) yielded complementary sequence information, allowing the simultaneous

  12. Genome-Wide Detection of Predicted Non-coding RNAs Related to the Adhesion Process in Vibrio alginolyticus Using High-Throughput Sequencing

    PubMed Central

    Huang, Lixing; Hu, Jiao; Su, Yongquan; Qin, Yingxue; Kong, Wendi; Zhao, Lingmin; Ma, Ying; Xu, Xiaojin; Lin, Mao; Zheng, Jiang; Yan, Qingpi

    2016-01-01

    The ability of bacteria to adhere to fish mucus can be affected by environmental conditions and is considered to be a key virulence factor of Vibrio alginolyticus. However, the molecular mechanism underlying this ability remains unclear. Our previous study showed that stress conditions such as exposure to Cu, Pb, Hg, and low pH are capable of reducing the adhesion ability of V. alginolyticus. Non-coding RNAs (ncRNAs) play a crucial role in the intricate regulation of bacterial gene expression, thereby affecting bacterial pathogenicity. Thus, we hypothesized that ncRNAs play a key role in the V. alginolyticus adhesion process. To validate this, we combined high-throughput sequencing with computational techniques to detect ncRNA dynamics in samples after stress treatments. The expression of randomly selected novel ncRNAs was confirmed by QPCR. Among the significantly altered ncRNAs, 30 were up-regulated and 2 down-regulated by all stress treatments. The QPCR results reinforced the reliability of the sequencing data. Target prediction and KEGG pathway analysis indicated that these ncRNAs are closely related to pathways associated with in vitro adhesion, and our results indicated that chemical stress-induced reductions in the adhesion ability of V. alginolyticus might be due to the perturbation of ncRNA expression. Our findings provide important information for further functional characterization of ncRNAs during the adhesion process of V. alginolyticus. PMID:27199948

  13. SHAPE Analysis of the RNA Secondary Structure of the Mouse Hepatitis Virus 5′ Untranslated Region and N-Terminal Nsp1 Coding Sequences

    PubMed Central

    Yang, Dong; Liu, Pinghua; Wudeck, Elyse V.; Giedroc, David P.; Leibowitz, Julian L.

    2014-01-01

    SHAPE technology was used to analyze RNA secondary structure of the 5′ most 474 nts of the MHV-A59 genome encompassing the minimal 5′ cis-acting region required for defective interfering RNA replication. The structures generated were in agreement with previous characterizations of SL1 through SL4 and two recently predicted secondary structure elements, S5 and SL5A. SHAPE provided biochemical support for four additional stem-loops not previously functionally investigated in MHV. Secondary structure predictions for 5′ regions of MHV-A59, BCoV and SARS-CoV were similar despite high sequence divergence. The pattern of SHAPE reactivity of in virio genomic RNA, ex virio genomic RNA, and in vitro synthesized RNA were similar, suggesting that binding of N protein or other proteins to virion RNA fails to protect the RNA from reaction with lipid permeable SHAPE reagent. Reverse genetic experiments suggested that SL5C and SL6 within the nsp1 coding sequence are not required for viral replication. PMID:25462342

  14. The Human CCHC-type Zinc Finger Nucleic Acid-Binding Protein Binds G-Rich Elements in Target mRNA Coding Sequences and Promotes Translation.

    PubMed

    Benhalevy, Daniel; Gupta, Sanjay K; Danan, Charles H; Ghosal, Suman; Sun, Hong-Wei; Kazemier, Hinke G; Paeschke, Katrin; Hafner, Markus; Juranek, Stefan A

    2017-03-21

    The CCHC-type zinc finger nucleic acid-binding protein (CNBP/ZNF9) is conserved in eukaryotes and is essential for embryonic development in mammals. It has been implicated in transcriptional, as well as post-transcriptional, gene regulation; however, its nucleic acid ligands and molecular function remain elusive. Here, we use multiple systems-wide approaches to identify CNBP targets and function. We used photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) to identify 8,420 CNBP binding sites on 4,178 mRNAs. CNBP preferentially bound G-rich elements in the target mRNA coding sequences, most of which were previously found to form G-quadruplex and other stable structures in vitro. Functional analyses, including RNA sequencing, ribosome profiling, and quantitative mass spectrometry, revealed that CNBP binding did not influence target mRNA abundance but rather increased their translational efficiency. Considering that CNBP binding prevented G-quadruplex structure formation in vitro, we hypothesize that CNBP is supporting translation by resolving stable structures on mRNAs.

  15. Functional Anthology of Intrinsic Disorder. II. Cellular Components, Domains, Technical Terms, Developmental Processes and Coding Sequence Diversities Correlated with Long Disordered Regions

    PubMed Central

    Vucetic, Slobodan; Xie, Hongbo; Iakoucheva, Lilia M.; Oldfield, Christopher J.; Dunker, A. Keith; Obradov