Sample records for coding sequence incompleteness

  1. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    PubMed

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which are a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  2. CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

    PubMed

    Hazes, Bart

    2014-02-28

    Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses, but it makes it more challenging to prepare high-quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist, but manual intervention remains a common and time-consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in GenBank. CDSbank also stores GenBank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extraction of protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
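
    The 5' and 3' incompleteness flags mentioned above correspond, in GenBank feature tables, to the `<` and `>` markers inside a feature location string. The following is a minimal sketch of reading those markers; it is not CDSbank's own code. The function name `partial_ends` is hypothetical, and only simple locations and whole-feature `complement(...)` wrappers are handled.

```python
def partial_ends(location):
    """Return (five_prime_partial, three_prime_partial) for a location string.

    Assumption: the string follows the GenBank/INSDC convention in which
    '<' marks a feature running past the low coordinate and '>' marks a
    feature running past the high coordinate.
    """
    text = location.strip()
    past_low = "<" in text
    past_high = ">" in text
    # On the complement strand the high coordinate is the 5' end of the
    # feature, so the meaning of the two markers is swapped.
    if text.startswith("complement("):
        return past_high, past_low
    return past_low, past_high


if __name__ == "__main__":
    for loc in ("<1..300", "1..>300", "complement(<1..300)", "join(10..50,80..120)"):
        print(loc, partial_ends(loc))
```

    A filter that keeps only complete coding sequences would retain records for which both returned values are false.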

  3. Motion Detection in Ultrasound Image-Sequences Using Tensor Voting

    NASA Astrophysics Data System (ADS)

    Inba, Masafumi; Yanagida, Hirotaka; Tamura, Yasutaka

    2008-05-01

    Motion detection in ultrasound image sequences using tensor voting is described. We have been developing an ultrasound imaging system adopting a combination of coded excitation and synthetic aperture focusing techniques. With our method, the frame rate of the system at a distance of 150 mm reaches 5000 frames/s. A sparse array and short-duration coded ultrasound signals are used for high-speed data acquisition. However, many artifacts appear in the reconstructed image sequences because of the incompleteness of the transmitted code. To reduce the artifacts, we have examined the application of tensor voting to the imaging method, which adopts both coded excitation and synthetic aperture techniques. In this study, the basis of applying tensor voting and the motion detection method to ultrasound images is derived. It was confirmed that velocity detection and feature enhancement are possible using tensor voting in the time and space of simulated ultrasound three-dimensional image sequences.

  4. Curated eutherian third party data gene data sets.

    PubMed

    Premzl, Marko

    2016-03-01

    The freely available eutherian genomic sequence data sets advanced the scientific field of genomics. Of note, future revisions of gene data sets were expected, owing to the incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in the European Nucleotide Archive as curated third party data gene data sets.

  5. Identification and Classification of New Transcripts in Dorper and Small-Tailed Han Sheep Skeletal Muscle Transcriptomes.

    PubMed

    Chao, Tianle; Wang, Guizhi; Wang, Jianmin; Liu, Zhaohua; Ji, Zhibin; Hou, Lei; Zhang, Chunlan

    2016-01-01

    High-throughput mRNA sequencing enables the discovery of new transcripts and additional parts of incompletely annotated transcripts. Compared with the human and cow genomes, the reference annotation level of the sheep genome is still low. An investigation of new transcripts in sheep skeletal muscle will improve our understanding of muscle development. Therefore, applying high-throughput sequencing, two cDNA libraries from the biceps brachii of small-tailed Han sheep and Dorper sheep were constructed, and whole-transcriptome analysis was performed to determine the unknown transcript catalogue of this tissue. In this study, 40,129 transcripts were finally mapped to the sheep genome. Among them, 3,467 transcripts were determined to be unannotated in the current reference sheep genome and were defined as new transcripts. Based on protein-coding capacity prediction and comparative analysis of sequence similarity, 246 transcripts were classified as portions of unannotated genes or incompletely annotated genes. Another 1,520 transcripts were predicted with high confidence to be long non-coding RNAs. Our analysis also revealed 334 new transcripts that displayed specific expression in ruminants and uncovered a number of new transcripts without intergenus homology but with specific expression in sheep skeletal muscle. The results confirmed a complex transcript pattern of coding and non-coding RNA in sheep skeletal muscle. This study provided important information concerning the sheep genome and transcriptome annotation, which could provide a basis for further study.

  6. Complete mitochondrial genome of Bactrocera arecae (Insecta: Tephritidae) by next-generation sequencing and molecular phylogeny of Dacini tribe

    PubMed Central

    Yong, Hoi-Sen; Song, Sze-Looi; Lim, Phaik-Eem; Chan, Kok-Gan; Chow, Wan-Loo; Eamsobhana, Praphathip

    2015-01-01

    The whole mitochondrial genome of the pest fruit fly Bactrocera arecae was obtained from next-generation sequencing of genomic DNA. It had a total length of 15,900 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The control region (952 bp) was flanked by rrnS and trnI genes. The start codons included 6 ATG, 3 ATT and 1 each of ATA, ATC, GTG and TCG. Eight TAA, two TAG, one incomplete TA and two incomplete T stop codons were represented in the protein-coding genes. The cloverleaf structure for trnS1 lacked the D-loop, and that of trnN and trnF lacked the TΨC-loop. Molecular phylogeny based on 13 protein-coding genes was concordant with 37 mitochondrial genes, with B. arecae having closest genetic affinity to B. tryoni. The subgenus Bactrocera of Dacini tribe and the Dacinae subfamily (Dacini and Ceratitidini tribes) were monophyletic. The whole mitogenome of B. arecae will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general. PMID:26472633
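
    Incomplete T and TA stop codons such as those reported here are generally understood to be completed to TAA by polyadenylation of the transcript. A minimal sketch of classifying the 3' end of an annotated mitochondrial CDS follows. The function name `classify_stop` and the default stop set (TAA and TAG, the two complete stops named in the abstract) are assumptions for illustration; other genetic codes would need a different set.

```python
def classify_stop(cds, stops=frozenset({"TAA", "TAG"})):
    """Classify the 3' end of a coding sequence.

    Returns the complete stop codon (e.g. "TAA"), "incomplete T",
    "incomplete TA", or "none" when no stop signal is recognised.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        # Length is a whole number of codons: look for a complete stop.
        last_codon = cds[-3:]
        return last_codon if last_codon in stops else "none"
    # One or two bases are left over after the last full codon.
    tail = cds[-remainder:]
    if tail in ("T", "TA"):
        return "incomplete " + tail
    return "none"


if __name__ == "__main__":
    for example in ("ATGAAATAA", "ATGAAATA", "ATGAAAT", "ATGAAACCC"):
        print(example, "->", classify_stop(example))
```

    The check relies on the annotated CDS length: a sequence whose length is not a multiple of three and that ends in T or TA is treated as carrying a truncated stop.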

  7. When Genomics Is Not Enough: Experimental Evidence for a Decrease in LINE-1 Activity During the Evolution of Australian Marsupials

    PubMed Central

    Gallus, Susanne; Lammers, Fritjof

    2016-01-01

    The autonomous transposable element LINE-1 is a highly abundant element that makes up between 15% and 20% of therian mammal genomes. Since their origin before the divergence of marsupials and placental mammals, LINE-1 elements have contributed actively to the genome landscape. A previous in silico screen of the Tasmanian devil genome revealed a lack of functional coding LINE-1 sequences. In this study we present the results of an in vitro analysis from a partial LINE-1 reverse transcriptase coding sequence in five marsupial species. Our experimental screen supports the in silico findings of the genome-wide degradation of LINE-1 sequences in the Tasmanian devil, and identifies a high frequency of degraded LINE-1 sequences in other Australian marsupials. The comparison between the experimentally obtained LINE-1 sequences and reference genome assemblies suggests that conclusions from in silico analyses of retrotransposition activity can be influenced by incomplete genome assemblies from short reads. PMID:27389686

  8. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    PubMed

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-02-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.

  9. Cloning and expression of a cDNA coding for a human monocyte-derived plasminogen activator inhibitor.

    PubMed Central

    Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P

    1988-01-01

    Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. PMID:3257578

  10. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  11. Sequence Analysis of Mitochondrial Genome of Toxascaris leonina from a South China Tiger.

    PubMed

    Li, Kangxin; Yang, Fang; Abdullahi, A Y; Song, Meiran; Shi, Xianli; Wang, Minwei; Fu, Yeqi; Pan, Weida; Shan, Fang; Chen, Wu; Li, Guoqing

    2016-12-01

    Toxascaris leonina is a common parasitic nematode of wild mammals and has significant impacts on the protection of rare wild animals. To analyze population genetic characteristics of T. leonina from South China tiger, its mitochondrial (mt) genome was sequenced. Its complete circular mt genome was 14,277 bp in length, including 12 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 2 non-coding regions. The nucleotide composition was biased toward A and T. The most common start codon and stop codon were TTG and TAG, respectively, and 4 genes ended with an incomplete stop codon. There were 13 intergenic regions ranging from 1 to 10 bp in size. Phylogenetically, T. leonina from a South China tiger was close to canine T. leonina. This study reports for the first time a complete mt genome sequence of T. leonina from the South China tiger, and provides a scientific basis for studying the genetic diversity of nematodes between different hosts.

  12. The flaA locus of Bacillus subtilis is part of a large operon coding for flagellar structures, motility functions, and an ATPase-like polypeptide.

    PubMed Central

    Albertini, A M; Caramori, T; Crabb, W D; Scoffone, F; Galizzi, A

    1991-01-01

    We cloned and sequenced 8.3 kb of Bacillus subtilis DNA corresponding to the flaA locus involved in flagellar biosynthesis, motility, and chemotaxis. The DNA sequence revealed the presence of 10 complete and 2 incomplete open reading frames. Comparison of the deduced amino acid sequences to data banks showed similarities of nine of the deduced products to a number of proteins of Escherichia coli and Salmonella typhimurium for which a role in flagellar functioning has been directly demonstrated. In particular, the sequence data suggest that the flaA operon codes for the M-ring protein, components of the motor switch, and the distal part of the basal-body rod. The gene order is remarkably similar to that described for region III of the enterobacterial flagellar regulon. One of the open reading frames was translated into a protein with 48% amino acid identity to S. typhimurium FliI and 29% identity to the beta subunit of E. coli ATP synthase. PMID:1828465
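
    The distinction drawn above between complete and incomplete open reading frames in a cloned fragment can be illustrated with a small scanner. This is a minimal sketch, not the authors' procedure: the function name `orfs_in_frame` is hypothetical, only ATG is treated as a start codon (bacteria also use alternatives such as GTG and TTG), only the forward strand is scanned, and only 3' truncation is detected.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(seq, frame, min_codons=2):
    """List ORFs in one forward reading frame of a DNA fragment.

    Each result is (start, end, status) with 0-based, end-exclusive
    coordinates. status is "complete" when an in-frame stop codon is
    found, and "incomplete_3prime" when the ORF runs off the end of the
    fragment before any stop codon.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    results = []
    start = None  # codon index at which the current ORF opened
    for idx, codon in enumerate(codons):
        if start is None and codon == "ATG":
            start = idx
        elif start is not None and codon in STOPS:
            if idx - start >= min_codons:
                results.append((frame + 3 * start, frame + 3 * (idx + 1), "complete"))
            start = None
    # An ORF still open at the end of the fragment lacks its stop codon.
    if start is not None and len(codons) - start >= min_codons:
        results.append((frame + 3 * start, frame + 3 * len(codons), "incomplete_3prime"))
    return results


if __name__ == "__main__":
    fragment = "ATGAAATAGCCATGGGG"
    for frame in range(3):
        print(frame, orfs_in_frame(fragment, frame))
```

    Detecting a 5'-truncated frame (one that is open from the first base of the fragment with no start codon) would need an additional rule, which is left out here.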

  13. High-quality draft genome sequence of the Thermus amyloliquefaciens type strain YIM 77409 T with an incomplete denitrification pathway

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, En-Min; Murugapiran, Senthil K.; Mefferd, Chrisabelle C.

    Thermus amyloliquefaciens type strain YIM 77409 T is a thermophilic, Gram-negative, non-motile and rod-shaped bacterium isolated from Niujie Hot Spring in Eryuan County, Yunnan Province, southwest China. In the present study we describe the features of strain YIM 77409 T together with its genome sequence and annotation. The genome is 2,160,855 bp long and consists of 6 scaffolds with 67.4 % average GC content. A total of 2,313 genes were predicted, comprising 2,257 protein-coding and 56 RNA genes. The genome is predicted to encode a complete glycolysis, pentose phosphate pathway, and tricarboxylic acid cycle. Additionally, a large number of transporters and enzymes for heterotrophy highlight the broad heterotrophic lifestyle of this organism. Furthermore, a denitrification gene cluster included genes predicted to encode enzymes for the sequential reduction of nitrate to nitrous oxide, consistent with the incomplete denitrification phenotype of this strain.

  14. High-quality draft genome sequence of the Thermus amyloliquefaciens type strain YIM 77409 T with an incomplete denitrification pathway

    DOE PAGES

    Zhou, En-Min; Murugapiran, Senthil K.; Mefferd, Chrisabelle C.; ...

    2016-02-27

    Thermus amyloliquefaciens type strain YIM 77409 T is a thermophilic, Gram-negative, non-motile and rod-shaped bacterium isolated from Niujie Hot Spring in Eryuan County, Yunnan Province, southwest China. In the present study we describe the features of strain YIM 77409 T together with its genome sequence and annotation. The genome is 2,160,855 bp long and consists of 6 scaffolds with 67.4 % average GC content. A total of 2,313 genes were predicted, comprising 2,257 protein-coding and 56 RNA genes. The genome is predicted to encode a complete glycolysis, pentose phosphate pathway, and tricarboxylic acid cycle. Additionally, a large number of transporters and enzymes for heterotrophy highlight the broad heterotrophic lifestyle of this organism. Furthermore, a denitrification gene cluster included genes predicted to encode enzymes for the sequential reduction of nitrate to nitrous oxide, consistent with the incomplete denitrification phenotype of this strain.

  15. The Limits of Coding with Joint Constraints on Detected and Undetected Error Rates

    NASA Technical Reports Server (NTRS)

    Dolinar, Sam; Andrews, Kenneth; Pollara, Fabrizio; Divsalar, Dariush

    2008-01-01

    We develop a remarkably tight upper bound on the performance of a parameterized family of bounded angle maximum-likelihood (BA-ML) incomplete decoders. The new bound for this class of incomplete decoders is calculated from the code's weight enumerator, and is an extension of Poltyrev-type bounds developed for complete ML decoders. This bound can also be applied to bound the average performance of random code ensembles in terms of an ensemble average weight enumerator. We also formulate conditions defining a parameterized family of optimal incomplete decoders, defined to minimize both the total codeword error probability and the undetected error probability for any fixed capability of the decoder to detect errors. We illustrate the gap between optimal and BA-ML incomplete decoding via simulation of a small code.
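
    The abstract does not spell out the decision rule of a bounded angle maximum-likelihood incomplete decoder. One plausible reading, sketched below under that assumption, is that the decoder finds the ML codeword and accepts it only if the received vector lies within a fixed angle of it, declaring a detected error otherwise. The function name `ba_ml_decode`, the real-valued (for example BPSK) codeword representation, and the equal-energy assumption that lets correlation stand in for ML are all choices made for illustration.

```python
import math


def ba_ml_decode(received, codewords, max_angle_rad):
    """Return the index of the accepted codeword, or None for a detected error.

    Assumes equal-energy real-valued codewords on an additive Gaussian
    noise channel, so the ML codeword is the one with maximum correlation.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    norm_r = math.sqrt(dot(received, received))
    if norm_r == 0.0:
        return None  # no direction information at all
    best = max(range(len(codewords)), key=lambda i: dot(received, codewords[i]))
    chosen = codewords[best]
    cos_angle = dot(received, chosen) / (norm_r * math.sqrt(dot(chosen, chosen)))
    # Clamp against rounding so acos never sees a value outside [-1, 1].
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    return best if angle <= max_angle_rad else None


if __name__ == "__main__":
    repetition_code = [(1.0, 1.0, 1.0), (-1.0, -1.0, -1.0)]
    print(ba_ml_decode((0.9, 1.1, 1.0), repetition_code, 0.5))
    print(ba_ml_decode((1.0, -1.0, 0.2), repetition_code, 0.5))
```

    Shrinking the angle threshold trades more detected errors for fewer undetected ones, which is the tension between the two error rates that the abstract describes.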

  16. Student Use of Physics to Make Sense of Incomplete but Functional VPython Programs in a Lab Setting

    NASA Astrophysics Data System (ADS)

    Weatherford, Shawn A.

    2011-12-01

    Computational activities in Matter & Interactions, an introductory calculus-based physics course, have the instructional goal of providing students with the experience of applying the same set of a small number of fundamental principles to model a wide range of physical systems. However, there are significant instructional challenges for students to build computer programs under limited time constraints, especially for students who are unfamiliar with programming languages and concepts. Prior attempts at designing effective computational activities were successful at having students ultimately build working VPython programs under the tutelage of experienced teaching assistants in a studio lab setting. A pilot study revealed that students who completed these computational activities had significant difficulty repeating the exact same tasks and, further, had difficulty predicting the animation that would be produced by the example program after interpreting the program code. This study explores the interpretation and prediction tasks as part of an instructional sequence where students are asked to read and comprehend a functional, but incomplete program. Rather than asking students to begin their computational tasks with modifying program code, we explicitly ask students to interpret an existing program that is missing key lines of code. The missing lines of code correspond to the algebraic form of fundamental physics principles or the calculation of forces which would exist between analogous physical objects in the natural world. Students are then asked to draw a prediction of what they would see in the simulation produced by the VPython program and ultimately run the program to evaluate the students' prediction. This study specifically looks at how the participants use physics while interpreting the program code and creating a whiteboard prediction.
    This study also examines how students evaluate their understanding of the program and modification goals at the beginning of the modification task. While working in groups over the course of a semester, study participants were recorded while they completed three activities using these incomplete programs. Analysis of the video data showed that study participants had little difficulty interpreting physics quantities, generating a prediction, or determining how to modify the incomplete program. Participants did not base their prediction solely on the information from the incomplete program. When participants tried to predict the motion of the objects in the simulation, many turned to their knowledge of how the system would evolve if it represented an analogous real-world physical system. For example, participants attributed the real-world behavior of springs to helix objects even though the program did not include calculations for the spring to exert a force when stretched. Participants rarely interpreted lines of code in the computational loop during the first computational activity, but this changed during later computational activities, with most participants using their physics knowledge to interpret the computational loop. Computational activities in the Matter & Interactions curriculum were revised in light of these findings to include an instructional sequence of tasks to build a comprehension of the example program. The modified activities also ask students to create an additional whiteboard prediction for the time-evolution of the real-world phenomena which the example program will eventually model.
    This thesis shows how comprehension tasks identified by Palincsar and Brown (1984) as effective in improving reading comprehension are also effective in helping students apply their physics knowledge to interpret a computer program which attempts to model a real-world phenomenon and identify errors in their understanding of the use, or omission, of fundamental physics principles in a computational model.
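
    The kind of functional but incomplete program described above can be pictured with a spring example in which the force calculation is the line students must supply. The sketch below is an assumption-laden stand-in, not one of the thesis activities: it uses plain Python rather than VPython so that it runs without a graphics library, works in one dimension, and the function name `simulate_spring` and all parameter values are hypothetical.

```python
def simulate_spring(k=10.0, mass=0.5, rest_length=1.0, x0=1.3,
                    dt=0.001, steps=2000, include_force=True):
    """Step a mass on a spring and return the list of positions.

    With include_force=False the loop mimics the incomplete program:
    it still runs, but the spring never pulls on the mass.
    """
    x = x0    # position of the mass along the spring axis (m)
    p = 0.0   # momentum of the mass (kg m/s)
    positions = []
    for _ in range(steps):
        stretch = x - rest_length
        # The "missing line": the spring force law F = -k * stretch.
        force = -k * stretch if include_force else 0.0
        p = p + force * dt          # momentum update
        x = x + (p / mass) * dt     # position update
        positions.append(x)
    return positions


if __name__ == "__main__":
    print("without force:", simulate_spring(include_force=False)[-1])
    print("with force, minimum position:", min(simulate_spring()))
```

    Run without the force line, the mass stays where it started, which is the behaviour participants overlooked when they expected the helix object to act like a real spring.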

  17. Identification and correction of abnormal, incomplete and mispredicted proteins in public databases.

    PubMed

    Nagy, Alinda; Hegyi, Hédi; Farkas, Krisztina; Tordai, Hedvig; Kozma, Evelin; Bányai, László; Patthy, László

    2008-08-27

    Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON-predicted entries. 
    MisPred works efficiently in identifying errors in predictions generated by the most reliable gene prediction tools such as the EnsEMBL and NCBI's GNOMON pipelines and also guides the correction of errors. We suggest that application of the MisPred approach will significantly improve the quality of gene predictions and the associated databases.
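
    Two of the five routines listed above, (ii) and (iii), are simple consistency rules over domain annotations and can be sketched directly. This is an illustration of the stated principle only, not the MisPred implementation: the `ProteinAnnotation` record, its field names, and the domain labels are hypothetical, and real input would come from domain and transmembrane predictors.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinAnnotation:
    """Hypothetical summary of predicted features for one protein."""
    domains: set = field(default_factory=set)  # e.g. {"extracellular", "nuclear"}
    transmembrane_segments: int = 0


def mispred_flags(protein):
    """Return the names of the consistency rules that the protein violates."""
    flags = []
    # Rule (ii): extracellular and cytoplasmic domains need a membrane span.
    if ({"extracellular", "cytoplasmic"} <= protein.domains
            and protein.transmembrane_segments == 0):
        flags.append("extracellular and cytoplasmic domains without transmembrane segment")
    # Rule (iii): extracellular and nuclear domains should not co-occur.
    if {"extracellular", "nuclear"} <= protein.domains:
        flags.append("co-occurrence of extracellular and nuclear domains")
    return flags


if __name__ == "__main__":
    suspect = ProteinAnnotation({"extracellular", "cytoplasmic"}, 0)
    print(mispred_flags(suspect))
```

    An entry that returns an empty list passes these two checks only; the remaining routines (signal peptides, domain integrity, inter-chromosomal chimeras) need sequence and genome coordinates that this record does not carry.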

  18. Complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus).

    PubMed

    Li, Linmiao; Li, Min; Wu, Zhengjun; Chen, Jinping

    2015-01-01

    We have characterized the complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus) and described its organization in this study. The total length of the complete C. sphinx mitochondrial genome was 16,895 bp, with a base composition of 32.54% A, 14.05% G, 25.82% T and 27.59% C. The complete mitochondrial genome included 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes (12S rRNA and 16S rRNA) and 1 control region (D-loop). The control region was 1435 bp long, with the sequence CATACG repeated 64 times. Three protein-coding genes (ND1, COI and ND4) ended with an incomplete stop codon, TA or T.

  19. Expanding the proteome: disordered and alternatively-folded proteins

    PubMed Central

    Dyson, H. Jane

    2011-01-01

    Proteins provide much of the scaffolding for life, as well as undertaking a variety of essential catalytic reactions. These characteristic functions have led us to presuppose that proteins are in general functional only when well-structured and correctly folded. As we begin to explore the repertoire of possible protein sequences inherent in the human and other genomes, two stark facts that belie this supposition become clear: firstly, the number of apparent open reading frames in the human genome is significantly smaller than appears to be necessary to code for all of the diverse proteins in higher organisms, and secondly that a significant proportion of the protein sequences that would be coded by the genome would not be expected to form stable three-dimensional structures. Clearly the genome must include coding for a multitude of alternative forms of proteins, some of which may be partly or fully disordered or incompletely structured in their functional states. At the same time as this likelihood was recognized, experimental studies also began to uncover examples of important protein molecules and domains that were incompletely structured or completely disordered in solution, yet remained perfectly functional. In the ensuing years, we have seen an explosion of experimental and genome-annotation studies that have mapped the extent of the intrinsic disorder phenomenon and explored the possible biological rationales for its widespread occurrence. Answers to the question “why would a particular domain need to be unstructured?” are as varied as the systems where such domains are found. This review provides a survey of recent new directions in this field, and includes an evaluation of the role not only of intrinsically disordered proteins but of partially structured and highly dynamic members of the disorder-order continuum. PMID:21729349

  20. The complete mitochondrial genome of the Aluterus monoceros.

    PubMed

    Li, Wenshen; Zhang, Guoqing; Wen, Xin; Wang, Qian; Chen, Guohua

    2016-07-01

    The complete mitochondrial genome of Aluterus monoceros (A. monoceros) has been sequenced. The mitochondrial genome of A. monoceros is 16,429 bp in length, consisting of 22 tRNA genes, 2 rRNA genes, 13 protein-coding genes and a D-loop region (GenBank accession number KP637022). The A + T content of the mitochondrial genome is 63.25% (33.16% A and 30.09% T); C accounts for 20.74%. Twelve protein-coding genes start with the standard ATG initiation codon; the exception, COXI, begins with GTG. Some of the termination codons are incomplete (T or TA), whereas ND1, COXI, ATP8, ND4L1, ND5 and ND6 stop with TAA. Phylogenetic trees constructed from the entire mitochondrial genome sequences of 14 Tetraodontiformes species suggest that A. monoceros is more closely related to Acreichthys tomentosus and Monacanthus chinensis, and that they constitute a sister group.

  1. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

    PubMed

    Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong

    2012-01-01

    Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.

  2. Identification and subspecific differentiation of Mycobacterium scrofulaceum by automated sequencing of a region of the gene (hsp65) encoding a 65-kilodalton heat shock protein.

    PubMed Central

    Swanson, D S; Pan, X; Musser, J M

    1996-01-01

    Mycobacterium scrofulaceum is most commonly recovered from children with cervical lymphadenitis, although it also accounts for approximately 2% of the mycobacterial infections in AIDS patients. Species assignment of M. scrofulaceum isolated by conventional techniques can be difficult and time-consuming. To develop a strategy for rapid species assignment of these organisms, a 360-bp region of the gene (hsp65) encoding a 65-kDa heat shock protein in 37 isolates from diverse sources was sequenced. Eight hsp65 alleles were identified, and these sequences formed phylogenetic clusters and lineages largely distinct from other Mycobacterium species. There was incomplete correlation between serovar designation and hsp65 allele assignment. The hsp65 data correlated strongly with the results of sequence analysis of the gene coding for 16S rRNA. Automated DNA sequencing of a 360-bp region of the hsp65 gene provides a rapid and unambiguous method for species assignment of these acid-fast organisms for diagnostic purposes. PMID:8940463

  3. The complete mitochondrial genome of Gryllotalpa unispina Saussure, 1874 (Orthoptera: Gryllotalpoidea: Gryllotalpidae).

    PubMed

    Zhang, Yulong; Shao, Dandan; Cai, Miao; Yin, Hong; Zhang, Daochuan

    2016-01-01

    The complete mitochondrial genome of Gryllotalpa unispina was 15,513 bp in length and contained 70.9% AT. All G. unispina protein-coding sequences except nad2 started with a typical ATN codon. The usual termination codon (TAA) and incomplete stop codons (T) were found among the 13 protein-coding genes. All tRNA genes folded into the typical cloverleaf secondary structure, except trnS(AGN), which lacked the dihydrouridine arm. The sizes of the large and small ribosomal RNA genes were 1245 and 725 bp, respectively. The A + T-rich region was 917 bp in length with 76.8% A + T. The orientation and gene order of the G. unispina mitogenome were identical to those of G. orientalis and G. pluvialis; there was no "DK rearrangement", which has been widely reported in Caelifera.
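Several records in this list report "incomplete" stop codons (T or TA) in mitochondrial protein-coding genes; such codons are generally thought to be completed to TAA when the transcript is polyadenylated. The sketch below shows one way to classify the 3' end of an annotated coding sequence. It assumes the sequence starts in frame, treats only TAA and TAG as complete stop codons (a simplifying assumption; the valid set depends on the genetic code of the organism), and uses a hypothetical function name `classify_stop`.

```python
# Assumption: only TAA and TAG count as complete stop codons in this sketch.
COMPLETE_STOPS = {"TAA", "TAG"}


def classify_stop(cds: str) -> str:
    """Classify the 3' end of an in-frame coding sequence.

    Returns "complete", "incomplete (T)", "incomplete (TA)" or "none".
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        # Length is a multiple of three: look for a full stop codon.
        return "complete" if cds[-3:] in COMPLETE_STOPS else "none"
    # One or two trailing bases: a partial stop codon if they spell T or TA.
    tail = cds[-remainder:]
    if tail in ("T", "TA"):
        return f"incomplete ({tail})"
    return "none"


# Toy sequences for illustration only.
for toy in ("ATGAAATAA", "ATGAAAT", "ATGAAATA", "ATGAAACC"):
    print(toy, classify_stop(toy))
```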

  4. Structure and evolution of the mitochondrial genome of Exorista sorbillans: the Tachinidae (Diptera: Calyptratae) perspective.

    PubMed

    Shao, Yuan-jun; Hu, Xian-qiong; Peng, Guang-da; Wang, Rui-xian; Gao, Rui-na; Lin, Chao; Shen, Wei-de; Li, Rui; Li, Bing

    2012-12-01

    The first complete mitochondrial genome (mitogenome) of the tachinid Exorista sorbillans (Diptera) was sequenced by a PCR-based approach. The circular mitogenome is 14,960 bp long and has the representative mitochondrial gene (mt gene) organization and order of Diptera. All protein-coding sequences are initiated with an ATN codon, with the exception of the Cox I gene, which has a 4-bp putative start codon, ATCG. Ten of the thirteen protein-coding genes have a complete termination codon (TAA), but the rest, located on the H strand, have incomplete codons. The mitogenome of E. sorbillans is biased toward A+T content at 78.4%, and the strand-specific bias is reflected in the third codon positions of mt genes, whose T/C ratios, used as a strand indicator, are higher on the H strand than on the L strand in each of seven Diptera flies. The length of the A+T-rich region of E. sorbillans is 106 bp, including tandem triple copies of a 13-bp fragment. Compared to Haematobia irritans, E. sorbillans holds a distant relationship with Drosophila. Phylogenetic topologies based on the amino acid sequences support that E. sorbillans (Tachinidae) clusters with strains of Calliphoridae and Oestridae, and that the superfamily Oestroidea is a polyphyletic group, with Muscidae in a clade.

  5. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

    PubMed Central

    Tang, Haixu; Li, Sujun; Ye, Yuzhen

    2016-01-01

    Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro. PMID:27918579
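To illustrate why an assembly graph can recover coding sequence that is fragmented across separate reads or contigs, here is a minimal de Bruijn graph sketch in Python. It is not the Graph2Pro implementation; the function names, the toy reads and the choice of k are assumptions made for illustration, and real assemblers additionally handle reverse complements, sequencing errors and branching paths.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer links its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk from start while each node has exactly one successor; return the spelled sequence."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        seq += nxt[-1]
        seen.add(nxt)
        node = nxt
    return seq


# Two toy reads (hypothetical) that overlap by four bases.
reads = ["ATGGCG", "GGCGTA"]
graph = build_de_bruijn(reads, k=4)
print(extend_unambiguous(graph, "ATG"))
```

Neither toy read contains the full sequence on its own, but the walk through the graph spells out a path spanning both; applying the same idea to assembly graphs gives access to more complete protein-coding sequences than the individual fragments do.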

  6. Localization, structure and polymorphism of two paralogous Xenopus laevis mitochondrial malate dehydrogenase genes.

    PubMed

    Tlapakova, Tereza; Krylov, Vladimir; Macha, Jaroslav

    2005-01-01

    Two paralogous mitochondrial malate dehydrogenase 2 (Mdh2) genes of Xenopus laevis have been cloned and sequenced, revealing 95% identity. Fluorescence in-situ hybridization (FISH) combined with tyramide amplification discriminates both genes; Mdh2a was localized into chromosome q3 and Mdh2b into chromosome q8. One kb cDNA probes detect both genes with 85% accuracy. The remaining signals were on the paralogous counterpart. Introns interrupt coding sequences at the same nucleotide as defined for mouse. Restriction polymorphism has been detected in the first intron of Mdh2a, while the individual variability in intron 6 of Mdh2b gene is represented by an insertion of incomplete retrotransposon L1Xl. Rates of nucleotide substitutions indicate that both genes are under similar evolutionary constraints. X. laevis Mdh2 genes can be used as markers for physical mapping and linkage analysis.

  7. Complete mitochondrial genome of the monogonont rotifer, Brachionus koreanus (Rotifera, Brachionidae).

    PubMed

    Hwang, Dae-Sik; Suga, Koushirou; Sakakura, Yoshitaka; Park, Heum Gi; Hagiwara, Atsushi; Rhee, Jae-Sung; Lee, Jae-Seong

    2014-02-01

    The complete mitochondrial genome was obtained from the assembled genome data sequenced by next generation sequencing (NGS) technology from the monogonont rotifer Brachionus koreanus. The mitochondrial genome of B. koreanus was composed of two circular chromosomes designated as mtDNA-I (10,421 bp) and mtDNA-II (11,923 bp). The gene contents of B. koreanus were identical with previously reported B. plicatilis mitochondrial genomes. However, gene orders of B. koreanus showed one rearrangement between the two species. Of 12 protein-coding genes (PCGs), 3 genes (ATP6, ND1, and ND3) had an incomplete stop codon. The A + T base composition of B. koreanus mitochondrial genome was high (68.81%). They also showed anti-G bias (12.03% and 10.97%) on the second and third position of PCGs as well as slight anti-C bias (15.96% and 14.31%) on the first and third position of PCGs.

  8. Whole-Genome Sequencing Suggests Schizophrenia Risk Mechanisms in Humans with 22q11.2 Deletion Syndrome.

    PubMed

    Merico, Daniele; Zarrei, Mehdi; Costain, Gregory; Ogura, Lucas; Alipanahi, Babak; Gazzellone, Matthew J; Butcher, Nancy J; Thiruvahindrapuram, Bhooma; Nalpathamkalam, Thomas; Chow, Eva W C; Andrade, Danielle M; Frey, Brendan J; Marshall, Christian R; Scherer, Stephen W; Bassett, Anne S

    2015-09-16

    Chromosome 22q11.2 microdeletions impart a high but incomplete risk for schizophrenia. Possible mechanisms include genome-wide effects of DGCR8 haploinsufficiency. In a proof-of-principle study to assess the power of this model, we used high-quality, whole-genome sequencing of nine individuals with 22q11.2 deletions and extreme phenotypes (schizophrenia, or no psychotic disorder at age >50 years). The schizophrenia group had a greater burden of rare, damaging variants impacting protein-coding neurofunctional genes, including genes involved in neuron projection (nominal P = 0.02, joint burden of three variant types). Variants in the intact 22q11.2 region were not major contributors. Restricting to genes affected by a DGCR8 mechanism tended to amplify between-group differences. Damaging variants in highly conserved long intergenic noncoding RNA genes also were enriched in the schizophrenia group (nominal P = 0.04). The findings support the 22q11.2 deletion model as a threshold-lowering first hit for schizophrenia risk. If applied to a larger and thus better-powered cohort, this appears to be a promising approach to identify genome-wide rare variants in coding and noncoding sequence that perturb gene networks relevant to idiopathic schizophrenia. Similarly designed studies exploiting genetic models may prove useful to help delineate the genetic architecture of other complex phenotypes. Copyright © 2015 Merico et al.

  9. Whole-Genome Sequencing Suggests Schizophrenia Risk Mechanisms in Humans with 22q11.2 Deletion Syndrome

    PubMed Central

    Merico, Daniele; Zarrei, Mehdi; Costain, Gregory; Ogura, Lucas; Alipanahi, Babak; Gazzellone, Matthew J.; Butcher, Nancy J.; Thiruvahindrapuram, Bhooma; Nalpathamkalam, Thomas; Chow, Eva W. C.; Andrade, Danielle M.; Frey, Brendan J.; Marshall, Christian R.; Scherer, Stephen W.; Bassett, Anne S.

    2015-01-01

    Chromosome 22q11.2 microdeletions impart a high but incomplete risk for schizophrenia. Possible mechanisms include genome-wide effects of DGCR8 haploinsufficiency. In a proof-of-principle study to assess the power of this model, we used high-quality, whole-genome sequencing of nine individuals with 22q11.2 deletions and extreme phenotypes (schizophrenia, or no psychotic disorder at age >50 years). The schizophrenia group had a greater burden of rare, damaging variants impacting protein-coding neurofunctional genes, including genes involved in neuron projection (nominal P = 0.02, joint burden of three variant types). Variants in the intact 22q11.2 region were not major contributors. Restricting to genes affected by a DGCR8 mechanism tended to amplify between-group differences. Damaging variants in highly conserved long intergenic noncoding RNA genes also were enriched in the schizophrenia group (nominal P = 0.04). The findings support the 22q11.2 deletion model as a threshold-lowering first hit for schizophrenia risk. If applied to a larger and thus better-powered cohort, this appears to be a promising approach to identify genome-wide rare variants in coding and noncoding sequence that perturb gene networks relevant to idiopathic schizophrenia. Similarly designed studies exploiting genetic models may prove useful to help delineate the genetic architecture of other complex phenotypes. PMID:26384369

  10. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets.

    PubMed

    Springer, Mark S; Gatesy, John

    2018-02-26

    Summary coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset (the 'recombination ratchet') is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the no-recombination within loci assumption is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d'etre for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree. 
If coalescent methods are powerful enough to infer the correct species tree for difficult phylogenetic problems in the anomaly zone, where concatenation is expected to fail because of ILS, then there should be a decreasing probability of inferring the correct species tree using longer loci with many intralocus recombination breakpoints (i.e., increased levels of concatenation).

  11. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets

    PubMed Central

    Springer, Mark S.; Gatesy, John

    2018-01-01

    Summary coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset—the ‘recombination ratchet’—is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the no-recombination within loci assumption is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d’etre for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree. 
If coalescent methods are powerful enough to infer the correct species tree for difficult phylogenetic problems in the anomaly zone, where concatenation is expected to fail because of ILS, then there should be a decreasing probability of inferring the correct species tree using longer loci with many intralocus recombination breakpoints (i.e., increased levels of concatenation). PMID:29495400

  12. Phylogenomic analysis of the Chilean clade of Liolaemus lizards (Squamata: Liolaemidae) based on sequence capture data.

    PubMed

    Panzera, Alejandra; Leaché, Adam D; D'Elía, Guillermo; Victoriano, Pedro F

    2017-01-01

    The genus Liolaemus is one of the most ecologically diverse and species-rich genera of lizards worldwide. It currently includes more than 250 recognized species, which have been subject to many ecological and evolutionary studies. Nevertheless, Liolaemus lizards have a complex taxonomic history, mainly due to the incongruence between morphological and genetic data, incomplete taxon sampling, incomplete lineage sorting and hybridization. In addition, as many species have restricted and remote distributions, this has hampered their examination and inclusion in molecular systematic studies. The aims of this study are to infer a robust phylogeny for a subsample of lizards representing the Chilean clade (subgenus Liolaemus sensu stricto), and to test the monophyly of several of the major species groups. We use a phylogenomic approach, targeting 541 ultra-conserved elements (UCEs) and 44 protein-coding genes for 16 taxa. We conduct a comparison of phylogenetic analyses using maximum-likelihood and several species tree inference methods. The UCEs provide stronger support for phylogenetic relationships compared to the protein-coding genes; however, the UCEs outnumber the protein-coding genes by 10-fold. On average, the protein-coding genes contain over twice the number of informative sites. Based on our phylogenomic analyses, all the groups sampled are polyphyletic. Liolaemus tenuis tenuis is difficult to place in the phylogeny, because only a few loci (nine) were recovered for this species. Topologies or support values did not change dramatically upon exclusion of L. t. tenuis from analyses, suggesting that missing data did not have a significant impact on phylogenetic inference in this data set. The phylogenomic analyses provide strong support for sister group relationships between L. fuscus, L. monticola, L. nigroviridis and L. nitidus, and L. platei and L. velosoi. 
Despite our limited taxon sampling, we have provided a reliable starting hypothesis for the relationships among many major groups of the Chilean clade of Liolaemus that will help future work aimed at resolving the Liolaemus phylogeny.

  13. Effects of unconventional breakup modes on incomplete fusion of weakly bound nuclei

    NASA Astrophysics Data System (ADS)

    Diaz-Torres, Alexis; Quraishi, Daanish

    2018-02-01

    The incomplete fusion dynamics of 6Li+209Bi collisions at energies above the Coulomb barrier is investigated. The classical dynamical model implemented in the platypus code is used to understand and quantify the impact of both 6Li resonance states and transfer-triggered breakup modes (involving short-lived projectile-like nuclei such as 8Be and 5Li) on the formation of incomplete fusion products. Model calculations explain the experimental incomplete-fusion excitation function fairly well, indicating that (i) delayed direct breakup of 6Li reduces the incomplete fusion cross sections and (ii) the neutron-stripping channel practically determines those cross sections.

  14. A computer program for estimation from incomplete multinomial data

    NASA Technical Reports Server (NTRS)

    Credeur, K. R.

    1978-01-01

    Coding is given for maximum likelihood and Bayesian estimation of the vector p of multinomial cell probabilities from incomplete data. Also included is coding to calculate and approximate elements of the posterior mean and covariance matrices. The program is written in FORTRAN 4 language for the Control Data CYBER 170 series digital computer system with network operating system (NOS) 1.1. The program requires approximately 44000 octal locations of core storage. A typical case requires from 72 seconds to 92 seconds on CYBER 175 depending on the value of the prior parameter.

  15. Influence of incomplete fusion on complete fusion at energies above the Coulomb barrier

    NASA Astrophysics Data System (ADS)

    Shuaib, Mohd; Sharma, Vijay R.; Yadav, Abhishek; Sharma, Manoj Kumar; Singh, Pushpendra P.; Singh, Devendra P.; Kumar, R.; Singh, R. P.; Muralithar, S.; Singh, B. P.; Prasad, R.

    2017-10-01

    In the present work, excitation functions of several reaction residues in the system 19F+169Tm, populated via the complete and incomplete fusion processes, have been measured using off-line γ-ray spectroscopy. The analysis of excitation functions has been done within the framework of statistical model code pace4. The excitation functions of residues populated via xn and pxn channels are found to be in good agreement with those estimated by the theoretical model code, which confirms the production of these residues solely via complete fusion process. However, a significant enhancement has been observed in the cross-sections of residues involving α-emitting channels as compared to the theoretical predictions. The observed enhancement in the cross-sections has been attributed to the incomplete fusion processes. In order to have a better insight into the onset and strength of incomplete fusion, the incomplete fusion strength function has been deduced. At present, there is no theoretical model available which can satisfactorily explain the incomplete fusion reaction data at energies ≈4-6 MeV/nucleon. In the present work, the influence of incomplete fusion on complete fusion in the 19F+169Tm system has also been studied. The measured cross-section data may be important for the development of reactor technology as well. It has been found that the incomplete fusion strength function strongly depends on the α-Q value of the projectile, which is found to be in good agreement with the existing literature data. The analysis strongly supports the projectile-dependent mass-asymmetry systematics. In order to study the influence of Coulomb effect ($Z_{P}Z_{T}$) on incomplete fusion, the deduced strength function for the present work is compared with the nearby projectile-target combinations. The incomplete fusion strength function is found to increase linearly with $Z_{P}Z_{T}$, indicating a strong influence of Coulomb effect in the incomplete fusion reactions.

  16. Thermal Timescale Mass Transfer In Binary Population Synthesis

    NASA Astrophysics Data System (ADS)

    Justham, S.; Kolb, U.

    2004-07-01

    Studies of binary evolution have, until recently, neglected thermal timescale mass transfer (TTMT). Recent work has suggested that this previously poorly studied area is crucial in the understanding of systems across the compact binary spectrum. We use the state-of-the-art binary population synthesis code BiSEPS (Willems and Kolb, 2002, MNRAS 337 1004-1016). However, the present treatment of TTMT is incomplete due to the nonlinear behaviour of stars in their departure from gravothermal 'equilibrium'. Here we show work that should update the ultrafast stellar evolution algorithms within BiSEPS to make it the first pseudo-analytic code that can follow TTMT properly. We have generated fits to a set of over 300 Case B TTMT sequences with a range of intermediate-mass donors. These fits produce very good first approximations to both HR diagrams and mass-transfer rates (see figures 1 and 2), which we later hope to improve and extend. They are already a significant improvement over the previous fits.

  17. The complete mitochondrial genome of the Longnose skate: Raja rhina (Rajiformes, Rajidae).

    PubMed

    Jeong, Dageum; Lee, Youn-Ho

    2015-02-01

    The complete sequence of mitochondrial DNA of a longnose skate, Raja rhina, was determined for the first time. It is 16,910 bp in length, containing 2 rRNA, 22 tRNA and 13 protein-coding genes with the same gene order and structure as those of other Rajidae species. The nucleotide composition of the L-strand is 30.1% A, 27.2% C, 28.5% T and 14.2% G, showing a slight A + T bias. G is the least used base and is markedly lower at the third codon position (5.4%). Twelve of the 13 protein-coding genes use ATG as their start codon, while COX1 starts with GTG. As for the stop codon, only ND4 shows an incomplete stop codon, TA. This mitogenome is the first reported for a species of the genus Raja, and provides a valuable resource of genetic information for understanding the phylogenetic relationships and the evolution of the genus Raja as well as the family Rajidae.

  18. Complete mitochondrial genome of the Yellownose skate: Zearaja chilensis (Rajiformes, Rajidae).

    PubMed

    Jeong, Dageum; Lee, Youn-Ho

    2016-01-01

    The complete sequence of mitochondrial DNA of a Yellownose skate, Zearaja chilensis, was determined for the first time. It is 16,909 bp in length, covering 2 rRNA, 22 tRNA and 13 protein-coding genes with the same gene order and structure as those of other Rajidae species. The L-strand has a low G content (14.3%) and a slightly high A + T content (58.9%). A strong codon usage bias against G (6.0%) is found at the third codon positions. Twelve of the 13 protein-coding genes use ATG as the start codon, while COX1 starts with GTG. As for the stop codon, only ND4 shows an incomplete stop codon, TA. This is the first report of the mitogenome for a species in the genus Zearaja, providing a valuable source of genetic information on the evolution of the family Rajidae and the genus Zearaja as well as for establishment of a sustainable fishery management plan for the species.

  19. Disruption of hierarchical predictive coding during sleep

    PubMed Central

    Strauss, Melanie; Sitt, Jacobo D.; King, Jean-Remi; Elbaz, Maxime; Azizi, Leila; Buiatti, Marco; Naccache, Lionel; van Wassenhove, Virginie; Dehaene, Stanislas

    2015-01-01

    When presented with an auditory sequence, the brain acts as a predictive-coding device that extracts regularities in the transition probabilities between sounds and detects unexpected deviations from these regularities. Does such prediction require conscious vigilance, or does it continue to unfold automatically in the sleeping brain? The mismatch negativity and P300 components of the auditory event-related potential, reflecting two steps of auditory novelty detection, have been inconsistently observed in the various sleep stages. To clarify whether these steps remain during sleep, we recorded simultaneous electroencephalographic and magnetoencephalographic signals during wakefulness and during sleep in normal subjects listening to a hierarchical auditory paradigm including short-term (local) and long-term (global) regularities. The global response, reflected in the P300, vanished during sleep, in line with the hypothesis that it is a correlate of high-level conscious error detection. The local mismatch response remained across all sleep stages (N1, N2, and REM sleep), but with an incomplete structure; compared with wakefulness, a specific peak reflecting prediction error vanished during sleep. Those results indicate that sleep leaves initial auditory processing and passive sensory response adaptation intact, but specifically disrupts both short-term and long-term auditory predictive coding. PMID:25737555

  20. High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.

    PubMed

    Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory

    2017-12-01

    Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.

  1. Complete mitochondrial genome of the mottled skate: Raja pulchra (Rajiformes, Rajidae).

    PubMed

    Jeong, Dageum; Kim, Sung; Kim, Choong-Gon; Myoung, Jung-Goo; Lee, Youn-Ho

    2016-05-01

    The complete mitochondrial DNA of a mottled skate, Raja pulchra, was sequenced and found to be a circular molecule of 16,907 bp including 2 rRNA, 22 tRNA, 13 protein-coding genes (PCGs), and an AT-rich control region. The organization of the PCGs is the same as that found in other Rajidae species. The nucleotide composition of the L-strand is 29.8% A, 28.0% C, 27.9% T, and 14.3% G, with a slight bias toward A + T. Twelve of the 13 PCGs are initiated by the ATG codon, while COX1 starts with GTG. Only ND4 harbors the incomplete termination codon, TA. All tRNA genes have the typical clover-leaf structure of mitochondrial tRNA, with the exception of [Formula: see text], which has a reduced DHU arm. This mitogenome will provide essential information for better phylogenetic resolution and precision of the family Rajidae and the genus Raja as well as for establishment of a fish stock recovery plan for the species.

  2. A discovery of novel microRNAs in the silkworm (Bombyx mori) genome.

    PubMed

    Yu, Xiaomin; Zhou, Qing; Cai, Yimei; Luo, Qibin; Lin, Hongbin; Hu, Songnian; Yu, Jun

    2009-12-01

    MicroRNAs (miRNAs) are pivotal regulators involved in various physiological and pathological processes via their post-transcriptional regulation of gene expression. We sequenced 14 libraries of small RNAs constructed from samples spanning the life cycle of silkworms, discovered 50 novel miRNAs not previously known in animals, and verified 43 of them using stem-loop RT-PCR. Our genome-wide analyses of 27 species-specific miRNAs suggest that they arise from transposable elements, protein-coding gene duplication/transposition, and random foldback sequences, which is consistent with the idea that novel animal miRNAs may evolve from incomplete self-complementary transcripts and become fixed in the process of co-adaptation with their targets. Computational prediction suggests that the silkworm-specific miRNAs may preferentially regulate genes that are related to life-cycle-associated traits, and these genes can serve as potential targets for subsequent studies of the modulating networks in the development of Bombyx mori.

  3. The nagA gene of Penicillium chrysogenum encoding beta-N-acetylglucosaminidase.

    PubMed

    Díez, Bruno; Rodríguez-Sáiz, Marta; de la Fuente, Juan Luis; Moreno, Miguel Angel; Barredo, José Luis

    2005-01-15

    We purified the beta-N-acetylglucosaminidase from the filamentous fungus Penicillium chrysogenum and determined its N-terminal sequence, which showed the presence of a mixture of two proteins (P1 and P2). A genomic DNA fragment was cloned by using degenerate oligonucleotides derived from the N-terminal sequences. The nucleotide sequence showed the presence of an ORF (nagA gene) lacking introns, with a length of 1791 bp, and coding for a protein of 66.5 kDa showing similarity to acetylglucosaminidases. The deduced NagA protein includes P1 and P2 as incomplete forms of the mature protein, and contains putative features for protein maturation: an 18-amino acid signal peptide, a KEX2 processing site, and four glycosylation motifs. The sequence just after the signal peptide corresponds to P2 and that after the KEX2 site to P1. The nagA transcript has a size of about 2.1 kb and is present until the end of the fermentation process for penicillin production. NagA is one of the most abundantly represented proteins in P. chrysogenum, increasing along the fermentation process. The suitability of the nagA promoter (PnagA) for gene expression in fungi was demonstrated by expressing the bleomycin resistance gene (ble(R)) from Streptoalloteichus hindustanus in P. chrysogenum.

  4. Crossing the LINE toward genomic instability: LINE-1 retrotransposition in cancer

    NASA Astrophysics Data System (ADS)

    Kemp, Jacqueline; Longworth, Michelle

    2015-12-01

    Retrotransposons are repetitive DNA sequences that are positioned throughout the human genome. Retrotransposons are capable of copying themselves and mobilizing new copies to novel genomic locations in a process called retrotransposition. While most retrotransposon sequences in the human genome are incomplete and incapable of mobilization, the LINE-1 retrotransposon, which comprises approximately 17% of the human genome, remains active. The disruption of cellular mechanisms that suppress retrotransposon activity is linked to the generation of aneuploidy, a potential driver of tumor development. When retrotransposons insert into a novel genomic region, they have the potential to disrupt the coding sequence of endogenous genes and alter gene expression, which can lead to deleterious consequences for the organism. Additionally, increased LINE-1 copy numbers provide more chances for recombination events to occur between retrotransposons, which can lead to chromosomal breaks and rearrangements. LINE-1 activity is increased in various cancer cell lines and in patient tissues resected from primary tumors. LINE-1 activity also correlates with increased cancer metastasis. This review aims to give a brief overview of the connections between LINE-1 retrotransposition and the loss of genome stability. We will also discuss the mechanisms that repress retrotransposition in human cells and their links to cancer.

  5. An incomplete assembly with thresholding algorithm for systems of reaction-diffusion equations in three space dimensions IAT for reaction-diffusion systems

    NASA Astrophysics Data System (ADS)

    Moore, Peter K.

    2003-07-01

    Solving systems of reaction-diffusion equations in three space dimensions can be prohibitively expensive both in terms of storage and CPU time. Herein, I present a new incomplete assembly procedure that is designed to reduce storage requirements. Incomplete assembly is analogous to incomplete factorization in that only a fixed number of nonzero entries are stored per row and a drop tolerance is used to discard small values. The algorithm is incorporated in a finite element method-of-lines code and tested on a set of reaction-diffusion systems. The effect of incomplete assembly on CPU time and storage and on the performance of the temporal integrator DASPK, algebraic solver GMRES and preconditioner ILUT is studied.
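    The dropping rule described in this abstract (a fixed number of nonzero entries per row plus a drop tolerance) can be illustrated with a minimal sketch. This is not the author's implementation: the function name `threshold_row`, the dictionary row format, and the choice to scale the tolerance by the largest entry in the row are all assumptions made for illustration.

```python
def threshold_row(row, max_entries, drop_tol):
    """Sparsify one matrix row given as {column_index: value}.

    Hypothetical helper: entries smaller than drop_tol times the largest
    magnitude in the row are discarded, then at most max_entries of the
    remaining entries (the largest in magnitude) are kept.
    """
    if not row:
        return {}
    scale = max(abs(v) for v in row.values())
    # Step 1: drop tolerance, relative to the largest entry in the row.
    kept = {c: v for c, v in row.items() if abs(v) >= drop_tol * scale}
    # Step 2: fixed fill, keep only the max_entries largest magnitudes.
    largest = sorted(kept.items(), key=lambda cv: abs(cv[1]), reverse=True)
    return dict(sorted(largest[:max_entries]))


row = {0: 4.0, 3: -0.001, 5: 1.5, 9: 0.2, 12: -2.5}
sparse_row = threshold_row(row, max_entries=3, drop_tol=0.01)
print(sparse_row)
```

    Bounding the entries per row fixes the storage in advance, which is the storage-reduction goal the abstract states; how the discarded values affect the solver is what the paper studies.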

  6. Complete mitogenome of the semi-aquatic grasshopper Oxya intricate (Stål.) (Insecta: Orthoptera: Catantopidae).

    PubMed

    Dong, Jia-Jia; Guan, De-Long; Xu, Sheng-Quan

    2016-09-01

    The complete mitogenome of Oxya intricate (Stål.) has been reconstructed from whole-genome Illumina sequencing data with an average coverage of 294×. The circular genome is 15,466 bp in length, and consists of 22 transfer RNAs (tRNAs), 13 protein-coding genes (PCGs), 2 ribosomal RNAs (rRNAs) and 1 D-loop region. All PCGs are initiated with ATN codons, and are terminated with TAR codons except for ND5 with the incomplete stop codon T. The nucleotide composition is asymmetric (42.5%A, 14.6%C, 10.6%G, 32.3%T) with an overall GC content of 25.2%. These data would contribute to the design of novel molecular markers for population and evolutionary studies of this and related orthopteran species.
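    The overall GC content quoted above follows directly from the reported base composition (14.6% C + 10.6% G = 25.2%). The short sketch below shows that arithmetic and the same calculation from a raw sequence; the function names and the toy sequence are illustrative assumptions, not part of the study.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a DNA string."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_percent(composition):
    """GC content is the sum of the G and C percentages."""
    return composition["G"] + composition["C"]


# Composition reported in the abstract (percent of each base).
reported = {"A": 42.5, "C": 14.6, "G": 10.6, "T": 32.3}
print(round(gc_percent(reported), 1))

# The same calculation starting from a toy sequence (not real data).
print(gc_percent(base_composition("ATGCATTA")))
```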

  7. The complete mitochondrial genome of the longhorn beetle Xylotrechus grayii (Coleoptera: Cerambycidae).

    PubMed

    Guo, Kun; Chen, Jun; Xu, Chang-Qing; Qiao, Hai-Li; Xu, Rong; Zhao, Xiang-Jian

    2016-05-01

    We sequenced the complete mitochondrial genome of the longhorn beetle, Xylotrechus grayii. The total length of the X. grayii mitogenome was 15,540 bp with an A + T content of 75.29%, consisting of 13 protein-coding genes (PCGs), 22 tRNA genes, 2 rRNA genes and an A + T-rich region. All the genes were arranged in the same order as that of the ancestral insect. All PCGs started with a typical ATN codon except for cox1 and nad1, which used TTG as start codon. Ten out of 13 PCGs terminated with incomplete codons (TA or T). The A + T-rich region was 893 bp in length with an A + T content of 85.89 %.

  8. Deleterious ABCA7 mutations and transcript rescue mechanisms in early onset Alzheimer's disease.

    PubMed

    De Roeck, Arne; Van den Bossche, Tobi; van der Zee, Julie; Verheijen, Jan; De Coster, Wouter; Van Dongen, Jasper; Dillen, Lubina; Baradaran-Heravi, Yalda; Heeman, Bavo; Sanchez-Valle, Raquel; Lladó, Albert; Nacmias, Benedetta; Sorbi, Sandro; Gelpi, Ellen; Grau-Rivera, Oriol; Gómez-Tortosa, Estrella; Pastor, Pau; Ortega-Cubero, Sara; Pastor, Maria A; Graff, Caroline; Thonberg, Håkan; Benussi, Luisa; Ghidoni, Roberta; Binetti, Giuliano; de Mendonça, Alexandre; Martins, Madalena; Borroni, Barbara; Padovani, Alessandro; Almeida, Maria Rosário; Santana, Isabel; Diehl-Schmid, Janine; Alexopoulos, Panagiotis; Clarimon, Jordi; Lleó, Alberto; Fortea, Juan; Tsolaki, Magda; Koutroumani, Maria; Matěj, Radoslav; Rohan, Zdenek; De Deyn, Peter; Engelborghs, Sebastiaan; Cras, Patrick; Van Broeckhoven, Christine; Sleegers, Kristel

    2017-09-01

    Premature termination codon (PTC) mutations in the ATP-Binding Cassette, Sub-Family A, Member 7 gene (ABCA7) have recently been identified as intermediate-to-high penetrant risk factors for late-onset Alzheimer's disease (LOAD). High variability, however, is observed in downstream ABCA7 mRNA and protein expression, disease penetrance, and onset age, indicative of unknown modifying factors. Here, we investigated the prevalence and disease penetrance of ABCA7 PTC mutations in a large early onset AD (EOAD)-control cohort, and examined the effect on transcript level with comprehensive third-generation long-read sequencing. We characterized the ABCA7 coding sequence with next-generation sequencing in 928 EOAD patients and 980 matched control individuals. With MetaSKAT rare variant association analysis, we observed a fivefold enrichment (p = 0.0004) of PTC mutations in EOAD patients (3%) versus controls (0.6%). Ten novel PTC mutations were only observed in patients, and PTC mutation carriers in general had an increased familial AD load. In addition, we observed nominal risk reducing trends for three common coding variants. Seven PTC mutations were further analyzed using targeted long-read cDNA sequencing on an Oxford Nanopore MinION platform. PTC-containing transcripts for each investigated PTC mutation were observed at varying proportions (5-41% of the total read count), implying incomplete nonsense-mediated mRNA decay (NMD). Furthermore, we distinguished and phased several previously unknown alternative splicing events (up to 30% of transcripts). In conjunction with PTC mutations, several of these novel ABCA7 isoforms have the potential to rescue deleterious PTC effects. In conclusion, ABCA7 PTC mutations play a substantial role in EOAD, warranting genetic screening of ABCA7 in genetically unexplained patients. Long-read cDNA sequencing revealed both varying degrees of NMD and transcript-modifying events, which may influence ABCA7 dosage and disease severity, and may create opportunities for therapeutic interventions in AD.

  9. Java Source Code Analysis for API Migration to Embedded Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Winter, Victor; McCoy, James A.; Guerrero, Jonathan

    Embedded systems form an integral part of our technological infrastructure and oftentimes play a complex and critical role within larger systems. From the perspective of reliability, security, and safety, strong arguments can be made favoring the use of Java over C in such systems. In part, this argument is based on the assumption that suitable subsets of Java’s APIs and extension libraries are available to embedded software developers. In practice, a number of Java-based embedded processors do not support the full features of the JVM. For such processors, source code migration is a mechanism by which key abstractions offered by APIs and extension libraries can be made available to embedded software developers. The analysis required for Java source code-level library migration is based on the ability to correctly resolve element references to their corresponding element declarations. A key challenge in this setting is how to perform analysis for incomplete source-code bases (e.g., subsets of libraries) from which types and packages have been omitted. This article formalizes an approach that can be used to extend code bases targeted for migration in such a manner that the threats associated with the analysis of incomplete code bases are eliminated.

  10. Accurate Classification of RNA Structures Using Topological Fingerprints

    PubMed Central

    Li, Kejie; Gribskov, Michael

    2016-01-01

    While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. While the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families, with diverse sizes and functions, containing pseudoknots, and with little sequence similarity, an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Due to the inclusion of pseudoknots, the RNA fingerprint approach covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint. PMID:27755571

  11. The complete mitochondrial genome of the Korean skate: Hongeo koreana (Rajiformes, Rajidae).

    PubMed

    Jeong, Dageum; Kim, Sung; Kim, Choong-Gon; Lee, Youn-Ho

    2014-12-01

    The complete mitochondrial genome of the Korean skate, Hongeo koreana, the sole member of its genus, is investigated for the first time. The genome is 16,906 bp in length and includes 2 rRNA, 22 tRNA and 13 protein-coding genes, with the same gene order and genome structure as those of other Rajidae species. The overall nucleotide composition of the L-strand is A = 29.8%, C = 27.9%, T = 27.9% and G = 14.3%, showing a high A + T bias. The anti-G bias (6.0%) is more significant in the third codon position. Twelve of the 13 protein-coding genes use ATG as their start codon, while the COX1 gene starts with GTG. The ND3 and ND4 genes have the incomplete stop codon T. The mitogenome sequence of H. koreana will provide important information on the evolution and the phylogenetic relation of the genus Hongeo in relation to the other genera of the family Rajidae.

  12. Whole Genome Complete Resequencing of Bacillus subtilis Natto by Combining Long Reads with High-Quality Short Reads

    PubMed Central

    Kamada, Mayumi; Hase, Sumitaka; Sato, Kengo; Toyoda, Atsushi; Fujiyama, Asao; Sakakibara, Yasubumi

    2014-01-01

    De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it has a function in the production of the traditional Japanese fermented food “natto.” The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer, and we successfully obtained a complete genome sequence from one scaffold without any gaps, and we also applied Illumina MiSeq short reads to enhance quality. Compared with the previous BEST195 draft genome and Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributed to GC-bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome. PMID:25329997

  13. Ancient DNA sequence revealed by error-correcting codes.

    PubMed

    Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

    2015-07-10

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

  14. Ancient DNA sequence revealed by error-correcting codes

    PubMed Central

    Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

    2015-01-01

    A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228

  15. Two-terminal video coding.

    PubMed

    Yang, Yang; Stanković, Vladimir; Xiong, Zixiang; Zhao, Wei

    2009-03-01

    Following recent works on the rate region of the quadratic Gaussian two-terminal source coding problem and limit-approaching code designs, this paper examines multiterminal source coding of two correlated, i.e., stereo, video sequences to save the sum rate over independent coding of both sequences. Two multiterminal video coding schemes are proposed. In the first scheme, the left sequence of the stereo pair is coded by H.264/AVC and used at the joint decoder to facilitate Wyner-Ziv coding of the right video sequence. The first I-frame of the right sequence is successively coded by H.264/AVC Intracoding and Wyner-Ziv coding. An efficient stereo matching algorithm based on loopy belief propagation is then adopted at the decoder to produce pixel-level disparity maps between the corresponding frames of the two decoded video sequences on the fly. Based on the disparity maps, side information for both motion vectors and motion-compensated residual frames of the right sequence are generated at the decoder before Wyner-Ziv encoding. In the second scheme, source splitting is employed on top of classic and Wyner-Ziv coding for compression of both I-frames to allow flexible rate allocation between the two sequences. Experiments with both schemes on stereo video sequences using H.264/AVC, LDPC codes for Slepian-Wolf coding of the motion vectors, and scalar quantization in conjunction with LDPC codes for Wyner-Ziv coding of the residual coefficients give a slightly lower sum rate than separate H.264/AVC coding of both sequences at the same video quality.

  16. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development

    PubMed Central

    2011-01-01

    Background: We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. Results: The genome has been sequenced to 2× coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Conclusions: Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution. PMID:21854559

  17. Exome capture from the spruce and pine giga-genomes.

    PubMed

    Suren, H; Hodgins, K A; Yeaman, S; Nurkowski, K A; Smets, P; Rieseberg, L H; Aitken, S N; Holliday, J A

    2016-09-01

    Sequence capture is a flexible tool for generating reduced representation libraries, particularly in species with massive genomes. We used an exome capture approach to sequence the gene space of two of the dominant species in Canadian boreal and montane forests - interior spruce (Picea glauca x engelmanii) and lodgepole pine (Pinus contorta). Transcriptome data generated with RNA-seq were coupled with draft genome sequences to design baits corresponding to 26 824 genes from pine and 28 649 genes from spruce. A total of 579 samples for spruce and 631 samples for pine were included, as well as two pine congeners and six spruce congeners. More than 50% of targeted regions were sequenced at >10× depth in each species, while ~12% captured near-target regions within 500 bp of a bait position were sequenced to a depth >10×. Much of our read data arose from off-target regions, which was likely due to the fragmented and incomplete nature of the draft genome assemblies. Capture in general was successful for the related species, suggesting that baits designed for a single species are likely to successfully capture sequences from congeners. From these data, we called approximately 10 million SNPs and INDELs in each species from coding regions, introns, untranslated and flanking regions, as well as from the intergenic space. Our study demonstrates the utility of sequence capture for resequencing in complex conifer genomes, suggests guidelines for improving capture efficiency and provides a rich resource of genetic variants for studies of selection and local adaptation in these species. © 2016 John Wiley & Sons Ltd.

  18. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development.

    PubMed

    Renfree, Marilyn B; Papenfuss, Anthony T; Deakin, Janine E; Lindsay, James; Heider, Thomas; Belov, Katherine; Rens, Willem; Waters, Paul D; Pharo, Elizabeth A; Shaw, Geoff; Wong, Emily S W; Lefèvre, Christophe M; Nicholas, Kevin R; Kuroki, Yoko; Wakefield, Matthew J; Zenger, Kyall R; Wang, Chenwei; Ferguson-Smith, Malcolm; Nicholas, Frank W; Hickford, Danielle; Yu, Hongshi; Short, Kirsty R; Siddle, Hannah V; Frankenberg, Stephen R; Chew, Keng Yih; Menzies, Brandon R; Stringer, Jessica M; Suzuki, Shunsuke; Hore, Timothy A; Delbridge, Margaret L; Patel, Hardip R; Mohammadi, Amir; Schneider, Nanette Y; Hu, Yanqiu; O'Hara, William; Al Nadaf, Shafagh; Wu, Chen; Feng, Zhi-Ping; Cocks, Benjamin G; Wang, Jianghui; Flicek, Paul; Searle, Stephen M J; Fairley, Susan; Beal, Kathryn; Herrero, Javier; Carone, Dawn M; Suzuki, Yutaka; Sugano, Sumio; Toyoda, Atsushi; Sakaki, Yoshiyuki; Kondo, Shinji; Nishida, Yuichiro; Tatsumoto, Shoji; Mandiou, Ion; Hsu, Arthur; McColl, Kaighin A; Lansdell, Benjamin; Weinstock, George; Kuczek, Elizabeth; McGrath, Annette; Wilson, Peter; Men, Artem; Hazar-Rethinam, Mehlika; Hall, Allison; Davis, John; Wood, David; Williams, Sarah; Sundaravadanam, Yogi; Muzny, Donna M; Jhangiani, Shalini N; Lewis, Lora R; Morgan, Margaret B; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Nazareth, Lynne; Cree, Andrew; Fowler, Gerald; Kovar, Christie L; Dinh, Huyen H; Joshi, Vandita; Jing, Chyn; Lara, Fremiet; Thornton, Rebecca; Chen, Lei; Deng, Jixin; Liu, Yue; Shen, Joshua Y; Song, Xing-Zhi; Edson, Janette; Troon, Carmen; Thomas, Daniel; Stephens, Amber; Yapa, Lankesha; Levchenko, Tanya; Gibbs, Richard A; Cooper, Desmond W; Speed, Terence P; Fujiyama, Asao; Graves, Jennifer A M; O'Neill, Rachel J; Pask, Andrew J; Forrest, Susan M; Worley, Kim C

    2011-08-29

    We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.

  19. HYBRIDCHECK: software for the rapid detection, visualization and dating of recombinant regions in genome sequence data.

    PubMed

    Ward, Ben J; van Oosterhout, Cock

    2016-03-01

    HYBRIDCHECK is a software package to visualize the recombination signal in large DNA sequence data sets, and it can be used to analyse recombination, genetic introgression, hybridization and horizontal gene transfer. It can scan large (multiple kb) contigs and whole-genome sequences of three or more individuals. HYBRIDCHECK is written in R for OS X, Linux and Windows operating systems, and it has a simple graphical user interface. In addition, the R code can be readily incorporated in scripts and analysis pipelines. HYBRIDCHECK implements several ABBA-BABA tests and visualizes the effects of hybridization and the resulting mosaic-like genome structure in high-density graphics. The package also reports the following: (i) the breakpoint positions, (ii) the number of mutations in each introgressed block, (iii) the probability that the identified region is not caused by recombination and (iv) the estimated age of each recombination event. The divergence times between the donor and recombinant sequence are calculated using a JC, K80, F81, HKY or GTR correction, and the dating algorithm is exceedingly fast. By estimating the coalescence time of introgressed blocks, it is possible to distinguish between hybridization and incomplete lineage sorting. HYBRIDCHECK is libre software and it and its manual are free to download from http://ward9250.github.io/HybridCheck/. © 2015 John Wiley & Sons Ltd.
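    The ABBA-BABA tests mentioned in this abstract are built on counts of two site patterns across four aligned sequences. A minimal sketch of the standard Patterson's D statistic, D = (ABBA - BABA) / (ABBA + BABA), is given below. It is written in Python for illustration only and does not reproduce HYBRIDCHECK's R code or interface; the function name `patterson_d` and the toy alignment are assumptions.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Count ABBA and BABA sites in four aligned sequences and return
    (abba, baba, D). D is None when no informative site is present."""
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Skip gaps and ambiguity codes.
        if any(x not in "ACGT" for x in (a, b, c, o)):
            continue
        if a == o and b == c and a != b:
            abba += 1  # P2 shares the derived allele with P3
        elif b == o and a == c and a != b:
            baba += 1  # P1 shares the derived allele with P3
    if abba + baba == 0:
        return abba, baba, None
    return abba, baba, (abba - baba) / (abba + baba)


# Toy alignment (not real data): three ABBA sites and one BABA site.
print(patterson_d("ACTACG", "GCGTTG", "GCTTTA", "ACGACG"))
```

    A value of D near zero is what incomplete lineage sorting alone is expected to produce, while a marked excess of one pattern suggests introgression; assessing significance requires resampling that this sketch omits.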

  20. Information preserving coding for multispectral data

    NASA Technical Reports Server (NTRS)

    Duan, J. R.; Wintz, P. A.

    1973-01-01

    A general formulation of the data compression system is presented. A method of instantaneous expansion of quantization levels by reserving two codewords in the codebook to perform a folding over in quantization is implemented for error free coding of data with incomplete knowledge of the probability density function. Results for simple DPCM with folding and an adaptive transform coding technique followed by a DPCM technique are compared using ERTS-1 data.

  1. Characterization of 25 full-length S-RNase alleles, including flanking regions, from a pool of resequenced apple cultivars.

    PubMed

    De Franceschi, Paolo; Bianco, Luca; Cestaro, Alessandro; Dondini, Luca; Velasco, Riccardo

    2018-06-01

    Data obtained from Illumina resequencing of 63 apple cultivars were used to obtain full-length S-RNase sequences using a strategy based on both alignment and de novo assembly of reads. The reproductive biology of apple is regulated by the S-RNase-based gametophytic self-incompatibility system, which is genetically controlled by the single, multi-genic and multi-allelic S locus. Resequencing of apple cultivars provided a huge amount of genetic data that can be aligned to the reference genome in order to characterize variation at a genome-wide level. However, this approach is not immediately adaptable to the S-locus, due to some peculiar features such as the high degree of polymorphism, lack of colinearity between haplotypes and extensive presence of repetitive elements. In this study we describe a dedicated procedure aimed at characterizing S-RNase alleles from resequenced cultivars. The S-genotype of 63 apple accessions is reported; the full-length coding sequence was determined for the 25 S-RNase alleles present in the 63 resequenced cultivars; these included 10 previously incomplete sequences (S5, S6a, S6b, S8, S11, S23, S39, S46, S50 and S58). Moreover, sequence divergence clearly suggests that alleles S6a and S6b, proposed to be neutral variants of the same allele, should instead be considered different specificities. The promoter sequences have also been analyzed, highlighting regions of homology conserved among all the alleles.

  2. Role of tunnelling in complete and incomplete fusion induced by 9Be on 169Tm and 187Re targets at around barrier energies

    NASA Astrophysics Data System (ADS)

    Kharab, Rajesh; Chahal, Rajiv; Kumar, Rajiv

    2017-04-01

    We have analyzed the complete and incomplete fusion excitation function for 9Be +169Tm, 187Re reactions at around barrier energies using the code PLATYPUS based on classical dynamical model. The quantum mechanical tunnelling correction is incorporated at near and sub barrier energies which significantly improves the matching between the data and prediction.

  3. The mitochondrial genome of the quiet-calling katydids, Xizicus fascipes (Orthoptera: Tettigoniidae: Meconematinae).

    PubMed

    Yang, Ming Ru; Zhou, Zhi Jun; Chang, Yan Lin; Zhao, Le Hong

    2012-08-01

    To help determine whether the typical arthropod arrangement was a synapomorphy for the whole Tettigoniidae, we sequenced the mitochondrial genome (mitogenome) of the quiet-calling katydids, Xizicus fascipes (Orthoptera: Tettigoniidae: Meconematinae). The 16,166-bp nucleotide sequence of the X. fascipes mitogenome contains the typical gene content, gene order, base composition, and codon usage found in arthropod mitogenomes. As a whole, the X. fascipes mitogenome contains a lower A+T content (70.2%) found in the complete orthopteran mitogenomes determined to date. All protein-coding genes started with a typical ATN codon. Ten of the 13 protein-coding genes have a complete termination codon, but the remaining three genes (COIII, ND5 and ND4) terminate with an incomplete T. All tRNAs have the typical clover-leaf structure of mitogenome tRNA, except for tRNA(Ser(AGN)), which has a lengthened anticodon stem (9 bp) with a bulged nucleotide in the middle, an unusual T-stem (6 bp in contrast to the normal 5 bp), a mini DHU arm (2 bp) and no connector nucleotides. In the A+T-rich region, two (TA)n conserved blocks that were previously described in Ensifera and two 150-bp tandem repeats plus a partial copy of the composed at 61 bp of the beginning were present. Phylogenetic analysis found: i) the monophyly of Conocephalinae was interrupted by Elimaea cheni from Phaneropterinae; and ii) Meconematinae was the most basal group among these five subfamilies.

  4. A murC gene in Porphyromonas gingivalis 381.

    PubMed

    Ansai, T; Yamashita, Y; Awano, S; Shibata, Y; Wachi, M; Nagai, K; Takehara, T

    1995-09-01

    The gene encoding a 51 kDa polypeptide of Porphyromonas gingivalis 381 was isolated by immunoblotting using an antiserum raised against P. gingivalis alkaline phosphatase. DNA sequence analysis of a 2.5 kb DNA fragment containing a gene encoding the 51 kDa protein revealed one complete and two incomplete ORFs. Database searches using the FASTA program revealed significant homology between the P. gingivalis 51 kDa protein and the MurC protein of Escherichia coli, which functions in peptidoglycan synthesis. The cloned 51 kDa protein encoded a functional product that complemented an E. coli murC mutant. Moreover, the ORF just upstream of murC coded for a protein that was 31% homologous with the E. coli MurG protein. The ORF just downstream of murC coded for a protein that was 17% homologous with the Streptococcus pneumoniae penicillin-binding protein 2B (PBP2B), which functions in peptidoglycan synthesis and is responsible for antibiotic resistance. These results suggest that P. gingivalis contains a homologue of the E. coli peptidoglycan synthesis gene murC and indicate the possibility of a cluster of genes responsible for cell division and cell growth, as in the E. coli mra region.

  5. A symmetry model for genetic coding via a wallpaper group composed of the traditional four bases and an imaginary base E: towards category theory-like systematization of molecular/genetic biology.

    PubMed

    Sawamura, Jitsuki; Morishita, Shigeru; Ishigooka, Jun

    2014-05-07

    Previously, we suggested prototypal models that describe some clinical states based on group postulates. Here, we demonstrate a group/category theory-like model for molecular/genetic biology as an alternative application of our previous model. Specifically, we focus on deoxyribonucleic acid (DNA) base sequences. We construct a wallpaper pattern based on a five-letter cruciform motif with letters C, A, T, G, and E. Whereas the first four letters represent the standard DNA bases, the fifth is introduced for ease in formulating group operations that reproduce insertions and deletions of DNA base sequences. A basic group Z5 = {r, u, d, l, n} of operations is defined for the wallpaper pattern, with which a sequence of points can be generated corresponding to changes of a base in a DNA sequence by following the orbit of a point of the pattern under operations in group Z5. Other manipulations of DNA sequence can be treated using a vector-like notation 'Dj' corresponding to a DNA sequence but based on the five-letter base set; also, 'Dj's are expressed graphically. Insertions and deletions of a series of letters 'E' are admitted to assist in describing DNA recombination. Likewise, a vector-like notation Rj can be constructed for sequences of ribonucleic acid (RNA). The wallpaper group B = {Z5×∞, ●} (an ∞-fold Cartesian product of Z5) acts on Dj (or Rj) yielding changes to Dj (or Rj) denoted by 'Dj◦B(j→k) = Dk' (or 'Rj◦B(j→k) = Rk'). Based on the operations of this group, two types of groups-a modulo 5 linear group and a rotational group over the Gaussian plane, acting on the five bases-are linked as parts of the wallpaper group for broader applications. As a result, changes, insertions/deletions and DNA (RNA) recombination (partial/total conversion) are described. As an exploratory study, a notation for the canonical "central dogma" via a category theory-like way is presented for future developments. 
    Despite the large incompleteness of our methodology, there is fertile ground to consider a symmetry model for genetic coding based on our specific wallpaper group. A more integrated formulation containing "central dogma" for future molecular/genetic biology remains to be explored.

  6. Improving performance of DS-CDMA systems using chaotic complex Bernoulli spreading codes

    NASA Astrophysics Data System (ADS)

    Farzan Sabahi, Mohammad; Dehghanfard, Ali

    2014-12-01

    The most important goal of a spread spectrum communication system is to protect communication signals against interference and against exploitation of information by unintended listeners. In fact, low probability of detection and low probability of intercept are two important parameters for increasing the performance of the system. In Direct Sequence Code Division Multiple Access (DS-CDMA) systems, these properties are achieved by multiplying the data by spreading sequences. Chaotic sequences, with their particular properties, have numerous applications in constructing spreading codes. Using a one-dimensional Bernoulli chaotic sequence as a spreading code has previously been proposed in the literature. The main feature of this sequence is its negative auto-correlation at a lag of 1, which, with proper design, leads to an increase in the efficiency of communication systems based on these codes. On the other hand, employing complex chaotic sequences as spreading sequences has also been discussed in several papers. In this paper, the use of two-dimensional Bernoulli chaotic sequences as spreading codes is proposed. The performance of multi-user synchronous and asynchronous DS-CDMA systems is evaluated by applying these sequences under Additive White Gaussian Noise (AWGN) and fading channels. Simulation results indicate improved performance in comparison with conventional spreading codes such as Gold codes, as well as with similar complex chaotic spreading sequences. Like one-dimensional Bernoulli chaotic sequences, the proposed sequences also have negative auto-correlation. In addition, complex sequences with lower average cross-correlation can be constructed with the proposed method.

  7. Exome sequencing for prenatal diagnosis of fetuses with sonographic abnormalities.

    PubMed

    Drury, Suzanne; Williams, Hywel; Trump, Natalie; Boustred, Christopher; Lench, Nicholas; Scott, Richard H; Chitty, Lyn S

    2015-10-01

    In the absence of aneuploidy or other pathogenic cytogenetic abnormality, fetuses with increased nuchal translucency (NT ≥ 3.5 mm) and/or other sonographic abnormalities have a greater incidence of genetic syndromes, but defining the underlying pathology can be challenging. Here, we investigate the value of whole exome sequencing in fetuses with sonographic abnormalities but normal microarray analysis. Whole exome sequencing was performed on DNA extracted from chorionic villi or amniocytes in 24 fetuses with unexplained ultrasound findings. In the first 14 cases sequencing was initially performed on fetal DNA only. For the remaining 10, the trio of fetus, mother and father was sequenced simultaneously. In 21% (5/24) of cases, exome sequencing provided definitive diagnoses (Milroy disease, hypophosphatasia, achondrogenesis type 2, Freeman-Sheldon syndrome and Baraitser-Winter Syndrome). In a further case, a plausible diagnosis of orofaciodigital syndrome type 6 was made. In two others, a single mutation in an autosomal recessive gene was identified, but incomplete sequencing coverage precluded exclusion of the presence of a second mutation. Whole exome sequencing improves prenatal diagnosis in euploid fetuses with abnormal ultrasound scans. In order to expedite interpretation of results, trio sequencing should be employed, but interpretation can still be compromised by incomplete coverage of relevant genes. © 2015 John Wiley & Sons, Ltd.

  8. 12 CFR Appendix A to Part 203 - Form and Instructions for Completion of HMDA Loan/Application Register

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... conduct or sponsor, and an organization is not required to respond to, a collection of information unless... application incomplete Code 8—Mortgage insurance denied Code 9—Other 2. If your institution uses the model... institution is regulated by the Department of Housing and Urban Development, then you should use the Internet...

  9. Sequence analysis of the PIP5K locus in Eimeria maxima provides further evidence for eimerian genome plasticity and segmental organization.

    PubMed

    Song, B K; Pan, M Z; Lau, Y L; Wan, K L

    2014-07-29

    Commercial flocks infected by Eimeria species parasites, including Eimeria maxima, have an increased risk of developing clinical or subclinical coccidiosis; an intestinal enteritis associated with increased mortality rates in poultry. Currently, infection control is largely based on chemotherapy or live vaccines; however, drug resistance is common and vaccines are relatively expensive. The development of new cost-effective intervention measures will benefit from unraveling the complex genetic mechanisms that underlie host-parasite interactions, including the identification and characterization of genes encoding proteins such as phosphatidylinositol 4-phosphate 5-kinase (PIP5K). We previously identified a PIP5K coding sequence within the E. maxima genome. In this study, we analyzed two bacterial artificial chromosome clones presenting a ~145-kb E. maxima (Weybridge strain) genomic region spanning the PIP5K gene locus. Sequence analysis revealed that ~95% of the simple sequence repeats detected were located within regions comparable to the previously described feature-rich segments of the Eimeria tenella genome. Comparative sequence analysis with the orthologous E. maxima (Houghton strain) region revealed a moderate level of conserved synteny. Unique segmental organizations and telomere-like repeats were also observed in both genomes. A number of incomplete transposable elements were detected and further scrutiny of these elements in both orthologous segments revealed interesting nesting events, which may play a role in facilitating genome plasticity in E. maxima. The current analysis provides more detailed information about the genome organization of E. maxima and may help to reveal genotypic differences that are important for expression of traits related to pathogenicity and virulence.

  10. Whole exome sequencing in an Italian family with isolated maxillary canine agenesis and canine eruption anomalies.

    PubMed

    Barbato, Ersilia; Traversa, Alice; Guarnieri, Rosanna; Giovannetti, Agnese; Genovesi, Maria Luce; Magliozzi, Maria Rosa; Paolacci, Stefano; Ciolfi, Andrea; Pizzi, Simone; Di Giorgio, Roberto; Tartaglia, Marco; Pizzuti, Antonio; Caputo, Viviana

    2018-07-01

    The aim of this study was the clinical and molecular characterization of a family segregating a trait consisting of a phenotype specifically involving the maxillary canines, including agenesis, impaction and ectopic eruption, characterized by incomplete penetrance and variable expressivity. Standardized clinical assessment of 14 family members and whole-exome sequencing (WES) of three affected subjects were performed. WES data analyses (sequence alignment, variant calling, annotation and prioritization) were carried out using an in-house implemented pipeline. Variant filtering retained high-quality private and rare coding and splice-site variants. Variant prioritization was performed taking into account both the disruptive impact and the biological relevance of individual variants and genes. Sanger sequencing was performed to validate the variants of interest and to carry out segregation analysis. Prioritization of variants "by function" allowed the identification of multiple variants contributing to the trait, including two concomitant heterozygous variants in EDARADD (c.308C>T, p.Ser103Phe) and COL5A1 (c.1588G>A, p.Gly530Ser), specifically associated with a more severe phenotype (i.e. canine agenesis). By contrast, heterozygous variants in genes encoding proteins with a role in the WNT pathway were shared by subjects showing a phenotype of impacted/ectopic erupted canines. This study characterized the genetic contribution underlying a complex trait consisting of isolated canine anomalies in a medium-sized family, highlighting the role of WNT and EDA cell signaling pathways in tooth development. Copyright © 2018 Elsevier Ltd. All rights reserved.
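    The two variants quoted above pair a coding-DNA position with a protein position, and the two numbers are related by simple arithmetic. The following is a minimal sketch of that check; the function names are hypothetical and only plain single-base substitutions of the form c.<position><ref>><alt> are handled, not the full HGVS syntax.

```python
import re


def codon_location(c_pos):
    """Map a 1-based coding-DNA position to (codon number, position within codon)."""
    if c_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def parse_substitution(hgvs_c):
    """Parse a simple substitution such as 'c.308C>T' into (position, ref, alt).

    Assumption: only single-base substitutions are supported in this sketch.
    """
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs_c)
    if match is None:
        raise ValueError(f"unsupported variant description: {hgvs_c!r}")
    return int(match.group(1)), match.group(2), match.group(3)


for variant in ("c.308C>T", "c.1588G>A"):
    pos, ref, alt = parse_substitution(variant)
    codon, offset = codon_location(pos)
    print(f"{variant}: codon {codon}, base {offset} of the codon ({ref} to {alt})")
```

    Under these assumptions, position 308 falls in codon 103 and position 1588 in codon 530, which agrees with the residue numbers in p.Ser103Phe and p.Gly530Ser.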

  11. Phylogenetic Relationships and Species Delimitation in Pinus Section Trifoliae Inferrred from Plastid DNA

    PubMed Central

    Hernández-León, Sergio; Gernandt, David S.; Pérez de la Rosa, Jorge A.; Jardón-Barbolla, Lev

    2013-01-01

    Recent diversification followed by secondary contact and hybridization may explain complex patterns of intra- and interspecific morphological and genetic variation in the North American hard pines (Pinus section Trifoliae), a group of approximately 49 tree species distributed in North and Central America and the Caribbean islands. We concatenated five plastid DNA markers for an average of 3.9 individuals per putative species and assessed the suitability of the five regions as DNA bar codes for species identification, species delimitation, and phylogenetic reconstruction. The ycf1 gene accounted for the greatest proportion of the alignment (46.9%), the greatest proportion of variable sites (74.9%), and the most unique sequences (75 haplotypes). Phylogenetic analysis recovered clades corresponding to subsections Australes, Contortae, and Ponderosae. Sequences for 23 of the 49 species were monophyletic and sequences for another 9 species were paraphyletic. Morphologically similar species within subsections usually grouped together, but there were exceptions consistent with incomplete lineage sorting or introgression. Bayesian relaxed molecular clock analyses indicated that all three subsections diversified relatively recently during the Miocene. The general mixed Yule-coalescent method gave a mixed model estimate of only 22 or 23 evolutionary entities for the plastid sequences, which corresponds to less than half the 49 species recognized based on morphological species assignments. Including more unique haplotypes per species may result in higher estimates, but low mutation rates, recent diversification, and large effective population sizes may limit the effectiveness of this method to detect evolutionary entities. PMID:23936218

  12. Phylogenetic relationships and species delimitation in pinus section trifoliae inferrred from plastid DNA.

    PubMed

    Hernández-León, Sergio; Gernandt, David S; Pérez de la Rosa, Jorge A; Jardón-Barbolla, Lev

    2013-01-01

    Recent diversification followed by secondary contact and hybridization may explain complex patterns of intra- and interspecific morphological and genetic variation in the North American hard pines (Pinus section Trifoliae), a group of approximately 49 tree species distributed in North and Central America and the Caribbean islands. We concatenated five plastid DNA markers for an average of 3.9 individuals per putative species and assessed the suitability of the five regions as DNA bar codes for species identification, species delimitation, and phylogenetic reconstruction. The ycf1 gene accounted for the greatest proportion of the alignment (46.9%), the greatest proportion of variable sites (74.9%), and the most unique sequences (75 haplotypes). Phylogenetic analysis recovered clades corresponding to subsections Australes, Contortae, and Ponderosae. Sequences for 23 of the 49 species were monophyletic and sequences for another 9 species were paraphyletic. Morphologically similar species within subsections usually grouped together, but there were exceptions consistent with incomplete lineage sorting or introgression. Bayesian relaxed molecular clock analyses indicated that all three subsections diversified relatively recently during the Miocene. The general mixed Yule-coalescent method gave a mixed model estimate of only 22 or 23 evolutionary entities for the plastid sequences, which corresponds to less than half the 49 species recognized based on morphological species assignments. Including more unique haplotypes per species may result in higher estimates, but low mutation rates, recent diversification, and large effective population sizes may limit the effectiveness of this method to detect evolutionary entities.

  13. phiC31 Integrase-Mediated Site-Specific Recombination in Barley

    PubMed Central

    Rubtsova, Myroslava; Kumlehn, Jochen; Gils, Mario

    2012-01-01

    The Streptomyces phage phiC31 integrase was tested for its feasibility in excising transgenes from the barley genome through site-specific recombination. We produced transgenic barley plants expressing an active phiC31 integrase and crossed them with transgenic barley plants carrying a target locus for recombination. The target sequence involves a reporter gene encoding green fluorescent protein (GFP), which is flanked by the attB and attP recognition sites for the phiC31 integrase. This sequence disruptively separates a gusA coding sequence from an upstream rice actin promoter. We succeeded in producing site-specific recombination events in the hybrid progeny of 11 independent barley plants carrying the above target sequence after crossing with plants carrying a phiC31 expression cassette. Some of the hybrids displayed fully executed recombination. Excision of the GFP gene fostered activation of the gusA gene, as visualized in tissue of hybrid plants by histochemical staining. The recombinant loci were detected in progeny of selfed F1, even in individuals lacking the phiC31 transgene, which provides evidence of stability and generative transmission of the recombination events. In several plants that displayed incomplete recombination, extrachromosomal excision circles were identified. Besides the technical advance achieved in this study, the generated phiC31 integrase-expressing barley plants provide foundational stock material for use in future approaches to barley genetic improvement, such as the production of marker-free transgenic plants or switching transgene activity. PMID:23024817

  14. [Transposition errors during learning to reproduce a sequence by the right- and the left-hand movements: simulation of positional and movement coding].

    PubMed

    Liakhovetskiĭ, V A; Bobrova, E V; Skopin, G N

    2012-01-01

    Transposition errors during the reproduction of a hand movement sequence make it possible to obtain important information on the internal representation of this sequence in motor working memory. Analysis of such errors showed that learning to reproduce sequences of left-hand movements improves the system of positional coding (coding of positions), while learning of right-hand movements improves the system of vector coding (coding of movements). Learning of right-hand movements after left-hand performance involved the system of positional coding "imposed" by the left hand. Learning of left-hand movements after right-hand performance activated the system of vector coding. Transposition errors during learning to reproduce movement sequences can be explained by a neural network using either vector coding or both vector and positional coding.

  15. Informational structure of genetic sequences and nature of gene splicing

    NASA Astrophysics Data System (ADS)

    Trifonov, E. N.

    1991-10-01

    Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA sequences are largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate to various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.

  16. PROKR2 and PROK2 mutations cause isolated congenital anosmia without gonadotropic deficiency.

    PubMed

    Moya-Plana, Antoine; Villanueva, Carine; Laccourreye, Ollivier; Bonfils, Pierre; de Roux, Nicolas

    2013-01-01

    Isolated congenital anosmia (ICA) is a rare phenotype defined as absent recall of any olfactory sensations since birth and the absence of any disease known to cause anosmia. Although most cases of ICA are sporadic, reports of familial cases suggest a genetic cause. ICA due to olfactory bulb agenesis and associated to hypogonadotropic hypogonadism defines Kallmann syndrome (KS), in which several gene defects have been described. In KS families, the phenotype may be restricted to ICA. We therefore hypothesized that mutations in KS genes cause ICA in patients, even in the absence of family history of reproduction disorders. In 25 patients with ICA and olfactory bulb agenesis, a detailed phenotype analysis was conducted and the coding sequences of KAL1, FGFR1, FGF8, PROKR2, and PROK2 were sequenced. Three PROKR2 mutations previously described in KS and one new PROK2 mutation were found. Investigation of the families showed incomplete penetrance of these mutations. This study is the first to report genetic causes of ICA and indicates that KS genes must be screened in patients with ICA. It also confirms the considerable complexity of GNRH neuron development in humans.

  17. Genetic Code Analysis Toolkit: A novel tool to explore the coding properties of the genetic code and DNA sequences

    NASA Astrophysics Data System (ADS)

    Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.

    2018-01-01

    The genetic code is degenerated and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under current research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: Open source. Homepage: http://www.gcat.bio/
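    One of the properties named above, comma-freeness, has a compact definition: a set of codons is comma-free if no codon of the set can be read in a shifted frame across the junction of any two codons of the set. The sketch below illustrates that definition only. It is not GCAT's implementation (GCAT is Java-based), and the function name is hypothetical.

```python
from itertools import product


def is_comma_free(codons):
    """Return True if the codon set is comma-free.

    For every ordered pair (x, y) of codons, including x == y, the two
    frame-shifted trinucleotides inside the concatenation x + y must not
    belong to the set.
    """
    code = {codon.upper() for codon in codons}
    if any(len(codon) != 3 for codon in code):
        raise ValueError("every codon must have exactly three letters")
    for x, y in product(code, repeat=2):
        joined = x + y
        # joined[1:4] is x2 x3 y1 and joined[2:5] is x3 y1 y2.
        if joined[1:4] in code or joined[2:5] in code:
            return False
    return True


print(is_comma_free({"ACG", "TCG"}))  # no shifted reading hits the set
print(is_comma_free({"ACG", "CGA"}))  # ACG + ACG contains CGA in a shifted frame
```

    Checking all ordered pairs is quadratic in the size of the set, which is unproblematic here because there are at most 64 codons.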

  18. SEQassembly: A Practical Tools Program for Coding Sequences Splicing

    NASA Astrophysics Data System (ADS)

    Lee, Hongbin; Yang, Hang; Fu, Lei; Qin, Long; Li, Huili; He, Feng; Wang, Bo; Wu, Xiaoming

    A CDS (coding sequence) is the portion of an mRNA sequence that is composed of a number of exon sequence segments. Constructing the CDS is important for in-depth genetic analyses such as genotyping. A program for the MATLAB environment is presented that processes batches of sample sequences into code segments under the guidance of reference exon models, and splices the code segments from the same sample source into a CDS according to the exon order in a queue file. This program is useful in transcriptional polymorphism detection and gene function studies.
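    The splicing step described above, joining per-exon segments of one sample in a given exon order, can be sketched as follows. This is an illustration in Python rather than the MATLAB program itself; the function name, the dictionary layout and the exon labels are assumptions made for the example.

```python
def splice_cds(segments, exon_order):
    """Concatenate exon segments of one sample in the given exon order.

    segments   : dict mapping a (hypothetical) exon label to its sequence
    exon_order : list of exon labels, standing in for the order in a queue file
    """
    missing = [exon for exon in exon_order if exon not in segments]
    if missing:
        # Refuse to build a CDS with gaps rather than silently skipping exons.
        raise KeyError(f"missing exon segments: {missing}")
    return "".join(segments[exon].upper() for exon in exon_order)


sample = {"exon2": "gaa", "exon1": "ATGGCC", "exon3": "TAA"}
cds = splice_cds(sample, ["exon1", "exon2", "exon3"])
print(cds, "length divisible by 3:", len(cds) % 3 == 0)
```

    Raising an error on a missing exon keeps an incomplete CDS from being mistaken for a complete one.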

  19. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  20. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
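    The abstract above names detrended fluctuation analysis (DFA) without giving its steps. The following is a minimal sketch of the widely used first-order formulation applied to a purine/pyrimidine "DNA walk": build the cumulative walk, split it into boxes of length n, remove a least-squares line from each box, and measure the root-mean-square residual F(n). The slope of log F(n) against log n is the scaling exponent. The function names, the box sizes and the random test sequence are assumptions of this example, and the authors' exact procedure may differ in detail.

```python
import math
import random


def dna_walk(seq):
    """Cumulative DNA walk: +1 for a pyrimidine (C/T), -1 for a purine (A/G)."""
    total, profile = 0, []
    for base in seq.upper():
        if base in "CT":
            total += 1
        elif base in "AG":
            total -= 1
        else:
            continue  # skip ambiguous symbols such as N
        profile.append(total)
    return profile


def _residual_ss(ys):
    """Sum of squared residuals after a least-squares line fit to ys."""
    n = len(ys)
    x_mean, y_mean = (n - 1) / 2, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return sum((y - (y_mean + slope * (x - x_mean))) ** 2 for x, y in enumerate(ys))


def dfa_fluctuation(profile, box):
    """Root-mean-square detrended fluctuation F(box) over non-overlapping boxes."""
    n_boxes = len(profile) // box
    if box < 3 or n_boxes < 1:
        raise ValueError("box must be >= 3 and no longer than the profile")
    ss = sum(_residual_ss(profile[i * box:(i + 1) * box]) for i in range(n_boxes))
    return math.sqrt(ss / (n_boxes * box))


def dfa_exponent(profile, boxes):
    """Least-squares slope of log F(n) versus log n."""
    xs = [math.log(b) for b in boxes]
    ys = [math.log(dfa_fluctuation(profile, b)) for b in boxes]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sum((x - x_mean) ** 2 for x in xs)


random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(4000))  # uncorrelated toy sequence
alpha = dfa_exponent(dna_walk(seq), [8, 16, 32, 64, 128])
print(f"alpha = {alpha:.2f}")
```

    For an uncorrelated sequence such as the random one used here, the exponent is expected to lie near 0.5; values clearly above 0.5 would indicate long-range correlation of the kind the abstract reports for non-coding regions.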

  1. Genome-Wide Networks of Amino Acid Covariances Are Common among Viruses

    PubMed Central

    Donlin, Maureen J.; Szeto, Brandon; Gohara, David W.; Aurora, Rajeev

    2012-01-01

    Coordinated variation among positions in amino acid sequence alignments can reveal genetic dependencies at noncontiguous positions, but methods to assess these interactions are incompletely developed. Previously, we found genome-wide networks of covarying residue positions in the hepatitis C virus genome (R. Aurora, M. J. Donlin, N. A. Cannon, and J. E. Tavis, J. Clin. Invest. 119:225–236, 2009). Here, we asked whether such networks are present in a diverse set of viruses and, if so, what they may imply about viral biology. Viral sequences were obtained for 16 viruses in 13 species from 9 families. The entire viral coding potential for each virus was aligned, all possible amino acid covariances were identified using the observed-minus-expected-squared algorithm at a false-discovery rate of ≤1%, and networks of covariances were assessed using standard methods. Covariances that spanned the viral coding potential were common in all viruses. In all cases, the covariances formed a single network that contained essentially all of the covariances. The hepatitis C virus networks had hub-and-spoke topologies, but all other networks had random topologies with an unusually large number of highly connected nodes. These results indicate that genome-wide networks of genetic associations and the coordinated evolution they imply are very common in viral genomes, that the networks rarely have the hub-and-spoke topology that dominates other biological networks, and that network topologies can vary substantially even within a given viral group. Five examples with hepatitis B virus and poliovirus are presented to illustrate how covariance network analysis can lead to inferences about viral biology. PMID:22238298

  2. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-02-20

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  3. Cellulases and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2001-01-01

    The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.

  4. CHEMICAL STORAGE: MYTHS VERSUS REALITY

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simmons, F

    A large number of resources explaining proper chemical storage are available. These resources include books, databases/tables, and articles that explain various aspects of chemical storage including compatible chemical storage, signage, and regulatory requirements. Another source is the chemical manufacturer or distributor who provides storage information in the form of icons or color coding schemes on container labels. Despite the availability of these resources, chemical accidents stemming from improper storage, according to recent reports (1) (2), make up almost 25% of all chemical accidents. This relatively high percentage of chemical storage accidents suggests that these publications and color coding schemes, although helpful, still provide incomplete information that may not completely mitigate storage risks. This manuscript will explore some ways published storage information may be incomplete, examine the associated risks, and suggest methods to help further eliminate chemical storage risks.

  5. Mitochondrial genome of the sweet potato hornworm, Agrius convolvuli (Lepidoptera: Sphingidae), and comparison with other Lepidoptera species.

    PubMed

    Dai, Li-Shang; Li, Sheng; Yu, Hui-Min; Wei, Guo-Qing; Wang, Lei; Qian, Cen; Zhang, Cong-Fen; Li, Jun; Sun, Yu; Zhao, Yue; Zhu, Bao-Jian; Liu, Chao-Liang

    2017-02-01

    In the present study, we sequenced the complete mitochondrial genome (mitogenome) of Agrius convolvuli (Lepidoptera: Sphingidae) and compared it with previously sequenced mitogenomes of lepidopteran species. The mitogenome was a circular molecule, 15 349 base pairs (bp) long, containing 37 genes. The order and orientation of genes in the A. convolvuli mitogenome were similar to those in sequenced mitogenomes of other lepidopterans. All 13 protein-coding genes (PCGs) were initiated by ATN codons, except for the cytochrome c oxidase subunit 1 (cox1) gene, which seemed to be initiated by the codon CGA, as observed in other lepidopterans. Three of the 13 PCGs had the incomplete termination codon T, while the remainder terminated with TAA. Additionally, the codon distributions of the 13 PCGs revealed that Asn, Ile, Leu2, Lys, Phe, and Tyr were the most frequently used codon families. All transfer RNAs were folded into the expected cloverleaf structure except for tRNA Ser (AGN), which lacked a stable dihydrouridine arm. The length of the adenine (A) + thymine (T)-rich region was 331 bp. This region included the motif ATAGA followed by a 19-bp poly-T stretch and a microsatellite-like (TA) 8 element next to the motif ATTTA. Phylogenetic analyses (maximum likelihood and Bayesian methods) showed that A. convolvuli belongs to the family Sphingidae.

  6. The complete mitochondrial genome of Plodia interpunctella (Lepidoptera: Pyralidae) and comparison with other Pyraloidea insects.

    PubMed

    Liu, Qiu-Ning; Chai, Xin-Yue; Bian, Dan-Dan; Zhou, Chun-Lin; Tang, Bo-Ping

    2016-01-01

    The mitochondrial (mt) genome can provide important information for the understanding of phylogenetic relationships. The complete mt genome of Plodia interpunctella (Lepidoptera: Pyralidae) has been sequenced. The circular genome is 15 287 bp in size, encoding 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and a control region. The AT skew of this mt genome is slightly negative, and the nucleotide composition is biased toward A+T nucleotides (80.15%). All PCGs start with the typical ATN (ATA, ATC, ATG, and ATT) codons, except for the cox1 gene which may start with the CGA codon. Four of the 13 PCGs harbor the incomplete termination codon T or TA. All the tRNA genes are folded into the typical clover-leaf structure of mitochondrial tRNA, except for trnS1 (AGN) in which the DHU arm fails to form a stable stem-loop structure. The overlapping sequences are 35 bp in total and are found in seven different locations. A total of 240 bp of intergenic spacers are scattered in 16 regions. The control region of the mt genome is 327 bp in length and consisted of several features common to the sequenced lepidopteran insects. Phylogenetic analysis based on 13 PCGs using the Maximum Likelihood method shows that the placement of P. interpunctella was within the Pyralidae.
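
    The start and stop codon observations in the two mitogenome abstracts above can be checked mechanically once the coding sequence of each PCG is extracted. The sketch below is only an illustration: the function names are hypothetical, the input is assumed to be the annotated coding strand from the first base of the start codon to the last annotated base, and TAA/TAG are taken as the complete stop codons of the invertebrate mitochondrial code. A coding sequence whose length is not a multiple of three and that ends in T or TA is reported as carrying an incomplete termination codon.

```python
COMPLETE_STOPS = ("TAA", "TAG")  # assumed: invertebrate mitochondrial code


def classify_start(cds: str) -> str:
    """Label the start codon as ATN, CGA (reported for cox1), or other."""
    codon = cds[:3].upper()
    if len(codon) == 3 and codon.startswith("AT"):
        return "ATN"
    if codon == "CGA":
        return "CGA"
    return "other"


def classify_stop(cds: str) -> str:
    """Label the termination signal at the 3' end of a coding sequence."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in COMPLETE_STOPS else "none"
    # Leftover one or two bases after the last full codon.
    tail = cds[-remainder:]
    if tail in ("T", "TA"):
        return tail + " (incomplete)"
    return "none"
```

    For example, `classify_stop("ATGAAAT")` reports an incomplete T stop, because one base remains after the last full codon.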

  7. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

    PubMed Central

    2014-01-01

    Background The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information is promised to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Results Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better capable of producing (nearly) complete bacterial genomes. Conclusions The current work describes our SSPACE-LongRead software which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow to scaffold genomes in a fast and reliable manner. PMID:24950923

  8. Low-energy nuclear reaction of the 14N+169Tm system: Incomplete fusion

    NASA Astrophysics Data System (ADS)

    Kumar, R.; Sharma, Vijay R.; Yadav, Abhishek; Singh, Pushpendra P.; Agarwal, Avinash; Appannababu, S.; Mukherjee, S.; Singh, B. P.; Ali, R.; Bhowmik, R. K.

    2017-11-01

    Excitation functions of reaction residues produced in the 14N+169Tm system have been measured to high precision at energies above the fusion barrier, ranging from 1.04 VB to 1.30 VB, and analyzed in the framework of the statistical model code PACE4. Analysis of α-emitting channels points toward the onset of incomplete fusion even at slightly above-barrier energies where complete fusion is supposed to be one of the dominant processes. The onset and strength of incomplete fusion have been deduced and studied in terms of various entrance channel parameters. Present results together with the reanalysis of existing data for various projectile-target combinations conclusively suggest strong influence of projectile structure on the onset of incomplete fusion. Also, a strong dependence on the Coulomb effect (ZPZT) has been observed for the present system along with different projectile-target combinations available in the literature. It is concluded that the fraction of incomplete fusion linearly increases with ZPZT and is found to be more for larger ZPZT values, indicating significantly important linear systematics.

  9. Identification of Hospitalizations for Intentional Self-Harm when E-Codes are Incompletely Recorded

    PubMed Central

    Patrick, Amanda R.; Miller, Matthew; Barber, Catherine W.; Wang, Philip S.; Canning, Claire F.; Schneeweiss, Sebastian

    2010-01-01

    Context Suicidal behavior has gained attention as an adverse outcome of prescription drug use. Hospitalizations for intentional self-harm, including suicide, can be identified in administrative claims databases using external cause of injury codes (E-codes). However, rates of E-code completeness in US government and commercial claims databases are low due to issues with hospital billing software. Objective To develop an algorithm to identify intentional self-harm hospitalizations using recorded injury and psychiatric diagnosis codes in the absence of E-code reporting. Methods We sampled hospitalizations with an injury diagnosis (ICD-9 800–995) from 2 databases with high rates of E-coding completeness: 1999–2001 British Columbia, Canada data and the 2004 U.S. Nationwide Inpatient Sample. Our gold standard for intentional self-harm was a diagnosis of E950-E958. We constructed algorithms to identify these hospitalizations using information on type of injury and presence of specific psychiatric diagnoses. Results The algorithm that identified intentional self-harm hospitalizations with high sensitivity and specificity was a diagnosis of poisoning; toxic effects; open wound to elbow, wrist, or forearm; or asphyxiation; plus a diagnosis of depression, mania, personality disorder, psychotic disorder, or adjustment reaction. This had a sensitivity of 63%, specificity of 99% and positive predictive value (PPV) of 86% in the Canadian database. Values in the US data were 74%, 98%, and 73%. PPV was highest (80%) in patients under 25 and lowest in those over 65 (44%). Conclusions The proposed algorithm may be useful for researchers attempting to study intentional self-harm in claims databases with incomplete E-code reporting, especially among younger populations. PMID:20922709
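
    The rule described in the Results (one qualifying injury diagnosis plus one qualifying psychiatric diagnosis) and the three reported metrics can be sketched as follows. This is an illustration only: the category labels are plain-text stand-ins rather than ICD-9 codes, the function and class names are hypothetical, and the counts in the usage note are invented purely to show the arithmetic, not taken from the study.

```python
from dataclasses import dataclass

# Plain-text stand-ins for the diagnosis groups named in the abstract.
INJURY_GROUPS = {
    "poisoning",
    "toxic effects",
    "open wound to elbow, wrist, or forearm",
    "asphyxiation",
}
PSYCH_GROUPS = {
    "depression",
    "mania",
    "personality disorder",
    "psychotic disorder",
    "adjustment reaction",
}


def flag_self_harm(injuries: set, psych: set) -> bool:
    """True when at least one diagnosis from each group is recorded."""
    return bool(injuries & INJURY_GROUPS) and bool(psych & PSYCH_GROUPS)


@dataclass
class Confusion:
    tp: int  # flagged, and gold standard (E950-E958) present
    fp: int  # flagged, gold standard absent
    fn: int  # not flagged, gold standard present
    tn: int  # not flagged, gold standard absent


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)


def ppv(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)
```

    With invented counts such as `Confusion(tp=63, fp=10, fn=37, tn=990)`, the three functions give 0.63, 0.99, and about 0.86, which shows how a high specificity can coexist with a moderate sensitivity.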

  10. DSP code optimization based on cache

    NASA Astrophysics Data System (ADS)

    Xu, Chengfa; Li, Chengcheng; Tang, Bin

    2013-03-01

    The on-board running efficiency of a DSP program is often lower than that observed in software simulation during development, mainly because of improper use and incomplete understanding of the cache-based memory. This paper takes the TI TMS320C6455 DSP as an example, analyzes its two-level internal cache, and summarizes methods of code optimization. The processor can achieve its best performance when these code optimization methods are used. Finally, a specific algorithm application in radar signal processing is presented. Experimental results show that these optimizations are efficient.

  11. Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

    PubMed Central

    Hall, L; Laird, J E; Craig, R K

    1984-01-01

    Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. PMID:6548375

  12. DNA barcode goes two-dimensions: DNA QR code web server.

    PubMed

    Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.

  13. Lichenase and coding sequences

    DOEpatents

    Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong

    2000-08-15

    The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-.beta.-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.

  14. FRAGS: estimation of coding sequence substitution rates from fragmentary data

    PubMed Central

    Swart, Estienne C; Hide, Winston A; Seoighe, Cathal

    2004-01-01

    Background Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data. PMID:15005802

  15. Visual pattern image sequence coding

    NASA Technical Reports Server (NTRS)

    Silsbee, Peter; Bovik, Alan C.; Chen, Dapang

    1990-01-01

    The visual pattern image coding (VPIC) configurable digital image-coding process is capable of coding with visual fidelity comparable to the best available techniques, at compressions which (at 30-40:1) exceed all other technologies. These capabilities are associated with unprecedented coding efficiencies; coding and decoding operations are entirely linear with respect to image size and entail a complexity that is 1-2 orders of magnitude faster than any previous high-compression technique. The visual pattern image sequence coding to which attention is presently given exploits all the advantages of the static VPIC in the reduction of information from an additional, temporal dimension, to achieve unprecedented image sequence coding performance.

  16. Genome sequencing and analysis of a type A Clostridium perfringens isolate from a case of bovine clostridial abomasitis.

    PubMed

    Nowell, Victoria J; Kropinski, Andrew M; Songer, J Glenn; MacInnes, Janet I; Parreira, Valeria R; Prescott, John F

    2012-01-01

    Clostridium perfringens is a common inhabitant of the avian and mammalian gastrointestinal tracts and can behave commensally or pathogenically. Some enteric diseases caused by type A C. perfringens, including bovine clostridial abomasitis, remain poorly understood. To investigate the potential basis of virulence in strains causing this disease, we sequenced the genome of a type A C. perfringens isolate (strain F262) from a case of bovine clostridial abomasitis. The ∼3.34 Mbp chromosome of C. perfringens F262 is predicted to contain 3163 protein-coding genes, 76 tRNA genes, and an integrated plasmid sequence, Cfrag (∼18 kb). In addition, sequences of two complete circular plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), and two incomplete plasmid fragments, pF262A (48.5 kb) and pF262B (50.0 kb), were identified. Comparison of the chromosome sequence of C. perfringens F262 to complete C. perfringens chromosomes, plasmids and phages revealed 261 unique genes. No novel toxin genes related to previously described clostridial toxins were identified: 60% of the 261 unique genes were hypothetical proteins. There was a two base pair deletion in virS, a gene reported to encode the main sensor kinase involved in virulence gene activation. Despite this frameshift mutation, C. perfringens F262 expressed perfringolysin O, alpha-toxin and the beta2-toxin, suggesting that another regulation system might contribute to the pathogenicity of this strain. Two complete plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), unique to this strain of C. perfringens were identified.

  17. Genome Sequencing and Analysis of a Type A Clostridium perfringens Isolate from a Case of Bovine Clostridial Abomasitis

    PubMed Central

    Nowell, Victoria J.; Kropinski, Andrew M.; Songer, J. Glenn; MacInnes, Janet I.; Parreira, Valeria R.; Prescott, John F.

    2012-01-01

    Clostridium perfringens is a common inhabitant of the avian and mammalian gastrointestinal tracts and can behave commensally or pathogenically. Some enteric diseases caused by type A C. perfringens, including bovine clostridial abomasitis, remain poorly understood. To investigate the potential basis of virulence in strains causing this disease, we sequenced the genome of a type A C. perfringens isolate (strain F262) from a case of bovine clostridial abomasitis. The ∼3.34 Mbp chromosome of C. perfringens F262 is predicted to contain 3163 protein-coding genes, 76 tRNA genes, and an integrated plasmid sequence, Cfrag (∼18 kb). In addition, sequences of two complete circular plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), and two incomplete plasmid fragments, pF262A (48.5 kb) and pF262B (50.0 kb), were identified. Comparison of the chromosome sequence of C. perfringens F262 to complete C. perfringens chromosomes, plasmids and phages revealed 261 unique genes. No novel toxin genes related to previously described clostridial toxins were identified: 60% of the 261 unique genes were hypothetical proteins. There was a two base pair deletion in virS, a gene reported to encode the main sensor kinase involved in virulence gene activation. Despite this frameshift mutation, C. perfringens F262 expressed perfringolysin O, alpha-toxin and the beta2-toxin, suggesting that another regulation system might contribute to the pathogenicity of this strain. Two complete plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), unique to this strain of C. perfringens were identified. PMID:22412860

  18. [Influence of "prehistory" of sequential movements of the right and the left hand on reproduction: coding of positions, movements and sequence structure].

    PubMed

    Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R

    2011-01-01

    The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.

  19. A symmetry model for genetic coding via a wallpaper group composed of the traditional four bases and an imaginary base E: Towards category theory-like systematization of molecular/genetic biology

    PubMed Central

    2014-01-01

    Background Previously, we suggested prototypal models that describe some clinical states based on group postulates. Here, we demonstrate a group/category theory-like model for molecular/genetic biology as an alternative application of our previous model. Specifically, we focus on deoxyribonucleic acid (DNA) base sequences. Results We construct a wallpaper pattern based on a five-letter cruciform motif with letters C, A, T, G, and E. Whereas the first four letters represent the standard DNA bases, the fifth is introduced for ease in formulating group operations that reproduce insertions and deletions of DNA base sequences. A basic group Z5 = {r, u, d, l, n} of operations is defined for the wallpaper pattern, with which a sequence of points can be generated corresponding to changes of a base in a DNA sequence by following the orbit of a point of the pattern under operations in group Z5. Other manipulations of DNA sequence can be treated using a vector-like notation ‘Dj’ corresponding to a DNA sequence but based on the five-letter base set; also, ‘Dj’s are expressed graphically. Insertions and deletions of a series of letters ‘E’ are admitted to assist in describing DNA recombination. Likewise, a vector-like notation Rj can be constructed for sequences of ribonucleic acid (RNA). The wallpaper group B = {Z5×∞, ●} (an ∞-fold Cartesian product of Z5) acts on Dj (or Rj) yielding changes to Dj (or Rj) denoted by ‘Dj◦B(j→k) = Dk’ (or ‘Rj◦B(j→k) = Rk’). Based on the operations of this group, two types of groups—a modulo 5 linear group and a rotational group over the Gaussian plane, acting on the five bases—are linked as parts of the wallpaper group for broader applications. As a result, changes, insertions/deletions and DNA (RNA) recombination (partial/total conversion) are described. As an exploratory study, a notation for the canonical “central dogma” via a category theory-like way is presented for future developments. 
    Conclusions Despite the large incompleteness of our methodology, there is fertile ground to consider a symmetry model for genetic coding based on our specific wallpaper group. A more integrated formulation containing “central dogma” for future molecular/genetic biology remains to be explored. PMID:24885369

  20. Discrete Cosine Transform Image Coding With Sliding Block Codes

    NASA Astrophysics Data System (ADS)

    Divakaran, Ajay; Pearlman, William A.

    1989-11-01

    A transform trellis coding scheme for images is presented. A two-dimensional discrete cosine transform is applied to the image, followed by a search on a trellis-structured code. This code is a sliding block code that utilizes a constrained-size reproduction alphabet. The image is divided into blocks by the transform coding. The non-stationarity of the image is counteracted by grouping these blocks in clusters through a clustering algorithm, and then encoding the clusters separately. Mandela ordered sequences are formed from each cluster, i.e., identically indexed coefficients from each block are grouped together to form one-dimensional sequences. A separate search ensues on each of these Mandela ordered sequences. Padding sequences are used to improve the trellis search fidelity. The padding sequences absorb the error caused by the building up of the trellis to full size. The simulations were carried out on a 256x256 image ('LENA'). The results are comparable to any existing scheme. The visual quality of the image is enhanced considerably by the padding and clustering.

  1. DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server

    PubMed Central

    Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin

    2012-01-01

    The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113

  2. Analysis of Defenses Against Code Reuse Attacks on Modern and New Architectures

    DTIC Science & Technology

    2015-09-01

    soundness or completeness. An incomplete analysis will produce extra edges in the CFG that might allow an attacker to slip through. An unsound analysis...Analysis of Defenses Against Code Reuse Attacks on Modern and New Architectures by Isaac Noah Evans Submitted to the Department of Electrical...Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer

  3. CRITICA: coding region identification tool invoking comparative analysis

    NASA Technical Reports Server (NTRS)

    Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1999-01-01

    Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
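
    The dicodon bias idea in the abstract (hexanucleotide frequencies in coding frames versus other contexts) can be illustrated with a single-pass log-odds score. This is a simplified sketch and not CRITICA's actual scoring: the function names, the add-one pseudocount, and the use of all overlapping hexamers as the background are assumptions, and neither the iterative refinement nor the comparative component is reproduced.

```python
from collections import Counter
from math import log


def hexamers(seq: str, step: int) -> list:
    """Hexanucleotides of seq taken every `step` bases (3 = in-frame dicodons)."""
    return [seq[i:i + 6] for i in range(0, len(seq) - 5, step)]


def dicodon_log_odds(coding: list, background: list, pseudo: float = 1.0):
    """Build a scorer: log(P(hexamer | coding frame) / P(hexamer | background))."""
    cod = Counter(h for s in coding for h in hexamers(s, 3))
    bg = Counter(h for s in background for h in hexamers(s, 1))
    cod_total = sum(cod.values())
    bg_total = sum(bg.values())
    n_hexamers = 4 ** 6  # number of possible hexanucleotides

    def score(hexamer: str) -> float:
        p = (cod[hexamer] + pseudo) / (cod_total + pseudo * n_hexamers)
        q = (bg[hexamer] + pseudo) / (bg_total + pseudo * n_hexamers)
        return log(p / q)

    return score


def frame_score(seq: str, score) -> float:
    """Sum of dicodon log-odds over the reading frame starting at base 0."""
    return sum(score(h) for h in hexamers(seq, 3))
```

    A positive `frame_score` suggests that the frame uses dicodons typical of the coding training set; scoring all three frame offsets of a region and comparing them is one simple way to use it.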

  4. Flexible manipulation of terahertz wave reflection using polarization insensitive coding metasurfaces.

    PubMed

    Jiu-Sheng, Li; Ze-Jiang, Zhao; Jian-Quan, Yao

    2017-11-27

    In order to extend to 3-bit encoding, we propose notched-wheel structures as polarization insensitive coding metasurfaces to control terahertz wave reflection and suppress backward scattering. By using a coding sequence of "00110011…" along x-axis direction and 16 × 16 random coding sequence, we investigate the polarization insensitive properties of the coding metasurfaces. By designing the coding sequences of the basic coding elements, the terahertz wave reflection can be flexibly manipulated. Additionally, radar cross section (RCS) reduction in the backward direction is less than -10dB in a wide band. The present approach can offer application for novel terahertz manipulation devices.

  5. Noncoding sequence classification based on wavelet transform analysis: part I

    NASA Astrophysics Data System (ADS)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences by specific function. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  6. An improved taxonomic sampling is a necessary but not sufficient condition for resolving inter-families relationships in Caridean decapods.

    PubMed

    Aznar-Cormano, L; Brisset, J; Chan, T-Y; Corbari, L; Puillandre, N; Utge, J; Zbinden, M; Zuccon, D; Samadi, S

    2015-04-01

    During the past decade, a large number of multi-gene analyses aimed at resolving the phylogenetic relationships within Decapoda. However relationships among families, and even among sub-families, remain poorly defined. Most analyses used an incomplete and opportunistic sampling of species, but also an incomplete and opportunistic gene selection among those available for Decapoda. Here we test in the Caridea if improving the taxonomic coverage following the hierarchical scheme of the classification, as it is currently accepted, provides a better phylogenetic resolution for the inter-families relationships. The rich collections of the Muséum National d'Histoire Naturelle de Paris are used for sampling as far as possible at least two species of two different genera for each family or subfamily. All potential markers are tested over this sampling. For some coding genes the amplification success varies greatly among taxa and the phylogenetic signal is highly saturated. This result probably explains the taxon-heterogeneity among previously published studies. The analysis is thus restricted to the genes homogeneously amplified over the whole sampling. Thanks to the taxonomic sampling scheme the monophyly of most families is confirmed. However the genes commonly used in Decapoda appear non-adapted for clarifying inter-families relationships, which remain poorly resolved. Genome-wide analyses, like transcriptome-based exon capture facilitated by the new generation sequencing methods might provide a sounder approach to resolve deep and rapid radiations like the Caridea.

  7. Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode.

    PubMed

    Cai, Yong; Li, Peng; Li, Xi-Wen; Zhao, Jing; Chen, Hai; Yang, Qing; Hu, Hao

    2017-07-01

    In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. In order to improve the compression efficiency, GATC2Bytes and digital merger compression algorithms are proposed. HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China and the internal transcribed spacer 2 (ITS2) sequence code as the DNA sequence code were ready for conversion. In order to convert such data into a two-dimensional code, the following six steps were performed: First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, precompression processing of such data sets is undertaken. Third, precompression processing was undertaken with the P. ginseng DNA (ITS2) sequence codes. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined in accordance with the set data format. Such combined data can be compressed by Zlib, an open source data compression algorithm. Finally, the compressed data generated a two-dimensional code called a quick response code (QR code). Through the abovementioned converting process, it can be found that the number of bytes needed for storing P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can be greatly reduced. After GATC2Bytes algorithm processing, the ITS2 compression rate reaches 75% and the chemical fingerprint compression rate exceeds 99.65% via filtration and digital merger compression algorithm processing. Therefore, the overall compression ratio even exceeds 99.36%. The capacity of the formed QR code is around 0.5k, which can easily and successfully be read and identified by any smartphone. P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can form a QR code after data processing, and therefore the QR code can be a perfect carrier of the authenticity and quality of P. ginseng information. This study provides a theoretical basis for the development of a quality traceability system of traditional Chinese medicine based on a two-dimensional code.
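
    The 75% figure quoted for the ITS2 sequence is what one would expect from storing each base in 2 bits instead of one 8-bit character. The abstract does not describe the internals of the proposed algorithm, so the following is only a minimal sketch under that assumption: the 2-bit mapping, the function names, and the toy sequence are all hypothetical, and only the final zlib step is named in the abstract.

```python
import zlib

# Hypothetical 2-bit mapping; the abstract does not give the actual table.
BASES = "GATC"
CODE = {base: index for index, base in enumerate(BASES)}


def pack_bases(seq):
    """Pack four bases into each byte (2 bits per base)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking stays positional.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_bases(data, n_bases):
    """Invert pack_bases; n_bases trims the padding of the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


seq = "GATTACAGATTACAGA"            # 16 bases, illustrative only
packed = pack_bases(seq)            # 4 bytes instead of 16 characters
payload = zlib.compress(packed)     # generic compression step before QR encoding
```

    Note that the sequence length has to be stored alongside the packed bytes, because the last byte may contain padding.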

  8. Sensitivity of low-energy incomplete fusion to various entrance-channel parameters

    NASA Astrophysics Data System (ADS)

    Kumar, Harish; Tali, Suhail A.; Afzal Ansari, M.; Singh, D.; Ali, Rahbar; Kumar, Kamal; Sathik, N. P. M.; Ali, Asif; Parashari, Siddharth; Dubey, R.; Bala, Indu; Kumar, R.; Singh, R. P.; Muralithar, S.

    2018-03-01

    The disentangling of incomplete fusion dependence on various entrance channel parameters has been made from the forward recoil range distribution measurement for the 12C+175Lu system at ≈ 88 MeV energy. It gives the direct measure of full and/or partial linear momentum transfer from the projectile to the target nucleus. The comparison of observed recoil ranges with theoretical ranges calculated using the code SRIM infers the production of evaporation residues via complete and/or incomplete fusion process. Present results show that incomplete fusion process contributes significantly in the production of αxn and 2αxn emission channels. The deduced incomplete fusion probability (F_ICF) is compared with that obtained for systems available in the literature. An interesting behavior of F_ICF with ZPZT is observed in the reinvestigation of incomplete fusion dependency with the Coulomb factor (ZPZT), contrary to the recent observations. The present results based on (ZPZT) are found in good agreement with recent observations of our group. A larger F_ICF value for 12C induced reactions is found than that for 13C, although both have the same ZPZT. A nonsystematic behavior of the incomplete fusion process with the target deformation parameter (β2) is observed, which is further correlated with a new parameter (ZPZT·β2). The projectile α-Q-value is found to explain more clearly the discrepancy observed in incomplete fusion dependency with parameters (ZPZT) and (ZPZT·β2). It may be pointed out that any single entrance channel parameter (mass-asymmetry or (ZPZT) or β2 or projectile α-Q-value) may not be able to explain completely the incomplete fusion process.

  9. Molecular Characterization of Transgene Integration by Next-Generation Sequencing in Transgenic Cattle

    PubMed Central

    Zhang, Ran; Yin, Yinliang; Zhang, Yujun; Li, Kexin; Zhu, Hongxia; Gong, Qin; Wang, Jianwu; Hu, Xiaoxiang; Li, Ning

    2012-01-01

    As the number of transgenic livestock increases, reliable detection and molecular characterization of transgene integration sites and copy number are crucial not only for interpreting the relationship between the integration site and the specific phenotype but also for commercial and economic demands. However, the ability of conventional PCR techniques to detect incomplete and multiple integration events is limited, making it technically challenging to characterize transgenes. Next-generation sequencing has enabled cost-effective, routine and widespread high-throughput genomic analysis. Here, we demonstrate the use of next-generation sequencing to extensively characterize cattle harboring a 150-kb human lactoferrin transgene that was initially analyzed by chromosome walking without success. Using this approach, the sites upstream and downstream of the target gene integration site in the host genome were identified at the single nucleotide level. The sequencing result was verified by event-specific PCR for the integration sites and FISH for the chromosomal location. Sequencing depth analysis revealed that multiple copies of the incomplete target gene and the vector backbone were present in the host genome. Upon integration, complex recombination was also observed between the target gene and the vector backbone. These findings indicate that next-generation sequencing is a reliable and accurate approach for the molecular characterization of the transgene sequence, integration sites and copy number in transgenic species. PMID:23185606

  10. GATA2 null mutation associated with incomplete penetrance in a family with Emberger syndrome.

    PubMed

    Brambila-Tapia, Aniel Jessica Leticia; García-Ortiz, José Elías; Brouillard, Pascal; Nguyen, Ha-Long; Vikkula, Miikka; Ríos-González, Blanca Estela; Sandoval-Muñiz, Roberto de Jesús; Sandoval-Talamantes, Ana Karen; Bobadilla-Morales, Lucina; Corona-Rivera, Jorge Román; Arnaud-Lopez, Lisette

    2017-09-01

    GATA2 mutations are associated with several conditions, including Emberger syndrome which is the association of primary lymphedema with hematological anomalies and an increased risk for myelodysplasia and leukemia. To describe a family with Emberger syndrome with incomplete penetrance. A DNA sequencing of GATA2 gene was performed in the parents and offspring (five individuals in total). The family consisted of 5 individuals with a GATA2 null mutation (c.130G>T, p.Glu44*); three of them were affected (two of which were deceased) while two remained unaffected at the age of 40 and 13 years old. The three affected siblings (two boys and one girl) presented with lymphedema of the lower limbs, recurrent warts, epistaxis and recurrent infections. Two died due to hematological abnormalities (AML and pancytopenia). In contrast, the two other family members who carry the same mutation (the mother and one brother) have not presented any symptoms and their blood tests remain normal. Incomplete penetrance may indicate that GATA2 haploinsufficiency is not enough to produce the phenotype of Emberger syndrome. It could be useful to perform whole exome or genome sequencing, in cases where incomplete penetrance or high variable expressivity is described, in order to probably identify specific gene interactions that drastically modify the phenotype. In addition, skewed gene expression by an epigenetic mechanism of gene regulation should also be considered.
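
    The variant notation c.130G>T, p.Glu44* can be checked with simple codon arithmetic: coding position 130 is the first base of codon 44, and replacing the leading G of either glutamate codon (GAA or GAG) with T yields a stop codon. The sketch below illustrates this; the helper names are hypothetical, and the abstract does not say which of the two glutamate codons is present in GATA2.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard nuclear genetic code
GLU_CODONS = ("GAA", "GAG")           # the two codons for glutamate


def locate(cds_pos):
    """Map a 1-based coding position to (codon number, 0-based offset in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3


def substitute(codon, offset, alt):
    """Return the codon with the base at `offset` replaced by `alt`."""
    return codon[:offset] + alt + codon[offset + 1:]


codon_number, offset = locate(130)
mutated = [substitute(codon, offset, "T") for codon in GLU_CODONS]
```

    Either way the result is a premature stop at residue 44, which is why the variant is described as a null mutation.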

  11. Transport genes of Chromobacterium violaceum: an overview.

    PubMed

    Grangeiro, Thalles Barbosa; Jorge, Daniel Macedo de Melo; Bezerra, Walderly Melgaço; Vasconcelos, Ana Tereza Ribeiro; Simpson, Andrew John George

    2004-03-31

    The complete genome sequence of the free-living bacterium Chromobacterium violaceum has been determined by a consortium of laboratories in Brazil. Almost 500 open reading frames (ORFs) coding for transport-related membrane proteins were identified in C. violaceum, which represents 11% of all genes found. The main class of transporter proteins is the primary active transporters (212 ORFs), followed by electrochemical potential-driven transporters (154 ORFs) and channels/pores (62 ORFs). Other classes (61 ORFs) include group translocators, transport electron carriers, accessory factors, and incompletely characterized systems. Therefore, all major categories of transport-related membrane proteins currently recognized in the Transport Protein Database (http://tcdb.ucsd.edu/tcdb) are present in C. violaceum. The complex apparatus of transporters of C. violaceum is certainly an important factor that makes this bacterium a dominant microorganism in a variety of ecosystems in tropical and subtropical regions. From a biotechnological point of view, the most important finding is the transporters of heavy metals, which could lead to the exploitation of C. violaceum for bioremediation.

  12. The complete mitochondrial genome of the diamondback moth, Plutella xylostella (Lepidoptera: Plutellidae).

    PubMed

    Dai, Li-Shang; Zhu, Bao-Jian; Qian, Cen; Zhang, Cong-Fen; Li, Jun; Wang, Lei; Wei, Guo-Qing; Liu, Chao-Liang

    2016-01-01

    The complete mitochondrial genome (mitogenome) of Plutella xylostella (Lepidoptera: Plutellidae) was determined (GenBank accession No. KM023645). The length of this mitogenome is 16,014 bp with 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes and an A + T-rich region. It presents the typical gene organization and order for completely sequenced lepidopteran mitogenomes. The nucleotide composition of the genome is highly A + T biased, accounting for 81.48%, with a slightly positive AT skewness (0.005). All PCGs are initiated by typical ATN codons, except for the gene cox1, which uses CGA as its start codon. Some PCGs harbor TA (nad5) or incomplete termination codon T (cox1, cox2, nad2 and nad4), while others use TAA as their termination codons. The A + T-rich region is located between rrnS and trnM with a length of 888 bp.
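
    The A + T content and AT skewness reported above are simple functions of the base counts; AT skew is conventionally computed as (A - T) / (A + T). The following is a small sketch of that calculation on a toy sequence, not on the actual mitogenome, so the numbers it produces are illustrative only.

```python
from collections import Counter


def composition(seq):
    """Return (A+T fraction, AT skew) for a nucleotide sequence.

    AT skew = (A - T) / (A + T); assumes the sequence contains at
    least one A or T, otherwise the division is undefined.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_content = (a + t) / (a + t + g + c)
    at_skew = (a - t) / (a + t)
    return at_content, at_skew


# Toy sequence: 41 A, 40 T, 10 G, 9 C (not real data).
toy = "A" * 41 + "T" * 40 + "G" * 10 + "C" * 9
at_content, at_skew = composition(toy)
```

    A slightly positive skew, as reported for P. xylostella, means that A is marginally more frequent than T on the strand analyzed.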

  13. Whole-genome analyses resolve early branches in the tree of life of modern birds

    PubMed Central

    Jarvis, Erich D.; Mirarab, Siavash; Aberer, Andre J.; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y. W.; Faircloth, Brant C.; Nabholz, Benoit; Howard, Jason T.; Suh, Alexander; Weber, Claudia C.; da Fonseca, Rute R.; Li, Jianwen; Zhang, Fang; Li, Hui; Zhou, Long; Narula, Nitish; Liu, Liang; Ganapathy, Ganesh; Boussau, Bastien; Bayzid, Md. Shamsuzzoha; Zavidovych, Volodymyr; Subramanian, Sankar; Gabaldón, Toni; Capella-Gutiérrez, Salvador; Huerta-Cepas, Jaime; Rekepalli, Bhanu; Munch, Kasper; Schierup, Mikkel; Lindow, Bent; Warren, Wesley C.; Ray, David; Green, Richard E.; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Li, Shengbin; Li, Ning; Huang, Yinhua; Derryberry, Elizabeth P.; Bertelsen, Mads Frost; Sheldon, Frederick H.; Brumfield, Robb T.; Mello, Claudio V.; Lovell, Peter V.; Wirthlin, Morgan; Schneider, Maria Paula Cruz; Prosdocimi, Francisco; Samaniego, José Alfredo; Velazquez, Amhed Missael Vargas; Alfaro-Núñez, Alonzo; Campos, Paula F.; Petersen, Bent; Sicheritz-Ponten, Thomas; Pas, An; Bailey, Tom; Scofield, Paul; Bunce, Michael; Lambert, David M.; Zhou, Qi; Perelman, Polina; Driskell, Amy C.; Shapiro, Beth; Xiong, Zijun; Zeng, Yongli; Liu, Shiping; Li, Zhenyu; Liu, Binghang; Wu, Kui; Xiao, Jin; Yinqi, Xiong; Zheng, Qiuemei; Zhang, Yong; Yang, Huanming; Wang, Jian; Smeds, Linnea; Rheindt, Frank E.; Braun, Michael; Fjeldsa, Jon; Orlando, Ludovic; Barker, F. Keith; Jønsson, Knud Andreas; Johnson, Warren; Koepfli, Klaus-Peter; O’Brien, Stephen; Haussler, David; Ryder, Oliver A.; Rahbek, Carsten; Willerslev, Eske; Graves, Gary R.; Glenn, Travis C.; McCormack, John; Burt, Dave; Ellegren, Hans; Alström, Per; Edwards, Scott V.; Stamatakis, Alexandros; Mindell, David P.; Cracraft, Joel; Braun, Edward L.; Warnow, Tandy; Jun, Wang; Gilbert, M. Thomas P.; Zhang, Guojie

    2015-01-01

    To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago. PMID:25504713

  14. Whole-genome analyses resolve early branches in the tree of life of modern birds.

    PubMed

    Jarvis, Erich D; Mirarab, Siavash; Aberer, Andre J; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y W; Faircloth, Brant C; Nabholz, Benoit; Howard, Jason T; Suh, Alexander; Weber, Claudia C; da Fonseca, Rute R; Li, Jianwen; Zhang, Fang; Li, Hui; Zhou, Long; Narula, Nitish; Liu, Liang; Ganapathy, Ganesh; Boussau, Bastien; Bayzid, Md Shamsuzzoha; Zavidovych, Volodymyr; Subramanian, Sankar; Gabaldón, Toni; Capella-Gutiérrez, Salvador; Huerta-Cepas, Jaime; Rekepalli, Bhanu; Munch, Kasper; Schierup, Mikkel; Lindow, Bent; Warren, Wesley C; Ray, David; Green, Richard E; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Li, Shengbin; Li, Ning; Huang, Yinhua; Derryberry, Elizabeth P; Bertelsen, Mads Frost; Sheldon, Frederick H; Brumfield, Robb T; Mello, Claudio V; Lovell, Peter V; Wirthlin, Morgan; Schneider, Maria Paula Cruz; Prosdocimi, Francisco; Samaniego, José Alfredo; Vargas Velazquez, Amhed Missael; Alfaro-Núñez, Alonzo; Campos, Paula F; Petersen, Bent; Sicheritz-Ponten, Thomas; Pas, An; Bailey, Tom; Scofield, Paul; Bunce, Michael; Lambert, David M; Zhou, Qi; Perelman, Polina; Driskell, Amy C; Shapiro, Beth; Xiong, Zijun; Zeng, Yongli; Liu, Shiping; Li, Zhenyu; Liu, Binghang; Wu, Kui; Xiao, Jin; Yinqi, Xiong; Zheng, Qiuemei; Zhang, Yong; Yang, Huanming; Wang, Jian; Smeds, Linnea; Rheindt, Frank E; Braun, Michael; Fjeldsa, Jon; Orlando, Ludovic; Barker, F Keith; Jønsson, Knud Andreas; Johnson, Warren; Koepfli, Klaus-Peter; O'Brien, Stephen; Haussler, David; Ryder, Oliver A; Rahbek, Carsten; Willerslev, Eske; Graves, Gary R; Glenn, Travis C; McCormack, John; Burt, Dave; Ellegren, Hans; Alström, Per; Edwards, Scott V; Stamatakis, Alexandros; Mindell, David P; Cracraft, Joel; Braun, Edward L; Warnow, Tandy; Jun, Wang; Gilbert, M Thomas P; Zhang, Guojie

    2014-12-12

    To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago. Copyright © 2014, American Association for the Advancement of Science.

  15. Comparative mitochondrial genome analysis of Daphnis nerii and other lepidopteran insects reveals conserved mitochondrial genome organization and phylogenetic relationships

    PubMed Central

    Sun, Yu; Chen, Chen; Gao, Jin; Abbas, Muhammad Nadeem; Kausar, Saima; Qian, Cen; Wang, Lei; Wei, Guoqing; Zhu, Bao-Jian

    2017-01-01

    In the present study, the complete sequence of the mitochondrial genome (mitogenome) of Daphnis nerii (Lepidoptera: Sphingidae) is described. The mitogenome (15,247 bp) of D. nerii encodes 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs), two ribosomal RNA genes (rRNAs) and an adenine (A) + thymine (T)-rich region. Its gene complement and order are similar to those of other sequenced lepidopterans. Twelve PCGs are initiated by ATN codons; the exception is the cytochrome c oxidase subunit 1 (cox1) gene, which is seemingly initiated by the CGA codon, as documented in other insect mitogenomes. Four of the 13 PCGs have the incomplete termination codon T, while the remainder terminate with the canonical stop codon. This mitogenome has six major intergenic spacers, with the exception of the A+T-rich region, spanning at least 10 bp. The A+T-rich region is 351 bp long and contains some conserved regions, including an ‘ATAGA’ motif followed by a 17 bp poly-T stretch, a microsatellite-like element (AT)9 and also a poly-A element. Phylogenetic analyses based on 13 PCGs using maximum likelihood (ML) and Bayesian inference (BI) revealed that D. nerii resides in the Sphingidae family. PMID:28598968
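
    Start and stop codon usage of the kind summarized above can be tallied with a few string checks. The sketch below assumes that the canonical stops are TAA and TAG (as in the invertebrate mitochondrial code) and that an incomplete stop is a trailing T or TA left over after the last full codon; the function names and the toy sequences are hypothetical and are not taken from the D. nerii annotation.

```python
COMPLETE_STOPS = ("TAA", "TAG")   # stops in the invertebrate mitochondrial code
INCOMPLETE_STOPS = ("T", "TA")    # completed to TAA by polyadenylation


def has_atn_start(cds):
    """True if the coding sequence begins with ATA, ATC, ATG or ATT."""
    return len(cds) >= 3 and cds[:2] == "AT" and cds[2] in "ACGT"


def classify_stop(cds):
    """Classify the 3' end as 'complete', 'incomplete' or 'none'."""
    remainder = len(cds) % 3
    if remainder == 0:
        return "complete" if cds[-3:] in COMPLETE_STOPS else "none"
    return "incomplete" if cds[-remainder:] in INCOMPLETE_STOPS else "none"


# Toy coding sequences, for illustration only.
examples = {
    "full_stop": "ATGAAATAA",
    "single_t": "ATGAAAT",
    "cga_start": "CGAAAATA",
}
summary = {name: (has_atn_start(s), classify_stop(s)) for name, s in examples.items()}
```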

  16. Nucleic Acid Chaperone Activity of the ORF1 Protein from the Mouse LINE-1 Retrotransposon

    PubMed Central

    Martin, Sandra L.; Bushman, Frederic D.

    2001-01-01

    Non-LTR retrotransposons such as L1 elements are major components of the mammalian genome, but their mechanism of replication is incompletely understood. Like retroviruses and LTR-containing retrotransposons, non-LTR retrotransposons replicate by reverse transcription of an RNA intermediate. The details of cDNA priming and integration, however, differ between these two classes. In retroviruses, the nucleocapsid (NC) protein has been shown to assist reverse transcription by acting as a “nucleic acid chaperone,” promoting the formation of the most stable duplexes between nucleic acid molecules. A protein-coding region with an NC-like sequence is present in most non-LTR retrotransposons, but no such sequence is evident in mammalian L1 elements or other members of its class. Here we investigated the ORF1 protein from mouse L1 and found that it does in fact display nucleic acid chaperone activities in vitro. L1 ORF1p (i) promoted annealing of complementary DNA strands, (ii) facilitated strand exchange to form the most stable hybrids in competitive displacement assays, and (iii) facilitated melting of an imperfect duplex but stabilized perfect duplexes. These findings suggest a role for L1 ORF1p in mediating nucleic acid strand transfer steps during L1 reverse transcription. PMID:11134335

  17. Genetic organization of plasmid pXF51 from the plant pathogen Xylella fastidiosa.

    PubMed

    Marques, M V; da Silva, A M; Gomes, S L

    2001-05-01

    The sequence of plasmid pXF51 from the plant pathogen Xylella fastidiosa, the causal agent of citrus variegated chlorosis, has been analyzed. This plasmid codes for 65 open reading frames (ORFs), organized into four main regions, containing genes related to replication, mobilization, and conjugative transfer. Twenty-five ORFs have no counterparts in the public sequence databases, and 7 are similar to conserved hypothetical proteins from other bacteria. A pXF51 incompatibility group has not been determined, as we could not find a typical replication origin. One cluster of conjugation-related genes (trb) seems to be incomplete in pXF51, and a copy of this sequence is found in the chromosome, suggesting it was generated by a duplication event. A second cluster (tra) contains all genes necessary for conjugation transfer to occur, showing a conserved organization with other conjugative plasmids. An identifiable origin of transfer similar to oriT from IncP plasmids is found adjacent to genes encoding two mobilization proteins. None of the ORFs with putative assigned function could be predicted as having a role in pathogenesis, except for a virulence-associated protein D homolog. These results indicate that even though pXF51 appears not to have a direct role in Xylella pathogenesis, it is a conjugative plasmid that could be important for lateral gene transfer in this bacterium. This property may be of great importance for future development of transformation techniques in X. fastidiosa.

  18. The complete mitogenome sequence of the Japanese oak silkmoth, Antheraea yamamai (Lepidoptera: Saturniidae).

    PubMed

    Kim, Seong Ryeol; Kim, Man Il; Hong, Mee Yeon; Kim, Kee Young; Kang, Pil Don; Hwang, Jae Sam; Han, Yeon Soo; Jin, Byung Rae; Kim, Iksoo

    2009-09-01

    The 15,338-bp long complete mitochondrial genome (mitogenome) of the Japanese oak silkmoth, Antheraea yamamai (Lepidoptera: Saturniidae) was determined. This genome has a gene arrangement identical to those of all other sequenced lepidopteran insects, but differs from the most common type, as the result of the movement of tRNA(Met) to a position 5'-upstream of tRNA(Ile). No typical start codon of the A. yamamai COI gene is available. Instead, a tetranucleotide, TTAG, which is found at the beginning context of all sequenced lepidopteran insects was tentatively designated as the start codon for A. yamamai COI gene. Three of the 13 protein-coding genes (PCGs) harbor the incomplete termination codon, T or TA. All tRNAs formed stable stem-and-loop structures, with the exception of tRNA(Ser)(AGN), the DHU arm of which formed a simple loop as has been observed in many other metazoan mt tRNA(Ser)(AGN). The 334-bp long A + T-rich region is noteworthy in that it harbors tRNA-like structures, as has also been seen in the A + T-rich regions of other insect mitogenomes. Phylogenetic analyses of the available species of Bombycoidea, Pyraloidea, and Tortricoidea bolstered the current morphology-based hypothesis that Bombycoidea and Pyraloidea are monophyletic (Obtectomera). As has been previously suggested, Bombycidae (Bombyx mori and B. mandarina) and Saturniidae (A. yamamai and Caligula boisduvalii) formed a reciprocal monophyletic group.

  19. Is a Genome a Codeword of an Error-Correcting Code?

    PubMed Central

    Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo

    2012-01-01

    Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
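
    Testing whether a word is a codeword of a Hamming code amounts to checking that its syndrome under the parity-check matrix is zero. The abstract does not specify the code parameters or how nucleotides are mapped to code symbols, so the sketch below uses the textbook binary Hamming(7,4) code purely as an illustration of the membership test; it is not the authors' procedure.

```python
# Parity-check matrix of the binary Hamming(7,4) code:
# column j (1-based) is the binary expansion of j, least significant bit first.
H = [[(j >> bit) & 1 for j in range(1, 8)] for bit in range(3)]


def syndrome(word):
    """Return the 3-bit syndrome of a 7-bit word (lists of 0/1)."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def is_codeword(word):
    """A word is a codeword exactly when its syndrome is all zeros."""
    return not any(syndrome(word))


def error_position(word):
    """1-based position of a single flipped bit, or 0 if none is detected."""
    s = syndrome(word)
    return s[0] + 2 * s[1] + 4 * s[2]


codeword = [1, 1, 1, 0, 0, 0, 0]
corrupted = codeword.copy()
corrupted[4] ^= 1   # flip the bit at position 5
```

    With this column ordering the syndrome, read as a binary number, directly names the position of a single-bit error.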

  20. Sparse coding for flexible, robust 3D facial-expression synthesis.

    PubMed

    Lin, Yuxu; Song, Mingli; Quynh, Dao Thi Phuong; He, Ying; Chen, Chun

    2012-01-01

    Computer animation researchers have been extensively investigating 3D facial-expression synthesis for decades. However, flexible, robust production of realistic 3D facial expressions is still technically challenging. A proposed modeling framework applies sparse coding to synthesize 3D expressive faces, using specified coefficients or expression examples. It also robustly recovers facial expressions from noisy and incomplete data. This approach can synthesize higher-quality expressions in less time than the state-of-the-art techniques.

  1. Pattern recognition of electronic bit-sequences using a semiconductor mode-locked laser and spatial light modulators

    NASA Astrophysics Data System (ADS)

    Bhooplapur, Sharad; Akbulut, Mehmetkan; Quinlan, Franklyn; Delfyett, Peter J.

    2010-04-01

    A novel scheme for recognition of electronic bit-sequences is demonstrated. Two electronic bit-sequences that are to be compared are each mapped to a unique code from a set of Walsh-Hadamard codes. The codes are then encoded in parallel on the spectral phase of the frequency comb lines from a frequency-stabilized mode-locked semiconductor laser. Phase encoding is achieved by using two independent spatial light modulators based on liquid crystal arrays. Encoded pulses are compared using interferometric pulse detection and differential balanced photodetection. Orthogonal codes eight bits long are compared, and matched codes are successfully distinguished from mismatched codes with very low error rates, of around 10^-18. This technique has potential for high-speed, high accuracy recognition of bit-sequences, with applications in keyword searches and internet protocol packet routing.
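
    The matching step relies on the mutual orthogonality of Walsh-Hadamard codes: the correlation of a code with itself is maximal, while the correlation of two different codes is zero. The sketch below builds eight codes of length eight with the Sylvester construction and checks this numerically; it is a software illustration only and says nothing about the optical implementation described in the abstract.

```python
def walsh_hadamard(order):
    """Rows of the Sylvester-type Hadamard matrix; order must be a power of two."""
    rows = [[1]]
    while len(rows) < order:
        # H_2n = [[H_n, H_n], [H_n, -H_n]]
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


def correlate(a, b):
    """Inner product of two +1/-1 codes."""
    return sum(x * y for x, y in zip(a, b))


codes = walsh_hadamard(8)
matched = correlate(codes[3], codes[3])      # same code on both arms
mismatched = correlate(codes[3], codes[5])   # two different codes
```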

  2. Two Perspectives on the Origin of the Standard Genetic Code

    NASA Astrophysics Data System (ADS)

    Sengupta, Supratim; Aggarwal, Neha; Bandhu, Ashutosh Vishwa

    2014-12-01

    The origin of a genetic code made it possible to create ordered sequences of amino acids. In this article we provide two perspectives on code origin by carrying out simulations of code-sequence coevolution in finite populations with the aim of examining how the standard genetic code may have evolved from more primitive code(s) encoding a small number of amino acids. We determine the efficacy of the physico-chemical hypothesis of code origin in the absence and presence of horizontal gene transfer (HGT) by allowing a diverse collection of code-sequence sets to compete with each other. We find that in the absence of horizontal gene transfer, natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. However, for certain probabilities of the horizontal transfer events, a universal code emerges having a structure that is consistent with the standard genetic code.

  3. An algebraic hypothesis about the primeval genetic code architecture.

    PubMed

    Sánchez, Robersy; Grau, Ricardo

    2009-09-01

    A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.

  4. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    NASA Astrophysics Data System (ADS)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, drawn from the four base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have since become advanced, high-throughput technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases sequencing produces ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To address this problem, in this study we introduce another representation of DNA codes, namely the quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, and PC are the probabilities that the bases A, T, G, and C appear in Q, and PA + PT + PG + PC = 1. Using quaternion representations, we construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces higher-quality match and mismatch scores between two DNA base codes. In the implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of the Streptococcus pneumoniae family obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the quaternion representation improves the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
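
    The idea of scoring two positions by the dot product of their base-probability vectors can be sketched as follows. The abstract was implemented in Octave and does not give the exact scoring formula, so this Python sketch makes its own assumptions: the dot product is read as the probability that two positions carry the same base, the score is the expected value of hypothetical match (+1) and mismatch (-1) weights, and a linear gap penalty of -1 is used.

```python
# Probability vectors in the order (PA, PT, PG, PC); "N" is a fully ambiguous call.
VECTORS = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
    "N": (0.25, 0.25, 0.25, 0.25),
}


def score(p, q, match=1.0, mismatch=-1.0):
    """Expected score: the dot product p.q is the probability of identical bases."""
    same = sum(x * y for x, y in zip(p, q))
    return same * match + (1.0 - same) * mismatch


def needleman_wunsch(a, b, gap=-1.0):
    """Global alignment score of two sequences of probability vectors."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align the pair
                table[i - 1][j] + gap,                            # gap in b
                table[i][j - 1] + gap,                            # gap in a
            )
    return table[-1][-1]


def encode(seq):
    return [VECTORS[symbol] for symbol in seq]


exact = needleman_wunsch(encode("ACGT"), encode("ACGT"))
ambiguous = needleman_wunsch(encode("ACNT"), encode("ACGT"))
```

    An ambiguous position therefore contributes a partial score rather than a full mismatch, which is the intuition behind using probability vectors instead of single letters.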

  5. Analysis of 10,000 ESTs from lymphocytes of the cynomolgus monkey to improve our understanding of its immune system

    PubMed Central

    Chen, Wei-Hua; Wang, Xue-Xia; Lin, Wei; He, Xiao-Wei; Wu, Zhen-Qiang; Lin, Ying; Hu, Song-Nian; Wang, Xiao-Ning

    2006-01-01

    Background The cynomolgus monkey (Macaca fascicularis) is one of the most widely used surrogate animal models for an increasing number of human diseases and vaccines, especially immune-system-related ones. Towards a better understanding of the gene expression background upon its immunogenetics, we constructed a cDNA library from Epstein-Barr virus (EBV)-transformed B lymphocytes of a cynomolgus monkey and sequenced 10,000 randomly picked clones. Results After processing, 8,312 high-quality expressed sequence tags (ESTs) were generated and assembled into 3,728 unigenes. Annotations of these uniquely expressed transcripts demonstrated that out of the 2,524 open reading frame (ORF) positive unigenes (mitochondrial and ribosomal sequences were not included), 98.8% shared significant similarities (E-value less than 1e-10) with the NCBI nucleotide (nt) database, while only 67.7% (E-value less than 1e-5) did so with the NCBI non-redundant protein (nr) database. Further analysis revealed that 90.0% of the unigenes that shared no similarities to the nr database could be assigned to human chromosomes, in which 75 did not match significantly to any cynomolgus monkey and human ESTs. The mapping regions to known human genes on the human genome were described in detail. The protein family and domain analysis revealed that the first, second and fourth of the most abundantly expressed protein families were all assigned to immunoglobulin and major histocompatibility complex (MHC)-related proteins. The expression profiles of these genes were compared with that of homologous genes in human blood, lymph nodes and a RAMOS cell line, which demonstrated expression changes after transformation with EBV. The degree of sequence similarity of the MHC class I and II genes to the human reference sequences was evaluated. The results indicated that class I molecules showed weak amino acid identities (<90%), while class II showed slightly higher ones. 
Conclusion These results indicated that the genes expressed in the cynomolgus monkey could be used to identify novel protein-coding genes and revise those incomplete or incorrect annotations in the human genome by comparative methods, since the old world monkeys and humans share high similarities at the molecular level, especially within coding regions. The identification of multiple genes involved in the immune response, their sequence variations to the human homologues, and their responses to EBV infection could provide useful information to improve our understanding of the cynomolgus monkey immune system. PMID:16618371

  6. Circular codes revisited: a statistical approach.

    PubMed

    Gonzalez, D L; Giannerini, S; Rosa, R

    2011-04-21

    In 1996 Arquès and Michel [1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45-58] discovered the existence of a common circular code in eukaryote and prokaryote genomes. Since then, circular code theory has provoked great interest and underwent a rapid development. In this paper we discuss some theoretical issues related to the synchronization properties of coding sequences and circular codes with particular emphasis on the problem of retrieval and maintenance of the reading frame. Motivated by the theoretical discussion, we adopt a rigorous statistical approach in order to try to answer different questions. First, we investigate the covering capability of the whole class of 216 self-complementary, C(3) maximal codes with respect to a large set of coding sequences. The results indicate that, on average, the code proposed by Arquès and Michel has the best covering capability but, still, there exists a great variability among sequences. Second, we focus on such code and explore the role played by the proportion of the bases by means of a hierarchy of permutation tests. The results show the existence of a sort of optimization mechanism such that coding sequences are tailored as to maximize or minimize the coverage of circular codes on specific reading frames. Such optimization clearly relates the function of circular codes with reading frame synchronization. Copyright © 2011 Elsevier Ltd. All rights reserved.

  7. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

    PubMed Central

    Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of Arabidopsis, Japonica rice, foxtail millet, sorghum, Brachypodium, and maize. PMID:23874343

  8. Combat Injury Coding: A Review and Reconfiguration

    DTIC Science & Technology

    2013-01-01

    the clavicle, scapula, and pelvic girdle were grouped with the torso where they are anatomically located rather than in the upper and lower extremities...incomplete return to previous cognitive state Clavicle or scapula fracture, unilateral Burns, second or third degree, hand, wrist, elbow or shoulder

  9. Photonic entanglement-assisted quantum low-density parity-check encoders and decoders.

    PubMed

    Djordjevic, Ivan B

    2010-05-01

    I propose encoder and decoder architectures for entanglement-assisted (EA) quantum low-density parity-check (LDPC) codes suitable for all-optical implementation. I show that two basic gates needed for EA quantum error correction, namely, controlled-NOT (CNOT) and Hadamard gates can be implemented based on Mach-Zehnder interferometer. In addition, I show that EA quantum LDPC codes from balanced incomplete block designs of unitary index require only one entanglement qubit to be shared between source and destination.

  10. Second-generation sequencing of entire mitochondrial coding-regions (∼15.4 kb) holds promise for study of the phylogeny and taxonomy of human body lice and head lice.

    PubMed

    Xiong, H; Campelo, D; Pollack, R J; Raoult, D; Shao, R; Alem, M; Ali, J; Bilcha, K; Barker, S C

    2014-08-01

    The Illumina Hiseq platform was used to sequence the entire mitochondrial coding-regions of 20 body lice, Pediculus humanus Linnaeus, and head lice, P. capitis De Geer (Phthiraptera: Pediculidae), from eight towns and cities in five countries: Ethiopia, France, China, Australia and the U.S.A. These data (∼310 kb) were used to see how much more informative entire mitochondrial coding-region sequences were than partial mitochondrial coding-region sequences, and thus to guide the design of future studies of the phylogeny, origin, evolution and taxonomy of body lice and head lice. Phylogenies were compared from entire coding-region sequences (∼15.4 kb), entire cox1 (∼1.5 kb), partial cox1 (∼700 bp) and partial cytb (∼600 bp) sequences. On the one hand, phylogenies from entire mitochondrial coding-region sequences (∼15.4 kb) were much more informative than phylogenies from entire cox1 sequences (∼1.5 kb) and partial gene sequences (∼600 to ∼700 bp). For example, 19 branches had > 95% bootstrap support in our maximum likelihood tree from the entire mitochondrial coding-regions (∼15.4 kb) whereas the tree from 700 bp cox1 had only two branches with bootstrap support > 95%. Yet, by contrast, partial cytb (∼600 bp) and partial cox1 (∼486 bp) sequences were sufficient to genotype lice to Clade A, B or C. The sequences of the mitochondrial genomes of the P. humanus, P. capitis and P. schaeffi Fahrenholz studied are in NCBI GenBank under the accession numbers KC660761-800, KC685631-6330, KC241882-97, EU219988-95, HM241895-8 and JX080388-407. © 2014 The Royal Entomological Society.

  11. Complete Genome Sequences of Two Geographically Distinct Legionella micdadei Clinical Isolates

    PubMed Central

    Jose, Bethany R.; Perry, Jasper; Smeele, Zoe; Aitken, Jack; Gardner, Paul P.

    2017-01-01

    Legionella is a highly diverse genus of intracellular bacterial pathogens that cause Legionnaires’ disease (LD), an often severe form of pneumonia. Two L. micdadei sp. clinical isolates, obtained from patients hospitalized with LD from geographically distinct areas, were sequenced using PacBio SMRT cell technology, identifying incomplete phage regions, which may impact virulence. PMID:28572318

  12. Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding.

    PubMed

    Nguyen, Dang; Luo, Wei; Venkatesh, Svetha; Phung, Dinh

    2018-04-11

    Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD (International Classification of Diseases (World Health Organization 2013)) code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.

  13. Representation of DNA sequences with virtual potentials and their processing by (SEQREP) Kohonen self-organizing maps.

    PubMed

    Aires-de-Sousa, João; Aires-de-Sousa, Luisa

    2003-01-01

    We propose representing individual positions in DNA sequences by virtual potentials generated by other bases of the same sequence. This is a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can be used as a representation of the entire sequence (SEQREP code). It is a flexible code, with a length independent of the sequence size, does not require previous alignment, and is convenient for processing by neural networks or statistical techniques. To evaluate its biological significance, the SEQREP code was used for training Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding for HIV-1 envelope glycoprotein (env) into subtypes A-G. It was demonstrated that SOMs clustered sequences belonging to different classes into distinct regions. For independent test sets, very high rates of correct predictions were obtained (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment. Software for representing sequences by SEQREP code, and for training Kohonen SOMs is made freely available from http://www.dq.fct.unl.pt/qoa/jas/seqrep. Supplementary material is available at http://www.dq.fct.unl.pt/qoa/jas/seqrep/bioinf2002

  14. CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments.

    PubMed

    Zhou, Carol L Ecale

    2015-01-01

    In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
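
    The idea of combining pairwise alignments that share one reference into a single one-to-many alignment, with gaps inserted into the reference where needed, can be sketched in Python as follows. This is an illustrative sketch only and not CombAlign's actual implementation; the function name merge_pairwise and its input format are assumptions made for the example.

```python
def merge_pairwise(reference, pairwise):
    """Merge pairwise alignments that share one reference (hypothetical helper).

    reference: ungapped reference sequence.
    pairwise:  list of (ref_row, other_row) strings of equal length, where
               ref_row with '-' removed equals reference.
    Returns the gapped reference followed by one gapped row per input pair.
    """
    n = len(reference)
    parsed = []
    for ref_row, other_row in pairwise:
        if len(ref_row) != len(other_row) or ref_row.replace("-", "") != reference:
            raise ValueError("alignment does not match the reference")
        inserts = [""] * (n + 1)  # residues inserted before reference position i
        aligned = [""] * n        # residue (or '-') aligned to reference position i
        i = 0
        for r, o in zip(ref_row, other_row):
            if r == "-":
                inserts[i] += o
            else:
                aligned[i] = o
                i += 1
        parsed.append((inserts, aligned))

    # The widest insertion at each slot decides how many gap columns are opened.
    widths = [max((len(ins[i]) for ins, _ in parsed), default=0)
              for i in range(n + 1)]

    rows = ["".join("-" * widths[i] + reference[i] for i in range(n))
            + "-" * widths[n]]
    for inserts, aligned in parsed:
        body = "".join(inserts[i].ljust(widths[i], "-") + aligned[i]
                       for i in range(n))
        rows.append(body + inserts[n].ljust(widths[n], "-"))
    return rows


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    for row in merge_pairwise("ACD", [("AC-D", "ACXD"),
                                      ("ACD", "A-D"),
                                      ("A--CD", "AYYCD")]):
        print(row)
```

    In this sketch, residues that different sequences insert at the same slot share columns only for layout; they are not claimed to correspond to each other, since each was aligned to the reference independently.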

  15. Complete Coding Genome Sequence for Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil

    DTIC Science & Technology

    2017-05-04

    and capable of infecting a wide range of animal hosts (1–5). Here, we report the complete coding genome sequence (i.e., only missing portions of...segmented nature of the genome was not understood. Therefore, only the two genome segments with detectable sequence homologies to flaviviruses were...originally reported (2). We revisited the data set of Maruyama et al. (2) and assembled the complete coding sequences for all four genome segments. We

  16. Quantized phase coding and connected region labeling for absolute phase retrieval.

    PubMed

    Chen, Xiangcheng; Wang, Yuwei; Wang, Yajun; Ma, Mengchao; Zeng, Chunnian

    2016-12-12

    This paper proposes an absolute phase retrieval method for complex object measurement based on quantized phase-coding and connected region labeling. A specific code sequence is embedded into quantized phase of three coded fringes. Connected regions of different codes are labeled and assigned with 3-digit-codes combining the current period and its neighbors. Wrapped phase, more than 36 periods, can be restored with reference to the code sequence. Experimental results verify the capability of the proposed method to measure multiple isolated objects.

  17. Functional interrogation of non-coding DNA through CRISPR genome editing

    PubMed Central

    Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.

    2017-01-01

    Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828

  18. Memorization of Sequences of Movements of the Right or the Left Hand by Right- and Left-Handers: Vector Coding.

    PubMed

    Bobrova, E V; Bogacheva, I N; Lyakhovetskii, V A; Fabinskaja, A A; Fomina, E V

    2017-01-01

    In order to test the hypothesis of hemisphere specialization for different types of information coding (the right hemisphere, for positional coding; the left one, for vector coding), we analyzed the errors of right- and left-handers during a task involving the memorization of sequences of movements by the left or the right hand, which activates vector coding by changing the order of movements in memorized sequences. The task was first performed by the right or the left hand, then by the opposite hand. It was found that both right- and left-handers use the information about the previous movements of the dominant hand, but not of the non-dominant one. After changing the hand, right-handers use the information about previous movements of the second hand, while left-handers do not. We compared our results with the data of previous experiments, in which positional coding was activated, and concluded that both right- and left-handers use vector coding for memorizing the sequences of their dominant hands and positional coding for memorizing the sequences of the non-dominant hand. No similar patterns of errors were found between right- and left-handers after changing the hand, which suggests that in right- and left-handers the skills are transferred in different ways depending on the type of coding.

  19. Study of incomplete fusion reaction dynamics in 13C + 165Ho system and its dependence on various entrance channel parameters

    NASA Astrophysics Data System (ADS)

    Tali, Suhail A.; Kumar, Harish; Ansari, M. Afzal; Ali, Asif; Singh, D.; Ali, Rahbar; Giri, Pankaj K.; Linda, Sneha B.; Parashari, Siddharth; Kumar, R.; Singh, R. P.; Muralithar, S.

    2018-02-01

    The excitation functions for the evaporation residues populated in the interaction of the 13C + 165Ho system have been measured at projectile energies ≈ 4-7 MeV/nucleon. The stacked foil activation technique followed by off-line γ-ray spectroscopy has been employed in the present work. The experimentally measured cross-sections are analyzed in the framework of the statistical model code PACE4, which takes into account only the complete fusion reaction cross-sections. The evaporation residues populated via xn and pxn channels were found to be in good agreement with the PACE4 predictions, while a significant enhancement in the measured cross-sections over PACE4 predictions is observed in the case of α-emitting channels, which may be attributed to the incomplete fusion process. For a better understanding of incomplete fusion dynamics, the incomplete fusion fraction has also been deduced and its sensitivity to various entrance channel parameters, such as projectile energy, mass-asymmetry, projectile structure in terms of Qα-value and Coulomb effect, has been studied in the present work. The incomplete fusion fraction is found to increase with increasing projectile energy, and a strong projectile structure dependent mass-asymmetry systematic is also observed. The incomplete fusion fraction is also found to be smaller for reactions induced by the more negative Qα-value projectile (13C) than for reactions induced by less negative Qα-value projectiles (12C, 16O and 20Ne) with the same target nucleus 165Ho. An interesting trend is obtained on further investigation of incomplete fusion dependence on Coulomb effect (ZPZT).

  20. An evaluation of computer assisted clinical classification algorithms.

    PubMed

    Chute, C G; Yang, Y; Buntrock, J

    1994-01-01

    The Mayo Clinic has a long tradition of indexing patient records in high resolution and volume. Several algorithms have been developed which promise to help human coders in the classification process. We evaluate variations on code browsers and free text indexing systems with respect to their speed and error rates in our production environment. The more sophisticated indexing systems save measurable time in the coding process, but suffer from incompleteness which requires a back-up system or human verification. Expert Network does the best job of rank ordering clinical text, potentially enabling the creation of thresholds for the pass through of computer coded data without human review.

  1. Incidence of dentinal defects after root canal preparation: reciprocating versus rotary instrumentation.

    PubMed

    Bürklein, Sebastian; Tsotsis, Polymnia; Schäfer, Edgar

    2013-04-01

    The purpose of this study was to evaluate the incidence of dentinal defects after root canal preparation with reciprocating instruments (Reciproc and WaveOne) and rotary instruments. One hundred human central mandibular incisors were randomly assigned to 5 groups (n = 20 teeth per group). The root canals were instrumented by using the reciprocating single-file systems Reciproc and WaveOne and the full-sequence rotary Mtwo and ProTaper instruments. One group was left unprepared as control. Roots were sectioned horizontally at 3, 6, and 9 mm from the apex and evaluated under a microscope by using 25-fold magnification. The presence of dentinal defects (complete/incomplete cracks and craze lines) was noted and analyzed by using the chi-square test. No defects were observed in the controls. All canal preparation created dentinal defects. Overall, instrumentation with Reciproc was associated with more complete cracks than the full-sequence files (P = .021). Although both reciprocating files produced more incomplete cracks apically (3 mm) compared with the rotary files (P = .001), no statistically significant differences were obtained concerning the summarized values of all cross sections (P > .05). Under the conditions of this study, root canal preparation with both rotary and reciprocating instruments resulted in dentinal defects. At the apical level of the canals, reciprocating files produced significantly more incomplete dentinal cracks than full-sequence rotary systems (P < .05). Copyright © 2013 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.

  2. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison

    PubMed Central

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-01-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464

  3. An improved method for identification of small non-coding RNAs in bacteria using support vector machine

    NASA Astrophysics Data System (ADS)

    Barman, Ranjan Kumar; Mukhopadhyay, Anirban; Das, Santasabuj

    2017-04-01

    Bacterial small non-coding RNAs (sRNAs) are not translated into proteins, but act as functional RNAs. They are involved in diverse biological processes like virulence, stress response and quorum sensing. Several high-throughput techniques have enabled identification of sRNAs in bacteria, but experimental detection remains a challenge and is grossly incomplete for most species. Thus, there is a need to develop computational tools to predict bacterial sRNAs. Here, we propose a computational method to identify sRNAs in bacteria using a support vector machine (SVM) classifier. The primary sequence and secondary structure features of experimentally-validated sRNAs of Salmonella Typhimurium LT2 (SLT2) were used to build the optimal SVM model. We found that a tri-nucleotide composition feature of sRNAs achieved an accuracy of 88.35% for SLT2. We also validated the SVM model on the experimentally-detected sRNAs of E. coli and Salmonella Typhi. The proposed model robustly attained an accuracy of 81.25% and 88.82% for E. coli K-12 and S. Typhi Ty2, respectively. We confirmed that this method significantly improved the identification of sRNAs in bacteria. Furthermore, we used a sliding window-based method and identified sRNAs from complete genomes of SLT2, S. Typhi Ty2 and E. coli K-12 with sensitivities of 89.09%, 83.33% and 67.39%, respectively.
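
    A tri-nucleotide composition feature of the kind mentioned above can be sketched in Python as a 64-element frequency vector. The abstract does not give the exact feature definition, so the choices here (overlapping windows, frequencies normalized to sum to 1, T treated as U) and the function name trinucleotide_composition are assumptions for illustration.

```python
from itertools import product

# All 64 trinucleotides over the RNA alphabet, in a fixed order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=3)]


def trinucleotide_composition(seq):
    """Return 64 trinucleotide frequencies from overlapping windows.

    Assumption: windows overlap and ambiguous bases (e.g. N) are skipped;
    the published method may define the feature differently.
    """
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[k] / total for k in TRINUCLEOTIDES]


if __name__ == "__main__":
    vector = trinucleotide_composition("ACGTACGT")  # toy sequence
    print(len(vector), sum(vector))
```

    A vector like this, computed per candidate sequence, is the sort of fixed-length input that an SVM classifier expects; training and evaluating the classifier itself is outside the scope of this sketch.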

  4. The long non-coding RNA ROCR contributes to SOX9 expression and chondrogenic differentiation of human mesenchymal stem cells

    PubMed Central

    Hyatt, Sam; Cheung, Kat; Skelton, Andrew J.; Xu, Yaobo; Clark, Ian M.

    2017-01-01

    Long non-coding RNAs (lncRNAs) are expressed in a highly tissue-specific manner and function in various aspects of cell biology, often as key regulators of gene expression. In this study, we established a role for lncRNAs in chondrocyte differentiation. Using RNA sequencing we identified a human articular chondrocyte repertoire of lncRNAs from normal hip cartilage donated by neck of femur fracture patients. Of particular interest are lncRNAs upstream of the master chondrocyte transcription factor SOX9 locus. SOX9 is an HMG-box transcription factor that plays an essential role in chondrocyte development by directing the expression of chondrocyte-specific genes. Two of these lncRNAs are upregulated during chondrogenic differentiation of mesenchymal stem cells (MSCs). Depletion of one of these lncRNAs, LOC102723505, which we termed ROCR (regulator of chondrogenesis RNA), by RNA interference disrupted MSC chondrogenesis, concomitant with reduced cartilage-specific gene expression and incomplete matrix component production, indicating an important role in chondrocyte biology. Specifically, SOX9 induction was significantly ablated in the absence of ROCR, and overexpression of SOX9 rescued the differentiation of MSCs into chondrocytes. Our work sheds further light on chondrocyte-specific SOX9 expression and highlights a novel method of chondrocyte gene regulation involving a lncRNA. PMID:29084806

  5. Primer development to obtain complete coding sequence of HA and NA genes of influenza A/H3N2 virus.

    PubMed

    Agustiningsih, Agustiningsih; Trimarsanto, Hidayat; Setiawaty, Vivi; Artika, I Made; Muljono, David Handojo

    2016-08-30

    Influenza is an acute respiratory illness and has become a serious public health problem worldwide. The need to study the HA and NA genes in influenza A virus is essential since these genes frequently undergo mutations. This study describes the development of primer sets for RT-PCR to obtain complete coding sequence of Hemagglutinin (HA) and Neuraminidase (NA) genes of influenza A/H3N2 virus from Indonesia. The primers were developed based on influenza A/H3N2 sequence worldwide from Global Initiative on Sharing All Influenza Data (GISAID) and further tested using Indonesian influenza A/H3N2 archived samples of influenza-like illness (ILI) surveillance from 2008 to 2009. An optimum RT-PCR condition was acquired for all HA and NA fragments designed to cover complete coding sequence of HA and NA genes. A total of 71 samples were successfully sequenced for complete coding sequence both of HA and NA genes out of 145 samples of influenza A/H3N2 tested. The developed primer sets were suitable for obtaining complete coding sequences of HA and NA genes of Indonesian samples from 2008 to 2009.

  6. GATA: A graphic alignment tool for comparative sequence analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nix, David A.; Eisen, Michael B.

    2005-01-01

    Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.

  7. Performance of automated and manual coding systems for occupational data: a case study of historical records.

    PubMed

    Patel, Mehul D; Rose, Kathryn M; Owens, Cindy R; Bang, Heejung; Kaufman, Jay S

    2012-03-01

    Occupational data are a common source of workplace exposure and socioeconomic information in epidemiologic research. We compared the performance of two occupation coding methods, an automated software and a manual coder, using occupation and industry titles from U.S. historical records. We collected parental occupational data from 1920-40s birth certificates, Census records, and city directories on 3,135 deceased individuals in the Atherosclerosis Risk in Communities (ARIC) study. Unique occupation-industry narratives were assigned codes by a manual coder and the Standardized Occupation and Industry Coding software program. We calculated agreement between coding methods of classification into major Census occupational groups. Automated coding software assigned codes to 71% of occupations and 76% of industries. Of this subset coded by software, 73% of occupation codes and 69% of industry codes matched between automated and manual coding. For major occupational groups, agreement improved to 89% (kappa = 0.86). Automated occupational coding is a cost-efficient alternative to manual coding. However, some manual coding is required to code incomplete information. We found substantial variability between coders in the assignment of occupations although not as large for major groups.
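
    The two agreement measures reported above, percent agreement and kappa, can be computed as in the following Python sketch. The abstract does not state which kappa variant was used; unweighted Cohen's kappa for two raters is assumed here, and the example codes are made up.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items given the same code by both coders."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    p_observed = percent_agreement(codes_a, codes_b)
    n = len(codes_a)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement from the product of each coder's marginal frequencies.
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    manual = ["farm", "farm", "clerical", "clerical"]    # made-up codes
    automated = ["farm", "farm", "clerical", "farm"]
    print(percent_agreement(manual, automated), cohens_kappa(manual, automated))
```

    Kappa is lower than raw agreement whenever some agreement would be expected by chance, which is why both figures are usually reported together.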

  8. Phylogenetic Network for European mtDNA

    PubMed Central

    Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari

    2001-01-01

    The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229

  9. RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity.

    PubMed

    Ishikawa, Sohta A; Inagaki, Yuji; Hashimoto, Tetsuo

    2012-01-01

    In phylogenetic analyses of nucleotide sequences, 'homogeneous' substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, 'RY-coding' and 'non-homogeneous' models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
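
    RY-coding as described above, recoding the four bases as purine (R) or pyrimidine (Y), is simple enough to show directly. The following Python sketch uses the standard IUPAC symbols; the function name ry_code and the toy sequences are illustrative, and leaving gaps and ambiguity symbols unchanged is a choice made for this example.

```python
# Purines (A, G) become R; pyrimidines (C, T, U) become Y.
_RY_TABLE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(seq):
    """Recode a nucleotide sequence into the two-state R/Y alphabet.

    Gaps and ambiguity symbols are passed through unchanged.
    """
    return seq.upper().translate(_RY_TABLE)


if __name__ == "__main__":
    at_rich = "AAAATTTT"  # toy sequences with opposite base composition
    gc_rich = "GGGGCCCC"
    print(ry_code(at_rich), ry_code(gc_rich))
```

    The two toy sequences differ completely in base composition, yet become identical after recoding. The same property discards transition differences (A/G and C/T), which is the information cost of the approach.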

  10. Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.

    PubMed

    Suyama, Mikita; Lathe, Warren C; Bork, Peer

    2005-10-10

    We have identified 141 novel palindromic repetitive elements in the genome of euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on the sequence similarity. The low sequence identity within each of the groups suggests rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within the protein coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions.

  11. Mass-invariance of the iron enrichment in the hot haloes of massive ellipticals, groups, and clusters of galaxies

    NASA Astrophysics Data System (ADS)

    Mernier, F.; de Plaa, J.; Werner, N.; Kaastra, J. S.; Raassen, A. J. J.; Gu, L.; Mao, J.; Urdampilleta, I.; Truong, N.; Simionescu, A.

    2018-05-01

    X-ray measurements find systematically lower Fe abundances in the X-ray emitting haloes pervading groups (kT ≲ 1.7 keV) than in clusters of galaxies. These results have been difficult to reconcile with theoretical predictions. However, models using incomplete atomic data or the assumption of isothermal plasmas may have biased the best fit Fe abundance in groups and giant elliptical galaxies low. In this work, we take advantage of a major update of the atomic code in the spectral fitting package SPEX to re-evaluate the Fe abundance in 43 clusters, groups, and elliptical galaxies (the CHEERS sample) in a self-consistent analysis and within a common radius of 0.1r500. For the first time, we report a remarkably similar average Fe enrichment in all these systems. Unlike previous results, this strongly suggests that metals are synthesised and transported in these haloes with the same average efficiency across two orders of magnitude in total mass. We show that the previous metallicity measurements in low temperature systems were biased low due to incomplete atomic data in the spectral fitting codes. The reasons for such a code-related Fe bias, also implying previously unconsidered biases in the emission measure and temperature structure, are discussed.

  12. Brain cDNA clone for human cholinesterase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McTiernan, C.; Adkins, S.; Chatonnet, A.

    1987-10-01

    A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum. The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.
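
    The base-pair and residue counts quoted above are tied together by simple codon arithmetic, which the following sketch checks. The helper name is hypothetical, and the offset for Met-(-28) assumes the usual convention that residue -28 lies 28 codons upstream of the first residue of the mature protein.

```python
CODON_LENGTH = 3  # nucleotides per codon


def codon_count(base_pairs: int) -> int:
    """Number of whole codons in an in-frame coding segment."""
    if base_pairs % CODON_LENGTH != 0:
        raise ValueError("segment length is not a multiple of three")
    return base_pairs // CODON_LENGTH


# Figures taken from the abstract.
mature_residues = codon_count(1722)  # coding sequence for the circulating protein
upstream_codons = codon_count(207)   # in-frame coding sequence 5' to the mature N-terminus

# Assumption: Met-(-28) starts 28 codons upstream of the mature N-terminus.
met_minus_28_offset_bp = 28 * CODON_LENGTH

print(mature_residues, upstream_codons, met_minus_28_offset_bp)
```

    The 1722 bp of coding sequence correspond to exactly 574 codons, matching the 574 residues determined by Edman degradation, and the 207 bp upstream segment holds 69 in-frame codons.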

  13. Functional interrogation of non-coding DNA through CRISPR genome editing.

    PubMed

    Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H

    2017-05-15

    Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Partial Adaptation of Obtained and Observed Value Signals Preserves Information about Gains and Losses

    PubMed Central

    Baddeley, Michelle; Tobler, Philippe N.; Schultz, Wolfram

    2016-01-01

    Given that the range of rewarding and punishing outcomes of actions is large but neural coding capacity is limited, efficient processing of outcomes by the brain is necessary. One mechanism to increase efficiency is to rescale neural output to the range of outcomes expected in the current context, and process only experienced deviations from this expectation. However, this mechanism comes at the cost of not being able to discriminate between unexpectedly low losses when times are bad versus unexpectedly high gains when times are good. Thus, too much adaptation would result in disregarding information about the nature and absolute magnitude of outcomes, preventing learning about the longer-term value structure of the environment. Here we investigate the degree of adaptation in outcome coding brain regions in humans, for directly experienced outcomes and observed outcomes. We scanned participants while they performed a social learning task in gain and loss blocks. Multivariate pattern analysis showed two distinct networks of brain regions adapt to the most likely outcomes within a block. Frontostriatal areas adapted to directly experienced outcomes, whereas lateral frontal and temporoparietal regions adapted to observed social outcomes. Critically, in both cases, adaptation was incomplete and information about whether the outcomes arose in a gain block or a loss block was retained. Univariate analysis confirmed incomplete adaptive coding in these regions but also detected nonadapting outcome signals. Thus, although neural areas rescale their responses to outcomes for efficient coding, they adapt incompletely and keep track of the longer-term incentives available in the environment. SIGNIFICANCE STATEMENT Optimal value-based choice requires that the brain precisely and efficiently represents positive and negative outcomes. One way to increase efficiency is to adapt responding to the most likely outcomes in a given context. 
However, too strong adaptation would result in loss of precise representation (e.g., when the avoidance of a loss in a loss-context is coded the same as receipt of a gain in a gain-context). We investigated an intermediate form of adaptation that is efficient while maintaining information about received gains and avoided losses. We found that frontostriatal areas adapted to directly experienced outcomes, whereas lateral frontal and temporoparietal regions adapted to observed social outcomes. Importantly, adaptation was intermediate, in line with influential models of reference dependence in behavioral economics. PMID:27683899

  15. A Code Division Multiple Access Communication System for the Low Frequency Band.

    DTIC Science & Technology

    1983-04-01

    Keywords (fragment): frequency channels, spread-spectrum communication, complex sequences, orthogonal codes, impulsive noise. Abstract (fragment): ...their transmissions with signature sequences. Our LF/CDMA scheme is different in that each user’s signature sequence set consists of M orthogonal sequences and thus log2 M ...
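
    The truncated excerpt above mentions signature sets of M orthogonal sequences alongside the quantity log2 M. As a hedged illustration of that relationship, choosing one of M mutually orthogonal sequences per symbol can convey log2 M bits. The sketch below uses real-valued Walsh-Hadamard rows purely as a stand-in: the report's own sequences are not given in this record, and all function names are hypothetical.

```python
import math


def hadamard(m: int) -> list:
    """Sylvester construction of an m x m Hadamard matrix (m a power of two)."""
    if m < 1 or m & (m - 1):
        raise ValueError("m must be a power of two")
    rows = [[1]]
    while len(rows) < m:
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


def correlate(a, b) -> int:
    """Inner product of two equal-length chip sequences."""
    return sum(x * y for x, y in zip(a, b))


M = 8
codebook = hadamard(M)                 # stand-in signature set for one user
bits_per_symbol = int(math.log2(M))    # 3 bits select one of 8 sequences


def encode(bits: str) -> list:
    """Map a bit string of length log2(M) to one orthogonal sequence."""
    return list(codebook[int(bits, 2)])


def decode(received) -> str:
    """Pick the codebook row with the largest correlation."""
    scores = [correlate(received, row) for row in codebook]
    best = max(range(M), key=lambda i: scores[i])
    return format(best, f"0{bits_per_symbol}b")


noisy = encode("110")
noisy[0] = -noisy[0]                   # flip one chip to mimic an impulse
print(decode(encode("101")), decode(noisy))
```

    Because the rows are mutually orthogonal, a single corrupted chip lowers the correct correlation from 8 to 6 while every other row scores at most 2 in magnitude, so the correlator still recovers the transmitted bits.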

  16. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

    PubMed

    Kress, W John; Erickson, David L

    2007-06-06

    A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.

  17. Insights into evolution in Andean Polystichum (Dryopteridaceae) from expanded understanding of the cytosolic phosphoglucose isomerase gene.

    PubMed

    Lyons, Brendan M; McHenry, Monique A; Barrington, David S

    2017-07-01

    Cytosolic phosphoglucose isomerase (pgiC) is an enzyme essential to glycolysis found universally in eukaryotes, but broad understanding of variation in the gene coding for pgiC is lacking for ferns. We used a substantially expanded representation of the gene for Andean species of the fern genus Polystichum to characterize pgiC in ferns relative to angiosperms, insects, and an amoebozoan; assess the impact of selection versus neutral evolutionary processes on pgiC; and explore evolutionary relationships of selected Andean species. The dataset of complete sequences comprised nine accessions representing seven species and one hybrid from the Andes and Serra do Mar. The aligned sequences of the full data set comprised 3376 base pairs (70% of the entire gene) including 17 exons and 15 introns from two central areas of the gene. The exons are highly conserved relative to angiosperms and retain substantial homology to insect pgiC, but intron length and structure are unique to the ferns. Average intron size is similar to angiosperms; intron number and location in insects are unlike those of the plants we considered. The introns included an array of indels and, in intron 7, an extensive microsatellite array with potential utility in analyzing population-level histories. Bayesian and maximum-parsimony analysis of 129 variable nucleotides in the Andean polystichums revealed that 59 (1.7% of the 3376 total) were phylogenetically informative; most of these united sister accessions. The phylogenetic trees for the Andean polystichums were incongruent with previously published cpDNA trees for the same taxa, likely the result of rapid evolutionary change in the introns and contrasting stability in the exons. The exons code a total of seven amino-acid substitutions. Comparison of non-synonymous to synonymous substitutions did not suggest that the pgiC gene is under selection in the Andes. 
Variation in pgiC including two additional accessions represented by incomplete sequences provided new insights into reticulate relationships among Andean taxa. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Streamlined Genome Sequence Compression using Distributed Source Coding

    PubMed Central

    Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

    2014-01-01

    We aim at developing a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552

  19. The non-coding RNA landscape of human hematopoiesis and leukemia.

    PubMed

    Schwarzer, Adrian; Emmrich, Stephan; Schmidt, Franziska; Beck, Dominik; Ng, Michelle; Reimer, Christina; Adams, Felix Ferdinand; Grasedieck, Sarah; Witte, Damian; Käbler, Sebastian; Wong, Jason W H; Shah, Anushi; Huang, Yizhou; Jammal, Razan; Maroz, Aliaksandra; Jongen-Lavrencic, Mojca; Schambach, Axel; Kuchenbauer, Florian; Pimanda, John E; Reinhardt, Dirk; Heckl, Dirk; Klusmann, Jan-Henning

    2017-08-09

    Non-coding RNAs have emerged as crucial regulators of gene expression and cell fate decisions. However, their expression patterns and regulatory functions during normal and malignant human hematopoiesis are incompletely understood. Here we present a comprehensive resource defining the non-coding RNA landscape of the human hematopoietic system. Based on highly specific non-coding RNA expression portraits per blood cell population, we identify unique fingerprint non-coding RNAs (such as LINC00173 in granulocytes) and assign these to critical regulatory circuits involved in blood homeostasis. Following the incorporation of acute myeloid leukemia samples into the landscape, we further uncover prognostically relevant non-coding RNA stem cell signatures shared between acute myeloid leukemia blasts and healthy hematopoietic stem cells. Our findings highlight the importance of the non-coding transcriptome in the formation and maintenance of the human blood hierarchy. While micro-RNAs are known regulators of haematopoiesis and leukemogenesis, the role of long non-coding RNAs is less clear. Here the authors provide a non-coding RNA expression landscape of the human hematopoietic system, highlighting their role in the formation and maintenance of the human blood hierarchy.

  20. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.

  1. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693

  2. The Mitochondrial Cytochrome Oxidase Subunit I Gene Occurs on a Minichromosome with Extensive Heteroplasmy in Two Species of Chewing Lice, Geomydoecus aurei and Thomomydoecus minor

    PubMed Central

    Pietan, Lucas L.; Spradling, Theresa A.

    2016-01-01

    In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589

  3. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

    PubMed

    2004-12-09

    We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

  4. Complete mitochondrial genome of Palawan peacock-pheasant Polyplectron napoleonis (Galliformes, Phasianidae).

    PubMed

    Quach, Tommy; Brooks, Daniel M; Miranda, Hector C

    2016-01-01

    The complete mitochondrial genome of the Palawan peacock-pheasant Polyplectron napoleonis is 16,710 bp and contains 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes and a control region. All protein-coding genes use the standard ATG start codon, except for cox1, which has a GTG start codon. Seven out of 13 PCGs have TAA stop codons, two have AGG (cox1 and nd6), and three PCGs (nd2, cox2 and nd4) have an incomplete stop codon consisting of just a T-- nucleotide.
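
    The distinction between complete and incomplete stop codons drawn above can be sketched as a small classifier. This is an illustrative helper, not the authors' annotation method: the function name and the toy sequences are hypothetical. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TAA, TAG, AGA and AGG are stops, and treats a trailing T or TA as an incomplete stop that is completed to TAA by polyadenylation of the transcript.

```python
# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
COMPLETE_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def classify_stop(cds: str) -> str:
    """Return the stop codon of a CDS, 'T--' / 'TA-' if incomplete, else 'none'."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in COMPLETE_STOPS else "none"
    # One or two leftover bases after the last full codon.
    tail = cds[-remainder:]
    if tail in ("T", "TA"):
        return tail + "-" * (3 - remainder)
    return "none"


# Hypothetical toy coding sequences, not taken from the genome itself.
examples = {
    "complete TAA": "ATGAAATAA",
    "complete AGG": "GTGAAAAGG",
    "incomplete T": "ATGAAAT",
    "incomplete TA": "ATGAAATA",
}
for label, seq in examples.items():
    print(label, classify_stop(seq))
```

    A CDS whose length is not a multiple of three and that ends in T or TA is reported as "T--" or "TA-", mirroring the notation used in the record above.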

  5. VizieR Online Data Catalog: Catalog of Suspected Nearby Young Stars (Riedel+, 2017)

    NASA Astrophysics Data System (ADS)

    Riedel, A. R.; Blunt, S. C.; Lambrides, E. L.; Rice, E. L.; Cruz, K. L.; Faherty, J. K.

    2018-04-01

    LocAting Constituent mEmbers In Nearby Groups (LACEwING) is a frequentist observation space kinematic moving group identification code. Using the spatial and kinematic information available about a target object (α, δ, Dist, μα, μδ, and γ), it determines the probability that the object is a member of each of the known nearby young moving groups (NYMGs). As with other moving group identification codes, LACEwING is capable of estimating memberships for stars with incomplete kinematic and spatial information. (2 data files).

  6. Approaches for in silico finishing of microbial genome sequences

    PubMed Central

    Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

    2017-01-01

    The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing. PMID:28898352

  7. Approaches for in silico finishing of microbial genome sequences.

    PubMed

    Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

    The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.

  8. A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

    PubMed

    Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

    2015-01-01

    The upstream region of coding genes is important for several reasons, for instance for locating transcription factor binding sites and start site initiation in genomic DNA. Motivated by a recently conducted study, where a multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in the analysis, where conserved coding genes were found by using the pan-genomics concept for each considered prokaryotic species. PLS uses a position specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS based method were compared with the Gini importance of random forest (RF) and with support vector machine (SVM), which is a much-used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and the suggested approach identifies the prokaryotic upstream region significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.

  9. Golay sequences coded coherent optical OFDM for long-haul transmission

    NASA Astrophysics Data System (ADS)

    Qin, Cui; Ma, Xiangrong; Hua, Tao; Zhao, Jing; Yu, Huilong; Zhang, Jian

    2017-09-01

    We propose to use binary Golay sequences in coherent optical orthogonal frequency division multiplexing (CO-OFDM) to improve the long-haul transmission performance. The Golay sequences are generated by binary Reed-Muller codes, which have low peak-to-average power ratio and certain error correction capability. A low-complexity decoding algorithm for the Golay sequences is then proposed to recover the signal. Under same spectral efficiency, the QPSK modulated OFDM with binary Golay sequences coding with and without discrete Fourier transform (DFT) spreading (DFTS-QPSK-GOFDM and QPSK-GOFDM) are compared with the normal BPSK modulated OFDM with and without DFT spreading (DFTS-BPSK-OFDM and BPSK-OFDM) after long-haul transmission. At a 7% forward error correction code threshold (Q2 factor of 8.5 dB), it is shown that DFTS-QPSK-GOFDM outperforms DFTS-BPSK-OFDM by extending the transmission distance by 29% and 18%, in non-dispersion managed and dispersion managed links, respectively.
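
    The defining property of the Golay sequences mentioned above is complementarity: the aperiodic autocorrelations of the two sequences in a pair cancel at every nonzero lag. The sketch below builds a pair with the textbook concatenation recursion and checks that property. It is an assumption-laden illustration: the abstract generates its sequences from Reed-Muller codes, which is not reproduced here, and the function names are hypothetical.

```python
def golay_pair(doublings: int):
    """Build a binary (+1/-1) Golay complementary pair of length 2**doublings."""
    a, b = [1], [1]
    for _ in range(doublings):
        # Standard recursion: (a, b) -> (a|b, a|-b).
        a, b = a + b, a + [-x for x in b]
    return a, b


def aperiodic_autocorr(seq, lag: int) -> int:
    """Aperiodic autocorrelation of seq at a non-negative lag."""
    return sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))


a, b = golay_pair(3)  # two sequences of length 8
sums = [aperiodic_autocorr(a, k) + aperiodic_autocorr(b, k) for k in range(len(a))]
print(a)
print(b)
print(sums)  # only the zero-lag term is nonzero
```

    The summed autocorrelation is 2N at lag zero and zero elsewhere, which is the property behind the low peak-to-average power ratio that the abstract cites as a motivation for using these sequences in OFDM.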

  10. Criterion for estimation of stress-deformed state of SD-materials

    NASA Astrophysics Data System (ADS)

    Orekhov, Andrey V.

    2018-05-01

    A criterion is proposed that determines the moment when the growth pattern of the monotonic numerical sequence varies from the linear to the parabolic one. The criterion is based on the comparison of squares of errors for the linear and the incomplete quadratic approximation. The approximating functions are constructed locally, only at those points that are located near a possible change in nature of the increase in the sequence.
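
    The comparison of squared errors described above can be illustrated with a small least-squares sketch. Several details are assumptions, because the abstract does not give them: the "incomplete quadratic" is taken here to be y = c*x^2 + d (no linear term), the local window is a short run of consecutive points indexed 0, 1, 2, ..., and the function names are hypothetical. This is not the author's exact criterion.

```python
def least_squares_sse(u, y) -> float:
    """Sum of squared errors of the best fit y ~ a*u + b."""
    n = len(u)
    mean_u, mean_y = sum(u) / n, sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = s_uy / s_uu
    b = mean_y - a * mean_u
    return sum((yi - (a * ui + b)) ** 2 for ui, yi in zip(u, y))


def growth_is_parabolic(window) -> bool:
    """True if the incomplete quadratic fits the window better than a line."""
    if len(window) < 3:
        raise ValueError("need at least three points to compare the fits")
    x = list(range(len(window)))
    linear_sse = least_squares_sse(x, window)
    # Regressing on x**2 gives the fit y ~ c*x**2 + d (assumed form).
    quadratic_sse = least_squares_sse([xi * xi for xi in x], window)
    return quadratic_sse < linear_sse


print(growth_is_parabolic([1, 3, 5, 7, 9]))    # linear growth
print(growth_is_parabolic([0, 1, 4, 9, 16]))   # parabolic growth
```

    Sliding such a window along a monotonic sequence and recording where the result first switches is one way to locate the change from linear to parabolic growth that the abstract refers to.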

  11. BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.

    PubMed

    Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins

    2018-06-05

    With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, was rapidly produced at relatively low cost. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on feature extraction from complex network measurements. The method initially transforms the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by the organism. The proposed methodology is implemented as open source in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET.

  12. The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

    PubMed

    Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

    2016-01-01

    Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensable for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA, pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species. © 2016 S. Karger AG, Basel.

  13. Negotiating identity and self-image: perceptions of falls in ambulatory individuals with spinal cord injury - a qualitative study.

    PubMed

    Jørgensen, Vivien; Roaldsen, Kirsti Skavberg

    2017-04-01

    Explore and describe experiences and perceptions of falls, risk of falling, and fall-related consequences in individuals with incomplete spinal cord injury (SCI) who are still walking. A qualitative interview study applying interpretive content analysis with an inductive approach. Specialized rehabilitation hospital. A purposeful sample of 15 individuals (10 men), 23 to 78 years old, 2-34 years post injury with chronic incomplete traumatic SCI, and walking ⩾75% of time for mobility needs. Individual, semi-structured face-to-face interviews were recorded, condensed, and coded to find themes and subthemes. One overarching theme was revealed: "Falling challenges identity and self-image as normal" which comprised two main themes "Walking with incomplete SCI involves minimizing fall risk and fall-related concerns without compromising identity as normal" and "Walking with incomplete SCI implies willingness to increase fall risk in order to maintain identity as normal". Informants were aware of their increased fall risk and took precautions, but willingly exposed themselves to risky situations when important to self-identity. All informants expressed some conditional fall-related concerns, and a few experienced concerns limiting activity and participation. Ambulatory individuals with incomplete SCI considered falls to be a part of life. However, falls interfered with the informants' identities and self-images as normal, healthy, and well-functioning. A few expressed dysfunctional concerns about falling, and interventions should target these.

  14. Linkage and homology analysis divides the eight genes for the small subunit of petunia ribulose 1,5-bisphosphate carboxylase into three gene families

    PubMed Central

    Dean, Caroline; van den Elzen, Peter; Tamaki, Stanley; Dunsmuir, Pamela; Bedbrook, John

    1985-01-01

    Twenty-six λ phage clones with homology to coding sequences of the small subunit (SSU) of ribulose 1,5-bisphosphate carboxylase have been isolated from an EMBL3 λ phage bank of Petunia (Mitchell) DNA. Restriction mapping of the phage inserts shows that the clones were obtained from five nonoverlapping regions of petunia DNA that carry seven SSU genes. Comparison of the HindIII genomic fragments of petunia DNA with the HindIII restriction fragments of the isolated phage indicates that petunia nuclear DNA encodes eight SSU genes, seven of which are present in the phage clones. Two incomplete genes, which contain only the 3′ end of an SSU gene, were also found in the phage clones. We demonstrate that the eight SSU genes of petunia can be divided into three gene families based on homology to three petunia cDNA clones. Two gene families contain single SSU genes and the third contains six genes, four of which are closely linked within petunia nuclear DNA. PMID:16593584

  15. Processing uncertain RFID data in traceability supply chains.

    PubMed

    Xie, Dong; Xiao, Jie; Guo, Guangjun; Jiang, Tong

    2014-01-01

    Radio Frequency Identification (RFID) is widely used to track and trace objects in traceability supply chains. However, massive uncertain data produced by RFID readers are not effective and efficient to be used in RFID application systems. Following the analysis of key features of RFID objects, this paper proposes a new framework for effectively and efficiently processing uncertain RFID data, and supporting a variety of queries for tracking and tracing RFID objects. We adjust different smoothing windows according to different rates of uncertain data, employ different strategies to process uncertain readings, and distinguish ghost, missing, and incomplete data according to their apparent positions. We propose a comprehensive data model which is suitable for different application scenarios. In addition, a path coding scheme is proposed to significantly compress massive data by aggregating the path sequence, the position, and the time intervals. The scheme is suitable for cyclic or long paths. Moreover, we further propose a processing algorithm for group and independent objects. Experimental evaluations show that our approach is effective and efficient in terms of the compression and traceability queries.
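
    As a rough illustration of aggregating a path by position and time interval, the sketch below collapses consecutive readings at the same location into a single stay with entry and exit times. It is not the paper's path coding scheme; the function name `compress_path` and the `(location, timestamp)` reading format are assumptions made for this example.

```python
from itertools import groupby

def compress_path(readings):
    """Collapse consecutive readings at one location into (location, t_first, t_last).

    `readings` is a time-ordered list of (location, timestamp) pairs; this
    input format is hypothetical and chosen only for illustration.
    """
    stays = []
    for location, group in groupby(readings, key=lambda r: r[0]):
        times = [t for _, t in group]
        stays.append((location, times[0], times[-1]))
    return stays

readings = [("dock", 0), ("dock", 5), ("dock", 9),
            ("truck", 12), ("truck", 30), ("store", 41)]
print(compress_path(readings))
# [('dock', 0, 9), ('truck', 12, 30), ('store', 41, 41)]
```

    Six raw readings reduce to three stays here; the saving grows with the number of repeated readings per location.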

  16. Processing Uncertain RFID Data in Traceability Supply Chains

    PubMed Central

    Xie, Dong; Xiao, Jie

    2014-01-01

    Radio Frequency Identification (RFID) is widely used to track and trace objects in traceability supply chains. However, massive uncertain data produced by RFID readers are not effective and efficient to be used in RFID application systems. Following the analysis of key features of RFID objects, this paper proposes a new framework for effectively and efficiently processing uncertain RFID data, and supporting a variety of queries for tracking and tracing RFID objects. We adjust different smoothing windows according to different rates of uncertain data, employ different strategies to process uncertain readings, and distinguish ghost, missing, and incomplete data according to their apparent positions. We propose a comprehensive data model which is suitable for different application scenarios. In addition, a path coding scheme is proposed to significantly compress massive data by aggregating the path sequence, the position, and the time intervals. The scheme is suitable for cyclic or long paths. Moreover, we further propose a processing algorithm for group and independent objects. Experimental evaluations show that our approach is effective and efficient in terms of the compression and traceability queries. PMID:24737978

  17. Complete mitochondrial genome of the Kwangtung skate: Dipturus kwangtungensis (Rajiformes, Rajidae).

    PubMed

    Jeong, Dageum; Kim, Sung; Kim, Choong-Gon; Lee, Youn-Ho

    2015-01-01

    The complete sequence of mitochondrial DNA of a Kwangtung skate, Dipturus kwangtungensis, was determined to be a circular molecule of 16,912 bp including 2 rRNA, 22 tRNA, 13 protein coding genes (PCGs) and a control region. The arrangement of the PCGs is the same as that found in other Rajidae species. The nucleotide composition of the L-strand, which encodes most of the proteins, is 30.2% A, 27.4% C, 28.2% T and 14.2% G, with a slight bias toward A+T. Twelve of 13 PCGs are initiated by the ATG codon while COX1 starts with GTG. Only ND4 harbors the incomplete termination codon, TA. All tRNA genes have a typical clover-leaf structure of mitochondrial tRNA with the exception of tRNA(Ser)AGY, which has a reduced DHU arm. This mitogenome is the first report for a species of the genus Dipturus, which will become an important source of information on the phylogenetic relationship and the evolution of the genus Dipturus within the family Rajidae.
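
    The A+T bias can be checked directly from the reported percentages. The short calculation below is a sketch; the skew statistic (x - y) / (x + y) is a commonly used summary of strand asymmetry and is not taken from the abstract.

```python
# Base composition reported in the abstract (percent of L-strand nucleotides).
composition = {"A": 30.2, "C": 27.4, "T": 28.2, "G": 14.2}

def skew(x, y):
    """Skew statistic (x - y) / (x + y); not part of the abstract."""
    return (x - y) / (x + y)

at_content = composition["A"] + composition["T"]        # about 58.4 %
gc_content = composition["G"] + composition["C"]        # about 41.6 %
at_skew = skew(composition["A"], composition["T"])      # about +0.034
gc_skew = skew(composition["G"], composition["C"])      # about -0.317

print(round(at_content, 1), round(at_skew, 3), round(gc_skew, 3))
```

    An A+T content of about 58.4% is only modestly above 50%, while the strongly negative GC skew reflects the low G content on this strand.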

  18. A Two-Locus Global DNA Barcode for Land Plants: The Coding rbcL Gene Complements the Non-Coding trnH-psbA Spacer Region

    PubMed Central

    Kress, W. John; Erickson, David L.

    2007-01-01

    Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588

  19. A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N1-methyladenosine modification.

    PubMed

    Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P

    2017-03-01

    Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  20. Sequence data - Magnitude and implications of some ambiguities.

    NASA Technical Reports Server (NTRS)

    Holmquist, R.; Jukes, T. H.

    1972-01-01

    A stochastic model is applied to the divergence of the horse-pig lineage from a common ancestor in terms of the alpha and beta chains of hemoglobin and fibrinopeptides. The results are compared with those based on the minimum mutation distance model of Fitch (1972). Buckwheat and cauliflower cytochrome c sequences are analyzed to demonstrate their ambiguities. A comparative analysis of evolutionary rates for various proteins of horses and pigs shows that errors of considerable magnitude are introduced by Glx and Asx ambiguities into evolutionary conclusions drawn from sequences of incompletely analyzed proteins.

  1. A DS-UWB Cognitive Radio System Based on Bridge Function Smart Codes

    NASA Astrophysics Data System (ADS)

    Xu, Yafei; Hong, Sheng; Zhao, Guodong; Zhang, Fengyuan; di, Jinshan; Zhang, Qishan

    This paper proposes a direct-sequence UWB cognitive radio system based on the bridge function smart sequence matrix and the Gaussian pulse. Because the system uses bridge function smart code sequences as its spreading code, the zero correlation zones (ZCZs) in the autocorrelation functions of these sequences can reduce multipath interference between pulses. The modulated signal was sent through the IEEE 802.15.3a UWB channel. We analyze how the ZCZs suppress multipath interference (MPI), one of the main sources of interference in the system. The simulation in SIMULINK/MATLAB is described in detail. The results show that the system performs better than one employing the Walsh sequence square matrix, and this is verified in principle by the formula.

  2. Argument Structure, Speech Acts, and Roles in Child-Adult Dispute Episodes.

    ERIC Educational Resources Information Center

    Prescott, Barbara L.

    A study identified discourse patterns in potential disputes, deflected disputes, incomplete, and completed disputes from a one-hour conversation involving two 3-year-old female children and one female adult. These varied dispute episodes were identified, coded, and analyzed using a pragmatic model of adult argumentation focusing on the structures,…

  3. 42 CFR 412.428 - Publication of Updates to the inpatient psychiatric facility prospective payment system.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... the methodology and data used to calculate the updated Federal per diem base payment amount. (b)(1... maintain the appropriate outlier percentage. (e) Describe the ICD-9-CM coding changes and DRG... psychiatric facilities for which the fiscal intermediary obtains inaccurate or incomplete data with which to...

  4. The influence of viral coding sequences on pestivirus IRES activity reveals further parallels with translation initiation in prokaryotes.

    PubMed Central

    Fletcher, Simon P; Ali, Iraj K; Kaminski, Ann; Digard, Paul; Jackson, Richard J

    2002-01-01

    Classical swine fever virus (CSFV) is a member of the pestivirus family, which shares many features in common with hepatitis C virus (HCV). It is shown here that CSFV has an exceptionally efficient cis-acting internal ribosome entry segment (IRES), which, like that of HCV, is strongly influenced by the sequences immediately downstream of the initiation codon, and is optimal with viral coding sequences in this position. Constructs that retained 17 or more codons of viral coding sequence exhibited full IRES activity, but with only 12 codons, activity was approximately 66% of maximum in vitro (though close to maximum in transfected BHK cells), whereas with just 3 codons or fewer, the activity was only approximately 15% of maximum. The minimal coding region elements required for high activity were exchanged between HCV and CSFV. Although maximum activity was observed in each case with the homologous combination of coding region and 5' UTR, the heterologous combinations were sufficiently active to rule out a highly specific functional interplay between the 5' UTR and coding sequences. On the other hand, inversion of the coding sequences resulted in low IRES activity, particularly with the HCV coding sequences. RNA structure probing showed that the efficiency of internal initiation of these chimeric constructs correlated most closely with the degree of single-strandedness of the region around and immediately downstream of the initiation codon. The low activity IRESs could not be rescued by addition of supplementary eIF4A (the initiation factor with ATP-dependent RNA helicase activity). The extreme sensitivity to secondary structure around the initiation codon is likely to be due to the fact that the eIF4F complex (which has eIF4A as one of its subunits) is not required for and does not participate in initiation on these IRESs. PMID:12515388

  5. Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics

    PubMed Central

    2012-01-01

    Background Detecting the borders between coding and non-coding regions is an essential step in genome annotation. Information entropy measures are useful for describing the signals in genome sequence. However, the accuracy of previous border-finding methods based on entropy segmentation still needs to be improved. Methods In this study, we first applied a new recursive entropic segmentation method on DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results Compared with previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences. Conclusions This paper presents a new segmentation method in prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, compared with the A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences. PMID:23282225
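
    The core step of entropic segmentation, choosing the cut that maximizes a divergence between the two sides, can be sketched as follows. This is a simplified stand-in and not the paper's method: it uses the Jensen-Shannon divergence instead of Jensen-Rényi, a plain 4-letter nucleotide alphabet instead of the 22-symbol alphabet, and a single cut instead of recursion. The function names are hypothetical.

```python
import math
from collections import Counter

def shannon(counts, total):
    """Shannon entropy (bits) of a symbol-count table."""
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def best_cut(seq):
    """Return (position, divergence) of the cut maximizing the weighted
    Jensen-Shannon divergence between seq[:position] and seq[position:]."""
    n = len(seq)
    total = Counter(seq)
    h_all = shannon(total, n)
    left = Counter()
    best_pos, best_js = None, -1.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction keeps positive counts only
        js = (h_all
              - (i / n) * shannon(left, i)
              - ((n - i) / n) * shannon(right, n - i))
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js

print(best_cut("A" * 50 + "G" * 50))  # cut at position 50
```

    A recursive version would apply `best_cut` again to each side while the divergence stays above a significance threshold.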

  6. Nonspatial Sequence Coding in CA1 Neurons

    PubMed Central

    Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

    2016-01-01

    The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. 
    Here we addressed this issue by recording neural activity in hippocampal region CA1 while rats performed a nonspatial sequence memory task. We found that hippocampal neurons code for the temporal context of items (whether odors were presented in the correct or incorrect sequential position) and that this activity is linked with memory performance. The discovery of this novel form of temporal coding in hippocampal neurons advances our fundamental understanding of the neurobiology of episodic memory and will serve as a foundation for our cross-species, multitechnique approach aimed at elucidating the neural mechanisms underlying memory impairments in aging and dementia. PMID:26843637

  7. Repeats of base oligomers as the primordial coding sequences of the primeval earth and their vestiges in modern genes.

    PubMed

    Ohno, S

    1984-01-01

    Three outstanding properties uniquely qualify repeats of base oligomers as the primordial coding sequences of all polypeptide chains. First, when compared with randomly generated base sequences in general, they are more likely to have long open reading frames. Second, periodical polypeptide chains specified by such repeats are more likely to assume either alpha-helical or beta-sheet secondary structures than are polypeptide chains of random sequence. Third, provided that the number of bases in the oligomeric unit is not a multiple of 3, these internally repetitious coding sequences are impervious to randomly sustained base substitutions, deletions, and insertions. This is because the recurring periodicity of their polypeptide chains is given by three consecutive copies of the oligomeric unit translated in three different reading frames. Accordingly, when one reading frame is open, the other two are automatically open as well, all three being capable of coding for polypeptide chains of identical periodicity. Under this circumstance, a frame shift due to the deletion or insertion of a number of bases that is not a multiple of 3 fails to alter the down-stream amino acid sequence, and even a base change causing premature chain-termination can silence only one of the three potential coding units. Newly arisen coding sequences in modern organisms are oligomeric repeats, and most of the older genes retain various vestiges of their original internal repetitions. Some of the genes (e.g., oncogenes) have even inherited the property of being impervious to randomly sustained base changes.
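
    The third property can be checked with a few lines of code: when the repeat unit length is not a multiple of 3, every reading frame visits the same set of codons, so all three frames are open or closed together. The sketch below is illustrative only; the example units `GCTGGAC` and `GCTGAC` and the function names are made up for the demonstration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq, frame):
    """Complete codons of seq read from offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def frame_codon_sets(unit, copies=12):
    """Set of codons seen in each of the three frames of a tandem repeat."""
    seq = unit * copies
    return [set(codons(seq, frame)) for frame in range(3)]

def all_frames_open(unit, copies=12):
    """True if no frame of the tandem repeat contains a stop codon."""
    return all(not (s & STOP_CODONS) for s in frame_codon_sets(unit, copies))

# 7-base unit (not a multiple of 3): the three frames share one codon set.
sets7 = frame_codon_sets("GCTGGAC")
print(sets7[0] == sets7[1] == sets7[2], all_frames_open("GCTGGAC"))  # True True

# 6-base unit (a multiple of 3): frames differ, and only frame 2 hits TGA.
sets6 = frame_codon_sets("GCTGAC")
print(sets6[0] == sets6[2], "TGA" in sets6[2])  # False True
```

    With the 7-base unit, a one-base insertion or deletion only shifts the reading to another frame carrying the same periodic codon series, which is the robustness to frame shifts described above.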

  8. Genetic variation in eleven phase I drug metabolism genes in an ethnically diverse population.

    PubMed

    Solus, Joseph F; Arietta, Brenda J; Harris, James R; Sexton, David P; Steward, John Q; McMunn, Chara; Ihrie, Patrick; Mehall, Janelle M; Edwards, Todd L; Dawson, Elliott P

    2004-10-01

    The extent of genetic variation found in drug metabolism genes and its contribution to interindividual variation in response to medication remains incompletely understood. To better determine the identity and frequency of variation in 11 phase I drug metabolism genes, the exons and flanking intronic regions of the cytochrome P450 (CYP) isoenzyme genes CYP1A1, CYP1A2, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4 and CYP3A5 were amplified from genomic DNA and sequenced. A total of 60 kb of bi-directional sequence was generated from each of 93 human DNAs, which included Caucasian, African-American and Asian samples. There were 388 different polymorphisms identified. These included 269 non-coding, 45 synonymous and 74 non-synonymous polymorphisms. Of these, 54% were novel and included 176 non-coding, 14 synonymous and 21 non-synonymous polymorphisms. Of the novel variants observed, 85 were represented by single occurrences of the minor allele in the sample set. Much of the variation observed was from low-frequency alleles. Comparatively, these genes are variation-rich. Calculations measuring genetic diversity revealed that while the values for the individual genes are widely variable, the overall nucleotide diversity of 7.7 × 10^-4 and polymorphism parameter of 11.5 × 10^-4 are higher than those previously reported for other gene sets. Several independent measurements indicate that these genes are under selective pressure, particularly for polymorphisms corresponding to non-synonymous amino acid changes. There is relatively little difference in measurements of diversity among the ethnic groups, but there are large differences among the genes and gene subfamilies themselves. Of the three CYP subfamilies involved in phase I drug metabolism (1, 2, and 3), subfamily 2 displays the highest levels of genetic diversity.
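
    The reported counts can be cross-checked with simple arithmetic, and a per-site polymorphism estimate can be approximated from them. The sketch below rests on assumptions not stated in the abstract: that the polymorphism parameter is Watterson's estimator, that every polymorphism counts as one segregating site, and that all 186 chromosomes (93 diploid samples) were covered across the full 60 kb.

```python
counts = {"non-coding": 269, "synonymous": 45, "non-synonymous": 74}
novel = {"non-coding": 176, "synonymous": 14, "non-synonymous": 21}

total = sum(counts.values())                    # 388, matching the abstract
novel_fraction = sum(novel.values()) / total    # 211 / 388, about 54 %

def watterson_theta(segregating_sites, n_chromosomes, n_sites):
    """Watterson's estimator per site: S / (a_n * L),
    where a_n = sum of 1/i for i = 1 .. n-1."""
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return segregating_sites / (a_n * n_sites)

# Assumed inputs: 2 * 93 chromosomes, 60,000 sites (see the lead-in).
theta = watterson_theta(total, 2 * 93, 60_000)
print(total, round(novel_fraction, 3), theta)
```

    Under these assumptions the estimate comes out near 1.1 per thousand sites, the same order as the reported polymorphism parameter; the remaining gap presumably reflects details of the original calculation that the abstract does not give.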

  9. Comparison of simple sequence repeats in 19 Archaea.

    PubMed

    Trivedi, S

    2006-12-05

    All organisms that have been studied until now have been found to have differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea genomes where complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Different from what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
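
    A minimal way to locate di- to penta-nucleotide repeats is an exact-match regular expression, sketched below. This is not how SPUTNIK works internally (it is not described in the abstract); the function name `find_ssrs` and the `min_copies` threshold of 4 are assumptions for illustration.

```python
import re

def find_ssrs(seq, min_copies=4):
    """Find perfect tandem repeats with unit length 2-5.

    Returns (start, unit, copies) tuples. Homopolymer runs are skipped
    because they are mononucleotide, not di- to penta-nucleotide, repeats.
    """
    pattern = re.compile(rf"([ACGT]{{2,5}}?)\1{{{min_copies - 1},}}")
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

example = "TTG" + "AC" * 5 + "CC" + "ATG" * 4 + "TT"
print(find_ssrs(example))  # [(3, 'AC', 5), (15, 'ATG', 4)]
```

    Running the same scan separately over coding and intergenic coordinates would give the kind of per-region repeat counts compared in the study.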

  10. Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.

    PubMed

    Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi

    2016-03-01

    Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding octopamine and dopamine receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.

  11. Coherent direct sequence optical code multiple access encoding-decoding efficiency versus wavelength detuning.

    PubMed

    Pastor, D; Amaya, W; García-Olcina, R; Sales, S

    2007-07-01

    We present a simple theoretical model and experimental verification of the vanishing of the autocorrelation peak caused by wavelength detuning in the coding-decoding process of coherent direct sequence optical code multiple access systems based on a superstructured fiber Bragg grating. Moreover, the detuning-induced vanishing effect is explored as a means of providing an additional degree of multiplexing and/or optical code tuning.

  12. A Systematic Bayesian Integration of Epidemiological and Genetic Data

    PubMed Central

    Lau, Max S. Y.; Marion, Glenn; Streftaris, George; Gibson, Gavin

    2015-01-01

    Genetic sequence data on pathogens have great potential to inform inference of their transmission dynamics ultimately leading to better disease control. Where genetic change and disease transmission occur on comparable timescales additional information can be inferred via the joint analysis of such genetic sequence data and epidemiological observations based on clinical symptoms and diagnostic tests. Although recently introduced approaches represent substantial progress, for computational reasons they approximate genuine joint inference of disease dynamics and genetic change in the pathogen population, capturing partially the joint epidemiological-evolutionary dynamics. Improved methods are needed to fully integrate such genetic data with epidemiological observations, for achieving a more robust inference of the transmission tree and other key epidemiological parameters such as latent periods. Here, building on current literature, a novel Bayesian framework is proposed that infers simultaneously and explicitly the transmission tree and unobserved transmitted pathogen sequences. Our framework facilitates the use of realistic likelihood functions and enables systematic and genuine joint inference of the epidemiological-evolutionary process from partially observed outbreaks. Using simulated data it is shown that this approach is able to infer accurately joint epidemiological-evolutionary dynamics, even when pathogen sequences and epidemiological data are incomplete, and when sequences are available for only a fraction of exposures. These results also characterise and quantify the value of incomplete and partial sequence data, which has important implications for sampling design, and demonstrate the abilities of the introduced method to identify multiple clusters within an outbreak. The framework is used to analyse an outbreak of foot-and-mouth disease in the UK, enhancing current understanding of its transmission dynamics and evolutionary process. PMID:26599399

  13. FOURTH SEMINAR TO THE MEMORY OF D.N. KLYSHKO: Algebraic solution of the synthesis problem for coded sequences

    NASA Astrophysics Data System (ADS)

    Leukhin, Anatolii N.

    2005-08-01

    The algebraic solution of a 'complex' problem of synthesis of phase-coded (PC) sequences with the zero level of side lobes of the cyclic autocorrelation function (ACF) is proposed. It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems, which can be solved, as the number factorisation problem, by algebraic methods by using the theory of Galois fields and groups.
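
    For readers unfamiliar with the target property, the sketch below shows a phase-coded sequence whose cyclic autocorrelation has zero side lobes. It uses the well-known Zadoff-Chu construction as an example; it is not the algebraic synthesis method proposed in the paper, and the function names are chosen for illustration.

```python
import cmath
from math import gcd

def zadoff_chu(n, root=1):
    """Zadoff-Chu sequence of odd length n with root coprime to n."""
    if n % 2 == 0 or gcd(root, n) != 1:
        raise ValueError("n must be odd and root coprime to n")
    return [cmath.exp(-1j * cmath.pi * root * k * (k + 1) / n) for k in range(n)]

def cyclic_acf(seq):
    """Cyclic (periodic) autocorrelation of a complex sequence."""
    n = len(seq)
    return [sum(seq[k] * seq[(k + lag) % n].conjugate() for k in range(n))
            for lag in range(n)]

acf = cyclic_acf(zadoff_chu(7))
peak = abs(acf[0])                        # equals the sequence length
max_side_lobe = max(abs(v) for v in acf[1:])
print(peak, max_side_lobe)                # side lobes vanish up to rounding
```

    Every element has unit magnitude, so the code is purely phase-coded; only the main peak at zero lag is non-zero.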

  14. Evaluating the protein coding potential of exonized transposable element sequences

    PubMed Central

    Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King

    2007-01-01

    Background Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: 1-a profile-based hidden Markov model (HMM) approach, 2-BLAST methods and 3-RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. 
    In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggests that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258

  15. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618

  16. Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

    PubMed Central

    Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

    2011-01-01

    Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358

  17. Speech processing using conditional observable maximum likelihood continuity mapping

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, John; Nix, David

    A computer implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence of speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.

  18. The complete mitochondrial genome of the styloperlid stonefly species Styloperla spinicercia Wu (Insecta: Plecoptera) with family-level phylogenetic analyses of the Pteronarcyoidea.

    PubMed

    Wang, Ying; Cao, Jinjun; Li, Weihai

    2017-03-13

    We present the complete mitochondrial (mt) genome sequence of the stonefly, Styloperla spinicercia Wu, 1935 (Plecoptera: Styloperlidae), the type species of the genus Styloperla and the first complete mt genome for the family Styloperlidae. The genome is circular, 16,129 base pairs long, has an A+T content of 70.7%, and contains 37 genes including the large and small ribosomal RNA (rRNA) subunits, 13 protein coding genes (PCGs), 22 tRNA genes and a large non-coding region (CR). All of the PCGs use the standard initiation codon ATN except ND1 and ND5, which start with TTG and GTG. Twelve of the PCGs stop with conventional terminal codons TAA and TAG, except ND5 which shows an incomplete terminator signal T. All tRNAs have the classic clover-leaf structures with the dihydrouridine (DHU) arm of tRNASer(AGN) forming a simple loop. Secondary structures of the two ribosomal RNAs are presented with reference to previous models. The structural elements and the variable numbers of tandem repeats are described within the control region. Phylogenetic analyses using both Bayesian (BI) and Maximum Likelihood (ML) methods support the previous hypotheses regarding family level relationships within the Pteronarcyoidea. The genetic distance calculated based on 13 PCGs and two rRNAs between Styloperla sp. and S. spinicercia is provided and interspecific divergence is discussed.
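The distinction drawn above between conventional stop codons (TAA, TAG) and an incomplete terminator signal (a single T) can be sketched in a few lines. This is a minimal illustration, not the authors' annotation procedure: the function name `classify_terminator` and the example sequences are hypothetical, and it relies on the commonly cited assumption that a trailing T or TA in a mitochondrial transcript is completed to TAA by polyadenylation.

```python
# Hypothetical helper: classify the 3' end of a mitochondrial protein-coding gene.
FULL_STOPS = {"TAA", "TAG"}  # conventional terminal codons named in the abstract


def classify_terminator(cds: str) -> str:
    """Return 'complete', 'incomplete', or 'none' for the 3' end of a CDS."""
    seq = cds.upper()
    remainder = len(seq) % 3
    if remainder == 0:
        # Length is a whole number of codons: look at the last full codon.
        return "complete" if seq[-3:] in FULL_STOPS else "none"
    # One or two bases are left over after the last full codon.
    tail = seq[-remainder:]
    # Assumption: a leftover T or TA is read as a truncated TAA stop.
    return "incomplete" if tail in {"T", "TA"} else "none"


if __name__ == "__main__":
    print(classify_terminator("ATGAAATAA"))  # ends with a full TAA codon
    print(classify_terminator("ATGAAAT"))    # ends with a single leftover T
```

A check like this only inspects the sequence end; it does not confirm that the reading frame or the gene boundaries were annotated correctly.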

  19. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    NASA Technical Reports Server (NTRS)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the Cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.

  20. ANN modeling of DNA sequences: new strategies using DNA shape code.

    PubMed

    Parbhane, R V; Tambe, S S; Kulkarni, B D

    2000-09-01

    Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.

  1. The primitive code and repeats of base oligomers as the primordial protein-encoding sequence.

    PubMed Central

    Ohno, S; Epplen, J T

    1983-01-01

    Even if the prebiotic self-replication of nucleic acids and the subsequent emergence of primitive, enzyme-independent tRNAs are accepted as plausible, the origin of life by spontaneous generation still appears improbable. This is because the just-emerged primitive translational machinery had to cope with base sequences that were not preselected for their coding potentials. Particularly if the primitive mitochondria-like code with four chain-terminating base triplets preceded the universal code, the translation of long, randomly generated, base sequences at this critical stage would have merely resulted in the production of short oligopeptides instead of long polypeptide chains. We present the base sequence of a mouse transcript containing tetranucleotide repeats conserved during evolution. Even if translated in accordance with the primitive mitochondria-like code, this transcript in its three reading frames can yield 245-, 246-, and 251-residue-long tetrapeptidic periodical polypeptides that are already acquiring longer periodicities. We contend that the first set of base sequences translated at the beginning of life were such oligonucleotide repeats. By quickly acquiring longer periodicities, their products must have soon gained characteristic secondary structures--alpha-helical or beta-sheet or both. PMID:6574491
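The argument above rests on a simple property: a tetranucleotide repeat read in triplets yields the same four codons, in rotation, in all three reading frames, so if none of them is a terminator the repeat stays open in every frame. The sketch below illustrates this under stated assumptions. The repeat unit "GATC" is a hypothetical example (not the mouse transcript), and the stop set is assumed to be the four terminators of the vertebrate mitochondrial code, since the abstract does not list which triplets it means.

```python
# Assumption: four chain-terminating triplets, taken from the vertebrate
# mitochondrial code; the abstract does not name the triplets it refers to.
MITO_LIKE_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def codons_by_frame(unit: str, copies: int = 6) -> dict:
    """Split a tandem repeat of `unit` into codons for frames 0, 1 and 2."""
    seq = unit.upper() * copies
    frames = {}
    for offset in range(3):
        # Only complete triplets are kept; trailing bases are dropped.
        frames[offset] = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return frames


def is_open(codons, stops=MITO_LIKE_STOPS) -> bool:
    """True if no codon in the list is a terminator."""
    return not any(codon in stops for codon in codons)


if __name__ == "__main__":
    for frame, codons in codons_by_frame("GATC").items():
        print(frame, codons, "open" if is_open(codons) else "terminated")
```

With a four-base unit the codon pattern repeats every four codons, which corresponds to the tetrapeptide periodicity the abstract describes; a unit that contains a terminator in any rotation (for example one spelling TAG) closes all three frames at once.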

  2. Algebraic Methods to Design Signals

    DTIC Science & Technology

    2015-08-27

    sequence pairs with optimal correlation values. 5. K.T. Arasu, Pradeep Bansal, Cody Watson, Partially balanced incomplete block designs with two...IEEE Transactions Information Theory, Volume: 58, Issue: 11, Nov 2012, Page(s): 6968 – 6978 5. K.T. Arasu, Pradeep Bansal, Cody Watson, Partially

  3. Transcriptome assembly and digital gene expression atlas of the rainbow trout

    USDA-ARS's Scientific Manuscript database

    Background: Transcriptome analysis is a preferred method for gene discovery, marker development and gene expression profiling in non-model organisms. Previously, we sequenced a transcriptome reference using Sanger-based and 454-pyrosequencing, however, a transcriptome assembly is still incomplete an...

  4. 40 CFR 80.410 - What are the additional requirements for gasoline produced at foreign refineries having...

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 17 2014-07-01 2014-07-01 false What are the additional requirements... knowledge and belief after I have taken reasonable and appropriate steps to verify the accuracy thereof. I... States Code, section 1001, the penalty for furnishing false, incomplete or misleading information in this...

  5. 40 CFR 80.410 - What are the additional requirements for gasoline produced at foreign refineries having...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 16 2010-07-01 2010-07-01 false What are the additional requirements... knowledge and belief after I have taken reasonable and appropriate steps to verify the accuracy thereof. I... States Code, section 1001, the penalty for furnishing false, incomplete or misleading information in this...

  6. 40 CFR 80.410 - What are the additional requirements for gasoline produced at foreign refineries having...

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 17 2012-07-01 2012-07-01 false What are the additional requirements... knowledge and belief after I have taken reasonable and appropriate steps to verify the accuracy thereof. I... States Code, section 1001, the penalty for furnishing false, incomplete or misleading information in this...

  7. 40 CFR 80.410 - What are the additional requirements for gasoline produced at foreign refineries having...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 16 2011-07-01 2011-07-01 false What are the additional requirements... knowledge and belief after I have taken reasonable and appropriate steps to verify the accuracy thereof. I... States Code, section 1001, the penalty for furnishing false, incomplete or misleading information in this...

  8. 40 CFR 80.410 - What are the additional requirements for gasoline produced at foreign refineries having...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 17 2013-07-01 2013-07-01 false What are the additional requirements... knowledge and belief after I have taken reasonable and appropriate steps to verify the accuracy thereof. I... States Code, section 1001, the penalty for furnishing false, incomplete or misleading information in this...

  9. Reticulate evolution and incomplete lineage sorting among the ponderosa pines.

    PubMed

    Willyard, Ann; Cronn, Richard; Liston, Aaron

    2009-08-01

    Interspecific gene flow via hybridization may play a major role in evolution by creating reticulate rather than hierarchical lineages in plant species. Occasional diploid pine hybrids indicate the potential for introgression, but reticulation is hard to detect because ancestral polymorphism is still shared across many groups of pine species. Nucleotide sequences for 53 accessions from 17 species in subsection Ponderosae (Pinus) provide evidence for reticulate evolution. Two discordant patterns among independent low-copy nuclear gene trees and a chloroplast haplotype are better explained by introgression than incomplete lineage sorting or other causes of incongruence. Conflicting resolution of three monophyletic Pinus coulteri accessions is best explained by ancient introgression followed by a genetic bottleneck. More recent hybridization transferred a chloroplast from P. jeffreyi to a sympatric P. washoensis individual. We conclude that incomplete lineage sorting could account for other examples of non-monophyly, and caution against any analysis based on single-accession or single-locus sampling in Pinus.

  10. Genomics dataset of unidentified disclosed isolates.

    PubMed

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from a restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here is new revelatory data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
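The AT/GC content mentioned above is a simple base-composition ratio. The following is a minimal sketch of how it could be computed; the function name `at_gc_content` and the decision to ignore ambiguous symbols such as N are assumptions made for illustration, not details taken from the dataset.

```python
def at_gc_content(seq: str):
    """Return (AT fraction, GC fraction) of a DNA sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted;
    ambiguity codes such as N are skipped.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(b in "GC" for b in bases) / len(bases)
    return 1.0 - gc, gc


if __name__ == "__main__":
    at, gc = at_gc_content("ATGCGCNNAT")
    print(f"AT = {at:.3f}, GC = {gc:.3f}")
```

Higher GC fractions are generally associated with higher duplex melting temperatures, which is the sense in which composition relates to stability at different temperatures.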

  11. ISS mapped from ICD-9-CM by a novel freeware versus traditional coding: a comparative study.

    PubMed

    Di Bartolomeo, Stefano; Tillati, Silvia; Valent, Francesca; Zanier, Loris; Barbone, Fabio

    2010-03-31

    Injury severity measures are based either on the Abbreviated Injury Scale (AIS) or the International Classification of Diseases (ICD). The latter is more convenient because it is routinely collected by clinicians for administrative reasons. To exploit this advantage, a proprietary program that maps ICD-9-CM into AIS codes has been used for many years. Recently, a program called ICDPIC trauma, developed in the USA, has become available free of charge for registered STATA users. We compared the ICDPIC calculated Injury Severity Score (ISS) with the one from direct, prospective AIS coding by expert trauma registrars (dAIS). The administrative records of the 289 major trauma cases admitted to the hospital of Udine-Italy from 1 July 2004 to 30 June 2005 and enrolled in the Italian Trauma Registry were retrieved and ICDPIC-ISS was calculated. The agreement between ICDPIC-ISS and dAIS-ISS was assessed by Cohen's Kappa and Bland-Altman charts. We then plotted the differences between the 2 scores against the ratio between the number of traumatic ICD-9-CM codes and the number of dAIS codes for each patient (DIARATIO). We also compared the absolute differences in ISS among 3 groups identified by DIARATIO. The discriminative power for survival of both scores was finally calculated by ROC curves. The scores matched in 33/272 patients (12.1%, k 0.07) and, when categorized, in 80/272 (22.4%, k 0.09). The Bland-Altman average difference was 6.36 (limits: minus 22.0 to plus 34.7). ICDPIC-ISS of 75 was particularly unreliable. The differences increased (p < 0.01) as DIARATIO increased, indicating incomplete administrative coding as a cause of the differences. The area under the curve of ICDPIC-ISS was lower (0.63 vs. 0.76, p = 0.02). Despite its great potential convenience, ICDPIC-ISS agreed poorly with its conventionally calculated counterpart. Its discriminative power for survival was also significantly lower. Incomplete ICD-9-CM coding was a main cause of these findings. Because this quality of coding is standard in Italy and probably in other European countries, its effects on the performances of other trauma scores based on ICD administrative data deserve further research. Mapping ICD-9-CM code 862.8 to AIS of 6 is an overestimation.
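The two agreement measures named above, Cohen's kappa and Bland-Altman limits, can be sketched with the standard library alone. This is a generic illustration of the statistics, not the study's analysis: the function names are hypothetical, the scores in the usage lines are made up, and the limits use the conventional mean difference plus or minus 1.96 sample standard deviations.

```python
from collections import Counter
from statistics import mean, stdev


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters use a single identical category throughout.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of the marginal proportions.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


def bland_altman(a, b):
    """Return (mean difference, lower limit, upper limit) for paired scores."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # conventional 95% limits of agreement
    return bias, bias - half_width, bias + half_width


if __name__ == "__main__":
    mapped = [9, 16, 25, 75, 17]   # made-up scores from a mapping program
    direct = [9, 13, 29, 34, 17]   # made-up scores from direct coding
    print(cohens_kappa(mapped, direct))
    print(bland_altman(mapped, direct))
```

Kappa treats every mismatch alike, so for an ordinal score a weighted variant or the Bland-Altman differences give a better picture of how far apart the two methods are.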

  12. “One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

    PubMed Central

    2014-01-01

    Background Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results We have developed a Perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.

  13. Variability and transmission by Aphis glycines of North American and Asian Soybean mosaic virus isolates.

    PubMed

    Domier, L L; Latorre, I J; Steinlage, T A; McCoppin, N; Hartman, G L

    2003-10-01

    The variability of North American and Asian strains and isolates of Soybean mosaic virus was investigated. First, polymerase chain reaction (PCR) products representing the coat protein (CP)-coding regions of 38 SMVs were analyzed for restriction fragment length polymorphisms (RFLP). Second, the nucleotide and predicted amino acid sequence variability of the P1-coding region of 18 SMVs and the helper component/protease (HC/Pro) and CP-coding regions of 25 SMVs were assessed. The CP nucleotide and predicted amino acid sequences were the most similar and predicted phylogenetic relationships similar to those obtained from RFLP analysis. Neither RFLP nor sequence analyses of the CP-coding regions grouped the SMVs by geographical origin. The P1 and HC/Pro sequences were more variable and separated the North American and Asian SMV isolates into two groups similar to previously reported differences in pathogenic diversity of the two sets of SMV isolates. The P1 region was the most informative of the three regions analyzed. To assess the biological relevance of the sequence differences in the HC/Pro and CP coding regions, the transmissibility of 14 SMV isolates by Aphis glycines was tested. All field isolates of SMV were transmitted efficiently by A. glycines, but the laboratory isolates analyzed were transmitted poorly. The amino acid sequences from most, but not all, of the poorly transmitted isolates contained mutations in the aphid transmission-associated DAG and/or KLSC amino acid sequence motifs of CP and HC/Pro, respectively.

  14. Closed Genome Sequence of Chryseobacterium piperi Strain CTMT/ATCC BAA-1782, a Gram-Negative Bacterium with Clostridial Neurotoxin-Like Coding Sequences

    PubMed Central

    Wentz, Travis G.; Muruvanda, Tim; Thirunavukkarasu, Nagarajan; Hoffmann, Maria; Allard, Marc W.; Hodge, David R.; Pillai, Segaran P.; Hammack, Thomas S.; Brown, Eric W.

    2017-01-01

    ABSTRACT Clostridial neurotoxins, including botulinum and tetanus neurotoxins, are among the deadliest known bacterial toxins. Until recently, the horizontal mobility of this toxin gene family appeared to be limited to the genus Clostridium. We report here the closed genome sequence of Chryseobacterium piperi, a Gram-negative bacterium containing coding sequences with homology to clostridial neurotoxin family proteins. PMID:29192076

  15. Improving the genome annotation of the acarbose producer Actinoplanes sp. SE50/110 by sequencing enriched 5'-ends of primary transcripts.

    PubMed

    Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred

    2014-11-20

    Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Nonsyndromic cleft lip with or without cleft palate: Increased burden of rare variants within Gremlin-1, a component of the bone morphogenetic protein 4 pathway.

    PubMed

    Al Chawa, Taofik; Ludwig, Kerstin U; Fier, Heide; Pötzsch, Bernd; Reich, Rudolf H; Schmidt, Gül; Braumann, Bert; Daratsianos, Nikolaos; Böhmer, Anne C; Schuencke, Hannah; Alblas, Margrieta; Fricker, Nadine; Hoffmann, Per; Knapp, Michael; Lange, Christoph; Nöthen, Markus M; Mangold, Elisabeth

    2014-06-01

    The genes Gremlin-1 (GREM1) and Noggin (NOG) are components of the bone morphogenetic protein 4 pathway, which has been implicated in craniofacial development. Both genes map to recently identified susceptibility loci (chromosomal region 15q13, 17q22) for nonsyndromic cleft lip with or without cleft palate (nsCL/P). The aim of the present study was to determine whether rare variants in either gene are implicated in nsCL/P etiology. The complete coding regions, untranslated regions, and splice sites of GREM1 and NOG were sequenced in 96 nsCL/P patients and 96 controls of Central European ethnicity. Three burden and four nonburden tests were performed. Statistically significant results were followed up in a second case-control sample (n = 96, respectively). For rare variants observed in cases, segregation analyses were performed. In NOG, four rare sequence variants (minor allele frequency < 1%) were identified. Here, burden and nonburden analyses generated nonsignificant results. In GREM1, 33 variants were identified, 15 of which were rare. Of these, five were novel. Significant p-values were generated in three nonburden analyses. Segregation analyses revealed incomplete penetrance for all variants investigated. Our study did not provide support for NOG being the causal gene at 17q22. However, the observation of a significant excess of rare variants in GREM1 supports the hypothesis that this is the causal gene at chr. 15q13. Because no single causal variant was identified, future sequencing analyses of GREM1 should involve larger samples and the investigation of regulatory elements. © 2014 Wiley Periodicals, Inc.

  17. Mutations in the GIGYF2 (TNRC15) Gene at the PARK11 Locus in Familial Parkinson Disease

    PubMed Central

    Lautier, Corinne; Goldwurm, Stefano; Dürr, Alexandra; Giovannone, Barbara; Tsiaras, William G.; Pezzoli, Gianni; Brice, Alexis; Smith, Robert J.

    2008-01-01

    The genetic basis for association of the PARK11 region of chromosome 2 with familial Parkinson disease (PD) is unknown. This study examined the GIGYF2 (Grb10-Interacting GYF Protein-2) (TNRC15) gene, which contains the PARK11 microsatellite marker with the highest linkage score (D2S206, LOD 5.14). The 27 coding exons of the GIGYF2 gene were sequenced in 123 Italian and 126 French patients with familial PD, plus 131 Italian and 96 French controls. A total of seven different GIGYF2 missense mutations resulting in single amino acid substitutions were present in 12 unrelated PD index patients (4.8%) and not in controls. Three amino acid insertions or deletions were found in four other index patients and absent in controls. Specific exon sequencing showed that these ten sequence changes were absent from a further 91 controls. In four families with amino acid substitutions in which at least one other PD case was available, the GIGYF2 mutations (Asn56Ser, Thr112Ala, and Asp606Glu) segregated with PD. There were, however, two unaffected carriers in one family, suggesting age-dependent or incomplete penetrance. One index case (PD onset age 33) inherited a GIGYF2 mutation (Ile278Val) from her affected father (PD onset age 66) and a previously described PD-linked mutation in the LRRK2 gene (Ile1371Val) from her affected mother (PD onset age 61). The earlier onset and severe clinical course in the index patient suggest additive effects of the GIGYF2 and LRRK2 mutations. These data strongly support GIGYF2 as a PARK11 gene with a causal role in familial PD. PMID:18358451

  18. Negotiating identity and self-image: perceptions of falls in ambulatory individuals with spinal cord injury – a qualitative study

    PubMed Central

    Jørgensen, Vivien; Roaldsen, Kirsti Skavberg

    2016-01-01

    Objective: Explore and describe experiences and perceptions of falls, risk of falling, and fall-related consequences in individuals with incomplete spinal cord injury (SCI) who are still walking. Design: A qualitative interview study applying interpretive content analysis with an inductive approach. Setting: Specialized rehabilitation hospital. Subjects: A purposeful sample of 15 individuals (10 men), 23 to 78 years old, 2-34 years post injury with chronic incomplete traumatic SCI, and walking ⩾75% of time for mobility needs. Methods: Individual, semi-structured face-to-face interviews were recorded, condensed, and coded to find themes and subthemes. Results: One overarching theme was revealed: “Falling challenges identity and self-image as normal” which comprised two main themes “Walking with incomplete SCI involves minimizing fall risk and fall-related concerns without compromising identity as normal” and “Walking with incomplete SCI implies willingness to increase fall risk in order to maintain identity as normal”. Informants were aware of their increased fall risk and took precautions, but willingly exposed themselves to risky situations when important to self-identity. All informants expressed some conditional fall-related concerns, and a few experienced concerns limiting activity and participation. Conclusion: Ambulatory individuals with incomplete SCI considered falls to be a part of life. However, falls interfered with the informants’ identities and self-images as normal, healthy, and well-functioning. A few expressed dysfunctional concerns about falling, and interventions should target these. PMID:27170274

  19. Pea chloroplast DNA encodes homologues of Escherichia coli ribosomal subunit S2 and the beta'-subunit of RNA polymerase.

    PubMed Central

    Cozens, A L; Walker, J E

    1986-01-01

    The nucleotide sequence has been determined of a segment of 4680 bases of the pea chloroplast genome. It adjoins a sequence described elsewhere that encodes subunits of the F0 membrane domain of the ATP-synthase complex. The sequence contains a potential gene encoding a protein which is strongly related to the S2 polypeptide of Escherichia coli ribosomes. It also encodes an incomplete protein which contains segments that are homologous to the beta'-subunit of E. coli RNA polymerase and to yeast RNA polymerases II and III. PMID:3530249

  20. Effect of projectile on incomplete fusion reactions at low energies

    NASA Astrophysics Data System (ADS)

    Sharma, Vijay R.; Shuaib, Mohd.; Yadav, Abhishek; Singh, Pushpendra P.; Sharma, Manoj K.; Kumar, R.; Singh, Devendra P.; Singh, B. P.; Muralithar, S.; Singh, R. P.; Bhowmik, R. K.; Prasad, R.

    2017-11-01

    The present work deals with experimental studies of incomplete fusion reaction dynamics at energies as low as ≈ 4 - 7 MeV/A. Excitation functions populated via complete fusion and/or incomplete fusion processes in 12C+175Lu, and 13C+169Tm systems have been measured within the framework of PACE4 code. Data of excitation function measurements on comparison with different projectile-target combinations suggest the existence of ICF even at slightly above barrier energies where complete fusion (CF) is supposed to be the sole contributor, and further demonstrate strong projectile structure dependence of ICF. The incomplete fusion strength functions for 12C+175Lu, and 13C+169Tm systems are analyzed as a function of various physical parameters at a constant vrel ≈ 0.053c. It has been found that one neutron (1n) excess projectile 13C (as compared to 12C) results in less incomplete fusion contribution due to its relatively large negative α-Q-value; hence, the α-Q-value seems to be a reliable parameter to understand the ICF dynamics at low energies. In order to explore the reaction modes on the basis of their entry state spin population, the spin distribution of residues populated via CF and/or ICF in the 16O+159Tb system has been measured using the particle-γ coincidence technique. CF-α and ICF-α channels have been identified from backward (B) and forward (F) α-gated γ spectra, respectively. Reaction dependent decay patterns have been observed in different α emitting channels. The CF channels are found to be fed over a broad spin range; however, ICF-α channels were observed only for high-spin states. Further, the existence of incomplete fusion at low bombarding energies indicates the possibility to populate high spin states

  1. The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.

    PubMed Central

    Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R

    1982-01-01

    The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. PMID:6296791
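
    The lengths quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the 1251 bp reading frame counts one stop codon in addition to the 416 sense codons; the variable and function names are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract.
AMINO_ACIDS = 416          # residues encoded by the reading frame
UPSTREAM_NT = 36           # transcription start, upstream of the translational start
DOWNSTREAM_NT = (86, 93)   # transcription stop, downstream of the translational stop

# Assumption: the reading frame length includes one stop codon.
reading_frame_bp = (AMINO_ACIDS + 1) * 3


def transcript_length_range(upstream, orf, downstream):
    """Return (shortest, longest) transcript length without a poly(A) tail."""
    low, high = downstream
    return upstream + orf + low, upstream + orf + high


mrna_range = transcript_length_range(UPSTREAM_NT, reading_frame_bp, DOWNSTREAM_NT)
print(reading_frame_bp, mrna_range)  # 1251 (1373, 1380)
```

    Both results agree with the values stated in the abstract (1251 bp and 1373 to 1380 nucleotides).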

  2. Invariant protection of high-voltage electric motors of technological complexes at industrial enterprises at partial single-phase ground faults

    NASA Astrophysics Data System (ADS)

    Abramovich, B. N.; Sychev, Yu A.; Pelenev, D. N.

    2018-03-01

    The article presents the development of invariant protection for high-voltage motors at incomplete single-phase ground faults. It is established that current protections have low selectivity because their input signals decrease inadmissibly when a short circuit occurs through a transient resistance. A structural functional scheme and a protection algorithm are developed in which the zero-sequence current signals of the protected connections are corrected automatically according to the degree of incompleteness of the ground fault. It is shown that automatic correction of zero-sequence currents keeps the sensitivity factor of the protection invariant as the transient resistance at the fault location varies. Applying invariant protection makes it possible to minimize damage in 6-10 kV electrical installations of industrial enterprises caused by interruption of consumers' power supply and system breakdown, through timely localization of ground-fault emergency modes.

  3. A novel p.Gly603Arg mutation in CACNA1F causes Åland island eye disease and incomplete congenital stationary night blindness phenotypes in a family

    PubMed Central

    Vincent, Ajoy; Wright, Tom; Day, Megan A.; Westall, Carol A.

    2011-01-01

    Purpose: To report, for the first time, that X-linked incomplete congenital stationary night blindness (CSNB2A) and Åland island eye disease (AIED) phenotypes coexist in a molecularly confirmed pedigree and to present novel phenotypic characteristics of calcium channel alpha-1F subunit gene (CACNA1F)-related disease. Methods: Two affected subjects (the proband and his maternal grandfather) and an unaffected obligate carrier (the proband’s mother) underwent detailed ophthalmological evaluation, fundus autofluorescence imaging, and spectral-domain optical coherence tomography. Goldmann visual field assessment and full-field electroretinogram (ERG) were performed in the two affected subjects, and multichannel flash visual evoked potential was performed on the proband. Scotopic 15 Hz flicker ERG series were performed in both affected subjects to evaluate the function of the slow and fast rod pathways. Haplotype analysis using polymorphic microsatellite markers flanking CACNA1F was performed in all three family members. The proband’s DNA was sequenced for mutations in the coding sequence of CACNA1F and nyctalopin (NYX) genes. Segregation analysis was performed in the family. Results: Both affected subjects had symptoms of nonprogressive nyctalopia since childhood, while the proband also had photophobia. Both cases had a distance visual acuity of 20/50 or better in each eye, normal contrast sensitivity, and an incomplete type of Schubert-Bornschein ERGs. The proband also had high myopia, a mild red-green color deficit, hypopigmented fundus, and foveal hypoplasia with no evidence of chiasmal misrouting. Spectral-domain optical coherence tomography confirmed the presence of foveal hypoplasia in the proband. The clinical phenotype of the proband and his maternal grandfather fit the clinical description of AIED and CSNB2A, respectively. The fundus autofluorescence and the visual fields were normal in both cases; the scotopic 15 Hz flicker ERG demonstrated only fast rod pathway activity in both. Both affected cases shared the same haplotype across CACNA1F. The proband carried a novel hemizygous c.1807G>C mutation (p.G603R) in the CACNA1F gene. The change segregated with the disease phenotypes and was not identified in 360 control chromosomes. No mutations were identified in NYX. Conclusions: This report of a missense mutation in CACNA1F causing AIED and CSNB2A phenotypes in a family confirms that both diseases are allelic and that other genetic or environmental modifiers influence the expression of CACNA1F. This is the first report to suggest that in CACNA1F-related disease, the rod system activity is predominantly from the fast rod pathways. PMID:22194652

  4. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
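
    The two tests named in this abstract, an n-tuple Zipf (rank-frequency) analysis and an n-gram entropy measurement, can be sketched as follows. This is a minimal illustration, not the authors' implementation: overlapping n-grams, Shannon entropy in bits, and the redundancy normalization 1 - H/(n log2 4) are all assumptions made here.

```python
from collections import Counter
from math import log2


def ngram_counts(seq, n):
    """Count overlapping n-grams (n-tuples) in a sequence string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_ranking(seq, n):
    """Return (rank, ngram, count) tuples sorted by decreasing count."""
    ranked = sorted(ngram_counts(seq, n).items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, gram, count) for rank, (gram, count) in enumerate(ranked, start=1)]


def ngram_entropy(seq, n):
    """Shannon entropy (bits) of the observed n-gram distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n, alphabet_size=4):
    """Assumed definition: 1 - H(n) / H_max, with H_max = n * log2(alphabet_size)."""
    return 1.0 - ngram_entropy(seq, n) / (n * log2(alphabet_size))
```

    Plotting log(count) against log(rank) from `zipf_ranking` would show whether a region follows an approximately power-law regime; a lower `ngram_entropy` corresponds to a higher `ngram_redundancy` under the definition assumed above.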

  5. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.

    PubMed

    Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E

    1995-05-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
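
    Of the two methods named in this abstract, detrended fluctuation analysis (DFA) is the easier to sketch without external libraries. The version below is a minimal first-order DFA under several assumptions that are not taken from the paper: the purine/pyrimidine mapping to +1/-1, non-overlapping windows, and the particular window sizes. For an uncorrelated sequence the fitted exponent is expected to lie near 0.5; values clearly above 0.5 would indicate long-range correlations.

```python
import math
import random


def dna_walk(seq):
    """Assumed mapping: purines (A, G) -> +1, pyrimidines (C, T) -> -1."""
    return [1.0 if base in "AG" else -1.0 for base in seq]


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_fluctuation(steps, window):
    """RMS deviation of the integrated walk from per-window linear trends."""
    mean = sum(steps) / len(steps)
    profile, total = [], 0.0
    for s in steps:
        total += s - mean
        profile.append(total)
    xs = list(range(window))
    n_windows = len(profile) // window
    squared = 0.0
    for w in range(n_windows):
        segment = profile[w * window:(w + 1) * window]
        slope, intercept = linear_fit(xs, segment)
        squared += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment))
    return math.sqrt(squared / (n_windows * window))


def dfa_exponent(steps, windows):
    """Slope of log F(n) versus log n over the given window sizes."""
    log_n = [math.log(w) for w in windows]
    log_f = [math.log(dfa_fluctuation(steps, w)) for w in windows]
    slope, _ = linear_fit(log_n, log_f)
    return slope


# Illustrative run on a synthetic, uncorrelated sequence (not real GenBank data).
random.seed(0)
synthetic = "".join(random.choice("ACGT") for _ in range(4096))
alpha = dfa_exponent(dna_walk(synthetic), [8, 16, 32, 64, 128])
print(round(alpha, 2))
```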

  6. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10^-10. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.

  7. Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

    PubMed

    Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M; Agarwala, Vineeta; Gaulton, Kyle J; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Dennis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; 
Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana Cn; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; 
Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Altshuler, David; Burtt, Noël P; Florez, Jose C; Boehnke, Michael; McCarthy, Mark I

    2017-12-19

    To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.
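
    The "low-frequency" band used in this abstract (minor allele frequency 0.1-5%) can be expressed as a small helper. This is a sketch only: the abstract defines just the low-frequency band, so the labels "rare" and "common" for values outside it, and the treatment of the band edges as inclusive, are assumptions made for illustration.

```python
def minor_allele_frequency(alt_allele_count, total_alleles):
    """Frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    freq = alt_allele_count / total_alleles
    return min(freq, 1.0 - freq)


def frequency_class(maf, low=0.001, high=0.05):
    """Classify a MAF against the 0.1-5% band quoted in the abstract.

    The names "rare" and "common" are assumed labels for values
    below and above the band.
    """
    if maf < low:
        return "rare"
    if maf <= high:
        return "low-frequency"
    return "common"


# Example: 10 alternate alleles among 1000 sampled chromosomes.
print(frequency_class(minor_allele_frequency(10, 1000)))  # low-frequency
```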

  8. Sequence data and association statistics from 12,940 type 2 diabetes cases and controls

    PubMed Central

    Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M.; Agarwala, Vineeta; Gaulton, Kyle J.; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J.; Rivas, Manuel A.; Perry, John R. B.; Sim, Xueling; Blackwell, Thomas W.; Robertson, Neil R.; Rayner, N William; Cingolani, Pablo; Locke, Adam E.; Tajes, Juan Fernandez; Highland, Heather M.; Dupuis, Josee; Chines, Peter S.; Lindgren, Cecilia M.; Hartl, Christopher; Jackson, Anne U.; Chen, Han; Huyghe, Jeroen R.; van de Bunt, Martijn; Pearson, Richard D.; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M.; Gamazon, Eric R.; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A.; Below, Jennifer E.; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L.; Pasko, Dorota; Parker, Stephen C. J.; Varga, Tibor V.; Green, Todd; Beer, Nicola L.; Day-Williams, Aaron G.; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J.; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P.; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F.; Han, Bok-Ghee; Jenkinson, Christopher P.; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C. Y.; Palmer, Nicholette D.; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E.; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D.; Neale, Benjamin M.; Purcell, Shaun; Butterworth, Adam S.; Howson, Joanna M. M.; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K. L.; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H. T.; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E.; Rybin, Dennis; Farook, Vidya S.; Fowler, Sharon P.; Freedman, Barry I.; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J.; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K.; Puppala, Sobha; Scott, William R.; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A.; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C.; Mangino, Massimo; Bonnycastle, Lori L.; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L.; Herder, Christian; Groves, Christopher J.; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A.; Doney, Alex S. F.; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J.; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E.; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H.; Stirrups, Kathleen; Wood, Andrew R.; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O.; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P.; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B.; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N. A.; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M.; Syvänen, Ann-Christine; Bergman, Richard N.; Bharadwaj, Dwaipayan; Bottinger, Erwin P.; Cho, Yoon Shin; Chandak, Giriraj R.; Chan, Juliana CN; Chia, Kee Seng; Daly, Mark J.; Ebrahim, Shah B.; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A.; Lehman, Donna M.; Jia, Weiping; Ma, Ronald C. W.; Pollin, Toni I.; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J. F.; Small, Kerrin S.; Ried, Janina S.; DeFronzo, Ralph A.; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J.; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W.; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R.; Gloyn, Anna L.; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D.; Hattersley, Andrew T.; Bowden, Donald W.; Collins, Francis S.; Atzmon, Gil; Chambers, John C.; Spector, Timothy D.; Laakso, Markku; Strom, Tim M.; Bell, Graeme I.; Blangero, John; Duggirala, Ravindranath; Tai, E. Shyong; McVean, Gilean; Hanis, Craig L.; Wilson, James G.; Seielstad, Mark; Frayling, Timothy M.; Meigs, James B.; Cox, Nancy J.; Sladek, Rob; Lander, Eric S.; Gabriel, Stacey; Mohlke, Karen L.; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J.; Morris, Andrew P.; Kang, Hyun Min; Altshuler, David; Burtt, Noël P.; Florez, Jose C.; Boehnke, Michael; McCarthy, Mark I.

    2017-01-01

    To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D. PMID:29257133

  9. Comparative Chloroplast Genomics of Gossypium Species: Insights Into Repeat Sequence Variations and Phylogeny

    PubMed Central

    Wu, Ying; Liu, Fang; Yang, Dai-Gang; Li, Wei; Zhou, Xiao-Jian; Pei, Xiao-Yu; Liu, Yan-Gai; He, Kun-Lun; Zhang, Wen-Sheng; Ren, Zhong-Ying; Zhou, Ke-Hai; Ma, Xiong-Feng; Li, Zhong-Hu

    2018-01-01

    Cotton is one of the most economically important fiber crop plants worldwide. The genus Gossypium contains a single allotetraploid group (AD) and eight diploid genome groups (A–G and K). However, the evolution of repeat sequences in the chloroplast genomes and the phylogenetic relationships of Gossypium species are unclear. Thus, we determined the variations in the repeat sequences and the evolutionary relationships of 40 cotton chloroplast genomes, which represented the most diverse in the genus, including five newly sequenced diploid species, i.e., G. nandewarense (C1-n), G. armourianum (D2-1), G. lobatum (D7), G. trilobum (D8), and G. schwendimanii (D11), and an important semi-wild race of upland cotton, G. hirsutum race latifolium (AD1). The genome structure, gene order, and GC content of cotton species were similar to those of other higher plant plastid genomes. In total, 2860 long sequence repeats (>10 bp in length) were identified, where the F-genome species had the largest number of repeats (G. longicalyx F1: 108) and E-genome species had the lowest (G. stocksii E1: 53). Large-scale repeat sequences possibly enrich the genetic information and maintain genome stability in cotton species. We also identified 10 divergence hotspot regions, i.e., rpl33-rps18, psbZ-trnG (GCC), rps4-trnT (UGU), trnL (UAG)-rpl32, trnE (UUC)-trnT (GGU), atpE, ndhI, rps2, ycf1, and ndhF, which could be useful molecular genetic markers for future population genetics and phylogenetic studies. Site-specific selection analysis showed that some of the coding sites of 10 chloroplast genes (atpB, atpE, rps2, rps3, petB, petD, ccsA, cemA, ycf1, and rbcL) were under protein sequence evolution. Phylogenetic analysis based on the whole plastomes suggested that the Gossypium species grouped into six previously identified genetic clades. Interestingly, all 13 D-genome species clustered into a strong monophyletic clade. Unexpectedly, the cotton species with C, G, and K-genomes were admixed and nested in a large clade, which could have been due to their recent radiation, incomplete lineage sorting, and introgression hybridization among different cotton lineages. In conclusion, the results of this study provide new insights into the evolution of repeat sequences in chloroplast genomes and interspecific relationships in the genus Gossypium. PMID:29619041

  10. Dynamics of actin evolution in dinoflagellates.

    PubMed

    Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F

    2011-04-01

    Dinoflagellates have unique nuclei and intriguing genome characteristics with very high DNA content making complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provides broad context. We selected the actin gene family as a highly expressed conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species, Dinophysis acuminata and D. caudata, including full length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar and in nucleotide trees, the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons between and within species with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' UnTranslated Region (UTR), although there was still limited amino acid variation between most sequences. 
Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.
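
    The notion of an incomplete open reading frame used in the record above can be illustrated with a small check. This is a minimal sketch under stated assumptions, not the authors' procedure: the function name orf_status, the choice to read the sequence in frame 0, and the use of the standard stop codons (TAA, TAG, TGA) are all hypothetical choices made for illustration.

```python
# Illustrative only: coarse classification of a candidate coding sequence.
# Assumption: standard stop codons; the sequence is read in frame 0.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_status(cds: str) -> str:
    """Return a coarse label describing why an ORF is complete or incomplete."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        # A length that is not a multiple of three is consistent with a
        # frameshift or a truncated sequence.
        return "length_not_multiple_of_3"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if not codons or codons[0] != "ATG":
        return "no_start"
    # A stop codon anywhere before the last codon truncates the product.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "premature_stop"
    if codons[-1] not in STOP_CODONS:
        return "no_stop"
    return "complete"
```

    A label other than "complete" only flags a candidate for inspection; partial cDNAs and amplicons would also fail this check without being pseudogenes.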

  11. Not All Order Memory Is Equal: Test Demands Reveal Dissociations in Memory for Sequence Information

    ERIC Educational Resources Information Center

    Jonker, Tanya R.; MacLeod, Colin M.

    2017-01-01

    Remembering the order of a sequence of events is a fundamental feature of episodic memory. Indeed, a number of formal models represent temporal context as part of the memory system, and memory for order has been researched extensively. Yet, the nature of the code(s) underlying sequence memory is still relatively unknown. Across 4 experiments that…

  12. Complete mitochondrial genome sequence of the heart failure model of cardiomyopathic Syrian hamster (Mesocricetus auratus).

    PubMed

    Hu, Bo; Liu, Dong-Xing; Zhang, Yu-Qing; Song, Jian-Tao; Ji, Xian-Fei; Hou, Zhi-Qiang; Zhang, Zhen-Hai

    2016-05-01

    In this study we sequenced, for the first time, the complete mitochondrial genome of a heart failure model, the cardiomyopathic Syrian hamster (Mesocricetus auratus). The total length of the mitogenome was 16,267 bp. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region.

  13. Code-Switching to Know a TL Equivalent of an L1 Word: Request-Provision-Acknowledgement (RPA) Sequence

    ERIC Educational Resources Information Center

    Lucero, Edgar

    2011-01-01

    This article focuses on the learner's use of Code-switching to learn the TL (Target Language) equivalent of an L1 word. The interactional pattern that this situation creates defines the Request-Provision-Acknowledgement (RPA) sequence. The article explains each of the turns of the sequence under the combination of the Ethnomethodological…

  14. Quantitative analysis of the anti-noise performance of an m-sequence in an electromagnetic method

    NASA Astrophysics Data System (ADS)

    Yuan, Zhe; Zhang, Yiming; Zheng, Qijia

    2018-02-01

    An electromagnetic method with a transmitted waveform coded by an m-sequence achieved better anti-noise performance compared to the conventional manner with a square-wave. The anti-noise performance of the m-sequence varied with multiple coding parameters; hence, a quantitative analysis of the anti-noise performance for m-sequences with different coding parameters was required to optimize them. This paper proposes the concept of an identification system, with the identified Earth impulse response obtained by measuring the system output with the input of the voltage response. A quantitative analysis of the anti-noise performance of the m-sequence was achieved by analyzing the amplitude-frequency response of the corresponding identification system. The effects of the coding parameters on the anti-noise performance are summarized by numerical simulation, and their optimization is further discussed in our conclusions; the validity of the conclusions is further verified by field experiment. The quantitative analysis method proposed in this paper provides a new insight into the anti-noise mechanism of the m-sequence, and could be used to evaluate the anti-noise performance of artificial sources in other time-domain exploration methods, such as the seismic method.
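
    As background for the m-sequence coding discussed in the record above, a maximal-length sequence can be generated with a linear feedback shift register. The following is a minimal sketch, assuming a 4-stage register with the recurrence s[k+4] = s[k] XOR s[k+1] (primitive polynomial x^4 + x + 1). The function names and parameters are hypothetical, and the coding parameters actually used in the paper are not given in the abstract.

```python
def m_sequence(n_bits=4, taps=(0, 1), seed=1):
    """Return one period (2**n_bits - 1 bits) of an LFSR sequence.

    The recurrence is s[k + n_bits] = XOR of s[k + t] for t in taps.
    The default taps correspond to x^4 + x + 1, which is primitive,
    so the default output is a maximal-length (m-) sequence.
    """
    state = [(seed >> i) & 1 for i in range(n_bits)]
    if not any(state):
        raise ValueError("seed must give a non-zero initial state")
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]
    return out


def circular_autocorrelation(bits, lag):
    """Circular autocorrelation of the sequence mapped to +1/-1 chips."""
    chips = [1 - 2 * b for b in bits]  # 0 -> +1, 1 -> -1
    n = len(chips)
    return sum(chips[i] * chips[(i + lag) % n] for i in range(n))
```

    For the default parameters the sequence has period 15, and its circular autocorrelation is 15 at lag 0 and -1 at every other lag. This flat off-peak autocorrelation is the property usually cited when m-sequences are used as source waveforms.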

  15. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.

    PubMed

    Wright, Imogen A; Travers, Simon A

    2014-07-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Genomics dataset on unclassified published organism (patent US 7547531).

    PubMed

    Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier

    2016-12-01

    Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism. DNA sequence analysis is therefore crucial for learning about the hierarchical classification of a given organism. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which helps to open doors for new collaborations. In this dataset, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from the NCBI BioSample database. Quick response (QR) codes for those DNA sequences were constructed with the DNA BarID tool. The QR code is useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences was determined using the ENDMEMO GC Content Calculator, which indicates their stability at different temperatures. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify the cleavage code indicating the 5', 3' and blunt ends, and the enzyme code indicating the methylation site of the DNA sequences is also shown. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
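
    The GC content reported in the record above is a simple base-composition percentage. The sketch below shows one common way to compute it. It is an illustration only: the function name gc_content and the decision to ignore ambiguous bases such as N are assumptions, and the calculator named in the abstract may handle such cases differently.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)
```

    The AT content follows as 100 minus this value when only unambiguous bases are counted.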

  17. RNAcode: Robust discrimination of coding and noncoding regions in comparative sequence data

    PubMed Central

    Washietl, Stefan; Findeiß, Sven; Müller, Stephan A.; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L.; Stadler, Peter F.; Goldman, Nick

    2011-01-01

    With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied “out of the box,” without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as “noncoding.” RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode. PMID:21357752

  18. RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data.

    PubMed

    Washietl, Stefan; Findeiss, Sven; Müller, Stephan A; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L; Stadler, Peter F; Goldman, Nick

    2011-04-01

    With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.

  19. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  20. Complete genome sequence of uropathogenic Escherichia coli isolate UPEC 26-1.

    PubMed

    Subhadra, Bindu; Kim, Dong Ho; Kim, Jaeseok; Woo, Kyungho; Sohn, Kyung Mok; Kim, Hwa-Jung; Han, Kyudong; Oh, Man Hwan; Choi, Chul Hee

    2018-06-01

    Urinary tract infections (UTIs) are among the most common infections in humans, predominantly caused by uropathogenic Escherichia coli (UPEC). The diverse genomes of UPEC strains mostly impede disease prevention and control measures. In this study, we comparatively analyzed the whole genome sequence of a highly virulent UPEC strain, namely UPEC 26-1, which was isolated from urine sample of a patient suffering from UTI in Korea. Whole genome analysis showed that the genome consists of one circular chromosome of 5,329,753 bp, comprising 5064 protein-coding genes, 122 RNA genes (94 tRNA, 22 rRNA and 6 ncRNA genes), and 100 pseudogenes, with an average G+C content of 50.56%. In addition, we identified 8 prophage regions comprising 5 intact, 2 incomplete and 1 questionable ones and 63 genomic islands, suggesting the possibility of horizontal gene transfer in this strain. Comparative genome analysis of UPEC 26-1 with the UPEC strain CFT073 revealed an average nucleotide identity of 99.7%. The genome comparison with CFT073 provides major differences in the genome of UPEC 26-1 that would explain its increased virulence and biofilm formation. Nineteen of the total GIs were unique to UPEC 26-1 compared to CFT073 and nine of them harbored unique genes that are involved in virulence, multidrug resistance, biofilm formation and bacterial pathogenesis. The data from this study will assist in future studies of UPEC strains to develop effective control measures.

  1. PLATYPUS: A code for reaction dynamics of weakly-bound nuclei at near-barrier energies within a classical dynamical model

    NASA Astrophysics Data System (ADS)

    Diaz-Torres, Alexis

    2011-04-01

    A self-contained Fortran-90 program based on a three-dimensional classical dynamical reaction model with stochastic breakup is presented, which is a useful tool for quantifying complete and incomplete fusion, and breakup in reactions induced by weakly-bound two-body projectiles near the Coulomb barrier. The code calculates (i) integrated complete and incomplete fusion cross sections and their angular momentum distribution, (ii) the excitation energy distribution of the primary incomplete-fusion products, (iii) the asymptotic angular distribution of the incomplete-fusion products and the surviving breakup fragments, and (iv) breakup observables, such as angle, kinetic energy and relative energy distributions.

    Program summary
    Program title: PLATYPUS
    Catalogue identifier: AEIG_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIG_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 332 342
    No. of bytes in distributed program, including test data, etc.: 344 124
    Distribution format: tar.gz
    Programming language: Fortran-90
    Computer: Any Unix/Linux workstation or PC with a Fortran-90 compiler
    Operating system: Linux or Unix
    RAM: 10 MB
    Classification: 16.9, 17.7, 17.8, 17.11
    Nature of problem: The program calculates a wide range of observables in reactions induced by weakly-bound two-body nuclei near the Coulomb barrier. These include integrated complete and incomplete fusion cross sections and their spin distribution, as well as breakup observables (e.g. the angle, kinetic energy, and relative energy distributions of the fragments).
    Solution method: All the observables are calculated using a three-dimensional classical dynamical model combined with the Monte Carlo sampling of probability-density distributions. See Refs. [1,2] for further details.
    Restrictions: The program is suited for a weakly-bound two-body projectile colliding with a stable target. The initial orientation of the segment joining the two breakup fragments is considered to be isotropic.
    Additional comments: Several source routines from Numerical Recipes, and the Mersenne Twister random number generator package are included to enable independent compilation.
    Running time: About 75 minutes for input provided, using a PC with 1.5 GHz processor.

  2. Validity of data in the Danish Colorectal Cancer Screening Database.

    PubMed

    Thomsen, Mette Kielsholm; Njor, Sisse Helle; Rasmussen, Morten; Linnemann, Dorte; Andersen, Berit; Baatrup, Gunnar; Friis-Hansen, Lennart Jan; Jørgensen, Jens Christian Riis; Mikkelsen, Ellen Margrethe

    2017-01-01

    In Denmark, a nationwide screening program for colorectal cancer was implemented in March 2014. Along with this, a clinical database for program monitoring and research purposes was established. The aim of this study was to estimate the agreement and validity of diagnosis and procedure codes in the Danish Colorectal Cancer Screening Database (DCCSD). All individuals with a positive immunochemical fecal occult blood test (iFOBT) result who were invited to screening in the first 3 months since program initiation were identified. From these, a sample of 150 individuals was selected using stratified random sampling by age, gender and region of residence. Data from the DCCSD were compared with data from hospital records, which were used as the reference. Agreement, sensitivity, specificity and positive and negative predictive values were estimated for categories of codes "clean colon", "colonoscopy performed", "overall completeness of colonoscopy", "incomplete colonoscopy", "polypectomy", "tumor tissue left behind", "number of polyps", "lost polyps", "risk group of polyps" and "colorectal cancer and polyps/benign tumor". Hospital records were available for 136 individuals. Agreement was highest for "colorectal cancer" (97.1%) and lowest for "lost polyps" (88.2%). Sensitivity varied between moderate and high, with 60.0% for "incomplete colonoscopy" and 98.5% for "colonoscopy performed". Specificity was 92.7% or above, except for the categories "colonoscopy performed" and "overall completeness of colonoscopy", where the specificity was low; however, the estimates were imprecise. A high level of agreement between categories of codes in DCCSD and hospital records indicates that DCCSD reflects the hospital records well. Further, the validity of the categories of codes varied from moderate to high. Thus, the DCCSD may be a valuable data source for future research on colorectal cancer screening.
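
    The validity measures named in the record above are all derived from a two-by-two comparison of database codes against the reference records. A minimal sketch of that arithmetic follows. The function name validity_measures is hypothetical, and the counts in the usage note are invented for illustration; they are not taken from the study.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validity_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute agreement and validity measures from a 2x2 table.

    tp: code present in both the database and the reference
    fp: code present in the database only
    fn: code present in the reference only
    tn: code absent from both
    """
    total = tp + fp + fn + tn
    return {
        "agreement": _ratio(tp + tn, total),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

    With hypothetical counts tp=40, fp=5, fn=10, tn=45, this gives a sensitivity of 0.80, a specificity of 0.90 and an overall agreement of 0.85. Small denominators make such estimates imprecise, which is the caveat the abstract raises for specificity.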

  3. Public sentiment and discourse about Zika virus on Instagram.

    PubMed

    Seltzer, E K; Horst-Martz, E; Lu, M; Merchant, R M

    2017-09-01

    Social media have strongly influenced the awareness and perceptions of public health emergencies, and a considerable amount of social media content is now shared through images, rather than text alone. This content can impact preparedness and response due to the popularity and real-time nature of social media platforms. We sought to explore how the image-sharing platform Instagram is used for information dissemination and conversation during the current Zika outbreak. This was a retrospective review of publicly posted images about Zika on Instagram. Using the keyword '#zika' we identified 500 images posted on Instagram from May to August 2016. Images were coded by three reviewers and contextual information was collected for each image about sentiment, image type, content, audience, geography, reliability, and engagement. Of 500 images tagged with #zika, 342 (68%) contained content actually related to Zika. Of the 342 Zika-specific images, 299 were coded as 'health' and 193 were coded 'public interest'. Some images had multiple 'health' and 'public interest' codes. Health images tagged with #zika were primarily related to transmission (43%, 129/299) and prevention (48%, 145/299). Transmission-related posts were more often about mosquito-human transmission (73%, 94/129) than human-human transmission (27%, 35/129). Mosquito bite prevention posts (84%, 122/145) outnumbered safe sex prevention posts (16%, 23/145). Images with a target audience were primarily aimed at women (95%, 36/38). Many posts (60%, 61/101) included misleading, incomplete, or unclear information about the virus. Additionally, many images expressed fear and negative sentiment (51%, 79/156). Instagram can be used to characterize public sentiment and highlight areas of focus for public health, such as correcting misleading or incomplete information or expanding messages to reach diverse audiences. Copyright © 2017 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.

  4. Comparative Analysis of the Complete Chloroplast Genome of Four Endangered Herbals of Notopterygium

    PubMed Central

    Yang, Jiao; Yue, Ming; Niu, Chuan; Ma, Xiong-Feng; Li, Zhong-Hu

    2017-01-01

    Notopterygium H. de Boissieu (Apiaceae) is an endangered perennial herb endemic to China. A good knowledge of phylogenetic evolution and population genomics is conducive to the establishment of effective management and conservation strategies of the genus Notopterygium. In this study, the complete chloroplast (cp) genomes of four Notopterygium species (N. incisum C. C. Ting ex H. T. Chang, N. oviforme R. H. Shan, N. franchetii H. de Boissieu and N. forrestii H. Wolff) were assembled and characterized using next-generation sequencing. We investigated the gene organization, order, size and repeat sequences of the cp genome and constructed the phylogenetic relationships of Notopterygium species based on the chloroplast DNA and nuclear internal transcribed spacer (ITS) sequences. Comparative analysis of the plastid genomes showed that the cp DNAs are standard double-stranded molecules, ranging from 157,462 bp (N. oviforme) to 159,607 bp (N. forrestii) in length. Each circular DNA contained a large single-copy (LSC) region, a small single-copy (SSC) region, and a pair of inverted repeats (IRs). The cp DNA of the four species contained 85 protein-coding genes, 37 transfer RNA (tRNA) genes and 8 ribosomal RNA (rRNA) genes. We determined the marked conservation of gene content and sequence evolutionary rate in the cp genome of four Notopterygium species. Three genes (psaI, psbI and rpoA) were possibly under positive selection among the four sampled species. Phylogenetic analysis showed that the four Notopterygium species formed a monophyletic clade with high bootstrap support. However, inconsistent interspecific relationships within the genus Notopterygium were identified between the cp DNA and ITS markers. Incomplete lineage sorting, convergent evolution or hybridization, gene infiltration and different sampling strategies among species may have caused the incongruence between the nuclear and cp DNA relationships.
The present results suggested that Notopterygium species may have experienced a complex evolutionary history and speciation process. PMID:28422071

  5. Hybrid genome assembly and annotation of Paenibacillus pasadenensis strain R16 reveals insights on endophytic life style and antifungal activity

    PubMed Central

    Passera, Alessandro; Marcolungo, Luca; Brasca, Milena; Quaglino, Fabio; Cantaloni, Chiara; Delledonne, Massimo

    2018-01-01

    Bacteria of the Paenibacillus genus are becoming important in many fields of science, including agriculture, for their positive effects on the health of plants. However, there is little information available on this genus compared to other bacteria (such as Bacillus or Pseudomonas), especially when considering genomic information. Sequencing the genomes of plant-beneficial bacteria is a crucial step to identify the genetic elements underlying the adaptation to life inside a plant host and, in particular, which of these features determine the differences between a helpful microorganism and a pathogenic one. In this study, we have characterized the genome of Paenibacillus pasadenensis, strain R16, recently investigated for its antifungal activities and plant-associated features. A hybrid assembly approach was used, integrating the very precise reads obtained by Illumina technology and long fragments acquired with Oxford Nanopore Technology (ONT) sequencing. De novo genome assembly based solely on Illumina reads generated a relatively fragmented assembly of 5.72 Mbp in 99 ungapped sequences with an N50 length of 544 Kbp; hybrid assembly, integrating Illumina and ONT reads, improved the assembly quality, generating a genome of 5.75 Mbp, organized in 6 contigs with an N50 length of 3.4 Mbp. Annotation of the latter genome identified 4987 coding sequences, of which 1610 are hypothetical proteins. Enrichment analysis identified pathways of particular interest for the endophyte biology, including the chitin-utilization pathway and the incomplete siderophore pathway which hints at siderophore parasitism. In addition, the analysis led to the identification of genes for the production of terpenes, such as farnesol, which was hypothesized as the main antifungal molecule produced by the strain.
The functional analysis on the genome confirmed several plant-associated, plant-growth promotion, and biocontrol traits of strain R16, thus adding insights in the genetic bases of these complex features, and of the Paenibacillus genus in general. PMID:29351296
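
    The N50 values quoted in the record above summarize assembly contiguity: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of the calculation follows; the function name n50 and the toy contig lengths in the usage note are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly length defines N50.
        if running >= half_total:
            return length
```

    For toy contig lengths of 2, 3, 4, 5 and 6, the total is 20 and the two longest contigs already cover 11, so the N50 is 5. Fewer, longer contigs raise N50, which is why the hybrid assembly in the abstract reports a much larger value than the Illumina-only assembly.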

  6. The complete mitochondrial genome of the fall webworm, Hyphantria cunea (Lepidoptera: Arctiidae)

    PubMed Central

    Liao, Fang; Wang, Lin; Wu, Song; Li, Yu-Ping; Zhao, Lei; Huang, Guo-Ming; Niu, Chun-Jing; Liu, Yan-Qun; Li, Ming-Gang

    2010-01-01

    The complete mitochondrial genome (mitogenome) of the fall webworm, Hyphantria cunea (Lepidoptera: Arctiidae) was determined. The genome is a circular molecule 15 481 bp long. It presents a typical gene organization and order for completely sequenced lepidopteran mitogenomes, but differs from the insect ancestral type for the placement of tRNAMet. The nucleotide composition of the genome is also highly A + T biased, accounting for 80.38%, with a slightly positive AT skewness (0.010), indicating the occurrence of more As than Ts, as found in the Noctuoidea species. All protein-coding genes (PCGs) are initiated by ATN codons, except for COI, which is tentatively designated by the CGA codon as observed in other lepidopterans. Four of 13 PCGs harbor the incomplete termination codon, T or TA. All tRNAs have a typical clover-leaf structure of mitochondrial tRNAs, except for tRNASer(AGN), the DHU arm of which could not form a stable stem-loop structure. The intergenic spacer sequence between tRNASer(AGN) and ND1 also contains the ATACTAA motif, which is conserved across the Lepidoptera order. The H. cunea A+T-rich region of 357 bp is comprised of non-repetitive sequences, but harbors several features common to the Lepidoptera insects, including the motif ATAGA followed by an 18 bp poly-T stretch, a microsatellite-like (AT)8 element preceded by the ATTTA motif, and an 11 bp poly-A stretch immediately upstream of tRNAMet. The phylogenetic analyses support the view that H. cunea is more closely related to Lymantria dispar than to Ochrogaster lunifer, and support the hypothesis that Noctuoidea (H. cunea, L. dispar, and O. lunifer) and Geometroidea (Phthonandria atrilineata) are monophyletic. However, in the phylogenetic trees based on mitogenome sequences among the lepidopteran superfamilies, Papilionoidea (Artogeia melete, Acraea issoria, and Coreana raphaelis) joined basally within the monophyly of Lepidoptera, which is different to the traditional classification.
PMID:20376208
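
    The A + T bias and AT skewness reported in the record above are base-composition statistics. AT skew is commonly defined as (A - T) / (A + T), so a positive value means more As than Ts on the strand analysed. The sketch below assumes that definition; the function names are hypothetical and the toy sequences in the usage note are not from the H. cunea mitogenome.

```python
def at_skew(sequence: str) -> float:
    """Return (A - T) / (A + T) for the given strand."""
    seq = sequence.upper()
    a, t = seq.count("A"), seq.count("T")
    if a + t == 0:
        raise ValueError("sequence contains no A or T")
    return (a - t) / (a + t)


def at_content(sequence: str) -> float:
    """Return the percentage of A and T among unambiguous bases."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (counts["A"] + counts["T"]) / total
```

    For the toy strand AAAT the skew is 0.5, and for AATT it is 0. A value close to zero, like the 0.010 in the abstract, indicates nearly equal numbers of As and Ts.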

  7. Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).

    PubMed

    Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

    2016-04-01

    Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke is not yet known. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using the maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.

  8. Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)

    PubMed Central

    Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

    2016-01-01

    Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke is not yet known. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using the maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575

  9. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

    PubMed

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-06-15

    Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
    Contact: yasu@bio.keio.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  10. Efficient analysis of mouse genome sequences reveal many nonsense variants

    PubMed Central

    Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude

    2016-01-01

    Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
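
    The in silico step described above (substitute the strain's bases into the reference coding sequence, translate both, and compare the amino acid sequences) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the 0-based SNP dictionary, and the toy sequences are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snps(cds, snps):
    """Return the sequence with {0-based position: base} substitutions applied."""
    seq = list(cds.upper())
    for pos, base in snps.items():
        seq[pos] = base.upper()
    return "".join(seq)


def gained_stop(reference_cds, variant_cds):
    """Return the 0-based codon index of the first stop gained in the variant, else None."""
    ref_aa = translate(reference_cds)
    var_aa = translate(variant_cds)
    for i, (r, v) in enumerate(zip(ref_aa, var_aa)):
        if v == "*" and r != "*":
            return i
    return None
```

    With the toy reference `ATGCAGTGGTAA`, a C-to-T change at position 3 turns the CAG codon into TAG, so `gained_stop` reports codon 1, whereas a synonymous G-to-A change at position 5 (CAG to CAA) reports nothing.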

  11. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    PubMed

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  12. VaDiR: an integrated approach to Variant Detection in RNA.

    PubMed

    Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy

    2018-02-01

    Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered. But the cost associated with it is currently limiting its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
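
    The Tier 1 idea (keep only variants reported by all three callers, then score the result against DNA-validated calls) can be sketched with plain set operations. This is an illustration under stated assumptions, not the VaDiR implementation: representing each variant as a `(chrom, pos, ref, alt)` tuple and the function names below are hypothetical choices, and no real caller output format is parsed.

```python
def tier1(snpir, rvboost, mutect2):
    """Variants reported by all three callers (the intersection of the call sets)."""
    return set(snpir) & set(rvboost) & set(mutect2)


def precision_recall(called, truth):
    """Precision and recall of a call set against a validated truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

    Intersecting call sets tends to trade recall for precision: a variant missed by any one caller drops out of Tier 1 even if it is real, which is consistent with the abstract reporting highest precision for Tier 1 and highest recall for a more inclusive combination.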

  13. Draft Genome Sequence of Cellulolytic and Xylanolytic Paenibacillus sp. A59, Isolated from Decaying Forest Soil from Patagonia, Argentina

    PubMed Central

    Ghio, Silvina; Martinez Cáceres, Alfredo I.; Talia, Paola; Grasso, Daniel H.

    2015-01-01

    Paenibacillus sp. A59 was isolated from decaying forest soil in Argentina and characterized as a xylanolytic strain. We report the draft genome sequence of this isolate, with an estimated genome size of 7 Mb, which harbors 6,424 coding sequences. Genes coding for hydrolytic enzymes involved in lignocellulose deconstruction were predicted. PMID:26494679

  14. Genome Sequencing and Assembly by Long Reads in Plants

    PubMed Central

    Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong

    2017-01-01

    Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists’ projects. PMID:29283420

  15. Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity

    NASA Astrophysics Data System (ADS)

    Mukherjee, Shashi Bajaj; Sen, Pradip Kumar

    2010-10-01

    Studying periodic patterns is a natural line of attack for gene identification and similar problems in DNA sequence analysis, yet surprisingly little significant work has been done in this direction. This paper studies statistical properties of the DNA sequences of a complete genome using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and standard Fourier techniques are applied to study the periodicity. Distinct statistical behaviour of the periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between them. Here, DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
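
    The general approach (map the DNA sequence to numbers, take a Fourier transform, and look at the periodicity) can be sketched as follows. The abstract does not say which mappings or statistics the authors used, so this example assumes one common choice: four binary indicator sequences, one per base, and the power at period 3, which is often elevated in coding regions. All function names are hypothetical.

```python
import cmath


def indicator(seq, base):
    """Binary indicator sequence: 1.0 where seq has the given base, else 0.0."""
    return [1.0 if b == base else 0.0 for b in seq.upper()]


def dft_power(x, k):
    """Squared magnitude of the k-th discrete Fourier coefficient of x."""
    n = len(x)
    coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x))
    return abs(coeff) ** 2


def spectrum(seq):
    """Total power over the four base indicators for k = 1 .. N // 2."""
    n = len(seq)
    return {
        k: sum(dft_power(indicator(seq, b), k) for b in "ACGT")
        for k in range(1, n // 2 + 1)
    }


def period3_ratio(seq):
    """Power at frequency 1/3 divided by the mean power over k = 1 .. N // 2."""
    n = len(seq) - len(seq) % 3  # trim so that N / 3 is an integer frequency index
    seq = seq[:n]
    spec = spectrum(seq)
    mean = sum(spec.values()) / len(spec)
    return spec[n // 3] / mean if mean else 0.0
```

    A ratio well above 1 suggests a period-3 signal. The direct sum is quadratic in sequence length, which is fine for short windows; a fast Fourier transform would be the usual choice for genome-scale data.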

  16. Outbreak of poliomyelitis in Finland in 1984-85 - Re-analysis of viral sequences using the current standard approach.

    PubMed

    Simonen, Marja-Leena; Roivainen, Merja; Iber, Jane; Burns, Cara; Hovi, Tapani

    2010-01-01

    In 1984, a wild type 3 poliovirus (PV3/FIN84) spread all over Finland causing nine cases of paralytic poliomyelitis and one case of aseptic meningitis. The outbreak was ended in 1985 with an intensive vaccination campaign. By limited sequence comparison with previously isolated PV3 strains, closest relatives of PV3/FIN84 were found among strains circulating in the Mediterranean region. Now we wanted to reanalyse the relationships using approaches currently exploited in poliovirus surveillance. Cell lysates of 22 strains isolated during the outbreak and stored frozen were subjected to RT-PCR amplification in three genomic regions without prior subculture. Sequences of the entire VP1 coding region, 150 nucleotides in the VP1-2A junction, most of the 5' non-coding region, partial sequences of the 3D RNA polymerase coding region and partial 3' non-coding region were compared within the outbreak and with sequences available in data banks. In addition, complete nucleotide sequences were obtained for 2 strains isolated from two different cases of disease during the outbreak. The results confirmed the previously described wide intraepidemic variation of the strains, including amino acid substitutions in antigenic sites, as well as the likely Mediterranean region origin of the strains. Simplot and bootscanning analyses of the complete genomes indicated complicated evolutionary history of the non-capsid coding regions of the genome suggesting several recombinations with different HEV-C viruses in the past.

  17. Genetic code, hamming distance and stochastic matrices.

    PubMed

    He, Matthew X; Petoukhov, Sergei V; Ricci, Paolo E

    2004-09-01

    In this paper we use the Gray code representation of the genetic code C=00, U=10, G=11 and A=01 (C pairs with G, A pairs with U) to generate a sequence of genetic code-based matrices. In connection with these code-based matrices, we use the Hamming distance to generate a sequence of numerical matrices. We then further investigate the properties of the numerical matrices and show that they are doubly stochastic and symmetric. We determine the frequency distributions of the Hamming distances, building blocks of the matrices, decomposition and iterations of matrices. We present an explicit decomposition formula for the genetic code-based matrix in terms of permutation matrices, which provides a hypercube representation of the genetic code. It is also observed that there is a Hamiltonian cycle in a genetic code-based hypercube.
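
    The link between the Gray code, Hamming distances, and doubly stochastic matrices can be illustrated with a small sketch. The abstract does not give the paper's exact matrix construction, so the example below assumes the simplest one: a matrix of pairwise Hamming distances between the Gray-coded words of a fixed length, normalized by the common row sum. The function names are hypothetical.

```python
from itertools import product

# Gray code representation quoted in the abstract.
GRAY = {"C": "00", "U": "10", "G": "11", "A": "01"}


def code_of(word):
    """Concatenate the 2-bit Gray codes of the nucleotides in a word."""
    return "".join(GRAY[n] for n in word)


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_matrix(words):
    """Matrix of pairwise Hamming distances between the Gray-coded words."""
    codes = [code_of(w) for w in words]
    return [[hamming(a, b) for b in codes] for a in codes]


def normalize(matrix):
    """Divide every entry by the first row sum (assumes all row sums are equal)."""
    total = sum(matrix[0])
    return [[v / total for v in row] for row in matrix]


def is_doubly_stochastic(matrix, tol=1e-12):
    """True if all entries are non-negative and every row and column sums to 1."""
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in matrix)
    cols_ok = all(abs(sum(col) - 1.0) < tol for col in zip(*matrix))
    return rows_ok and cols_ok and all(v >= 0 for row in matrix for v in row)


nucleotides = ["C", "U", "G", "A"]
dinucleotides = ["".join(p) for p in product(nucleotides, repeat=2)]
```

    For the four single nucleotides in the order C, U, G, A, every row of the distance matrix sums to 4, so dividing by 4 gives a symmetric doubly stochastic matrix. The same holds for longer words because the Gray-coded words of length n cover every bit string of length 2n exactly once.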

  18. Mitochondrial genomes of the jungle crow Corvus macrorhynchos (Passeriformes: Corvidae) from shed feathers and a phylogenetic analysis of genus Corvus using mitochondrial protein-coding genes.

    PubMed

    Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M

    2016-07-01

    The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus splendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.

  19. Evolution of the alternative AQP2 gene: Acquisition of a novel protein-coding sequence in dolphins.

    PubMed

    Kishida, Takushi; Suzuki, Miwa; Takayama, Asuka

    2018-01-01

    Taxon-specific de novo protein-coding sequences are thought to be important for taxon-specific environmental adaptation. A recent study revealed that bottlenose dolphins acquired a novel isoform of aquaporin 2 generated by alternative splicing (alternative AQP2), which helps dolphins to live in hyperosmotic seawater. The AQP2 gene consists of four exons, but the alternative AQP2 gene lacks the fourth exon and instead has a longer third exon that includes the original third exon and a part of the original third intron. Here, we show that the latter half of the third exon of the alternative AQP2 arose from a non-protein-coding sequence. An intact ORF of this de novo sequence is shared not by all cetaceans, but only by delphinoids. However, this sequence is conserved in all modern cetaceans, implying that this de novo sequence potentially plays important roles for marine adaptation in cetaceans. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Genetic structure of the four wild tomato species in the Solanum peruvianum s.l. species complex

    USDA-ARS's Scientific Manuscript database

    The most diverse wild tomato species Solanum peruvianum sensu lato (s.l.) has been reclassified into four separate species. However, reproductive barriers among the species are incomplete and this can lead to discrepancies regarding genetic identity of germplasm. We used genotyping by sequencing (...

  1. Incomplete dominance of deleterious alleles contributes substantially to trait variation and heterosis in maize

    USDA-ARS's Scientific Manuscript database

    Deleterious alleles have long been proposed to play an important role in patterning phenotypic variation and are central to commonly held ideas explaining the hybrid vigor observed in the offspring by crossing two inbred parents. We test these ideas using evolutionary measures of sequence conservati...

  2. Border-ownership-dependent tilt aftereffect in incomplete figures

    NASA Astrophysics Data System (ADS)

    Sugihara, Tadashi; Tsuji, Yoshihisa; Sakai, Ko

    2007-01-01

    A recent physiological finding of neural coding for border ownership (BO) that defines the direction of a figure with respect to the border has provided a possible basis for figure-ground segregation. To explore the underlying neural mechanisms of BO, we investigated stimulus configurations that activate BO circuitry through psychophysical investigation of the BO-dependent tilt aftereffect (BO-TAE). Specifically, we examined robustness of the border ownership signal by determining whether the BO-TAE is observed when gestalt factors are broken. The results showed significant BO-TAEs even when a global shape was not explicitly given due to the ambiguity of the contour, suggesting a contour-independent mechanism for BO coding.

  3. Border-ownership-dependent tilt aftereffect in incomplete figures.

    PubMed

    Sugihara, Tadashi; Tsuji, Yoshihisa; Sakai, Ko

    2007-01-01

    A recent physiological finding of neural coding for border ownership (BO) that defines the direction of a figure with respect to the border has provided a possible basis for figure-ground segregation. To explore the underlying neural mechanisms of BO, we investigated stimulus configurations that activate BO circuitry through psychophysical investigation of the BO-dependent tilt aftereffect (BO-TAE). Specifically, we examined robustness of the border ownership signal by determining whether the BO-TAE is observed when gestalt factors are broken. The results showed significant BO-TAEs even when a global shape was not explicitly given due to the ambiguity of the contour, suggesting a contour-independent mechanism for BO coding.

  4. Complete genome sequencing of the luminescent bacterium, Vibrio qinghaiensis sp. Q67 using PacBio technology

    NASA Astrophysics Data System (ADS)

    Gong, Liang; Wu, Yu; Jian, Qijie; Yin, Chunxiao; Li, Taotao; Gupta, Vijai Kumar; Duan, Xuewu; Jiang, Yueming

    2018-01-01

    Vibrio qinghaiensis sp.-Q67 (Vqin-Q67) is a freshwater luminescent bacterium that continuously emits blue-green light (485 nm). The bacterium has been widely used for detecting toxic contaminants. Here, we report the complete genome sequence of Vqin-Q67, obtained using third-generation PacBio sequencing technology. Continuous long reads were attained from three PacBio sequencing runs and reads >500 bp with a quality value of >0.75 were merged together into a single dataset. This resultant highly-contiguous de novo assembly has no genome gaps, and comprises two chromosomes with substantial genetic information, including protein-coding genes, non-coding RNA, transposon and gene islands. Our dataset can be useful as a comparative genome for evolution and speciation studies, as well as for the analysis of protein-coding gene families, the pathogenicity of different Vibrio species in fish, the evolution of non-coding RNA and transposon, and the regulation of gene expression in relation to the bioluminescence of Vqin-Q67.

  5. Weight distributions for turbo codes using random and nonrandom permutations

    NASA Technical Reports Server (NTRS)

    Dolinar, S.; Divsalar, D.

    1995-01-01

    This article takes a preliminary look at the weight distributions achievable for turbo codes using random, nonrandom, and semirandom permutations. Due to the recursiveness of the encoders, it is important to distinguish between self-terminating and non-self-terminating input sequences. The non-self-terminating sequences have little effect on decoder performance, because they accumulate high encoded weight until they are artificially terminated at the end of the block. From probabilistic arguments based on selecting the permutations randomly, it is concluded that the self-terminating weight-2 data sequences are the most important consideration in the design of constituent codes; higher-weight self-terminating sequences have successively decreasing importance. Also, increasing the number of codes and, correspondingly, the number of permutations makes it more and more likely that the bad input sequences will be broken up by one or more of the permuters. It is possible to design nonrandom permutations that ensure that the minimum distance due to weight-2 input sequences grows roughly as the square root of (2N), where N is the block length. However, these nonrandom permutations amplify the bad effects of higher-weight inputs, and as a result they are inferior in performance to randomly selected permutations. But there are 'semirandom' permutations that perform nearly as well as the designed nonrandom permutations with respect to weight-2 input sequences and are not as susceptible to being foiled by higher-weight inputs.
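
    A 'semirandom' permutation of the kind mentioned above can be sketched as a random permutation with a spreading constraint. The abstract does not define the construction, so the rule below is an assumption: each newly placed index must differ by more than `s` from each of the `s` most recently placed indices, which keeps nearby inputs far apart after permutation. The function names, the greedy search, and the restart limit are illustrative choices.

```python
import random


def s_random_permutation(n, s, seed=0, max_restarts=1000):
    """Build a permutation of range(n) whose entries within s positions differ by more than s.

    Greedy construction from a shuffled pool; restarts if it runs into a dead end.
    """
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = list(range(n))
        rng.shuffle(remaining)
        perm = []
        stuck = False
        while remaining:
            recent = perm[-s:] if s > 0 else []
            for idx, candidate in enumerate(remaining):
                if all(abs(candidate - r) > s for r in recent):
                    perm.append(remaining.pop(idx))
                    break
            else:
                stuck = True  # no remaining index satisfies the spread rule
                break
        if not stuck:
            return perm
    raise RuntimeError("no permutation found; try a smaller s")


def spread_ok(perm, s):
    """Check the spread rule for every pair of positions at most s apart."""
    return all(
        abs(perm[i] - perm[j]) > s
        for i in range(len(perm))
        for j in range(max(0, i - s), i)
    )
```

    For example, `s_random_permutation(64, 3)` returns a permutation of 0..63 that passes `spread_ok(perm, 3)`. Larger values of `s` relative to the block length make dead ends more frequent, so the search may need more restarts or fail.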

  6. Next-generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis.

    PubMed

    Webb, Kristen M; Rosenthal, Benjamin M

    2011-01-01

    The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution has promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence, was found to exceed 99.3%. Genome content and gene order were not found to be significantly different from those of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
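
    A sequence divergence figure such as the 9.5% quoted above is a proportion of differing sites between aligned sequences. The abstract does not state which estimator was used, so the sketch below assumes the simplest one, the uncorrected p-distance, and skips alignment columns containing a gap or an ambiguous base. The function name and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap or an 'N' in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if gap in (x, y) or "N" in (x, y):
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

    For two aligned 10-base sequences that differ at one site, `p_distance` returns 0.1. Computing it per gene and comparing with the value over the whole coding region mirrors the per-gene versus average comparison described in the abstract.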

  7. Cloning and sequence determination of the gene coding for the pyruvate phosphate dikinase of Entamoeba histolytica.

    PubMed

    Saavedra-Lira, E; Pérez-Montfort, R

    1994-05-16

    We isolated three overlapping clones from a DNA genomic library of Entamoeba histolytica strain HM1:IMSS, whose translated nucleotide (nt) sequence shows similarities of 51, 48 and 47% with the amino acid (aa) sequences reported for the pyruvate phosphate dikinases from Bacteroides symbiosus, maize and Flaveria trinervia, respectively. The reading frame determined codes for a protein of 886 aa.

  8. Draft Genome Sequence of Cellulolytic and Xylanolytic Paenibacillus sp. A59, Isolated from Decaying Forest Soil from Patagonia, Argentina.

    PubMed

    Ghio, Silvina; Martinez Cáceres, Alfredo I; Talia, Paola; Grasso, Daniel H; Campos, Eleonora

    2015-10-22

    Paenibacillus sp. A59 was isolated from decaying forest soil in Argentina and characterized as a xylanolytic strain. We report the draft genome sequence of this isolate, with an estimated genome size of 7 Mb, which harbors 6,424 coding sequences. Genes coding for hydrolytic enzymes involved in lignocellulose deconstruction were predicted. Copyright © 2015 Ghio et al.

  9. Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

    PubMed Central

    Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren

    2015-01-01

    There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098

  10. The mitochondrial genomes of the acoelomorph worms Paratomella rubra, Isodiametra pulchra and Archaphanostoma ylvae.

    PubMed

    Robertson, Helen E; Lapraz, François; Egger, Bernhard; Telford, Maximilian J; Schiffer, Philipp H

    2017-05-12

    Acoels are small, ubiquitous - but understudied - marine worms with a very simple body plan. Their internal phylogeny is still not fully resolved, and the position of their proposed phylum Xenacoelomorpha remains debated. Here we describe mitochondrial genome sequences from the acoels Paratomella rubra and Isodiametra pulchra, and the complete mitochondrial genome of the acoel Archaphanostoma ylvae. The P. rubra and A. ylvae sequences are typical for metazoans in size and gene content. The larger I. pulchra mitochondrial genome contains both ribosomal genes, 21 tRNAs, but only 11 protein-coding genes. We find evidence suggesting a duplicated sequence in the I. pulchra mitochondrial genome. The P. rubra, I. pulchra and A. ylvae mitochondria have a unique genome organisation in comparison to other metazoan mitochondrial genomes. We found a large degree of protein-coding gene and tRNA overlap with little non-coding sequence in the compact P. rubra genome. Conversely, the A. ylvae and I. pulchra genomes have many long non-coding sequences between genes, likely driving genome size expansion in the latter. Phylogenetic trees inferred from mitochondrial genes retrieve Xenacoelomorpha as an early branching taxon in the deuterostomes. Sequence divergence analysis between P. rubra sampled in England and Spain indicates cryptic diversity.

  11. Metal resistance sequences and transgenic plants

    DOEpatents

    Meagher, Richard Brian; Summers, Anne O.; Rugh, Clayton L.

    1999-10-12

    The present invention provides nucleic acid sequences encoding a metal ion resistance protein, which are expressible in plant cells. The metal resistance protein provides for the enzymatic reduction of metal ions including but not limited to divalent Cu, divalent mercury, trivalent gold, divalent cadmium, lead ions and monovalent silver ions. Transgenic plants which express these coding sequences exhibit increased resistance to metal ions in the environment as compared with plants which have not been so genetically modified. Transgenic plants with improved resistance to organometals including alkylmercury compounds, among others, are provided by the further inclusion of plant-expressible organometal lyase coding sequences, as specifically exemplified by the plant-expressible merB coding sequence. Furthermore, these transgenic plants which have been genetically modified to express the metal resistance coding sequences of the present invention can participate in the bioremediation of metal contamination via the enzymatic reduction of metal ions. Transgenic plants resistant to organometals can further mediate remediation of organic metal compounds, for example, alkylmetal compounds including but not limited to methyl mercury, methyl lead compounds, methyl cadmium and methyl arsenic compounds, in the environment by causing the freeing of mercuric or other metal ions and the reduction of the ionic mercury or other metal ions to the less toxic elemental mercury or other metals.

  12. Complete mitochondrial genome of the whiter-spotted flower chafer, Protaetia brevitarsis (Coleoptera: Scarabaeidae).

    PubMed

    Kim, Min Jee; Im, Hyun Hwak; Lee, Kwang Youll; Han, Yeon Soo; Kim, Iksoo

    2014-06-01

    The complete nucleotide sequence of the mitochondrial genome of the whiter-spotted flower chafer, Protaetia brevitarsis (Coleoptera: Scarabaeidae), was determined. The 20,319-bp long circular genome is the longest among completely sequenced Coleoptera. As is typical in animals, the P. brevitarsis genome consisted of two ribosomal RNAs, 22 transfer RNAs, 13 protein-coding genes and one A + T-rich region. Although the size of the coding genes was typical, the non-coding A + T-rich region was 5654 bp, which is the longest in insects. The extraordinary length of this region was composed of 28 117-bp tandem repeats and seven 82-bp tandem repeats. These repeat sequences were encompassed by three non-repeat sequences constituting 1804 bp.

  13. Impacts of Model Building Energy Codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Athalye, Rahul A.; Sivaraman, Deepak; Elliott, Douglas B.

    The U.S. Department of Energy (DOE) Building Energy Codes Program (BECP) periodically evaluates national and state-level impacts associated with energy codes in residential and commercial buildings. Pacific Northwest National Laboratory (PNNL), funded by DOE, conducted an assessment of the prospective impacts of national model building energy codes from 2010 through 2040. A previous PNNL study evaluated the impact of the Building Energy Codes Program; this study looked more broadly at overall code impacts. This report describes the methodology used for the assessment and presents the impacts in terms of energy savings, consumer cost savings, and reduced CO2 emissions at the state level and at aggregated levels. This analysis does not represent all potential savings from energy codes in the U.S. because it excludes several states which have codes which are fundamentally different from the national model energy codes or which do not have state-wide codes. Energy codes follow a three-phase cycle that starts with the development of a new model code, proceeds with the adoption of the new code by states and local jurisdictions, and finishes when buildings comply with the code. The development of new model code editions creates the potential for increased energy savings. After a new model code is adopted, potential savings are realized in the field when new buildings (or additions and alterations) are constructed to comply with the new code. Delayed adoption of a model code and incomplete compliance with the code’s requirements erode potential savings. The contributions of all three phases are crucial to the overall impact of codes, and are considered in this assessment.

  14. EDGE 2017 R&D 100 Entry with Appendix

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chain, Patrick Sam Guy; Davenport, Karen Walston; Li, Po-E

    Diabetes, infertility, cancer, and Alzheimer’s disease: the key to one day preventing or even curing such afflictions and diseases (both infectious and genetically driven) may be locked in our own genetic code and the code of microorganisms that inhabit our bodies. The study of this code, known as genomics, has recently become much more promising as a result of two things: (1) vast improvements in high-throughput, next-generation sequencing (NGS), and (2) an exponential decrease in the cost of such sequencing. For example, it originally cost approximately $3 billion to sequence the human genome; today, this genome could be resequenced for less than $1,000.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tar, A.; Ion, A.; Gyoervari, B.

    A de novo apparently balanced translocation involving chromosomes 8 and 20 was found in a 14-year-old boy with minor anomalies, mild skeletal abnormalities and ambiguous external genitalia including perineoscrotal hypospadias, rudimentary fused labioscrotal folds, bilateral cryptorchidism, and small penis. The karyotype was 46,XY, t(8;20)(q22.3-23;p13). No signs of other conditions known to be associated with structural anomalies of either chromosome 8 or 20 were present and incomplete masculinisation of the external genitalia appears to be the main component of the phenotype. Clinical and biological studies showed apparently normal testicular function in utero and after birth. Examinations excluded 5α-reductase deficiency or a block in any enzymatic steps of testosterone, glucocorticoid and mineralocorticoid biosynthesis. Coding sequences of the sex-determining gene (SRY) and androgen receptor gene (AR) were found to be identical to those of a normal male excluding their role in the cause of the present condition. Since several other reports describe the association of hypospadias and hypertelorism with deletions or translocations involving 8q, we suggest that a locus necessary for male sex differentiation is located at distal 8q. 24 refs., 3 figs.

  16. Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data.

    PubMed

    Bhandary, Priyanka; Seetharam, Arun S; Arendsee, Zebulun W; Hur, Manhoi; Wurtele, Eve Syrkin

    2018-02-01

    More than 15 petabases of raw RNAseq data is now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data-reuse provides tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases is often confusing; a test case with Zea mays mRNA seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality by more use of controlled vocabulary and by metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Congenital amegakaryocytic thrombocytopenia in three siblings: molecular analysis of atypical clinical presentation.

    PubMed

    Gandhi, Manish J; Pendergrass, Thomas W; Cummings, Carrie C; Ihara, Kenji; Blau, C Anthony; Drachman, Jonathan G

    2005-10-01

    An 11-year-old girl, presenting with fatigue and bruising, was found to be profoundly pancytopenic. Bone marrow exam and clinical evaluation were consistent with aplastic anemia. Family members were studied as potential stem cell donors, revealing that both younger siblings displayed significant thrombocytopenia, whereas both parents had normal blood counts. We evaluated this pedigree to understand the unusually late presentation of congenital amegakaryocytic thrombocytopenia (CAMT). The coding region and the intron/exon junctions of MPL were sequenced from each family member. Vectors representing each of the mutations were constructed and tested for the ability to support growth of Baf3/Mpl(mutant) cells. All three siblings had elevated thrombopoietin levels. Analysis of genomic DNA demonstrated that each parent had mutations/polymorphisms in a single MPL allele and that each child was a compound heterozygote, having inherited both abnormal alleles. The maternal allele encoded a mutation of the donor splice-junction at the exon-3/intron-3 boundary. A mini-gene construct encoding normal vs mutant versions of the intron-3 donor-site demonstrated that physiologic splicing was significantly reduced in the mutant construct. Mutations that incompletely eliminate Mpl expression/function may result in delayed diagnosis of CAMT and confusion with aplastic anemia.

  18. Scaling features of noncoding DNA

    NASA Technical Reports Server (NTRS)

    Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.

    1999-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
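
    The redundancy measure described in this abstract can be illustrated with a minimal sketch. It assumes the common definition R = 1 - H(n)/H_max for n-symbol blocks and uses the purine-pyrimidine binary mapping named in the abstract; the function names and the choice of overlapping blocks are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

PURINES = set("AG")  # A and G are purines; C and T are pyrimidines


def to_binary(seq):
    """Map purines to '1' and pyrimidines to '0', skipping other symbols."""
    return "".join("1" if b in PURINES else "0"
                   for b in seq.upper() if b in "ACGT")


def block_entropy(symbols, n):
    """Shannon entropy (in bits) of the overlapping n-symbol blocks."""
    blocks = [symbols[i:i + n] for i in range(len(symbols) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def redundancy(seq, n=4):
    """Assumed definition: R = 1 - H(n) / n, since H_max = n bits for a
    binary alphabet. R = 0 means no redundancy, R = 1 means fully repetitive."""
    return 1.0 - block_entropy(to_binary(seq), n) / n
```

    Under this definition, comparing `redundancy(coding_seq)` with `redundancy(noncoding_seq)` for the same block length n would be the kind of comparison the abstract reports, although real analyses need long sequences for the block statistics to be reliable.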

  19. Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly

    PubMed Central

    Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka

    2010-01-01

    Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877

  20. [Learning and Repetive Reproduction of Memorized Sequences by the Right and the Left Hand].

    PubMed

    Bobrova, E V; Lyakhovetskii, V A; Bogacheva, I N

    2015-01-01

    An important stage of learning a new skill is repetitive reproduction of one and the same sequence of movements, which plays a significant role in forming of the movement stereotypes. Two groups of right-handers repeatedly memorized (6-10 repetitions) the sequences of their hand transitions by experimenter in 6 positions, firstly by the right hand (RH), and then--by the left hand (LH) or vice versa. Random sequences previously unknown to the volunteers were reproduced in the 11 series. Modified sequences were tested in the 2nd and 3rd series, where the same elements' positions were presented in different order. The processes of repetitive sequence reproduction were similar for RH and LH. However, the learning of the modified sequences differed: Information about elements' position disregarding the reproduction order was used only when LH initiated task performing. This information was not used when LH followed RH and when RH performed the task. Consequently, the type of information coding activated by LH helped learn the positions of sequence elements, while the type of information coding activated by RH prevented learning. It is supposedly connected with the predominant role of right hemisphere in the processes of positional coding and motor learning.

  1. Sequence Polishing Library (SPL) v10.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oberortner, Ernst

    The Sequence Polishing Library (SPL) is a suite of software tools that automate "Design for Synthesis and Assembly" workflows. Specifically: The SPL "Converter" tool converts files among the following sequence data exchange formats: CSV, FASTA, GenBank, and Synthetic Biology Open Language (SBOL); The SPL "Juggler" tool optimizes the codon usage of DNA coding sequences according to an optimization strategy, a user-specific codon usage table and genetic code. In addition, the SPL "Juggler" can translate amino acid sequences into DNA sequences; The SPL "Polisher" verifies DNA sequences against DNA synthesis constraints, such as GC content, repeating k-mers, and restriction sites. In case of violations, the "Polisher" reports the violations in a comprehensive manner. The "Polisher" tool can also modify the violating regions according to an optimization strategy, a user-specific codon usage table and genetic code; The SPL "Partitioner" decomposes large DNA sequences into smaller building blocks with partial overlaps that enable an efficient assembly. The "Partitioner" enables the user to configure the characteristics of the overlaps, which are mostly determined by the utilized assembly protocol, such as length, GC content, or melting temperature.
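
    The kinds of synthesis constraints listed for the "Polisher" (GC content, repeating k-mers, restriction sites) can be sketched as follows. This is not SPL code and does not use the SPL API: the function names, thresholds, and default forbidden site (GAATTC, the EcoRI recognition sequence) are hypothetical choices made for illustration.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def repeated_kmers(seq, k):
    """Return every k-mer that occurs more than once, with its count."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


def check_sequence(seq, gc_min=0.25, gc_max=0.65, k=8,
                   forbidden_sites=("GAATTC",)):
    """Collect human-readable violations; all thresholds are hypothetical."""
    violations = []
    gc = gc_content(seq)
    if not gc_min <= gc <= gc_max:
        violations.append(f"GC content {gc:.2f} outside [{gc_min}, {gc_max}]")
    for kmer, n in repeated_kmers(seq, k).items():
        violations.append(f"{k}-mer {kmer} repeated {n} times")
    for site in forbidden_sites:
        if site in seq.upper():
            violations.append(f"restriction site {site} present")
    return violations
```

    A call such as `check_sequence(my_sequence)` returns an empty list when no constraint is violated, which mirrors the report-then-fix workflow the abstract describes only in the loosest sense.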

  2. Multiple Access Interference Reduction Using Received Response Code Sequence for DS-CDMA UWB System

    NASA Astrophysics Data System (ADS)

    Toh, Keat Beng; Tachikawa, Shin'ichi

    This paper proposes a combination of novel Received Response (RR) sequence at the transmitter and a Matched Filter-RAKE (MF-RAKE) combining scheme receiver system for the Direct Sequence-Code Division Multiple Access Ultra Wideband (DS-CDMA UWB) multipath channel model. This paper also demonstrates the effectiveness of the RR sequence in Multiple Access Interference (MAI) reduction for the DS-CDMA UWB system. It suggests that by using conventional binary code sequence such as the M sequence or the Gold sequence, there is a possibility of generating extra MAI in the UWB system. Therefore, it is quite difficult to collect the energy efficiently although the RAKE reception method is applied at the receiver. The main purpose of the proposed system is to overcome the performance degradation for UWB transmission due to the occurrence of MAI during multiple accessing in the DS-CDMA UWB system. The proposed system improves the system performance by improving the RAKE reception performance using the RR sequence which can reduce the MAI effect significantly. Simulation results verify that significant improvement can be obtained by the proposed system in the UWB multipath channel models.

  3. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)

    PubMed Central

    Martin, Andrew C. R.

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and ’dotifying’ repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836

  4. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    PubMed

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.

  5. Purpose-Driven Communities in Multiplex Networks: Thresholding User-Engaged Layer Aggregation

    DTIC Science & Technology

    2016-06-01

    dark networks is a non-trivial yet useful task. Because terrorists work hard to hide their relationships/network, analysts have an incomplete picture...them identify meaningful terrorist communities. This thesis introduces a general-purpose algorithm for community detection in multiplex dark networks...aggregation, dark networks, conductance, cluster adequacy, modularity, Louvain method, shortest path interdiction 15. NUMBER OF PAGES 155 16. PRICE CODE

  6. Sequence and Role in Virulence of the Three Plasmid Complement of the Model Tumor-Inducing Bacterium Pseudomonas savastanoi pv. savastanoi NCPPB 3335

    PubMed Central

    Bardaji, Leire; Pérez-Martínez, Isabel; Rodríguez-Moreno, Luis; Rodríguez-Palenzuela, Pablo; Sundin, George W.; Ramos, Cayo; Murillo, Jesús

    2011-01-01

    Pseudomonas savastanoi pv. savastanoi NCPPB 3335 is a model for the study of the molecular basis of disease production and tumor formation in woody hosts, and its draft genome sequence has been recently obtained. Here we closed the sequence of the plasmid complement of this strain, composed of three circular molecules of 78,357 nt (pPsv48A), 45,220 nt (pPsv48B), and 42,103 nt (pPsv48C), all belonging to the pPT23A-like family of plasmids widely distributed in the P. syringae complex. A total of 152 coding sequences were predicted in the plasmid complement, of which 38 are hypothetical proteins and seven correspond to putative virulence genes. Plasmid pPsv48A contains an incomplete Type IVB secretion system, the type III secretion system (T3SS) effector gene hopAF1, gene ptz, involved in cytokinin biosynthesis, and three copies of a gene highly conserved in plant-associated proteobacteria, which is preceded by a hrp box motif. A complete Type IVA secretion system, a well conserved origin of transfer (oriT), and a homolog of the T3SS effector gene hopAO1 are present in pPsv48B, while pPsv48C contains a gene with significant homology to isopentenyl-diphosphate delta-isomerase, type 1. Several potential mobile elements were found on the three plasmids, including three types of MITE, a derivative of IS801, and a new transposon effector, ISPsy30. Although the replication regions of these three plasmids are phylogenetically closely related, their structure is diverse, suggesting that the plasmid architecture results from an active exchange of sequences. Artificial inoculations of olive plants with mutants cured of plasmids pPsv48A and pPsv48B showed that pPsv48A is necessary for full virulence and for the development of mature xylem vessels within the knots; we were unable to obtain mutants cured of pPsv48C, which contains five putative toxin-antitoxin genes. PMID:22022435

  7. Genome sequence of Shigella flexneri strain SP1, a diarrheal isolate that encodes an extended-spectrum β-lactamase (ESBL).

    PubMed

    Shen, Ping; Fan, Jianzhong; Guo, Lihua; Li, Jiahua; Li, Ang; Zhang, Jing; Ying, Chaoqun; Ji, Jinru; Xu, Hao; Zheng, Beiwen; Xiao, Yonghong

    2017-05-12

    Shigellosis is the most common cause of gastrointestinal infections in developing countries. In China, the species most frequently responsible for shigellosis is Shigella flexneri. S. flexneri remains largely unexplored from a genomic standpoint and is still described using a vocabulary based on biochemical and serological properties. Moreover, increasing numbers of ESBL-producing Shigella strains have been isolated from clinical samples. Despite this, only a few cases of ESBL-producing Shigella have been described in China. Therefore, a better understanding of ESBL-producing Shigella from a genomic standpoint is required. In this study, a S. flexneri type 1a isolate SP1 harboring bla CTX-M-14, which was recovered from a patient with diarrhea, was subjected to whole genome sequencing. The draft genome assembly of S. flexneri strain SP1 consisted of 4,592,345 bp with a G+C content of 50.46%. RAST analysis revealed the genome contained 4798 coding sequences (CDSs) and 100 RNA-encoding genes. We detected one incomplete prophage and six candidate CRISPR loci in the genome. In vitro antimicrobial susceptibility testing demonstrated that strain SP1 is resistant to ampicillin, amoxicillin/clavulanic acid, cefazolin, ceftriaxone and trimethoprim. In silico analysis detected genes mediating resistance to aminoglycosides, β-lactams, phenicol, tetracycline, sulphonamides, and trimethoprim. The bla CTX-M-14 gene was located on an IncFII2 plasmid. A series of virulence factors were identified in the genome. In this study, we report the whole genome sequence of a bla CTX-M-14-encoding S. flexneri strain SP1. Dozens of resistance determinants were detected in the genome and may be responsible for the multidrug-resistance of this strain, although further confirmation studies are warranted. Numerous virulence factors identified in the strain suggest that isolate SP1 is potentially pathogenic. The availability of the genome sequence and comparative analysis with other S. flexneri strains provides the basis to further address the evolution of drug resistance mechanisms and pathogenicity in S. flexneri.

  8. RNAcentral: an international database of ncRNA sequences

    DOE PAGES

    Williams, Kelly Porter

    2014-10-28

    The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.

  9. The complete validated mitochondrial genome of the silver gemfish Rexea solandri (Cuvier, 1832) (Perciformes, Gempylidae).

    PubMed

    Bustamante, Carlos; Ovenden, Jennifer R

    2016-01-01

    The silver gemfish Rexea solandri is an important economic resource but Vulnerable to overfishing in Australian waters. The complete mitochondrial genome sequence is described from 1.6 million reads obtained via next generation sequencing. The total length of the mitogenome is 16,350 bp comprising 2 rRNA, 13 protein-coding genes, 22 tRNA and 2 non-coding regions. The mitogenome sequence was validated against sequences of PCR fragments and BLAST queries of Genbank. Gene order was equivalent to that found in marine fishes.

  10. High rate concatenated coding systems using bandwidth efficient trellis inner codes

    NASA Technical Reports Server (NTRS)

    Deng, Robert H.; Costello, Daniel J., Jr.

    1989-01-01

    High-rate concatenated coding systems with bandwidth-efficient trellis inner codes and Reed-Solomon (RS) outer codes are investigated for application in high-speed satellite communication systems. Two concatenated coding schemes are proposed. In one the inner code is decoded with soft-decision Viterbi decoding, and the outer RS code performs error-correction-only decoding (decoding without side information). In the other, the inner code is decoded with a modified Viterbi algorithm, which produces reliability information along with the decoded output. In this algorithm, path metrics are used to estimate the entire information sequence, whereas branch metrics are used to provide reliability information on the decoded sequence. This information is used to erase unreliable bits in the decoded output. An errors-and-erasures RS decoder is then used for the outer code. The two schemes have been proposed for high-speed data communication on NASA satellite channels. The rates considered are at least double those used in current NASA systems, and the results indicate that high system reliability can still be achieved.
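
    The difference between the two outer decoding modes can be made concrete with the standard bound for Reed-Solomon codes: an (n, k) RS code with minimum distance n - k + 1 can correct e errors together with s erasures whenever 2e + s <= n - k. The sketch below only evaluates that bound; the function name is hypothetical and the (255, 223) parameters are used purely as a familiar example, not taken from the abstract.

```python
def rs_can_decode(n, k, errors, erasures):
    """True if an (n, k) Reed-Solomon code can correct this many symbol
    errors and erasures, using the bound 2*errors + erasures <= n - k."""
    return 2 * errors + erasures <= n - k


# Illustrative parameters only: an RS(255, 223) code has n - k = 32.
N, K = 255, 223
print(rs_can_decode(N, K, errors=16, erasures=0))   # errors-only limit
print(rs_can_decode(N, K, errors=10, erasures=12))  # mixed errors and erasures
print(rs_can_decode(N, K, errors=17, erasures=0))   # one error too many
```

    Because an erasure costs half as much as an error in this bound, marking unreliable symbols as erasures (as the second scheme does using reliability information from the inner decoder) lets the outer code handle more corrupted symbols in total.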

  11. Intact coding region of the serotonin transporter gene in obsessive-compulsive disorder

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Altemus, M.; Murphy, D.L.; Greenberg, B.

    1996-07-26

    Epidemiologic studies indicate that obsessive-compulsive disorder is genetically transmitted in some families, although no genetic abnormalities have been identified in individuals with this disorder. The selective response of obsessive-compulsive disorder to treatment with agents which block serotonin reuptake suggests the gene coding for the serotonin transporter as a candidate gene. The primary structure of the serotonin-transporter coding region was sequenced in 22 patients with obsessive-compulsive disorder, using direct PCR sequencing of cDNA synthesized from platelet serotonin-transporter mRNA. No variations in amino acid sequence were found among the obsessive-compulsive disorder patients or healthy controls. These results do not support a role for alteration in the primary structure of the coding region of the serotonin-transporter gene in the pathogenesis of obsessive-compulsive disorder. 27 refs.

  12. Novel coding, translation, and gene expression of a replicating covalently closed circular RNA of 220 nt.

    PubMed

    AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer

    2014-10-07

    The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap UGAUGA (underline highlights the initiation codon AUG within the combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact "nanogenome."
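
    The frame shift at the end of each round follows from simple arithmetic: 220 is not a multiple of 3 (220 = 3 x 73 + 1), so a ribosome reading continuously around the circle enters each new pass in a different reading frame and returns to the original frame only after three passes. The sketch below illustrates this with hypothetical helper functions and a toy 4-nt circle (4 also leaves remainder 1 when divided by 3); it is not the authors' analysis.

```python
def frame_in_pass(length, pass_index):
    """Offset (0, 1 or 2), in sequence coordinates, of the first codon
    that starts within the given pass around a circle of this length."""
    return (-length * pass_index) % 3


def read_codons(seq, n_codons):
    """Read consecutive codons around a circular sequence, wrapping at the end."""
    size = len(seq)
    return ["".join(seq[(3 * j + i) % size] for i in range(3))
            for j in range(n_codons)]


# For a 220-nt circle, passes 0..3 start in frames 0, 2, 1 and 0 again.
print([frame_in_pass(220, p) for p in range(4)])

# Toy circle: four codons cover three passes and visit every frame once.
print(read_codons("AUGC", 4))
```

    On the toy circle the fifth codon would again be AUG, which shows the reading returning to its starting frame after three passes.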

  13. In vitro cytotoxicity of Manville Code 100 glass fibers: Effect of fiber length on human alveolar macrophages

    PubMed Central

    Zeidler-Erdely, Patti C; Calhoun, William J; Ameredes, Bill T; Clark, Melissa P; Deye, Gregory J; Baron, Paul; Jones, William; Blake, Terri; Castranova, Vincent

    2006-01-01

    Background Synthetic vitreous fibers (SVFs) are inorganic noncrystalline materials widely used in residential and industrial settings for insulation, filtration, and reinforcement purposes. SVFs conventionally include three major categories: fibrous glass, rock/slag/stone (mineral) wool, and ceramic fibers. Previous in vitro studies from our laboratory demonstrated length-dependent cytotoxic effects of glass fibers on rat alveolar macrophages which were possibly associated with incomplete phagocytosis of fibers ≥ 17 μm in length. The purpose of this study was to examine the influence of fiber length on primary human alveolar macrophages, which are larger in diameter than rat macrophages, using length-classified Manville Code 100 glass fibers (8, 10, 16, and 20 μm). It was hypothesized that complete engulfment of fibers by human alveolar macrophages could decrease fiber cytotoxicity; i.e. shorter fibers that can be completely engulfed might not be as cytotoxic as longer fibers. Human alveolar macrophages, obtained by segmental bronchoalveolar lavage of healthy, non-smoking volunteers, were treated with three different concentrations (determined by fiber number) of the sized fibers in vitro. Cytotoxicity was assessed by monitoring cytosolic lactate dehydrogenase release and loss of function as indicated by a decrease in zymosan-stimulated chemiluminescence. Results Microscopic analysis indicated that human alveolar macrophages completely engulfed glass fibers of the 20 μm length. All fiber length fractions tested exhibited equal cytotoxicity on a per fiber basis, i.e. increasing lactate dehydrogenase and decreasing chemiluminescence in the same concentration-dependent fashion. Conclusion The data suggest that due to the larger diameter of human alveolar macrophages, compared to rat alveolar macrophages, complete phagocytosis of longer fibers can occur with the human cells. Neither incomplete phagocytosis nor length-dependent toxicity was observed in fiber-exposed human macrophage cultures. In contrast, rat macrophages exhibited both incomplete phagocytosis of long fibers and length-dependent toxicity. The results of the human and rat cell studies suggest that incomplete engulfment may enhance cytotoxicity of fiber glass. However, the possibility should not be ruled out that differences between human versus rat macrophages other than cell diameter could account for differences in fiber effects. PMID:16569233

  14. Gene and genon concept: coding versus regulation

    PubMed Central

    2007-01-01

    We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760

  15. The Use and Effectiveness of Triple Multiplex System for Coding Region Single Nucleotide Polymorphism in Mitochondrial DNA Typing of Archaeologically Obtained Human Skeletons from Premodern Joseon Tombs of Korea

    PubMed Central

    Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon

    2015-01-01

    A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses on coding region SNP markers and control region mutation motifs. In this study, we tried to see whether the same triple multiplex analysis for coding region SNPs could also be applied to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. From the study of Joseon skeleton samples, we found that the mtDNA haplogroup determined by coding region SNP markers falls within the same haplogroup that sequence analysis of the control region assigns. Considering that ancient samples in previous studies showed no small number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190

  16. Flexible and polarization-controllable diffusion metasurface with optical transparency

    NASA Astrophysics Data System (ADS)

    Zhuang, Yaqiang; Wang, Guangming; Liang, Jiangang; Cai, Tong; Guo, Wenlong; Zhang, Qingfeng

    2017-11-01

    In this paper, a novel coding metasurface is proposed to realize polarization-controllable diffusion scattering. The anisotropic Jerusalem-cross unit cell is employed as the basic coding element due to its polarization-dependent phase response. The isotropic random coding sequence is firstly designed to obtain diffusion scattering, and the anisotropic random coding sequence is subsequently realized by adding different periodic coding sequences to the original isotropic one along different directions. For demonstration, we designed and fabricated a flexible polarization-controllable diffusion metasurface (PCDM) with both chessboard diffusion and hedge diffusion under different polarizations. The specular scattering reduction performance of the anisotropic metasurface is better than the isotropic one because the scattered energies are redirected away from the specular reflection direction. For potential applications, the flexible PCDM wrapped around a cylinder structure is investigated and tested for polarization-controllable diffusion scattering. The numerical and experimental results coincide well, indicating anisotropic low scatterings with comparable performances. This paper provides an alternative approach for designing high-performance, flexible, low-scattering platforms.

  17. Two-Dimensional Optical CDMA System Parameters Limitations for Wavelength Hopping/Time-Spreading Scheme based on Simulation Experiment

    NASA Astrophysics Data System (ADS)

    Kandouci, Chahinaz; Djebbari, Ali

    2018-04-01

    A new family of two-dimensional optical hybrid codes, which employs zero cross-correlation (ZCC) codes constructed by balanced incomplete block design (BIBD) as both time-spreading and wavelength-hopping patterns, is used in this paper. The obtained codes have off-peak autocorrelation and cross-correlation values equal to zero and unity, respectively. The work in this paper is a computer experiment performed using the Optisystem 9.0 software program as a simulator to determine the performance limitations of the wavelength hopping/time spreading (WH/TS) OCDMA system. Five system parameters were considered in this work: the optical fiber length (transmission distance), the bitrate, the chip spacing and the transmitted power. This paper shows the parameter ranges over which the system maintains sufficient performance (BER ≤ 10^-9, Q ≥ 6).
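
    The two performance criteria quoted in this abstract (a bit error rate of at most 10^-9 and a Q factor of at least 6) are commonly linked by the Gaussian-noise approximation BER = 0.5 * erfc(Q / sqrt(2)). The sketch below evaluates that textbook relation; it assumes Gaussian noise statistics and is not taken from the paper or from the simulator it used.

```python
import math


def ber_from_q(q):
    """Bit error rate for a given Q factor under the Gaussian approximation."""
    return 0.5 * math.erfc(q / math.sqrt(2))


for q in (5.0, 6.0, 7.0):
    print(f"Q = {q:.1f} -> BER ~ {ber_from_q(q):.2e}")
```

    Under this approximation, Q = 6 corresponds to a BER just under 10^-9, which is why the two thresholds are usually stated together.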

  18. Coding visual features extracted from video sequences.

    PubMed

    Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano

    2014-05-01

    Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.

  19. Community standards for genomic resources, genetic conservation, and data integration

    Treesearch

    Jill Wegrzyn; Meg Staton; Emily Grau; Richard Cronn; C. Dana Nelson

    2017-01-01

    Genetics and genomics are increasingly important in forestry management and conservation. Next generation sequencing can increase analytical power, but still relies on building on the structure of previously acquired data. Data standards and data sharing allow the community to maximize the analytical power of high throughput genomics data. The landscape of incomplete...

  20. Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity

    USDA-ARS's Scientific Manuscript database

    Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we ...

  1. Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.

    PubMed

    Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A

    2010-02-01

    Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.

  2. Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer

    PubMed Central

    Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.

    2010-01-01

    Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640

  3. Landscape of X chromosome inactivation across human tissues.

    PubMed

    Tukiainen, Taru; Villani, Alexandra-Chloé; Yen, Angela; Rivas, Manuel A; Marshall, Jamie L; Satija, Rahul; Aguirre, Matt; Gauthier, Laura; Fleharty, Mark; Kirby, Andrew; Cummings, Beryl B; Castel, Stephane E; Karczewski, Konrad J; Aguet, François; Byrnes, Andrea; Lappalainen, Tuuli; Regev, Aviv; Ardlie, Kristin G; Hacohen, Nir; MacArthur, Daniel G

    2017-10-11

    X chromosome inactivation (XCI) silences transcription from one of the two X chromosomes in female mammalian cells to balance expression dosage between XX females and XY males. XCI is, however, incomplete in humans: up to one-third of X-chromosomal genes are expressed from both the active and inactive X chromosomes (Xa and Xi, respectively) in female cells, with the degree of 'escape' from inactivation varying between genes and individuals. The extent to which XCI is shared between cells and tissues remains poorly characterized, as does the degree to which incomplete XCI manifests as detectable sex differences in gene expression and phenotypic traits. Here we describe a systematic survey of XCI, integrating over 5,500 transcriptomes from 449 individuals spanning 29 tissues from GTEx (v6p release) and 940 single-cell transcriptomes, combined with genomic sequence data. We show that XCI at 683 X-chromosomal genes is generally uniform across human tissues, but identify examples of heterogeneity between tissues, individuals and cells. We show that incomplete XCI affects at least 23% of X-chromosomal genes, identify seven genes that escape XCI with support from multiple lines of evidence and demonstrate that escape from XCI results in sex biases in gene expression, establishing incomplete XCI as a mechanism that is likely to introduce phenotypic diversity. Overall, this updated catalogue of XCI across human tissues helps to increase our understanding of the extent and impact of the incompleteness in the maintenance of XCI.

  4. Probability of coding of a DNA sequence: an algorithm to predict translated reading frames from their thermodynamic characteristics.

    PubMed Central

    Tramontano, A; Macchiato, M F

    1986-01-01

    An algorithm to determine the probability that a reading frame codifies for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761

  5. PatGen--a consolidated resource for searching genetic patent sequences.

    PubMed

    Rouse, Richard J D; Castagnetto, Jesus; Niedner, Roland H

    2005-04-15

    Compared to the wealth of online resources covering genomic, proteomic and derived data, the bioinformatics community is rather underserved when it comes to patent information related to biological sequences. The current online resources are either incomplete or rather expensive. This paper describes PatGen, an integrated database containing data from bioinformatic and patent resources. This effort addresses the inconsistency of publicly available genetic patent data coverage by providing access to a consolidated dataset. PatGen can be searched at http://www.patgendb.com. Contact: rjdrouse@patentinformatics.com.

  6. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

    PubMed

    Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

    2013-08-15

    Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.

  7. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    PubMed

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.

  8. A candidate gene for choanal atresia in alpaca.

    PubMed

    Reed, Kent M; Bauer, Miranda M; Mendoza, Kristelle M; Armién, Aníbal G

    2010-03-01

    Choanal atresia (CA) is a common nasal craniofacial malformation in New World domestic camelids (alpaca and llama). CA results from abnormal development of the nasal passages and is especially debilitating to newborn crias. CA in camelids shares many of the clinical manifestations of a similar condition in humans (CHARGE syndrome). Herein we report on the regulatory gene CHD7 of alpaca, whose homologue in humans is most frequently associated with CHARGE. Sequence of the CHD7 coding region was obtained from a non-affected cria. The complete coding region was 9003 bp, corresponding to a translated amino acid sequence of 3000 aa. Additional genomic sequences corresponding to a significant portion of the CHD7 gene were identified and assembled from the 2x alpaca whole genome sequence, providing confirmatory sequence for much of the CHD7 coding region. The alpaca CHD7 mRNA sequence was 97.9% similar to the human sequence, with the greatest sequence difference being an insertion in exon 38 that results in a polyalanine repeat (A12). Polymorphism in this repeat was tested for association with CA in alpaca by cloning and sequencing the repeat from both affected and non-affected individuals. Variation in length of the poly-A repeat was not associated with CA. Complete sequencing of the CHD7 gene will be necessary to determine whether other mutations in CHD7 are the cause of CA in camelids.

  9. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences.

    PubMed

    Pang, Erli; Wu, Xiaomei; Lin, Kui

    2016-06-01

    Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.

  10. Translational resistivity/conductivity of coding sequences during exponential growth of Escherichia coli.

    PubMed

    Takai, Kazuyuki

    2017-01-21

    Codon adaptation index (CAI) has been widely used for prediction of expression of recombinant genes in Escherichia coli and other organisms. However, CAI has no mechanistic basis that rationalizes its application to estimation of translational efficiency. Here, I propose a model based on which we could consider how codon usage is related to the level of expression during exponential growth of bacteria. In this model, translation of a gene is considered as an analog of electric current, and an analog of electric resistance corresponding to each gene is considered. "Translational resistance" is dependent on the steady-state concentration and the sequence of the mRNA species, and "translational resistivity" is dependent only on the mRNA sequence. The latter is the sum of two parts: one is the resistivity for the elongation reaction (coding sequence resistivity), and the other comes from all of the other steps of the decoding reaction. This electric circuit model clearly shows that some conditions should be met for codon composition of a coding sequence to correlate well with its expression level. On the other hand, I calculated relative frequency of each of the 61 sense codon triplets translated during exponential growth of E. coli from a proteomic dataset covering over 2600 proteins. A tentative method for estimating relative coding sequence resistivity based on the data is presented. Copyright © 2016. Published by Elsevier Ltd.
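
    The abstract above refers to CAI without defining it. As a reminder of the standard definition (each codon is weighted by its frequency relative to the most used synonymous codon in a reference set, and CAI is the geometric mean of those weights over the gene), here is a minimal Python sketch. The codon families are truncated to two amino acids and the reference counts are hypothetical toy values, not data from the paper; the function names are illustrative only.

```python
import math

# Toy synonymous-codon families (only two amino acids, for illustration).
FAMILIES = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
}

# Hypothetical codon counts from a reference set of highly expressed genes.
REFERENCE_COUNTS = {"AAA": 75, "AAG": 25, "GAA": 60, "GAG": 30}


def relative_adaptiveness(counts, families):
    """w(codon) = count / count of the most used synonymous codon."""
    weights = {}
    for codons in families.values():
        top = max(counts[c] for c in codons)
        for c in codons:
            weights[c] = counts[c] / top
    return weights


def cai(cds, weights):
    """Geometric mean of w over the codons of `cds` that have a weight."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


WEIGHTS = relative_adaptiveness(REFERENCE_COUNTS, FAMILIES)
```

    With these toy weights, a sequence built only from the preferred codons (for example "AAAGAA") scores 1.0, and one built from the rarer codons scores lower. A codon with a zero reference count would need special handling, which this sketch omits.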

  11. Origins of Genes: "Big Bang" or Continuous Creation?

    NASA Astrophysics Data System (ADS)

    Kesse, Paul K.; Gibbs, Adrian

    1992-10-01

    Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.

  12. Protein structure and the sequential structure of mRNA: alpha-helix and beta-sheet signals at the nucleotide level.

    PubMed

    Brunak, S; Engelbrecht, J

    1996-06-01

    A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from the clustering of rare codons, but from the similarity of codons coding for very abundant amino acid residues at the N- and C-termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S-like ribosomal RNA sequences and related to mechanisms for maintaining the correct reading frame by the ribosome.

  13. Evolution and Diversity of the Human Hepatitis D Virus Genome

    PubMed Central

    Huang, Chi-Ruei; Lo, Szecheng J.

    2010-01-01

    Human hepatitis delta virus (HDV) is the smallest RNA virus in terms of genome size. The HDV genome is divided into a viroid-like sequence and a protein-coding sequence, which could have originated from different sources, and the HDV genome was eventually constituted through RNA recombination. The genome subsequently diversified through accumulation of mutations selected by interactions between the mutated RNA and proteins with host factors to successfully form infectious virions. Therefore, we propose that the conservation of the HDV nucleotide sequence is highly related to its functionality. Genome analysis of known HDV isolates shows that the C-terminal coding sequences of large delta antigen (LDAg) are more diverse than other regions of the protein-coding sequences, but variants that still retain the biological functionality to interact with the heavy chain of clathrin can be selected and maintained. Since viruses interact with many host factors, including escaping the host immune response, designing a program to predict RNA genome evolution is a very challenging task. PMID:20204073

  14. Novel methodologies for spectral classification of exon and intron sequences

    NASA Astrophysics Data System (ADS)

    Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.

    2012-12-01

    Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence in which the choice of nucleotide to numeric mapping affects how well its biological properties can be preserved and reflected from nucleotide domain to numerical domain. Digital spectral analysis of nucleotide sequences unfolds a period-3 power spectral value which is more prominent in an exon sequence as compared to that of an intron sequence. The success of a period-3 based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them to existing codes to determine appropriate representation, and to introduce novel thresholding methods for more accurate period-3 based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows: Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers an attractive performance. A windowed 1-sequence numerical representation (with window length of 9, 15, and 24 bases) offers a possible speed gain over non-windowed 4-sequence Voss representation which increases as sequence length increases. A winner threshold value (chosen from the best among two defined threshold values and one other threshold value) offers a top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value applicable to an unknown and arbitrary length sequence can be estimated from the winner threshold values of fixed length sequences with a comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences to better reveal embedded properties, and has potential applications in improved genome annotation.
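
    The period-3 property mentioned in the abstract above can be illustrated with the 4-sequence Voss representation, in which each base gets its own binary indicator sequence and the power spectrum is evaluated at frequency 1/3. The following Python sketch is only an illustration under that assumption; it does not implement the paper's K-Quaternary codes or its threshold selection, and the function name is hypothetical.

```python
import cmath


def period3_power(seq):
    """Sum over A, C, G, T of |DFT at frequency 1/3|^2 of each indicator sequence."""
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # trim so the length is a multiple of 3
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at k = n/3: the kernel exp(-2*pi*i*k*t/n) reduces to exp(-2*pi*i*t/3).
        coeff = sum(
            cmath.exp(-2j * cmath.pi * t / 3)
            for t in range(n)
            if seq[t] == base
        )
        total += abs(coeff) ** 2
    return total
```

    A sequence with strict codon periodicity such as "ATG" repeated ten times gives a large value, while a sequence whose period is not a multiple of 3, such as "ACGT" repeated nine times, gives a value near zero. A classifier in the spirit of the abstract would compare this value (typically normalized by length) against a threshold chosen from training data.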

  15. Random digital encryption secure communication system

    NASA Technical Reports Server (NTRS)

    Doland, G. D. (Inventor)

    1982-01-01

    The design of a secure communication system is described. A product code, formed from two pseudorandom sequences of digital bits, is used to encipher or scramble data prior to transmission. The two pseudorandom sequences are periodically changed at intervals before they have had time to repeat. One of the two sequences is transmitted continuously with the scrambled data for synchronization. In the receiver portion of the system, the incoming signal is compared with one of two locally generated pseudorandom sequences until correspondence between the sequences is obtained. At this time, the two locally generated sequences are formed into a product code which deciphers the data from the incoming signal. Provision is made to ensure synchronization of the transmitting and receiving portions of the system.
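
    To make the scrambling idea in the abstract above concrete, the sketch below combines two pseudorandom bit sequences into a product keystream and applies it to data. It rests on several assumptions that the record does not confirm: the sequences are generated by small linear feedback shift registers (LFSRs), the "product" of two bit sequences is modeled as their modulo-2 sum (XOR, which corresponds to multiplication when bits are mapped to +1/-1), and the seeds, taps and register widths are hypothetical. Synchronization and the periodic change of sequences are not modeled.

```python
def lfsr_bits(state, taps, nbits, count):
    """Return `count` output bits of a Fibonacci LFSR with an `nbits`-wide register."""
    bits = []
    for _ in range(count):
        bits.append(state & 1)  # output the lowest register bit
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
    return bits


def product_keystream(count):
    """XOR of two LFSR sequences (hypothetical seeds and taps, for illustration)."""
    seq_a = lfsr_bits(0b1001, (0, 1), 4, count)   # recurrence a[n+4] = a[n+1] ^ a[n]
    seq_b = lfsr_bits(0b10101, (0, 2), 5, count)  # recurrence b[n+5] = b[n+2] ^ b[n]
    return [a ^ b for a, b in zip(seq_a, seq_b)]


def scramble(data_bits):
    """Scramble or descramble: XOR with the keystream is its own inverse."""
    key = product_keystream(len(data_bits))
    return [d ^ k for d, k in zip(data_bits, key)]
```

    Because XOR is self-inverse, a receiver that regenerates the same two sequences in step with the transmitter recovers the data by calling `scramble` again on the received bits.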

  16. Expressed gene sequence of the IFN-gamma-response chemokine CXCL9 of cattle, horses, and swine

    USDA-ARS's Scientific Manuscript database

    This report describes the cloning and characterization of expressed gene sequences of bovine, equine, and swine CXCL9 from RNA obtained from peripheral blood mononuclear cell (PBMC) or other tissues. The bovine coding region was 378 nucleotides in length, while the equine and swine coding regions w...

  17. ISSYS: An integrated synergistic Synthesis System

    NASA Technical Reports Server (NTRS)

    Dovi, A. R.

    1980-01-01

    Integrated Synergistic Synthesis System (ISSYS), an integrated system of computer codes in which the sequence of program execution and data flow is controlled by the user, is discussed. The commands available to exert such control, the ISSYS major function and rules, and the computer codes currently available in the system are described. Computational sequences frequently used in the aircraft structural analysis and synthesis are defined. External computer codes utilized by the ISSYS system are documented. A bibliography on the programs is included.

  18. A Six Nuclear Gene Phylogeny of Citrus (Rutaceae) Taking into Account Hybridization and Lineage Sorting

    PubMed Central

    Keremane, Manjunath L.; Lee, Richard F.; Maureira-Butler, Ivan J.; Roose, Mikeal L.

    2013-01-01

    Background: Genus Citrus (Rutaceae) comprises many important cultivated species that generally hybridize easily. Phylogenetic study of a group showing extensive hybridization is challenging. Since the genus Citrus has diverged recently (4–12 Ma), incomplete lineage sorting of ancestral polymorphisms is also likely to cause discrepancies among genes in phylogenetic inferences. Incongruence of gene trees is observed and it is essential to unravel the processes that cause inconsistencies in order to understand the phylogenetic relationships among the species. Methodology and Principal Findings: (1) We generated phylogenetic trees using haplotype sequences of six low copy nuclear genes. (2) Published simple sequence repeat data were re-analyzed to study population structure and the results were compared with the phylogenetic trees constructed using sequence data and coalescence simulations. (3) To distinguish between hybridization and incomplete lineage sorting, we developed and utilized a coalescence simulation approach. In other studies, species trees have been inferred despite the possibility of hybridization having occurred and used to generate null distributions of the effect of lineage sorting alone (by coalescent simulation). Since this is problematic, we instead generate these distributions directly from observed gene trees. Of the six trees generated, we used the most resolved three to detect hybrids. We found that 11 of 33 samples appear to be affected by historical hybridization. Analysis of the remaining three genes supported the conclusions from the hybrid detection test. Conclusions: We have identified or confirmed probable hybrid origins for several Citrus cultivars using three different approaches: gene phylogenies, population structure analysis and coalescence simulation. Hybridization and incomplete lineage sorting were identified primarily based on differences among gene phylogenies with reference to null expectations via coalescence simulations. We conclude that identifying hybridization as a frequent cause of incongruence among gene trees is critical to correctly infer the phylogeny among species of Citrus. PMID:23874615

  19. Connection anonymity analysis in coded-WDM PONs

    NASA Astrophysics Data System (ADS)

    Sue, Chuan-Ching

    2008-04-01

    A coded wavelength division multiplexing passive optical network (WDM PON) is presented for fiber to the home (FTTH) systems to protect against eavesdropping. The proposed scheme applies spectral amplitude coding (SAC) with a unipolar maximal-length sequence (M-sequence) code matrix to generate a specific signature address (coding) and to retrieve its matching address codeword (decoding) by exploiting the cyclic properties inherent in array waveguide grating (AWG) routers. In addition to ensuring the confidentiality of user data, the proposed coded-WDM scheme is also a suitable candidate for the physical layer with connection anonymity. Under the assumption that the eavesdropper applies a photo-detection strategy, it is shown that the coded WDM PON outperforms the conventional TDM PON and WDM PON schemes in terms of a higher degree of connection anonymity. Additionally, the proposed scheme allows the system operator to partition the optical network units (ONUs) into appropriate groups so as to achieve a better degree of anonymity.

  20. Mitigating the impact of the DESI fiber assignment on galaxy clustering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burden, Angela; Padmanabhan, Nikhil; Cahn, Robert N.

    2017-03-01

    We present a simple strategy to mitigate the impact of an incomplete spectroscopic redshift galaxy sample as a result of fiber assignment and survey tiling. The method has been designed for the Dark Energy Spectroscopic Instrument (DESI) galaxy survey but may have applications beyond this. We propose a modification to the usual correlation function that nulls the almost purely angular modes affected by survey incompleteness due to fiber assignment. Predictions of this modified statistic can be calculated given a model of the two point correlation function. The new statistic can be computed with a slight modification to the data catalogues input to the standard correlation function code and does not incur any additional computational time. Finally we show that the spherically averaged baryon acoustic oscillation signal is not biased by the new statistic.

  1. Novel coding, translation, and gene expression of a replicating covalently closed circular RNA of 220 nt

    PubMed Central

    AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer

    2014-01-01

    The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap UGAUGA (underline highlights the initiation codon AUG within the combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact “nanogenome.” PMID:25253891
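
    The frame shift "at the end of each round" follows from simple arithmetic: 220 is not a multiple of 3, so each full pass around the circle leaves the ribosome one nucleotide out of phase with the previous pass. The sketch below shows this on a hypothetical 7-nt toy circle (also 1 modulo 3); the sequence and function names are illustrative and not taken from the paper.

```python
def frame_after_rounds(length, rounds):
    """Reading-frame offset (0, 1 or 2) after going around a circle of
    `length` nucleotides `rounds` times."""
    return (length * rounds) % 3


def circular_codons(rna, start, rounds):
    """Read consecutive codons around a circular RNA for `rounds` full passes."""
    n = len(rna)
    n_codons = (n * rounds) // 3
    return [
        "".join(rna[(start + 3 * i + j) % n] for j in range(3))
        for i in range(n_codons)
    ]


if __name__ == "__main__":
    # Frame offsets after 1, 2 and 3 passes around a 220-nt circle.
    print([frame_after_rounds(220, r) for r in (1, 2, 3)])
    # Toy circle: the second and third passes read new frames.
    print(circular_codons("AUGCCGU", start=0, rounds=3))
```

    After three passes the offset returns to zero, which is consistent with the abstract's "two (or three)" overlapping ORFs.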

  2. Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3; The Map and Related Decoding Algorithms

    NASA Technical Reports Server (NTRS)

    Lin, Shu; Fossorier, Marc

    1998-01-01

    In a coded communication system with equiprobable signaling, MLD minimizes the word error probability and delivers the most likely codeword associated with the corresponding received sequence. This decoding has two drawbacks. First, minimization of the word error probability is not equivalent to minimization of the bit error probability. Therefore, MLD becomes suboptimum with respect to the bit error probability. Second, MLD delivers a hard-decision estimate of the received sequence, so that information is lost between the input and output of the ML decoder. This information is important in coded schemes where the decoded sequence is further processed, such as concatenated coding schemes, multi-stage and iterative decoding schemes. In this chapter, we first present a decoding algorithm that both minimizes the bit error probability and provides the corresponding soft information at the output of the decoder. This algorithm is referred to as the MAP (maximum a posteriori probability) decoding algorithm.
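
    The difference between word-wise MLD and bit-wise MAP decoding can be seen on a tiny code by brute force. The sketch below is not the trellis-based algorithm of the report; it enumerates every codeword, which is only feasible for toy codes. The binary symmetric channel, the (3,2) single-parity-check generator, and all function names are assumptions made for illustration.

```python
from itertools import product


def codebook(generator):
    """All codewords of the binary linear code spanned by the generator rows."""
    n = len(generator[0])
    words = []
    for msg in product([0, 1], repeat=len(generator)):
        word = [0] * n
        for bit, row in zip(msg, generator):
            if bit:
                word = [w ^ r for w, r in zip(word, row)]
        words.append(tuple(word))
    return words


def likelihood(received, word, p):
    """P(received | word) on a binary symmetric channel with crossover p."""
    flips = sum(r != c for r, c in zip(received, word))
    return p ** flips * (1 - p) ** (len(word) - flips)


def ml_decode(received, words, p):
    """Word-wise maximum-likelihood decoding: one hard codeword estimate."""
    return max(words, key=lambda w: likelihood(received, w, p))


def map_bit_posteriors(received, words, p):
    """Bit-wise MAP soft output: P(bit i = 1 | received) for each position,
    assuming equiprobable codewords."""
    weights = [likelihood(received, w, p) for w in words]
    total = sum(weights)
    return [
        sum(wt for w, wt in zip(words, weights) if w[i]) / total
        for i in range(len(received))
    ]


if __name__ == "__main__":
    words = codebook([[1, 0, 1], [0, 1, 1]])   # (3,2) single-parity-check code
    received = (1, 1, 1)
    print(ml_decode(received, words, 0.1))
    print(map_bit_posteriors(received, words, 0.1))
```

    For the received word (1, 1, 1), thresholding the bit posteriors at 0.5 gives a vector that is not itself a codeword, which illustrates why minimizing bit error probability and minimizing word error probability are different objectives.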

  3. Automatic vehicle location system

    NASA Technical Reports Server (NTRS)

    Hansen, G. R., Jr. (Inventor)

    1973-01-01

    An automatic vehicle detection system is disclosed, in which each vehicle whose location is to be detected carries active means which interact with passive elements at each location to be identified. The passive elements comprise a plurality of passive loops arranged in a sequence along the travel direction. Each of the loops is tuned to a chosen frequency so that the sequence of the frequencies defines the location code. As the vehicle traverses the sequence of the loops as it passes over each loop, signals only at the frequency of the loop being passed over are coupled from a vehicle transmitter to a vehicle receiver. The frequencies of the received signals in the receiver produce outputs which together represent a code of the traversed location. The code location is defined by a painted pattern which reflects light to a vehicle carried detector whose output is used to derive the code defined by the pattern.

  4. Digital data for quick response (QR) codes of alkalophilic Bacillus pumilus to identify and to compare bacilli isolated from Lonar Crator Lake, India.

    PubMed

    Rekadwad, Bhagwan N; Khobragade, Chandrahasya N

    2016-06-01

    Microbiologists are routinely engaged in the isolation, identification and comparison of isolated bacteria for their novelty. 16S rRNA sequences of Bacillus pumilus were retrieved from the NCBI repository, and QR codes were generated for the sequences (FASTA format and full GenBank information). 16S rRNA sequences were used to generate quick response (QR) codes of Bacillus pumilus isolated from Lonar Crater Lake (19° 58' N; 76° 31' E), India. Bacillus pumilus 16S rRNA gene sequences were used to generate CGR, FCGR and PCA. These can be used for visual comparison and evaluation, respectively. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to users on a portal, https://sites.google.com/site/bhagwanrekadwad/. This digital data helps to evaluate and compare any Bacillus pumilus strain, minimizes laboratory effort and avoids misinterpretation of the species.

  5. Image Based Biomarker of Breast Cancer Risk: Analysis of Risk Disparity among Minority Populations

    DTIC Science & Technology

    2013-03-01

    PRINCIPAL INVESTIGATOR: Fengshan Liu ... identifying the prevalence of women with incomplete visualization of the breast. We developed a code to estimate the breast cancer risks using the

  6. A new theory of development: the generation of complexity in ontogenesis.

    PubMed

    Barbieri, Marcello

    2016-03-13

    Today there is a very wide consensus on the idea that embryonic development is the result of a genetic programme and of epigenetic processes. Many models have been proposed in this theoretical framework to account for the various aspects of development, and virtually all of them have one thing in common: they do not acknowledge the presence of organic codes (codes between organic molecules) in ontogenesis. Here it is argued instead that embryonic development is a convergent increase in complexity that necessarily requires organic codes and organic memories, and a few examples of such codes are described. This is the code theory of development, a theory that was originally inspired by an algorithm that is capable of reconstructing structures from incomplete information, an algorithm that here is briefly summarized because it makes it intuitively appealing how a convergent increase in complexity can be achieved. The main thesis of the new theory is that the presence of organic codes in ontogenesis is not only a theoretical necessity but, first and foremost, an idea that can be tested and that has already been found to be in agreement with the evidence. © 2016 The Author(s).

  7. Sequence differences in the diagnostic region of the cysteine protease 8 gene of Tritrichomonas foetus parasites of cats and cattle.

    PubMed

    Sun, Zichen; Stack, Colin; Šlapeta, Jan

    2012-05-25

    In order to investigate the genetic variation between Tritrichomonas foetus from bovine and feline origins, cysteine protease 8 (CP8) coding sequence was selected as the polymorphic DNA marker. Direct sequencing of CP8 coding sequence of T. foetus from four feline isolates and two bovine isolates with polymerase chain reaction successfully revealed conserved nucleotide polymorphisms between feline and bovine isolates. These results provide useful information for CP8-based molecular differentiation of T. foetus genotypes. Copyright © 2011 Elsevier B.V. All rights reserved.

  8. EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

    PubMed

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-07-01

    EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.

  9. Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

    PubMed

    Lathe, R

    1985-05-05

    Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
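
    The trade-off between a full probe mixture and a single longer probe can be made concrete by counting. The sketch below computes how many distinct oligonucleotides are needed to cover every codon combination for a peptide under the standard genetic code, and the number of chance matches expected for a probe of a given length under a uniform random-sequence model. The abstract itself notes that gene banks are not random, so the second figure is only a lower-bound style estimate; the function names and example peptides are hypothetical.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "M": 1, "W": 1,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}


def mixture_size(peptide):
    """Number of distinct probes needed to cover all codon combinations."""
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide)


def expected_chance_matches(probe_length, target_length):
    """Expected perfect matches on one strand of a uniformly random target.

    Assumes independent, equiprobable bases, which real sequences violate.
    """
    return target_length * 4.0 ** -probe_length


if __name__ == "__main__":
    for peptide in ("MWFYH", "LSRLS"):
        print(peptide, mixture_size(peptide))
    print(expected_chance_matches(17, 3_000_000_000))
```

    Peptides rich in Met and Trp need few probes, while Leu, Arg and Ser inflate the mixture quickly, which is the motivation for considering a single optimized probe.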

  10. Tenebrio molitor antifreeze protein gene identification and regulation.

    PubMed

    Qin, Wensheng; Walker, Virginia K

    2006-02-15

    The yellow mealworm, Tenebrio molitor, is a freeze susceptible, stored product pest. Its winter survival is facilitated by the accumulation of antifreeze proteins (AFPs), encoded by a small gene family. We have now isolated 11 different AFP genomic clones from 3 genomic libraries. All the clones had a single coding sequence, with no evidence of intervening sequences. Three genomic clones were further characterized. All have putative TATA box sequences upstream of the coding regions and multiple potential poly(A) signal sequences downstream of the coding regions. A TmAFP regulatory region, B1037, conferred transcriptional activity when ligated to a luciferase reporter sequence and after transfection into an insect cell line. A 143 bp core promoter including a TATA box sequence was identified. Its promoter activity was increased 4.4 times by inserting an exotic 245 bp intron into the construct, similar to the enhancement of transgenic expression seen in several other systems. The addition of a duplication of the first 120 bp sequence from the 143 bp core promoter decreased promoter activity by half. Although putative hormonal response sequences were identified, none of the five hormones tested enhanced reporter activity. These studies on the mechanisms of AFP transcriptional control are important for the consideration of any transfer of freeze-resistance phenotypes to beneficial hosts.

  11. What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?

    NASA Astrophysics Data System (ADS)

    Liebovitch, Larry

    1998-03-01

    The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes a wrong base is inserted that sticks out, disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes that slide along the DNA can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking codes, where some digits are functions of other digits to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols? It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence? The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C.
We did not find evidence for such error correcting codes in these genes. However, we analyzed only a small amount of DNA and if digital error correcting schemes are present in DNA, they may be more subtle than such simple linear block codes. The basic issue we raise here is how information is stored in DNA, and an appreciation that digital symbol sequences, such as DNA, admit of interesting schemes to store and protect the fidelity of their information content. Liebovitch, Tao, Todorov, Levine. 1996. Biophys. J. 71:1539-1544. Supported by NIH grant EY6234.
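
    The question of whether some bases are linear combinations of others modulo 4 can be sketched as follows. This is a brute-force stand-in for the modified Gaussian elimination described in the abstract, practical only for short blocks; the base-to-integer mapping, the block length, and the function names are assumptions, since the abstract only says the bases were coded as (0,1,2,3).

```python
from itertools import product

# Assumed mapping; the abstract does not say which base gets which number.
BASE_TO_INT = {"A": 0, "T": 1, "G": 2, "C": 3}


def blocks(seq, n):
    """Cut a base sequence into consecutive non-overlapping blocks of n symbols."""
    nums = [BASE_TO_INT[b] for b in seq]
    return [nums[i:i + n] for i in range(0, len(nums) - n + 1, n)]


def find_linear_check(block_list, check_pos):
    """Search for coefficients c such that, in every block,
    block[check_pos] == sum(c[i] * block[i]) mod 4 over the other positions.

    Returns {position: coefficient} for the first solution found, or None.
    """
    n = len(block_list[0])
    others = [i for i in range(n) if i != check_pos]
    for coeffs in product(range(4), repeat=len(others)):
        if all(
            blk[check_pos] == sum(c * blk[i] for c, i in zip(coeffs, others)) % 4
            for blk in block_list
        ):
            return dict(zip(others, coeffs))
    return None


if __name__ == "__main__":
    # Synthetic sequence in which every third base equals the sum (mod 4)
    # of the two bases before it.
    coded = "ATTGCTTTGCGTGGA"
    print(find_linear_check(blocks(coded, 3), check_pos=2))
```

    A returned dictionary means the chosen position behaves like a check symbol over the tested blocks; `None` corresponds to the negative result reported for the lac operon and cytochrome C genes, although the real analysis covered far more candidate structures than this toy search.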

  12. A Bioinformatics-Based Alternative mRNA Splicing Code that May Explain Some Disease Mutations Is Conserved in Animals.

    PubMed

    Qu, Wen; Cingolani, Pablo; Zeeberg, Barry R; Ruden, Douglas M

    2017-01-01

    Deep sequencing of cDNAs made from spliced mRNAs indicates that most coding genes in many animals and plants have pre-mRNA transcripts that are alternatively spliced. In pre-mRNAs, in addition to invariant exons that are present in almost all mature mRNA products, there are at least 6 additional types of exons, such as exons from alternative promoters or with alternative polyA sites, mutually exclusive exons, skipped exons, or exons with alternative 5' or 3' splice sites. Our bioinformatics-based hypothesis is that, in analogy to the genetic code, there is an "alternative-splicing code" in introns and flanking exon sequences that directs alternative splicing of many of the 36 types of introns. In humans, we identified 42 different consensus sequences that are each present in at least 100 human introns. 37 of the 42 top consensus sequences are significantly enriched or depleted in at least one of the 36 types of introns. We further supported our hypothesis by showing that 96 out of 96 analyzed human disease mutations that affect RNA splicing, and change alternative splicing from one class to another, can be partially explained by a mutation altering a consensus sequence from one type of intron to that of another type of intron. Some of the alternative splicing consensus sequences, and presumably their small-RNA or protein targets, are evolutionarily conserved from 50 plant to animal species. We also noticed that the set of introns within a gene usually shares the same splicing codes, thus arguing that one sub-type of spliceosome might process all (or most) of the introns in a given gene. Our work sheds new light on a possible mechanism for generating the tremendous diversity in protein structure by alternative splicing of pre-mRNAs.

  13. Evolutionary Dynamics of Microsatellite Distribution in Plants: Insight from the Comparison of Sequenced Brassica, Arabidopsis and Other Angiosperm Species

    PubMed Central

    Shi, Jiaqin; Huang, Shunmou; Fu, Donghui; Yu, Jinyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

    2013-01-01

    Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than those for their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable elements contents. The pattern of microsatellite distribution may differ according to the different genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally accordant with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type) the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. 
Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but have little effect on the microsatellite distribution with respect to motif length, type and repeat number. Interestingly, several microsatellite characteristics seemed to be constant in plant evolution, which can be well explained by the general biological rules. PMID:23555856

  14. A splice variant in the ACSL5 gene relates migraine with fatty acid activation in mitochondria

    PubMed Central

    Matesanz, Fuencisla; Fedetz, María; Barrionuevo, Cristina; Karaky, Mohamad; Catalá-Rabasa, Antonio; Potenciano, Victor; Bello-Morales, Raquel; López-Guerrero, Jose-Antonio; Alcina, Antonio

    2016-01-01

    Genome-wide association studies (GWAS) in migraine are providing the molecular basis of this heterogeneous disease, but the understanding of its aetiology is still incomplete. Although some biomarkers have been accepted for migraine, many more studies are needed to identify new ones. The migraine-associated variant rs12355831:A>G (P = 2 × 10⁻⁶), described in a GWAS of the International Headache Genetic Consortium, is localized in a non-coding sequence with unknown function. We sought to identify the causal variant and the genetic mechanism involved in the migraine risk. To this end, we integrated data of RNA sequences from the Genetic European Variation in Health and Disease (GEUVADIS) and genotypes from 1000 GENOMES of 344 lymphoblastoid cell lines (LCLs), to determine the expression quantitative trait loci (eQTLs) in the region. We found that the migraine-associated variant belongs to a linkage disequilibrium block associated with the expression of an acyl-coenzyme A synthetase 5 (ACSL5) transcript lacking exon 20 (ACSL5-Δ20). We showed by exon-skipping assay a direct causality of rs2256368-G in the exon 20 skipping of approximately 20 to 40% of ACSL5 RNA molecules. In conclusion, we identified the functional variant (rs2256368:A>G) affecting ACSL5 exon 20 skipping, as a causal factor linked to the migraine-associated rs12355831:A>G, suggesting that the activation of long-chain fatty acids by the spliced ACSL5-Δ20 molecules, a mitochondrially located enzyme, is involved in migraine pathology. PMID:27189022

  15. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE PAGES

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes.
However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.

  16. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes.
However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.

  17. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

    PubMed

    Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

    2016-11-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
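
    The idea of drawing random coding sequences that preserve a protein while steering GC content can be sketched with exponentially weighted codon sampling. For each residue, a synonymous codon is drawn with probability proportional to exp(gc_bias × GC count), which is the general form a maximum-entropy distribution takes under a constraint on expected GC content. This is an illustrative sketch and not the NullSeq package or its API; the parameter `gc_bias` and all function names are hypothetical, and hitting a specific GC target would additionally require solving for the bias value.

```python
import math
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_count(codon):
    """Number of G or C bases in a codon."""
    return sum(b in "GC" for b in codon)


def random_cds(protein, gc_bias=0.0, rng=random):
    """Draw a random coding sequence for `protein`.

    gc_bias = 0 samples synonymous codons uniformly; positive values favor
    GC-rich codons and negative values favor AT-rich codons.
    """
    codons = []
    for aa in protein:
        options = SYNONYMS[aa]
        weights = [math.exp(gc_bias * gc_count(c)) for c in options]
        codons.append(rng.choices(options, weights=weights, k=1)[0])
    return "".join(codons)


def translate(cds):
    """Translate a coding sequence back to one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


if __name__ == "__main__":
    cds = random_cds("MKWLVAG", gc_bias=1.0, rng=random.Random(0))
    print(cds, translate(cds))
```

    Because only synonymous codons are exchanged, translating any sampled sequence returns the input protein, so the amino acid constraint is satisfied exactly while the GC content is controlled statistically.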

  18. RNAcentral: A comprehensive database of non-coding RNA sequences

    DOE PAGES

    Williams, Kelly Porter; Lau, Britney Yan

    2016-10-28

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality.

  19. RNAcentral: A comprehensive database of non-coding RNA sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Kelly Porter; Lau, Britney Yan

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality.

  20. A specific indel marker for the Philippines Schistosoma japonicum revealed by analysis of mitochondrial genome sequences.

    PubMed

    Li, Juan; Chen, Fen; Sugiyama, Hiromu; Blair, David; Lin, Rui-Qing; Zhu, Xing-Quan

    2015-07-01

    In the present study, near-complete mitochondrial (mt) genome sequences for Schistosoma japonicum from different regions in the Philippines and Japan were amplified and sequenced. Comparisons among S. japonicum from the Philippines, Japan, and China revealed a geographically based length difference in mt genomes, but the mt genomic organization and gene arrangement were the same. Sequence differences among samples from the Philippines and all samples from the three endemic areas were 0.57-2.12 and 0.76-3.85 %, respectively. The most variable part of the mt genome was the non-coding region. In the coding portion of the genome, protein-coding genes varied more than rRNA genes and tRNAs. The near-complete mt genome sequences for Philippine specimens were identical in length (14,091 bp) which was 4 bp longer than those of S. japonicum samples from Japan and China. This indel provides a unique genetic marker for S. japonicum samples from the Philippines. Phylogenetic analyses based on the concatenated amino acids of 12 protein-coding genes showed that samples of S. japonicum clustered according to their geographical origins. The identified mitochondrial indel marker will be useful for tracing the source of S. japonicum infection in humans and animals in Southeast Asia.

  1. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

    PubMed Central

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-01-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073

  2. Code-modulated visual evoked potentials using fast stimulus presentation and spatiotemporal beamformer decoding.

    PubMed

    Wittevrongel, Benjamin; Van Wolputte, Elia; Van Hulle, Marc M

    2017-11-08

    When encoding visual targets using various lagged versions of a pseudorandom binary sequence of luminance changes, the EEG signal recorded over the viewer's occipital pole exhibits so-called code-modulated visual evoked potentials (cVEPs), the phase lags of which can be tied to these targets. The cVEP paradigm has enjoyed interest in the brain-computer interfacing (BCI) community for the reported high information transfer rates (ITR, in bits/min). In this study, we introduce a novel decoding algorithm based on spatiotemporal beamforming, and show that this algorithm is able to accurately identify the gazed target. Especially for a small number of repetitions of the coding sequence, our beamforming approach significantly outperforms an optimised support vector machine (SVM)-based classifier, which is considered state-of-the-art in cVEP-based BCI. In addition to the traditional 60 Hz stimulus presentation rate for the coding sequence, we also explore the 120 Hz rate, and show that the latter enables faster communication, with a maximal median ITR of 172.87 bits/min. Finally, we also report on a transition effect in the EEG signal following the onset of the stimulus sequence, and recommend to exclude the first 150 ms of the trials from decoding when relying on a single presentation of the stimulus sequence.
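
    The lag-based target coding described above can be illustrated with a much simpler decoder than the one the authors propose. The sketch below is not the spatiotemporal beamformer from the study; it assumes a single noisy channel, a 63-sample pseudorandom code drawn from a seeded generator, and hypothetical names (`circular_shift`, `decode_lag`), and it picks the target whose lagged code correlates best with the response.

```python
import random


def circular_shift(seq, lag):
    """Return a copy of seq delayed circularly by `lag` samples."""
    lag %= len(seq)
    return seq[-lag:] + seq[:-lag] if lag else list(seq)


def decode_lag(response, code, candidate_lags):
    """Pick the candidate lag whose shifted code correlates best with the response."""
    def score(lag):
        template = circular_shift(code, lag)
        return sum(r * t for r, t in zip(response, template))
    return max(candidate_lags, key=score)


# Hypothetical setup: one +/-1 code, 32 targets spaced two samples apart.
rng = random.Random(0)
code = [rng.choice((-1.0, 1.0)) for _ in range(63)]
candidate_lags = list(range(0, 63, 2))

# Simulated response to the target at lag 14, with additive Gaussian noise.
true_lag = 14
response = [s + rng.gauss(0.0, 0.5) for s in circular_shift(code, true_lag)]
decoded = decode_lag(response, code, candidate_lags)
```

    In this toy setting `decoded` should match `true_lag`; real EEG would need multichannel spatial filtering and averaged templates, which is where a beamformer or SVM comes in.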

  3. Exome sequencing and arrayCGH detection of gene sequence and copy number variation between ILS and ISS mouse strains.

    PubMed

    Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M

    2014-06-01

    It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. 
    The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to contribute to the alcohol-related phenotypic differences associated with these strains.

  4. Identification of Putative Nuclear Receptors and Steroidogenic Enzymes in Murray-Darling Rainbowfish (Melanotaenia fluviatilis) Using RNA-Seq and De Novo Transcriptome Assembly.

    PubMed

    Bain, Peter A; Papanicolaou, Alexie; Kumar, Anupama

    2015-01-01

    Murray-Darling rainbowfish (Melanotaenia fluviatilis [Castelnau, 1878]; Atheriniformes: Melanotaeniidae) is a small-bodied teleost currently under development in Australasia as a test species for aquatic toxicological studies. To date, efforts towards the development of molecular biomarkers of contaminant exposure have been hindered by the lack of available sequence data. To address this, we sequenced messenger RNA from brain, liver and gonads of mature male and female fish and generated a high-quality draft transcriptome using a de novo assembly approach. 149,742 clusters of putative transcripts were obtained, encompassing 43,841 non-redundant protein-coding regions. Deduced amino acid sequences were annotated by functional inference based on similarity with sequences from manually curated protein sequence databases. The draft assembly contained protein-coding regions homologous to 95.7% of the complete cohort of predicted proteins from the taxonomically related species, Oryzias latipes (Japanese medaka). The mean length of rainbowfish protein-coding sequences relative to their medaka homologues was 92.1%, indicating that despite the limited number of tissues sampled a large proportion of the total expected number of protein-coding genes was captured in the study. Because of our interest in the effects of environmental contaminants on endocrine pathways, we manually curated subsets of coding regions for putative nuclear receptors and steroidogenic enzymes in the rainbowfish transcriptome, revealing 61 candidate nuclear receptors encompassing all known subfamilies, and 41 putative steroidogenic enzymes representing all major steroidogenic enzymes occurring in teleosts. The transcriptome presented here will be a valuable resource for researchers interested in biomarker development, protein structure and function, and contaminant-response genomics in Murray-Darling rainbowfish.

  5. Applications of statistical physics and information theory to the analysis of DNA sequences

    NASA Astrophysics Data System (ADS)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
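
    A minimal sketch of the mutual information function discussed above, I(k) = sum over base pairs (a, b) of p_ab(k) log2[p_ab(k) / (p_a p_b)], where p_ab(k) is the frequency of finding a and b separated by k positions. The toy "coding-like" sequence is an assumption made for illustration (only the first codon position is compositionally biased); it is not the pseudo-exon model from the thesis, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(seq, k):
    """Mutual information (in bits) between symbols k positions apart."""
    pairs = Counter(zip(seq, seq[k:]))
    n = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    # p_ab / (p_a * p_b) simplifies to c * n / (left[a] * right[b]).
    return sum((c / n) * math.log2(c * n / (left[a] * right[b]))
               for (a, b), c in pairs.items())


def toy_coding_sequence(n_codons, rng):
    """Toy model: codon position 1 is 70% A, positions 2 and 3 are uniform."""
    biased = "AAAAAAACGT"
    return "".join(rng.choice(biased) + rng.choice("ACGT") + rng.choice("ACGT")
                   for _ in range(n_codons))


rng = random.Random(1)
seq = toy_coding_sequence(6000, rng)
profile = {k: mutual_information(seq, k) for k in range(1, 7)}
```

    Under this toy model `profile[3]` and `profile[6]` should stand out above their neighbours, which is the kind of period-three oscillation the abstract refers to.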

  6. Partial sequence homogenization in the 5S multigene families may generate sequence chimeras and spurious results in phylogenetic reconstructions.

    PubMed

    Galián, José A; Rosato, Marcela; Rosselló, Josep A

    2014-03-01

    Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Many evolutionary studies that used the nuclear 5S rDNA gene family rarely used contiguous 5S coding sequences due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within arrays sequence homogenization is partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.

  7. Shared prefetching to reduce execution skew in multi-threaded systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eichenberger, Alexandre E; Gunnels, John A

    Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.

  8. Whole Exome Sequencing Identifies Rare Protein-Coding Variants in Behçet's Disease.

    PubMed

    Ognenovski, Mikhail; Renauer, Paul; Gensterblum, Elizabeth; Kötter, Ina; Xenitidis, Theodoros; Henes, Jörg C; Casali, Bruno; Salvarani, Carlo; Direskeneli, Haner; Kaufman, Kenneth M; Sawalha, Amr H

    2016-05-01

    Behçet's disease (BD) is a systemic inflammatory disease with an incompletely understood etiology. Despite the identification of multiple common genetic variants associated with BD, rare genetic variants have been less explored. We undertook this study to investigate the role of rare variants in BD by performing whole exome sequencing in BD patients of European descent. Whole exome sequencing was performed in a discovery set comprising 14 German BD patients of European descent. For replication and validation, Sanger sequencing and Sequenom genotyping were performed in the discovery set and in 2 additional independent sets of 49 German BD patients and 129 Italian BD patients of European descent. Genetic association analysis was then performed in BD patients and 503 controls of European descent. Functional effects of associated genetic variants were assessed using bioinformatic approaches. Using whole exome sequencing, we identified 77 rare variants (in 74 genes) with predicted protein-damaging effects in BD. These variants were genotyped in 2 additional patient sets and then analyzed to reveal significant associations with BD at 2 genetic variants detected in all 3 patient sets that remained significant after Bonferroni correction. We detected genetic association between BD and LIMK2 (rs149034313), involved in regulating cytoskeletal reorganization, and between BD and NEIL1 (rs5745908), involved in base excision DNA repair (P = 3.22 × 10^-4 and P = 5.16 × 10^-4, respectively). The LIMK2 association is a missense variant with predicted protein damage that may influence functional interactions with proteins involved in cytoskeletal regulation by Rho GTPase, inflammation mediated by chemokine and cytokine signaling pathways, T cell activation, and angiogenesis (Bonferroni-corrected P = 5.63 × 10^-14, P = 7.29 × 10^-6, P = 1.15 × 10^-5, and P = 6.40 × 10^-3, respectively). The genetic association in NEIL1 is a predicted splice donor variant that may introduce a deleterious intron retention and result in a noncoding transcript variant. We used whole exome sequencing in BD for the first time and identified 2 rare putative protein-damaging genetic variants associated with this disease. These genetic variants might influence cytoskeletal regulation and DNA repair mechanisms in BD and might provide further insight into increased leukocyte tissue infiltration and the role of oxidative stress in BD. © 2016, American College of Rheumatology.

  9. Beyond barcoding: a mitochondrial genomics approach to molecular phylogenetics and diagnostics of blowflies (Diptera: Calliphoridae).

    PubMed

    Nelson, Leigh A; Lambkin, Christine L; Batterham, Philip; Wallman, James F; Dowton, Mark; Whiting, Michael F; Yeates, David K; Cameron, Stephen L

    2012-12-15

    Members of the Calliphoridae (blowflies) are significant for medical and veterinary management, due to the ability of some species to consume living flesh as larvae, and for forensic investigations due to the ability of others to develop in corpses. Due to the difficulty of accurately identifying larval blowflies to species there is a need for DNA-based diagnostics for this family, however the widely used DNA-barcoding marker, cox1, has been shown to fail for several groups within this family. Additionally, many phylogenetic relationships within the Calliphoridae are still unresolved, particularly deeper level relationships. Sequencing whole mt genomes has been demonstrated both as an effective method for identifying the most informative diagnostic markers and for resolving phylogenetic relationships. Twenty-seven complete, or nearly so, mt genomes were sequenced representing 13 species, seven genera and four calliphorid subfamilies and a member of the related family Tachinidae. PCR and sequencing primers developed for sequencing one calliphorid species could be reused to sequence related species within the same superfamily with success rates ranging from 61% to 100%, demonstrating the speed and efficiency with which an mt genome dataset can be assembled. Comparison of molecular divergences for each of the 13 protein-coding genes and 2 ribosomal RNA genes, at a range of taxonomic scales identified novel targets for developing as diagnostic markers which were 117-200% more variable than the markers which have been used previously in calliphorids. Phylogenetic analysis of whole mt genome sequences resulted in much stronger support for family and subfamily-level relationships. The Calliphoridae are polyphyletic, with the Polleninae more closely related to the Tachinidae, and the Sarcophagidae are the sister group of the remaining calliphorids. 
    Within the Calliphoridae, there was strong support for the monophyly of the Chrysomyinae and Luciliinae and for the sister-grouping of Luciliinae with Calliphorinae. Relationships within Chrysomya were not well resolved. Whole mt genome data supported the previously demonstrated paraphyly of Lucilia cuprina with respect to L. sericata and allowed us to conclude that it is due to hybrid introgression prior to the last common ancestor of modern sericata populations, rather than due to recent hybridisation, nuclear pseudogenes or incomplete lineage sorting. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. The evolution of transcriptional regulation in eukaryotes

    NASA Technical Reports Server (NTRS)

    Wray, Gregory A.; Hahn, Matthew W.; Abouheif, Ehab; Balhoff, James P.; Pizer, Margaret; Rockman, Matthew V.; Romano, Laura A.

    2003-01-01

    Gene expression is central to the genotype-phenotype relationship in all organisms, and it is an important component of the genetic basis for evolutionary change in diverse aspects of phenotype. However, the evolution of transcriptional regulation remains understudied and poorly understood. Here we review the evolutionary dynamics of promoter, or cis-regulatory, sequences and the evolutionary mechanisms that shape them. Existing evidence indicates that populations harbor extensive genetic variation in promoter sequences, that a substantial fraction of this variation has consequences for both biochemical and organismal phenotype, and that some of this functional variation is sorted by selection. As with protein-coding sequences, rates and patterns of promoter sequence evolution differ considerably among loci and among clades for reasons that are not well understood. Studying the evolution of transcriptional regulation poses empirical and conceptual challenges beyond those typically encountered in analyses of coding sequence evolution: promoter organization is much less regular than that of coding sequences, and sequences required for the transcription of each locus reside at multiple other loci in the genome. Because of the strong context-dependence of transcriptional regulation, sequence inspection alone provides limited information about promoter function. Understanding the functional consequences of sequence differences among promoters generally requires biochemical and in vivo functional assays. Despite these challenges, important insights have already been gained into the evolution of transcriptional regulation, and the pace of discovery is accelerating.

  11. Convolutional encoding of self-dual codes

    NASA Technical Reports Server (NTRS)

    Solomon, G.

    1994-01-01

    There exist almost complete convolutional encodings of self-dual codes, i.e., block codes of rate 1/2 with weights w, w = 0 mod 4. The codes are of length 8m with the convolutional portion of length 8m-2 and the nonsystematic information of length 4m-1. The last two bits are parity checks on the two (4m-1) length parity sequences. The final information bit complements one of the extended parity sequences of length 4m. Solomon and van Tilborg have developed algorithms to generate these for the Quadratic Residue (QR) Codes of lengths 48 and beyond. For these codes and reasonable constraint lengths, there are sequential decodings for both hard and soft decisions. There are also possible Viterbi-type decodings that may be simple, as in a convolutional encoding/decoding of the extended Golay Code. In addition, the previously found constraint length K = 9 for the QR (48, 24;12) Code is lowered here to K = 8.
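
    To make the terms "rate 1/2" and "constraint length K" concrete, here is a minimal sketch of a generic feed-forward convolutional encoder. It is not the Solomon and van Tilborg construction for self-dual or Quadratic Residue codes; the generator polynomials (octal 7 and 5, K = 3) are a common textbook choice assumed here for illustration, and `conv_encode` is a hypothetical name.

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_length=3):
    """Rate-1/len(generators) feed-forward convolutional encoder.

    The shift register holds the newest input bit in its least significant
    position; each generator selects the register taps that are XORed together.
    """
    mask = (1 << constraint_length) - 1
    state = 0
    out = []
    for bit in bits:
        state = ((state << 1) | bit) & mask
        for g in generators:
            # Parity of the tapped register bits.
            out.append(bin(state & g).count("1") % 2)
    return out


message = [1, 0, 1, 1]
codeword = conv_encode(message)
```

    Each input bit yields two output bits, so the rate is 1/2, and each output depends on the current bit and the two before it, so K = 3. The abstract's K = 8 and K = 9 refer to much longer registers than this toy case.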

  12. High incidence of unrecognized visceral/neurological late-onset Niemann-Pick disease, type C1, predicted by analysis of massively parallel sequencing data sets.

    PubMed

    Wassif, Christopher A; Cross, Joanna L; Iben, James; Sanchez-Pulido, Luis; Cougnoux, Antony; Platt, Frances M; Ory, Daniel S; Ponting, Chris P; Bailey-Wilson, Joan E; Biesecker, Leslie G; Porter, Forbes D

    2016-01-01

    Niemann-Pick disease type C (NPC) is a recessive, neurodegenerative, lysosomal storage disease caused by mutations in either NPC1 or NPC2. The diagnosis is difficult and frequently delayed. Ascertainment is likely incomplete because of both these factors and because the full phenotypic spectrum may not have been fully delineated. Given the recent development of a blood-based diagnostic test and the development of potential therapies, understanding the incidence of NPC and defining at-risk patient populations are important. We evaluated data from four large, massively parallel exome sequencing data sets. Variant sequences were identified and classified as pathogenic or nonpathogenic based on a combination of literature review and bioinformatic analysis. This methodology provided an unbiased approach to determining the allele frequency. Our data suggest an incidence rate for NPC1 and NPC2 of 1/92,104 and 1/2,858,998, respectively. Evaluation of common NPC1 variants, however, suggests that there may be a late-onset NPC1 phenotype with a markedly higher incidence, on the order of 1/19,000-1/36,000. We determined a combined incidence of classical NPC of 1/89,229, or 1.12 affected patients per 100,000 conceptions, but predict incomplete ascertainment of a late-onset phenotype of NPC1. This finding strongly supports the need for increased screening of potential patients.
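
    The combined figure quoted above appears to be consistent with simply adding the two gene-specific incidences, which assumes that NPC1 and NPC2 disease are mutually exclusive causes. The short check below reproduces that arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

# Gene-specific incidences reported in the abstract.
npc1 = Fraction(1, 92_104)
npc2 = Fraction(1, 2_858_998)

# Assumption: the two causes do not overlap, so incidences add.
combined = npc1 + npc2

one_in = float(1 / combined)            # "1 in N" form
per_100k = float(combined * 100_000)    # affected per 100,000 conceptions

print(f"combined incidence: 1/{one_in:,.0f} ({per_100k:.2f} per 100,000)")
```

    The result rounds to 1/89,229, or 1.12 per 100,000, matching the reported combined incidence of classical NPC.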

  13. Gene and translation initiation site prediction in metagenomic sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.

  14. Complete genome sequencing and analysis of Saprospira grandis str. Lewin, a predatory marine bacterium

    PubMed Central

    Saw, Jimmy H. W.; Yuryev, Anton; Kanbe, Masaomi; Hou, Shaobin; Young, Aaron G.; Aizawa, Shin-Ichi

    2012-01-01

    Saprospira grandis is a coastal marine bacterium that can capture and prey upon other marine bacteria using a mechanism known as ‘ixotrophy’. Here, we present the complete genome sequence of Saprospira grandis str. Lewin isolated from La Jolla beach in San Diego, California. The complete genome sequence comprises a chromosome of 4.35 Mbp and a plasmid of 54.9 Kbp. Genome analysis revealed incomplete pathways for the biosynthesis of nine essential amino acids but presence of a large number of peptidases. The genome encodes multiple copies of sensor globin-coupled rsbR genes thought to be essential for stress response and the presence of such sensor globins in Bacteroidetes is unprecedented. A total of 429 spacer sequences within the three CRISPR repeat regions were identified in the genome and this number is the largest among all the Bacteroidetes sequenced to date. PMID:22675601

  15. Sequencing-based diagnostics for pediatric genetic diseases: progress and potential

    PubMed Central

    Tayoun, Ahmad Abou; Krock, Bryan; Spinner, Nancy B.

    2016-01-01

    Introduction: The last two decades have witnessed revolutionary changes in clinical diagnostics, fueled by the Human Genome Project and advances in high throughput, Next Generation Sequencing (NGS). We review the current state of sequencing-based pediatric diagnostics, associated challenges, and future prospects. Areas Covered: We present an overview of genetic disease in children, review the technical aspects of Next Generation Sequencing and the strategies to make molecular diagnoses for children with genetic disease. We discuss the challenges of genomic sequencing including incomplete current knowledge of variants, lack of data about certain genomic regions, mosaicism, and the presence of regions with high homology. Expert Commentary: NGS has been a transformative technology and the gap between the research and clinical communities has never been so narrow. Therapeutic interventions are emerging based on genomic findings and the applications of NGS are progressing to prenatal genetics, epigenomics and transcriptomics. PMID:27388938

  16. Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.

    PubMed

    Hua, Wei; Wang, Jiasong; Zhao, Jian

    2014-01-01

    Based on the study of Ramanujan sum and Ramanujan coefficient, this paper suggests the concepts of discrete Ramanujan transform and spectrum. Using Voss numerical representation, one maps a symbolic DNA strand to a numerical DNA sequence, and deduces the discrete Ramanujan spectrum of the numerical DNA sequence. It is well known that the discrete Fourier power spectrum of a protein coding sequence has an important feature of 3-base periodicity, which is widely used for DNA sequence analysis by the technique of discrete Fourier transform. The analysis is performed by testing the signal-to-noise ratio at frequency N/3 as a criterion, where N is the length of the sequence. The results presented in this paper show that the property of 3-base periodicity can be only identified as a prominent spike of the discrete Ramanujan spectrum at period 3 for the protein coding regions. The signal-to-noise ratio for discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing the protein coding regions from the noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested and the histograms and tables from the computational results illustrate the reliability of our method. In addition, our theoretical analysis concludes that the algorithm for calculating the discrete Ramanujan spectrum has lower computational complexity and higher computational accuracy. The computational experiments show that the technique of using the discrete Ramanujan spectrum for classifying different DNA sequences is a fast and effective method. Copyright © 2014 Elsevier Ltd. All rights reserved.
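
    The Fourier baseline mentioned above (Voss indicator sequences plus a signal-to-noise test at frequency N/3) can be sketched as follows. This illustrates the conventional DFT criterion, not the discrete Ramanujan transform proposed in the paper. The normalisation (mean power over the non-zero frequencies), the toy sequences, and the name `period3_snr` are assumptions made for this example.

```python
import cmath
import random


def period3_snr(seq):
    """Power at frequency N/3 divided by the mean power over k = 1..N-1.

    Uses the Voss representation: one 0/1 indicator sequence per base.
    """
    n = len(seq)
    if n == 0 or n % 3 != 0 or set(seq) - set("ACGT"):
        raise ValueError("expects a non-empty ACGT string whose length is a multiple of 3")
    # At k = N/3 the DFT kernel exp(-2*pi*i*k*j/N) depends only on j mod 3.
    w = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]
    peak = 0.0
    dc = 0.0
    for base in "ACGT":
        indicator = [1 if c == base else 0 for c in seq]
        coeff = sum(u * w[j % 3] for j, u in enumerate(indicator))
        peak += abs(coeff) ** 2
        dc += sum(indicator) ** 2
    # Parseval: the total power over all N frequencies and four bases is N * N,
    # so removing the k = 0 term gives the sum over k = 1..N-1 without a full DFT.
    mean_power = (n * n - dc) / (n - 1)
    return peak / mean_power if mean_power else 0.0


rng = random.Random(2)
# Toy "coding-like" sequence: codon position 1 is 70% A (hypothetical bias).
coding_like = "".join(rng.choice("AAAAAAACGT") + rng.choice("ACGT") + rng.choice("ACGT")
                      for _ in range(1000))
uniform = "".join(rng.choice("ACGT") for _ in range(3000))
snr_coding = period3_snr(coding_like)
snr_uniform = period3_snr(uniform)
```

    With these toy inputs `snr_coding` should be far larger than `snr_uniform`, which stays near 1. A threshold between the two is the sort of criterion the abstract describes for separating coding from noncoding regions.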

  17. The Evolution of Bony Vertebrate Enhancers at Odds with Their Coding Sequence Landscape.

    PubMed

    Yousaf, Aisha; Sohail Raza, Muhammad; Ali Abbasi, Amir

    2015-08-06

    Enhancers lie at the heart of transcriptional and developmental gene regulation. Therefore, changes in enhancer sequences usually disrupt the target gene expression and result in disease phenotypes. Despite the well-established role of enhancers in development and disease, evolutionary sequence studies are lacking. The current study attempts to unravel the puzzle of bony vertebrates' conserved noncoding elements (CNE) enhancer evolution. Bayesian phylogenetics of enhancer sequences spotlights promising interordinal relationships among placental mammals, proposing a closer relationship between humans and laurasiatherians while placing rodents at the basal position. Clock-based estimates of enhancer evolution provided a dynamic picture of interspecific rate changes across the bony vertebrate lineage. Moreover, coelacanth in the study augmented our appreciation of the vertebrate cis-regulatory evolution during water-land transition. Intriguingly, we observed a pronounced upsurge in enhancer evolution in land-dwelling vertebrates. These novel findings triggered us to further investigate the evolutionary trend of coding as well as CNE nonenhancer repertoires, to highlight the relative evolutionary dynamics of diverse genomic landscapes. Surprisingly, the evolutionary rates of enhancer sequences were clearly at odds with those of the coding and the CNE nonenhancer sequences during vertebrate adaptation to land, with land vertebrates exhibiting significantly reduced rates of coding sequence evolution in comparison to their fast evolving regulatory landscape. The observed variation in tetrapod cis-regulatory elements caused the fine-tuning of associated gene regulatory networks. Therefore, the increased evolutionary rate of tetrapods' enhancer sequences might be responsible for the variation in developmental regulatory circuits during the process of vertebrate adaptation to land. © The Author(s) 2015. 
    Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Single nucleotide polymorphisms in common bean: their discovery and genotyping using a multiplex detection system

    USDA-ARS's Scientific Manuscript database

    Single-nucleotide Polymorphism (SNP) markers are by far the most common form of DNA polymorphism in a genome. The objectives of this study were to discover SNPs in common bean comparing sequences from coding and non-coding regions obtained from Genbank and genomic DNA and to compare sequencing resu...

  19. Characterization of apple replant disease-associated microbial communities over multiple growth periods using next-generation sequencing

    USDA-ARS's Scientific Manuscript database

    Replant disease in apple occurs as a result of incompletely understood and variable complexes of soil-borne pathogens that can build up over time in orchard soil. This disease limits economic viability of newly established orchards on replant sites and results in reduced productivity for the life of...

  20. Specific and Modular Binding Code for Cytosine Recognition in Pumilio/FBF (PUF) RNA-binding Domains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dong, Shuyun; Wang, Yang; Cassidy-Amstutz, Caleb

    2011-10-28

    Pumilio/fem-3 mRNA-binding factor (PUF) proteins possess a recognition code for bases A, U, and G, allowing designed RNA sequence specificity of their modular Pumilio (PUM) repeats. However, recognition side chains in a PUM repeat for cytosine are unknown. Here we report identification of a cytosine-recognition code by screening random amino acid combinations at conserved RNA recognition positions using a yeast three-hybrid system. This C-recognition code is specific and modular as specificity can be transferred to different positions in the RNA recognition sequence. A crystal structure of a modified PUF domain reveals specific contacts between an arginine side chain and the cytosine base. We applied the C-recognition code to design PUF domains that recognize targets with multiple cytosines and to generate engineered splicing factors that modulate alternative splicing. Finally, we identified a divergent yeast PUF protein, Nop9p, that may recognize natural target RNAs with cytosine. This work deepens our understanding of natural PUF protein target recognition and expands the ability to engineer PUF domains to recognize any RNA sequence.

  1. Novel numerical and graphical representation of DNA sequences and proteins.

    PubMed

    Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D

    2006-12-01

    We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.

  2. Ovine mitochondrial DNA sequence variation and its association with production and reproduction traits within an Afec-Assaf flock.

    PubMed

    Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E

    2012-07-01

    Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full length mitochondria sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondria sequences) revealed sequence variation in 10 out of the 13 protein coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein coding sequences represent non-synonymous mutations. Sequence variation was observed also in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lamb (n = 1286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lambs perinatal survival rate, birth weight, and daily growth rate of lambs up to 150 d that averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lamb born/ewe lambing for the HA, HB, and the HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.

  3. Sense-antisense (complementary) peptide interactions and the proteomic code; potential opportunities in biology and pharmaceutical science.

    PubMed

    Miller, Andrew D

    2015-02-01

    A sense peptide can be defined as a peptide whose sequence is coded by the nucleotide sequence (read 5' → 3') of the sense (positive) strand of DNA. Conversely, an antisense (complementary) peptide is coded by the corresponding nucleotide sequence (read 5' → 3') of the antisense (negative) strand of DNA. Research has been accumulating steadily to suggest that sense peptides are capable of specific interactions with their corresponding antisense peptides. Unfortunately, although more and more examples of specific sense-antisense peptide interactions are emerging, the very idea of such interactions does not conform to standard biology dogma and so there remains a sizeable challenge to lift this concept from being perceived as a peripheral phenomenon if not worse, into becoming part of the scientific mainstream. Specific interactions have now been exploited for the inhibition of a number of widely different protein-protein and protein-receptor interactions in vitro and in vivo. Further, antisense peptides have also been used to induce the production of antibodies targeted to specific receptors or else the production of anti-idiotypic antibodies targeted against auto-antibodies. Such illustrations of utility would seem to suggest that observed sense-antisense peptide interactions are not just the consequence of a sequence of coincidental 'lucky-hits'. Indeed, at the very least, one might conclude that sense-antisense peptide interactions represent a potentially new and different source of leads for drug discovery. But could there be more to come from studies in this area? Studies on the potential mechanism of sense-antisense peptide interactions suggest that interactions may be driven by amino acid residue interactions specified from the genetic code. If so, such specified amino acid residue interactions could form the basis for an even wider amino acid residue interaction code (proteomic code) that links gene sequences to actual protein structure and function, even entire genomes to entire proteomes. The possibility that such a proteomic code should exist is discussed. So too the potential implications for biology and pharmaceutical science are also discussed were such a code to exist.
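
    The definition of sense and antisense peptides given in this abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a DNA string written 5' → 3'; the function names and the example sequence are hypothetical and do not come from the article.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the antisense strand, written 5' -> 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def sense_and_antisense_peptides(sense_strand):
    """Peptides coded by the sense strand and by its antisense strand."""
    return translate(sense_strand), translate(reverse_complement(sense_strand))


# Hypothetical 9-base sense strand, chosen only for illustration.
print(sense_and_antisense_peptides("ATGGCCAAG"))  # ('MAK', 'LGH')
```

    Reading both strands 5' → 3' means that the first sense codon pairs with the last antisense codon, which is why the antisense peptide is listed in reverse order relative to the sense residues it faces.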

  4. Validity of data in the Danish Colorectal Cancer Screening Database

    PubMed Central

    Thomsen, Mette Kielsholm; Njor, Sisse Helle; Rasmussen, Morten; Linnemann, Dorte; Andersen, Berit; Baatrup, Gunnar; Friis-Hansen, Lennart Jan; Jørgensen, Jens Christian Riis; Mikkelsen, Ellen Margrethe

    2017-01-01

    Background In Denmark, a nationwide screening program for colorectal cancer was implemented in March 2014. Along with this, a clinical database for program monitoring and research purposes was established. Objective The aim of this study was to estimate the agreement and validity of diagnosis and procedure codes in the Danish Colorectal Cancer Screening Database (DCCSD). Methods All individuals with a positive immunochemical fecal occult blood test (iFOBT) result who were invited to screening in the first 3 months since program initiation were identified. From these, a sample of 150 individuals was selected using stratified random sampling by age, gender and region of residence. Data from the DCCSD were compared with data from hospital records, which were used as the reference. Agreement, sensitivity, specificity and positive and negative predictive values were estimated for categories of codes “clean colon”, “colonoscopy performed”, “overall completeness of colonoscopy”, “incomplete colonoscopy”, “polypectomy”, “tumor tissue left behind”, “number of polyps”, “lost polyps”, “risk group of polyps” and “colorectal cancer and polyps/benign tumor”. Results Hospital records were available for 136 individuals. Agreement was highest for “colorectal cancer” (97.1%) and lowest for “lost polyps” (88.2%). Sensitivity varied between moderate and high, with 60.0% for “incomplete colonoscopy” and 98.5% for “colonoscopy performed”. Specificity was 92.7% or above, except for the categories “colonoscopy performed” and “overall completeness of colonoscopy”, where the specificity was low; however, the estimates were imprecise. Conclusion A high level of agreement between categories of codes in DCCSD and hospital records indicates that DCCSD reflects the hospital records well. Further, the validity of the categories of codes varied from moderate to high. Thus, the DCCSD may be a valuable data source for future research on colorectal cancer screening. PMID:28255255
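
    The validity measures named in this abstract all follow from a 2x2 comparison of database codes against the reference standard. The Python sketch below shows the standard formulas; the function name and the counts in the example are invented for illustration and are not the study's data.

```python
def validity_measures(tp, fp, fn, tn):
    """Agreement and validity measures from a 2x2 table.

    tp: code present in database and in reference (hospital record)
    fp: code present in database, absent in reference
    fn: code absent in database, present in reference
    tn: code absent in both
    Assumes no row or column of the table is empty.
    """
    total = tp + fp + fn + tn
    return {
        "agreement": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Invented counts, for illustration only.
for name, value in validity_measures(tp=40, fp=5, fn=10, tn=45).items():
    print(f"{name}: {value:.1%}")
```

    With small cell counts, as reported for some categories in the abstract, these point estimates become imprecise, so confidence intervals would normally accompany them.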

  5. Missed surgical intensive care unit billing: potential financial impact of 24/7 faculty presence.

    PubMed

    Hendershot, Kimberly M; Bollins, John P; Armen, Scott B; Thomas, Yalaunda M; Steinberg, Steven M; Cook, Charles H

    2009-07-01

    To efficiently capture evaluation and management (E&M) and procedural billing in our surgical intensive care unit (SICU), we have developed an electronic billing system that links to the electronic medical record (EMR). In this system, only notes electronically signed and coded by an attending generate billing charges. We hypothesized that capture of missed billing during nighttime and weekends might be sufficient to subsidize 24/7 in-house attending coverage. A retrospective review of the EMRs was performed for all SICU patients during a 2-month period. Note type, date, time, attending signature, and coding were analyzed. Notes without attending signature, diagnosis, or current procedural terminology (CPT) code were considered incomplete and identified as "missed billing." Four hundred and forty-three patients had 465 admissions generating 2,896 notes. Overall, 76% of notes were signed and coded by an attending and billed. Incomplete (not billed) notes represented an overall missed billing opportunity of $159,138 for the 2-month time period (approximately $954,000 annually). Unbilled E&M encounters during weekdays totaled $54,758, whereas unbilled E&M and procedures from weeknights and weekends totaled $88,408 ($44,566 and $43,842, respectively). Missed billing after-hours thus represents approximately $530K annually, extrapolating to approximately $220K in collections from our payer mix. Surprisingly, missed E&M and procedural billing during weekdays totaled $70,730 (approximately $425K billing, approximately $170K collections annually), and typically represented patients seen, but transferred from the SICU before attending documentation was completed. Capture of nighttime and weekend ICU collections alone may be insufficient to add faculty or incentivize in-house coverage, but could certainly complement other in-house derived revenues to such ends. In addition, missed daytime billing in busy modern ICUs can be substantial, and use of an EMR to identify missed billing opportunities can help create solutions to recover these revenues.
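
    The annual figures quoted in this abstract are consistent with a simple linear extrapolation of the 2-month totals. The Python sketch below reproduces that arithmetic; the linear scaling is an assumption about how the annual estimates were derived, and the variable names are illustrative.

```python
MONTHS_OBSERVED = 2


def annualize(amount, months=MONTHS_OBSERVED):
    """Scale a total observed over `months` months to 12 months (assumed linear)."""
    return amount * 12 / months


# Missed billing in dollars over the 2-month review, as reported in the abstract.
missed = {"weekday": 70_730, "weeknight": 44_566, "weekend": 43_842}

after_hours = missed["weeknight"] + missed["weekend"]
total = sum(missed.values())

print(f"after-hours, 2 months: ${after_hours:,}")           # $88,408
print(f"all missed, 2 months:  ${total:,}")                 # $159,138
print(f"all missed, annual:    ${annualize(total):,.0f}")   # $954,828
print(f"after-hours, annual:   ${annualize(after_hours):,.0f}")        # $530,448
print(f"weekday, annual:       ${annualize(missed['weekday']):,.0f}")  # $424,380
```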

  6. Origins of genes: "big bang" or continuous creation?

    PubMed Central

    Keese, P K; Gibbs, A

    1992-01-01

    Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes. PMID:1329098
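
    Overprinting, as described in this abstract, relies on one nucleotide sequence containing open reading frames in more than one frame. A minimal Python sketch of a forward-strand ORF scan is given below; the function name, the `min_codons` threshold, and the toy sequence are hypothetical and are not taken from the article.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each ATG...stop ORF on the given strand.

    `end` is the index just past the stop codon. `min_codons` counts the
    codons from ATG up to, but not including, the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


# Toy sequence with a frame-1 ORF nested inside a frame-0 ORF.
print(find_orfs("ATGCATGGCATGACCTAA"))  # [(0, 0, 18), (1, 4, 13)]
```

    In the toy sequence, bases 4 to 12 are read as part of two different codon series, which is the situation the abstract calls overlapping genes. A fuller scan would also examine the reverse complement.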

  7. Abstract feature codes: The building blocks of the implicit learning system.

    PubMed

    Eberhardt, Katharina; Esser, Sarah; Haider, Hilde

    2017-07-01

    According to the Theory of Event Coding (TEC; Hommel, Müsseler, Aschersleben, & Prinz, 2001), action and perception are represented in a shared format in the cognitive system by means of feature codes. In implicit sequence learning research, it is still common to make a conceptual difference between independent motor and perceptual sequences. This supposedly independent learning takes place in encapsulated modules (Keele, Ivry, Mayr, Hazeltine, & Heuer 2003) that process information along single dimensions. These dimensions have remained underspecified so far. It is especially not clear whether stimulus and response characteristics are processed in separate modules. Here, we suggest that feature dimensions as they are described in the TEC should be viewed as the basic content of modules of implicit learning. This means that the modules process all stimulus and response information related to certain feature dimensions of the perceptual environment. In 3 experiments, we investigated by means of a serial reaction time task the nature of the basic units of implicit learning. As a test case, we used stimulus location sequence learning. The results show that a stimulus location sequence and a response location sequence cannot be learned without interference (Experiment 2) unless one of the sequences can be coded via an alternative, nonspatial dimension (Experiment 3). These results support the notion that spatial location is one module of the implicit learning system and, consequently, that there are no separate processing units for stimulus versus response locations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  8. Adaptive decoding of convolutional codes

    NASA Astrophysics Data System (ADS)

    Hueske, K.; Geldmacher, J.; Götze, J.

    2007-06-01

    Convolutional codes, which are frequently used as error correction codes in digital transmission systems, are generally decoded using the Viterbi Decoder. On the one hand the Viterbi Decoder is an optimum maximum likelihood decoder, i.e. the most probable transmitted code sequence is obtained. On the other hand the mathematical complexity of the algorithm only depends on the used code, not on the number of transmission errors. To reduce the complexity of the decoding process for good transmission conditions, an alternative syndrome based decoder is presented. The reduction of complexity is realized by two different approaches, the syndrome zero sequence deactivation and the path metric equalization. The two approaches enable an easy adaptation of the decoding complexity for different transmission conditions, which results in a trade-off between decoding complexity and error correction performance.
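
    For reference, the baseline that this abstract compares against can be sketched in a few lines of Python. The example below is a hard-decision Viterbi decoder for a small rate-1/2 code with constraint length 3 and generators (7, 5) in octal; that code and all names are choices made for illustration. It is not the syndrome-based decoder proposed by the authors.

```python
G = (0b111, 0b101)      # generator polynomials (7, 5) in octal, assumed for this sketch
K = 3                   # constraint length
N_STATES = 1 << (K - 1)


def parity(x):
    return bin(x).count("1") & 1


def encode(bits):
    """Rate-1/2 convolutional encoder starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi decoding with a Hamming-distance branch metric."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)
    paths = [[] for _ in range(N_STATES)]
    for t in range(0, len(received), 2):
        pair = received[t:t + 2]
        new_metrics = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | state
                expected = [parity(reg & g) for g in G]
                cost = metrics[state] + sum(e != r for e, r in zip(expected, pair))
                nxt = reg >> 1
                if cost < new_metrics[nxt]:
                    new_metrics[nxt] = cost
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return paths[best]


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # last two zeros flush the encoder
codeword = encode(message)
codeword[3] ^= 1                           # inject a single channel error
print(viterbi_decode(codeword) == message)
```

    Every trellis step visits every state regardless of how many errors occurred, which illustrates the abstract's point that the cost of the Viterbi algorithm depends only on the code and not on the channel conditions.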

  9. Optimization of algorithm of coding of genetic information of Chlamydia

    NASA Astrophysics Data System (ADS)

    Feodorova, Valentina A.; Ulyanov, Sergey S.; Zaytsev, Sergey S.; Saltykov, Yury V.; Ulianova, Onega V.

    2018-04-01

    A new method of coding genetic information using coherent optical fields is developed. A universal technique for transforming the nucleotide sequences of a bacterial gene into a laser speckle pattern is suggested. Reference speckle patterns are generated for the nucleotide sequences of the omp1 gene of typical wild strains of Chlamydia trachomatis of genovars D, E, F, G, J and K, as well as of Chlamydia psittaci serovar I. The algorithm for coding gene information into a speckle pattern is optimized. Fully developed speckles with Gaussian statistics for the gene-based speckle patterns were used as the optimization criterion.

  10. Simulated Assessment of Interference Effects in Direct Sequence Spread Spectrum (DSSS) QPSK Receiver

    DTIC Science & Technology

    2014-03-27

    bit error rate BPSK binary phase shift keying CDMA code division multiple access CSI comb spectrum interference CW continuous wave DPSK differential... CDMA) and GPS systems which is a Gold code. This code is generated by a modulo-2 operation between two different preferred m-sequences. The preferred m... Figure 3.26: Comparison of input SNR_Sim and SNR_Out of the band-pass RF filter (SNR_RF) and
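
    The Gold code construction mentioned in this excerpt, a modulo-2 sum of two m-sequences, can be sketched in Python as follows. The degree-5 polynomials x^5 + x^2 + 1 and x^5 + x^4 + x^3 + x^2 + 1 are both primitive and are commonly quoted as a preferred pair; their use here, the all-ones seed, and the function names are assumptions of this sketch and are not taken from the report.

```python
def m_sequence(exponents, degree):
    """One period of a maximal-length sequence over GF(2).

    `exponents` lists the powers of x below x^degree that appear in the
    feedback polynomial, e.g. (2, 0) for x^5 + x^2 + 1. The recurrence is
    a[n + degree] = XOR of a[n + e] for e in exponents.
    """
    seq = [1] * degree                  # any nonzero seed works
    period = 2 ** degree - 1
    while len(seq) < period:
        n = len(seq) - degree
        seq.append(sum(seq[n + e] for e in exponents) % 2)
    return seq


def gold_code(shift, degree=5):
    """Modulo-2 sum of the first m-sequence and a cyclic shift of the second."""
    a = m_sequence((2, 0), degree)          # x^5 + x^2 + 1
    b = m_sequence((4, 3, 2, 0), degree)    # x^5 + x^4 + x^3 + x^2 + 1
    n = len(a)
    return [a[i] ^ b[(i + shift) % n] for i in range(n)]


print(len(gold_code(0)))                        # 31 chips per period
print("".join(map(str, gold_code(3))))
```

    Each value of `shift` from 0 to 30 gives a different member of the family, which is how a code-division system can assign distinct spreading codes to different users.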

  11. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
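
    Two of the building blocks named in this abstract, DNA coding and the piecewise linear chaotic map, can be sketched in Python. The 2-bit rule (00→A, 01→C, 10→G, 11→T), the way map values are turned into position gaps, and all names are assumptions made for illustration; the paper's actual coding rule, encryption step, and embedding procedure are not reproduced here.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed coding rule
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_dna(message: bytes) -> str:
    """Map each pair of message bits to one nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in message)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_dna(dna: str) -> bytes:
    """Invert encode_dna."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def pwlcm(x: float, p: float) -> float:
    """One step of the piecewise linear chaotic map, for 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x          # the map is symmetric about x = 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def hiding_gaps(x0: float, p: float, count: int, max_gap: int) -> list:
    """Hypothetical use of the map: gaps (1..max_gap) between hiding positions."""
    gaps, x = [], x0
    for _ in range(count):
        x = pwlcm(x, p)
        gaps.append(1 + min(int(x * max_gap), max_gap - 1))
    return gaps


print(encode_dna(b"Hi"))                 # CAGACGGC
print(decode_dna(encode_dna(b"Hi")))     # b'Hi'
print(hiding_gaps(x0=0.37, p=0.23, count=8, max_gap=5))
```

    In a scheme of this kind the initial value `x0` and the control parameter `p` would act as part of the secret key, since the receiver needs them to regenerate the same hiding locations.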

  12. Physics behind the mechanical nucleosome positioning code

    NASA Astrophysics Data System (ADS)

    Zuiddam, Martijn; Everaers, Ralf; Schiessel, Helmut

    2017-11-01

    The positions along DNA molecules of nucleosomes, the most abundant DNA-protein complexes in cells, are influenced by the sequence-dependent DNA mechanics and geometry. This leads to the "nucleosome positioning code", a preference of nucleosomes for certain sequence motifs. Here we introduce a simplified model of the nucleosome where a coarse-grained DNA molecule is frozen into an idealized superhelical shape. We calculate the exact sequence preferences of our nucleosome model and find it to reproduce qualitatively all the main features known to influence nucleosome positions. Moreover, using well-controlled approximations to this model allows us to come to a detailed understanding of the physics behind the sequence preferences of nucleosomes.

  13. EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

    PubMed Central

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-01-01

    EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408

  14. The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus.

    PubMed Central

    Gustafson, G; Armour, S L

    1986-01-01

    The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus (BSMV) has been determined. The sequence is 3289 nucleotides in length and contains four open reading frames (ORFs) which code for proteins of Mr 22,147 (ORF1), Mr 58,098 (ORF2), Mr 17,378 (ORF3), and Mr 14,119 (ORF4). The predicted N-terminal amino acid sequence of the polypeptide encoded by the ORF nearest the 5'-end of the RNA (ORF1) is identical (after the initiator methionine) to the published N-terminal amino acid sequence of BSMV coat protein for 29 of the first 30 amino acids. ORF2 occupies the central portion of the coding region of RNA beta and ORF3 is located at the 3'-end. The ORF4 sequence overlaps the 3'-region of ORF2 and the 5'-region of ORF3 and differs in codon usage from the other three RNA beta ORFs. The coding region of RNA beta is followed by a poly(A) tract and a 238 nucleotide tRNA-like structure which are common to all three BSMV genomic RNAs. Images PMID:3754962

  15. Coding, Organization and Feedback Variables in Motor Skills.

    DTIC Science & Technology

    1982-04-01

    teachers) as anyone else--has been its nondirectional and incompletely conceptualized nature. Those involved in research now are being urged to avoid...functional evaluations. It constitutes more than simply a methodology; it is an ideology for studying ’how things work’ and by its nature draws on many...not necessarily dependent on the physical nature of the system. It furnishes a superstructure for interpreting and comparing input from a multitude of

  16. Misclassification of childhood homicide on death certificates.

    PubMed Central

    Lapidus, G D; Gregorio, D I; Hansen, H

    1990-01-01

    Suspect classification of homicide deaths of Connecticut residents under 20 years of age was noted for 29 percent of cases examined. Misclassification was attributed to incomplete or erroneous information recorded on the death certificates, rather than errors in the designation of ICD-9 homicide codes. The results have important implications in the interpretation of vital statistics when homicide is listed as the cause of death and underscore the value of record linkage systems. PMID:2297072

  17. Colour cyclic code for Brillouin distributed sensors

    NASA Astrophysics Data System (ADS)

    Le Floch, Sébastien; Sauser, Florian; Llera, Miguel; Rochat, Etienne

    2015-09-01

    For the first time, a colour cyclic coding (CCC) is theoretically and experimentally demonstrated for Brillouin optical time-domain analysis (BOTDA) distributed sensors. Compared to traditional intensity-modulated cyclic codes, the code presents an additional gain of √2 while keeping the same number of sequences as for a colour coding. A comparison with a standard BOTDA sensor is realized and validates the theoretical coding gain.

  18. Delineating slowly and rapidly evolving fractions of the Drosophila genome.

    PubMed

    Keith, Jonathan M; Adams, Peter; Stephen, Stuart; Mattick, John S

    2008-05-01

    Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component was species dependent, and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes available in BED format.

  19. New encoded single-indicator sequences based on physico-chemical parameters for efficient exon identification.

    PubMed

    Meher, J K; Meher, P K; Dash, G N; Raval, M K

    2012-01-01

    The first step in gene identification problem based on genomic signal processing is to convert character strings into numerical sequences. These numerical sequences are then analysed spectrally or using digital filtering techniques for the period-3 peaks, which are present in exons (coding areas) and absent in introns (non-coding areas). In this paper, we have shown that single-indicator sequences can be generated by encoding schemes based on physico-chemical properties. Two new methods are proposed for generating single-indicator sequences based on hydration energy and dipole moments. The proposed methods produce high peak at exon locations and effectively suppress false exons (intron regions having greater peak than exon regions) resulting in high discriminating factor, sensitivity and specificity.
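
    The period-3 test described in this abstract can be illustrated with a short Python sketch. It uses the four classic binary indicator sequences (one per nucleotide) and evaluates the discrete Fourier transform at frequency N/3; this is the conventional baseline, not the single-indicator hydration-energy or dipole-moment mappings the authors propose, and the function names and toy sequences are illustrative.

```python
import cmath


def indicator(dna, base):
    """Binary indicator sequence: 1 where `base` occurs, else 0."""
    return [1.0 if b == base else 0.0 for b in dna]


def dft_power(signal, k):
    """Squared magnitude of the k-th DFT coefficient."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal))
    return abs(coeff) ** 2


def period3_power(dna):
    """Total spectral power at frequency N/3, summed over the four indicators."""
    k = len(dna) // 3
    return sum(dft_power(indicator(dna, base), k) for base in "ACGT")


# Toy sequences: a strict codon repeat versus a period-4 repeat of equal length.
print(round(period3_power("ATG" * 20), 6))    # 1200.0
print(round(period3_power("ACGT" * 15), 6))   # 0.0
```

    In practice the measure is computed in a sliding window along the genome, and windows with a pronounced peak are flagged as candidate exons.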

  20. TANDEM: matching proteins with tandem mass spectra.

    PubMed

    Craig, Robertson; Beavis, Ronald C

    2004-06-12

    Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching. The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.

  1. Draft Genome Sequence of the Deinococcus-Thermus Bacterium Meiothermus ruber Strain A

    DOE PAGES

    Thiel, Vera; Tomsho, Lynn P.; Burhans, Richard; ...

    2015-03-26

    The draft genome sequence of the Deinococcus-Thermus group bacterium Meiothermus ruber strain A, isolated from a cyanobacterial enrichment culture obtained from Octopus Spring (Yellowstone National Park, WY), comprises 2,968,099 bp in 170 contigs. It is predicted to contain 2,895 protein-coding genes, 44 tRNA-coding genes, and 2 rRNA operons.

  2. Draft Genome Sequence of Staphylococcus cohnii subsp. urealyticus Isolated from a Healthy Dog

    PubMed Central

    Wigmore, Sarah M.; Wareham, David W.

    2017-01-01

    Staphylococcus cohnii subsp. urealyticus strain SW120 was isolated from the ear swab of a healthy dog. The isolate is resistant to methicillin and fusidic acid. The SW120 draft genome is 2,805,064 bp and contains 2,667 coding sequences, including 58 tRNAs and nine complete rRNA coding regions. PMID:28209829

  3. Draft Genome Sequence of a Canine Isolate of Methicillin-Resistant Staphylococcus haemolyticus

    PubMed Central

    Wigmore, Sarah M.; Wareham, David W.

    2017-01-01

    Staphylococcus haemolyticus strain SW007 was isolated from a nasal swab taken from a healthy dog. The isolate is resistant to methicillin, mupirocin, macrolides, and sulfonamides. The SW007 draft genome is 2,325,410 bp and contains 2,277 coding sequences, including 60 tRNAs and nine complete rRNA-coding regions. PMID:28385855

  4. Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

    USDA-ARS's Scientific Manuscript database

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...

  5. Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

    PubMed Central

    2012-01-01

    Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's genbank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742

  6. Identification of a Conserved Non-Protein-Coding Genomic Element that Plays an Essential Role in Alphabaculovirus Pathogenesis

    PubMed Central

    Kikhno, Irina

    2014-01-01

    Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153

  7. Characterization of the complete mitochondrial genome of Marshallagia marshalli and phylogenetic implications for the superfamily Trichostrongyloidea.

    PubMed

    Sun, Miao-Miao; Han, Liang; Zhang, Fu-Kai; Zhou, Dong-Hui; Wang, Shu-Qing; Ma, Jun; Zhu, Xing-Quan; Liu, Guo-Hua

    2018-01-01

    Marshallagia marshalli (Nematoda: Trichostrongylidae) infection can lead to serious parasitic gastroenteritis in sheep, goats, and wild ruminants, causing significant socioeconomic losses worldwide. To date, studies of the molecular biology of M. marshalli have been limited. Herein, we sequenced the complete mitochondrial (mt) genome of M. marshalli and examined its phylogenetic relationship with selected members of the superfamily Trichostrongyloidea using Bayesian inference (BI) based on concatenated mt amino acid sequence datasets. The complete mt genome sequence of M. marshalli is 13,891 bp, including 12 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. All protein-coding genes are transcribed in the same direction. Phylogenetic analyses based on concatenated amino acid sequences of the 12 protein-coding genes supported the monophylies of the families Haemonchidae, Molineidae, and Dictyocaulidae with strong statistical support, but rejected the monophyly of the family Trichostrongylidae. The determination of the complete mt genome sequence of M. marshalli provides novel genetic markers for studying the systematics, population genetics, and molecular epidemiology of M. marshalli and its congeners.

  8. The mitochondrial genome of the multicolored Asian lady beetle Harmonia axyridis (Pallas) and a phylogenetic analysis of the Polyphaga (Insecta: Coleoptera).

    PubMed

    Niu, Fang-Fang; Zhu, Liang; Wang, Su; Wei, Shu-Jun

    2016-07-01

    Here, we report the mitochondrial genome sequence of the multicolored Asian lady beetle Harmonia axyridis (Pallas, 1773) (Coleoptera: Coccinellidae) (GenBank accession No. KR108208). This is the first species from the genus Harmonia with a sequenced mitochondrial genome. The current length of this mitochondrial genome, with a partial A + T-rich region, is 16,387 bp. All the typical genes were sequenced except trnI and trnQ. As in most other sequenced mitochondrial genomes of Coleoptera, there is no rearrangement in the sequenced region compared with the putative ancestral arrangement of insects. All protein-coding genes start with ATN codons. Five, five and three protein-coding genes stop with the termination codons TAA, TA and T, respectively. Phylogenetic analysis using a Bayesian method based on the first and second codon positions of the protein-coding genes supported that the Scirtidae is a basal lineage of Polyphaga. Harmonia and Coccinella form a sister lineage. The monophyly of Staphyliniformia, Scarabaeiformia and Cucujiformia was supported. The Buprestidae was found to be a sister group to the Bostrichiformia.
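    The start and stop codon tallies reported above can be illustrated with a small sketch. It assumes each coding sequence is given 5' to 3' on its coding strand, and it treats a trailing TA or T as an incomplete stop codon only when the sequence length leaves that remainder after whole codons. The function name and the toy sequences in the usage note are hypothetical and are not taken from the H. axyridis annotation.

```python
def classify_cds_ends(cds):
    """Return (starts_with_ATN, stop_type) for one coding sequence.

    stop_type is "TAA" or "TAG" for a complete stop codon, "TA" or "T" for an
    incomplete one, and None when no recognised stop is present.
    """
    seq = cds.upper()
    # ATN means A, T, then any nucleotide.
    starts_with_atn = len(seq) >= 3 and seq[:2] == "AT" and seq[2] in "ACGT"
    remainder = len(seq) % 3
    if remainder == 0 and seq[-3:] in ("TAA", "TAG"):
        stop = seq[-3:]
    elif remainder == 2 and seq.endswith("TA"):
        stop = "TA"   # two leftover bases after the last whole codon
    elif remainder == 1 and seq.endswith("T"):
        stop = "T"    # one leftover base after the last whole codon
    else:
        stop = None
    return starts_with_atn, stop
```

    Applied to every annotated gene, for instance with `collections.Counter(classify_cds_ends(s)[1] for s in sequences)`, this would yield a tally of the kind quoted in the abstract. A toy call such as `classify_cds_ends("ATTAAACCCTA")` classifies an 11-base sequence as ATN-initiated with an incomplete TA stop.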

  9. A Sabin 2-related poliovirus recombinant contains a homologous sequence of human enterovirus species C in the viral polymerase coding region.

    PubMed

    Zhang, Yong; Zhang, Fan; Zhu, Shuangli; Chen, Li; Yan, Dongmei; Wang, Dongyan; Tang, Ruiyan; Zhu, Hui; Hou, Xiaohui; An, Hongqiu; Zhang, Hong; Xu, Wenbo

    2010-02-01

    A type 2 vaccine-related poliovirus (strain CHN3024), differing from the Sabin 2 strain by 0.44% in the VP1 coding region was isolated from a patient with vaccine-associated paralytic poliomyelitis. Sequences downstream of nucleotide position 6735 (3D(pol) coding region) were derived from an unidentified sequence; no close match for a potential parent was found, but it could be classified into a non-polio human enteroviruses species C (HEV-C) phylogeny. The virus differed antigenically from the parental Sabin strain, having an amino acid substitution in the neutralizing antigenic site 1. The similarity between CHN3024 and Sabin 2 sequences suggests that the recombination was recent; this is supported by the estimation that the initiating OPV dose was given only 36-75 days before sampling. The patient's clinical manifestations, intratypic differentiation examination, and whole-genome sequencing showed that this recombinant exhibited characteristics of neurovirulent vaccine-derived polioviruses (VDPV), which may, thus, pose a potential threat to a polio-free world.

  10. Chimeric mitochondrial minichromosomes of the human body louse, Pediculus humanus: evidence for homologous and non-homologous recombination.

    PubMed

    Shao, Renfu; Barker, Stephen C

    2011-02-15

    The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse. Copyright © 2010 Elsevier B.V. All rights reserved.

  11. Cloning and Sequencing of Defective Particles Derived from the Autonomous Parvovirus Minute Virus of Mice for the Construction of Vectors with Minimal cis-Acting Sequences

    PubMed Central

    Clément, Nathalie; Avalosse, Bernard; El Bakkouri, Karim; Velu, Thierry; Brandenburger, Annick

    2001-01-01

    The production of wild-type-free stocks of recombinant parvovirus minute virus of mice [MVM(p)] is difficult due to the presence of homologous sequences in vector and helper genomes that cannot easily be eliminated from the overlapping coding sequences. We have therefore cloned and sequenced spontaneously occurring defective particles of MVM(p) with very small genomes to identify the minimal cis-acting sequences required for DNA amplification and virus production. One of them has lost all capsid-coding sequences but is still able to replicate in permissive cells when nonstructural proteins are provided in trans by a helper plasmid. Vectors derived from this particle produce stocks with no detectable wild-type MVM after cotransfection with new, matched, helper plasmids that present no homology downstream from the transgene. PMID:11152501

  12. Next generation sequencing analysis reveals a relationship between rDNA unit diversity and locus number in Nicotiana diploids

    PubMed Central

    2012-01-01

    Background Tandemly arranged nuclear ribosomal DNA (rDNA), encoding 18S, 5.8S and 26S ribosomal RNA (rRNA), exhibit concerted evolution, a pattern thought to result from the homogenisation of rDNA arrays. However, rDNA homogeneity at the single nucleotide polymorphism (SNP) level has not been detailed in organisms with more than a few hundred copies of the rDNA unit. Here we study rDNA complexity in species with arrays consisting of thousands of units. Methods We examined homogeneity of genic (18S) and non-coding internally transcribed spacer (ITS1) regions of rDNA using Roche 454 and/or Illumina platforms in four angiosperm species, Nicotiana sylvestris, N. tomentosiformis, N. otophora and N. kawakamii. We compared the data with Southern blot hybridisation revealing the structure of intergenic spacer (IGS) sequences and with the number and distribution of rDNA loci. Results and Conclusions In all four species the intragenomic homogeneity of the 18S gene was high; a single ribotype makes up over 90% of the genes. However, greater variation was observed in the ITS1 region, particularly in species with two or more rDNA loci, where >55% of rDNA units were a single ribotype, with the second most abundant variant accounting for >18% of units. IGS heterogeneity was high in all species. The increased number of ribotypes in ITS1 compared with 18S sequences may reflect rounds of incomplete homogenisation with strong selection for functional genic regions and relaxed selection on ITS1 variants. The relationship between the number of ITS1 ribotypes and the number of rDNA loci leads us to propose that rDNA evolution and complexity are influenced by locus number and/or amplification of orphaned rDNA units at new chromosomal locations. PMID:23259460

  13. A comprehensive collection of annotations to interpret sequence variation in human mitochondrial transfer RNAs.

    PubMed

    Diroma, Maria Angela; Lubisco, Paolo; Attimonelli, Marcella

    2016-11-08

    The abundance of biological data characterizing the genomics era is contributing to a comprehensive understanding of human mitochondrial genetics. Nevertheless, many aspects are still unclear, specifically about the variability of the 22 human mitochondrial transfer RNA (tRNA) genes and their involvement in diseases. The complex enrichment and isolation of tRNAs in vitro leads to an incomplete knowledge of their post-transcriptional modifications and three-dimensional folding, essential for correct tRNA functioning. An accurate annotation of mitochondrial tRNA variants would be useful to mitochondrial researchers and clinicians, since most of the bioinformatics tools for variant annotation and prioritization available so far cannot shed light on the functional role of tRNA variations. To this aim, we updated our MToolBox pipeline for mitochondrial DNA analysis of high throughput and Sanger sequencing data by integrating tRNA variant annotations in order to identify and characterize relevant variants not only in protein coding regions, but also in tRNA genes. The annotation step in the pipeline now provides detailed information for variants mapping onto the 22 mitochondrial tRNAs. For each mt-tRNA position along the entire genome, the relative tRNA numbering, tRNA type, cloverleaf secondary domains (loops and stems), mature nucleotide and interactions in the three-dimensional folding were reported. Moreover, pathogenicity predictions for tRNA and rRNA variants were retrieved from the literature and integrated within the annotations provided by MToolBox, both in the stand-alone version and web-based tool at the Mitochondrial Disease Sequence Data Resource (MSeqDR) website. All the information available in the annotation step of MToolBox was exploited to generate custom tracks which can be displayed in the GBrowse instance at the MSeqDR website. To the best of our knowledge, specific data regarding mitochondrial variants in tRNA genes were introduced for the first time in a tool for mitochondrial genome analysis, supporting the interpretation of genetic variants in specific genomic contexts.

  14. Digital data for Quick Response (QR) codes of thermophiles to identify and compare the bacterial species isolated from Unkeshwar hot springs (India).

    PubMed

    Rekadwad, Bhagwan N; Khobragade, Chandrahasya N

    2016-03-01

    16S rRNA sequences of 21 morphologically and biochemically identified thermophilic bacteria isolated from Unkeshwar hot springs (19°85'N and 78°25'E), Dist. Nanded (India) have been deposited in the NCBI repository. The 16S rRNA gene sequences were used to generate QR codes for sequences (FASTA format and full GenBank information). Diversity among the isolates is compared with known isolates and evaluated using CGR, FCGR and PCA, i.e., by visual comparison and evaluation, respectively. Considerable biodiversity was observed among the identified bacteria isolated from Unkeshwar hot springs. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/.

  15. On Asymptotically Good Ramp Secret Sharing Schemes

    NASA Astrophysics Data System (ADS)

    Geil, Olav; Martin, Stefano; Martínez-Peñas, Umberto; Matsumoto, Ryutaroh; Ruano, Diego

    Asymptotically good sequences of linear ramp secret sharing schemes have been intensively studied by Cramer et al. in terms of sequences of pairs of nested algebraic geometric codes. In those works the focus is on full privacy and full reconstruction. In this paper we analyze additional parameters describing the asymptotic behavior of partial information leakage and possibly also partial reconstruction giving a more complete picture of the access structure for sequences of linear ramp secret sharing schemes. Our study involves a detailed treatment of the (relative) generalized Hamming weights of the considered codes.

  16. Statistical and linguistic features of DNA sequences

    NASA Technical Reports Server (NTRS)

    Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
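    The detrended fluctuation analysis mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a purine/pyrimidine "DNA walk" (+1 for C or T, -1 for anything else), linear detrending in non-overlapping boxes, and an arbitrary set of box sizes. All function names are hypothetical.

```python
import math

def dna_walk(seq):
    """Cumulative walk: step +1 for a pyrimidine (C/T), -1 for any other symbol."""
    profile, position = [], 0
    for base in seq.upper():
        position += 1 if base in "CT" else -1
        profile.append(position)
    return profile

def dfa_fluctuation(profile, box):
    """RMS deviation of the profile from a least-squares line fitted in each box."""
    n_boxes = len(profile) // box
    x_mean = (box - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(box))
    squared = 0.0
    for b in range(n_boxes):
        segment = profile[b * box:(b + 1) * box]
        y_mean = sum(segment) / box
        slope = sum((x - x_mean) * (y - y_mean)
                    for x, y in enumerate(segment)) / sxx
        # Residuals around the local linear trend of this box.
        squared += sum((y - y_mean - slope * (x - x_mean)) ** 2
                       for x, y in enumerate(segment))
    return math.sqrt(squared / (n_boxes * box))

def dfa_exponent(seq, boxes=(8, 16, 32, 64, 128)):
    """Slope of log F(n) versus log n, estimated by ordinary least squares."""
    profile = dna_walk(seq)
    points = [(math.log(n), math.log(dfa_fluctuation(profile, n))) for n in boxes]
    x_bar = sum(x for x, _ in points) / len(points)
    y_bar = sum(y for _, y in points) / len(points)
    return (sum((x - x_bar) * (y - y_bar) for x, y in points)
            / sum((x - x_bar) ** 2 for x, _ in points))
```

    An exponent near 0.5 is what an uncorrelated sequence would be expected to give, while values clearly above 0.5 would point to the long-range correlations the abstract reports for noncoding regions. The sequence must be several times longer than the largest box for the estimate to be meaningful.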

  17. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    PubMed

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  18. Kangaroo – A pattern-matching program for biological sequences

    PubMed Central

    2002-01-01

    Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
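    The kind of query described above, a search for mononucleotide repeats in coding sequences, can be sketched with an ordinary regular expression. The example below uses Python's standard `re` module and does not reproduce Kangaroo's own query syntax; the function name and the default minimum length of 8 are assumptions made for illustration.

```python
import re

def find_mononucleotide_repeats(seq, min_len=8):
    """Return (start, base, length) for every run of one base at least min_len long."""
    # ([ACGT]) captures one base; \1{k,} requires at least k further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(seq.upper())]
```

    Start positions are 0-based. Lowering `min_len` widens the search to shorter runs, at the cost of many more hits in a typical coding sequence.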

  19. Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

    PubMed

    Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

    2010-12-15

    Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.

  20. The complete mitochondrial genome of Papilio glaucus and its phylogenetic implications.

    PubMed

    Shen, Jinhui; Cong, Qian; Grishin, Nick V

    2015-09-01

    Due to the intriguing morphology, lifecycle, and diversity of butterflies and moths, Lepidoptera are emerging as model organisms for the study of genetics, evolution and speciation. The progress of these studies relies on decoding Lepidoptera genomes, both nuclear and mitochondrial. Here we describe a protocol to obtain mitogenomes from Next Generation Sequencing reads performed for whole-genome sequencing and report the complete mitogenome of Papilio (Pterourus) glaucus. The circular mitogenome is 15,306 bp in length and rich in A and T. It contains 13 protein-coding genes (PCGs), 22 transfer-RNA-coding genes (tRNA), and 2 ribosomal-RNA-coding genes (rRNA), with a gene order typical for mitogenomes of Lepidoptera. We performed phylogenetic analyses based on PCG and RNA-coding genes or protein sequences using Bayesian Inference and Maximum Likelihood methods. The phylogenetic trees consistently show that among species with available mitogenomes Papilio glaucus is the closest to Papilio (Agehana) maraho from Asia.

  1. Digital data for quick response (QR) codes of alkalophilic Bacillus pumilus to identify and to compare bacilli isolated from Lonar Crator Lake, India

    PubMed Central

    Rekadwad, Bhagwan N.; Khobragade, Chandrahasya N.

    2016-01-01

    Microbiologists are routinely engaged isolation, identification and comparison of isolated bacteria for their novelty. 16S rRNA sequences of Bacillus pumilus were retrieved from NCBI repository and generated QR codes for sequences (FASTA format and full Gene Bank information). 16SrRNA were used to generate quick response (QR) codes of Bacillus pumilus isolated from Lonar Crator Lake (19° 58′ N; 76° 31′ E), India. Bacillus pumilus 16S rRNA gene sequences were used to generate CGR, FCGR and PCA. These can be used for visual comparison and evaluation respectively. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/. This generated digital data helps to evaluate and compare any Bacillus pumilus strain, minimizes laboratory efforts and avoid misinterpretation of the species. PMID:27141529

  2. Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline.

    PubMed

    Toffano-Nioche, Claire; Luo, Yufei; Kuchly, Claire; Wallon, Claire; Steinbach, Delphine; Zytnicki, Matthias; Jacq, Annick; Gautheret, Daniel

    2013-09-01

    RNA-seq experiments are now routinely used for the large scale sequencing of transcripts. In bacteria or archaea, such deep sequencing experiments typically produce 10-50 million fragments that cover most of the genome, including intergenic regions. In this context, the precise delineation of the non-coding elements is challenging. Non-coding elements include untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs) and transcripts produced from the antisense strand of genes (asRNA). Here we present a computational pipeline (DETR'PROK: detection of ncRNAs in prokaryotes) based on the Galaxy framework that takes as input a mapping of deep sequencing reads and performs successive steps of clustering, comparison with existing annotation and identification of transcribed non-coding fragments classified into putative 5' UTRs, sRNAs and asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  3. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

    PubMed

    Yin, Changchuan

    2015-04-01

    To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulation annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
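    For orientation, the 3-periodicity measure can be sketched for the baseline 4D binary (indicator) representation that the abstract compares against; the optimized GCC values themselves are not given here and are not reproduced. The sketch assumes the SNR is defined as the spectral power at frequency N/3 divided by the mean power over all non-zero frequencies, which is one common convention and may differ from the paper's. Function names are hypothetical.

```python
import cmath

def indicator_power(seq, k):
    """Summed power at DFT index k of the four binary indicator sequences."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the 0/1 sequence marking where this base occurs.
        coeff = sum(cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total

def periodicity3_snr(seq):
    """Power at N/3 divided by the mean power over indices 1..N-1."""
    seq = seq.upper()
    n = len(seq) - len(seq) % 3   # trim so that N/3 is an integer index
    if n < 6:
        raise ValueError("sequence too short for a 3-periodicity estimate")
    seq = seq[:n]
    peak = indicator_power(seq, n // 3)
    mean = sum(indicator_power(seq, k) for k in range(1, n)) / (n - 1)
    return peak / mean
```

    This direct evaluation costs O(N²) operations, which is fine for a sliding window of a few hundred bases but would need an FFT for whole-genome scans. A window-by-window scan of the SNR is the idea behind the short-time Fourier approach named in the abstract.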

  4. Efficient burst image compression using H.265/HEVC

    NASA Astrophysics Data System (ADS)

    Roodaki-Lavasani, Hoda; Lainema, Jani

    2014-02-01

    New imaging use cases are emerging as more powerful camera hardware is entering consumer markets. One family of such use cases is based on capturing multiple pictures instead of just one when taking a photograph. This kind of camera operation allows, e.g., selecting the most successful shot from a sequence of images, showing what happened right before or after the shot was taken or combining the shots by computational means to improve either visible characteristics of the picture (such as dynamic range or focus) or the artistic aspects of the photo (e.g. by superimposing pictures on top of each other). Considering that photographic images are typically of high resolution and quality and the fact that these kinds of image bursts can consist of at least tens of individual pictures, an efficient compression algorithm is desired. However, traditional video coding approaches fail to provide the random access properties these use cases require to achieve near-instantaneous access to the pictures in the coded sequence. That feature is critical to allow users to browse the pictures in an arbitrary order or imaging algorithms to extract desired pictures from the sequence quickly. This paper proposes coding structures that provide such random access properties while achieving coding efficiency superior to existing image coders. The results indicate that using HEVC video codec with a single reference picture fixed for the whole sequence can achieve nearly as good compression as traditional IPPP coding structures. It is also shown that the selection of the reference frame can further improve the coding efficiency.

  5. DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information

    ERIC Educational Resources Information Center

    McCallister, Gary

    2005-01-01

    The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
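    The idea that a codon reduced to purines and pyrimidines still narrows down the possible amino acids can be checked with a short sketch. It assumes the standard genetic code and an arbitrary labeling of purines (A, G) as 1 and pyrimidines (C, T) as 0; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def ring_pattern(codon):
    """Binary pattern of a codon: 1 for a double-ring purine, 0 for a pyrimidine."""
    return "".join("1" if base in "AG" else "0" for base in codon.upper())

def amino_acids_by_pattern():
    """Map each of the 8 binary patterns to the set of amino acids it can encode."""
    groups = {}
    for codon, aa in CODON_TABLE.items():
        groups.setdefault(ring_pattern(codon), set()).add(aa)
    return groups
```

    Each of the eight patterns maps to a subset of the amino acids rather than to all of them. For example, the all-pyrimidine pattern `000` covers only phenylalanine, serine, leucine and proline.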

  6. High-Content Optical Codes for Protecting Rapid Diagnostic Tests from Counterfeiting.

    PubMed

    Gökçe, Onur; Mercandetti, Cristina; Delamarche, Emmanuel

    2018-06-19

    Warnings and reports on counterfeit diagnostic devices are released several times a year by regulators and public health agencies. Unfortunately, mishandling, altering, and counterfeiting point-of-care diagnostics (POCDs) and rapid diagnostic tests (RDTs) is lucrative, relatively simple and can lead to devastating consequences. Here, we demonstrate how to implement optical security codes in silicon- and nitrocellulose-based flow paths for device authentication using a smartphone. The codes are created by inkjet spotting inks directly on nitrocellulose or on micropillars. Codes containing up to 32 elements per mm² and 8 colors can encode as many as 10⁴⁵ combinations. Codes on silicon micropillars can be erased by setting a continuous flow path across the entire array of code elements or for nitrocellulose by simply wicking a liquid across the code. Static or labile code elements can further be formed on nitrocellulose to create a hidden code using poly(ethylene glycol) (PEG) or glycerol additives to the inks. More advanced codes having a specific deletion sequence can also be created in silicon microfluidic devices using an array of passive routing nodes, which activate in a particular, programmable sequence. Such codes are simple to fabricate, easy to view, and efficient in coding information; they can be ideally used in combination with information on a package to protect diagnostic devices from counterfeiting.
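    The combinatorics behind the quoted capacity can be made explicit. If every code element independently takes one of c colors, a code of n elements has c to the power n possible states. The abstract does not state how many elements make up a full code, so the element count computed below is an inference from the arithmetic and not a figure from the paper. Function names are hypothetical.

```python
def code_combinations(colors, elements):
    """Number of distinct codes when each element independently takes one color."""
    return colors ** elements

def elements_needed(colors, target):
    """Smallest element count whose combinations reach the target (exact integers)."""
    count, total = 0, 1
    while total < target:
        total *= colors
        count += 1
    return count
```

    With 8 colors, `elements_needed(8, 10**45)` returns 50. A capacity on the order of 10 to the power 45 therefore corresponds to a code of about 50 elements, which at 32 elements per square millimetre would occupy under 2 square millimetres.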

  7. Error control techniques for satellite and space communications

    NASA Technical Reports Server (NTRS)

    Costello, D. J., Jr.

    1986-01-01

    High rate concatenated coding systems with trellis inner codes and Reed-Solomon (RS) outer codes for application in satellite communication systems are considered. Two types of inner codes are studied: high rate punctured binary convolutional codes which result in overall effective information rates between 1/2 and 1 bit per channel use; and bandwidth efficient signal space trellis codes which can achieve overall effective information rates greater than 1 bit per channel use. Channel capacity calculations with and without side information are performed for the concatenated coding system. Two concatenated coding schemes are investigated. In Scheme 1, the inner code is decoded with the Viterbi algorithm and the outer RS code performs error-correction only (decoding without side information). In Scheme 2, the inner code is decoded with a modified Viterbi algorithm which produces reliability information along with the decoded output. In this algorithm, path metrics are used to estimate the entire information sequence, while branch metrics are used to provide the reliability information on the decoded sequence. This information is used to erase unreliable bits in the decoded output. An errors-and-erasures RS decoder is then used for the outer code. These two schemes are proposed for use on NASA satellite channels. Results indicate that high system reliability can be achieved with little or no bandwidth expansion.

  8. DNA-based watermarks using the DNA-Crypt algorithm.

    PubMed

    Heider, Dominik; Barnekow, Angelika

    2007-05-29

    The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms.
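    The "least significant base" principle can be illustrated with a toy scheme. This is not the DNA-Crypt algorithm, whose details are not given in the abstract. The sketch assumes the standard genetic code and stores two bits in the third position of each codon that belongs to a four-fold degenerate family, where any third base yields the same amino acid. The bit-to-base mapping and all names are hypothetical, and no encryption or mutation correction is included.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
CODON_TABLE = {"".join(c): aa for c, aa in zip(
    product("TCAG", repeat=3),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")}
# Codon prefixes for which all four third bases encode the same amino acid.
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def translate(cds):
    """Translate whole codons of an uppercase DNA coding sequence."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))

def embed(cds, bits):
    """Write bit pairs into the third base of four-fold degenerate codons."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    out, pos = [], 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and pos < len(bits):
            codon = codon[:2] + BITS_TO_BASE[bits[pos:pos + 2]]
            pos += 2
        out.append(codon)
    if pos < len(bits):
        raise ValueError("sequence has too few four-fold degenerate codons")
    return "".join(out) + cds[len(out) * 3:]   # keep any trailing partial codon

def extract(cds, n_bits):
    """Read back n_bits from the third bases of four-fold degenerate codons."""
    bits = ""
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and len(bits) < n_bits:
            bits += BASE_TO_BITS[codon[2]]
    return bits
```

    Because only synonymous third positions are rewritten, `translate(embed(cds, bits))` equals `translate(cds)`, which mirrors the abstract's claim that the watermark does not alter the translated protein. A real scheme would also have to consider codon usage bias and, as the abstract notes, protect the payload with a correction code.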

  9. DNA-based watermarks using the DNA-Crypt algorithm

    PubMed Central

    Heider, Dominik; Barnekow, Angelika

    2007-01-01

    Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences similar to all steganographic algorithms. PMID:17535434

  10. SOX9 Duplication Linked to Intersex in Deer

    PubMed Central

    Kropatsch, Regina; Dekomien, Gabriele; Akkad, Denis A.; Gerding, Wanda M.; Petrasch-Parwez, Elisabeth; Young, Neil D.; Altmüller, Janine; Nürnberg, Peter; Gasser, Robin B.; Epplen, Jörg T.

    2013-01-01

    A complex network of genes determines sex in mammals. Here, we studied a European roe deer with an intersex phenotype that was consistent with a XY genotype with incomplete male-determination. Whole genome sequencing and quantitative real-time PCR analyses revealed a triple dose of the SOX9 gene, allowing insights into a new genetic defect in a wild animal. PMID:24040047

  11. Transcriptome profiling of microRNA by next-gen deep sequencing reveals known and novel miRNA species in the lipid fraction of human breast milk

    USDA-ARS's Scientific Manuscript database

    While breast milk has unique health advantages for infants, the mechanisms by which it regulates the physiology of newborns are incompletely understood. miRNAs have been described as functioning transcellularly, and have been previously isolated in cell-free and exosomal form from bodily liquids (se...

  12. Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies

    PubMed Central

    Denton, James F.; Lugo-Martinez, Jose; Tucker, Abraham E.; Schrider, Daniel R.; Warren, Wesley C.; Hahn, Matthew W.

    2014-01-01

    Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process. PMID:25474019

  13. Extensive error in the number of genes inferred from draft genome assemblies.

    PubMed

    Denton, James F; Lugo-Martinez, Jose; Tucker, Abraham E; Schrider, Daniel R; Warren, Wesley C; Hahn, Matthew W

    2014-12-01

    Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.

  14. Creation and utilization of a World Wide Web based space radiation effects code: SIREST

    NASA Technical Reports Server (NTRS)

    Singleterry, R. C. Jr; Wilson, J. W.; Shinn, J. L.; Tripathi, R. K.; Thibeault, S. A.; Noor, A. K.; Cucinotta, F. A.; Badavi, F. F.; Chang, C. K.; Qualls, G. D.

    2001-01-01

    In order for humans and electronics to fully and safely operate in the space environment, codes like HZETRN (High Charge and Energy Transport) must be included in any designer's toolbox for design evaluation with respect to radiation damage. Currently, spacecraft designers do not have easy access to accurate radiation codes like HZETRN to evaluate their design for radiation effects on humans and electronics. Today, the World Wide Web is sophisticated enough to support the entire HZETRN code and all of the associated pre and post processing tools. This package is called SIREST (Space Ionizing Radiation Effects and Shielding Tools). There are many advantages to SIREST. The most important advantage is the instant update capability of the web. Another major advantage is the modularity that the web imposes on the code. Right now, the major disadvantage of SIREST will be its modularity inside the designer's system. This mostly comes from the fact that a consistent interface between the designer and the computer system to evaluate the design is incomplete. This, however, is to be solved in the Intelligent Synthesis Environment (ISE) program currently being funded by NASA.

  15. Surviving "Payment by Results": a simple method of improving clinical coding in burn specialised services in the United Kingdom.

    PubMed

    Wallis, Katy L; Malic, Claudia C; Littlewood, Sonia L; Judkins, Keith; Phipps, Alan R

    2009-03-01

    Coding inpatient episodes plays an important role in determining the financial remuneration of a clinical service. Insufficient or incomplete data may have very significant consequences on its viability. We created a document that improves the coding process in our Burns Centre. At Yorkshire Regional Burns Centre an inpatient summary sheet was designed to prospectively record and present essential information on a daily basis, for use in the coding process. The level of care was also recorded. A 3-month audit was conducted to assess the efficacy of the new forms. Forty-nine patients were admitted to the Burns Centre with a mean age of 27.6 years and TBSA ranging from 0.5% to 65%. The total stay in the Burns Centre was 758 days, of which 22% were at level B3-B5 and 39% at level B2. The use of the new discharge document identified potential income of about 500,000 GB pound sterling at our local daily tariffs for high dependency and intensive care. The new form is able to ensure a high quality of coding with a possible direct impact on the financial resources accrued for burn care.

  16. Median network analysis of defectively sequenced entire mitochondrial genomes from early and contemporary disease studies.

    PubMed

    Bandelt, Hans-Jürgen; Yao, Yong-Gang; Bravi, Claudio M; Salas, Antonio; Kivisild, Toomas

    2009-03-01

    Sequence analysis of the mitochondrial genome has become a routine method in the study of mitochondrial diseases. Quite often, the sequencing efforts in the search of pathogenic or disease-associated mutations are affected by technical and interpretive problems, caused by sample mix-up, contamination, biochemical problems, incomplete sequencing, misdocumentation and insufficient reference to previously published data. To assess data quality in case studies of mitochondrial diseases, it is recommended to compare any mtDNA sequence under consideration to their phylogenetically closest lineages available in the Web. The median network method has proven useful for visualizing potential problems with the data. We contrast some early reports of complete mtDNA sequences to more recent total mtDNA sequencing efforts in studies of various mitochondrial diseases. We conclude that the quality of complete mtDNA sequences generated in the medical field in the past few years is somewhat unsatisfactory and may even fall behind that of pioneer manual sequencing in the early nineties. Our study provides a paradigm for an a posteriori evaluation of sequence quality and for detection of potential problems with inferring a pathogenic status of a particular mutation.

  17. Multirate control with incomplete information over Profibus-DP network

    NASA Astrophysics Data System (ADS)

    Salt, J.; Casanova, V.; Cuenca, A.; Pizá, R.

    2014-07-01

    When a process field bus-decentralized peripherals (Profibus-DP) network is used in an industrial environment, a deterministic behaviour is usually claimed. However, due to some concerns such as bandwidth limitations, lack of synchronisation among different clocks and existence of time-varying delays, a more complex problem must be faced. This problem implies the transmission of irregular and, even, random sequences of incomplete information. The main consequence of this issue is the appearance of different sampling periods at different network devices. In this paper, this aspect is checked by means of a detailed Profibus-DP timescale study. In addition, in order to deal with the different periods, a delay-dependent dual-rate proportional-integral-derivative control is introduced. Stability for the proposed control system is analysed in terms of linear matrix inequalities.

  18. TRPV1 variants impair intracellular Ca2+ signaling and may confer susceptibility to malignant hyperthermia.

    PubMed

    Abeele, Fabien Vanden; Lotteau, Sabine; Ducreux, Sylvie; Dubois, Charlotte; Monnier, Nicole; Hanna, Amy; Gkika, Dimitra; Romestaing, Caroline; Noyer, Lucile; Flourakis, Matthieu; Tessier, Nolwenn; Al-Mawla, Ribal; Chouabe, Christophe; Lefai, Etienne; Lunardi, Joël; Hamilton, Susan; Fauré, Julien; Van Coppenolle, Fabien; Prevarskaya, Natalia

    2018-06-21

    Malignant hyperthermia (MH) is a pharmacogenetic disorder arising from uncontrolled muscle calcium release due to an abnormality in the sarcoplasmic reticulum (SR) calcium-release mechanism triggered by halogenated inhalational anesthetics. However, the molecular mechanisms involved are still incomplete. We aimed to identify transient receptor potential vanilloid 1 (TRPV1) variants within the entire coding sequence in patients who developed sensitivity to MH of unknown etiology. In vitro and in vivo functional studies were performed in heterologous expression system, trpv1 -/- mice, and a murine model of human MH. We identified TRPV1 variants in two patients and their heterologous expression in muscles of trpv1 -/- mice strongly enhanced calcium release from SR upon halogenated anesthetic stimulation, suggesting they could be responsible for the MH phenotype. We confirmed the in vivo significance by using mice with a knock-in mutation (Y524S) in the type I ryanodine receptor (Ryr1), a mutation analogous to the Y522S mutation associated with MH in humans. We showed that the TRPV1 antagonist capsazepine slows the heat-induced hypermetabolic response in this model. We propose that TRPV1 contributes to MH and could represent an actionable therapeutic target for prevention of the pathology and also be responsible for MH sensitivity when mutated.

  19. Characterization of the complete mitochondrial genome of the giant silkworm moth, Eriogyna pyretorum (Lepidoptera: Saturniidae).

    PubMed

    Jiang, Shao-Tong; Hong, Gui-Yun; Yu, Miao; Li, Na; Yang, Ying; Liu, Yan-Qun; Wei, Zhao-Jun

    2009-05-22

    The complete mitochondrial genome (mitogenome) of Eriogyna pyretorum (Lepidoptera: Saturniidae) was determined as being composed of 15,327 base pairs (bp), including 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and a control region. The arrangement of the PCGs is the same as that found in the other sequenced lepidopteran. The AT skewness for the E. pyretorum mitogenome is slightly negative (-0.031), indicating the occurrence of more Ts than As. The nucleotide composition of the E. pyretorum mitogenome is also biased toward A + T nucleotides (80.82%). All PCGs are initiated by ATN codons, except for cytochrome c oxidase subunit 1 and 2 (cox1 and cox2). Two of the 13 PCGs harbor the incomplete termination codon by T. All tRNA genes have a typical clover-leaf structure of mitochondrial tRNA, with the exception of trnS1(AGN) and trnS2(UCN). Phylogenetic analysis among the available lepidopteran species supports the current morphology-based hypothesis that Bombycoidea, Geometroidea, Notodontidea, Papilionoidea and Pyraloidea are monophyletic. As has been previously suggested, Bombycidae (Bombyx mori and Bombyx mandarina), Sphingoidae (Manduca sexta) and Saturniidae (Antheraea pernyi, Antheraea yamamai, E. pyretorum and Caligula boisduvalii) formed a group.
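
    The AT skew and A + T content quoted in this abstract are simple composition statistics. A minimal sketch follows, assuming the conventional definition AT skew = (A - T) / (A + T); the function names and the toy sequence are illustrative and are not taken from the paper.

```python
# Composition statistics for a DNA sequence (illustrative helper names).

def at_skew(seq):
    """AT skew = (A - T) / (A + T); negative values mean more Ts than As."""
    seq = seq.upper()
    a, t = seq.count("A"), seq.count("T")
    return (a - t) / (a + t)

def at_content(seq):
    """Fraction of bases that are A or T."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)

if __name__ == "__main__":
    toy = "ATTTGCATTA"  # toy sequence, not real mitogenome data
    print(round(at_skew(toy), 3), round(at_content(toy), 3))
```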

  20. Characterization of the complete mitochondrial genome of the giant silkworm moth, Eriogyna pyretorum (Lepidoptera: Saturniidae)

    PubMed Central

    Jiang, Shao-Tong; Hong, Gui-Yun; Yu, Miao; Li, Na; Yang, Ying; Liu, Yan-Qun; Wei, Zhao-Jun

    2009-01-01

    The complete mitochondrial genome (mitogenome) of Eriogyna pyretorum (Lepidoptera: Saturniidae) was determined as being composed of 15,327 base pairs (bp), including 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and a control region. The arrangement of the PCGs is the same as that found in the other sequenced lepidopteran. The AT skewness for the E. pyretorum mitogenome is slightly negative (-0.031), indicating the occurrence of more Ts than As. The nucleotide composition of the E. pyretorum mitogenome is also biased toward A + T nucleotides (80.82%). All PCGs are initiated by ATN codons, except for cytochrome c oxidase subunit 1 and 2 (cox1 and cox2). Two of the 13 PCGs harbor the incomplete termination codon by T. All tRNA genes have a typical clover-leaf structure of mitochondrial tRNA, with the exception of trnS1(AGN) and trnS2(UCN). Phylogenetic analysis among the available lepidopteran species supports the current morphology-based hypothesis that Bombycoidea, Geometroidea, Notodontidea, Papilionoidea and Pyraloidea are monophyletic. As has been previously suggested, Bombycidae (Bombyx mori and Bombyx mandarina), Sphingoidae (Manduca sexta) and Saturniidae (Antheraea pernyi, Antheraea yamamai, E. pyretorum and Caligula boisduvalii) formed a group. PMID:19471586

  1. Subjective quality evaluation of low-bit-rate video

    NASA Astrophysics Data System (ADS)

    Masry, Mark; Hemami, Sheila S.; Osberger, Wilfried M.; Rohaly, Ann M.

    2001-06-01

    A subjective quality evaluation was performed to qualify viewer responses to visual defects that appear in low bit rate video at full and reduced frame rates. The stimuli were eight sequences compressed by three motion compensated encoders - Sorenson Video, H.263+ and a Wavelet based coder - operating at five bit/frame rate combinations. The stimulus sequences exhibited obvious coding artifacts whose nature differed across the three coders. The subjective evaluation was performed using the Single Stimulus Continuous Quality Evaluation method of ITU-R Rec. BT.500-8. Viewers watched concatenated coded test sequences and continuously registered the perceived quality using a slider device. Data from 19 viewers was collected. An analysis of their responses to the presence of various artifacts across the range of possible coding conditions and content is presented. The effects of blockiness and blurriness on perceived quality are examined. The effects of changes in frame rate on perceived quality are found to be related to the nature of the motion in the sequence.

  2. Draft Genome Sequence of a Canine Isolate of Methicillin-Resistant Staphylococcus haemolyticus.

    PubMed

    Bean, David C; Wigmore, Sarah M; Wareham, David W

    2017-04-06

    Staphylococcus haemolyticus strain SW007 was isolated from a nasal swab taken from a healthy dog. The isolate is resistant to methicillin, mupirocin, macrolides, and sulfonamides. The SW007 draft genome is 2,325,410 bp and contains 2,277 coding sequences, including 60 tRNAs and nine complete rRNA-coding regions. Copyright © 2017 Bean et al.

  3. The complete chloroplast genome sequence of strawberry (Fragaria × ananassa Duch.) and comparison with related species of Rosaceae

    PubMed Central

    Cheng, Hui; Li, Jinfeng; Zhang, Hong; Cai, Binhua; Gao, Zhihong

    2017-01-01

    Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa ‘Benihoppe’ using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa ‘Benihoppe’ chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries which were F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa ‘Benihoppe’ were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. 
    Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ) with a percentage of variable sites greater than 1% and no less than five parsimony-informative sites were identified and may be useful for phylogenetic analysis of the genus Fragaria. PMID:29038765
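
    The region lengths reported in this abstract can be checked against the stated genome length: in a quadripartite chloroplast genome, the total is the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The short check below uses only numbers from the abstract; the variable names are illustrative.

```python
# Consistency check on the reported chloroplast genome size (values from the abstract).
LSC_BP = 85_531  # large single-copy region
SSC_BP = 18_146  # small single-copy region
IR_BP = 25_936   # one inverted repeat; the genome carries a pair

def quadripartite_length(lsc, ssc, ir):
    """Total length of a genome made of LSC + SSC + two inverted repeats."""
    return lsc + ssc + 2 * ir

if __name__ == "__main__":
    print(quadripartite_length(LSC_BP, SSC_BP, IR_BP))  # 155549
```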

  4. Molecular cloning of chitinase 33 (chit33) gene from Trichoderma atroviride

    PubMed Central

    Matroudi, S.; Zamani, M.R.; Motallebi, M.

    2008-01-01

    In this study, Trichoderma atroviride was selected as an overproducer of chitinase among 30 different isolates of Trichoderma sp. on the basis of chitinase specific activity. From this isolate, the genomic and cDNA clones encoding chit33 were isolated and sequenced. Comparison of the genomic and cDNA sequences to define the gene structure indicates that this gene contains three short introns and an open reading frame coding for a protein of 321 amino acids. The deduced amino acid sequence includes a 19 aa putative signal peptide. Homology between this sequence and other reported Trichoderma Chit33 proteins is discussed. The coding sequence of the chit33 gene was cloned in the pEt26b(+) expression vector and expressed in E. coli. PMID:24031242

  5. Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

    PubMed Central

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

    2015-01-01

    ABSTRACT Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644

  6. Decoding DNA labels by melting curve analysis using real-time PCR.

    PubMed

    Balog, József A; Fehér, Liliána Z; Puskás, László G

    2017-12-01

    Synthetic DNA has been used as an authentication code for a diverse number of applications. However, existing decoding approaches are based on either DNA sequencing or the determination of DNA length variations. Here, we present a simple alternative protocol for labeling different objects using a small number of short DNA sequences that differ in their melting points. Code amplification and decoding can be done in two steps using quantitative PCR (qPCR). To obtain a DNA barcode with high complexity, we defined 8 template groups, each having 4 different DNA templates, yielding 15^8 (>2.5 billion) combinations of different individual melting temperature (Tm) values and corresponding ID codes. The reproducibility and specificity of the decoding was confirmed by using the most complex template mixture, which had 32 different products in 8 groups with different Tm values. The industrial applicability of our protocol was also demonstrated by labeling a drone with an oil-based paint containing a predefined DNA code, which was then successfully decoded. The method presented here consists of a simple code system based on a small number of synthetic DNA sequences and a cost-effective, rapid decoding protocol using a few qPCR reactions, enabling a wide range of authentication applications.
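
    The figure of more than 2.5 billion combinations is consistent with 15 choices per group raised to the power of 8 groups. The sketch below reproduces that count under one assumption that the abstract does not state explicitly: each group contributes any non-empty subset of its 4 templates, giving 2^4 - 1 = 15 options per group. The function name is illustrative.

```python
# Counting possible DNA label codes (assumes a non-empty template subset per group).

def code_space(groups, templates_per_group):
    """Number of distinct codes when every group holds a non-empty subset of templates."""
    options_per_group = 2 ** templates_per_group - 1  # exclude the empty subset
    return options_per_group ** groups

if __name__ == "__main__":
    print(code_space(groups=8, templates_per_group=4))  # 2562890625
```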

  7. Using hidden Markov models and observed evolution to annotate viral genomes.

    PubMed

    McCauley, Stephen; Hein, Jotun

    2006-06-01

    ssRNA (single stranded) viral genomes are generally constrained in length and utilize overlapping reading frames to maximally exploit the coding potential within the genome length restrictions. This overlapping coding phenomenon leads to complex evolutionary constraints operating on the genome. In regions which code for more than one protein, silent mutations in one reading frame generally have a protein coding effect in another. To maximize coding flexibility in all reading frames, overlapping regions are often compositionally biased towards amino acids which are 6-fold degenerate with respect to the 64 codon alphabet. Previous methodologies have used this fact in an ad hoc manner to look for overlapping genes by motif matching. In this paper differentiated nucleotide compositional patterns in overlapping regions are incorporated into a probabilistic hidden Markov model (HMM) framework which is used to annotate ssRNA viral genomes. This work focuses on single sequence annotation and applies an HMM framework to ssRNA viral annotation. A description of how the HMM is parameterized, whilst annotating within a missing data framework is given. A Phylogenetic HMM (Phylo-HMM) extension, as applied to 14 aligned HIV2 sequences is also presented. This evolutionary extension serves as an illustration of the potential of the Phylo-HMM framework for ssRNA viral genomic annotation. The single sequence annotation procedure (SSA) is applied to 14 different strains of the HIV2 virus. Further results on alternative ssRNA viral genomes are presented to illustrate more generally the performance of the method. The results of the SSA method are encouraging however there is still room for improvement, and since there is overwhelming evidence to indicate that comparative methods can improve coding sequence (CDS) annotation, the SSA method is extended to a Phylo-HMM to incorporate evolutionary information. 
    The Phylo-HMM extension is applied to the same set of 14 HIV2 sequences which are pre-aligned. The performance improvement that results from including the evolutionary information in the analysis is illustrated.
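
    The abstract refers to amino acids that are 6-fold degenerate with respect to the 64-codon alphabet. As a small illustration, the sketch below counts codons per amino acid in the standard genetic code and lists those encoded by six codons. It is a generic calculation and not part of the authors' HMM; the compact table string uses the usual T, C, A, G base ordering.

```python
# Codon degeneracy in the standard genetic code (generic illustration).
from collections import Counter
from itertools import product

BASES = "TCAG"
# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... in TCAG order; '*' marks stop codons.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def degeneracy():
    """Map each amino acid (one-letter code) to its number of codons."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")

def six_fold_amino_acids():
    """Amino acids encoded by exactly six codons."""
    return sorted(aa for aa, n in degeneracy().items() if n == 6)

if __name__ == "__main__":
    print(six_fold_amino_acids())  # ['L', 'R', 'S']
```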

  8. TDRSS telecommunications system, PN code analysis

    NASA Technical Reports Server (NTRS)

    Dixon, R.; Gold, R.; Kaiser, F.

    1976-01-01

    The pseudo noise (PN) codes required to support the TDRSS telecommunications services are analyzed and the impact of alternate coding techniques on the user transponder equipment, the TDRSS equipment, and all factors that contribute to the acquisition and performance of these telecommunication services is assessed. Possible alternatives to the currently proposed hybrid FH/direct sequence acquisition procedures are considered and compared relative to acquisition time, implementation complexity, operational reliability, and cost. The hybrid FH/direct sequence technique is analyzed and rejected in favor of a recommended approach which minimizes acquisition time and user transponder complexity while maximizing probability of acquisition and overall link reliability.
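
    The abstract does not describe how the PN codes are generated. As general background only, the sketch below produces a maximal-length pseudo noise sequence from a 4-stage linear feedback shift register with the recurrence a[n+4] = a[n] XOR a[n+1], which has period 15. The register length, recurrence and function name are assumptions chosen for illustration and are not taken from the TDRSS report.

```python
# Minimal linear feedback shift register producing a PN (m-) sequence.
# Recurrence a[n+4] = a[n] XOR a[n+1] (polynomial x^4 + x + 1); an illustrative choice.

def pn_sequence(seed, length):
    """Return `length` output bits from a 4-stage LFSR started at `seed`."""
    state = list(seed)
    if len(state) != 4 or not any(state):
        raise ValueError("seed must be 4 bits and not all zero")
    out = []
    for _ in range(length):
        out.append(state[0])           # oldest bit is the output
        feedback = state[0] ^ state[1]
        state = state[1:] + [feedback]  # shift and append the new bit
    return out

if __name__ == "__main__":
    print(pn_sequence([1, 0, 0, 0], 15))
```

    Over one period of 15 bits, a sequence of this kind contains 8 ones and 7 zeros, one of the balance properties that makes such codes look noise-like.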

  9. The effect of an exogenous magnetic field on neural coding in deep spiking neural networks.

    PubMed

    Guo, Lei; Zhang, Wei; Zhang, Jialei

    2018-01-01

    A ten-layer feed forward network is constructed in the presence of an exogenous alternating magnetic field. Specifically, our results indicate that for rate coding, the firing rate is significantly increased in the presence of an exogenous alternating magnetic field and particularly with increasing enhancement of the alternating magnetic field amplitude. For temporal coding, the interspike intervals of the spiking sequence are decreased and the distribution of the interspike intervals of the spiking sequence tends to be uniform in the presence of alternating magnetic field.
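
    The two quantities this abstract relies on, firing rate for rate coding and interspike intervals for temporal coding, can both be computed directly from a list of spike times. The sketch below is a generic illustration with hypothetical function names and made-up spike times; it is not the authors' network model.

```python
# Rate-coding and temporal-coding summaries of a spike train (illustrative only).

def firing_rate(spike_times_ms, duration_ms):
    """Mean firing rate in spikes per second over the recording window."""
    return len(spike_times_ms) / (duration_ms / 1000.0)

def interspike_intervals(spike_times_ms):
    """Differences between consecutive spike times, in milliseconds."""
    times = sorted(spike_times_ms)
    return [later - earlier for earlier, later in zip(times, times[1:])]

if __name__ == "__main__":
    spikes = [10, 30, 60, 100]  # made-up spike times in ms
    print(firing_rate(spikes, duration_ms=200))
    print(interspike_intervals(spikes))
```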

  10. Coverage Maximization Using Dynamic Taint Tracing

    DTIC Science & Technology

    2007-03-28

    we do not have source code are handled, incompletely, via models of taint transfer. We use a little language to specify how taint transfers across a...n) 2.3.7 Implementation and Runtime Issues The taint graph instrumentation is a 2K line Ocaml module extending CIL and is supported by 5K lines of...modern scripting languages such as Ruby have taint modes that work similarly; however, all propagate taint at the variable rather than the byte level and

  11. Designing for Compressive Sensing: Compressive Art, Camouflage, Fonts, and Quick Response Codes

    DTIC Science & Technology

    2018-01-01

    an example where the signal is non-sparse in the standard basis, but sparse in the discrete cosine basis . The top plot shows the signal from the...previous example, now used as sparse discrete cosine transform (DCT) coefficients . The next plot shows the non-sparse signal in the standard...Romberg JK, Tao T. Stable signal recovery from incomplete and inaccurate measurements. Commun Pure Appl Math . 2006;59(8):1207–1223. 3. Donoho DL

  12. Study on multiple-hops performance of MOOC sequences-based optical labels for OPS networks

    NASA Astrophysics Data System (ADS)

    Zhang, Chongfu; Qiu, Kun; Ma, Chunli

    2009-11-01

    In this paper, we use a new method of analysis, which assumes independence among the multiple optical orthogonal codes, to derive the probability function of MOOCS-OPS networks, discuss the performance characteristics for a variety of parameters, and compare characteristics of systems that employ optical labels based on single or multiple optical orthogonal code sequences. The performance of the system is also calculated, and our results verify that the method is effective. Additionally, we find that the performance of MOOCS-OPS networks is worse than that of single optical orthogonal code-based optical labels for optical packet switching (SOOC-OPS); however, MOOCS-OPS networks can greatly improve the scalability of optical packet switching networks.

  13. Exome sequencing-driven discovery of coding polymorphisms associated with common metabolic phenotypes.

    PubMed

    Albrechtsen, A; Grarup, N; Li, Y; Sparsø, T; Tian, G; Cao, H; Jiang, T; Kim, S Y; Korneliussen, T; Li, Q; Nie, C; Wu, R; Skotte, L; Morris, A P; Ladenvall, C; Cauchi, S; Stančáková, A; Andersen, G; Astrup, A; Banasik, K; Bennett, A J; Bolund, L; Charpentier, G; Chen, Y; Dekker, J M; Doney, A S F; Dorkhan, M; Forsen, T; Frayling, T M; Groves, C J; Gui, Y; Hallmans, G; Hattersley, A T; He, K; Hitman, G A; Holmkvist, J; Huang, S; Jiang, H; Jin, X; Justesen, J M; Kristiansen, K; Kuusisto, J; Lajer, M; Lantieri, O; Li, W; Liang, H; Liao, Q; Liu, X; Ma, T; Ma, X; Manijak, M P; Marre, M; Mokrosiński, J; Morris, A D; Mu, B; Nielsen, A A; Nijpels, G; Nilsson, P; Palmer, C N A; Rayner, N W; Renström, F; Ribel-Madsen, R; Robertson, N; Rolandsson, O; Rossing, P; Schwartz, T W; Slagboom, P E; Sterner, M; Tang, M; Tarnow, L; Tuomi, T; van't Riet, E; van Leeuwen, N; Varga, T V; Vestmar, M A; Walker, M; Wang, B; Wang, Y; Wu, H; Xi, F; Yengo, L; Yu, C; Zhang, X; Zhang, J; Zhang, Q; Zhang, W; Zheng, H; Zhou, Y; Altshuler, D; 't Hart, L M; Franks, P W; Balkau, B; Froguel, P; McCarthy, M I; Laakso, M; Groop, L; Christensen, C; Brandslund, I; Lauritzen, T; Witte, D R; Linneberg, A; Jørgensen, T; Hansen, T; Wang, J; Nielsen, R; Pedersen, O

    2013-02-01

    Human complex metabolic traits are in part regulated by genetic determinants. Here we applied exome sequencing to identify novel associations of coding polymorphisms at minor allele frequencies (MAFs) >1% with common metabolic phenotypes. The study comprised three stages. We performed medium-depth (8×) whole exome sequencing in 1,000 cases with type 2 diabetes, BMI >27.5 kg/m^2 and hypertension and in 1,000 controls (stage 1). We selected 16,192 polymorphisms nominally associated (p < 0.05) with case-control status, from four selected annotation categories or from loci reported to associate with metabolic traits. These variants were genotyped in 15,989 Danes to search for association with 12 metabolic phenotypes (stage 2). In stage 3, polymorphisms showing potential associations were genotyped in a further 63,896 Europeans. Exome sequencing identified 70,182 polymorphisms with MAF >1%. In stage 2 we identified 51 potential associations with one or more of eight metabolic phenotypes covered by 45 unique polymorphisms. In meta-analyses of stage 2 and stage 3 results, we demonstrated robust associations for coding polymorphisms in CD300LG (fasting HDL-cholesterol: MAF 3.5%, p = 8.5 × 10^-14), COBLL1 (type 2 diabetes: MAF 12.5%, OR 0.88, p = 1.2 × 10^-11) and MACF1 (type 2 diabetes: MAF 23.4%, OR 1.10, p = 8.2 × 10^-10). We applied exome sequencing as a basis for finding genetic determinants of metabolic traits and show the existence of low-frequency and common coding polymorphisms with impact on common metabolic traits. Based on our study, coding polymorphisms with MAF above 1% do not seem to have particularly high effect sizes on the measured metabolic traits.

  14. Bias-Corrected Targeted Next-Generation Sequencing for Rapid, Multiplexed Detection of Actionable Alterations in Cell-Free DNA from Advanced Lung Cancer Patients.

    PubMed

    Paweletz, Cloud P; Sacher, Adrian G; Raymond, Chris K; Alden, Ryan S; O'Connell, Allison; Mach, Stacy L; Kuang, Yanan; Gandhi, Leena; Kirschmeier, Paul; English, Jessie M; Lim, Lee P; Jänne, Pasi A; Oxnard, Geoffrey R

    2016-02-15

    Tumor genotyping is a powerful tool for guiding non-small cell lung cancer (NSCLC) care; however, comprehensive tumor genotyping can be logistically cumbersome. To facilitate genotyping, we developed a next-generation sequencing (NGS) assay using a desktop sequencer to detect actionable mutations and rearrangements in cell-free plasma DNA (cfDNA). An NGS panel was developed targeting 11 driver oncogenes found in NSCLC. Targeted NGS was performed using a novel methodology that maximizes on-target reads, and minimizes artifact, and was validated on DNA dilutions derived from cell lines. Plasma NGS was then blindly performed on 48 patients with advanced, progressive NSCLC and a known tumor genotype, and explored in two patients with incomplete tumor genotyping. NGS could identify mutations present in DNA dilutions at ≥ 0.4% allelic frequency with 100% sensitivity/specificity. Plasma NGS detected a broad range of driver and resistance mutations, including ALK, ROS1, and RET rearrangements, HER2 insertions, and MET amplification, with 100% specificity. Sensitivity was 77% across 62 known driver and resistance mutations from the 48 cases; in 29 cases with common EGFR and KRAS mutations, sensitivity was similar to droplet digital PCR. In two cases with incomplete tumor genotyping, plasma NGS rapidly identified a novel EGFR exon 19 deletion and a missed case of MET amplification. Blinded to tumor genotype, this plasma NGS approach detected a broad range of targetable genomic alterations in NSCLC with no false positives including complex mutations like rearrangements and unexpected resistance mutations such as EGFR C797S. Through use of widely available vacutainers and a desktop sequencing platform, this assay has the potential to be implemented broadly for patient care and translational research. ©2015 American Association for Cancer Research.

  15. Bias-corrected targeted next-generation sequencing for rapid, multiplexed detection of actionable alterations in cell-free DNA from advanced lung cancer patients

    PubMed Central

    Paweletz, Cloud P.; Sacher, Adrian G.; Raymond, Chris K.; Alden, Ryan S.; O'Connell, Allison; Mach, Stacy L.; Kuang, Yanan; Gandhi, Leena; Kirschmeier, Paul; English, Jessie M.; Lim, Lee P.; Jänne, Pasi A.; Oxnard, Geoffrey R.

    2015-01-01

    Purpose: Tumor genotyping is a powerful tool for guiding non-small cell lung cancer (NSCLC) care; however, comprehensive tumor genotyping can be logistically cumbersome. To facilitate genotyping, we developed a next-generation sequencing (NGS) assay using a desktop sequencer to detect actionable mutations and rearrangements in cell-free plasma DNA (cfDNA). Experimental Design: An NGS panel was developed targeting 11 driver oncogenes found in NSCLC. Targeted NGS was performed using a novel methodology that maximizes on-target reads, and minimizes artifact, and was validated on DNA dilutions derived from cell lines. Plasma NGS was then blindly performed on 48 patients with advanced, progressive NSCLC and a known tumor genotype, and explored in two patients with incomplete tumor genotyping. Results: NGS could identify mutations present in DNA dilutions at ≥0.4% allelic frequency with 100% sensitivity/specificity. Plasma NGS detected a broad range of driver and resistance mutations, including ALK, ROS1, and RET rearrangements, HER2 insertions, and MET amplification, with 100% specificity. Sensitivity was 77% across 62 known driver and resistance mutations from the 48 cases; in 29 cases with common EGFR and KRAS mutations, sensitivity was similar to droplet digital PCR. In two cases with incomplete tumor genotyping, plasma NGS rapidly identified a novel EGFR exon 19 deletion and a missed case of MET amplification. Conclusion: Blinded to tumor genotype, this plasma NGS approach detected a broad range of targetable genomic alterations in NSCLC with no false positives including complex mutations like rearrangements and unexpected resistance mutations such as EGFR C797S. Through use of widely available vacutainers and a desktop sequencing platform, this assay has the potential to be implemented broadly for patient care and translational research. PMID:26459174

  16. Composition and Function of Sulfate-Reducing Prokaryotes in Eutrophic and Pristine Areas of the Florida Everglades†

    PubMed Central

    Castro, Hector; Reddy, K. R.; Ogram, Andrew

    2002-01-01

    As a result of agricultural activities in regions adjacent to the northern boundary of the Florida Everglades, a nutrient gradient developed that resulted in physicochemical and ecological changes from the original system. Sulfate input from agricultural runoff and groundwater is present in soils of the Northern Everglades, and sulfate-reducing prokaryotes (SRP) may play an important role in biogeochemical processes such as carbon cycling. The goal of this project was to utilize culture-based and non-culture-based approaches to study differences between the composition of assemblages of SRP in eutrophic and pristine areas of the Everglades. Sulfate reduction rates and most-probable-number enumerations revealed SRP populations and activities to be greater in eutrophic zones than in more pristine soils. In eutrophic regions, methanogenesis rates were higher, the addition of acetate stimulated methanogenesis, and SRP able to utilize acetate competed to a limited degree with acetoclastic methanogens. A surprising amount of diversity within clone libraries of PCR-amplified dissimilatory sulfite reductase (DSR) genes was observed, and the majority of DSR sequences were associated with gram-positive spore-forming Desulfotomaculum and uncultured microorganisms. Sequences associated with Desulfotomaculum fall into two categories: in the eutrophic regions, 94.7% of the sequences related to Desulfotomaculum were associated with those able to completely oxidize substrates, and in samples from pristine regions, all Desulfotomaculum-like sequences were related to incomplete oxidizers. This metabolic selection may be linked to the types of substrates that Desulfotomaculum spp. utilize; it may be that complete oxidizers are more versatile and likelier to proliferate in nutrient-rich zones of the Everglades. Desulfotomaculum incomplete oxidizers may outcompete complete oxidizers for substrates such as hydrogen in pristine zones where diverse carbon sources are less available. PMID:12450837

  17. Program Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas the construction of sequence diagrams was previously a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  18. The kinetoplast DNA of the Australian trypanosome, Trypanosoma copemani, shares features with Trypanosoma cruzi and Trypanosoma lewisi.

    PubMed

    Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew

    2018-05-17

    Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.

  19. Partial Shotgun Sequencing of the Boechera stricta Genome Reveals Extensive Microsynteny and Promoter Conservation with Arabidopsis

    PubMed Central

    Windsor, Aaron J.; Schranz, M. Eric; Formanová, Nataša; Gebauer-Jung, Steffi; Bishop, John G.; Schnabelrauch, Domenica; Kroymann, Juergen; Mitchell-Olds, Thomas

    2006-01-01

    Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10⁻³⁰) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5′ to annotated coding regions. On average, the 500 nucleotides upstream to coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5′ noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks. PMID:16607030

  20. Amino- and carboxyl-terminal amino acid sequences of proteins coded by gag gene of murine leukemia virus

    PubMed Central

    Oroszlan, Stephen; Henderson, Louis E.; Stephenson, John R.; Copeland, Terry D.; Long, Cedric W.; Ihle, James N.; Gilden, Raymond V.

    1978-01-01

    The amino- and carboxyl-terminal amino acid sequences of proteins (p10, p12, p15, and p30) coded by the gag gene of Rauscher and AKR murine leukemia viruses were determined. Among these proteins, p15 from both viruses appears to have a blocked amino end. Proline was found to be the common NH2 terminus of both p30s and both p12s, and alanine of both p10s. The amino-terminal sequences of p30s are identical, as are those of p10s, while the p12 sequences are clearly distinctive but also show substantial homology. The carboxyl-terminal amino acids of both viral p30s and p12s are leucine and phenylalanine, respectively. Rauscher leukemia virus p15 has tyrosine as the carboxyl terminus while AKR virus p15 has phenylalanine in this position. The compositional and sequence data provide definite chemical criteria for the identification of analogous gag gene products and for the comparison of viral proteins isolated in different laboratories. On the basis of amino acid sequences and the previously proposed H-p15-p12-p30-p10-COOH peptide sequence in the precursor polyprotein, a model for cleavage sites involved in the post-translational processing of the precursor coded for by the gag gene is proposed. PMID:206897

  1. Marks of Change in Sequences

    NASA Astrophysics Data System (ADS)

    Jürgensen, H.

    2011-12-01

    Given a sequence of events, how does one recognize that a change has occurred? We explore potential definitions of the concept of change in a sequence and propose that words in relativized solid codes might serve as indicators of change.

  2. Identification of two allelic IgG1 C(H) coding regions (Cgamma1) of cat.

    PubMed

    Kanai, T H; Ueda, S; Nakamura, T

    2000-01-31

    Two types of cDNA encoding IgG1 heavy chain (gamma1) were isolated from a single domestic short-hair cat. Sequence analysis indicated a higher level of similarity of these Cgamma1 sequences to the human Cgamma1 sequence (76.9 and 77.0%) than to the mouse sequence (70.0 and 69.7%) at the nucleotide level. Predicted primary structures of both feline Cgamma1 genes, designated Cgamma1a and Cgamma1b, were similar to that of the human Cgamma1 gene with respect to, for instance, the size of the constant domains, the presence of six conserved cysteine residues involved in formation of the domain structure, and the location of a conserved N-linked glycosylation site. Sequence comparison between the two alleles showed that 7 out of 10 nucleotide differences were within the C(H)3 domain coding region, all leading to nonsynonymous changes in amino acid residues. Partial sequence analysis of genomic clones showed three nucleotide substitutions between the two Cgamma1 alleles in the intron between the C(H)2 and C(H)3 domain coding regions. In the 12 domestic short-hair cats used in this study, the frequency of the Cgamma1a allele (62.5%) was higher than that of the Cgamma1b allele (37.5%).

  3. Analysis of CHRNA7 rare variants in autism spectrum disorder susceptibility.

    PubMed

    Bacchelli, Elena; Battaglia, Agatino; Cameli, Cinzia; Lomartire, Silvia; Tancredi, Raffaella; Thomson, Susanne; Sutcliffe, James S; Maestrini, Elena

    2015-04-01

    Chromosome 15q13.3 recurrent microdeletions are causally associated with a wide range of phenotypes, including autism spectrum disorder (ASD), seizures, intellectual disability, and other psychiatric conditions. Whether the reciprocal microduplication is pathogenic is less certain. CHRNA7, encoding for the alpha7 subunit of the neuronal nicotinic acetylcholine receptor, is considered the likely culprit gene in mediating neurological phenotypes in 15q13.3 deletion cases. To assess if CHRNA7 rare variants confer risk to ASD, we performed copy number variant analysis and Sanger sequencing of the CHRNA7 coding sequence in a sample of 135 ASD cases. Sequence variation in this gene remains largely unexplored, given the existence of a fusion gene, CHRFAM7A, which includes a nearly identical partial duplication of CHRNA7. Hence, attempts to sequence coding exons must distinguish between CHRNA7 and CHRFAM7A, making next-generation sequencing approaches unreliable for this purpose. A CHRNA7 microduplication was detected in a patient with autism and moderate cognitive impairment; while no rare damaging variants were identified in the coding region, we detected rare variants in the promoter region, previously described to functionally reduce transcription. This study represents the first sequence variant analysis of CHRNA7 in a sample of idiopathic autism. © 2015 Wiley Periodicals, Inc.

  4. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

    PubMed

    Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-09-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

  5. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

    PubMed Central

    Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-01-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341

  6. Cloning and sequencing of a laccase gene from the lignin-degrading basidiomycete Pleurotus ostreatus.

    PubMed Central

    Giardina, P; Cannio, R; Martirani, L; Marzullo, L; Palmieri, G; Sannia, G

    1995-01-01

    The gene (pox1) encoding a phenol oxidase from Pleurotus ostreatus, a lignin-degrading basidiomycete, was cloned and sequenced, and the corresponding pox1 cDNA was also synthesized and sequenced. The isolated gene consists of 2,592 bp, with the coding sequence being interrupted by 19 introns and flanked by an upstream region in which putative CAAT and TATA consensus sequences could be identified at positions -174 and -84, respectively. The isolation of a second cDNA (pox2 cDNA), showing 84% similarity, and of the corresponding truncated genomic clones demonstrated the existence of a multigene family coding for isoforms of laccase in P. ostreatus. PCR amplifications of specific regions on the DNA of isolated monokaryons proved that the two genes are not allelic forms. The POX1 amino acid sequence deduced was compared with those of other known laccases from different fungi. PMID:7793961

  7. Application of 2D graphic representation of protein sequence based on Huffman tree method.

    PubMed

    Qi, Zhao-Hui; Feng, Jun; Qi, Xiao-Qin; Li, Ling

    2012-05-01

    Based on the Huffman tree method, we propose a new 2D graphic representation of protein sequences. This representation completely avoids loss of information in the transfer of data from a protein sequence to its graphic representation. The method consists of two parts. The first assigns 0-1 codes to the 20 amino acids using a Huffman tree built from amino acid frequencies, where the frequency of an amino acid is defined as its count in the analyzed protein sequences. The second constructs the 2D graphic representation of a protein sequence from these 0-1 codes. Applications of the method to ten ND5 genes and seven Escherichia coli strains are then presented in detail. The results show that the proposed model may provide new insights into the evolution patterns determined from protein sequences and complete genomes. Copyright © 2012 Elsevier Ltd. All rights reserved.
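
    The first part of the method, assigning 0-1 codes to amino acids from their counts, can be sketched as follows. This is a minimal illustration of standard Huffman coding, not the authors' implementation: the function names, the tie-breaking rule and the two toy sequences are assumptions made for the example, and the 2D graphic construction itself is not reproduced because the abstract does not describe it.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(frequencies):
    """Build a prefix-free 0-1 code from a {symbol: count} mapping."""
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(freq, next(tiebreak), sym) for sym, freq in sorted(frequencies.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: an amino acid letter
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(sequence, codes):
    return "".join(codes[residue] for residue in sequence)


def decode(bits, codes):
    """Invert the code; works because no codeword is a prefix of another."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


# Hypothetical toy sequences (not from the paper), in one-letter amino acid code.
sequences = ["MKVLAAGLLK", "MKKLAVGAAK"]
codes = huffman_codes(Counter("".join(sequences)))
bits = encode(sequences[0], codes)
```

    Because the code is prefix-free, `decode(bits, codes)` returns the original sequence, which is the sense in which a representation built on such codes loses no information.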

  8. Digital data for Quick Response (QR) codes of thermophiles to identify and compare the bacterial species isolated from Unkeshwar hot springs (India)

    PubMed Central

    Rekadwad, Bhagwan N.; Khobragade, Chandrahasya N.

    2015-01-01

    16S rRNA sequences of 21 morphologically and biochemically identified thermophilic bacteria isolated from Unkeshwar hot springs (19°85′N and 78°25′E), Dist. Nanded (India) have been deposited in the NCBI repository. The 16S rRNA gene sequences were used to generate QR codes for sequences (FASTA format and full GenBank information). Diversity among the isolates is compared with known isolates and evaluated using CGR, FCGR and PCA, i.e. visual comparison and evaluation, respectively. Considerable biodiversity was observed among the identified bacteria isolated from Unkeshwar hot springs. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/. PMID:26793757

  9. Eddy current-nulled convex optimized diffusion encoding (EN-CODE) for distortion-free diffusion tensor imaging with short echo times.

    PubMed

    Aliotta, Eric; Moulin, Kévin; Ennis, Daniel B

    2018-02-01

    To design and evaluate eddy current-nulled convex optimized diffusion encoding (EN-CODE) gradient waveforms for efficient diffusion tensor imaging (DTI) that is free of eddy current-induced image distortions. The EN-CODE framework was used to generate diffusion-encoding waveforms that are eddy current-compensated. The EN-CODE DTI waveform was compared with the existing eddy current-nulled twice refocused spin echo (TRSE) sequence as well as monopolar (MONO) and non-eddy current-compensated CODE in terms of echo time (TE) and image distortions. Comparisons were made in simulations, phantom experiments, and neuro imaging in 10 healthy volunteers. The EN-CODE sequence achieved eddy current compensation with a significantly shorter TE than TRSE (78 versus 96 ms) and a slightly shorter TE than MONO (78 versus 80 ms). Intravoxel signal variance was lower in phantoms with EN-CODE than with MONO (13.6 ± 11.6 versus 37.4 ± 25.8) and not different from TRSE (15.1 ± 11.6), indicating good robustness to eddy current-induced image distortions. Mean fractional anisotropy values in brain edges were also significantly lower with EN-CODE than with MONO (0.16 ± 0.01 versus 0.24 ± 0.02, P < 1 × 10⁻⁵) and not different from TRSE (0.16 ± 0.01 versus 0.16 ± 0.01, P = nonsignificant). The EN-CODE sequence eliminated eddy current-induced image distortions in DTI with a TE comparable to MONO and substantially shorter than TRSE. Magn Reson Med 79:663-672, 2018. © 2017 International Society for Magnetic Resonance in Medicine.

  10. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

    PubMed

    Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

    2014-04-01

    Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.

  11. Remembering Plurals: Unit of Coding and Form of Coding during Serial Recall.

    ERIC Educational Resources Information Center

    Van Der Molen, Hugo; Morton, John

    1979-01-01

    Adult females recalled lists of six words, including some plural nouns, presented visually in sequence. A frequent error was to detach the plural from its root. This supports a morpheme-based as opposed to a unitary word code. Evidence for a primarily phonological coding of the plural morpheme was obtained. (Author/RD)

  12. Coalescent-Based Analyses of Genomic Sequence Data Provide a Robust Resolution of Phylogenetic Relationships among Major Groups of Gibbons

    PubMed Central

    Shi, Cheng-Min; Yang, Ziheng

    2018-01-01

    The phylogenetic relationships among extant gibbon species remain unresolved despite numerous efforts using morphological, behavioral, and genetic data and the sequencing of whole genomes. A major challenge in reconstructing the gibbon phylogeny is the radiative speciation process, which resulted in extremely short internal branches in the species phylogeny and extensive incomplete lineage sorting with extensive gene-tree heterogeneity across the genome. Here, we analyze two genomic-scale data sets, with ∼10,000 putative noncoding and exonic loci, respectively, to estimate the species tree for the major groups of gibbons. We used the Bayesian full-likelihood method bpp under the multispecies coalescent model, which naturally accommodates incomplete lineage sorting and uncertainties in the gene trees. For comparison, we included three heuristic coalescent-based methods (mp-est, SVDQuartets, and astral) as well as concatenation. From both data sets, we infer the phylogeny for the four extant gibbon genera to be (Hylobates, (Nomascus, (Hoolock, Symphalangus))). We used simulation guided by the real data to evaluate the accuracy of the methods used. Astral, while not as efficient as bpp, performed well in estimation of the species tree even in the presence of excessive incomplete lineage sorting. Concatenation, mp-est and SVDQuartets were unreliable when the species tree contains very short internal branches. A likelihood ratio test of gene flow suggests a small amount of migration from Hylobates moloch to H. pileatus, while cross-genera migration is absent or rare. Our results highlight the utility of coalescent-based methods in addressing challenging species tree problems characterized by short internal branches and rampant gene tree-species tree discordance. PMID:29087487

  13. Coordination analysis of players' distribution in football using cross-correlation and vector coding techniques.

    PubMed

    Moura, Felipe Arruda; van Emmerik, Richard E A; Santana, Juliana Exel; Martins, Luiz Eduardo Barreto; Barros, Ricardo Machado Leite de; Cunha, Sergio Augusto

    2016-12-01

    The purpose of this study was to investigate the coordination between teams' spread during football matches using cross-correlation and vector coding techniques. Using a video-based tracking system, we obtained the trajectories of 257 players during 10 matches. Team spread was calculated as a function of time. For a general description of coordination, we calculated the cross-correlation between the signals. Vector coding was used to identify the coordination patterns between teams during offensive sequences that ended in shots on goal or defensive tackles. Cross-correlation showed that opposing teams tend to present in-phase coordination, with a short time lag. During offensive sequences, vector coding results showed that, although in-phase coordination dominated, other patterns were observed. We verified that during the early stages, offensive sequences ending in shots on goal present greater anti-phase and attacking team phase periods, compared to sequences ending in tackles. Results suggest that, at the beginning of the attacking play, the attacking team may seek to behave contrary to its opponent (or may lead the opponent's behaviour) with regard to the distribution strategy, to increase the chances of a shot on goal. The techniques allowed the coordination patterns between teams to be detected, providing additional information about football dynamics and players' interaction.
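
    The cross-correlation step, finding the time lag at which two team-spread signals are most strongly correlated, can be sketched as below. This is a generic lagged Pearson correlation, not the authors' code: the function names, the synthetic sinusoidal "spread" signals and the two-sample delay are all assumptions chosen for illustration, and the vector coding analysis is not covered.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lagged_correlations(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in -max_lag..max_lag."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        result[lag] = pearson(a, b)
    return result


# Synthetic example: team B's spread copies team A's spread two samples later.
spread_a = [20 + 5 * math.sin(0.3 * t) for t in range(60)]
spread_b = [spread_a[0]] * 2 + spread_a[:-2]

correlations = lagged_correlations(spread_a, spread_b, max_lag=5)
best_lag = max(correlations, key=correlations.get)
```

    With this sign convention a positive `best_lag` means the second signal follows the first; a high correlation at a small lag corresponds to the in-phase coordination with a short time lag reported in the abstract.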

  14. Images multiplexing by code division technique

    NASA Astrophysics Data System (ADS)

    Kuo, Chung J.; Rigas, Harriett

    Spread Spectrum System (SSS) or Code Division Multiple Access System (CDMAS) has been studied for a long time, but most of the attention was focused on the transmission problems. In this paper, we study the results when the code division technique is applied to the image at the source stage. The idea is to convolve the N different images with the corresponding m-sequence to obtain the encrypted image. The superimposed image (summation of the encrypted images) is then stored or transmitted. The benefit of this is that no one knows what is stored or transmitted unless the m-sequence is known. The original image is recovered by correlating the superimposed image with the corresponding m-sequence. Two cases are studied in this paper. First, the two-dimensional image is treated as a long one-dimensional vector and the m-sequence is employed to obtain the results. Secondly, the two-dimensional quasi m-array is proposed and used for the code division multiplexing. It is shown that the quasi m-array is faster when the image size is 256 x 256. The important features of the proposed technique are not only the image security but also the data compactness. The compression ratio depends on how many images are superimposed.

  15. Images Multiplexing By Code Division Technique

    NASA Astrophysics Data System (ADS)

    Kuo, Chung Jung; Rigas, Harriett B.

    1990-01-01

    Spread Spectrum System (SSS) or Code Division Multiple Access System (CDMAS) has been studied for a long time, but most of the attention was focused on the transmission problems. In this paper, we study the results when the code division technique is applied to the image at the source stage. The idea is to convolve the N different images with the corresponding m-sequence to obtain the encrypted image. The superimposed image (summation of the encrypted images) is then stored or transmitted. The benefit of this is that no one knows what is stored or transmitted unless the m-sequence is known. The original image is recovered by correlating the superimposed image with the corresponding m-sequence. Two cases are studied in this paper. First, the 2-D image is treated as a long 1-D vector and the m-sequence is employed to obtain the results. Secondly, the 2-D quasi m-array is proposed and used for the code division multiplexing. It is shown that the quasi m-array is faster when the image size is 256 x 256. The important features of the proposed technique are not only the image security but also the data compactness. The compression ratio depends on how many images are superimposed.
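
    The encode-by-convolution and recover-by-correlation idea can be sketched for a single flattened image as follows. This is an illustrative sketch under stated assumptions, not the scheme of the paper: the 5-bit LFSR and its taps, the circular (rather than linear) convolution, the 31-pixel toy "image" and all function names are assumptions. Superimposing several images would add crosstalk terms that depend on the cross-correlation between the codes, which is not modelled here.

```python
import numpy as np

def m_sequence(nbits=5, taps=(5, 3)):
    """One period (2**nbits - 1 chips) of a +/-1 m-sequence from a Fibonacci LFSR."""
    state = [1] * nbits                      # any non-zero seed works
    bits = []
    for _ in range(2 ** nbits - 1):
        bits.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return 1 - 2 * np.array(bits)            # bit 0 -> +1, bit 1 -> -1

def encode(pixels, code):
    """Circular convolution of the flattened image with the code."""
    return sum(v * np.roll(code, k) for k, v in enumerate(pixels))

def recover(encoded, code):
    """Circular correlation with the code, then removal of the constant offset.

    The autocorrelation of a +/-1 m-sequence is n at zero shift and -1 elsewhere,
    so the correlation equals (n + 1) * pixel - sum(pixels); because the code sums
    to -1, sum(pixels) equals -sum(encoded).
    """
    n = len(code)
    corr = np.array([np.dot(encoded, np.roll(code, k)) for k in range(n)])
    return (corr - encoded.sum()) / (n + 1)

code = m_sequence()
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=len(code))   # toy 31-pixel "image" (assumed)
encoded = encode(pixels, code)
recovered = recover(encoded, code)
```

    Without the code, the encoded vector gives no direct view of the pixel values, which is the security property the abstract refers to.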

  16. Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae.

    PubMed

    Redwan, R M; Saidin, A; Kumar, S V

    2015-08-12

    Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple that was sequenced using the PacBio sequencing technology. In this study, the high error rate of PacBio long sequence reads of A. comosus's total genomic DNA was reduced by leveraging the highly accurate but short Illumina reads for error correction via the latest error correction module from Novocraft. Error-corrected long PacBio reads were assembled by using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, features the conserved quadripartite structure of chloroplast genomes, containing a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB), each of 26,766 bp. Overall, the genome contained 117 unique coding regions and 30 were repeated in the IR region, with gene content, structure and arrangement similar to its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed a conserved protein-coding gene, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast (P < 0.05). Phylogenetic analysis confirmed the recent taxonomical relation among the members of the commelinids, which supports the monophyletic relationship between Arecales and Dasypogonaceae and between Zingiberales and Poales, which includes A. comosus. The complete sequence of the pineapple chloroplast provides insights into the divergence of genic chloroplast sequences among the members of the subclass Commelinidae. The complete pineapple chloroplast will serve as a reference for in-depth taxonomical studies in the Bromeliaceae family when more species in the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.
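
    The reported region sizes can be checked against the total genome length with a short calculation. The variable names below are chosen for illustration; the figures are those given in the abstract.

```python
# Region sizes (bp) as reported in the abstract.
lsc = 87_482        # large single copy region
ssc = 18_622        # small single copy region
ir = 26_766         # each inverted repeat (IRA and IRB)

# A quadripartite chloroplast genome is LSC + SSC + two IR copies.
total = lsc + ssc + 2 * ir
ir_fraction = 2 * ir / total   # share of the genome occupied by the two IR copies
```

    The sum equals the stated genome length of 159,636 bp, so the four regions account for the whole circular molecule, with the two IR copies making up roughly a third of it.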

  17. Denitrification gene expression in clay-soil bacterial community

    NASA Astrophysics Data System (ADS)

    Pastorelli, R.; Landi, S.

    2009-04-01

    Our contribution to the Italian research project SOILSINK focused on microbial denitrification gene expression in Mediterranean agricultural soils. In ecosystems with high inputs of nitrogen, such as agricultural soils, denitrification causes a net loss of nitrogen since nitrate is reduced to gaseous forms, which are released into the atmosphere. Moreover, incomplete denitrification can lead to emission of nitrous oxide, a potent greenhouse gas which contributes to global warming and destruction of the ozone layer. A critical role in denitrification is played by microorganisms, and the ability to denitrify is widespread among a variety of phylogenetically unrelated organisms. The data reported here refer to wheat cultivation in a clay-rich soil under different environmental impact management (Agugliano, AN, Italy). We analysed the RNA directly extracted from soil to provide information on in situ activities of specific populations. The expression of genes coding for two nitrate reductases (narG and napA), two nitrite reductases (nirS and nirK), two nitric oxide reductases (cnorB and qnorB) and nitrous oxide reductase (nosZ) was analyzed by reverse transcription (RT)-nested PCR. Only napA, nirS, nirK, qnorB and nosZ were detected, and the sequenced fragments showed high similarity with the corresponding gene sequences deposited in the GenBank database. These results suggest the suitability of the method for the qualitative detection of denitrifying bacteria in environmental samples, and they offered us the possibility of performing denaturing gradient gel electrophoresis (DGGE) analyses for denitrification genes. Earlier conclusions showed that the nirK gene is more widely distributed in the soil environment than the nirS gene. The results concerning nosZ expression indicated that microbial activity was clearly present only in no-tilled and no-fertilized soils.

  18. The Complete Mitochondrial Genome of Coptotermes ‘suzhouensis’ (syn. Coptotermes formosanus) (Isoptera: Rhinotermitidae) and Molecular Phylogeny Analysis

    PubMed Central

    Li, Juan; Zhu, Jin-long; Lou, Shi-di; Wang, Ping; Zhang, You-sen; Wang, Lin; Yin, Ruo-chun; Zhang, Ping-ping

    2018-01-01

    Coptotermes suzhouensis (Isoptera: Rhinotermitidae) is a significant subterranean termite pest of wooden structures and is widely distributed in southeastern China. The complete mitochondrial DNA sequence of C. suzhouensis was analyzed in this study. The mitogenome was a circular molecule of 15,764 bp in length, which contained 13 protein-coding genes (PCGs), 22 transfer RNA genes, two ribosomal RNA genes, and an A+T-rich region with a gene arrangement typical of Isoptera mitogenomes. All PCGs were initiated by ATN codons and terminated by complete termination codons (TAA), except COX2, ND5, and Cytb, which ended with an incomplete termination codon T. All tRNAs displayed a typical clover-leaf structure, except for tRNASer(AGN), which did not contain the stem-loop structure in the DHU arm. The A+T content (69.23%) of the A+T-rich region (949 bp) was higher than that of the entire mitogenome (65.60%), and two different sets of repeat units (A+B) were distributed in this region. Comparison of complete mitogenome sequences with those of Coptotermes formosanus indicated that the two taxa have very high genetic similarity. Forty-one representative termite species were used to construct phylogenetic trees by maximum likelihood, maximum parsimony, and Bayesian inference methods. The phylogenetic analyses also strongly supported (BPP, MLBP, and MPBP = 100%) that all C. suzhouensis and C. formosanus samples gathered into one clade with genetic distances between 0.000 and 0.002. This study provides molecular evidence for a more robust phylogenetic position of C. suzhouensis and infers that C. suzhouensis is a synonym of C. formosanus. PMID:29718488
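
    The two sequence properties mentioned above, complete versus incomplete termination codons and A+T content, can be illustrated with a small sketch. The function names and the toy sequences are hypothetical and are not taken from the mitogenome itself; the sketch assumes the coding sequence is supplied in frame from its start codon, and it treats TAA and TAG as the complete stop codons of the invertebrate mitochondrial code. In real annotation the 3' boundary is usually set by the adjacent gene, and a trailing T or TA is completed to TAA by polyadenylation of the transcript.

```python
def classify_termination(cds):
    """Classify the 3' end of an in-frame mitochondrial protein-coding sequence."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        # Length is a multiple of three: look for a full stop codon.
        return "complete" if cds[-3:] in {"TAA", "TAG"} else "no stop codon"
    tail = cds[-remainder:]
    # One or two leftover bases: T or TA is an incomplete stop codon.
    return f"incomplete ({tail})" if tail in {"T", "TA"} else "no stop codon"

def at_content(seq):
    """Percentage of A and T bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)

# Toy sequences (assumed, for illustration only).
full_stop = classify_termination("ATGAAATTTTAA")     # ends with TAA
partial_stop = classify_termination("ATAAAATTTT")    # ends with a single T
example_at = at_content("ATGC")
```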

  19. Mutations in the novel gene FOPV are associated with familial autosomal dominant and non-familial obliterative portal venopathy.

    PubMed

    Besmond, Claude; Valla, Dominique; Hubert, Laurence; Poirier, Karine; Grosse, Brigitte; Guettier, Catherine; Bernard, Olivier; Gonzales, Emmanuel; Jacquemin, Emmanuel

    2018-02-01

    Obliterative portal venopathy (OPV) is characterized by lesions of portal vein intrahepatic branches and is thought to be responsible for many cases of portal hypertension in the absence of cirrhosis or obstruction of large portal or hepatic veins. In most cases the cause of OPV remains unknown. The aim was to identify a candidate gene of OPV. Whole exome sequencing was performed in two families, including 6 patients with OPV. Identified mutations were confirmed by Sanger sequencing and expression of candidate gene transcript was studied by real time qPCR in human tissues. In both families, no mutations were identified in genes previously reported to be associated with OPV. In each family, we identified a heterozygous mutation (c.1783G>A, p.Gly595Arg and c.4895C>T, p.Thr1632Ile) in a novel gene located on chromosome 4, that we called FOPV (Familial Obliterative Portal Venopathy), and having a cDNA coding for 1793 amino acids. The FOPV mutations segregated with the disease in families and the pattern of inheritance was suggestive of autosomal dominant inherited OPV, with incomplete penetrance and variable expressivity. In silico analysis predicted a deleterious effect of each mutant and mutations concerned highly conserved amino acids in mammals. A deleterious heterozygous FOPV missense mutation (c.4244T>C, p.Phe1415Ser) was also identified in a patient with non-familial OPV. Expression study in liver veins showed that FOPV transcript was mainly expressed in intrahepatic portal vein. This report suggests that FOPV mutations may have a pathogenic role in some cases of familial and non-familial OPV. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
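
    The correspondence between the cDNA positions and the amino acid positions of the reported variants follows from simple codon arithmetic, sketched below. The helper name and the regular expression are illustrative assumptions; the sketch only handles simple substitutions in the coding region, numbered from the A of the start codon.

```python
import re

def codon_of(hgvs_c):
    """Return (codon number, position within codon) for a simple c. substitution."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs_c)
    if match is None:
        raise ValueError(f"unsupported variant description: {hgvs_c}")
    pos = int(match.group(1))
    return (pos - 1) // 3 + 1, (pos - 1) % 3 + 1

# Variants and amino acid positions as reported in the abstract.
reported = {"c.1783G>A": 595, "c.4895C>T": 1632, "c.4244T>C": 1415}
computed = {variant: codon_of(variant) for variant in reported}
```

    For the three variants the computed codon numbers are 595, 1632 and 1415, matching p.Gly595Arg, p.Thr1632Ile and p.Phe1415Ser, and all fall within the 1793-residue coding sequence.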

  20. Variant in the RFWD3 gene associated with PATN1, a modifier of leopard complex spotting.

    PubMed

    Holl, H M; Brooks, S A; Archer, S; Brown, K; Malvick, J; Penedo, M C T; Bellone, R R

    2016-02-01

    Leopard complex spotting (LP), the result of an incompletely dominant mutation in TRPM1, produces a collection of unique depigmentation patterns in the horse. Although the LP mutation allows for expression of the various patterns, other loci are responsible for modification of the extent of white. Pedigree analysis of families segregating for high levels of patterning indicated a single dominant gene, named Pattern-1 (PATN1), as a major modifier of LP. Linkage analysis in two half-sibling families segregating for PATN1 identified a 15-Mb region on ECA3p that warranted further investigation. Whole transcriptome sequencing of skin samples from horses with and without the PATN1 allele was performed to identify genic SNPs for fine mapping. Two Sequenom assays were utilized to genotype 192 individuals from five LP-carrying breeds. The initial panel highlighted a 1.6-Mb region without a clear candidate gene. In the second round of fine mapping, SNP ECA3:23 658 447T>G in the 3'-UTR of RING finger and WD repeat domain 3 (RFWD3) reached a significance level of P = 1.063 × 10^-39. Sequencing of RFWD3 did not identify any coding polymorphisms specific to PATN1 horses. Genotyping of the RFWD3 3'-UTR SNP in 54 additional LP animals and 327 horses from nine breeds not segregating for LP further supported the association (P = 4.17 × 10^-115). This variant is a strong candidate for PATN1 and may be particularly useful for LP breeders to select for high levels of white patterning. © 2015 Stichting International Foundation for Animal Genetics.

  1. Identification a novel MYOC gene mutation in a Chinese family with juvenile-onset open angle glaucoma.

    PubMed

    Zhao, Xin; Yang, Chaoshan; Tong, Yi; Zhang, Xiaohui; Xu, Liang; Li, Yang

    2010-08-25

    To describe the clinical and genetic findings in one Chinese family with juvenile-onset open angle glaucoma (JOAG). One family was examined clinically and a follow-up took place 5 years later. After informed consent was obtained, genomic DNA was extracted from the venous blood of all participants. Linkage analysis was performed with three microsatellite markers around the MYOC gene (D1S196, D1S2815, and D1S218) in the family. Mutation screening of all coding exons of MYOC was performed by direct sequencing of PCR-amplified DNA fragments and restriction fragment length polymorphism (RFLP) analysis. Bioinformatics analysis by the Garnier-Osguthorpe-Robson (GOR) method predicted the effects of variants detected on secondary structures of the MYOC protein. Clinical examination and pedigree analysis revealed a three-generation family with seven members diagnosed with JOAG, three with ocular hypertension, and five normal individuals. Through genotyping, the pedigree showed a linkage to the MYOC on chromosome 1q24-25. Mutation screening of MYOC in this family revealed an A-->T transition at position 1348 (p.N450Y) of the cDNA sequence. This missense mutation co-segregated with the disease phenotype of the family, but was not found in 100 normal controls. Secondary structure prediction of the p.N450Y by the GOR method revealed the replacement of a coil with a beta sheet at the amino acid 447. Early onset JOAG, with incomplete penetrance, is consistent with a novel mutation in MYOC. The finding provides pre-symptomatic molecular diagnosis for the members of this family and is useful for further genetic consultation.

  2. Novel variants of the 5S rRNA genes in Eruca sativa.

    PubMed

    Singh, K; Bhatia, S; Lakshmikumaran, M

    1994-02-01

    The 5S ribosomal RNA (rRNA) genes of Eruca sativa were cloned and characterized. They are organized into clusters of tandemly repeated units. Each repeat unit consists of a 119-bp coding region followed by a noncoding spacer region that separates it from the coding region of the next repeat unit. Our study reports novel gene variants of the 5S rRNA genes in plants. Two families of the 5S rDNA, the 0.5-kb size family and the 1-kb size family, coexist in the E. sativa genome. The 0.5-kb size family consists of the 5S rRNA genes (S4) that have coding regions similar to those of other reported plant 5S rDNA sequences, whereas the 1-kb size family consists of the 5S rRNA gene variants (S1) that exist as 1-kb BamHI tandem repeats. S1 is made up of two variant units (V1 and V2) of 5S rDNA where the BamHI site between the two units is mutated. Sequence heterogeneity among S4, V1, and V2 units exists throughout the sequence and is not limited to the noncoding spacer region only. The coding regions of V1 and V2 show approximately 20% dissimilarity to the coding regions of S4 and other reported plant 5S rDNA sequences. Such a large variation in the coding regions of the 5S rDNA units within the same plant species has been observed for the first time. Restriction site variation is observed between the two size classes of 5S rDNA in E. sativa.(ABSTRACT TRUNCATED AT 250 WORDS)

  3. Structure of genes for dermaseptins B, antimicrobial peptides from frog skin. Exon 1-encoded prepropeptide is conserved in genes for peptides of highly different structures and activities.

    PubMed

    Vouille, V; Amiche, M; Nicolas, P

    1997-09-01

    We cloned the genes of two members of the dermaseptin family, broad-spectrum antimicrobial peptides isolated from the skin of the arboreal frog Phyllomedusa bicolor. The dermaseptin gene Drg2 has a 2-exon coding structure interrupted by a small 137-bp intron, wherein exon 1 encoded a 22-residue hydrophobic signal peptide and the first three amino acids of the acidic propiece; exon 2 contained the 18 additional acidic residues of the propiece plus a typical prohormone processing signal Lys-Arg and a 32-residue dermaseptin progenitor sequence. The dermaseptin genes Drg2 and Drg1g2 have conserved sequences at both untranslated ends and in the first and second coding exons. In contrast, Drg1g2 comprises a third coding exon for a short version of the acidic propiece and a second dermaseptin progenitor sequence. Structural conservation between the two genes suggests that Drg1g2 arose recently from an ancestral Drg2-like gene through amplification of part of the second coding exon and 3'-untranslated region. Analysis of the cDNAs coding precursors for several frog skin peptides of highly different structures and activities demonstrates that the signal peptides and part of the acidic propieces are encoded by conserved nucleotides encompassed by the first coding exon of the dermaseptin genes. The organization of the genes that belong to this family, with the signal peptide and the progenitor sequence on separate exons, permits strikingly different peptides to be directed into the secretory pathway. The recruitment of such a homologous 'secretory' exon by otherwise non-homologous genes may have been an early event in the evolution of amphibians.

  4. Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.

    PubMed

    Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A

    2018-04-24

    mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.

  5. Temporal Code-Driven Stimulation: Definition and Application to Electric Fish Signaling

    PubMed Central

    Lareo, Angel; Forlim, Caroline G.; Pinto, Reynaldo D.; Varona, Pablo; Rodriguez, Francisco de Borja

    2016-01-01

    Closed-loop activity-dependent stimulation is a powerful methodology to assess information processing in biological systems. In this context, the development of novel protocols, their implementation in bioinformatics toolboxes and their application to different description levels open up a wide range of possibilities in the study of biological systems. We developed a methodology for studying biological signals representing them as temporal sequences of binary events. A specific sequence of these events (code) is chosen to deliver a predefined stimulation in a closed-loop manner. The response to this code-driven stimulation can be used to characterize the system. This methodology was implemented in a real time toolbox and tested in the context of electric fish signaling. We show that while there are codes that evoke a response that cannot be distinguished from a control recording without stimulation, other codes evoke a characteristic distinct response. We also compare the code-driven response to open-loop stimulation. The discussed experiments validate the proposed methodology and the software toolbox. PMID:27766078
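
    The idea of representing a signal as a temporal sequence of binary events and triggering stimulation when a chosen code appears can be sketched as below. This is not the toolbox described in the abstract: the fixed-width time bins, the function names, the event times and the 4-bit code are all assumptions made for illustration.

```python
from collections import deque

def binarize(event_times, bin_width, n_bins):
    """Map event times to bits: 1 if at least one event falls in a time bin, else 0."""
    bits = [0] * n_bins
    for t in event_times:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            bits[idx] = 1
    return bits

def code_triggers(bits, code):
    """Return the bin indices at which the most recent bits match the code word."""
    window = deque(maxlen=len(code))    # sliding window over the latest bits
    triggers = []
    for i, bit in enumerate(bits):
        window.append(bit)
        if len(window) == len(code) and list(window) == list(code):
            triggers.append(i)          # a stimulus would be delivered here
    return triggers

# Assumed example: event times in ms, 1 ms bins, code word 1011.
events = [0.5, 2.2, 3.1, 6.7, 8.4, 9.9]
bits = binarize(events, bin_width=1.0, n_bins=10)
triggers = code_triggers(bits, code=[1, 0, 1, 1])
```

    In a closed-loop setting the same matching step would run on bits as they arrive, so the stimulus timing depends on the activity of the system itself rather than on a fixed schedule.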

  6. Temporal Code-Driven Stimulation: Definition and Application to Electric Fish Signaling.

    PubMed

    Lareo, Angel; Forlim, Caroline G; Pinto, Reynaldo D; Varona, Pablo; Rodriguez, Francisco de Borja

    2016-01-01

    Closed-loop activity-dependent stimulation is a powerful methodology to assess information processing in biological systems. In this context, the development of novel protocols, their implementation in bioinformatics toolboxes and their application to different description levels open up a wide range of possibilities in the study of biological systems. We developed a methodology for studying biological signals representing them as temporal sequences of binary events. A specific sequence of these events (code) is chosen to deliver a predefined stimulation in a closed-loop manner. The response to this code-driven stimulation can be used to characterize the system. This methodology was implemented in a real time toolbox and tested in the context of electric fish signaling. We show that while there are codes that evoke a response that cannot be distinguished from a control recording without stimulation, other codes evoke a characteristic distinct response. We also compare the code-driven response to open-loop stimulation. The discussed experiments validate the proposed methodology and the software toolbox.

  7. Genomic structure of two ras family genes in the slime mold Physarum polycephalum.

    PubMed

    Trzcińska-Danielewicz, Joanna; Kozlowski, Piotr; Gierdal, Katarzyna; Wiejak, Jolanta; Jagielski, Adam; Toczko, Kazimierz; Fronk, Jan

    2002-08-01

    Genomic structure of two Physarum polycephalum ras family genes, Ppras2 and Pprap1, has been determined, including the upstream region of the latter. The genes are interrupted by three and four introns, respectively. The first intron of Ppras2 has the same location within the coding sequence as the first intron in another ras homolog from this organism, Ppras1 [Trzcińska-Danielewicz, J., Kozlowski, P., and Toczko, K. (1996). "Cloning and genomic sequence of the Physarum polycephalum Ppras1 gene, a homologue of the ras protooncogene", Gene 169, pp. 143-144]. All introns, ranging from 53 to ca. 460 base pairs, have the canonical 5' and 3' ends, are greatly enriched in pyrimidines in the coding strand and have frequent pyrimidines-only tracts. These latter features seem to be responsible for the difficulties in cloning and sequencing of parts of these genes. Short sequences shared with P. polycephalum transposon-like repeats are common in the introns, indicating a possible role of transposition in intron evolution. In all three ras family genes phase zero introns are located mostly between sequences coding for regular protein secondary structure elements.

  8. Glutamate cysteine ligase (GCL) in the freshwater bivalve Unio tumidus: impact of storage conditions and seasons on activity and identification of partial coding sequence of the catalytic subunit.

    PubMed

    Coffinet, Stéphanie; Cossu-Leguille, Carole; Rodius, François; Vasseur, Paule

    2008-09-01

    Glutamate cysteine ligase (GCL; EC 6.3.2.2) is the first enzyme involved in the synthesis of glutathione. An HPLC method with fluorimetric detection was used to measure GCL activity in the gills and the digestive gland of the freshwater bivalve, Unio tumidus. Storage conditions were optimized in order to prevent a decrease in GCL activity and consisted of freezing the cytosolic fraction in the presence of protease (1 mM phenylmethylsulfonic fluoric acid) and gamma-glutamyltranspeptidase (1 mM L-serine borate mixture and 0.5 mM acivicin) inhibitors. Seasonal variations of activity in the digestive gland and, to a lesser extent, in the gills were found, with activity increasing in spring compared to winter. No sex differences were revealed. The GCL coding sequence was identified using degenerate primers designed in the highly conserved regions of the catalytic subunit of GCL. The partial sequence identified encoded 121 amino acids. The comparison of the identified partial coding sequence of U. tumidus with those available from vertebrates and invertebrates indicated that the GCL sequence was highly conserved.

  9. Isolation and characterization of a cDNA clone for the complete protein coding region of the delta subunit of the mouse acetylcholine receptor.

    PubMed Central

    LaPolla, R J; Mayne, K M; Davidson, N

    1984-01-01

    A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. Images PMID:6096870

  10. Reading the Second Code: Mapping Epigenomes to Understand Plant Growth, Development, and Adaptation to the Environment

    PubMed Central

    2012-01-01

    We have entered a new era in agricultural and biomedical science made possible by remarkable advances in DNA sequencing technologies. The complete sequence of an individual’s set of chromosomes (collectively, its genome) provides a primary genetic code for what makes that individual unique, just as the contents of every personal computer reflect the unique attributes of its owner. But a second code, composed of “epigenetic” layers of information, affects the accessibility of the stored information and the execution of specific tasks. Nature’s second code is enigmatic and must be deciphered if we are to fully understand and optimize the genetic potential of crop plants. The goal of the Epigenomics of Plants International Consortium is to crack this second code, and ultimately master its control, to help catalyze a new green revolution. PMID:22751210

  11. Iterative Code-Aided ML Phase Estimation and Phase Ambiguity Resolution

    NASA Astrophysics Data System (ADS)

    Wymeersch, Henk; Moeneclaey, Marc

    2005-12-01

    As many coded systems operate at very low signal-to-noise ratios, synchronization becomes a very difficult task. In many cases, conventional algorithms will either require long training sequences or result in large BER degradations. By exploiting code properties, these problems can be avoided. In this contribution, we present several iterative maximum-likelihood (ML) algorithms for joint carrier phase estimation and ambiguity resolution. These algorithms operate on coded signals by accepting soft information from the MAP decoder. Issues of convergence and initialization are addressed in detail. Simulation results are presented for turbo codes, and are compared to performance results of conventional algorithms. Performance comparisons are carried out in terms of BER performance and mean square estimation error (MSEE). We show that the proposed algorithm reduces the MSEE and, more importantly, the BER degradation. Additionally, phase ambiguity resolution can be performed without resorting to a pilot sequence, thus improving the spectral efficiency.
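
    The iteration between soft symbol information and a phase estimate can be sketched for BPSK as follows. This is a simplified illustration and not one of the algorithms of the paper: the soft symbols that a MAP decoder would supply are replaced here by a per-symbol tanh estimate (an uncoded stand-in), and the function name, noise level, block length and true phase are assumptions. The update theta = arg(sum of r_k times the soft symbol) is the standard expectation-maximization form for this signal model.

```python
import numpy as np

def em_phase_estimate(r, sigma, n_iter=10, theta0=0.0):
    """Iteratively estimate the carrier phase of noisy BPSK samples r."""
    theta = theta0
    for _ in range(n_iter):
        # De-rotate with the current estimate and form soft symbols in [-1, 1].
        # In a code-aided receiver these would come from the decoder instead.
        y = np.real(r * np.exp(-1j * theta))
        soft = np.tanh(y / sigma**2)
        # Phase that best aligns the received samples with the soft symbols.
        theta = np.angle(np.sum(r * soft))
    return theta

# Assumed simulation parameters.
rng = np.random.default_rng(1)
n, sigma, true_phase = 2000, 0.5, 0.3
symbols = rng.choice([-1.0, 1.0], size=n)
noise = rng.normal(0, sigma, n) + 1j * rng.normal(0, sigma, n)
received = symbols * np.exp(1j * true_phase) + noise
estimate = em_phase_estimate(received, sigma)
```

    Because BPSK is unchanged by a rotation of pi combined with a sign flip of all symbols, an estimator of this kind can only determine the phase modulo pi. Resolving that ambiguity without a pilot sequence is where the code structure comes in, and this sketch does not attempt it.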

  12. Identification of coding and non-coding mutational hotspots in cancer genomes.

    PubMed

    Piraino, Scott W; Furney, Simon J

    2017-01-05

    The identification of mutations that play a causal role in tumour development, so-called "driver" mutations, is of critical importance for understanding how cancers form and how they might be treated. Several large cancer sequencing projects have identified genes that are recurrently mutated in cancer patients, suggesting a role in tumourigenesis. While the landscape of coding drivers has been extensively studied and many of the most prominent driver genes are well characterised, comparatively less is known about the role of mutations in the non-coding regions of the genome in cancer development. The continuing fall in genome sequencing costs has resulted in a concomitant increase in the number of cancer whole genome sequences being produced, facilitating systematic interrogation of both the coding and non-coding regions of cancer genomes. To examine the mutational landscapes of tumour genomes we have developed a novel method to identify mutational hotspots in tumour genomes using both mutational data and information on evolutionary conservation. We have applied our methodology to over 1300 whole cancer genomes and show that it identifies prominent coding and non-coding regions that are known or highly suspected to play a role in cancer. Importantly, we applied our method to the entire genome, rather than relying on predefined annotations (e.g. promoter regions) and we highlight recurrently mutated regions that may have resulted from increased exposure to mutational processes rather than selection, some of which have been identified previously as targets of selection. Finally, we implicate several pan-cancer and cancer-specific candidate non-coding regions, which could be involved in tumourigenesis. We have developed a framework to identify mutational hotspots in cancer genomes, which is applicable to the entire genome. This framework identifies known and novel coding and non-coding mutational hotspots and can be used to differentiate candidate driver regions from likely passenger regions susceptible to somatic mutation.

  13. Investigation of the mechanism of meiotic DNA cleavage by VMA1-derived endonuclease uncovers a meiotic alteration in chromatin structure around the target site.

    PubMed

    Fukuda, Tomoyuki; Ohta, Kunihiro; Ohya, Yoshikazu

    2006-06-01

    VMA1-derived endonuclease (VDE), a homing endonuclease in Saccharomyces cerevisiae, is encoded by the mobile intein-coding sequence within the nuclear VMA1 gene. VDE recognizes and cleaves DNA at the 31-bp VDE recognition sequence (VRS) in the VMA1 gene lacking the intein-coding sequence during meiosis to insert a copy of the intein-coding sequence at the cleaved site. The mechanism underlying the meiosis specificity of VMA1 intein-coding sequence homing remains unclear. We studied various factors that might influence the cleavage activity in vivo and found that VDE binding to the VRS can be detected only when DNA cleavage by VDE takes place, implying that meiosis-specific DNA cleavage is regulated by the accessibility of VDE to its target site. As a possible candidate for the determinant of this accessibility, we analyzed chromatin structure around the VRS and revealed that local chromatin structure near the VRS is altered during meiosis. Although the meiotic chromatin alteration exhibits correlations with DNA binding and cleavage by VDE at the VMA1 locus, such a chromatin alteration is not necessarily observed when the VRS is embedded in ectopic gene loci. This suggests that nucleosome positioning or occupancy around the VRS by itself is not the sole mechanism for the regulation of meiosis-specific DNA cleavage by VDE and that other mechanisms are involved in the regulation.

  14. Investigation of the Mechanism of Meiotic DNA Cleavage by VMA1-Derived Endonuclease Uncovers a Meiotic Alteration in Chromatin Structure around the Target Site

    PubMed Central

    Fukuda, Tomoyuki; Ohta, Kunihiro; Ohya, Yoshikazu

    2006-01-01

    VMA1-derived endonuclease (VDE), a homing endonuclease in Saccharomyces cerevisiae, is encoded by the mobile intein-coding sequence within the nuclear VMA1 gene. VDE recognizes and cleaves DNA at the 31-bp VDE recognition sequence (VRS) in the VMA1 gene lacking the intein-coding sequence during meiosis to insert a copy of the intein-coding sequence at the cleaved site. The mechanism underlying the meiosis specificity of VMA1 intein-coding sequence homing remains unclear. We studied various factors that might influence the cleavage activity in vivo and found that VDE binding to the VRS can be detected only when DNA cleavage by VDE takes place, implying that meiosis-specific DNA cleavage is regulated by the accessibility of VDE to its target site. As a possible candidate for the determinant of this accessibility, we analyzed chromatin structure around the VRS and revealed that local chromatin structure near the VRS is altered during meiosis. Although the meiotic chromatin alteration exhibits correlations with DNA binding and cleavage by VDE at the VMA1 locus, such a chromatin alteration is not necessarily observed when the VRS is embedded in ectopic gene loci. This suggests that nucleosome positioning or occupancy around the VRS by itself is not the sole mechanism for the regulation of meiosis-specific DNA cleavage by VDE and that other mechanisms are involved in the regulation. PMID:16757746

  15. Both coding exons of the c-myc gene contribute to its posttranscriptional regulation in the quiescent liver and regenerating liver and after protein synthesis inhibition.

    PubMed Central

    Lavenu, A; Pistoi, S; Pournin, S; Babinet, C; Morello, D

    1995-01-01

    In vivo, the steady-state level of c-myc mRNA is mainly controlled by posttranscriptional mechanisms. Using a panel of transgenic mice in which various versions of the human c-myc proto-oncogene were under the control of major histocompatibility complex H-2Kb class I regulatory sequences, we have shown that the 5' and the 3' noncoding sequences are dispensable for obtaining a regulated expression of the transgene in adult quiescent tissues, at the start of liver regeneration, and after inhibition of protein synthesis. These results indicated that the coding sequences were sufficient to ensure a regulated c-myc expression. In the present study, we have pursued this analysis with transgenes containing one or the other of the two c-myc coding exons either alone or in association with the c-myc 3' untranslated region. We demonstrate that each of the exons contains determinants which control c-myc mRNA expression. Moreover, we show that in the liver, c-myc exon 2 sequences are able to down-regulate an otherwise stable H-2K mRNA when embedded within it and to induce its transient accumulation after cycloheximide treatment and soon after liver ablation. Finally, the use of transgenes with different coding capacities has allowed us to postulate that the primary mRNA sequence itself and not c-Myc peptides is an important component of c-myc posttranscriptional regulation. PMID:7623834

  16. Method and apparatus for determining position using global positioning satellites

    NASA Technical Reports Server (NTRS)

    Ward, John (Inventor); Ward, William S. (Inventor)

    1998-01-01

    A global positioning satellite receiver having an antenna for receiving an L1 signal from a satellite. The L1 signal is processed by a preamplifier stage including a band pass filter and a low noise amplifier and output as a radio frequency (RF) signal. A mixer receives and de-spreads the RF signal in response to a pseudo-random noise code, i.e., Gold code, generated by an internal pseudo-random noise code generator. A microprocessor enters a code tracking loop, such that during the code tracking loop, it addresses the pseudo-random code generator to cause the pseudo-random code generator to sequentially output pseudo-random codes corresponding to satellite codes used to spread the L1 signal, until correlation occurs. When an output of the mixer is indicative of the occurrence of correlation between the RF signal and the generated pseudo-random codes, the microprocessor enters an operational state which slows the receiver code sequence to stay locked with the satellite code sequence. The output of the mixer is provided to a detector which, in turn, controls certain routines of the microprocessor. The microprocessor will output pseudo range information according to an interrupt routine in response to detection of correlation. The pseudo range information is to be telemetered to a ground station which determines the position of the global positioning satellite receiver.

  17. Algal Species and Light Microenvironment in a Low-pH, Geothermal Microbial Mat Community

    PubMed Central

    Ferris, M. J.; Sheehan, K. B.; Kühl, M.; Cooksey, K.; Wigglesworth-Cooksey, B.; Harvey, R.; Henson, J. M.

    2005-01-01

    Unicellular algae are the predominant microbial mat-forming phototrophs in the extreme environments of acidic geothermal springs. The ecology of these algae is not well known because concepts of species composition are inferred from cultivated isolates and microscopic observations, methods known to provide incomplete and inaccurate assessments of species in situ. We used sequence analysis of 18S rRNA genes PCR amplified from mat samples from different seasons and different temperatures along a thermal gradient to identify algae in an often-studied acidic (pH 2.7) geothermal creek in Yellowstone National Park. Fiber-optic microprobes were used to show that light for algal photosynthesis is attenuated to <1% over the 1-mm surface interval of the mat. Three algal sequences were detected, and each was present year-round. A Cyanidioschyzon merolae sequence was predominant at temperatures of ≥49°C. A Chlorella protothecoides var. acidicola sequence and a Paradoxia multisita-like sequence were predominant at temperatures of ≤39°C. PMID:16269755

  18. Algal species and light microenvironment in a low-pH, geothermal microbial mat community.

    PubMed

    Ferris, M J; Sheehan, K B; Kühl, M; Cooksey, K; Wigglesworth-Cooksey, B; Harvey, R; Henson, J M

    2005-11-01

    Unicellular algae are the predominant microbial mat-forming phototrophs in the extreme environments of acidic geothermal springs. The ecology of these algae is not well known because concepts of species composition are inferred from cultivated isolates and microscopic observations, methods known to provide incomplete and inaccurate assessments of species in situ. We used sequence analysis of 18S rRNA genes PCR amplified from mat samples from different seasons and different temperatures along a thermal gradient to identify algae in an often-studied acidic (pH 2.7) geothermal creek in Yellowstone National Park. Fiber-optic microprobes were used to show that light for algal photosynthesis is attenuated to <1% over the 1-mm surface interval of the mat. Three algal sequences were detected, and each was present year-round. A Cyanidioschyzon merolae sequence was predominant at temperatures of ≥49°C. A Chlorella protothecoides var. acidicola sequence and a Paradoxia multisita-like sequence were predominant at temperatures of ≤39°C.

  19. Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

    PubMed

    Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P

    2010-11-01

    Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.

  20. Primer on Molecular Genetics; DOE Human Genome Program

    DOE R&D Accomplishments Database

    1992-04-01

    This report is taken from the April 1992 draft of the DOE Human Genome 1991--1992 Program Report, which is expected to be published in May 1992. The primer is intended to be an introduction to basic principles of molecular genetics pertaining to the genome project. The material contained herein is not final and may be incomplete. Techniques of genetic mapping and DNA sequencing are described.

  1. The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).

    PubMed

    Liang, Jian-Ying; Lin, Rui-Qing

    2016-11-01

    In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was sequenced and its gene content and genome organization were compared with those of other tapeworms. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding regions. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The A + T content of the complete mt genome is 71.4% for R. tetragona. The R. tetragona mt genome sequence provides a novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.

  2. Linear and Nonlinear Statistical Characterization of DNA

    NASA Astrophysics Data System (ADS)

    Norio Oiwa, Nestor; Goldman, Carla; Glazier, James

    2002-03-01

    We find spatial order in the distribution of protein-coding (including RNAs) and control segments of GenBank genomic sequences, irrespective of ATCG content. This is achieved by correlations, histograms, fractal dimensions and singularity spectra. Estimates of these quantities in complete nuclear genomes indicate that coding sequences are long-range correlated and that their disposition is self-similar (multifractal) for eukaryotes. These characteristics are absent in prokaryotes, where there are few noncoding sequences, suggesting that 'junk' DNA plays a relevant role in genome structure and function. Concerning the genetic message of ATCG sequences, we build a random walk (Levy flight), using DNA symmetry arguments, where we associate A, T, C and G with left, right, down and up steps, respectively. Nonlinear analysis of mitochondrial DNA walks reveals a multifractal pattern based on palindromic sequences, which fold into hairpins and loops.
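
    The walk construction described in this abstract can be illustrated with a minimal Python sketch. It assumes unit steps on a 2-D lattice, with A, T, C and G mapped to left, right, down and up as stated above. The names STEPS and dna_walk are hypothetical, and the sketch does not attempt the Levy-flight or multifractal analysis reported by the authors.

```python
# Hypothetical mapping of bases to unit steps: A=left, T=right, C=down, G=up.
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}


def dna_walk(seq):
    """Return the list of lattice points visited, starting at the origin.

    Characters other than A, T, C and G (e.g. the ambiguity code N) are
    skipped; this is an assumption of the sketch, not part of the abstract.
    """
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper():
        step = STEPS.get(base)
        if step is None:
            continue
        x += step[0]
        y += step[1]
        path.append((x, y))
    return path


if __name__ == "__main__":
    # A palindromic toy sequence; the walk returns to the origin.
    print(dna_walk("ATCG"))
```

    The returned path can be passed to any plotting or fractal-dimension routine; none is assumed here.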

  3. Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II

    PubMed Central

    Norman, Paul J.; Norberg, Steven J.; Guethlein, Lisbeth A.; Nemat-Gorgani, Neda; Royce, Thomas; Wroblewski, Emily E.; Dunn, Tamsen; Mann, Tobias; Alicata, Claudia; Hollenbach, Jill A.; Chang, Weihua; Shults Won, Melissa; Gunderson, Kevin L.; Abi-Rached, Laurent; Ronaghi, Mostafa; Parham, Peter

    2017-01-01

    The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome. PMID:28360230

  4. Preparation of next-generation sequencing libraries using Nextera™ technology: simultaneous DNA fragmentation and adaptor tagging by in vitro transposition.

    PubMed

    Caruccio, Nicholas

    2011-01-01

    DNA library preparation is a common entry point and bottleneck for next-generation sequencing. Current methods generally consist of distinct steps that often involve significant sample loss and hands-on time: DNA fragmentation, end-polishing, and adaptor-ligation. In vitro transposition with Nextera™ Transposomes simultaneously fragments and covalently tags the target DNA, thereby combining these three distinct steps into a single reaction. Platform-specific sequencing adaptors can be added, and the sample can be enriched and bar-coded using limited-cycle PCR to prepare di-tagged DNA fragment libraries. Nextera technology offers a streamlined, efficient, and high-throughput method for generating bar-coded libraries compatible with multiple next-generation sequencing platforms.

  5. 1-deoxy-d-xylulose-5-phosphate reductoisomerases and method of use

    DOEpatents

    Croteau, Rodney B.; Lange, Bernd M.

    2001-01-01

    The present invention relates to isolated DNA sequences which code for the expression of plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein, such as the sequence presented in SEQ ID NO:1 which encodes a 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein from peppermint (Mentha x piperita). Additionally, the present invention relates to isolated plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein. In other aspects, the present invention is directed to replicable recombinant cloning vehicles comprising a nucleic acid sequence which codes for a plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase, to modified host cells transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence of the invention.

  6. 1-deoxy-D-xylulose-5-phosphate reductoisomerases, and methods of use

    DOEpatents

    Croteau, Rodney B.; Lange, Bernd M.

    2002-07-16

    The present invention relates to isolated DNA sequences which code for the expression of plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein, such as the sequence presented in SEQ ID NO:1 which encodes a 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein from peppermint (Mentha x piperita). Additionally, the present invention relates to isolated plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein. In other aspects, the present invention is directed to replicable recombinant cloning vehicles comprising a nucleic acid sequence which codes for a plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase, to modified host cells transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence of the invention.

  7. Gene discovery in Eimeria tenella by immunoscreening cDNA expression libraries of sporozoites and schizonts with chicken intestinal antibodies.

    PubMed

    Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie

    2003-04-02

    Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.

  8. Next generation sequencing yields the complete mitochondrial genome of the Endangered Chilean silverside Basilichthys microlepidotus (Jenyns, 1841) (Teleostei, Atherinopsidae), validated with RNA-seq.

    PubMed

    Véliz, David; Vega-Retter, Caren; Quezada-Romegialli, Claudio

    2016-01-01

    The complete sequence of the mitochondrial genome for the Chilean silverside Basilichthys microlepidotus is reported for the first time. The entire mitochondrial genome was 16,544 bp in length (GenBank accession no. KM245937); gene composition and arrangement conformed to those reported for most fishes, with the typical structure of 2 rRNAs, 13 protein-coding genes, 22 tRNAs and a non-coding region. The assembled mitogenome was validated against sequences of COI and Control Region previously sequenced in our lab, functional genes from RNA-Seq data for the same species and the mitogenomes of two other atherinopsid species available in GenBank.

  9. Numerical classification of coding sequences

    NASA Technical Reports Server (NTRS)

    Collins, D. W.; Liu, C. C.; Jukes, T. H.

    1992-01-01

    DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
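
    The two numerical designations described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only: the function names base_count_label, codon_table and codon_label are hypothetical, the reading frame is assumed to start at the first base, and any trailing partial codon or codon containing a non-ACGT character is ignored.

```python
from collections import Counter
from itertools import product


def base_count_label(seq):
    """Abbreviate a sequence by its base count, in the style 'A76C158G121T74'."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts.get(base, 0)}" for base in "ACGT")


def codon_table(seq):
    """Count all 64 codons in frame 0; codons that never occur are kept as 0."""
    seq = seq.upper()
    table = {"".join(c): 0 for c in product("ACGT", repeat=3)}
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if codon in table:
            table[codon] += 1
    return table


def codon_label(table):
    """Format a codon table in the style '(AAA)0(AAC)1 ... (TTT)0'."""
    return "".join(f"({codon}){n}" for codon, n in sorted(table.items()))


if __name__ == "__main__":
    orf = "ATGGCATAA"  # toy reading frame: ATG GCA TAA
    print(base_count_label(orf))
    print(codon_label(codon_table(orf))[:24], "...")
```

    Because both labels depend only on the sequence, two database entries can be compared by label before any alignment is attempted, which is the cross-referencing use the abstract suggests.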

  10. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    PubMed

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10, family Sporormiaceae, was actively producing phytohormone (indole-3-acetic acid) and extra-cellular enzymes (phosphatases and glucosidases). The fungus was also promoting the growth of the arid-land tree Boswellia sacra. Looking at such prospects of this fungus, we sequenced its draft genome for the first time. The Illumina-based sequence analysis reveals an approximate genome size of 31.4 Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, a total of 32,312 coding sequences were annotated consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. Prevalence of transcription promoters within archaeal operons and coding sequences

    PubMed Central

    Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S

    2009-01-01

    Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ∼64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein–DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3′ ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes—events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements. PMID:19536208

  12. Prevalence of transcription promoters within archaeal operons and coding sequences.

    PubMed

    Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S

    2009-01-01

    Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of approximately 64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.

  13. MIMO Radar System for Respiratory Monitoring Using Tx and Rx Modulation with M-Sequence Codes

    NASA Astrophysics Data System (ADS)

    Miwa, Takashi; Ogiwara, Shun; Yamakoshi, Yoshiki

    The importance of respiratory monitoring systems during sleep has increased due to early diagnosis of sleep apnea syndrome (SAS) in the home. This paper presents a simple respiratory monitoring system suitable for home use having 3D ranging of targets. The range resolution and azimuth resolution are obtained by a stepped frequency transmitting signal and MIMO arrays with preferred pair M-sequence codes doubly modulating in transmission and reception, respectively. Due to the use of these codes, Gold sequence codes corresponding to all the antenna combinations are equivalently modulated in the receiver. The signal to interchannel interference ratio of the reconstructed image is evaluated by numerical simulations. The results of experiments on a developed prototype 3D-MIMO radar system show that this system can extract only the motion of respiration of a human subject 2 m apart from a metallic rotatable reflector. Moreover, it is found that this system can successfully measure the respiration information of sleeping human subjects for 96.6 percent of the whole measurement time except for instances of large posture change.
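
    For readers unfamiliar with M-sequence codes, a minimal Python sketch of how one such code can be generated follows. It is a generic textbook construction, not the code set used by the authors: it assumes a 3-stage linear feedback shift register with the recurrence s[n+3] = s[n] XOR s[n+1] (polynomial x^3 + x + 1), giving a period of 7. The function names m_sequence and periodic_autocorrelation are hypothetical.

```python
def m_sequence(taps=(0, 1), state=(1, 0, 0)):
    """Return one period of the binary sequence produced by a shift register.

    The new bit is the XOR of the register bits at the tap positions, so the
    defaults implement s[n+3] = s[n] XOR s[n+1]. The period 2**n - 1 is only
    reached when the taps correspond to a primitive polynomial and the start
    state is non-zero; the defaults satisfy both conditions.
    """
    reg = list(state)
    period = 2 ** len(reg) - 1
    out = []
    for _ in range(period):
        out.append(reg[0])
        new_bit = 0
        for t in taps:
            new_bit ^= reg[t]
        reg = reg[1:] + [new_bit]
    return out


def periodic_autocorrelation(bits, shift):
    """Circular autocorrelation of the sequence after mapping 0 -> +1, 1 -> -1."""
    chips = [1 - 2 * b for b in bits]
    n = len(chips)
    return sum(chips[i] * chips[(i + shift) % n] for i in range(n))


if __name__ == "__main__":
    seq = m_sequence()
    print(seq)
    print([periodic_autocorrelation(seq, k) for k in range(len(seq))])
```

    The sharp autocorrelation peak at zero shift, with a flat value of -1 elsewhere, is the property that makes such codes useful for separating channels. Gold codes of the kind mentioned in the abstract are formed by XOR-ing a preferred pair of M-sequences, which this sketch does not attempt.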

  14. Complex alternative splicing of acetylcholinesterase transcripts in Torpedo electric organ; primary structure of the precursor of the glycolipid-anchored dimeric form.

    PubMed Central

    Sikorav, J L; Duval, N; Anselmet, A; Bon, S; Krejci, E; Legay, C; Osterlund, M; Reimund, B; Massoulié, J

    1988-01-01

    In this paper, we show the existence of alternative splicing in the 3' region of the coding sequence of Torpedo acetylcholinesterase (AChE). We describe two cDNA structures which both diverge from the previously described coding sequence of the catalytic subunit of asymmetric (A) forms (Schumacher et al., 1986; Sikorav et al., 1987). They both contain a coding sequence followed by a non-coding sequence and a poly(A) stretch. Both of these structures were shown to exist in poly(A)+ RNAs, by S1 mapping experiments. The divergent region encoded by the first sequence corresponds to the precursor of the globular dimeric form (G2a), since it contains the expected C-terminal amino acids, Ala-Cys. These amino acids are followed by a 29 amino acid extension which contains a hydrophobic segment and must be replaced by a glycolipid in the mature protein. Analyses of intact G2a AChE showed that the common domain of the protein contains intersubunit disulphide bonds. The divergent region of the second type of cDNA consists of an adjacent genomic sequence, which is removed as an intron in A and Ga mRNAs, but may encode a distinct, less abundant catalytic subunit. The structures of the cDNA clones indicate that they are derived from minor mRNAs, shorter than the three major transcripts which have been described previously (14.5, 10.5 and 5.5 kb). Oligonucleotide probes specific for the asymmetric and globular terminal regions hybridize with the three major transcripts, indicating that their size is determined by 3'-untranslated regions which are not related to the differential splicing leading to A and Ga forms. PMID:3181125

  15. A Comparison of Robotic, Body Weight Supported Locomotor Training and Aquatic Therapy in Chronic Motor Incomplete Spinal Cord Injury Subjects

    DTIC Science & Technology

    2015-06-01

    Award Number: W81XWH-10-1-0981 TITLE: "A Comparison of Robotic , Body Weight-Supported Locomotor Training and Aquatic Therapy in Chronic Motor...ABSTRACT U c. THIS PAGE U 19b. TELEPHONE NUMBER (include area code) email: pgorman@umm.edu "A Comparison of Robotic , Body Weight-Supported...months, three times a week aquatic therapy with similar intensity robotically assisted, body weight supported locomotor training (RABWSLT) upon

  16. File compression and encryption based on LLS and arithmetic coding

    NASA Astrophysics Data System (ADS)

    Yu, Changzhi; Li, Hengjian; Wang, Xiyu

    2018-03-01

    We propose a file compression model based on arithmetic coding. First, the original symbols to be encoded are input to the encoder one by one. We produce a set of chaotic sequences using the Logistic and sine chaos system (LLS), and the values of these chaotic sequences are used to randomly modify the upper and lower limits of the current symbol's probability. To achieve encryption, we modify the upper and lower limits of all character probabilities when encoding each symbol. Experimental results show that the proposed model can achieve data encryption while achieving almost the same compression efficiency as arithmetic coding.

  17. RNA-protein interactions in an unstructured context.

    PubMed

    Zagrovic, Bojan; Bartonek, Lukas; Polyansky, Anton A

    2018-05-31

    Despite their importance, our understanding of noncovalent RNA-protein interactions is incomplete. This especially concerns the binding between RNA and unstructured protein regions, a widespread class of such interactions. Here, we review the recent experimental and computational work on RNA-protein interactions in an unstructured context with a particular focus on how such interactions may be shaped by the intrinsic interaction affinities between individual nucleobases and protein side chains. Specifically, we articulate the claim that the universal genetic code reflects the binding specificity between nucleobases and protein side chains and that, in turn, the code may be seen as the Rosetta stone for understanding RNA-protein interactions in general. © 2018 The Authors. FEBS Letters published by John Wiley & Sons Ltd on behalf of Federation of European Biochemical Societies.

  18. Characterization and correction of eddy-current artifacts in unipolar and bipolar diffusion sequences using magnetic field monitoring.

    PubMed

    Chan, Rachel W; von Deuster, Constantin; Giese, Daniel; Stoeck, Christian T; Harmer, Jack; Aitken, Andrew P; Atkinson, David; Kozerke, Sebastian

    2014-07-01

    Diffusion tensor imaging (DTI) of moving organs is gaining increasing attention but robust performance requires sequence modifications and dedicated correction methods to account for system imperfections. In this study, eddy currents in the "unipolar" Stejskal-Tanner and the velocity-compensated "bipolar" spin-echo diffusion sequences were investigated and corrected for using a magnetic field monitoring approach in combination with higher-order image reconstruction. From the field-camera measurements, increased levels of second-order eddy currents were quantified in the unipolar sequence relative to the bipolar diffusion sequence while zeroth and linear orders were found to be similar between both sequences. Second-order image reconstruction based on field-monitoring data resulted in reduced spatial misalignment artifacts and residual displacements of less than 0.43 mm and 0.29 mm (in the unipolar and bipolar sequences, respectively) after second-order eddy-current correction. Results demonstrate the need for second-order correction in unipolar encoding schemes but also show that bipolar sequences benefit from second-order reconstruction to correct for incomplete intrinsic cancellation of eddy-currents. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Compressive Sensing for Radar and Radar Sensor Networks

    DTIC Science & Technology

    2013-12-02

    Zero Correlation Zone Sequence Pair Sets for MIMO Radar Inspired by recent advances in MIMO radar, we apply orthogonal phase coded waveforms to MIMO ...radar system in order to gain better range resolution and target direction finding performance [2]. We provide and investigate a generalized MIMO radar...ZCZ) sequence-Pair Set (ZCZPS). We also study the MIMO radar ambiguity function of the system using phase coded waveforms, based on which we analyze

  20. The complete mitochondrial genome and phylogenetic analysis of the giant panda (Ailuropoda melanoleuca).

    PubMed

    Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong

    2007-08-01

    The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by the long and accurate polymerase chain reaction (LA-PCR) with conserved primers and primer walking sequence methods. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes is longer than the American black bear, brown bear and polar bear by 3 amino acids at the end of ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
