Uversky, Vladimir N
2015-03-01
Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are functional proteins or regions that do not have unique 3D structures under functional conditions. Therefore, from the viewpoint of their lack of stable 3D structure, IDPs/IDPRs are inherently unstable. As much as structure and function of normal ordered globular proteins are determined by their amino acid sequences, the lack of unique 3D structure in IDPs/IDPRs and their disorder-based functionality are also encoded in the amino acid sequences. Because of their specific sequence features and distinctive conformational behavior, these intrinsically unstable proteins or regions have several applications in biotechnology. This review introduces some of the most characteristic features of IDPs/IDPRs (such as peculiarities of amino acid sequences of these proteins and regions, their major structural features, and peculiar responses to changes in their environment) and describes how these features can be used in the biotechnology, for example for the proteome-wide analysis of the abundance of extended IDPs, for recombinant protein isolation and purification, as polypeptide nanoparticles for drug delivery, as solubilization tools, and as thermally sensitive carriers of active peptides and proteins. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Draft Genome Sequence of the Spore-Forming Probiotic Strain Bacillus coagulans Unique IS-2
Upadrasta, Aditya; Pitta, Swetha
2016-01-01
Bacillus coagulans Unique IS-2 is a potential spore-forming probiotic that is commercially available on the market. The draft genome sequence presented here provides deep insight into the beneficial features of this strain for its safe use as a probiotic for various human and animal health applications. PMID:27103709
ERIC Educational Resources Information Center
Lundblad, Heidemarie; Wilson, Barbara A.
2008-01-01
The Department of Accounting at California State University Northridge (CSUN) has developed a unique sequence of courses designed to ensure that accounting students are trained not only in technical accounting, but also acquire critical thinking, research and communication skills. The courses have proven effective and have embedded assessment…
Giraffe genome sequence reveals clues to its unique morphology and physiology
Agaba, Morris; Ishengoma, Edson; Miller, Webb C.; McGrath, Barbara C.; Hudson, Chelsea N.; Bedoya Reina, Oscar C.; Ratan, Aakrosh; Burhans, Rico; Chikhi, Rayan; Medvedev, Paul; Praul, Craig A.; Wu-Cavener, Lan; Wood, Brendan; Robertson, Heather; Penfold, Linda; Cavener, Douglas R.
2016-01-01
The origins of giraffe's imposing stature and associated cardiovascular adaptations are unknown. Okapi, which lacks these unique features, is giraffe's closest relative and provides a useful comparison, to identify genetic variation underlying giraffe's long neck and cardiovascular system. The genomes of giraffe and okapi were sequenced, and through comparative analyses genes and pathways were identified that exhibit unique genetic changes and likely contribute to giraffe's unique features. Some of these genes are in the HOX, NOTCH and FGF signalling pathways, which regulate both skeletal and cardiovascular development, suggesting that giraffe's stature and cardiovascular adaptations evolved in parallel through changes in a small number of genes. Mitochondrial metabolism and volatile fatty acids transport genes are also evolutionarily diverged in giraffe and may be related to its unusual diet that includes toxic plants. Unexpectedly, substantial evolutionary changes have occurred in giraffe and okapi in double-strand break repair and centrosome functions. PMID:27187213
Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter.
Mohamed Hashim, Ezzeddin Kamil; Abdullah, Rosni
2015-12-21
Empirical analysis on k-mer DNA has been proven as an effective tool in finding unique patterns in DNA sequences which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA on hundreds of organisms, the researchers found unique multi-modal k-mer spectra occur in the genomes of organisms from the tetrapod clade only which includes all mammals. The multi-modality is caused by the formation of the two lowest modes where k-mers under them are referred as the rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotide inclusions in them. Apart from that, the rare k-mers are selectively distributed in certain genomic features of CpG Island (CGI), promoter, 5' UTR, and exon. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features of CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performances of our RWC (rare-word clustering) method in predicting the CGI and promoter features are ranked among the top three, in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hudson, Corey M.; Williams, Kelly P.
We report that the transfer-messenger RNA (tmRNA) and its partner protein SmpB act together in resolving problems arising when translating bacterial ribosomes reach the end of mRNA with no stop codon. Their genes have been found in nearly all bacterial genomes and in some organelles. The tmRNA Website serves tmRNA sequences, alignments and feature annotations, and has recently moved to http: //bioinformatics.sandia.gov/tmrna/. New features include software used to find the sequences, an update raising the number of unique tmRNA sequences from 492 to 1716, and a database of SmpB sequences which are served along with the tmRNA sequence from themore » same organism.« less
Hudson, Corey M.; Williams, Kelly P.
2014-11-05
We report that the transfer-messenger RNA (tmRNA) and its partner protein SmpB act together in resolving problems arising when translating bacterial ribosomes reach the end of mRNA with no stop codon. Their genes have been found in nearly all bacterial genomes and in some organelles. The tmRNA Website serves tmRNA sequences, alignments and feature annotations, and has recently moved to http: //bioinformatics.sandia.gov/tmrna/. New features include software used to find the sequences, an update raising the number of unique tmRNA sequences from 492 to 1716, and a database of SmpB sequences which are served along with the tmRNA sequence from themore » same organism.« less
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome.
Benoit, Joshua B; Adelman, Zach N; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C; Szuter, Elise M; Hagan, Richard W; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M; Nelson, David R; Rosendale, Andrew J; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R; Ioannidis, Panagiotis; Waterhouse, Robert M; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J Spencer; Gondhalekar, Ameya D; Scharf, Michael E; Peterson, Brittany F; Raje, Kapil R; Hottel, Benjamin A; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S T; Duncan, Elizabeth J; Murali, Shwetha C; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C; Muzny, Donna M; Wheeler, David; Panfilio, Kristen A; Vargas Jentzsch, Iris M; Vargo, Edward L; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T; Anderson, Michelle A E; Jones, Jeffery W; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D; Attardo, Geoffrey M; Robertson, Hugh M; Zdobnov, Evgeny M; Ribeiro, Jose M C; Gibbs, Richard A; Werren, John H; Palli, Subba R; Schal, Coby; Richards, Stephen
2016-02-02
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome
Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Vargas Jentzsch, Iris M.; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen
2016-01-01
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814
MySSP: Non-stationary evolutionary sequence simulation, including indels
Rosenberg, Michael S.
2007-01-01
MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but have not all have been previously available in a single package. PMID:19325855
Unique Trichomonas vaginalis gene sequences identified in multinational regions of Northwest China.
Liu, Jun; Feng, Meng; Wang, Xiaolan; Fu, Yongfeng; Ma, Cailing; Cheng, Xunjia
2017-07-24
Trichomonas vaginalis (T. vaginalis) is a flagellated protozoan parasite that infects humans worldwide. This study determined the sequence of the 18S ribosomal RNA gene of T. vaginalis infecting both females and males in Xinjiang, China. Samples from 73 females and 28 males were collected and confirmed for infection with T. vaginalis, a total of 110 sequences were identified when the T. vaginalis 18S ribosomal RNA gene was sequenced. These sequences were used to prepare a phylogenetic network. The rooted network comprised three large clades and several independent branches. Most of the Xinjiang sequences were in one group. Preliminary results suggest that Xinjiang T. vaginalis isolates might be genetically unique, as indicated by the sequence of their 18S ribosomal RNA gene. Low migration rate of local people in this province may contribute to a genetic conservativeness of T. vaginalis. The unique genetic feature of our isolates may suggest a different clinical presentation of trichomoniasis, including metronidazole susceptibility, T. vaginalis virus or Mycoplasma co-infection characteristics. The transmission and evolution of Xinjiang T. vaginalis is of interest and should be studied further. More attention should be given to T. vaginalis infection in both females and males in Xinjiang.
Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)
Singh, Ranjan K.; Tanner, John J.
2013-01-01
Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20–30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100–200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760
Sass, Chodon; Little, Damon P.; Stevenson, Dennis Wm.; Specht, Chelsea D.
2007-01-01
Barcodes are short segments of DNA that can be used to uniquely identify an unknown specimen to species, particularly when diagnostic morphological features are absent. These sequences could offer a new forensic tool in plant and animal conservation—especially for endangered species such as members of the Cycadales. Ideally, barcodes could be used to positively identify illegally obtained material even in cases where diagnostic features have been purposefully removed or to release confiscated organisms into the proper breeding population. In order to be useful, a DNA barcode sequence must not only easily PCR amplify with universal or near-universal reaction conditions and primers, but also contain enough variation to generate unique identifiers at either the species or population levels. Chloroplast regions suggested by the Plant Working Group of the Consortium for the Barcode of Life (CBoL), and two alternatives, the chloroplast psbA-trnH intergenic spacer and the nuclear ribosomal internal transcribed spacer (nrITS), were tested for their utility in generating unique identifiers for members of the Cycadales. Ease of amplification and sequence generation with universal primers and reaction conditions was determined for each of the seven proposed markers. While none of the proposed markers provided unique identifiers for all species tested, nrITS showed the most promise in terms of variability, although sequencing difficulties remain a drawback. We suggest a workflow for DNA barcoding, including database generation and management, which will ultimately be necessary if we are to succeed in establishing a universal DNA barcode for plants. PMID:17987130
A novel, privacy-preserving cryptographic approach for sharing sequencing data
Cassa, Christopher A; Miller, Rachel A; Mandl, Kenneth D
2013-01-01
Objective DNA samples are often processed and sequenced in facilities external to the point of collection. These samples are routinely labeled with patient identifiers or pseudonyms, allowing for potential linkage to identity and private clinical information if intercepted during transmission. We present a cryptographic scheme to securely transmit externally generated sequence data which does not require any patient identifiers, public key infrastructure, or the transmission of passwords. Materials and methods This novel encryption scheme cryptographically protects participant sequence data using a shared secret key that is derived from a unique subset of an individual’s genetic sequence. This scheme requires access to a subset of an individual’s genetic sequence to acquire full access to the transmitted sequence data, which helps to prevent sample mismatch. Results We validate that the proposed encryption scheme is robust to sequencing errors, population uniqueness, and sibling disambiguation, and provides sufficient cryptographic key space. Discussion Access to a set of an individual’s genotypes and a mutually agreed cryptographic seed is needed to unlock the full sequence, which provides additional sample authentication and authorization security. We present modest fixed and marginal costs to implement this transmission architecture. Conclusions It is possible for genomics researchers who sequence participant samples externally to protect the transmission of sequence data using unique features of an individual’s genetic sequence. PMID:23125421
Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric
2017-02-01
Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Prevalence and genome characteristics of canine astrovirus in southwest China.
Li, Mingxiang; Yan, Nan; Ji, Conghui; Wang, Min; Zhang, Bin; Yue, Hua; Tang, Cheng
2018-05-30
The aim of this study was to investigate canine astrovirus (CaAstV) infection in southwest China. We collected 107 faecal samples from domestic dogs with obvious diarrhoea. Forty-two diarrhoeic samples (39.3 %) were positive for CaAstV by RT-PCR, and 41/42 samples showed co-infection with canine coronavirus (CCoV), canine parvovirus-2 (CPV-2) and canine distemper virus (CDV). Phylogenetic analysis based on 26 CaAstV partial ORF1a and ORF1b sequences revealed that most CaAstV strains showed unique evolutionary features. Interestingly, putative recombination events were observed among four of the five complete ORF2 sequences cloned in this study, and three of the five complete ORF2 sequences formed a single unique group, suggesting that these strains could be a novel genotype. We successfully sequenced the complete genome of one CaAstV strain (designated 2017/44/CHN), which was 6628 nt in length. The features of this genome include putative recombination events in the ORF1a, ORF1b and ORF2 genes, while the ORF2 gene had a continuous insertion of 7 aa in region II compared with the other complete ORF2 sequences available in GenBank. Phylogenetic analysis showed that 2017/44/CHN formed a single group based on genome sequences, suggesting that this strain might be a novel genotype. The results of this study revealed that CaAstV circulates widely in diarrhoeic dogs in southwest China and exhibits unique evolutionary events. To the best of our knowledge, this is the first report of recombination events in CaAstV, and it contributes to further understanding of the genetic evolution of CaAstV.
Distribution and Features of the Six Classes of Peroxiredoxins
Poole, Leslie B.; Nelson, Kimberly J.
2016-01-01
Peroxiredoxins are cysteine-dependent peroxide reductases that group into 6 different, structurally discernable classes. In 2011, our research team reported the application of a bioinformatic approach called active site profiling to extract active site-proximal sequence segments from the 29 distinct, structurally-characterized peroxiredoxins available at the time. These extracted sequences were then used to create unique profiles for the six groups which were subsequently used to search GenBank(nr), allowing identification of ∼3500 peroxiredoxin sequences and their respective subgroups. Summarized in this minireview are the features and phylogenetic distributions of each of these peroxiredoxin subgroups; an example is also provided illustrating the use of the web accessible, searchable database known as PREX to identify subfamily-specific peroxiredoxin sequences for the organism Vitis vinifera (grape). PMID:26810075
Ishimaru, M; Morikawa, K; Hifumi, E; Itoh, T; Uda, T
2000-01-01
A monoclonal antibody against methamphetamine (MA-3 mAb) was found to be strongly bound to ephedrine. This feature was quite different from that of other fourteen mAbs against MA. Analyses of cDNA sequence and steric conformation by molecular modeling revealed that one hydrophilic pocket was generated in the heavy chain of MA-3 mAb involving CDRH-1 and CDRH-2. Asn33, Asn35, Asn50 and Asp52 were the main components of the unique pocket capable of binding to the hydroxyl group of ephedrine.
The Physics and Mathematics of MRI
NASA Astrophysics Data System (ADS)
Ansorge, Richard; Graves, Martin
2016-10-01
Magnetic Resonance Imaging is a very important clinical imaging tool. It combines different fields of physics and engineering in a uniquely complex way. MRI is also surprisingly versatile, `pulse sequences' can be designed to yield many different types of contrast. This versatility is unique to MRI. This short book gives both an in depth account of the methods used for the operation and construction of modern MRI systems and also the principles of sequence design and many examples of applications. An important additional feature of this book is the detailed discussion of the mathematical principles used in building optimal MRI systems and for sequence design. The mathematical discussion is very suitable for undergraduates attending medical physics courses. It is also more complete than usually found in alternative books for physical scientists or more clinically orientated works.
Novel Insights into Tree Biology and Genome Evolution as Revealed Through Genomics.
Neale, David B; Martínez-García, Pedro J; De La Torre, Amanda R; Montanari, Sara; Wei, Xiao-Xin
2017-04-28
Reference genome sequences are the key to the discovery of genes and gene families that determine traits of interest. Recent progress in sequencing technologies has enabled a rapid increase in genome sequencing of tree species, allowing the dissection of complex characters of economic importance, such as fruit and wood quality and resistance to biotic and abiotic stresses. Although the number of reference genome sequences for trees lags behind those for other plant species, it is not too early to gain insight into the unique features that distinguish trees from nontree plants. Our review of the published data suggests that, although many gene families are conserved among herbaceous and tree species, some gene families, such as those involved in resistance to biotic and abiotic stresses and in the synthesis and transport of sugars, are often expanded in tree genomes. As the genomes of more tree species are sequenced, comparative genomics will further elucidate the complexity of tree genomes and how this relates to traits unique to trees.
Pal Choudhury, Pabitra
2017-01-01
Periplasmic c7 type cytochrome A (PpcA) protein is determined in Geobacter sulfurreducens along with its other four homologs (PpcB-E). From the crystal structure viewpoint the observation emerges that PpcA protein can bind with Deoxycholate (DXCA), while its other homologs do not. But it is yet to be established with certainty the reason behind this from primary protein sequence information. This study is primarily based on primary protein sequence analysis through the chemical basis of embedded amino acids. Firstly, we look for the chemical group specific score of amino acids. Along with this, we have developed a new methodology for the phylogenetic analysis based on chemical group dissimilarities of amino acids. This new methodology is applied to the cytochrome c7 family members and pinpoint how a particular sequence is differing with others. Secondly, we build a graph theoretic model on using amino acid sequences which is also applied to the cytochrome c7 family members and some unique characteristics and their domains are highlighted. Thirdly, we search for unique patterns as subsequences which are common among the group or specific individual member. In all the cases, we are able to show some distinct features of PpcA that emerges PpcA as an outstanding protein compared to its other homologs, resulting towards its binding with deoxycholate. Similarly, some notable features for the structurally dissimilar protein PpcD compared to the other homologs are also brought out. Further, the five members of cytochrome family being homolog proteins, they must have some common significant features which are also enumerated in this study. PMID:28362850
Desjardin, Dennis E; Hemmes, Don E; Perry, Brian A
2014-01-01
Pseudobaeospora wipapatiae is described as new based on material collected in alien wet habitats on the island of Hawaii. Unique features of this beautiful species include deep ruby-colored basidiomes with two-spored basidia, amyloid cheilocystidia and a hymeniderm pileipellis with abundant pileocystidia that is initially deep ruby in KOH then changes to lilac gray. Phylogenetic analysis of nuclear large ribosomal subunit sequence data suggest a close relationship between Pseudobaeospora and Tricholoma. BLAST comparisons of internal transcribed spacer and 5.8S nuclear ribosomal subunit regions sequence data reveal greatest similarity with existing sequences of Pseudobaeospora species. A comprehensive description, color photograph, illustrations of salient micromorphological features and comparisons with phenetically similar taxa are provided. © 2014 by The Mycological Society of America.
McGillewie, Lara; Ramesh, Muthusamy; Soliman, Mahmoud E
2017-10-01
Aspartic proteases are a class of hydrolytic enzymes that have been implicated in a number of diseases such as HIV, malaria, cancer and Alzheimer's. The flap region of aspartic proteases is a characteristic unique structural feature of these enzymes; and found to have a profound impact on protein overall structure, function and dynamics. Flap dynamics also plays a crucial role in drug binding and drug resistance. Therefore, understanding the structure and dynamic behavior of this flap regions is crucial in the design of potent and selective inhibitors against aspartic proteases. Defining metrics that can describe the flap motion/dynamics has been a challenging topic in literature. This review is the first attempt to compile comprehensive information on sequence, structure, motion and metrics used to assess the dynamics of the flap region of different aspartic proteases in "one pot". We believe that this review would be of critical importance to the researchers from different scientific domains.
Classifying Noisy Protein Sequence Data: A Case Study of Immunoglobulin Light Chains
2005-01-01
collected from patients with and without amyloidosis , and indicates that the proposed modified classifi- ers are more robust to sequence variability than...piled from patients with and without amyloidosis provides unique features to serve as a model system, not only for conformational disease studies but...produced by patients with amyloidosis . SVMs have been used recently in a wide variety of applica- tions in computational biology (Noble, 2004). Most
The Placenta in Twin-to-Twin Transfusion Syndrome and Twin Anemia Polycythemia Sequence.
Couck, Isabel; Lewi, Liesbeth
2016-06-01
Twin-to-twin transfusion syndrome (TTTS) and twin anemia polycythemia sequence (TAPS) are complications unique to monochorionic twin pregnancies and their shared circulation. Both are the result of the transfusion imbalance in the intertwin circulation. TTTS is characterized by an amniotic fluid discordance, whereas in TAPS, there is a severe discordance in hemoglobin levels. The article gives an overview of the typical features of TTTS and TAPS placentas.
Droege, Marcus; Hill, Brendon
2008-08-31
The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven applications categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research, having recently re-sequenced the genome of an individual human, currently re-sequencing the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and detected low-frequency somatic mutations linked to cancer.
Hegazi, Moustafa Abdelaal; Manou, Sommen; Sakr, Hazem; Camp, Guy Van
2017-01-01
Inherited Palmoplantar Keratodermas are rare disorders of genodermatosis that are conventionally regarded as autosomal dominant in inheritance with extensive clinical and genetic heterogeneity. This is the first report of a unique autosomal recessive Inherited Palmoplantar keratoderma - sensorineural hearing loss syndrome which has not been reported before in 3 siblings of a large consanguineous family. The patients presented unique clinical features that were different from other known Inherited Palmoplantar Keratodermas - hearing loss syndromes. Mutations in GJB2 or GJB6 and the mitochondrial A7445G mutation, known to be the major causes of diverse Inherited Palmoplantar Keratodermas -hearing loss syndromes were not detected by Sanger sequencing. Moreover, the pathogenic mutation could not be identified using whole exome sequencing. Other known Inherited Palmoplantar keratoderma syndromes were excluded based on both clinical criteria and genetic analysis. PMID:29267478
Egorov, Evgeny S; Merzlyak, Ekaterina M; Shelenkov, Andrew A; Britanova, Olga V; Sharonov, George V; Staroverov, Dmitriy B; Bolotin, Dmitriy A; Davydov, Alexey N; Barsova, Ekaterina; Lebedev, Yuriy B; Shugay, Mikhail; Chudakov, Dmitriy M
2015-06-15
Emerging high-throughput sequencing methods for the analyses of complex structure of TCR and BCR repertoires give a powerful impulse to adaptive immunity studies. However, there are still essential technical obstacles for performing a truly quantitative analysis. Specifically, it remains challenging to obtain comprehensive information on the clonal composition of small lymphocyte populations, such as Ag-specific, functional, or tissue-resident cell subsets isolated by sorting, microdissection, or fine needle aspirates. In this study, we report a robust approach based on unique molecular identifiers that allows profiling Ag receptors for several hundred to thousand lymphocytes while preserving qualitative and quantitative information on clonal composition of the sample. We also describe several general features regarding the data analysis with unique molecular identifiers that are critical for accurate counting of starting molecules in high-throughput sequencing applications. Copyright © 2015 by The American Association of Immunologists, Inc.
Using cellular automata to generate image representation for biological sequences.
Xiao, X; Shao, S; Ding, Y; Huang, Z; Chen, X; Chou, K-C
2005-02-01
A novel approach to visualize biological sequences is developed based on cellular automata (Wolfram, S. Nature 1984, 311, 419-424), a set of discrete dynamical systems in which space and time are discrete. By transforming the symbolic sequence codes into the digital codes, and using some optimal space-time evolvement rules of cellular automata, a biological sequence can be represented by a unique image, the so-called cellular automata image. Many important features, which are originally hidden in a long and complicated biological sequence, can be clearly revealed thru its cellular automata image. With biological sequences entering into databanks rapidly increasing in the post-genomic era, it is anticipated that the cellular automata image will become a very useful vehicle for investigation into their key features, identification of their function, as well as revelation of their "fingerprint". It is anticipated that by using the concept of the pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43, 246-255), the cellular automata image approach can also be used to improve the quality of predicting protein attributes, such as structural class and subcellular location.
Learning and Visualizing Modulation Discriminative Radio Signal Features
2016-09-01
implemented as a mapping of a sequence of in-phase quadrature ( IQ ) measurements generated by a software-defined radio to a probability distri- bution...over modulation classes. 3.1 TRAINING SNR EVALUATION Training CNNs on RF data raises the unique question of determining an optimal training SNR, that
Nguyen, Tuan; Ruan, Zheng; Oruganty, Krishnadev; Kannan, Natarajan
2015-01-01
Mitogen activated protein kinases (MAPKs) form a closely related family of kinases that control critical pathways associated with cell growth and survival. Although MAPKs have been extensively characterized at the biochemical, cellular, and structural level, an integrated evolutionary understanding of how MAPKs differ from other closely related protein kinases is currently lacking. Here, we perform statistical sequence comparisons of MAPKs and related protein kinases to identify sequence and structural features associated with MAPK functional divergence. We show, for the first time, that virtually all MAPK-distinguishing sequence features, including an unappreciated short insert segment in the β4-β5 loop, physically couple distal functional sites in the kinase domain to the D-domain peptide docking groove via the C-terminal flanking tail (C-tail). The coupling mediated by MAPK-specific residues confers an allosteric regulatory mechanism unique to MAPKs. In particular, the regulatory αC-helix conformation is controlled by a MAPK-conserved salt bridge interaction between an arginine in the αC-helix and an acidic residue in the C-tail. The salt-bridge interaction is modulated in unique ways in individual sub-families to achieve regulatory specificity. Our study is consistent with a model in which the C-tail co-evolved with the D-domain docking site to allosterically control MAPK activity. Our study provides testable mechanistic hypotheses for biochemical characterization of MAPK-conserved residues and new avenues for the design of allosteric MAPK inhibitors. PMID:25799139
Amelogenin Evolution and Tetrapod Enamel Structure
Diekwisch, Thomas G.H.; Jin, Tianquan; Wang, Xinping; Ito, Yoshihiro; Schmidt, Marcella; Druzinsky, Robert; Yamane, Akira; Luan, Xianghong
2009-01-01
Amelogenins are the major proteins involved in tooth enamel formation. In the present study we have cloned and sequenced four novel amelogenins from three amphibian species in order to analyze similarities and differences between mammalian and non-mammalian amelogenins. The newly sequenced amphibian amelogenin sequences were from a Red-eyed tree frog (Litoria chloris) and a Mexican axolotl (Ambystoma mexicanum). We identified two amelogenin isoforms in the Eastern Red-backed Salamander (Plethodon cinereus). Sequence comparisons confirmed that non-mammalian amelogenins are overall shorter than their mammalian counterparts, contain less proline and less glutamine, and feature shorter polyproline tripeptide repeat stretches than mammalian amelogenins. We propose that unique sequence parameters of mammalian amelogenins might be a pre-requisite for complex mammalian enamel prism architecture. PMID:19828974
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome
USDA-ARS?s Scientific Manuscript database
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the last two decades. This global resurgence is likely linked to increased international travel and commerce and widespread insecticide resistance. Analyses of the C. le...
Nakhaei-Rad, Saeideh; Nakhaeizadeh, Hossein; Kordes, Claus; Cirstea, Ion C; Schmick, Malte; Dvorsky, Radovan; Bastiaens, Philippe I H; Häussinger, Dieter; Ahmadian, Mohammad Reza
2015-06-19
E-RAS is a member of the RAS family specifically expressed in embryonic stem cells, gastric tumors, and hepatic stellate cells. Unlike classical RAS isoforms (H-, N-, and K-RAS4B), E-RAS has, in addition to striking and remarkable sequence deviations, an extended 38-amino acid-long unique N-terminal region with still unknown functions. We investigated the molecular mechanism of E-RAS regulation and function with respect to its sequence and structural features. We found that N-terminal extension of E-RAS is important for E-RAS signaling activity. E-RAS protein most remarkably revealed a different mode of effector interaction as compared with H-RAS, which correlates with deviations in the effector-binding site of E-RAS. Of all these residues, tryptophan 79 (arginine 41 in H-RAS), in the interswitch region, modulates the effector selectivity of RAS proteins from H-RAS to E-RAS features. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system
Sunkin, Susan M.; Ng, Lydia; Lau, Chris; Dolbeare, Tim; Gilbert, Terri L.; Thompson, Carol L.; Hawrylycz, Michael; Dang, Chinh
2013-01-01
The Allen Brain Atlas (http://www.brain-map.org) provides a unique online public resource integrating extensive gene expression data, connectivity data and neuroanatomical information with powerful search and viewing tools for the adult and developing brain in mouse, human and non-human primate. Here, we review the resources available at the Allen Brain Atlas, describing each product and data type [such as in situ hybridization (ISH) and supporting histology, microarray, RNA sequencing, reference atlases, projection mapping and magnetic resonance imaging]. In addition, standardized and unique features in the web applications are described that enable users to search and mine the various data sets. Features include both simple and sophisticated methods for gene searches, colorimetric and fluorescent ISH image viewers, graphical displays of ISH, microarray and RNA sequencing data, Brain Explorer software for 3D navigation of anatomy and gene expression, and an interactive reference atlas viewer. In addition, cross data set searches enable users to query multiple Allen Brain Atlas data sets simultaneously. All of the Allen Brain Atlas resources can be accessed through the Allen Brain Atlas data portal. PMID:23193282
Selvin, Joseph; Sathiyanarayanan, Ganesan; Lipton, Anuj N.; Al-Dhabi, Naif Abdullah; Valan Arasu, Mariadhas; Kiran, George S.
2016-01-01
The important biological macromolecules, such as lipopeptide and glycolipid biosurfactant producing marine actinobacteria were analyzed and their potential linkage between type II polyketide synthase (PKS) genes was explored. A unique feature of type II PKS genes is their high amino acid (AA) sequence homology and conserved gene organization. These enzymes mediate the biosynthesis of polyketide natural products with enormous structural complexity and chemical nature by combinatorial use of various domains. Therefore, deciphering the order of AA sequence encoded by PKS domains tailored the chemical structure of polyketide analogs still remains a great challenge. The present work deals with an in vitro and in silico analysis of PKS type II genes from five actinobacterial species to correlate KS domain architecture and structural features. Our present analysis reveals the unique protein domain organization of iterative type II PKS and KS domain of marine actinobacteria. The findings of this study would have implications in metabolic pathway reconstruction and design of semi-synthetic genomes to achieve rational design of novel natural products. PMID:26903957
BrucellaBase: Genome information resource.
Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Khader, L K M Abdul; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash
2016-09-01
Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform, providing features of a genome database together with unique analysis tools. We have developed a web version of the multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains along with the user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. Availability of these tools will enable the researchers interested in Brucella to get meaningful information from Brucella genome sequences. BrucellaBase will regularly be updated with new genome sequences, new features along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html. Copyright © 2016 Elsevier B.V. All rights reserved.
Application of Wavelet Transform for PDZ Domain Classification
Daqrouq, Khaled; Alhmouz, Rami; Balamesh, Ahmed; Memic, Adnan
2015-01-01
PDZ domains have been identified as part of an array of signaling proteins that are often unrelated, except for the well-conserved structural PDZ domain they contain. These domains have been linked to many disease processes including common Avian influenza, as well as very rare conditions such as Fraser and Usher syndromes. Historically, based on the interactions and the nature of bonds they form, PDZ domains have most often been classified into one of three classes (class I, class II and others - class III), that is directly dependent on their binding partner. In this study, we report on three unique feature extraction approaches based on the bigram and trigram occurrence and existence rearrangements within the domain's primary amino acid sequences in assisting PDZ domain classification. Wavelet packet transform (WPT) and Shannon entropy denoted by wavelet entropy (WE) feature extraction methods were proposed. Using 115 unique human and mouse PDZ domains, the existence rearrangement approach yielded a high recognition rate (78.34%), which outperformed our occurrence rearrangements based method. The recognition rate was (81.41%) with validation technique. The method reported for PDZ domain classification from primary sequences proved to be an encouraging approach for obtaining consistent classification results. We anticipate that by increasing the database size, we can further improve feature extraction and correct classification. PMID:25860375
Advanced Applications of Next-Generation Sequencing Technologies to Orchid Biology.
Yeh, Chuan-Ming; Liu, Zhong-Jian; Tsai, Wen-Chieh
2018-01-01
Next-generation sequencing technologies are revolutionizing biology by permitting, transcriptome sequencing, whole-genome sequencing and resequencing, and genome-wide single nucleotide polymorphism profiling. Orchid research has benefited from this breakthrough, and a few orchid genomes are now available; new biological questions can be approached and new breeding strategies can be designed. The first part of this review describes the unique features of orchid biology. The second part provides an overview of the current next-generation sequencing platforms, many of which are already used in plant laboratories. The third part summarizes the state of orchid transcriptome and genome sequencing and illustrates current achievements. The genetic sequences currently obtained will not only provide a broad scope for the study of orchid biology, but also serves as a starting point for uncovering the mystery of orchid evolution.
Karimi, Zahra; Ahmadi, Ali; Najafi, Ali; Ranjbar, Reza
2018-01-01
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci as novel and applicable regions in prokaryotic genomes have gained great attraction in the post genomics era. These unique regions are diverse in number and sequence composition in different pathogenic bacteria and thereby can be a suitable candidate for molecular epidemiology and genotyping studies. Results:Furthermore, the arrayed structure of CRISPR loci (several unique repeats spaced with the variable sequence) and associated cas genes act as an active prokaryotic immune system against viral replication and conjugative elements. This property can be used as a tool for RNA editing in bioengineering studies. The aim of this review was to survey some details about the history, nature, and potential applications of CRISPR arrays in both genetic engineering and bacterial genotyping studies.
Seo, Young-Su; Lim, Jae Yun; Park, Jungwook; Kim, Sunyoung; Lee, Hyun-Hee; Cheong, Hoon; Kim, Sang-Mok; Moon, Jae Sun; Hwang, Ingyu
2015-05-06
In addition to human and animal diseases, bacteria of the genus Burkholderia can cause plant diseases. The representative species of rice-pathogenic Burkholderia are Burkholderia glumae, B. gladioli, and B. plantarii, which primarily cause grain rot, sheath rot, and seedling blight, respectively, resulting in severe reductions in rice production. Though Burkholderia rice pathogens cause problems in rice-growing countries, comprehensive studies of these rice-pathogenic species aiming to control Burkholderia-mediated diseases are only in the early stages. We first sequenced the complete genome of B. plantarii ATCC 43733T. Second, we conducted comparative analysis of the newly sequenced B. plantarii ATCC 43733T genome with eleven complete or draft genomes of B. glumae and B. gladioli strains. Furthermore, we compared the genome of three rice Burkholderia pathogens with those of other Burkholderia species such as those found in environmental habitats and those known as animal/human pathogens. These B. glumae, B. gladioli, and B. plantarii strains have unique genes involved in toxoflavin or tropolone toxin production and the clustered regularly interspaced short palindromic repeats (CRISPR)-mediated bacterial immune system. Although the genome of B. plantarii ATCC 43733T has many common features with those of B. glumae and B. gladioli, this B. plantarii strain has several unique features, including quorum sensing and CRISPR/CRISPR-associated protein (Cas) systems. The complete genome sequence of B. plantarii ATCC 43733T and publicly available genomes of B. glumae BGR1 and B. gladioli BSR3 enabled comprehensive comparative genome analyses among three rice-pathogenic Burkholderia species responsible for tissue rotting and seedling blight. Our results suggest that B. glumae has evolved rapidly, or has undergone rapid genome rearrangements or deletions, in response to the hosts. It also, clarifies the unique features of rice pathogenic Burkholderia species relative to other animal and human Burkholderia species.
zUMIs - A fast and flexible pipeline to process RNA sequencing data with UMIs.
Parekh, Swati; Ziegenhain, Christoph; Vieth, Beate; Enard, Wolfgang; Hellmann, Ines
2018-06-01
Single-cell RNA-sequencing (scRNA-seq) experiments typically analyze hundreds or thousands of cells after amplification of the cDNA. The high throughput is made possible by the early introduction of sample-specific bar codes (BCs), and the amplification bias is alleviated by unique molecular identifiers (UMIs). Thus, the ideal analysis pipeline for scRNA-seq data needs to efficiently tabulate reads according to both BC and UMI. zUMIs is a pipeline that can handle both known and random BCs and also efficiently collapse UMIs, either just for exon mapping reads or for both exon and intron mapping reads. If BC annotation is missing, zUMIs can accurately detect intact cells from the distribution of sequencing reads. Another unique feature of zUMIs is the adaptive downsampling function that facilitates dealing with hugely varying library sizes but also allows the user to evaluate whether the library has been sequenced to saturation. To illustrate the utility of zUMIs, we analyzed a single-nucleus RNA-seq dataset and show that more than 35% of all reads map to introns. Also, we show that these intronic reads are informative about expression levels, significantly increasing the number of detected genes and improving the cluster resolution. zUMIs flexibility makes if possible to accommodate data generated with any of the major scRNA-seq protocols that use BCs and UMIs and is the most feature-rich, fast, and user-friendly pipeline to process such scRNA-seq data.
On continuous user authentication via typing behavior.
Roth, Joseph; Liu, Xiaoming; Metaxas, Dimitris
2014-10-01
We hypothesize that an individual computer user has a unique and consistent habitual pattern of hand movements, independent of the text, while typing on a keyboard. As a result, this paper proposes a novel biometric modality named typing behavior (TB) for continuous user authentication. Given a webcam pointing toward a keyboard, we develop real-time computer vision algorithms to automatically extract hand movement patterns from the video stream. Unlike the typical continuous biometrics, such as keystroke dynamics (KD), TB provides a reliable authentication with a short delay, while avoiding explicit key-logging. We collect a video database where 63 unique subjects type static text and free text for multiple sessions. For one typing video, the hands are segmented in each frame and a unique descriptor is extracted based on the shape and position of hands, as well as their temporal dynamics in the video sequence. We propose a novel approach, named bag of multi-dimensional phrases, to match the cross-feature and cross-temporal pattern between a gallery sequence and probe sequence. The experimental results demonstrate a superior performance of TB when compared with KD, which, together with our ultrareal-time demo system, warrant further investigation of this novel vision application and biometric modality.
Church, George M.; Kieffer-Higgins, Stephen
1992-01-01
This invention features vectors and a method for sequencing DNA. The method includes the steps of: a) ligating the DNA into a vector comprising a tag sequence, the tag sequence includes at least 15 bases, wherein the tag sequence will not hybridize to the DNA under stringent hybridization conditions and is unique in the vector, to form a hybrid vector, b) treating the hybrid vector in a plurality of vessels to produce fragments comprising the tag sequence, wherein the fragments differ in length and terminate at a fixed known base or bases, wherein the fixed known base or bases differs in each vessel, c) separating the fragments from each vessel according to their size, d) hybridizing the fragments with an oligonucleotide able to hybridize specifically with the tag sequence, and e) detecting the pattern of hybridization of the tag sequence, wherein the pattern reflects the nucleotide sequence of the DNA.
Bauer, Patricia J; Lukowski, Angela F
2010-09-01
The second year of life is marked by pronounced changes in the length of time over which events are remembered. We tested whether the age-related differences are related to differences in memory for the specific features of events. In our study, 16- and 20-month-olds were tested for immediate and long-term recall of individual actions and temporal order of actions of three-step sequences in an elicited imitation paradigm as well as for forced-choice recognition of the specific feature of the props used to produce the sequences. Memory for the props was related to long-term recall of the events only for the 20-month-olds. It accounted for unique variance above and beyond the variance explained by immediate recall of the individual actions and the temporal order of actions of the sequences. The different pattern of relations in the older and younger infants seemingly reflects a developmental difference in the determinants of long-term recall over the second year of life. Copyright 2010 Elsevier Inc. All rights reserved.
Kahlke, Tim; Goesmann, Alexander; Hjerde, Erik; Willassen, Nils Peder; Haugen, Peik
2012-05-10
The criteria for defining bacterial species and even the concept of bacterial species itself are under debate, and the discussion is apparently intensifying as more genome sequence data is becoming available. However, it is still unclear how the new advances in genomics should be used most efficiently to address this question. In this study we identify genes that are common to any group of genomes in our dataset, to determine whether genes specific to a particular taxon exist and to investigate their potential role in adaptation of bacteria to their specific niche. These genes were named unique core genes. Additionally, we investigate the existence and importance of unique core genes that are found in isolates of phylogenetically non-coherent groups. These groups of isolates, that share a genetic feature without sharing a closest common ancestor, are termed genophyletic groups. The bacterial family Vibrionaceae was used as the model, and we compiled and compared genome sequences of 64 different isolates. Using the software orthoMCL we determined clusters of homologous genes among the investigated genome sequences. We used multilocus sequence analysis to build a host phylogeny and mapped the numbers of unique core genes of all distinct groups of isolates onto the tree. The results show that unique core genes are more likely to be found in monophyletic groups of isolates. Genophyletic groups of isolates, in contrast, are less common especially for large groups of isolate. The subsequent annotation of unique core genes that are present in genophyletic groups indicate a high degree of horizontally transferred genes. Finally, the annotation of the unique core genes of Vibrio cholerae revealed genes involved in aerotaxis and biosynthesis of the iron-chelator vibriobactin. The presented work indicates that genes specific for any taxon inside the bacterial family Vibrionaceae exist. These unique core genes encode conserved metabolic functions that can shed light on the adaptation of a species to its ecological niche. Additionally, our study suggests that unique core genes can be used to aid classification of bacteria and contribute to a bacterial species definition on a genomic level. Furthermore, these genes may be of importance in clinical diagnostics and drug development.
Computational Tools and Algorithms for Designing Customized Synthetic Genes
Gould, Nathan; Hendy, Oliver; Papamichail, Dimitris
2014-01-01
Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in de novo design of synthetic constructs provides significant power in studying the impact of mutations in sequence features, and verifying hypotheses on the functional information that is encoded in nucleic and amino acids. To aid this goal, a large number of software tools of variable sophistication have been implemented, enabling the design of synthetic genes for sequence optimization based on rationally defined properties. The first generation of tools dealt predominantly with singular objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein-coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review, we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We utilize test cases to demonstrate the efficiency of each approach, as well as identify their strengths and limitations. PMID:25340050
Registration methods for nonblind watermark detection in digital cinema applications
NASA Astrophysics Data System (ADS)
Nguyen, Philippe; Balter, Raphaele; Montfort, Nicolas; Baudry, Severine
2003-06-01
Digital watermarking may be used to enforce copyright protection of digital cinema, by embedding in each projected movie an unique identifier (fingerprint). By identifying the source of illegal copies, watermarking will thus incite movie theatre managers to enforce copyright protection, in particular by preventing people from coming in with a handy cam. We propose here a non-blind watermark method to improve the watermark detection on very impaired sequences. We first present a study on the picture impairments caused by the projection on a screen, then acquisition with a handy cam. We show that images undergo geometric deformations, which are fully described by a projective geometry model. The sequence also undergoes spatial and temporal luminance variation. Based on this study and on the impairments models which follow, we propose a method to match the retrieved sequence to the original one. First, temporal registration is performed by comparing the average luminance variation on both sequences. To compensate for geometric transformations, we used paired points from both sequences, obtained by applying a feature points detector. The matching of the feature points then enables to retrieve the geometric transform parameters. Tests show that the watermark retrieval on rectified sequences is greatly improved.
NASA Astrophysics Data System (ADS)
Derelle, Evelyne; Ferraz, Conchita; Rombauts, Stephane; Rouzé, Pierre; Worden, Alexandra Z.; Robbens, Steven; Partensky, Frédéric; Degroeve, Sven; Echeynié, Sophie; Cooke, Richard; Saeys, Yvan; Wuyts, Jan; Jabbari, Kamel; Bowler, Chris; Panaud, Olivier; Piégu, Benoît; Ball, Steven G.; Ral, Jean-Philippe; Bouget, François-Yves; Piganeau, Gwenael; de Baets, Bernard; Picard, André; Delseny, Michel; Demaille, Jacques; van de Peer, Yves; Moreau, Hervé
2006-08-01
The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families, make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry. genome heterogeneity | genome sequence | green alga | Prasinophyceae | gene prediction
Yin, Long-Lin; Song, Bin; Guan, Ying; Li, Ying-Chun; Chen, Guang-Wen; Zhao, Li-Ming; Lai, Li
2014-09-01
To investigate MRI features and associated histological and pathological changes of hilar and extrahepatic big bile duct cholangiocarcinoma with different morphological sub-types, and its value in differentiating between nodular cholangiocarcinoma (NCC) and intraductal growing cholangiocarcinoma (IDCC). Imaging data of 152 patients with pathologically confirmed hilar and extrahepatic big bile duct cholangiocarcinoma were reviewed, which included 86 periductal infiltrating cholangiocarcinoma (PDCC), 55 NCC, and 11 IDCC. Imaging features of the three morphological sub-types were compared. Each of the subtypes demonstrated its unique imaging features. Significant differences (P < 0.05) were found between NCC and IDCC in tumor shape, dynamic enhanced pattern, enhancement degree during equilibrium phase, multiplicity or singleness of tumor, changes in wall and lumen of bile duct at the tumor-bearing segment, dilatation of tumor upstream or downstream bile duct, and invasion of adjacent organs. Imaging features reveal tumor growth patterns of hilar and extrahepatic big bile duct cholangiocarcinoma. MRI united-sequences examination can accurately describe those imaging features for differentiation diagnosis.
USDA-ARS?s Scientific Manuscript database
The rye genome features a high percentage of repetitive elements, especially transposable elements (TEs). However, studies about the constitution and organization of TEs on rye chromosomes are limited. In this study, 97 unique TE segments were isolated and characterized; 50 TE segmemts showed varyin...
Feature Acquisition with Imbalanced Training Data
NASA Technical Reports Server (NTRS)
Thompson, David R.; Wagstaff, Kiri L.; Majid, Walid A.; Jones, Dayton L.
2011-01-01
This work considers cost-sensitive feature acquisition that attempts to classify a candidate datapoint from incomplete information. In this task, an agent acquires features of the datapoint using one or more costly diagnostic tests, and eventually ascribes a classification label. A cost function describes both the penalties for feature acquisition, as well as misclassification errors. A common solution is a Cost Sensitive Decision Tree (CSDT), a branching sequence of tests with features acquired at interior decision points and class assignment at the leaves. CSDT's can incorporate a wide range of diagnostic tests and can reflect arbitrary cost structures. They are particularly useful for online applications due to their low computational overhead. In this innovation, CSDT's are applied to cost-sensitive feature acquisition where the goal is to recognize very rare or unique phenomena in real time. Example applications from this domain include four areas. In stream processing, one seeks unique events in a real time data stream that is too large to store. In fault protection, a system must adapt quickly to react to anticipated errors by triggering repair activities or follow- up diagnostics. With real-time sensor networks, one seeks to classify unique, new events as they occur. With observational sciences, a new generation of instrumentation seeks unique events through online analysis of large observational datasets. This work presents a solution based on transfer learning principles that permits principled CSDT learning while exploiting any prior knowledge of the designer to correct both between-class and withinclass imbalance. Training examples are adaptively reweighted based on a decomposition of the data attributes. The result is a new, nonparametric representation that matches the anticipated attribute distribution for the target events.
Adrian-Kalchhauser, Irene; Svensson, Ola; Kutschera, Verena E; Alm Rosenblad, Magnus; Pippel, Martin; Winkler, Sylke; Schloissnig, Siegfried; Blomberg, Anders; Burkhardt-Holm, Patricia
2017-02-16
Vertebrate mitochondrial genomes are optimized for fast replication and low cost of RNA expression. Accordingly, they are devoid of introns, are transcribed as polycistrons and contain very little intergenic sequences. Usually, vertebrate mitochondrial genomes measure between 16.5 and 17 kilobases (kb). During genome sequencing projects for two novel vertebrate models, the invasive round goby and the sand goby, we found that the sand goby genome is exceptionally small (16.4 kb), while the mitochondrial genome of the round goby is much larger than expected for a vertebrate. It is 19 kb in size and is thus one of the largest fish and even vertebrate mitochondrial genomes known to date. The expansion is attributable to a sequence insertion downstream of the putative transcriptional start site. This insertion carries traces of repeats from the control region, but is mostly novel. To get more information about this phenomenon, we gathered all available mitochondrial genomes of Gobiidae and of nine gobioid species, performed phylogenetic analyses, analysed gene arrangements, and compared gobiid mitochondrial genome sizes, ecological information and other species characteristics with respect to the mitochondrial phylogeny. This allowed us amongst others to identify a unique arrangement of tRNAs among Ponto-Caspian gobies. Our results indicate that the round goby mitochondrial genome may contain novel features. Since mitochondrial genome organisation is tightly linked to energy metabolism, these features may be linked to its invasion success. Also, the unique tRNA arrangement among Ponto-Caspian gobies may be helpful in studying the evolution of this highly adaptive and invasive species group. Finally, we find that the phylogeny of gobiids can be further refined by the use of longer stretches of linked DNA sequence.
Karimi, Zahra; Ahmadi, Ali; Najafi, Ali; Ranjbar, Reza
2018-01-01
Introduction: CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci as novel and applicable regions in prokaryotic genomes have gained great attraction in the post genomics era. Methods: These unique regions are diverse in number and sequence composition in different pathogenic bacteria and thereby can be a suitable candidate for molecular epidemiology and genotyping studies. Results:Furthermore, the arrayed structure of CRISPR loci (several unique repeats spaced with the variable sequence) and associated cas genes act as an active prokaryotic immune system against viral replication and conjugative elements. This property can be used as a tool for RNA editing in bioengineering studies. Conclusion: The aim of this review was to survey some details about the history, nature, and potential applications of CRISPR arrays in both genetic engineering and bacterial genotyping studies. PMID:29755603
Arambula, Diego; Wong, Wenge; Medhekar, Bob A; Guo, Huatao; Gingery, Mari; Czornyj, Elizabeth; Liu, Minghsun; Dey, Sanghamitra; Ghosh, Partho; Miller, Jeff F
2013-05-14
Diversity-generating retroelements (DGRs) are a unique family of retroelements that confer selective advantages to their hosts by facilitating localized DNA sequence evolution through a specialized error-prone reverse transcription process. We characterized a DGR in Legionella pneumophila, an opportunistic human pathogen that causes Legionnaires disease. The L. pneumophila DGR is found within a horizontally acquired genomic island, and it can theoretically generate 10(26) unique nucleotide sequences in its target gene, legionella determinent target A (ldtA), creating a repertoire of 10(19) distinct proteins. Expression of the L. pneumophila DGR resulted in transfer of DNA sequence information from a template repeat to a variable repeat (VR) accompanied by adenine-specific mutagenesis of progeny VRs at the 3'end of ldtA. ldtA encodes a twin-arginine translocated lipoprotein that is anchored in the outer leaflet of the outer membrane, with its C-terminal variable region surface exposed. Related DGRs were identified in L. pneumophila clinical isolates that encode unique target proteins with homologous VRs, demonstrating the adaptability of DGR components. This work characterizes a DGR that diversifies a bacterial protein and confirms the hypothesis that DGR-mediated mutagenic homing occurs through a conserved mechanism. Comparative bioinformatics predicts that surface display of massively variable proteins is a defining feature of a subset of bacterial DGRs.
Exploration of the relationship between topology and designability of conformations
NASA Astrophysics Data System (ADS)
Leelananda, Sumudu P.; Towfic, Fadi; Jernigan, Robert L.; Kloczkowski, Andrzej
2011-06-01
Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability. Some folds are more designable and lots of sequences can assume that fold. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. Unique conformation giving minimum energy is identified for each sequence and the number of sequences folding to each conformation (designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that the highly designable structures can be distinguished from other non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms the fact that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
Yoshida, Tetsuya; Kitazawa, Yugo; Komatsu, Ken; Neriya, Yutaro; Ishikawa, Kazuya; Fujita, Naoko; Hashimoto, Masayoshi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou
2014-11-01
In this study, we detected a Japanese isolate of hibiscus latent Fort Pierce virus (HLFPV-J), a member of the genus Tobamovirus, in a hibiscus plant in Japan and determined the complete sequence and organization of its genome. HLFPV-J has four open reading frames (ORFs), each of which shares more than 98 % nucleotide sequence identity with those of other HLFPV isolates. Moreover, HLFPV-J contains a unique internal poly(A) region of variable length, ranging from 44 to 78 nucleotides, in its 3'-untranslated region (UTR), as is the case with hibiscus latent Singapore virus (HLSV), another hibiscus-infecting tobamovirus. The length of the HLFPV-J genome was 6431 nucleotides, including the shortest internal poly(A) region. The sequence identities of ORFs 1, 2, 3 and 4 of HLFPV-J to other tobamoviruses were 46.6-68.7, 49.9-70.8, 31.0-70.8 and 39.4-70.1 %, respectively, at the nucleotide level and 39.8-75.0, 43.6-77.8, 19.2-70.4 and 31.2-74.2 %, respectively, at the amino acid level. The 5'- and 3'-UTRs of HLFPV-J showed 24.3-58.6 and 13.0-79.8 % identity, respectively, to other tobamoviruses. In particular, when compared to other tobamoviruses, each ORF and UTR of HLFPV-J showed the highest sequence identity to those of HLSV. Phylogenetic analysis showed that HLFPV-J, other HLFPV isolates and HLSV constitute a malvaceous-plant-infecting tobamovirus cluster. These results indicate that the genomic structure of HLFPV-J has unique features similar to those of HLSV. To our knowledge, this is the first report of the complete genome sequence of HLFPV.
Wegrzyn, Jill L.; Liechty, John D.; Stevens, Kristian A.; Wu, Le-Shin; Loopstra, Carol A.; Vasquez-Gross, Hans A.; Dougherty, William M.; Lin, Brian Y.; Zieve, Jacob J.; Martínez-García, Pedro J.; Holt, Carson; Yandell, Mark; Zimin, Aleksey V.; Yorke, James A.; Crepeau, Marc W.; Puiu, Daniela; Salzberg, Steven L.; de Jong, Pieter J.; Mockaitis, Keithanne; Main, Doreen; Langley, Charles H.; Neale, David B.
2014-01-01
The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20–40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%. PMID:24653211
TCRmodel: high resolution modeling of T cell receptors from sequence.
Gowthaman, Ragul; Pierce, Brian G
2018-05-22
T cell receptors (TCRs), along with antibodies, are responsible for specific antigen recognition in the adaptive immune response, and millions of unique TCRs are estimated to be present in each individual. Understanding the structural basis of TCR targeting has implications in vaccine design, autoimmunity, as well as T cell therapies for cancer. Given advances in deep sequencing leading to immune repertoire-level TCR sequence data, fast and accurate modeling methods are needed to elucidate shared and unique 3D structural features of these molecules which lead to their antigen targeting and cross-reactivity. We developed a new algorithm in the program Rosetta to model TCRs from sequence, and implemented this functionality in a web server, TCRmodel. This web server provides an easy to use interface, and models are generated quickly that users can investigate in the browser and download. Benchmarking of this method using a set of nonredundant recently released TCR crystal structures shows that models are accurate and compare favorably to models from another available modeling method. This server enables the community to obtain insights into TCRs of interest, and can be combined with methods to model and design TCR recognition of antigens. The TCRmodel server is available at: http://tcrmodel.ibbr.umd.edu/.
In vivo generation of DNA sequence diversity for cellular barcoding
Peikon, Ian D.; Gizatullina, Diana I.; Zador, Anthony M.
2014-01-01
Heterogeneity is a ubiquitous feature of biological systems. A complete understanding of such systems requires a method for uniquely identifying and tracking individual components and their interactions with each other. We have developed a novel method of uniquely tagging individual cells in vivo with a genetic ‘barcode’ that can be recovered by DNA sequencing. Our method is a two-component system comprised of a genetic barcode cassette whose fragments are shuffled by Rci, a site-specific DNA invertase. The system is highly scalable, with the potential to generate theoretical diversities in the billions. We demonstrate the feasibility of this technique in Escherichia coli. Currently, this method could be employed to track the dynamics of populations of microbes through various bottlenecks. Advances of this method should prove useful in tracking interactions of cells within a network, and/or heterogeneity within complex biological samples. PMID:25013177
New mutations in the NHS gene in Nance-Horan Syndrome families from the Netherlands.
Florijn, Ralph J; Loves, Willem; Maillette de Buy Wenniger-Prick, Liesbeth J J M; Mannens, Marcel M A M; Tijmes, Nel; Brooks, Simon P; Hardcastle, Alison J; Bergen, Arthur A B
2006-09-01
Mutations in the NHS gene cause Nance-Horan Syndrome (NHS), a rare X-chromosomal recessive disorder with variable features, including congenital cataract, microphthalmia, a peculiar form of the ear and dental anomalies. We investigated the NHS gene in four additional families with NHS from the Netherlands, by dHPLC and direct sequencing. We identified an unique mutation in each family. Three out of these four mutations were not reported before. We report here the first splice site sequence alteration mutation and three protein truncating mutations. Our results suggest that X-linked cataract and NHS are allelic disorders.
Kawasaki, Junna; Kawamura, Maki; Ohsato, Yoshiharu; Ito, Jumpei; Nishigaki, Kazuo
2017-10-15
Recombination events induce significant genetic changes, and this process can result in virus genetic diversity or in the generation of novel pathogenicity. We discovered a new recombinant feline leukemia virus (FeLV) gag gene harboring an unrelated insertion, termed the X region, which was derived from Felis catus endogenous gammaretrovirus 4 (FcERV-gamma4). The identified FcERV-gamma4 proviruses have lost their coding capabilities, but some can express their viral RNA in feline tissues. Although the X-region-carrying recombinant FeLVs appeared to be replication-defective viruses, they were detected in 6.4% of tested FeLV-infected cats. All isolated recombinant FeLV clones commonly incorporated a middle part of the FcERV-gamma4 5'-leader region as an X region. Surprisingly, a sequence corresponding to the portion contained in all X regions is also present in at least 13 endogenous retroviruses (ERVs) observed in the cat, human, primate, and pig genomes. We termed this shared genetic feature the commonly shared (CS) sequence. Despite our phylogenetic analysis indicating that all CS-sequence-carrying ERVs are classified as gammaretroviruses, no obvious closeness was revealed among these ERVs. However, the Shannon entropy in the CS sequence was lower than that in other parts of the provirus genome. Notably, the CS sequence of human endogenous retrovirus T had 73.8% similarity with that of FcERV-gamma4, and specific signals were detected in the human genome by Southern blot analysis using a probe for the FcERV-gamma4 CS sequence. Our results provide an interesting evolutionary history for CS-sequence circulation among several distinct ancestral viruses and a novel recombined virus over a prolonged period. IMPORTANCE Recombination among ERVs or modern viral genomes causes a rapid evolution of retroviruses, and this phenomenon can result in the serious situation of viral disease reemergence. We identified a novel recombinant FeLV gag gene that contains an unrelated sequence, termed the X region. This region originated from the 5' leader of FcERV-gamma4, a replication-incompetent feline ERV. Surprisingly, a sequence corresponding to the X region is also present in the 5' portion of other ERVs, including human endogenous retroviruses. Scattered copies of the ERVs carrying the unique genetic feature, here named the commonly shared (CS) sequence, were found in each host genome, suggesting that ancestral viruses may have captured and maintained the CS sequence. More recently, a novel recombinant FeLV hijacked the CS sequence from inactivated FcERV-gamma4 as the X region. Therefore, tracing the CS sequences can provide unique models for not only the modern reservoir of new recombinant viruses but also the genetic features shared among ancient retroviruses. Copyright © 2017 American Society for Microbiology.
Diatom centromeres suggest a mechanism for nuclear DNA acquisition
Diner, Rachel E.; Noddings, Chari M.; Lian, Nathan C.; ...
2017-07-18
Centromeres are essential for cell division and growth in all eukaryotes, and knowledge of their sequence and structure guides the development of artificial chromosomes for functional cellular biology studies. Centromeric proteins are conserved among eukaryotes; however, centromeric DNA sequences are highly variable. We combined forward and reverse genetic approaches with chromatin immunoprecipitation to identify centromeres of the model diatom Phaeodactylum tricornutum. We observed 25 unique centromere sequences typically occurring once per chromosome, a finding that helps to resolve nuclear genome organization and indicates monocentric regional centromeres. Diatom centromere sequences contain low-GC content regions but lack repeats or other conserved sequencemore » features. Native and foreign sequences with similar GC content to P. tricornutum centromeres can maintain episomes and recruit the diatom centromeric histone protein CENH3, suggesting nonnative sequences can also function as diatom centromeres. Thus, simple sequence requirements may enable DNA from foreign sources to persist in the nucleus as extrachromosomal episomes, revealing a potential mechanism for organellar and foreign DNA acquisition.« less
Diatom centromeres suggest a mechanism for nuclear DNA acquisition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Diner, Rachel E.; Noddings, Chari M.; Lian, Nathan C.
Centromeres are essential for cell division and growth in all eukaryotes, and knowledge of their sequence and structure guides the development of artificial chromosomes for functional cellular biology studies. Centromeric proteins are conserved among eukaryotes; however, centromeric DNA sequences are highly variable. We combined forward and reverse genetic approaches with chromatin immunoprecipitation to identify centromeres of the model diatom Phaeodactylum tricornutum. We observed 25 unique centromere sequences typically occurring once per chromosome, a finding that helps to resolve nuclear genome organization and indicates monocentric regional centromeres. Diatom centromere sequences contain low-GC content regions but lack repeats or other conserved sequencemore » features. Native and foreign sequences with similar GC content to P. tricornutum centromeres can maintain episomes and recruit the diatom centromeric histone protein CENH3, suggesting nonnative sequences can also function as diatom centromeres. Thus, simple sequence requirements may enable DNA from foreign sources to persist in the nucleus as extrachromosomal episomes, revealing a potential mechanism for organellar and foreign DNA acquisition.« less
Fast Lithium-Ion Transportation in Crystalline Polymer Electrolytes.
Fu, Xiao-Bin; Yang, Guang; Wu, Jin-Ze; Wang, Jia-Chen; Chen, Qun; Yao, Ye-Feng
2018-01-05
Fast lithium-ion transportation is found in the crystalline polymer electrolytes, α-CD-PEO n /Li + (n=12, 40), prepared by self-assembly of α-cyclodextrin (CD), polyethylene oxide (PEO) and Li + salts. A detailed solid-state NMR study combined with the X-ray diffraction technique reveals the unique structural features of the samples, that is, a) the tunnel structure formed by the assembled CDs, providing the ordered long-range pathway for Li + ion transportation; b) the all-trans conformational sequence of the PEO chains in the tunnels, attenuating significantly the coordination between Li + and the EO segments. The origin of the fast lithium-ion transportation has been attributed to these unique structural features. This work demonstrates the first example in solid polymer electrolytes (SPEs) for "creating" fast ion transportation through material design and will find potential applications in the design of new ionconducting SPE materials. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
ClusterMine360: a database of microbial PKS/NRPS biosynthesis
Conway, Kyle R.; Boddy, Christopher N.
2013-01-01
ClusterMine360 (http://www.clustermine360.ca/) is a database of microbial polyketide and non-ribosomal peptide gene clusters. It takes advantage of crowd-sourcing by allowing members of the community to make contributions while automation is used to help achieve high data consistency and quality. The database currently has >200 gene clusters from >185 compound families. It also features a unique sequence repository containing >10 000 polyketide synthase/non-ribosomal peptide synthetase domains. The sequences are filterable and downloadable as individual or multiple sequence FASTA files. We are confident that this database will be a useful resource for members of the polyketide synthases/non-ribosomal peptide synthetases research community, enabling them to keep up with the growing number of sequenced gene clusters and rapidly mine these clusters for functional information. PMID:23104377
Spink, N; Brown, D G; Skelly, J V; Neidle, S
1994-01-01
The bis-benzimidazole drug Hoechst 33258 has been co-crystallized with the dodecanucleotide sequence d(CGCAAATTTGCG)2. The structure has been solved by molecular replacement and refined to an R factor of 18.5% for 2125 reflections collected on a Xentronics area detector. The drug is bound in the minor groove, at the five base-pair site 5'-ATTTG and is in a unique orientation. This is displaced by one base pair in the 5' direction compared to previously-determined structures of this drug with the sequence d(CGCGAATTCGCG)2. Reasons for this difference in behaviour are discussed in terms of several sequence-dependent structural features of the DNA, with particular reference to differences in propeller twist and minor-groove width. Images PMID:7515488
Automatic seed selection for segmentation of liver cirrhosis in laparoscopic sequences
NASA Astrophysics Data System (ADS)
Sinha, Rahul; Marcinczak, Jan Marek; Grigat, Rolf-Rainer
2014-03-01
For computer aided diagnosis based on laparoscopic sequences, image segmentation is one of the basic steps which define the success of all further processing. However, many image segmentation algorithms require prior knowledge which is given by interaction with the clinician. We propose an automatic seed selection algorithm for segmentation of liver cirrhosis in laparoscopic sequences which assigns each pixel a probability of being cirrhotic liver tissue or background tissue. Our approach is based on a trained classifier using SIFT and RGB features with PCA. Due to the unique illumination conditions in laparoscopic sequences of the liver, a very low dimensional feature space can be used for classification via logistic regression. The methodology is evaluated on 718 cirrhotic liver and background patches that are taken from laparoscopic sequences of 7 patients. Using a linear classifier we achieve a precision of 91% in a leave-one-patient-out cross-validation. Furthermore, we demonstrate that with logistic probability estimates, seeds with high certainty of being cirrhotic liver tissue can be obtained. For example, our precision of liver seeds increases to 98.5% if only seeds with more than 95% probability of being liver are used. Finally, these automatically selected seeds can be used as priors in Graph Cuts which is demonstrated in this paper.
Formal Synthesis of (±)-Aplykurodinone-1 through a Hetero-Pauson-Khand Cycloaddition Approach.
Tao, Cheng; Zhang, Jing; Chen, Xiaoming; Wang, Huifei; Li, Yun; Cheng, Bin; Zhai, Hongbin
2017-03-03
The tricyclic intermediate 2 has been synthesized in eight steps from known compound 6 in 20% overall yield. As such, this constitutes a highly efficient formal synthesis of (±)-aplykurodinone-1. This synthesis features a unique, one-pot, intramolecular hetero-Pauson-Khand reaction (h-PKR)/desilylation sequence to expeditiously construct the tricyclic framework, providing valuable insights for expanding the scope and boundaries of h-PKR.
Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; Carmona e Ferreira, Renata; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana
2013-12-01
We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3'-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment.
Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; e Ferreira, Renata Carmona; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana
2013-01-01
We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3′-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment. PMID:23857904
King, Justin J.; Amemiya, Chris T.; Hsu, Ellen
2017-01-01
ABSTRACT Activation-induced cytidine deaminase (AID) is a genome-mutating enzyme that initiates class switch recombination and somatic hypermutation of antibodies in jawed vertebrates. We previously described the biochemical properties of human AID and found that it is an unusual enzyme in that it exhibits binding affinities for its substrate DNA and catalytic rates several orders of magnitude higher and lower, respectively, than a typical enzyme. Recently, we solved the functional structure of AID and demonstrated that these properties are due to nonspecific DNA binding on its surface, along with a catalytic pocket that predominantly assumes a closed conformation. Here we investigated the biochemical properties of AID from a sea lamprey, nurse shark, tetraodon, and coelacanth: representative species chosen because their lineages diverged at the earliest critical junctures in evolution of adaptive immunity. We found that these earliest-diverged AID orthologs are active cytidine deaminases that exhibit unique substrate specificities and thermosensitivities. Significant amino acid sequence divergence among these AID orthologs is predicted to manifest as notable structural differences. However, despite major differences in sequence specificities, thermosensitivities, and structural features, all orthologs share the unusually high DNA binding affinities and low catalytic rates. This absolute conservation is evidence for biological significance of these unique biochemical properties. PMID:28716949
Mission Management Computer and Sequencing Hardware for RLV-TD HEX-01 Mission
NASA Astrophysics Data System (ADS)
Gupta, Sukrat; Raj, Remya; Mathew, Asha Mary; Koshy, Anna Priya; Paramasivam, R.; Mookiah, T.
2017-12-01
Reusable Launch Vehicle-Technology Demonstrator Hypersonic Experiment (RLV-TD HEX-01) mission posed some unique challenges in the design and development of avionics hardware. This work presents the details of mission critical avionics hardware mainly Mission Management Computer (MMC) and sequencing hardware. The Navigation, Guidance and Control (NGC) chain for RLV-TD is dual redundant with cross-strapped Remote Terminals (RTs) interfaced through MIL-STD-1553B bus. MMC is Bus Controller on the 1553 bus, which does the function of GPS aided navigation, guidance, digital autopilot and sequencing for the RLV-TD launch vehicle in different periodicities (10, 20, 500 ms). Digital autopilot execution in MMC with a periodicity of 10 ms (in ascent phase) is introduced for the first time and successfully demonstrated in the flight. MMC is built around Intel i960 processor and has inbuilt fault tolerance features like ECC for memories. Fault Detection and Isolation schemes are implemented to isolate the failed MMC. The sequencing hardware comprises Stage Processing System (SPS) and Command Execution Module (CEM). SPS is `RT' on the 1553 bus which receives the sequencing and control related commands from MMCs and posts to downstream modules after proper error handling for final execution. SPS is designed as a high reliability system by incorporating various fault tolerance and fault detection features. CEM is a relay based module for sequence command execution.
Kawagoshi, Taiki; Nishida, Chizuko; Ota, Hidetoshi; Kumazawa, Yoshinori; Endo, Hideki; Matsuda, Yoichi
2008-01-01
Crocodilians have several unique karyotypic features, such as small diploid chromosome numbers (30-42) and the absence of dot-shaped microchromosomes. Of the extant crocodilian species, the Siamese crocodile (Crocodylus siamensis) has no more than 2n = 30, comprising mostly bi-armed chromosomes with large centromeric heterochromatin blocks. To investigate the molecular structures of C-heterochromatin and genomic compartmentalization in the karyotype, characterized by the disappearance of tiny microchromosomes and reduced chromosome number, we performed molecular cloning of centromeric repetitive sequences and chromosome mapping of the 18S-28S rDNA and telomeric (TTAGGG)( n ) sequences. The centromeric heterochromatin was composed mainly of two repetitive sequence families whose characteristics were quite different. Two types of GC-rich CSI-HindIII family sequences, the 305 bp CSI-HindIII-S (G+C content, 61.3%) and 424 bp CSI-HindIII-M (63.1%), were localized to the intensely PI-stained centric regions of all chromosomes, except for chromosome 2 with PI-negative heterochromatin. The 94 bp CSI-DraI (G+C content, 48.9%) was tandem-arrayed satellite DNA and localized to chromosome 2 and four pairs of small-sized chromosomes. The chromosomal size-dependent genomic compartmentalization that is supposedly unique to the Archosauromorpha was probably lost in the crocodilian lineage with the disappearance of microchromosomes followed by the homogenization of centromeric repetitive sequences between chromosomes, except for chromosome 2.
Phylogenetic relations of humans and African apes from DNA sequences in the Psi eta-globin region
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miyamoto, M.M.; Slightom, J.L.; Goodman, M.
Sequences from the upstream and downstream flanking DNA regions of the Psi eta-globin locus in Pan troglodytes (common chimpanzee), Gorilla gorilla (gorilla), and Pongo pygmaeus (orangutan, the closest living relative to Homo, Pan, and Gorilla) provided further data for evaluating the phylogenetic relations of humans and African apes. These newly sequenced orthologs (an additional 4.9 kilobase pairs (kbp) for each species) were combined with published Psi eta-gene sequences and then compared to the same orthologous stretch (a continuous 7.1-kbp region) available for humans. Phylogenetic analysis of these nucleotide sequences by the parsimony method indicated (i) that human and chimpanzee aremore » more closely related to each other than either is to gorilla and (ii) that the slowdown in the rate of sequence evolution evident in higher primates is especially pronounced in humans. These results indicate that features unique to African apes (but not to humans) are primitive and that even local molecular clocks should be applied with caution.« less
Reconciling the Structural Attributes of Avian Antibodies*
Conroy, Paul J.; Law, Ruby H. P.; Gilgunn, Sarah; Hearty, Stephen; Caradoc-Davies, Tom T.; Lloyd, Gordon; O'Kennedy, Richard J.; Whisstock, James C.
2014-01-01
Antibodies are high value therapeutic, diagnostic, biotechnological, and research tools. Combinatorial approaches to antibody discovery have facilitated access to unique antibodies by surpassing the diversity limitations of the natural repertoire, exploitation of immune repertoires from multiple species, and tailoring selections to isolate antibodies with desirable biophysical attributes. The V-gene repertoire of the chicken does not utilize highly diverse sequence and structures, which is in stark contrast to the mechanism employed by humans, mice, and primates. Recent exploitation of the avian immune system has generated high quality, high affinity antibodies to a wide range of antigens for a number of therapeutic, diagnostic and biotechnological applications. Furthermore, extensive examination of the amino acid characteristics of the chicken repertoire has provided significant insight into mechanisms employed by the avian immune system. A paucity of avian antibody crystal structures has limited our understanding of the structural consequences of these uniquely chicken features. This paper presents the crystal structure of two chicken single chain fragment variable (scFv) antibodies generated from large libraries by phage display against important human antigen targets, which capture two unique CDRL1 canonical classes in the presence and absence of a non-canonical disulfide constrained CDRH3. These structures cast light on the unique structural features of chicken antibodies and contribute further to our collective understanding of the unique mechanisms of diversity and biochemical attributes that render the chicken repertoire of particular value for antibody generation. PMID:24737329
Greiner, Stephan; Wang, Xi; Rauwolf, Uwe; Silber, Martina V; Mayer, Klaus; Meurer, Jörg; Haberer, Georg; Herrmann, Reinhold G
2008-04-01
The flowering plant genus Oenothera is uniquely suited for studying molecular mechanisms of speciation. It assembles an intriguing combination of genetic features, including permanent translocation heterozygosity, biparental transmission of plastids, and a general interfertility of well-defined species. This allows an exchange of plastids and nuclei between species often resulting in plastome-genome incompatibility. For evaluation of its molecular determinants we present the complete nucleotide sequences of the five basic, genetically distinguishable plastid chromosomes of subsection Oenothera (=Euoenothera) of the genus, which are associated in distinct combinations with six basic genomes. Sizes of the chromosomes range from 163 365 bp (plastome IV) to 165 728 bp (plastome I), display between 96.3% and 98.6% sequence similarity and encode a total of 113 unique genes. Plastome diversification is caused by an abundance of nucleotide substitutions, small insertions, deletions and repetitions. The five plastomes deviate from the general ancestral design of plastid chromosomes of vascular plants by a subsection-specific 56 kb inversion within the large single-copy segment. This inversion disrupted operon structures and predates the divergence of the subsection presumably 1 My ago. Phylogenetic relationships suggest plastomes I-III in one clade, while plastome IV appears to be closest to the common ancestor.
Boscaro, Vittorio; Fokin, Sergei I; Schrallhammer, Martina; Schweikert, Michael; Petroni, Giulio
2013-01-01
The genus Holospora (Rickettsiales) includes highly infectious nuclear symbionts of the ciliate Paramecium with unique morphology and life cycle. To date, nine species have been described, but a molecular characterization is lacking for most of them. In this study, we have characterized a novel Holospora-like bacterium (HLB) living in the macronuclei of a Paramecium jenningsi population. This bacterium was morphologically and ultrastructurally investigated in detail, and its life cycle and infection capabilities were described. We also obtained its 16S rRNA gene sequence and developed a specific probe for fluorescence in situ hybridization experiments. A new taxon, "Candidatus Gortzia infectiva", was established for this HLB according to its unique characteristics and the relatively low DNA sequence similarities shared with other bacteria. The phylogeny of the order Rickettsiales based on 16S rRNA gene sequences has been inferred, adding to the available data the sequence of the novel bacterium and those of two Holospora species (Holospora obtusa and Holospora undulata) characterized for the purpose. Our phylogenetic analysis provided molecular support for the monophyly of HLBs and showed a possible pattern of evolution for some of their features. We suggested to classify inside the family Holosporaceae only HLBs, excluding other more distantly related and phenotypically different Paramecium endosymbionts.
Greiner, Stephan; Wang, Xi; Rauwolf, Uwe; Silber, Martina V.; Mayer, Klaus; Meurer, Jörg; Haberer, Georg; Herrmann, Reinhold G.
2008-01-01
The flowering plant genus Oenothera is uniquely suited for studying molecular mechanisms of speciation. It assembles an intriguing combination of genetic features, including permanent translocation heterozygosity, biparental transmission of plastids, and a general interfertility of well-defined species. This allows an exchange of plastids and nuclei between species often resulting in plastome–genome incompatibility. For evaluation of its molecular determinants we present the complete nucleotide sequences of the five basic, genetically distinguishable plastid chromosomes of subsection Oenothera (=Euoenothera) of the genus, which are associated in distinct combinations with six basic genomes. Sizes of the chromosomes range from 163 365 bp (plastome IV) to 165 728 bp (plastome I), display between 96.3% and 98.6% sequence similarity and encode a total of 113 unique genes. Plastome diversification is caused by an abundance of nucleotide substitutions, small insertions, deletions and repetitions. The five plastomes deviate from the general ancestral design of plastid chromosomes of vascular plants by a subsection-specific 56 kb inversion within the large single-copy segment. This inversion disrupted operon structures and predates the divergence of the subsection presumably 1 My ago. Phylogenetic relationships suggest plastomes I–III in one clade, while plastome IV appears to be closest to the common ancestor. PMID:18299283
NASA Astrophysics Data System (ADS)
Richa, Tambi; Ide, Soichiro; Suzuki, Ryosuke; Ebina, Teppei; Kuroda, Yutaka
2017-02-01
Efficient and rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Fast H-DROP, analogously to H-DROP, uses optimum features selected from a set of 3000 ones by combining a random forest and a stepwise feature selection protocol. We reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP on an 8 Xeon processor Linux server by using SWISS-PROT instead of Genbank non-redundant (nr) database for generating the PSSMs. The sensitivity and precision of Fast H-DROP assessed by cross-validation were 33.7 and 36.2%, which were merely 2% lower than that of H-DROP. The reduced computational time of Fast H-DROP, without affecting prediction performances, makes it more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.
MACSIMS : multiple alignment of complete sequences information management system
Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier
2006-01-01
Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820
Vander Lugt correlation of DNA sequence data
NASA Astrophysics Data System (ADS)
Christens-Barry, William A.; Hawk, James F.; Martin, James C.
1990-12-01
DNA, the molecule containing the genetic code of an organism, is a linear chain of subunits. It is the sequence of subunits, of which there are four kinds, that constitutes the unique blueprint of an individual. This sequence is the focus of a large number of analyses performed by an army of geneticists, biologists, and computer scientists. Most of these analyses entail searches for specific subsequences within the larger set of sequence data. Thus, most analyses are essentially pattern recognition or correlation tasks. Yet, there are special features to such analysis that influence the strategy and methods of an optical pattern recognition approach. While the serial processing employed in digital electronic computers remains the main engine of sequence analyses, there is no fundamental reason that more efficient parallel methods cannot be used. We describe an approach using optical pattern recognition (OPR) techniques based on matched spatial filtering. This allows parallel comparison of large blocks of sequence data. In this study we have simulated a Vander Lugt1 architecture implementing our approach. Searches for specific target sequence strings within a block of DNA sequence from the Co/El plasmid2 are performed.
Microfluidic droplet enrichment for targeted sequencing
Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.
2015-01-01
Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629
Mizianty, Marcin J; Kurgan, Lukasz
2009-12-13
Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozens of modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for membrane proteins class, which is not considered by majority of existing methods, in spite that this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
2009-01-01
Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozens of modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for membrane proteins class, which is not considered by majority of existing methods, in spite that this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388
Das, Debanu; Hervé, Mireille; Feuerhelm, Julie; Farr, Carol L.; Chiu, Hsiu-Ju; Elsliger, Marc-André; Knuth, Mark W.; Klock, Heath E.; Miller, Mitchell D.; Godzik, Adam; Lesley, Scott A.; Deacon, Ashley M.; Mengin-Lecreulx, Dominique; Wilson, Ian A.
2011-01-01
Bacterial cell walls contain peptidoglycan, an essential polymer made by enzymes in the Mur pathway. These proteins are specific to bacteria, which make them targets for drug discovery. MurC, MurD, MurE and MurF catalyze the synthesis of the peptidoglycan precursor UDP-N-acetylmuramoyl-L-alanyl-γ-D-glutamyl-meso-diaminopimelyl-D-alanyl-D-alanine by the sequential addition of amino acids onto UDP-N-acetylmuramic acid (UDP-MurNAc). MurC-F enzymes have been extensively studied by biochemistry and X-ray crystallography. In Gram-negative bacteria, ∼30–60% of the bacterial cell wall is recycled during each generation. Part of this recycling process involves the murein peptide ligase (Mpl), which attaches the breakdown product, the tripeptide L-alanyl-γ-D-glutamyl-meso-diaminopimelate, to UDP-MurNAc. We present the crystal structure at 1.65 Å resolution of a full-length Mpl from the permafrost bacterium Psychrobacter arcticus 273-4 (PaMpl). Although the Mpl structure has similarities to Mur enzymes, it has unique sequence and structure features that are likely related to its role in cell wall recycling, a function that differentiates it from the MurC-F enzymes. We have analyzed the sequence-structure relationships that are unique to Mpl proteins and compared them to MurC-F ligases. We have also characterized the biochemical properties of this enzyme (optimal temperature, pH and magnesium binding profiles and kinetic parameters). Although the structure does not contain any bound substrates, we have identified ∼30 residues that are likely to be important for recognition of the tripeptide and UDP-MurNAc substrates, as well as features that are unique to Psychrobacter Mpl proteins. These results provide the basis for future mutational studies for more extensive function characterization of the Mpl sequence-structure relationships. PMID:21445265
Das, Debanu; Hervé, Mireille; Feuerhelm, Julie; Farr, Carol L; Chiu, Hsiu-Ju; Elsliger, Marc-André; Knuth, Mark W; Klock, Heath E; Miller, Mitchell D; Godzik, Adam; Lesley, Scott A; Deacon, Ashley M; Mengin-Lecreulx, Dominique; Wilson, Ian A
2011-03-18
Bacterial cell walls contain peptidoglycan, an essential polymer made by enzymes in the Mur pathway. These proteins are specific to bacteria, which make them targets for drug discovery. MurC, MurD, MurE and MurF catalyze the synthesis of the peptidoglycan precursor UDP-N-acetylmuramoyl-L-alanyl-γ-D-glutamyl-meso-diaminopimelyl-D-alanyl-D-alanine by the sequential addition of amino acids onto UDP-N-acetylmuramic acid (UDP-MurNAc). MurC-F enzymes have been extensively studied by biochemistry and X-ray crystallography. In gram-negative bacteria, ∼30-60% of the bacterial cell wall is recycled during each generation. Part of this recycling process involves the murein peptide ligase (Mpl), which attaches the breakdown product, the tripeptide L-alanyl-γ-D-glutamyl-meso-diaminopimelate, to UDP-MurNAc. We present the crystal structure at 1.65 Å resolution of a full-length Mpl from the permafrost bacterium Psychrobacter arcticus 273-4 (PaMpl). Although the Mpl structure has similarities to Mur enzymes, it has unique sequence and structure features that are likely related to its role in cell wall recycling, a function that differentiates it from the MurC-F enzymes. We have analyzed the sequence-structure relationships that are unique to Mpl proteins and compared them to MurC-F ligases. We have also characterized the biochemical properties of this enzyme (optimal temperature, pH and magnesium binding profiles and kinetic parameters). Although the structure does not contain any bound substrates, we have identified ∼30 residues that are likely to be important for recognition of the tripeptide and UDP-MurNAc substrates, as well as features that are unique to Psychrobacter Mpl proteins. These results provide the basis for future mutational studies for more extensive function characterization of the Mpl sequence-structure relationships.
Wagner, Josiah T; Herrejon Chavez, Florisela; Podrabsky, Jason E
2016-01-01
The annual killifish Austrofundulus limnaeus inhabits ephemeral ponds in regions of Venezuela, South America. Permanent populations of A. limnaeus are maintained by production of stress-tolerant embryos that are able to persist in the desiccated sediment. Previous work has demonstrated that A. limnaeus have a remarkable ability to tolerate extended periods of anoxia and desiccating conditions. After considering temperature, A. limnaeus embryos have the highest known tolerance to anoxia when compared to any other vertebrate yet studied. Oxygen is completely essential for the process of oxidative phosphorylation by mitochondria, the intracellular organelle responsible for the majority of adenosine triphosphate production. Thus, understanding the unique properties of A. limnaeus mitochondria is of great interest. In this work, we describe the first complete mitochondrial genome (mtgenome) sequence of a single adult A. limnaeus individual and compare both coding and non-coding regions to several other closely related fish mtgenomes. Mitochondrial features were predicted using MitoAnnotator and polyadenylation sites were predicted using RNAseq mapping. To estimate the responsiveness of A. limnaeus mitochondria to anoxia treatment, we measure relative mitochondrial DNA copy number and total citrate synthase activity in both relatively anoxia-tolerant and anoxia-sensitive embryonic stages. Our cross-species comparative approach identifies unique features of ND1, ND5, ND6, and ATPase-6 that may facilitate the unique phenotype of A. limnaeus embryos. Additionally, we do not find evidence for mitochondrial degradation or biogenesis during anoxia/reoxygenation treatment in A. limnaeus embryos, suggesting that anoxia-tolerant mitochondria do not respond to anoxia in a manner similar to anoxia-sensitive mitochondria.
Clustering method for counting passengers getting in a bus with single camera
NASA Astrophysics Data System (ADS)
Yang, Tao; Zhang, Yanning; Shao, Dapei; Li, Ying
2010-03-01
Automatic counting of passengers is very important for both business and security applications. We present a single-camera-based vision system that is able to count passengers in a highly crowded situation at the entrance of a traffic bus. The unique characteristics of the proposed system include, First, a novel feature-point-tracking- and online clustering-based passenger counting framework, which performs much better than those of background-modeling-and foreground-blob-tracking-based methods. Second, a simple and highly accurate clustering algorithm is developed that projects the high-dimensional feature point trajectories into a 2-D feature space by their appearance and disappearance times and counts the number of people through online clustering. Finally, all test video sequences in the experiment are captured from a real traffic bus in Shanghai, China. The results show that the system can process two 320×240 video sequences at a frame rate of 25 fps simultaneously, and can count passengers reliably in various difficult scenarios with complex interaction and occlusion among people. The method achieves high accuracy rates up to 96.5%.
The genome of Mesobuthus martensii reveals a unique adaptation model of arthropods
Cao, Zhijian; Yu, Yao; Wu, Yingliang; Hao, Pei; Di, Zhiyong; He, Yawen; Chen, Zongyun; Yang, Weishan; Shen, Zhiyong; He, Xiaohua; Sheng, Jia; Xu, Xiaobo; Pan, Bohu; Feng, Jing; Yang, Xiaojuan; Hong, Wei; Zhao, Wenjuan; Li, Zhongjie; Huang, Kai; Li, Tian; Kong, Yimeng; Liu, Hui; Jiang, Dahe; Zhang, Binyan; Hu, Jun; Hu, Youtian; Wang, Bin; Dai, Jianliang; Yuan, Bifeng; Feng, Yuqi; Huang, Wei; Xing, Xiaojing; Zhao, Guoping; Li, Xuan; Li, Yixue; Li, Wenxin
2013-01-01
Representing a basal branch of arachnids, scorpions are known as ‘living fossils’ that maintain an ancient anatomy and are adapted to have survived extreme climate changes. Here we report the genome sequence of Mesobuthus martensii, containing 32,016 protein-coding genes, the most among sequenced arthropods. Although M. martensii appears to evolve conservatively, it has a greater gene family turnover than the insects that have undergone diverse morphological and physiological changes, suggesting the decoupling of the molecular and morphological evolution in scorpions. Underlying the long-term adaptation of scorpions is the expansion of the gene families enriched in basic metabolic pathways, signalling pathways, neurotoxins and cytochrome P450, and the different dynamics of expansion between the shared and the scorpion lineage-specific gene families. Genomic and transcriptomic analyses further illustrate the important genetic features associated with prey, nocturnal behaviour, feeding and detoxification. The M. martensii genome reveals a unique adaptation model of arthropods, offering new insights into the genetic bases of the living fossils. PMID:24129506
Genetic Characterization of Feline Leukemia Virus from Florida Panthers
Brown, Meredith A.; Cunningham, Mark W.; Roca, Alfred L.; Troyer, Jennifer L.; Johnson, Warren E.
2008-01-01
From 2002 through 2005, an outbreak of feline leukemia virus (FeLV) occurred in Florida panthers (Puma concolor coryi). Clinical signs included lymphadenopathy, anemia, septicemia, and weight loss; 5 panthers died. Not associated with FeLV outcome were the genetic heritage of the panthers (pure Florida vs. Texas/Florida crosses) and co-infection with feline immunodeficiency virus. Genetic analysis of panther FeLV, designated FeLV-Pco, determined that the outbreak likely came from 1 cross-species transmission from a domestic cat. The FeLV-Pco virus was closely related to the domestic cat exogenous FeLV-A subgroup in lacking recombinant segments derived from endogenous FeLV. FeLV-Pco sequences were most similar to the well-characterized FeLV-945 strain, which is highly virulent and strongly pathogenic in domestic cats because of unique long terminal repeat and envelope sequences. These unique features may also account for the severity of the outbreak after cross-species transmission to the panther. PMID:18258118
Genetic characterization of feline leukemia virus from Florida panthers.
Brown, Meredith A; Cunningham, Mark W; Roca, Alfred L; Troyer, Jennifer L; Johnson, Warren E; O'Brien, Stephen J
2008-02-01
From 2002 through 2005, an outbreak of feline leukemia virus (FeLV) occurred in Florida panthers (Puma concolor coryi). Clinical signs included lymphadenopathy, anemia, septicemia, and weight loss; 5 panthers died. Not associated with FeLV outcome were the genetic heritage of the panthers (pure Florida vs. Texas/Florida crosses) and co-infection with feline immunodeficiency virus. Genetic analysis of panther FeLV, designated FeLV-Pco, determined that the outbreak likely came from 1 cross-species transmission from a domestic cat. The FeLV-Pco virus was closely related to the domestic cat exogenous FeLV-A subgroup in lacking recombinant segments derived from endogenous FeLV. FeLV-Pco sequences were most similar to the well-characterized FeLV-945 strain, which is highly virulent and strongly pathogenic in domestic cats because of unique long terminal repeat and envelope sequences. These unique features may also account for the severity of the outbreak after cross-species transmission to the panther.
Fritz-Laylin, Lillian K.; Ginger, Michael L.; Walsh, Charles; Dawson, Scott C.; Fulton, Chandler
2016-01-01
Naegleria gruberi, a free-living protist, has long been treasured as a model for basal body and flagellar assembly due to its ability to differentiate from crawling amoebae into swimming flagellates. The full genome sequence of Naegleria gruberi has recently been used to estimate gene families ancestral to all eukaryotes and to identify novel aspects of Naegleria biology, including likely facultative anaerobic metabolism, extensive signaling cascades, and evidence for sexuality. Distinctive features of the Naegleria genome and nuclear biology provide unique perspectives for comparative cell biology, including cell division, RNA processing and nucleolar assembly. We highlight here exciting new and novel aspects of Naegleria biology identified through genomic analysis. PMID:21392573
Chætognath transcriptome reveals ancestral and unique features among bilaterians
Marlétaz, Ferdinand; Gilles, André; Caubit, Xavier; Perez, Yvan; Dossat, Carole; Samain, Sylvie; Gyapay, Gabor; Wincker, Patrick; Le Parco, Yannick
2008-01-01
Background The chætognaths (arrow worms) have puzzled zoologists for years because of their astonishing morphological and developmental characteristics. Despite their deuterostome-like development, phylogenomic studies recently positioned the chætognath phylum in protostomes, most likely in an early branching. This key phylogenetic position and the peculiar characteristics of chætognaths prompted further investigation of their genomic features. Results Transcriptomic and genomic data were collected from the chætognath Spadella cephaloptera through the sequencing of expressed sequence tags and genomic bacterial artificial chromosome clones. Transcript comparisons at various taxonomic scales emphasized the conservation of a core gene set and phylogenomic analysis confirmed the basal position of chætognaths among protostomes. A detailed survey of transcript diversity and individual genotyping revealed a past genome duplication event in the chætognath lineage, which was, surprisingly, followed by a high retention rate of duplicated genes. Moreover, striking genetic heterogeneity was detected within the sampled population at the nuclear and mitochondrial levels but cannot be explained by cryptic speciation. Finally, we found evidence for trans-splicing maturation of transcripts through splice-leader addition in the chætognath phylum and we further report that this processing is associated with operonic transcription. Conclusion These findings reveal both shared ancestral and unique derived characteristics of the chætognath genome, which suggests that this genome is likely the product of a very original evolutionary history. These features promote chætognaths as a pivotal model for comparative genomics, which could provide new clues for the investigation of the evolution of animal genomes. PMID:18533022
Structural features of the rice chromosome 4 centromere.
Zhang, Yu; Huang, Yuchen; Zhang, Lei; Li, Ying; Lu, Tingting; Lu, Yiqi; Feng, Qi; Zhao, Qiang; Cheng, Zhukuan; Xue, Yongbiao; Wing, Rod A; Han, Bin
2004-01-01
A complete sequence of a chromosome centromere is necessary for fully understanding centromere function. We reported the sequence structures of the first complete rice chromosome centromere through sequencing a large insert bacterial artificial chromosome clone-based contig, which covered the rice chromosome 4 centromere. Complete sequencing of the 124-kb rice chromosome 4 centromere revealed that it consisted of 18 tracts of 379 tandemly arrayed repeats known as CentO and a total of 19 centromeric retroelements (CRs) but no unique sequences were detected. Four tracts, composed of 65 CentO repeats, were located in the opposite orientation, and 18 CentO tracts were flanked by 19 retroelements. The CRs were classified into four types, and the type I retroelements appeared to be more specific to rice centromeres. The preferential insert of the CRs among CentO repeats indicated that the centromere-specific retroelements may contribute to centromere expansion during evolution. The presence of three intact retrotransposons in the centromere suggests that they may be responsible for functional centromere initiation through a transcription-mediated mechanism.
Integrated databanks access and sequence/structure analysis services at the PBIL.
Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert
2003-07-01
The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools of nucleic acid and protein sequence analyses. This server allows to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It allows the possibility to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. An originality of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.
Bentolila, Stéphane; Stefanov, Stefan
2012-01-01
Plant mitochondrial genomes have features that distinguish them radically from their animal counterparts: a high rate of rearrangement, of uptake and loss of DNA sequences, and an extremely low point mutation rate. Perhaps the most unique structural feature of plant mitochondrial DNAs is the presence of large repeated sequences involved in intramolecular and intermolecular recombination. In addition, rare recombination events can occur across shorter repeats, creating rearrangements that result in aberrant phenotypes, including pollen abortion, which is known as cytoplasmic male sterility (CMS). Using next-generation sequencing, we pyrosequenced two rice (Oryza sativa) mitochondrial genomes that belong to the indica subspecies. One genome is normal, while the other carries the wild abortive-CMS. We find that numerous rearrangements in the rice mitochondrial genome occur even between close cytotypes during rice evolution. Unlike maize (Zea mays), a closely related species also belonging to the grass family, integration of plastid sequences did not play a role in the sequence divergence between rice cytotypes. This study also uncovered an excellent candidate for the wild abortive-CMS-encoding gene; like most of the CMS-associated open reading frames that are known in other species, this candidate was created via a rearrangement, is chimeric in structure, possesses predicted transmembrane domains, and coopted the promoter of a genuine mitochondrial gene. Our data give new insights into rice mitochondrial evolution, correcting previous reports. PMID:22128137
Pérez-Dorado, Inmaculada; Bortolotti, Ana; Cortez, Néstor; Hermoso, Juan A
2013-01-09
Analysis of the crystal structure of NifF from Rhodobacter capsulatus and its homologues reported so far reflects the existence of unique structural features in nif flavodoxins: a leucine at the re face of the isoalloxazine, an eight-residue insertion at the C-terminus of the 50's loop and a remarkable difference in the electrostatic potential surface with respect to non-nif flavodoxins. A phylogenetic study on 64 sequences from 52 bacterial species revealed four clusters, including different functional prototypes, correlating the previously defined as "short-chain" with the firmicutes flavodoxins and the "long-chain" with gram-negative species. The comparison of Rhodobacter NifF structure with other bacterial flavodoxin prototypes discloses the concurrence of specific features of these functional electron donors to nitrogenase.
Siponen, Marina I.; Wisniewska, Magdalena; Lehtiö, Lari; Johansson, Ida; Svensson, Linda; Raszewski, Grzegorz; Nilsson, Lennart; Sigvardsson, Mikael; Berglund, Helena
2010-01-01
The early B-cell factor (EBF) transcription factors are central regulators of development in several organs and tissues. This protein family shows low sequence similarity to other protein families, which is why structural information for the functional domains of these proteins is crucial to understand their biochemical features. We have used a modular approach to determine the crystal structures of the structured domains in the EBF family. The DNA binding domain reveals a striking resemblance to the DNA binding domains of the Rel homology superfamily of transcription factors but contains a unique zinc binding structure, termed zinc knuckle. Further the EBF proteins contain an IPT/TIG domain and an atypical helix-loop-helix domain with a novel type of dimerization motif. The data presented here provide insights into unique structural features of the EBF proteins and open possibilities for detailed molecular investigations of this important transcription factor family. PMID:20592035
A unique TBX5 microdeletion with microinsertion detected in patient with Holt-Oram syndrome.
Morine, Mikio; Kohmoto, Tomohiro; Masuda, Kiyoshi; Inagaki, Hidehito; Watanabe, Miki; Naruto, Takuya; Kurahashi, Hiroki; Maeda, Kazuhisa; Imoto, Issei
2015-12-01
Holt-Oram syndrome (HOS) is an autosomal dominant condition characterized by upper limb and congenital heart defects and caused by numerous germline mutations of TBX5 producing preterminal stop codons. Here, we report on a novel and unusual heterozygous TBX5 microdeletion with microinsertion (microindel) mutation (c.627delinsGTGACTCAGGAAACGCTTTCCTGA), which is predicted to synthesize a truncated TBX5 protein, detected in a sporadic patient with clinical features of HOS prenatally diagnosed by ultrasonography. This uncommon and relatively large inserted sequence contains sequences derived from nearby but not adjacent templates on both sense and antisense strands, suggesting two possible models, which require no repeat sequences, causing this complex microindel through the bypass of large DNA adducts via an error-prone DNA polymerase-mediated translesion synthesis. © 2015 Wiley Periodicals, Inc.
Evolution of the arginase fold and functional diversity
Dowling, Daniel P.; Costanzo, Luigi Di; Gennadios, Heather A.; Christianson, David W.
2009-01-01
The large number of protein structures deposited in the Protein Data Bank allows for the identification of novel structural superfamilies based on conservation of fold in addition to conservation of amino acid sequence. Since sequence diverges more rapidly than fold in protein evolution, proteins with little or no significant sequence identity are occasionally observed to adopt similar folds, thereby reflecting unanticipated evolutionary relationships. Here, we review the unique α/β fold first observed in the manganese metalloenzyme rat liver arginase, consisting of a parallel 8 stranded β-sheet surrounded by several helices, and its evolutionary relationship with the zinc-requiring and/or iron-requiring histone deacetylases and acetylpolyamine amidohydrolases. Structural comparisons reveal key features of the core α/β fold that contribute to the divergent metal ion specificity and stoichiometry required for the chemical and biological functions of these enzymes. PMID:18360740
Ishiguro, Naotaka; Inoshima, Yasuo; Yanai, Tokuma; Sasaki, Motoki; Matsui, Akira; Kikuchi, Hiroki; Maruyama, Masashi; Hongo, Hitomi; Vostretsov, Yuri E; Gasilin, Viatcheslav; Kosintsev, Pavel A; Quanjia, Chen; Chunxue, Wang
2016-02-01
The mitochondrial DNA (mtDNA) control region (198- to 598-bp) of four ancient Canis specimens (two Canis mandibles, a cranium, and a first phalanx) was examined, and each specimen was genetically identified as Japanese wolf. Two unique nucleotide substitutions, the 78-C insertion and the 482-G deletion, both of which are specific for Japanese wolf, were observed in each sample. Based on the mtDNA sequences analyzed, these four specimens and 10 additional Japanese wolf samples could be classified into two groups- Group A (10 samples) and Group B (4 samples)-which contain or lack an 8-bp insertion/deletion (indel), respectively. Interestingly, three dogs (Akita-b, Kishu 25, and S-husky 102) that each contained Japanese wolf-specific features were also classified into Group A or B based on the 8-bp indel. To determine the origin or ancestor of the Japanese wolf, mtDNA control regions of ancient continental Canis specimens were examined; 84 specimens were from Russia, and 29 were from China. However, none of these 113 specimens contained Japanese wolf-specific sequences. Moreover, none of 426 Japanese modern hunting dogs examined contained these Japanese wolf-specific mtDNA sequences. The mtDNA control region sequences of Groups A and B appeared to be unique to grey wolf and dog populations.
A Sieving ANN for Emotion-Based Movie Clip Classification
NASA Astrophysics Data System (ADS)
Watanapa, Saowaluk C.; Thipakorn, Bundit; Charoenkitkarn, Nipon
Effective classification and analysis of semantic contents are very important for the content-based indexing and retrieval of video database. Our research attempts to classify movie clips into three groups of commonly elicited emotions, namely excitement, joy and sadness, based on a set of abstract-level semantic features extracted from the film sequence. In particular, these features consist of six visual and audio measures grounded on the artistic film theories. A unique sieving-structured neural network is proposed to be the classifying model due to its robustness. The performance of the proposed model is tested with 101 movie clips excerpted from 24 award-winning and well-known Hollywood feature films. The experimental result of 97.8% correct classification rate, measured against the collected human-judges, indicates the great potential of using abstract-level semantic features as an engineered tool for the application of video-content retrieval/indexing.
Conserved and variable domains of RNase MRP RNA.
Dávila López, Marcela; Rosenblad, Magnus Alm; Samuelsson, Tore
2009-01-01
Ribonuclease MRP is a eukaryotic ribonucleoprotein complex consisting of one RNA molecule and 7-10 protein subunits. One important function of MRP is to catalyze an endonucleolytic cleavage during processing of rRNA precursors. RNase MRP is evolutionary related to RNase P which is critical for tRNA processing. A large number of MRP RNA sequences that now are available have been used to identify conserved primary and secondary structure features of the molecule. MRP RNA has structural features in common with P RNA such as a conserved catalytic core, but it also has unique features and is characterized by a domain highly variable between species. Information regarding primary and secondary structure features is of interest not only in basic studies of the function of MRP RNA, but also because mutations in the RNA give rise to human genetic diseases such as cartilage-hair hypoplasia.
enoLOGOS: a versatile web tool for energy normalized sequence logos
Workman, Christopher T.; Yin, Yutong; Corcoran, David L.; Ideker, Trey; Stormo, Gary D.; Benos, Panayiotis V.
2005-01-01
enoLOGOS is a web-based tool that generates sequence logos from various input sources. Sequence logos have become a popular way to graphically represent DNA and amino acid sequence patterns from a set of aligned sequences. Each position of the alignment is represented by a column of stacked symbols with its total height reflecting the information content in this position. Currently, the available web servers are able to create logo images from a set of aligned sequences, but none of them generates weighted sequence logos directly from energy measurements or other sources. With the advent of high-throughput technologies for estimating the contact energy of different DNA sequences, tools that can create logos directly from binding affinity data are useful to researchers. enoLOGOS generates sequence logos from a variety of input data, including energy measurements, probability matrices, alignment matrices, count matrices and aligned sequences. Furthermore, enoLOGOS can represent the mutual information of different positions of the consensus sequence, a unique feature of this tool. Another web interface for our software, C2H2-enoLOGOS, generates logos for the DNA-binding preferences of the C2H2 zinc-finger transcription factor family members. enoLOGOS and C2H2-enoLOGOS are accessible over the web at . PMID:15980495
NASA Technical Reports Server (NTRS)
Everroad, R. Craig; Stuart, Rhona K.; Bebout, Brad M.; Detweiler, Angela M.; Lee, Jackson Zan; Woebken, Dagmar; Bebout, Leslie E.; Pett-Ridge, Jennifer
2016-01-01
The nonheterocystous filamentous cyanobacterium, strain ESFC-1, is a recently described member of the order Oscillatoriales within the Cyanobacteria. ESFC-1 has been shown to be a major diazotroph in the intertidal microbial mat system at Elkhorn Slough, CA, USA. Based on phylogenetic analyses of the 16S RNA gene, ESFC-1 appears to belong to a unique, genus-level divergence; the draft genome sequence of this strain has now been determined. Here we report features of this genome as they relate to the ecological functions and capabilities of strain ESFC-1. The 5,632,035 bp genome sequence encodes 4914 protein-coding genes and 92 RNA genes. One striking feature of this cyanobacterium is the apparent lack of either uptake or bi-directional hydrogenases typically expected within a diazotroph. Additionally, a large genomic island is found that contains numerous low GC-content genes and genes related to extracellular polysaccharide production and cell wall synthesis and maintenance.
Everroad, R. Craig; Stuart, Rhona K.; Bebout, Brad M.; ...
2016-08-24
The nonheterocystous filamentous cyanobacterium, strain ESFC-1, is a recently described member of the order Oscillatoriales within the Cyanobacteria. ESFC-1 has been shown to be a major diazotroph in the intertidal microbial mat system at Elkhorn Slough, CA, USA. Based on phylogenetic analyses of the 16S RNA gene, ESFC-1 appears to belong to a unique, genus-level divergence; the draft genome sequence of this strain has now been determined. Here we report features of this genome as they relate to the ecological functions and capabilities of strain ESFC-1. The 5,632,035 bp genome sequence encodes 4914 protein-coding genes and 92 RNA genes. Onemore » striking feature of this cyanobacterium is the apparent lack of either uptake or bi-directional hydrogenases typically expected within a diazotroph. In addition, a large genomic island is found that contains numerous low GC-content genes and genes related to extracellular polysaccharide production and cell wall synthesis and maintenance.« less
Issotta, Francisco; Galleguillos, Pedro A; Moya-Beltrán, Ana; Davis-Belmar, Carol S; Rautenbach, George; Covarrubias, Paulo C; Acosta, Mauricio; Ossandon, Francisco J; Contador, Yasna; Holmes, David S; Marín-Eliantonio, Sabrina; Quatrini, Raquel; Demergasso, Cecilia
2016-01-01
Leptospirillum ferriphilum Sp-Cl is a Gram negative, thermotolerant, curved, rod-shaped bacterium, isolated from an industrial bioleaching operation in northern Chile, where chalcocite is the major copper mineral and copper hydroxychloride atacamite is present in variable proportions in the ore. This strain has unique features as compared to the other members of the species, namely resistance to elevated concentrations of chloride, sulfate and metals. Basic microbiological features and genomic properties of this biotechnologically relevant strain are described in this work. The 2,475,669 bp draft genome is arranged into 74 scaffolds of 74 contigs. A total of 48 RNA genes and 2,834 protein coding genes were predicted from its annotation; 55 % of these were assigned a putative function. Release of the genome sequence of this strain will provide further understanding of the mechanisms used by acidophilic bacteria to endure high osmotic stress and high chloride levels and of the role of chloride-tolerant iron-oxidizers in industrial bioleaching operations.
Zhalnina, Kateryna V.; Dias, Raquel; Leonard, Michael T.; Dorr de Quadros, Patricia; Camargo, Flavio A. O.; Drew, Jennifer C.; Farmerie, William G.; Daroub, Samira H.; Triplett, Eric W.
2014-01-01
The activity of ammonia-oxidizing archaea (AOA) leads to the loss of nitrogen from soil, pollution of water sources and elevated emissions of greenhouse gas. To date, eight AOA genomes are available in the public databases, seven are from the group I.1a of the Thaumarchaeota and only one is from the group I.1b, isolated from hot springs. Many soils are dominated by AOA from the group I.1b, but the genomes of soil representatives of this group have not been sequenced and functionally characterized. The lack of knowledge of metabolic pathways of soil AOA presents a critical gap in understanding their role in biogeochemical cycles. Here, we describe the first complete genome of soil archaeon Candidatus Nitrososphaera evergladensis, which has been reconstructed from metagenomic sequencing of a highly enriched culture obtained from an agricultural soil. The AOA enrichment was sequenced with the high throughput next generation sequencing platforms from Pacific Biosciences and Ion Torrent. The de novo assembly of sequences resulted in one 2.95 Mb contig. Annotation of the reconstructed genome revealed many similarities of the basic metabolism with the rest of sequenced AOA. Ca. N. evergladensis belongs to the group I.1b and shares only 40% of whole-genome homology with the closest sequenced relative Ca. N. gargensis. Detailed analysis of the genome revealed coding sequences that were completely absent from the group I.1a. These unique sequences code for proteins involved in control of DNA integrity, transporters, two-component systems and versatile CRISPR defense system. Notably, genomes from the group I.1b have more gene duplications compared to the genomes from the group I.1a. We suggest that the presence of these unique genes and gene duplications may be associated with the environmental versatility of this group. PMID:24999826
Willams, A Mark; Hodges, Nicola J; North, Jamie S; Barton, Gabor
2006-01-01
The perceptual-cognitive information used to support pattern-recognition skill in soccer was examined. In experiment 1, skilled players were quicker and more accurate than less-skilled players at recognising familiar and unfamiliar soccer action sequences presented on film. In experiment 2, these action sequences were converted into point-light displays, with superficial display features removed and the positions of players and the relational information between them made more salient. Skilled players were more accurate than less-skilled players in recognising sequences presented in point-light form, implying that each pattern of play can be defined by the unique relations between players. In experiment 3, various offensive and defensive players were occluded for the duration of each trial in an attempt to identify the most important sources of information underpinning successful performance. A decrease in response accuracy was observed under occluded compared with non-occluded conditions and the expertise effect was no longer observed. The relational information between certain key players, team-mates and their defensive counterparts may provide the essential information for effective pattern-recognition skill in soccer. Structural feature analysis, temporal phase relations, and knowledge-based information are effectively integrated to facilitate pattern recognition in dynamic sport tasks.
Miller, Eric S.; Heidelberg, John F.; Eisen, Jonathan A.; Nelson, William C.; Durkin, A. Scott; Ciecko, Ann; Feldblyum, Tamara V.; White, Owen; Paulsen, Ian T.; Nierman, William C.; Lee, Jong; Szczypinski, Bridget; Fraser, Claire M.
2003-01-01
The complete genome sequence of the T4-like, broad-host-range vibriophage KVP40 has been determined. The genome sequence is 244,835 bp, with an overall G+C content of 42.6%. It encodes 386 putative protein-encoding open reading frames (CDSs), 30 tRNAs, 33 T4-like late promoters, and 57 potential rho-independent terminators. Overall, 92.1% of the KVP40 genome is coding, with an average CDS size of 587 bp. While 65% of the CDSs were unique to KVP40 and had no known function, the genome sequence and organization show specific regions of extensive conservation with phage T4. At least 99 KVP40 CDSs have homologs in the T4 genome (Blast alignments of 45 to 68% amino acid similarity). The shared CDSs represent 36% of all T4 CDSs but only 26% of those from KVP40. There is extensive representation of the DNA replication, recombination, and repair enzymes as well as the viral capsid and tail structural genes. KVP40 lacks several T4 enzymes involved in host DNA degradation, appears not to synthesize the modified cytosine (hydroxymethyl glucose) present in T-even phages, and lacks group I introns. KVP40 likely utilizes the T4-type sigma-55 late transcription apparatus, but features of early- or middle-mode transcription were not identified. There are 26 CDSs that have no viral homolog, and many did not necessarily originate from Vibrio spp., suggesting an even broader host range for KVP40. From these latter CDSs, an NAD salvage pathway was inferred that appears to be unique among bacteriophages. Features of the KVP40 genome that distinguish it from T4 are presented, as well as those, such as the replication and virion gene clusters, that are substantially conserved. PMID:12923095
Kanhayuwa, Lakkhana; Coutts, Robert H. A.
2016-01-01
Novel families of short interspersed nuclear element (SINE) sequences in the human pathogenic fungus Aspergillus fumigatus, clinical isolate Af293, were identified and categorised into tRNA-related and 5S rRNA-related SINEs. Eight predicted tRNA-related SINE families originating from different tRNAs, and nominated as AfuSINE2 sequences, contained target site duplications of short direct repeat sequences (4–14 bp) flanking the elements, an extended tRNA-unrelated region and typical features of RNA polymerase III promoter sequences. The elements ranged in size from 140–493 bp and were present in low copy number in the genome and five out of eight were actively transcribed. One putative tRNAArg-derived sequence, AfuSINE2-1a possessed a unique feature of repeated trinucleotide ACT residues at its 3’-terminus. This element was similar in sequence to the I-4_AO element found in A. oryzae and an I-1_AF long nuclear interspersed element-like sequence identified in A. fumigatus Af293. Families of 5S rRNA-related SINE sequences, nominated as AfuSINE3, were also identified and their 5'-5S rRNA-related regions show 50–65% and 60–75% similarity to respectively A. fumigatus 5S rRNAs and SINE3-1_AO found in A. oryzae. A. fumigatus Af293 contains five copies of AfuSINE3 sequences ranging in size from 259–343 bp and two out of five AfuSINE3 sequences were actively transcribed. Investigations on AfuSINE distribution in the fungal genome revealed that the elements are enriched in pericentromeric and subtelomeric regions and inserted within gene-rich regions. We also demonstrated that some, but not all, AfuSINE sequences are targeted by host RNA silencing mechanisms. Finally, we demonstrated that infection of the fungus with mycoviruses had no apparent effects on SINE activity. PMID:27736869
Kanhayuwa, Lakkhana; Coutts, Robert H A
2016-01-01
Novel families of short interspersed nuclear element (SINE) sequences in the human pathogenic fungus Aspergillus fumigatus, clinical isolate Af293, were identified and categorised into tRNA-related and 5S rRNA-related SINEs. Eight predicted tRNA-related SINE families originating from different tRNAs, and nominated as AfuSINE2 sequences, contained target site duplications of short direct repeat sequences (4-14 bp) flanking the elements, an extended tRNA-unrelated region and typical features of RNA polymerase III promoter sequences. The elements ranged in size from 140-493 bp and were present in low copy number in the genome and five out of eight were actively transcribed. One putative tRNAArg-derived sequence, AfuSINE2-1a possessed a unique feature of repeated trinucleotide ACT residues at its 3'-terminus. This element was similar in sequence to the I-4_AO element found in A. oryzae and an I-1_AF long nuclear interspersed element-like sequence identified in A. fumigatus Af293. Families of 5S rRNA-related SINE sequences, nominated as AfuSINE3, were also identified and their 5'-5S rRNA-related regions show 50-65% and 60-75% similarity to respectively A. fumigatus 5S rRNAs and SINE3-1_AO found in A. oryzae. A. fumigatus Af293 contains five copies of AfuSINE3 sequences ranging in size from 259-343 bp and two out of five AfuSINE3 sequences were actively transcribed. Investigations on AfuSINE distribution in the fungal genome revealed that the elements are enriched in pericentromeric and subtelomeric regions and inserted within gene-rich regions. We also demonstrated that some, but not all, AfuSINE sequences are targeted by host RNA silencing mechanisms. Finally, we demonstrated that infection of the fungus with mycoviruses had no apparent effects on SINE activity.
Tulman, E. R.; Delhon, G.; Afonso, C. L.; Lu, Z.; Zsak, L.; Sandybaev, N. T.; Kerembekova, U. Z.; Zaitsev, V. L.; Kutish, G. F.; Rock, D. L.
2006-01-01
Here we present the genomic sequence of horsepox virus (HSPV) isolate MNR-76, an orthopoxvirus (OPV) isolated in 1976 from diseased Mongolian horses. The 212-kbp genome contained 7.5-kbp inverted terminal repeats and lacked extensive terminal tandem repetition. HSPV contained 236 open reading frames (ORFs) with similarity to those in other OPVs, with those in the central 100-kbp region most conserved relative to other OPVs. Phylogenetic analysis of the conserved region indicated that HSPV is closely related to sequenced isolates of vaccinia virus (VACV) and rabbitpox virus, clearly grouping together these VACV-like viruses. Fifty-four HSPV ORFs likely represented fragments of 25 orthologous OPV genes, including in the central region the only known fragmented form of an OPV ribonucleotide reductase large subunit gene. In terminal genomic regions, HSPV lacked full-length homologues of genes variably fragmented in other VACV-like viruses but was unique in fragmentation of the homologue of VACV strain Copenhagen B6R, a gene intact in other known VACV-like viruses. Notably, HSPV contained in terminal genomic regions 17 kbp of OPV-like sequence absent in known VACV-like viruses, including fragments of genes intact in other OPVs and approximately 1.4 kb of sequence present only in cowpox virus (CPXV). HSPV also contained seven full-length genes fragmented or missing in other VACV-like viruses, including intact homologues of the CPXV strain GRI-90 D2L/I4R CrmB and D13L CD30-like tumor necrosis factor receptors, D3L/I3R and C1L ankyrin repeat proteins, B19R kelch-like protein, D7L BTB/POZ domain protein, and B22R variola virus B22R-like protein. These results indicated that HSPV contains unique genomic features likely contributing to a unique virulence/host range phenotype. They also indicated that while closely related to known VACV-like viruses, HSPV contains additional, potentially ancestral sequences absent in other VACV-like viruses. PMID:16940536
Lu, You; Ishimaru, Carol A; Glazebrook, Jane; Samac, Deborah A
2018-02-01
Clavibacter michiganensis is the most economically important gram-positive bacterial plant pathogen, with subspecies that cause serious diseases of maize, wheat, tomato, potato, and alfalfa. Much less is known about pathogenesis involving gram-positive plant pathogens than is known for gram-negative bacteria. Comparative genome analyses of C. michiganensis subspecies affecting tomato, potato, and maize have provided insights on pathogenicity. In this study, we identified strains of C. michiganensis subsp. insidiosus with contrasting pathogenicity on three accessions of the model legume Medicago truncatula. We generated complete genome sequences for two strains and compared these to a previously sequenced strain and genome sequences of four other subspecies. The three C. michiganensis subsp. insidiosus strains varied in gene content due to genome rearrangements, most likely facilitated by insertion elements, and plasmid number, which varied from one to three depending on strain. The core C. michiganensis genome consisted of 1,917 genes, with 379 genes unique to C. michiganensis subsp. insidiosus. An operon for synthesis of the extracellular blue pigment indigoidine, enzymes for pectin degradation, and an operon for inositol metabolism are among the unique features. Secreted serine proteases belonging to both the pat-1 and ppa families were present but highly diverged from those in other subspecies.
Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Igarashi, Yasuhiro
2015-01-01
Streptomyces sp. TP-A0598, isolated from seawater, produces lydicamycin, structurally unique type I polyketide bearing two nitrogen-containing five-membered rings, and four congeners TPU-0037-A, -B, -C, and -D. We herein report the 8 Mb draft genome sequence of this strain, together with classification and features of the organism and generation, annotation and analysis of the genome sequence. The genome encodes 7,240 putative ORFs, of which 4,450 ORFs were assigned with COG categories. Also, 66 tRNA genes and one rRNA operon were identified. The genome contains eight gene clusters involved in the production of polyketides and nonribosomal peptides. Among them, a PKS/NRPS gene cluster was assigned to be responsible for lydicamycin biosynthesis and a plausible biosynthetic pathway was proposed on the basis of gene function prediction. This genome sequence data will facilitate to probe the potential of secondary metabolism in marine-derived Streptomyces.
Chang, D D; Clayton, D A
1986-01-01
Transcription of the heavy strand of mouse mitochondrial DNA starts from two closely spaced, distinct sites located in the displacement loop region of the genome. We report here an analysis of regulatory sequences required for faithful transcription from these two sites. Data obtained from in vitro assays demonstrated that a 51-base-pair region, encompassing nucleotides -40 to +11 of the downstream start site, contains sufficient information for accurate transcription from both start sites. Deletion of the 3' flanking sequences, including one or both start sites to -17, resulted in the initiation of transcription by the mitochondrial RNA polymerase from alternative sites within vector DNA sequences. This feature places the mouse heavy-strand promoter uniquely among other known mitochondrial promoters, all of which absolutely require cognate start sites for transcription. Comparison of the heavy-strand promoter with those of other vertebrate mitochondrial DNAs revealed a remarkably high rate of sequence divergence among species. Images PMID:3785226
On the structural context and identification of enzyme catalytic residues.
Chien, Yu-Tung; Huang, Shao-Wei
2013-01-01
Enzymes play important roles in most of the biological processes. Although only a small fraction of residues are directly involved in catalytic reactions, these catalytic residues are the most crucial parts in enzymes. The study of the fundamental and unique features of catalytic residues benefits the understanding of enzyme functions and catalytic mechanisms. In this work, we analyze the structural context of catalytic residues based on theoretical and experimental structure flexibility. The results show that catalytic residues have distinct structural features and context. Their neighboring residues, whether sequence or structure neighbors within specific range, are usually structurally more rigid than those of noncatalytic residues. The structural context feature is combined with support vector machine to identify catalytic residues from enzyme structure. The prediction results are better or comparable to those of recent structure-based prediction methods.
Liu, Yue; Huo, Naxin; Dong, Lingli; Wang, Yi; Zhang, Shuixian; Young, Hugh A.; Feng, Xiaoxiao; Gu, Yong Qiang
2013-01-01
Background Artemisia frigida Willd. is an important Mongolian traditional medicinal plant with pharmacological functions of stanch and detumescence. However, there is little sequence and genomic information available for Artemisia frigida, which makes phylogenetic identification, evolutionary studies, and genetic improvement of its value very difficult. We report the complete chloroplast genome sequence of Artemisia frigida based on 454 pyrosequencing. Methodology/Principal Findings The complete chloroplast genome of Artemisia frigida is 151,076 bp including a large single copy (LSC) region of 82,740 bp, a small single copy (SSC) region of 18,394 bp and a pair of inverted repeats (IRs) of 24,971 bp. The genome contains 114 unique genes and 18 duplicated genes. The chloroplast genome of Artemisia frigida contains a small 3.4 kb inversion within a large 23 kb inversion in the LSC region, a unique feature in Asteraceae. The gene order in the SSC region of Artemisia frigida is inverted compared with the other 6 Asteraceae species with the chloroplast genomes sequenced. This inversion is likely caused by an intramolecular recombination event only occurred in Artemisia frigida. The existence of rich SSR loci in the Artemisia frigida chloroplast genome provides a rare opportunity to study population genetics of this Mongolian medicinal plant. Phylogenetic analysis demonstrates a sister relationship between Artemisia frigida and four other species in Asteraceae, including Ageratina adenophora, Helianthus annuus, Guizotia abyssinica and Lactuca sativa, based on 61 protein-coding sequences. Furthermore, Artemisia frigida was placed in the tribe Anthemideae in the subfamily Asteroideae (Asteraceae) based on ndhF and trnL-F sequence comparisons. Conclusion The chloroplast genome sequence of Artemisia frigida was assembled and analyzed in this study, representing the first plastid genome sequenced in the Anthemideae tribe. This complete chloroplast genome sequence will be useful for molecular ecology and molecular phylogeny studies within Artemisia species and also within the Asteraceae family. PMID:23460871
Ultraviolet spectral morphology of the O stars. IV - The OB supergiant sequence
NASA Technical Reports Server (NTRS)
Walborn, Nolan R.; Nichols-Bohlin, Joy
1987-01-01
An atlas of 25 O3-B8 supergiant spectra in the wavelength ranges 1320-1580 A and 1620-1880 A is presented, based on high-resolution data from the IUE archives. The remarkably detailed relationship between the stellar-wind profiles and the optical spectral classifications throughout this sequence is emphasized. For instance, the (Si IV)/(C IV) ratio reverses between O4 and O6.5; and the B0, B0.5, and B0.7 Ia wind characteristics are each qualitatively unique and distinct from one another. The systematic behavior of nine stellar-wind features with ionization potentials ranging from 114 to 19 eV is summarized as a function of advancing spectral type.
Reece, Kimberly S; Scott, Gail P; Dang, Cécile; Dungan, Christopher F
2017-09-01
A monoclonal Perkinsus chesapeaki isolate was established from 1 of 10 infected Australian Anadara trapezia cockles. Morphological features were similar to those of described P. chesapeaki isolates, and also included a unique vermiform schizont cell-type. Perkinsus olseni-specific PCR primers amplified DNAs from all 10 cockles. Perkinsus chesapeaki-specific primers also amplified DNAs from 4/10 cockles, including DNA from the isolate source cockle. Three different sets of DNA sequences from the monoclonal isolate grouped with the homologous, previously deposited, P. chesapeaki sequences in phylogenetic analyses. In situ hybridization assays detected both P. chesapeaki and P. olseni cells in histological sections from the source cockle for monoclonal isolate ATCC PRA-425. Copyright © 2017 Elsevier Inc. All rights reserved.
Complete genome sequence of Dyadobacter fermentans type strain (NS114T)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lang, Elke; Lapidus, Alla; Chertkov, Olga
Dyadobacter fermentans (Chelius MK and Triplett EW, 2000) is the type species of the genus Dyadobacter. It is of phylogenetic interest because of its location in the Cytophagaceae, a very diverse family within the order 'Sphingobacteriales'. D. fermentans has a mainly respiratory metabolism, stains Gram-negative, is non-motile and oxidase and catalase positive. It is characterized by the production of cell filaments in ageing cultures, a flexirubin-like pigment and its ability to ferment glucose, which is almost unique in the aerobically living members of this taxonomically difficult family. Here we describe the features of this organism, together with the complete genomemore » sequence, and annotation. This is the first complete genome sequence of the 'sphingobacterial' genus Dyadobacter, and this 6,967,790 bp long single replicon genome with its 5804 protein-coding and 50 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.« less
phiGENOME: an integrative navigation throughout bacteriophage genomes.
Stano, Matej; Klucar, Lubos
2011-11-01
phiGENOME is a web-based genome browser generating dynamic and interactive graphical representation of phage genomes stored in the phiSITE, database of gene regulation in bacteriophages. phiGENOME is an integral part of the phiSITE web portal (http://www.phisite.org/phigenome) and it was optimised for visualisation of phage genomes with the emphasis on the gene regulatory elements. phiGENOME consists of three components: (i) genome map viewer built using Adobe Flash technology, providing dynamic and interactive graphical display of phage genomes; (ii) sequence browser based on precisely formatted HTML tags, providing detailed exploration of genome features on the sequence level and (iii) regulation illustrator, based on Scalable Vector Graphics (SVG) and designed for graphical representation of gene regulations. Bringing 542 complete genome sequences accompanied with their rich annotations and references, makes phiGENOME a unique information resource in the field of phage genomics. Copyright © 2011 Elsevier Inc. All rights reserved.
Stayton, M M; Black, M; Bedbrook, J; Dunsmuir, P
1986-12-22
The 16 petunia Cab genes which have been characterized are all closely related at the nucleotide sequence level and they encode Cab precursor polypeptides which are similar in sequence and length. Here we describe a novel petunia Cab gene which encodes a unique Cab precursor protein. This protein is a member of the smallest class of Cab precursor proteins for which no gene has previously been assigned in petunia or any other species. The features of this Cab precursor protein are that it is shorter by 2-3 amino acids than the formerly characterized Cab precursors, its transit peptide sequence is unrelated, and the mature polypeptide is significantly diverged at the functionally important N terminus from other petunia Cab proteins. Gene structure also discriminates this gene which is the only intron containing Cab gene in petunia genomic DNA.
Gut microbial profile analysis by MiSeq sequencing of pancreatic carcinoma patients in China
Xie, Haiyang; Li, Ang; Lu, Haifeng; Xu, Shaoyan; Zhou, Lin; Zhang, Hua; Cui, Guangying; Chen, Xinhua; Liu, Yuanxing; Wu, Liming; Qin, Nan; Sun, Ranran; Wang, Wei; Li, Lanjuan; Wang, Weilin; Zheng, Shusen
2017-01-01
Pancreatic carcinoma (PC) is a lethal cancer. Gut microbiota is associated with some risk factors of PC, e.g. obesity and types II diabetes. However, the specific gut microbial profile in clinical PC in China has never been reported. This prospective study collected 85 PC and 57 matched healthy controls (HC) to analyze microbial characteristics by MiSeq sequencing. The results showed that gut microbial diversity was decreased in PC with an unique microbial profile, which partly attributed to its decrease of alpha diversity. Microbial alterations in PC featured by the increase of certain pathogens and lipopolysaccharides-producing bacteria, and the decrease of probiotics and butyrate-producing bacteria. Microbial community in obstruction cases was separated from the un-obstructed cases. Streptococcus was associated with the bile. Furthermore, 23 microbial functions e.g. Leucine and LPS biosynthesis were enriched, while 13 functions were reduced in PC. Importantly, based on 40 genera associated with PC, microbial markers achieves a high classification power with AUC of 0.842. In conclusion, gut microbial profile was unique in PC, providing a microbial marker for non-invasive PC diagnosis. PMID:29221120
The resurrection genome of Boea hygrometrica: A blueprint for survival of dehydration.
Xiao, Lihong; Yang, Ge; Zhang, Liechi; Yang, Xinhua; Zhao, Shuang; Ji, Zhongzhong; Zhou, Qing; Hu, Min; Wang, Yu; Chen, Ming; Xu, Yu; Jin, Haijing; Xiao, Xuan; Hu, Guipeng; Bao, Fang; Hu, Yong; Wan, Ping; Li, Legong; Deng, Xin; Kuang, Tingyun; Xiang, Chengbin; Zhu, Jian-Kang; Oliver, Melvin J; He, Yikun
2015-05-05
"Drying without dying" is an essential trait in land plant evolution. Unraveling how a unique group of angiosperms, the Resurrection Plants, survive desiccation of their leaves and roots has been hampered by the lack of a foundational genome perspective. Here we report the ∼1,691-Mb sequenced genome of Boea hygrometrica, an important resurrection plant model. The sequence revealed evidence for two historical genome-wide duplication events, a compliment of 49,374 protein-coding genes, 29.15% of which are unique (orphan) to Boea and 20% of which (9,888) significantly respond to desiccation at the transcript level. Expansion of early light-inducible protein (ELIP) and 5S rRNA genes highlights the importance of the protection of the photosynthetic apparatus during drying and the rapid resumption of protein synthesis in the resurrection capability of Boea. Transcriptome analysis reveals extensive alternative splicing of transcripts and a focus on cellular protection strategies. The lack of desiccation tolerance-specific genome organizational features suggests the resurrection phenotype evolved mainly by an alteration in the control of dehydration response genes.
Giessen, Tobias W
2016-10-01
Compartmentalization is one of the defining features of life. Cells use protein compartments to exert spatial control over their metabolism, store nutrients and create unique microenvironments needed for essential physiological processes. Encapsulins are a recently discovered class of protein nanocompartments found in bacteria and archaea that naturally encapsulate cargo proteins. A short C-terminal targeting sequence directs the highly specific encapsulation process in vivo. Here, I will initially discuss the properties, diversity and putative function of encapsulins. The unique characteristics and potential uses of the self-sorting cargo-packaging process found in encapsulin systems will then be highlighted. Examples for the application of encapsulins as cell-specific optical nanoprobes and targeted therapeutic delivery systems will be discussed with an emphasis on the ability to integrate multiple functionalities within a single nanodevice. By fusing targeting sequences to non-native proteins, encapsulins can also be used as specific nanocontainers and enzymatic nanoreactors in vivo. I will end by briefly discussing future avenues for encapsulin research related to both basic microbial metabolism and applications in biomedicine, catalysis and materials science. Copyright © 2016 Elsevier Ltd. All rights reserved.
Diversity and evolution of the primate skin microbiome
Council, Sarah E.; Savage, Amy M.; Urban, Julie M.; Ehlers, Megan E.; Skene, J. H. Pate; Platt, Michael L.; Dunn, Robert R.; Horvath, Julie E.
2016-01-01
Skin microbes play a role in human body odour, health and disease. Compared with gut microbes, we know little about the changes in the composition of skin microbes in response to evolutionary changes in hosts, or more recent behavioural and cultural changes in humans. No studies have used sequence-based approaches to consider the skin microbe communities of gorillas and chimpanzees, for example. Comparison of the microbial associates of non-human primates with those of humans offers unique insights into both the ancient and modern features of our skin-associated microbes. Here we describe the microbes found on the skin of humans, chimpanzees, gorillas, rhesus macaques and baboons. We focus on the bacterial and archaeal residents in the axilla using high-throughput sequencing of the 16S rRNA gene. We find that human skin microbial communities are unique relative to those of other primates, in terms of both their diversity and their composition. These differences appear to reflect both ancient shifts during millions of years of primate evolution and more recent changes due to modern hygiene. PMID:26763711
Pan, Zhangyuan; Li, Shengdi; Liu, Qiuyue; Wang, Zhen; Zhou, Zhengkui; Di, Ran; Miao, Benpeng; Hu, Wenping; Wang, Xiangyu; Hu, Xiaoxiang; Xu, Ze; Wei, Dongkai; He, Xiaoyun; Yuan, Liyun; Guo, Xiaofei; Liang, Benmeng; Wang, Ruichao; Li, Xiaoyu; Cao, Xiaohan; Dong, Xinlong; Xia, Qing; Shi, Hongcai; Hao, Geng; Yang, Jean; Luosang, Cuicheng; Zhao, Yiqiang; Jin, Mei; Zhang, Yingjie; Lv, Shenjin; Li, Fukuan; Ding, Guohui; Chu, Mingxing; Li, Yixue
2018-04-01
Animal domestication has been extensively studied, but the process of feralization remains poorly understood. Here, we performed whole-genome sequencing of 99 sheep and identified a primary genetic divergence between 2 heterogeneous populations in the Tibetan Plateau, including 1 semi-feral lineage. Selective sweep and candidate gene analysis revealed local adaptations of these sheep associated with sensory perception, muscle strength, eating habit, mating process, and aggressive behavior. In particular, a horn-related gene, RXFP2, showed signs of rapid evolution specifically in the semi-feral breeds. A unique haplotype and repressed horn-related tissue expression of RXFP2 were correlated with higher horn length, as well as spiral and horizontally extended horn shape. Semi-feralization has an extensive impact on diverse phenotypic traits of sheep. By acquiring features like those of their wild ancestors, semi-feral sheep were able to regain fitness while in frequent contact with wild surroundings and rare human interventions. This study provides a new insight into the evolution of domestic animals when human interventions are no longer dominant.
Genomics in Cardiovascular Disease
Roberts, Robert; Marian, A.J.; Dandona, Sonny; Stewart, Alexandre F.R.
2013-01-01
A paradigm shift towards biology occurred in the 1990’s subsequently catalyzed by the sequencing of the human genome in 2000. The cost of DNA sequencing has gone from millions to thousands of dollars with sequencing of one’s entire genome costing only $1,000. Rapid DNA sequencing is being embraced for single gene disorders, particularly for sporadic cases and those from small families. Transmission of lethal genes such as associated with Huntington’s disease can, through in-vitro fertilization, avoid passing it on to one’s offspring. DNA sequencing will meet the challenge of elucidating the genetic predisposition for common polygenic diseases, especially in determining the function of the novel common genetic risk variants and identifying the rare variants, which may also partially ascertain the source of the missing heritability. The challenge for DNA sequencing remains great, despite human genome sequences being 99.5% identical, the 3 million single nucleotide polymorphisms (SNPs) responsible for most of the unique features add up to 60 new mutations per person which, for 7 billion people, is 420 billion mutations. It is claimed that DNA sequencing has increased 10,000 fold while information storage and retrieval only 16 fold. The physician and health user will be challenged by the convergence of two major trends, whole genome sequencing and the storage/retrieval and integration of the data. PMID:23524054
Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji
2016-01-01
Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium along with the results of comparative genome analysis with strain 17 of the same species whose genome has also been sequenced, but not fully analysed yet. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit a high overall linearity of gene organizations, whereas the smaller chromosomes show a significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, involved in its extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and are specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. PMID:26645327
Skeletal development in the African elephant and ossification timing in placental mammals
Hautier, Lionel; Stansfield, Fiona J.; Allen, W. R. Twink; Asher, Robert J.
2012-01-01
We provide here unique data on elephant skeletal ontogeny. We focus on the sequence of cranial and post-cranial ossification events during growth in the African elephant (Loxodonta africana). Previous analyses on ossification sequences in mammals have focused on monotremes, marsupials, boreoeutherian and xenarthran placentals. Here, we add data on ossification sequences in an afrotherian. We use two different methods to quantify sequence heterochrony: the sequence method and event-paring/Parsimov. Compared with other placentals, elephants show late ossifications of the basicranium, manual and pedal phalanges, and early ossifications of the ischium and metacarpals. Moreover, ossification in elephants starts very early and progresses rapidly. Specifically, the elephant exhibits the same percentage of bones showing an ossification centre at the end of the first third of its gestation period as the mouse and hamster have close to birth. Elephants show a number of features of their ossification patterns that differ from those of other placental mammals. The pattern of the initiation of the ossification evident in the African elephant underscores a possible correlation between the timing of ossification onset and gestation time throughout mammals. PMID:22298853
NASA Technical Reports Server (NTRS)
Villanueva, E.; Delihas, N.; Luehrsen, K. R.; Fox, G. E.; Gibson, J.
1985-01-01
The complete nucleotide sequences of 5S ribosomal RNAs from Rhodocyclus gelatinosa, Rhodobacter sphaeroides, and Pseudomonas cepacia were determined. Comparisons of these 5S RNA sequences show that rather than being phylogenetically related to one another, the two photosynthetic bacterial 5S RNAs share more sequence and signature homology with the RNAs of two nonphotosynthetic strains. Rhodobacter sphaeroides is specifically related to Paracoccus denitrificans and Rc. gelatinosa is related to Ps. cepacia. These results support earlier 16S ribosomal RNA studies and add two important groups to the 5S RNA data base. Unique 5S RNA structural features previously found in P. denitrificans are present also in the 5S RNA of Rb. sphaeroides; these provide the basis for subdivisional signatures. The immediate consequence of obtaining these new sequences is that it is possible to clarify the phylogenetic origins of the plant mitochondrion. In particular, a close phylogenetic relationship is found between the plant mitochondria and members of the alpha subdivision of the purple photosynthetic bacteria, namely, Rb. sphaeroides, P. denitrificans, and Rhodospirillum rubrum.
Yin, H; Medstrand, P; Kristofferson, A; Dietrich, U; Aman, P; Blomberg, J
1999-03-30
Previously, we found a retroviral sequence, HML-6.2BC1, to be expressed at high levels in a multifocal ductal breast cancer from a 41-year-old woman who also developed ovarian carcinoma. The sequence of a human genomic clone (HML-6.28) selected by high-stringency hybridization with HML-6.2BC1 is reported here. It was 99% identical to HML-6.2BC1 and gave the same restriction fragments as total DNA. HML-6.28 is a 4.7-kb provirus with a 5'LTR, truncated in RT. Data from two similar genomic clones and sequences found in GenBank are also reported. Overlaps between them gave a rather complete picture of the HML-6.2BC1-like human endogenous retroviral elements. Work with somatic cell hybrids and FISH localized HML-6.28 to chromosome 6, band p21, close to the MHC region. The causal role of HML-6.28 in breast cancer remains unclear. Nevertheless, the ca. 20 Myr old HML-6 sequences enabled the definition of common and unique features of type A, B, and D (ABD) retroviruses. In Gag, HML-6 has no intervening sequences between matrix and capsid proteins, unlike extant exogenous ABD viruses, possibly an ancestral feature. Alignment of the dUTPase showed it to be present in all ABD viruses, but gave a phylogenetic tree different from trees made from other ABD genes, indicating a distinct phylogeny of dUTPase. A conserved 24-mer sequence in the amino terminus of some ABD envelope genes suggested a conserved function. Copyright 1999 Academic Press.
USDA-ARS?s Scientific Manuscript database
The concept of utilizing putative and unique gene sequences for the design of species specific probes was tested. The abundance profile of assigned functions within the Lactobacillus plantarum genome was used for the identification of the putative and unique gene sequence, csh. The targeted gene (cs...
Yu, Yi; Duan, Lian; Zhang, Qi; Liao, Rijing; Ding, Ying; Pan, Haixue; Wendt-Pienkowski, Evelyn; Tang, Gongli; Shen, Ben; Liu, Wen
2009-01-01
Nosiheptide (NOS), belonging to the e series of thiopeptide antibiotics that exhibit potent activity against various bacterial pathogens, bears a unique indole side ring system and regiospecific hydroxyl groups on the characteristic macrocyclic core. Here, cloning, sequencing and characterization of the nos gene cluster from Streptomyces actuosus ATCC 25421 as a model for this series of thiopeptides has unveiled new insights into their biosynthesis. Bioinformatics-based sequence analysis and in vivo investigation into the gene functions show that NOS biosynthesis shares a common strategy with recently characterized b or c series thiopeptides for forming the characteristic macrocyclic core, which features a ribosomally synthesized precursor peptide with conserved posttranslational modifications. However, it apparently proceeds via a different route for tailoring the thiopeptide framework, allowing the final product to exhibit the distinct structural characteristics of e series thiopeptides, such as the indole side ring system. Chemical complementation supports the notion that the S-adenosylmethionine (AdoMet)-dependent protein NosL may play a central role in converting Trp to the key 3-methylindole moiety by an unusual carbon side chain rearrangement, most likely via a radical-initiated mechanism. Characterization of the indole side ring-opened analog of NOS from the nosN mutant strain is consistent with the proposed methyltransferase activity of its encoded protein, shedding light into the timing of the individual steps for indole side ring biosynthesis. These results also suggest the feasibility of engineering novel thiopeptides for drug discovery by manipulating the NOS biosynthetic machinery. PMID:19678698
Qi, L L; Wu, J J; Friebe, B; Qian, C; Gu, Y Q; Fu, D L; Gill, B S
2013-08-01
Brachypodium distachyon is a wild annual grass belonging to the Pooideae, more closely related to wheat, barley, and forage grasses than rice and maize. As an experimental model, the completed genome sequence of B. distachyon provides a unique opportunity to study centromere evolution during the speciation of grasses. Centromeric satellite sequences have been identified in B. distachyon, but little is known about centromeric retrotransposons in this species. In the present study, bacterial artificial chromosome (BAC)-fluorescence in situ hybridization was conducted in maize, rice, barley, wheat, and rye using B. distachyon (Bd) centromere-specific BAC clones. Eight Bd centromeric BAC clones gave no detectable fluorescence in situ hybridization (FISH) signals on the chromosomes of rice and maize, and three of them also did not yield any FISH signals in barley, wheat, and rye. In addition, four of five Triticeae centromeric BAC clones did not hybridize to the B. distachyon centromeres, implying certain unique features of Brachypodium centromeres. Analysis of Brachypodium centromeric BAC sequences identified a long terminal repeat (LTR)-centromere retrotransposon of B. distachyon (CRBd1). This element was found in high copy number accounting for 1.6 % of the B. distachyon genome, and is enriched in Brachypodium centromeric regions. CRBd1 accumulated in active centromeres, but was lost from inactive ones. The LTR of CRBd1 appears to be specific to B. distachyon centromeres. These results reveal different evolutionary events of this retrotransposon family across grass species.
Poltev, V I; Anisimov, V M; Sanchez, C; Deriabina, A; Gonzalez, E; Garcia, D; Rivas, F; Polteva, N A
2016-01-01
It is generally accepted that the important characteristic features of the Watson-Crick duplex originate from the molecular structure of its subunits. However, it still remains to elucidate what properties of each subunit are responsible for the significant characteristic features of the DNA structure. The computations of desoxydinucleoside monophosphates complexes with Na-ions using density functional theory revealed a pivotal role of DNA conformational properties of single-chain minimal fragments in the development of unique features of the Watson-Crick duplex. We found that directionality of the sugar-phosphate backbone and the preferable ranges of its torsion angles, combined with the difference between purines and pyrimidines. in ring bases, define the dependence of three-dimensional structure of the Watson-Crick duplex on nucleotide base sequence. In this work, we extended these density functional theory computations to the minimal' fragments of DNA duplex, complementary desoxydinucleoside monophosphates complexes with Na-ions. Using several computational methods and various functionals, we performed a search for energy minima of BI-conformation for complementary desoxydinucleoside monophosphates complexes with different nucleoside sequences. Two sequences are optimized using ab initio method at the MP2/6-31++G** level of theory. The analysis of torsion angles, sugar ring puckering and mutual base positions of optimized structures demonstrates that the conformational characteristic features of complementary desoxydinucleoside monophosphates complexes with Na-ions remain within BI ranges and become closer to the corresponding characteristic features of the Watson-Crick duplex crystals. Qualitatively, the main characteristic features of each studied complementary desoxydinucleoside monophosphates complex remain invariant when different computational methods are used, although the quantitative values of some conformational parameters could vary lying within the limits typical for the corresponding family. We observe that popular functionals in density functional theory calculations lead to the overestimated distances between base pairs, while MP2 computations and the newer complex functionals produce the structures that have too close atom-atom contacts. A detailed study of some complementary desoxydinucleoside monophosphate complexes with Na-ions highlights the existence of several energy minima corresponding to BI-conformations, in other words, the complexity of the relief pattern of the potential energy surface of complementary desoxydinucleoside monophosphate complexes. This accounts for variability of conformational parameters of duplex fragments with the same base sequence. Popular molecular mechanics force fields AMBER and CHARMM reproduce most of the conformational characteristics of desoxydinucleoside monophosphates and their complementary complexes with Na-ions but fail to reproduce some details of the dependence of the Watson-Crick duplex conformation on the nucleotide sequence.
Logacheva, Maria D; Samigullin, Tahir H; Dhingra, Amit; Penin, Aleksey A
2008-01-01
Background Chloroplast genome sequences are extremely informative about species-interrelationships owing to its non-meiotic and often uniparental inheritance over generations. The subject of our study, Fagopyrum esculentum, is a member of the family Polygonaceae belonging to the order Caryophyllales. An uncertainty remains regarding the affinity of Caryophyllales and the asterids that could be due to undersampling of the taxa. With that background, having access to the complete chloroplast genome sequence for Fagopyrum becomes quite pertinent. Results We report the complete chloroplast genome sequence of a wild ancestor of cultivated buckwheat, Fagopyrum esculentum ssp. ancestrale. The sequence was rapidly determined using a previously described approach that utilized a PCR-based method and employed universal primers, designed on the scaffold of multiple sequence alignment of chloroplast genomes. The gene content and order in buckwheat chloroplast genome is similar to Spinacia oleracea. However, some unique structural differences exist: the presence of an intron in the rpl2 gene, a frameshift mutation in the rpl23 gene and extension of the inverted repeat region to include the ycf1 gene. Phylogenetic analysis of 61 protein-coding gene sequences from 44 complete plastid genomes provided strong support for the sister relationships of Caryophyllales (including Polygonaceae) to asterids. Further, our analysis also provided support for Amborella as sister to all other angiosperms, but interestingly, in the bayesian phylogeny inference based on first two codon positions Amborella united with Nymphaeales. Conclusion Comparative genomics analyses revealed that the Fagopyrum chloroplast genome harbors the characteristic gene content and organization as has been described for several other chloroplast genomes. However, it has some unique structural features distinct from previously reported complete chloroplast genome sequences. Phylogenetic analysis of the dataset, including this new sequence from non-core Caryophyllales supports the sister relationship between Caryophyllales and asterids. PMID:18492277
TagDust2: a generic method to extract reads from sequencing data.
Lassmann, Timo
2015-01-28
Arguably the most basic step in the analysis of next generation sequencing data (NGS) involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial. Here I present TagDust2, a generic approach utilizing a library of hidden Markov models (HMM) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality compared to other approaches. Processing of multiplexed single, paired end and libraries containing unique molecular identifiers is fully supported. Two additional post processing steps are included to exclude known contaminants and filter out low complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection. Taken together TagDust2 is a feature rich, flexible and adaptive solution to go from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at: http://tagdust.sourceforge.net .
Mitochondrial ADCK3 employs an atypical protein kinase-like fold to enable coenzyme Q biosynthesis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stefely, Jonathan A.; Reidenbach, Andrew G.; Ulbrich, Arne
The ancient UbiB protein kinase-like family is involved in isoprenoid lipid biosynthesis and is implicated in human diseases, but demonstration of UbiB kinase activity has remained elusive for unknown reasons. In this paper, we quantitatively define UbiB-specific sequence motifs and reveal their positions within the crystal structure of a UbiB protein, ADCK3. We find that multiple UbiB-specific features are poised to inhibit protein kinase activity, including an N-terminal domain that occupies the typical substrate binding pocket and a unique A-rich loop that limits ATP binding by establishing an unusual selectivity for ADP. A single alanine-to-glycine mutation of this loop flipsmore » this coenzyme selectivity and enables autophosphorylation but inhibits coenzyme Q biosynthesis in vivo, demonstrating functional relevance for this unique feature. Finally, our work provides mechanistic insight into UbiB enzyme activity and establishes a molecular foundation for further investigation of how UbiB family proteins affect diseases and diverse biological pathways.« less
Sunyer, Oriol J.
2016-01-01
Fishes (i.e., teleost fishes) are the largest group of vertebrates. Although their immune system is based on the fundamental receptors, pathways, and cell types found in all groups of vertebrates, fishes show a diversity of particular features that challenge some classical concepts of immunology. In this chapter, we discuss the particularities of fish immune repertoires from a comparative perspective. We examine how allelic exclusion can be achieved when multiple Ig loci are present, how isotypic diversity and functional specificity impact clonal complexity, how loss of the MHC class II molecules affects the cooperation between T and B cells, and how deep sequencing technologies bring new insights about somatic hypermutation in the absence of germinal centers. The unique coexistence of two distinct B-cell lineages respectively specialized in systemic and mucosal responses is also discussed. Finally, we try to show that the diverse adaptations of immune repertoires in teleosts can help in understanding how somatic adaptive mechanisms of immunity evolved in parallel in different lineages across vertebrates. PMID:26537384
IDH2 Mutations Define a Unique Subtype of Breast Cancer with Altered Nuclear Polarity
Chiang, Sarah; Weigelt, Britta; Wen, Huei-Chi; Pareja, Fresia; Raghavendra, Ashwini; Martelotto, Luciano G.; Burke, Kathleen A.; Basili, Thais; Li, Anqi; Geyer, Felipe C.; Piscuoglio, Salvatore; Ng, Charlotte K.Y.; Jungbluth, Achim A.; Balss, Jörg; Pusch, Stefan; Baker, Gabrielle M.; Cole, Kimberly S.; von Deimling, Andreas; Batten, Julie M.; Marotti, Jonathan D.; Soh, Hwei-Choo; McCalip, Benjamin L.; Serrano, Jonathan; Lim, Raymond S.; Siziopikou, Kalliopi P.; Lu, Song; Liu, Xiaolong; Hammour, Tarek; Brogi, Edi; Snuderl, Matija; Iafrate, A. John; Reis-Filho, Jorge S.; Schnitt, Stuart J.
2017-01-01
Solid papillary carcinoma with reverse polarity (SPCRP) is a rare breast cancer subtype with an obscure etiology. In this study, we sought to describe its unique histopathologic features and to identify the genetic alterations that underpin SPCRP using massively parallel whole-exome and targeted sequencing. The morphologic and immunohistochemical features of SPCRP support the invasive nature of this subtype. Ten of 13 (77%) SPCRPs harbored hotspot mutations at R172 of the isocitrate dehydrogenase IDH2, of which 8 of 10 displayed concurrent pathogenic mutations affecting PIK3CA or PIK3R1. One of the IDH2 wild-type SPCRPs harbored a TET2 Q548* truncating mutation coupled with a PIK3CA H1047R mutation. Functional studies demonstrated that IDH2 and PIK3CA hotspot mutations are likely drivers of SPCRP, resulting in its reversed nuclear polarization phenotype. Our results offer a molecular definition of SPCRP as a distinct breast cancer subtype. Concurrent IDH2 and PIK3CA mutations may help diagnose SPCRP and possibly direct effective treatment. PMID:27913435
Mitochondrial ADCK3 employs an atypical protein kinase-like fold to enable coenzyme Q biosynthesis
Stefely, Jonathan A.; Reidenbach, Andrew G.; Ulbrich, Arne; ...
2014-12-11
The ancient UbiB protein kinase-like family is involved in isoprenoid lipid biosynthesis and is implicated in human diseases, but demonstration of UbiB kinase activity has remained elusive for unknown reasons. In this paper, we quantitatively define UbiB-specific sequence motifs and reveal their positions within the crystal structure of a UbiB protein, ADCK3. We find that multiple UbiB-specific features are poised to inhibit protein kinase activity, including an N-terminal domain that occupies the typical substrate binding pocket and a unique A-rich loop that limits ATP binding by establishing an unusual selectivity for ADP. A single alanine-to-glycine mutation of this loop flipsmore » this coenzyme selectivity and enables autophosphorylation but inhibits coenzyme Q biosynthesis in vivo, demonstrating functional relevance for this unique feature. Finally, our work provides mechanistic insight into UbiB enzyme activity and establishes a molecular foundation for further investigation of how UbiB family proteins affect diseases and diverse biological pathways.« less
Unique topographic distribution of greyhound nonsuppurative meningoencephalitis.
Terzo, Eloisa; McConnell, J Fraser; Shiel, Robert E; McAllister, Hester; Behr, Sebastien; Priestnall, Simon L; Smith, Ken C; Nolan, Catherine M; Callanan, John J
2012-01-01
Greyhound nonsuppurative meningoencephalitis is an idiopathic breed-associated fatal meningoencephalitis with lesions usually occurring within the rostral cerebrum. This disorder can only be confirmed by postmortem examination, with a diagnosis based upon the unique topography of inflammatory lesions. Our purpose was to describe the magnetic resonance (MR) imaging features of this disease. Four Greyhounds with confirmed Greyhound nonsuppurative meningoencephalitis were evaluated by MR imaging. Lesions predominantly affected the olfactory lobes and bulbs, frontal, and frontotemporal cortical gray matter, and caudate nuclei bilaterally. Fluid attenuation inversion recovery (FLAIR) and T2 weighted spin-echo (T2W) sequences were most useful to assess the nature, severity, extension, and topographic pattern of lesions. Lesions were predominantly T2-hyperintense and T1-isointense with minimal or absent contrast enhancement. © 2012 Veterinary Radiology & Ultrasound.
Liu, Zhiya; Song, Xiaohong; Seger, Carol A.
2015-01-01
We examined whether the degree to which a feature is uniquely characteristic of a category can affect categorization above and beyond the typicality of the feature. We developed a multiple feature value category structure with different dimensions within which feature uniqueness and typicality could be manipulated independently. Using eye tracking, we found that the highest attentional weighting (operationalized as number of fixations, mean fixation time, and the first fixation of the trial) was given to a dimension that included a feature that was both unique and highly typical of the category. Dimensions that included features that were highly typical but not unique, or were unique but not highly typical, received less attention. A dimension with neither a unique nor a highly typical feature received least attention. On the basis of these results we hypothesized that subjects categorized via a rule learning procedure in which they performed an ordered evaluation of dimensions, beginning with unique and strongly typical dimensions, and in which earlier dimensions received higher weighting in the decision. This hypothesis accounted for performance on transfer stimuli better than simple implementations of two other common theories of category learning, exemplar models and prototype models, in which all dimensions were evaluated in parallel and received equal weighting. PMID:26274332
Liu, Zhiya; Song, Xiaohong; Seger, Carol A
2015-01-01
We examined whether the degree to which a feature is uniquely characteristic of a category can affect categorization above and beyond the typicality of the feature. We developed a multiple feature value category structure with different dimensions within which feature uniqueness and typicality could be manipulated independently. Using eye tracking, we found that the highest attentional weighting (operationalized as number of fixations, mean fixation time, and the first fixation of the trial) was given to a dimension that included a feature that was both unique and highly typical of the category. Dimensions that included features that were highly typical but not unique, or were unique but not highly typical, received less attention. A dimension with neither a unique nor a highly typical feature received least attention. On the basis of these results we hypothesized that subjects categorized via a rule learning procedure in which they performed an ordered evaluation of dimensions, beginning with unique and strongly typical dimensions, and in which earlier dimensions received higher weighting in the decision. This hypothesis accounted for performance on transfer stimuli better than simple implementations of two other common theories of category learning, exemplar models and prototype models, in which all dimensions were evaluated in parallel and received equal weighting.
Desiderato, Joana G; Alvarenga, Danillo O; Constancio, Milena T L; Alves, Lucia M C; Varani, Alessandro M
2018-05-14
Cellulose and its associated polymers are structural components of the plant cell wall, constituting one of the major sources of carbon and energy in nature. The carbon cycle is dependent on cellulose- and lignin-decomposing microbial communities and their enzymatic systems acting as consortia. These microbial consortia are under constant exploration for their potential biotechnological use. Herein, we describe the characterization of the genome of Dyella jiangningensis FCAV SCS01, recovered from the metagenome of a lignocellulose-degrading microbial consortium, which was isolated from a sugarcane crop soil under mechanical harvesting and covered by decomposing straw. The 4.7 Mbp genome encodes 4,194 proteins, including 36 glycoside hydrolases (GH), supporting the hypothesis that this bacterium may contribute to lignocellulose decomposition. Comparative analysis among fully sequenced Dyella species indicate that the genome synteny is not conserved, and that D. jiangningensis FCAV SCS01 carries 372 unique genes, including an alpha-glucosidase and maltodextrin glucosidase coding genes, and other potential biomass degradation related genes. Additional genomic features, such as prophage-like, genomic islands and putative new biosynthetic clusters were also uncovered. Overall, D. jiangningensis FCAV SCS01 represents the first South American Dyella genome sequenced and shows an exclusive feature among its genus, related to biomass degradation.
Tempest: Accelerated MS/MS database search software for heterogeneous computing platforms
Adamo, Mark E.; Gerber, Scott A.
2017-01-01
MS/MS database search algorithms derive a set of candidate peptide sequences from in-silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU generates peptide candidates that are asynchronously sent to a discrete GPU to be scored against experimental spectra in parallel (Milloy et al., 2012). The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. PMID:27603022
Performing skin microbiome research: A method to the madness
Kong, Heidi H.; Andersson, Björn; Clavel, Thomas; Common, John E.; Jackson, Scott A.; Olson, Nathan D.; Segre, Julia A.; Traidl-Hoffmann, Claudia
2017-01-01
Growing interest in microbial contributions to human health and disease has increasingly led investigators to examine the microbiome in both healthy skin and cutaneous disorders, including acne, psoriasis and atopic dermatitis. The need for common language, effective study design, and validated methods are critical for high-quality, standardized research. Features, unique to skin, pose particular challenges when conducting microbiome research. This review discusses microbiome research standards and highlights important factors to consider, including clinical study design, skin sampling, sample processing, DNA sequencing, control inclusion, and data analysis. PMID:28063650
Television observations of Mercury by Mariner 10
NASA Technical Reports Server (NTRS)
Murray, B. C.; Belton, M. J. S.; Danielson, E. G.; Davies, M. E.; Gault, D. E.; Hapke, B.; Oleary, B.; Strom, R. G.; Suomi, V.; Trask, N.
1977-01-01
The morphology and optical properties of the surface of Mercury resemble those of the Moon in remarkable detail, recording a very similar sequence of events; chemical and mineralogical similarity of the outer layers is implied. Mercury is probably a differentiated planet with an iron-rich core. Differentiation is inferred to have occurred very early. No evidence of atmospheric modification of any landform is found. Large-scale scarps and ridges unlike lunar or Martian features may reflect a unique period of planetary compression near the end of heavy bombardment, perhaps related to contraction of the core.
Television observations of Mercury by Mariner 10
NASA Technical Reports Server (NTRS)
Murray, B. C.; Belton, M. J. S.; Danielson, G. E.; Davies, M. E.; Gault, D. E.; Hapke, B.; Oleary, B.; Strom, R. G.; Suomi, V.; Trask, N.
1974-01-01
The morphology and optical properties of the surface of Mercury resemble that of the moon in remarkable detail, recording a very similar sequence of events; chemical and mineralogical similarity of the outer layers is implied. Mercury is probably a differentiated planet with an iron-rich core. Differentiation is inferred to have occurred very early. No evidence of atmospheric modification of any landform is found. Large-scale scarps and ridges unlike lunar or Martian features may reflect a unique period of planetary compression near the end of heavy bombardment, perhaps related to contraction of the core.
Sattley, W Matthew; Blankenship, Robert E
2010-06-01
The complete annotated genome sequence of Heliobacterium modesticaldum strain Ice1 provides our first glimpse into the genetic potential of the Heliobacteriaceae, a unique family of anoxygenic phototrophic bacteria. H. modesticaldum str. Ice1 is the first completely sequenced phototrophic representative of the Firmicutes, and heliobacteria are the only phototrophic members of this large bacterial phylum. The H. modesticaldum genome consists of a single 3.1-Mb circular chromosome with no plasmids. Of special interest are genomic features that lend insight to the physiology and ecology of heliobacteria, including the genetic inventory of the photosynthesis gene cluster. Genes involved in transport, photosynthesis, and central intermediary metabolism are described and catalogued. The obligately heterotrophic metabolism of heliobacteria is a key feature of the physiology and evolution of these phototrophs. The conspicuous absence of recognizable genes encoding the enzyme ATP-citrate lyase prevents autotrophic growth via the reverse citric acid cycle in heliobacteria, thus being a distinguishing differential characteristic between heliobacteria and green sulfur bacteria. The identities of electron carriers that enable energy conservation by cyclic light-driven electron transfer remain in question.
Spatiotemporal Patterns of Contact Across the Rat Vibrissal Array During Exploratory Behavior
Hobbs, Jennifer A.; Towal, R. Blythe; Hartmann, Mitra J. Z.
2016-01-01
The rat vibrissal system is an important model for the study of somatosensation, but the small size and rapid speed of the vibrissae have precluded measuring precise vibrissal-object contact sequences during behavior. We used a laser light sheet to quantify, with 1 ms resolution, the spatiotemporal structure of whisker-surface contact as five naïve rats freely explored a flat, vertical glass wall. Consistent with previous work, we show that the whisk cycle cannot be uniquely defined because different whiskers often move asynchronously, but that quasi-periodic (~8 Hz) variations in head velocity represent a distinct temporal feature on which to lock analysis. Around times of minimum head velocity, whiskers protract to make contact with the surface, and then sustain contact with the surface for extended durations (~25–60 ms) before detaching. This behavior results in discrete temporal windows in which large numbers of whiskers are in contact with the surface. These “sustained collective contact intervals” (SCCIs) were observed on 100% of whisks for all five rats. The overall spatiotemporal structure of the SCCIs can be qualitatively predicted based on information about head pose and the average whisk cycle. In contrast, precise sequences of whisker-surface contact depend on detailed head and whisker kinematics. Sequences of vibrissal contact were highly variable, equally likely to propagate in all directions across the array. Somewhat more structure was found when sequences of contacts were examined on a row-wise basis. In striking contrast to the high variability associated with contact sequences, a consistent feature of each SCCI was that the contact locations of the whiskers on the glass converged and moved more slowly on the sheet. Together, these findings lead us to propose that the rat uses a strategy of “windowed sampling” to extract an object's spatial features: specifically, the rat spatially integrates quasi-static mechanical signals across whiskers during the period of sustained contact, resembling an “enclosing” haptic procedure. PMID:26778990
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.
Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio
2017-10-06
Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publically available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 millions (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Turnbaugh, Peter J.; Quince, Christopher; Faith, Jeremiah J.; McHardy, Alice C.; Yatsunenko, Tanya; Niazi, Faheem; Affourtit, Jason; Egholm, Michael; Henrissat, Bernard; Knight, Rob; Gordon, Jeffrey I.
2010-01-01
We deeply sampled the organismal, genetic, and transcriptional diversity in fecal samples collected from a monozygotic (MZ) twin pair and compared the results to 1,095 communities from the gut and other body habitats of related and unrelated individuals. Using a new scheme for noise reduction in pyrosequencing data, we estimated the total diversity of species-level bacterial phylotypes in the 1.2-1.5 million bacterial 16S rRNA reads obtained from each deeply sampled cotwin to be ~800 (35.9%, 49.1% detected in both). A combined 1.1 million read 16S rRNA dataset representing 281 shallowly sequenced fecal samples from 54 twin pairs and their mothers contained an estimated 4,018 species-level phylotypes, with each sample having a unique species assemblage (53.4 ± 0.6% and 50.3 ± 0.5% overlap with the deeply sampled cotwins). Of the 134 phylotypes with a relative abundance of >0.1% in the combined dataset, only 37 appeared in >50% of the samples, with one phylotype in the Lachnospiraceae family present in 99%. Nongut communities had significantly reduced overlap with the deeply sequenced twins’ fecal microbiota (18.3 ± 0.3%, 15.3 ± 0.3%). The MZ cotwins’ fecal DNA was deeply sequenced (3.8-6.3 Gbp/sample) and assembled reads were assigned to 25 genus-level phylogenetic bins. Only 17% of the genes in these bins were shared between the cotwins. Bins exhibited differences in their degree of sequence variation, gene content including the repertoire of carbohydrate active enzymes present within and between twins (e.g., predicted cellulases, dockerins), and transcriptional activities. These results provide an expanded perspective about features that make each of us unique life forms and directions for future characterization of our gut ecosystems. PMID:20363958
NASBA: A detection and amplification system uniquely suited for RNA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sooknanan, R.; Malek, L.T.
1995-06-01
The invention of PCR (polymerase chain reaction) has revolutionized our ability to amplify and manipulate a nucleic acid sequence in vitro. The commercial rewards of this revolution have driven the development of other nuclei acid amplification and detection methodologies. This has created an alphabet soup of technologies that use different amplification methods, including NASBA (nucleic acid sequence-based amplification), LCR (ligase chain reaction), SDA (strand displacement amplification), QBR (Q-beta replicase), CPR (cycling probe reaction), and bDNA (branched DNA). Despite the differences in their processes, these amplification systems can be separated into two broad categories based on how they achieve their goal:more » sequence-based amplification systems, such as PCR, NASBA, and SDA, amplify a target nucleic acid sequence. Signal-based amplification systems, such as LCR, QBR, CPR and bDNA, amplify or alter a signal from a detection reaction that is target-dependent. While the various methods have relative strengths and weaknesses, only NASBA offers the unique ability to homogeneously amplify an RNA analyte in the presence of homologous genomic DNA under isothermal conditions. Since the detection of RNA sequences almost invariably measures biological activity, it is an excellent prognostic indicator of activities as diverse as virus production, gene expression, and cell viability. The isothermal nature of the reaction makes NASBA especially suitable for large-scale manual screening. These features extend NASBA`s application range from research to commercial diagnostic applications. Field test kits are presently under development for human diagnostics as well as the burgeoning fields of food and environmental diagnostic testing. These developments suggest future integration of NASBA into robotic workstations for high-throughput screening as well. 17 refs., 1 tab.« less
Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii
Krishnan, Neeraja M.
2017-01-01
Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially from the venomous ones, making the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins in Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet derived growth factor (PDGF), Kunitz/Bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, andpathogenesis-related1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In those derived proteins, we identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain to be responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids from the in silico structure-based analysis. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin in obtaining good quality high-molecular weight DNA for comparative genomic studies, and provides evidence towards the unique features and evolution of putative venom gene homologs in vipers. PMID:29230357
Misas, Elizabeth; Muñoz, José Fernando; Gallo, Juan Esteban; McEwen, Juan Guillermo; Clay, Oliver Keatinge
2016-04-01
The presence of repetitive or non-unique DNA persisting over sizable regions of a eukaryotic genome can hinder the genome's successful de novo assembly from short reads: ambiguities in assigning genome locations to the non-unique subsequences can result in premature termination of contigs and thus overfragmented assemblies. Fungal mitochondrial (mtDNA) genomes are compact (typically less than 100 kb), yet often contain short non-unique sequences that can be shown to impede their successful de novo assembly in silico. Such repeats can also confuse processes in the cell in vivo. A well-studied example is ectopic (out-of-register, illegitimate) recombination associated with repeat pairs, which can lead to deletion of functionally important genes that are located between the repeats. Repeats that remain conserved over micro- or macroevolutionary timescales despite such risks may indicate functionally or structurally (e.g., for replication) important regions. This principle could form the basis of a mining strategy for accelerating discovery of function in genome sequences. We present here our screening of a sample of 11 fully sequenced fungal mitochondrial genomes by observing where exact k-mer repeats occurred several times; initial analyses motivated us to focus on 17-mers occurring more than three times. Based on the diverse repeats we observe, we propose that such screening may serve as an efficient expedient for gaining a rapid but representative first insight into the repeat landscapes of sparsely characterized mitochondrial chromosomes. Our matching of the flagged repeats to previously reported regions of interest supports the idea that systems of persisting, non-trivial repeats in genomes can often highlight features meriting further attention. Copyright © 2016 Elsevier Ltd. All rights reserved.
Slusser, Andrea; Bathula, Chandra S.; Sens, Donald A.; Somji, Seema; Sens, Mary Ann; Zhou, Xu Dong; Garrett, Scott H.
2015-01-01
Background Cultures of human proximal tubule cells have been widely utilized to study the role of EMT in renal disease. The goal of this study was to define the role of growth media composition on classic EMT responses, define the expression of E- and N-cadherin, and define the functional epitope of MT-3 that mediates MET in HK-2 cells. Methods Immunohistochemistry, microdissection, real-time PCR, western blotting, and ELISA were used to define the expression of E- and N-cadherin mRNA and protein in HK-2 and HPT cell cultures. Site-directed mutagenesis, stable transfection, measurement of transepithelial resistance and dome formation were used to define the unique amino acid sequence of MT-3 associated with MET in HK-2 cells. Results It was shown that both E- and N-cadherin mRNA and protein are expressed in the human renal proximal tubule. It was shown, based on the pattern of cadherin expression, connexin expression, vectorial active transport, and transepithelial resistance, that the HK-2 cell line has already undergone many of the early features associated with EMT. It was shown that the unique, six amino acid, C-terminal sequence of MT-3 is required for MT-3 to induce MET in HK-2 cells. Conclusions The results show that the HK-2 cell line can be an effective model to study later stages in the conversion of the renal epithelial cell to a mesenchymal cell. The HK-2 cell line, transfected with MT-3, may be an effective model to study the process of MET. The study implicates the unique C-terminal sequence of MT-3 in the conversion of HK-2 cells to display an enhanced epithelial phenotype. PMID:25803827
Conserved small mRNA with an unique, extended Shine-Dalgarno sequence
Hahn, Julia; Migur, Anzhela; von Boeselager, Raphael Freiherr; Kubatova, Nina; Kubareva, Elena; Schwalbe, Harald
2017-01-01
ABSTRACT Up to now, very small protein-coding genes have remained unrecognized in sequenced genomes. We identified an mRNA of 165 nucleotides (nt), which is conserved in Bradyrhizobiaceae and encodes a polypeptide with 14 amino acid residues (aa). The small mRNA harboring a unique Shine-Dalgarno sequence (SD) with a length of 17 nt was localized predominantly in the ribosome-containing P100 fraction of Bradyrhizobium japonicum USDA 110. Strong interaction between the mRNA and 30S ribosomal subunits was demonstrated by their co-sedimentation in sucrose density gradient. Using translational fusions with egfp, we detected weak translation and found that it is impeded by both the extended SD and the GTG start codon (instead of ATG). Biophysical characterization (CD- and NMR-spectroscopy) showed that synthesized polypeptide remained unstructured in physiological puffer. Replacement of the start codon by a stop codon increased the stability of the transcript, strongly suggesting additional posttranscriptional regulation at the ribosome. Therefore, the small gene was named rreB (ribosome-regulated expression in Bradyrhizobiaceae). Assuming that the unique ribosome binding site (RBS) is a hallmark of rreB homologs or similarly regulated genes, we looked for similar putative RBS in bacterial genomes and detected regions with at least 16 nt complementarity to the 3′-end of 16S rRNA upstream of sORFs in Caulobacterales, Rhizobiales, Rhodobacterales and Rhodospirillales. In the Rhodobacter/Roseobacter lineage of α-proteobacteria the corresponding gene (rreR) is conserved and encodes an 18 aa protein. This shows how specific RBS features can be used to identify new genes with presumably similar control of expression at the RNA level. PMID:27834614
Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji
2016-02-01
Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium along with the results of comparative genome analysis with strain 17 of the same species whose genome has also been sequenced, but not fully analysed yet. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit a high overall linearity of gene organizations, whereas the smaller chromosomes show a significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, involved in its extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and are specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Evolutionary and biophysical relationships among the papillomavirus E2 proteins.
Blakaj, Dukagjin M; Fernandez-Fuentes, Narcis; Chen, Zigui; Hegde, Rashmi; Fiser, Andras; Burk, Robert D; Brenowitz, Michael
2009-01-01
Infection by human papillomavirus (HPV) may result in clinical conditions ranging from benign warts to invasive cancer. The HPV E2 protein represses oncoprotein transcription and is required for viral replication. HPV E2 binds to palindromic DNA sequences of highly conserved four base pair sequences flanking an identical length variable 'spacer'. E2 proteins directly contact the conserved but not the spacer DNA. Variation in naturally occurring spacer sequences results in differential protein affinity that is dependent on their sensitivity to the spacer DNA's unique conformational and/or dynamic properties. This article explores the biophysical character of this core viral protein with the goal of identifying characteristics that associated with risk of virally caused malignancy. The amino acid sequence, 3d structure and electrostatic features of the E2 protein DNA binding domain are highly conserved; specific interactions with DNA binding sites have also been conserved. In contrast, the E2 protein's transactivation domain does not have extensive surfaces of highly conserved residues. Rather, regions of high conservation are localized to small surface patches. Implications to cancer biology are discussed.
Emerman, Amy B; Bowman, Sarah K; Barry, Andrew; Henig, Noa; Patel, Kruti M; Gardner, Andrew F; Hendrickson, Cynthia L
2017-07-05
Next-generation sequencing (NGS) is a powerful tool for genomic studies, translational research, and clinical diagnostics that enables the detection of single nucleotide polymorphisms, insertions and deletions, copy number variations, and other genetic variations. Target enrichment technologies improve the efficiency of NGS by only sequencing regions of interest, which reduces sequencing costs while increasing coverage of the selected targets. Here we present NEBNext Direct ® , a hybridization-based, target-enrichment approach that addresses many of the shortcomings of traditional target-enrichment methods. This approach features a simple, 7-hr workflow that uses enzymatic removal of off-target sequences to achieve a high specificity for regions of interest. Additionally, unique molecular identifiers are incorporated for the identification and filtering of PCR duplicates. The same protocol can be used across a wide range of input amounts, input types, and panel sizes, enabling NEBNext Direct to be broadly applicable across a wide variety of research and diagnostic needs. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Analysis of SINE and LINE repeat content of Y chromosomes in the platypus, Ornithorhynchus anatinus.
Kortschak, R Daniel; Tsend-Ayush, Enkhjargal; Grützner, Frank
2009-01-01
Monotremes feature an extraordinary sex-chromosome system that consists of five X and five Y chromosomes in males. These sex chromosomes share homology with bird sex chromosomes but no homology with the therian X. The genome of a female platypus was recently completed, providing unique insights into sequence and gene content of autosomes and X chromosomes, but no Y-specific sequence has so far been analysed. Here we report the isolation, sequencing and analysis of approximately 700 kb of sequence of the non-recombining regions of Y2, Y3 and Y5, which revealed differences in base composition and repeat content between autosomes and sex chromosomes, and within the sex chromosomes themselves. This provides the first insights into repeat content of Y chromosomes in platypus, which overall show similar patterns of repeat composition to Y chromosomes in other species. Interestingly, we also observed differences between the various Y chromosomes, and in combination with timing and activity patterns we provide an approach that can be used to examine the evolutionary history of the platypus sex-chromosome chain.
Beaton, Ainsley; Lood, Cédric; Cunningham-Oakes, Edward; MacFadyen, Alison; Mullins, Alex J; Bestawy, Walid El; Botelho, João; Chevalier, Sylvie; Dalzell, Chloe; Dolan, Stephen K; Faccenda, Alberto; Ghequire, Maarten G K; Higgins, Steven; Kutschera, Alexander; Murray, Jordan; Redway, Martha; Salih, Talal; Smith, Brian A; Smits, Nathan; Thomson, Ryan; Woodcock, Stuart; Cornelis, Pierre; Lavigne, Rob; van Noort, Vera
2018-01-01
Abstract Pseudomonas baetica strain a390T is the type strain of this recently described species and here we present its high-contiguity draft genome. To celebrate the 16th International Conference on Pseudomonas, the genome of P. baetica strain a390T was sequenced using a unique combination of Ion Torrent semiconductor and Oxford Nanopore methods as part of a collaborative community-led project. The use of high-quality Ion Torrent sequences with long Nanopore reads gave rapid, high-contiguity and -quality, 16-contig genome sequence. Whole genome phylogenetic analysis places P. baetica within the P. koreensis clade of the P. fluorescens group. Comparison of the main genomic features of P. baetica with a variety of other Pseudomonas spp. suggests that it is a highly adaptable organism, typical of the genus. This strain was originally isolated from the liver of a diseased wedge sole fish, and genotypic and phenotypic analyses show that it is tolerant to osmotic stress and to oxytetracycline. PMID:29579234
Mu, Zhendong; Yin, Jinhai; Hu, Jianfeng
2018-01-01
In this paper, a person authentication system that can effectively identify individuals by generating unique electroencephalogram signal features in response to self-face and non-self-face photos is presented. In order to achieve a good stability performance, the sequence of self-face photo including first-occurrence position and non-first-occurrence position are taken into account in the serial occurrence of visual stimuli. In addition, a Fisher linear classification method and event-related potential technique for feature analysis is adapted to yield remarkably better outcomes than that by most of the existing methods in the field. The results have shown that the EEG-based person authentications via brain-computer interface can be considered as a suitable approach for biometric authentication system.
DIALOG: An executive computer program for linking independent programs
NASA Technical Reports Server (NTRS)
Glatt, C. R.; Hague, D. S.; Watson, D. A.
1973-01-01
A very large scale computer programming procedure called the DIALOG Executive System has been developed for the Univac 1100 series computers. The executive computer program, DIALOG, controls the sequence of execution and data management function for a library of independent computer programs. Communication of common information is accomplished by DIALOG through a dynamically constructed and maintained data base of common information. The unique feature of the DIALOG Executive System is the manner in which computer programs are linked. Each program maintains its individual identity and as such is unaware of its contribution to the large scale program. This feature makes any computer program a candidate for use with the DIALOG Executive System. The installation and use of the DIALOG Executive System are described at Johnson Space Center.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schürpf, Thomas; Chen, Qiang; Liu, Jin-huan
Developmental endothelial cell locus-1 (Del-1) glycoprotein is secreted by endothelial cells and a subset of macrophages. Del-1 plays a regulatory role in vascular remodeling and functions in innate immunity through interaction with integrin {alpha}{sub V}{beta}{sub 3}. Del-1 contains 3 epidermal growth factor (EGF)-like repeats and 2 discoidin-like domains. An Arg-Gly-Asp (RGD) motif in the second EGF domain (EGF2) mediates adhesion by endothelial cells and phagocytes. We report the crystal structure of its 3 EGF domains. The RGD motif of EGF2 forms a type II' {beta} turn at the tip of a long protruding loop, dubbed the RGD finger. Whereas EGF2more » and EGF3 constitute a rigid rod via an interdomain calcium ion binding site, the long linker between EGF1 and EGF2 lends considerable flexibility to EGF1. Two unique O-linked glycans and 1 N-linked glycan locate to the opposite side of EGF2 from the RGD motif. These structural features favor integrin binding of the RGD finger. Mutagenesis data confirm the importance of having the RGD motif at the tip of the RGD finger. A database search for EGF domain sequences shows that this RGD finger is likely an evolutionary insertion and unique to the EGF domain of Del-1 and its homologue milk fat globule-EGF 8. The RGD finger of Del-1 is a unique structural feature critical for integrin binding.« less
Hobbs, A A; Rosen, J M
1982-01-01
The complete sequences of rat alpha- and gamma-casein mRNAs have been determined. The 1402-nucleotide alpha- and 864-nucleotide gamma-casein mRNAs both encode 15 amino acid signal peptides and mature proteins of 269 and 164 residues, respectively. Considerable homology between the 5' non-coding regions, and the regions encoding the signal peptides and the phosphorylation sites, in these mRNAs as compared to several other rodent casein mRNAs, was observed. Significant homology was also detected between rat alpha- and bovine alpha s1-casein. Comparison of the rodent and bovine sequences suggests that the caseins evolved at about the time of the appearance of the primitive mammals. This may have occurred by intragenic duplication of a nucleotide sequence encoding a primitive phosphorylation site, -(Ser)n-Glu-Glu-, and intergenic duplication resulting in the small casein multigene family. A unique feature of the rat alpha-casein sequence is an insertion in the coding region containing 10 repeated elements of 18 nucleotides each. This insertion appears to have occurred 7-12 million years ago, just prior to the divergence of rat and mouse. Images PMID:6298707
The MB2 gene family of Plasmodium species has a unique combination of S1 and GTP-binding domains
Romero, Lisa C; Nguyen, Thanh V; Deville, Benoit; Ogunjumo, Oluwasanmi; James, Anthony A
2004-01-01
Background Identification and characterization of novel Plasmodium gene families is necessary for developing new anti-malarial therapeutics. The products of the Plasmodium falciparum gene, MB2, were shown previously to have a stage-specific pattern of subcellular localization and proteolytic processing. Results Genes homologous to MB2 were identified in five additional parasite species, P. knowlesi, P. gallinaceum, P. berghei, P. yoelii, and P. chabaudi. Sequence comparisons among the MB2 gene products reveal amino acid conservation of structural features, including putative S1 and GTP-binding domains, and putative signal peptides and nuclear localization signals. Conclusions The combination of domains is unique to this gene family and indicates that MB2 genes comprise a novel family and therefore may be a good target for drug development. PMID:15222903
Sahl, Jason W; Johnson, J Kristie; Harris, Anthony D; Phillippy, Adam M; Hsiao, William W; Thom, Kerri A; Rasko, David A
2011-06-04
Acinetobacter baumannii has recently emerged as a significant global pathogen, with a surprisingly rapid acquisition of antibiotic resistance and spread within hospitals and health care institutions. This study examines the genomic content of three A. baumannii strains isolated from distinct body sites. Isolates from blood, peri-anal, and wound sources were examined in an attempt to identify genetic features that could be correlated to each isolation source. Pulsed-field gel electrophoresis, multi-locus sequence typing and antibiotic resistance profiles demonstrated genotypic and phenotypic variation. Each isolate was sequenced to high-quality draft status, which allowed for comparative genomic analyses with existing A. baumannii genomes. A high resolution, whole genome alignment method detailed the phylogenetic relationships of sequenced A. baumannii and found no correlation between phylogeny and body site of isolation. This method identified genomic regions unique to both those isolates found on the surface of the skin or in wounds, termed colonization isolates, and those identified from body fluids, termed invasive isolates; these regions may play a role in the pathogenesis and spread of this important pathogen. A PCR-based screen of 74 A. baumanii isolates demonstrated that these unique genes are not exclusive to either phenotype or isolation source; however, a conserved genomic region exclusive to all sequenced A. baumannii was identified and verified. The results of the comparative genome analysis and PCR assay show that A. baumannii is a diverse and genomically variable pathogen that appears to have the potential to cause a range of human disease regardless of the isolation source.
Pan, Zhangyuan; Li, Shengdi; Liu, Qiuyue; Wang, Zhen; Zhou, Zhengkui; Di, Ran; Miao, Benpeng; Hu, Wenping; Wang, Xiangyu; Hu, Xiaoxiang; Xu, Ze; Wei, Dongkai; He, Xiaoyun; Yuan, Liyun; Guo, Xiaofei; Liang, Benmeng; Wang, Ruichao; Li, Xiaoyu; Cao, Xiaohan; Dong, Xinlong; Xia, Qing; Shi, Hongcai; Hao, Geng; Yang, Jean; Luosang, Cuicheng; Zhao, Yiqiang; Jin, Mei; Zhang, Yingjie; Lv, Shenjin; Li, Fukuan; Ding, Guohui; Chu, Mingxing; Li, Yixue
2018-01-01
Abstract Background Animal domestication has been extensively studied, but the process of feralization remains poorly understood. Results Here, we performed whole-genome sequencing of 99 sheep and identified a primary genetic divergence between 2 heterogeneous populations in the Tibetan Plateau, including 1 semi-feral lineage. Selective sweep and candidate gene analysis revealed local adaptations of these sheep associated with sensory perception, muscle strength, eating habit, mating process, and aggressive behavior. In particular, a horn-related gene, RXFP2, showed signs of rapid evolution specifically in the semi-feral breeds. A unique haplotype and repressed horn-related tissue expression of RXFP2 were correlated with higher horn length, as well as spiral and horizontally extended horn shape. Conclusions Semi-feralization has an extensive impact on diverse phenotypic traits of sheep. By acquiring features like those of their wild ancestors, semi-feral sheep were able to regain fitness while in frequent contact with wild surroundings and rare human interventions. This study provides a new insight into the evolution of domestic animals when human interventions are no longer dominant. PMID:29668959
Description of the PMAD DC test bed architecture and integration sequence
NASA Technical Reports Server (NTRS)
Beach, R. F.; Trash, L.; Fong, D.; Bolerjack, B.
1991-01-01
NASA-Lewis is responsible for the development, fabrication, and assembly of the electric power system (EPS) for the Space Station Freedom (SSF). The SSF power system is radically different from previous spacecraft power systems in both the size and complexity of the system. Unlike past spacecraft power system the SSF EPS will grow and be maintained on orbit and must be flexible to meet changing user power needs. The SSF power system is also unique in comparison with terrestrial power systems because it is dominated by power electronic converters which regulate and control the power. Although spacecraft historically have used power converters for regulation they typically involved only a single series regulating element. The SSF EPS involves multiple regulating elements, two or more in series, prior to the load. These unique system features required the construction of a testbed which would allow the development of spacecraft power system technology. A description is provided of the Power Management and Distribution (PMAD) DC Testbed which was assembled to support the design and early evaluation of the SSF EPS. A description of the integration process used in the assembly sequence is also given along with a description of the support facility.
Apple miRNAs and tasiRNAs with novel regulatory networks
2012-01-01
Background MicroRNAs (miRNAs) and their regulatory functions have been extensively characterized in model species but whether apple has evolved similar or unique regulatory features remains unknown. Results We performed deep small RNA-seq and identified 23 conserved, 10 less-conserved and 42 apple-specific miRNAs or families with distinct expression patterns. The identified miRNAs target 118 genes representing a wide range of enzymatic and regulatory activities. Apple also conserves two TAS gene families with similar but unique trans-acting small interfering RNA (tasiRNA) biogenesis profiles and target specificities. Importantly, we found that miR159, miR828 and miR858 can collectively target up to 81 MYB genes potentially involved in diverse aspects of plant growth and development. These miRNA target sites are differentially conserved among MYBs, which is largely influenced by the location and conservation of the encoded amino acid residues in MYB factors. Finally, we found that 10 of the 19 miR828-targeted MYBs undergo small interfering RNA (siRNA) biogenesis at the 3' cleaved, highly divergent transcript regions, generating over 100 sequence-distinct siRNAs that potentially target over 70 diverse genes as confirmed by degradome analysis. Conclusions Our work identified and characterized apple miRNAs, their expression patterns, targets and regulatory functions. We also discovered that three miRNAs and the ensuing siRNAs exploit both conserved and divergent sequence features of MYB genes to initiate distinct regulatory networks targeting a multitude of genes inside and outside the MYB family. PMID:22704043
Linkage map of the fragments of herpesvirus papio DNA.
Lee, Y S; Tanaka, A; Lau, R Y; Nonoyama, M; Rabin, H
1981-01-01
Herpesvirus papio (HVP), an Epstein-Barr-like virus, causes lymphoblastoid disease in baboons. The physical map of HVP DNA was constructed for the fragments produced by cleavage of HVP DNA with restriction endonucleases EcoRI, HindIII, SalI, and PvuI, which produced 12, 12, 10, and 4 fragments, respectively. The total molecular size of HVP DNA was calculated as close to 110 megadaltons. The following methods were used for construction of the map; (i) fragments near the ends of HVP DNA were identified by treating viral DNA with lambda exonuclease before restriction enzyme digestion; (ii) fragments containing nucleotide sequences in common with fragments from the second enzyme digest of HVP DNA were examined by Southern blot hybridization; and (iii) the location of some fragments was determined by isolating individual fragments from agarose gels and redigesting the isolated fragments with a second restriction enzyme. Terminal heterogeneity and internal repeats were found to be unique features of HVP DNA molecule. One to five repeats of 0.8 megadaltons were found at both terminal ends. Although the repeats of both ends shared a certain degree of homology, it was not determined whether they were identical repeats. The internal repeat sequence of HVP DNA was found in the EcoRI-C region, which extended from 8.4 to 23 megadaltons from the left end of the molecule. The average number of the repeats was calculated to be seven, and the molecular size was determined to be 1.8 megadaltons. Similar unique features have been reported in EBV DNA (D. Given and E. Kieff, J. Virol. 28:524-542, 1978). Images PMID:6261015
RECOVIR Software for Identifying Viruses
NASA Technical Reports Server (NTRS)
Chakravarty, Sugoto; Fox, George E.; Zhu, Dianhui
2013-01-01
Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Determining the capsid residues or nucleotides that uniquely characterize these strains is critical in understanding the strain diversity of these viruses. RECOVIR (an acronym for "recognize viruses") software predicts the strains of some ssRNA viruses from their limited sequence data. Novel phylogenetic-tree-based databases of protein or nucleic acid residues that uniquely characterize these virus strains are created. Strains of input virus sequences (partial or complete) are predicted through residue-wise comparisons with the databases. RECOVIR uses unique characterizing residues to identify automatically strains of partial or complete capsid sequences of picorna and caliciviruses, two of the most highly diverse ssRNA virus families. Partition-wise comparisons of the database residues with the corresponding residues of more than 300 complete and partial sequences of these viruses resulted in correct strain identification for all of these sequences. This study shows the feasibility of creating databases of hitherto unknown residues uniquely characterizing the capsid sequences of two of the most highly divergent ssRNA virus families. These databases enable automated strain identification from partial or complete capsid sequences of these human and animal pathogens.
In vivo insertion pool sequencing identifies virulence factors in a complex fungal–host interaction
Uhse, Simon; Pflug, Florian G.; Stirnberg, Alexandra; Ehrlinger, Klaus; von Haeseler, Arndt
2018-01-01
Large-scale insertional mutagenesis screens can be powerful genome-wide tools if they are streamlined with efficient downstream analysis, which is a serious bottleneck in complex biological systems. A major impediment to the success of next-generation sequencing (NGS)-based screens for virulence factors is that the genetic material of pathogens is often underrepresented within the eukaryotic host, making detection extremely challenging. We therefore established insertion Pool-Sequencing (iPool-Seq) on maize infected with the biotrophic fungus U. maydis. iPool-Seq features tagmentation, unique molecular barcodes, and affinity purification of pathogen insertion mutant DNA from in vivo-infected tissues. In a proof of concept using iPool-Seq, we identified 28 virulence factors, including 23 that were previously uncharacterized, from an initial pool of 195 candidate effector mutants. Because of its sensitivity and quantitative nature, iPool-Seq can be applied to any insertional mutagenesis library and is especially suitable for genetically complex setups like pooled infections of eukaryotic hosts. PMID:29684023
Stone, Jonathan W; Bleckley, Samuel; Lavelle, Sean; Schroeder, Susan J
2015-01-01
We present new modifications to the Wuchty algorithm in order to better define and explore possible conformations for an RNA sequence. The new features, including parallelization, energy-independent lonely pair constraints, context-dependent chemical probing constraints, helix filters, and optional multibranch loops, provide useful tools for exploring the landscape of RNA folding. Chemical probing alone may not necessarily define a single unique structure. The helix filters and optional multibranch loops are global constraints on RNA structure that are an especially useful tool for generating models of encapsidated viral RNA for which cryoelectron microscopy or crystallography data may be available. The computations generate a combinatorially complete set of structures near a free energy minimum and thus provide data on the density and diversity of structures near the bottom of a folding funnel for an RNA sequence. The conformational landscapes for some RNA sequences may resemble a low, wide basin rather than a steep funnel that converges to a single structure.
Rusniok, Christophe; Lomma, Mariella; Dervins-Ravault, Delphine; Newton, Hayley J.; Sansom, Fiona M.; Jarraud, Sophie; Zidane, Nora; Ma, Laurence; Bouchier, Christiane; Etienne, Jerôme; Hartland, Elizabeth L.; Buchrieser, Carmen
2010-01-01
Legionella pneumophila and L. longbeachae are two species of a large genus of bacteria that are ubiquitous in nature. L. pneumophila is mainly found in natural and artificial water circuits while L. longbeachae is mainly present in soil. Under the appropriate conditions both species are human pathogens, capable of causing a severe form of pneumonia termed Legionnaires' disease. Here we report the sequencing and analysis of four L. longbeachae genomes, one complete genome sequence of L. longbeachae strain NSW150 serogroup (Sg) 1, and three draft genome sequences another belonging to Sg1 and two to Sg2. The genome organization and gene content of the four L. longbeachae genomes are highly conserved, indicating strong pressure for niche adaptation. Analysis and comparison of L. longbeachae strain NSW150 with L. pneumophila revealed common but also unexpected features specific to this pathogen. The interaction with host cells shows distinct features from L. pneumophila, as L. longbeachae possesses a unique repertoire of putative Dot/Icm type IV secretion system substrates, eukaryotic-like and eukaryotic domain proteins, and encodes additional secretion systems. However, analysis of the ability of a dotA mutant of L. longbeachae NSW150 to replicate in the Acanthamoeba castellanii and in a mouse lung infection model showed that the Dot/Icm type IV secretion system is also essential for the virulence of L. longbeachae. In contrast to L. pneumophila, L. longbeachae does not encode flagella, thereby providing a possible explanation for differences in mouse susceptibility to infection between the two pathogens. Furthermore, transcriptome analysis revealed that L. longbeachae has a less pronounced biphasic life cycle as compared to L. pneumophila, and genome analysis and electron microscopy suggested that L. longbeachae is encapsulated. These species-specific differences may account for the different environmental niches and disease epidemiology of these two Legionella species. PMID:20174605
Oligo Design: a computer program for development of probes for oligonucleotide microarrays.
Herold, Keith E; Rasooly, Avraham
2003-12-01
Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
de la Bastide, Paul Y; Leung, Wai Lam; Hintz, William E
2015-01-01
The ITS region of the rDNA gene was compared for Saprolegnia spp. in order to improve our understanding of nucleotide sequence variability within and between species of this genus, determine species composition in Canadian fin fish aquaculture facilities, and to assess the utility of ITS sequence variability in genetic marker development. From a collection of more than 400 field isolates, ITS region nucleotide sequences were studied and it was determined that there was sufficient consistent inter-specific variation to support the designation of species identity based on ITS sequence data. This non-subjective approach to species identification does not rely upon transient morphological features. Phylogenetic analyses comparing our ITS sequences and species designations with data from previous studies generally supported the clade scheme of Diéguez-Uribeondo et al. (2007) and found agreement with the molecular taxonomic cluster system of Sandoval-Sierra et al. (2014). Our Canadian ITS sequence collection will thus contribute to the public database and assist the clarification of Saprolegnia spp. taxonomy. The analysis of ITS region sequence variability facilitated genus- and species-level identification of unknown samples from aquaculture facilities and provided useful information on species composition. A unique ITS-RFLP for the identification of S. parasitica was also described. Copyright © 2014 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Secondary structural entropy in RNA switch (Riboswitch) identification.
Manzourolajdad, Amirhossein; Arnold, Jonathan
2015-04-28
RNA regulatory elements play a significant role in gene regulation. Riboswitches, a widespread group of regulatory RNAs, are vital components of many bacterial genomes. These regulatory elements generally function by forming a ligand-induced alternative fold that controls access to ribosome binding sites or other regulatory sites in RNA. Riboswitch-mediated mechanisms are ubiquitous across bacterial genomes. A typical class of riboswitch has its own unique structural and biological complexity, making de novo riboswitch identification a formidable task. Traditionally, riboswitches have been identified through comparative genomics based on sequence and structural homology. The limitations of structural-homology-based approaches, coupled with the assumption that there is a great diversity of undiscovered riboswitches, suggests the need for alternative methods for riboswitch identification, possibly based on features intrinsic to their structure. As of yet, no such reliable method has been proposed. We used structural entropy of riboswitch sequences as a measure of their secondary structural dynamics. Entropy values of a diverse set of riboswitches were compared to that of their mutants, their dinucleotide shuffles, and their reverse complement sequences under different stochastic context-free grammar folding models. Significance of our results was evaluated by comparison to other approaches, such as the base-pairing entropy and energy landscapes dynamics. Classifiers based on structural entropy optimized via sequence and structural features were devised as riboswitch identifiers and tested on Bacillus subtilis, Escherichia coli, and Synechococcus elongatus as an exploration of structural entropy based approaches. The unusually long untranslated region of the cotH in Bacillus subtilis, as well as upstream regions of certain genes, such as the sucC genes were associated with significant structural entropy values in genome-wide examinations. Various tests show that there is in fact a relationship between higher structural entropy and the potential for the RNA sequence to have alternative structures, within the limitations of our methodology. This relationship, though modest, is consistent across various tests. Understanding the behavior of structural entropy as a fairly new feature for RNA conformational dynamics, however, may require extensive exploratory investigation both across RNA sequences and folding models.
Gupta, Radhey S
2012-11-01
The origin of photosynthesis and how this capability has spread to other bacterial phyla remain important unresolved questions. I describe here a number of conserved signature indels (CSIs) in key proteins involved in bacteriochlorophyll (Bchl) biosynthesis that provide important insights in these regards. The proteins BchL and BchX, which are essential for Bchl biosynthesis, are derived by gene duplication in a common ancestor of all phototrophs. More ancient gene duplication gave rise to the BchX-BchL proteins and the NifH protein of the nitrogenase complex. The sequence alignment of NifH-BchX-BchL proteins contain two CSIs that are uniquely shared by all NifH and BchX homologs, but not by any BchL homologs. These CSIs and phylogenetic analysis of NifH-BchX-BchL protein sequences strongly suggest that the BchX homologs are ancestral to BchL and that the Bchl-based anoxygenic photosynthesis originated prior to the chlorophyll (Chl)-based photosynthesis in cyanobacteria. Another CSI in the BchX-BchL sequence alignment that is uniquely shared by all BchX homologs and the BchL sequences from Heliobacteriaceae, but absent in all other BchL homologs, suggests that the BchL homologs from Heliobacteriaceae are primitive in comparison to all other photosynthetic lineages. Several other identified CSIs in the BchN homologs are commonly shared by all proteobacterial homologs and a clade consisting of the marine unicellular Cyanobacteria (Clade C). These CSIs in conjunction with the results of phylogenetic analyses and pair-wise sequence similarity on the BchL, BchN, and BchB proteins, where the homologs from Clade C Cyanobacteria and Proteobacteria exhibited close relationship, provide strong evidence that these two groups have incurred lateral gene transfers. Additionally, phylogenetic analyses and several CSIs in the BchL-N-B proteins that are uniquely shared by all Chlorobi and Chloroflexi homologs provide evidence that the genes for these proteins have also been laterally transferred between these groups. Other results and observations reported here indicate that the genes for the BchL-N-B proteins in Proteobacteria are derived from the Clade C Cyanobacteria, whereas those in Chlorobi were acquired from Chloroflexus or related bacteria by means of LGTs. Some implications of these observations regarding the origin and spread of photosynthesis are discussed.
Working the kinks out of nucleosomal DNA
Olson, Wilma K.; Zhurkin, Victor B.
2011-01-01
Condensation of DNA in the nucleosome takes advantage of its double-helical architecture. The DNA deforms at sites where the base pairs face the histone octamer. The largest so-called kink-and-slide deformations occur in the vicinity of arginines that penetrate the minor groove. Nucleosome structures formed from the 601 positioning sequence differ subtly from those incorporating an AT-rich human α-satellite DNA. Restraints imposed by the histone arginines on the displacement of base pairs can modulate the sequence-dependent deformability of DNA and potentially contribute to the unique features of the different nucleosomes. Steric barriers mimicking constraints found in the nucleosome induce the simulated large-scale rearrangement of canonical B-DNA to kink-and-slide states. The pathway to these states shows non-harmonic behavior consistent with bending profiles inferred from AFM measurements. PMID:21482100
Bhattacharya, Pamela; Barnebey, Adam; Zemla, Marcin; ...
2015-10-05
Thermoanaerobacter thermohydrosulfuricus BSB-33 is a thermophilic gram positive obligate anaerobe isolated from a hot spring in West Bengal, India. Unlike other T. thermohydrosulfuricus strains, BSB-33 is able to anaerobically reduce Fe(III) and Cr(VI) optimally at 60 °C. BSB-33 is the first Cr(VI) reducing T. thermohydrosulfuricus genome sequenced and of particular interest for bioremediation of environmental chromium contaminations. Here we discuss features of T. thermohydrosulfuricus BSB-33 and the unique genetic elements that may account for the peculiar metal reducing properties of this organism. The T. thermohydrosulfuricus BSB-33 genome comprises 2597606 bp encoding 2581 protein genes, 12 rRNA, 193 pseudogenes and hasmore » a G + C content of 34.20 %. Lastly, putative chromate reductases were identified by comparative analyses with other Thermoanaerobacter and chromate-reducing bacteria.« less
Microfluidics for genome-wide studies involving next generation sequencing
Murphy, Travis W.; Lu, Chang
2017-01-01
Next-generation sequencing (NGS) has revolutionized how molecular biology studies are conducted. Its decreasing cost and increasing throughput permit profiling of genomic, transcriptomic, and epigenomic features for a wide range of applications. Microfluidics has been proven to be highly complementary to NGS technology with its unique capabilities for handling small volumes of samples and providing platforms for automation, integration, and multiplexing. In this article, we review recent progress on applying microfluidics to facilitate genome-wide studies. We emphasize on several technical aspects of NGS and how they benefit from coupling with microfluidic technology. We also summarize recent efforts on developing microfluidic technology for genomic, transcriptomic, and epigenomic studies, with emphasis on single cell analysis. We envision rapid growth in these directions, driven by the needs for testing scarce primary cell samples from patients in the context of precision medicine. PMID:28396707
High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.
Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie
2015-06-17
High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal to noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements display unique size signatures. We show how cell-free fragments occur in clusters along the genome, localizing to nucleosomal arrays and are preferentially cleaved at linker regions by correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site show a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments. This sequence structure can be harnessed to improve bioinformatics algorithms, in particular for CNV and structural variant detection. Descriptive measures for cell-free DNA features developed here could also be used in biomarker analysis to monitor the changes that occur during different pathological conditions.
Iacobino, Angelo; Scalfaro, Concetta; Franciosa, Giovanna
2013-01-01
We determined the genetic maps of the megaplasmids of six neutoroxigenic Clostridium butyricum type E strains from Italy using molecular and bioinformatics techniques. The megaplasmids are circular, not linear as we had previously proposed. The differently-sized megaplasmids share a genetic region that includes structural, metabolic and regulatory genes. In addition, we found that a 168 kb genetic region is present only in the larger megaplasmids of two tested strains, whereas it is absent from the smaller megaplasmids of the four remaining strains. The genetic region unique to the larger megaplasmids contains, among other features, a locus for clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR associated (cas) genes, i.e. a bacterial adaptive immune system providing sequence-specific protection from invading genetic elements. Some CRISPR spacer sequences of the neurotoxigenic C. butyricum type E strains showed homology to prophage, phage and plasmid sequences from closely related clostridia species or from distant species, all sharing the intestinal habitat, suggesting that the CRISPR locus might be involved in the microorganism adaptation to the human or animal intestinal environment. Besides, we report here that each of four distinct CRISPR spacers partially matched DNA sequences of different prophages and phages, at identical nucleotide locations. This suggests that, at least in neurotoxigenic C. butyricum type E, the CRISPR locus is potentially able to recognize the same conserved DNA sequence of different invading genetic elements, besides targeting sequences unique to previously encountered invading DNA, as currently predicted for a CRISPR locus. Thus, the results of this study introduce the possibility that CRISPR loci can provide resistance to a wider range of invading DNA elements than previously appreciated. Whether it is more advantageous for the peculiar neurotoxigenic C. butyricum type E strains to maintain or to lose the CRISPR-cas system remains an open question. PMID:23967192
Familiari, Giuseppe; Heyn, Rosemarie; Relucenti, Michela; Nottola, Stefania A; Sathananthan, A Henry
2006-01-01
This study describes the updated, fine structure of human gametes, the human fertilization process, and human embryos, mainly derived from assisted reproductive technology (ART). As clearly shown, the ultrastructure of human reproduction is a peculiar multistep process, which differs in part from that of other mammalian models, having some unique features. Particular attention has been devoted to the (1) sperm ultrastructure, likely "Tygerberg (Kruger) strict morphology criteria"; (2) mature oocyte, in which the MII spindle is barrel shaped, anastral, and lacking centrioles; (3) three-dimensional microarchitecture of the zona pellucida with its unique supramolecular filamentous organization; (4) sperm-egg interactions with the peculiarity of the sperm centrosome that activates the egg and organizes the sperm aster and mitotic spindles of the embryo; and (5) presence of viable cumulus cells whose metabolic activity is closely related to egg and embryo behavior in in vitro as well as in vivo conditions, in a sort of extraovarian "microfollicular unit." Even if the ultrastructural morphodynamic features of human fertilization are well understood, our knowledge about in vivo fertilization is still very limited and the complex sequence of in vivo biological steps involved in human reproduction is only partially reproduced in current ART procedures.
Gurvich, Olga L.; Näsvall, S. Joakim; Baranov, Pavel V.; Björk, Glenn R.; Atkins, John F.
2011-01-01
The bacterial pheL gene encodes the leader peptide for the phenylalanine biosynthetic operon. Translation of pheL mRNA controls transcription attenuation and, consequently, expression of the downstream pheA gene. Fifty-three unique pheL genes have been identified in sequenced genomes of the gamma subdivision. There are two groups of pheL genes, both of which are short and contain a run(s) of phenylalanine codons at an internal position. One group is somewhat diverse and features different termination and 5′-flanking codons. The other group, mostly restricted to Enterobacteria and including Escherichia coli pheL, has a conserved nucleotide sequence that ends with UUC_CCC_UGA. When these three codons in E. coli pheL mRNA are in the ribosomal E-, P- and A-sites, there is an unusually high level, 15%, of +1 ribosomal frameshifting due to features of the nascent peptide sequence that include the penultimate phenylalanine. This level increases to 60% with a natural, heterologous, nascent peptide stimulator. Nevertheless, studies with different tRNAPro mutants in Salmonella enterica suggest that frameshifting at the end of pheL does not influence expression of the downstream pheA. This finding of incidental, rather than utilized, frameshifting is cautionary for other studies of programmed frameshifting. PMID:21177642
Mercury's surface: Preliminary description and interpretation from Mariner 10 pictures
Murray, B.C.; Belton, M.J.S.; Danielson, G. Edward; Davies, M.E.; Gault, D.E.; Hapke, B.; O'Leary, B.; Strom, R.G.; Suomi, V.; Trask, N.
1974-01-01
The surface morphology and optical properties of Mercury resemble those of the moon in remarkable detail and record a very similar sequence of events. Chemical and mineralogical similarity of the outer layers of Mercury and the moon is implied; Mercury is probably a differentiated planet with a large iron-rich core. Differentiation is inferred to have occurred very early. No evidence of atmospheric modification of landforms has been found. Large-scale scarps and ridges unlike lunar or martian features may reflect a unique period of planetary compression near the end of heavy bombardment by small planetesimals.
Hadjikyriacou, Andrea; Yang, Yanzhong; Espejo, Alexsandra; Bedford, Mark T.; Clarke, Steven G.
2015-01-01
Human protein arginine methyltransferase (PRMT) 9 symmetrically dimethylates arginine residues on splicing factor SF3B2 (SAP145) and has been functionally linked to the regulation of alternative splicing of pre-mRNA. Site-directed mutagenesis studies on this enzyme and its substrate had revealed essential unique residues in the double E loop and the importance of the C-terminal duplicated methyltransferase domain. In contrast to what had been observed with other PRMTs and their physiological substrates, a peptide containing the methylatable Arg-508 of SF3B2 was not recognized by PRMT9 in vitro. Although amino acid substitutions of residues surrounding Arg-508 had no great effect on PRMT9 recognition of SF3B2, moving the arginine residue within this sequence abolished methylation. PRMT9 and PRMT5 are the only known mammalian enzymes capable of forming symmetric dimethylarginine (SDMA) residues as type II PRMTs. We demonstrate here that the specificity of these enzymes for their substrates is distinct and not redundant. The loss of PRMT5 activity in mouse embryo fibroblasts results in almost complete loss of SDMA, suggesting that PRMT5 is the primary SDMA-forming enzyme in these cells. PRMT9, with its duplicated methyltransferase domain and conserved sequence in the double E loop, appears to have a unique structure and specificity among PRMTs for methylating SF3B2 and potentially other polypeptides. PMID:25979344
Zhao, Shanrong; Zhang, Ying; Gamini, Ramya; Zhang, Baohong; von Schack, David
2018-03-19
To allow efficient transcript/gene detection, highly abundant ribosomal RNAs (rRNA) are generally removed from total RNA either by positive polyA+ selection or by rRNA depletion (negative selection) before sequencing. Comparisons between the two methods have been carried out by various groups, but the assessments have relied largely on non-clinical samples. In this study, we evaluated these two RNA sequencing approaches using human blood and colon tissue samples. Our analyses showed that rRNA depletion captured more unique transcriptome features, whereas polyA+ selection outperformed rRNA depletion with higher exonic coverage and better accuracy of gene quantification. For blood- and colon-derived RNAs, we found that 220% and 50% more reads, respectively, would have to be sequenced to achieve the same level of exonic coverage in the rRNA depletion method compared with the polyA+ selection method. Therefore, in most cases we strongly recommend polyA+ selection over rRNA depletion for gene quantification in clinical RNA sequencing. Our evaluation revealed that a small number of lncRNAs and small RNAs made up a large fraction of the reads in the rRNA depletion RNA sequencing data. Thus, we recommend that these RNAs are specifically depleted to improve the sequencing depth of the remaining RNAs.
Micro- and nanofluidic systems in devices for biological, medical and environmental research
NASA Astrophysics Data System (ADS)
Evstrapov, A. A.
2017-11-01
The use of micro- and nanofluidic systems in modern analytical instruments allow you to implement a number of unique opportunities and achieve ultra-high measurement sensitivity. The possibility of manipulation of the individual biological objects (cells, bacteria, viruses, proteins, nucleic acids) in a liquid medium caused the development of devices on microchip platform for methods: chromatographic and electrophoretic analyzes; polymerase chain reaction; sequencing of nucleic acids; immunoassay; cytometric studies. Development of micro and nano fabrication technologies, materials science, surface chemistry, analytical chemistry, cell engineering have led to the creation of a unique systems such as “lab-on-a-chip”, “human-on-a-chip” and other. This article discusses common in microfluidics materials and methods of making functional structures. Examples of integration of nanoscale structures in microfluidic devices for the implementation of new features and improve the technical characteristics of devices and systems are shown.
Advancements in zebrafish applications for 21st century toxicology.
Garcia, Gloria R; Noyes, Pamela D; Tanguay, Robert L
2016-05-01
The zebrafish model is the only available high-throughput vertebrate assessment system, and it is uniquely suited for studies of in vivo cell biology. A sequenced and annotated genome has revealed a large degree of evolutionary conservation in comparison to the human genome. Due to our shared evolutionary history, the anatomical and physiological features of fish are highly homologous to humans, which facilitates studies relevant to human health. In addition, zebrafish provide a very unique vertebrate data stream that allows researchers to anchor hypotheses at the biochemical, genetic, and cellular levels to observations at the structural, functional, and behavioral level in a high-throughput format. In this review, we will draw heavily from toxicological studies to highlight advances in zebrafish high-throughput systems. Breakthroughs in transgenic/reporter lines and methods for genetic manipulation, such as the CRISPR-Cas9 system, will be comprised of reports across diverse disciplines. Copyright © 2016 Elsevier Inc. All rights reserved.
Advancements in zebrafish applications for 21st century toxicology
Garcia, Gloria R.; Noyes, Pamela D.; Tanguay, Robert L.
2016-01-01
The zebrafish model is the only available high-throughput vertebrate assessment system, and it is uniquely suited for studies of in vivo cell biology. A sequenced and annotated genome has revealed a large degree of evolutionary conservation in comparison to the human genome. Due to our shared evolutionary history, the anatomical and physiological features of fish are highly homologous to humans, which facilitates studies relevant to human health. In addition, zebrafish provide a very unique vertebrate data stream that allows researchers to anchor hypotheses at the biochemical, genetic, and cellular levels to observations at the structural, functional, and behavioral level in a high-throughput format. In this review, we will draw heavily from toxicological studies to highlight advances in zebrafish high-throughput systems. Breakthroughs in transgenic/reporter lines and methods for genetic manipulation, such as the CRISPR-Cas9 system, will be comprised of reports across diverse disciplines. PMID:27016469
Brody, Thomas; Yavatkar, Amarendra S; Park, Dong Sun; Kuzin, Alexander; Ross, Jermaine; Odenwald, Ward F
2017-06-01
Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest. We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then view either individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also opt to select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. Using EvoPrinter search and alignment programs, we show the following: 1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; 2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; 3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses, and 4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a single genome. EvoPrinter allows for the assessment of lineage complexity within Flavivirus or Filovirus outbreaks, identification of recombinant strains, highlights sequences that have undergone host cell A-to-I editing, and identifies unique input and database SNPs within highly conserved sequences. EvoPrinter's ability to superimpose alignment data from hundreds of strains onto a single genome has allowed us to identify unique Zika virus sublineages that are currently spreading in South, Central and North America, the Caribbean, and in China. This new set of integrated alignment programs should serve as a useful addition to existing tools for the comparative analysis of these viruses.
An iris recognition algorithm based on DCT and GLCM
NASA Astrophysics Data System (ADS)
Feng, G.; Wu, Ye-qing
2008-04-01
With the enlargement of mankind's activity range, the significance for person's status identity is becoming more and more important. So many different techniques for person's status identity were proposed for this practical usage. Conventional person's status identity methods like password and identification card are not always reliable. A wide variety of biometrics has been developed for this challenge. Among those biologic characteristics, iris pattern gains increasing attention for its stability, reliability, uniqueness, noninvasiveness and difficult to counterfeit. The distinct merits of the iris lead to its high reliability for personal identification. So the iris identification technique had become hot research point in the past several years. This paper presents an efficient algorithm for iris recognition using gray-level co-occurrence matrix(GLCM) and Discrete Cosine transform(DCT). To obtain more representative iris features, features from space and DCT transformation domain are extracted. Both GLCM and DCT are applied on the iris image to form the feature sequence in this paper. The combination of GLCM and DCT makes the iris feature more distinct. Upon GLCM and DCT the eigenvector of iris extracted, which reflects features of spatial transformation and frequency transformation. Experimental results show that the algorithm is effective and feasible with iris recognition.
Evolutionary insights from Erwinia amylovora genomics.
Smits, Theo H M; Rezzonico, Fabio; Duffy, Brion
2011-08-20
Evolutionary genomics is coming into focus with the recent availability of complete sequences for many bacterial species. A hypothesis on the evolution of virulence factors in the plant pathogen Erwinia amylovora, the causative agent of fire blight, was generated using comparative genomics with the genomes E. amylovora, Erwinia pyrifoliae and Erwinia tasmaniensis. Putative virulence factors were mapped to the proposed genealogy of the genus Erwinia that is based on phylogenetic and genomic data. Ancestral origin of several virulence factors was identified, including levan biosynthesis, sorbitol metabolism, three T3SS and two T6SS. Other factors appeared to have been acquired after divergence of pathogenic species, including a second flagellar gene and two glycosyltransferases involved in amylovoran biosynthesis. E. amylovora singletons include 3 unique T3SS effectors that may explain differential virulence/host ranges. E. amylovora also has a unique T1SS export system, and a unique third T6SS gene cluster. Genetic analysis revealed signatures of foreign DNA suggesting that horizontal gene transfer is responsible for some of these differential features between the three species. Copyright © 2010 Elsevier B.V. All rights reserved.
Mirza, Babur S.; Muruganandam, Subathra; Meng, Xianyu; Sorensen, Darwin L.; Dupont, R. Ryan
2014-01-01
Basin-fill aquifers of the Southwestern United States are associated with elevated concentrations of arsenic (As) in groundwater. Many private domestic wells in the Cache Valley Basin, UT, have As concentrations in excess of the U.S. EPA drinking water limit. Thirteen sediment cores were collected from the center of the valley at the depth of the shallow groundwater and were sectioned into layers based on redoxmorphic features. Three of the layers, two from redox transition zones and one from a depletion zone, were used to establish microcosms. Microcosms were treated with groundwater (GW) or groundwater plus glucose (GW+G) to investigate the extent of As reduction in relation to iron (Fe) transformation and characterize the microbial community structure and function by sequencing 16S rRNA and arsenate dissimilatory reductase (arrA) genes. Under the carbon-limited conditions of the GW treatment, As reduction was independent of Fe reduction, despite the abundance of sequences related to Geobacter and Shewanella, genera that include a variety of dissimilatory iron-reducing bacteria. The addition of glucose, an electron donor and carbon source, caused substantial shifts toward domination of the bacterial community by Clostridium-related organisms, and As reduction was correlated with Fe reduction for the sediments from the redox transition zone. The arrA gene sequencing from microcosms at day 54 of incubation showed the presence of 14 unique phylotypes, none of which were related to any previously described arrA gene sequence, suggesting a unique community of dissimilatory arsenate-respiring bacteria in the Cache Valley Basin. PMID:24632255
Liu, Kaiyu; Li, Yi; Jousset, Françoise-Xavière; Zadori, Zoltan; Szelei, Jozsef; Yu, Qian; Pham, Hanh Thi; Lépine, François; Bergoin, Max; Tijssen, Peter
2011-01-01
The Acheta domesticus densovirus (AdDNV), isolated from crickets, has been endemic in Europe for at least 35 years. Severe epizootics have also been observed in American commercial rearings since 2009 and 2010. The AdDNV genome was cloned and sequenced for this study. The transcription map showed that splicing occurred in both the nonstructural (NS) and capsid protein (VP) multicistronic RNAs. The splicing pattern of NS mRNA predicted 3 nonstructural proteins (NS1 [576 codons], NS2 [286 codons], and NS3 [213 codons]). The VP gene cassette contained two VP open reading frames (ORFs), of 597 (ORF-A) and 268 (ORF-B) codons. The VP2 sequence was shown by N-terminal Edman degradation and mass spectrometry to correspond with ORF-A. Mass spectrometry, sequencing, and Western blotting of baculovirus-expressed VPs versus native structural proteins demonstrated that the VP1 structural protein was generated by joining ORF-A and -B via splicing (splice II), eliminating the N terminus of VP2. This splice resulted in a nested set of VP1 (816 codons), VP3 (467 codons), and VP4 (429 codons) structural proteins. In contrast, the two splices within ORF-B (Ia and Ib) removed the donor site of intron II and resulted in VP2, VP3, and VP4 expression. ORF-B may also code for several nonstructural proteins, of 268, 233, and 158 codons. The small ORF-B contains the coding sequence for a phospholipase A2 motif found in VP1, which was shown previously to be critical for cellular uptake of the virus. These splicing features are unique among parvoviruses and define a new genus of ambisense densoviruses. PMID:21775445
2014-01-01
Background Helicobacter pylori is well known for its relationship with the occurrence of several severe gastric diseases. The mechanisms of pathogenesis triggered by H. pylori are less well known. In this study, we report the genome sequence and genomic characterizations of H. pylori strain HLJ039 that was isolated from a patient with gastric cancer in the Chinese province of Heilongjiang, where there is a high incidence of gastric cancer. To investigate potential genomic features that may be involved in pathogenesis of carcinoma, the genome was compared to three previously sequenced genomes in this area. Result We obtained 42 contigs with a total length of 1,611,192 bp and predicted 1,687 coding sequences. Compared to strains isolated from gastritis and ulcers in this area, 10 different regions were identified as being unique for HLJ039; they mainly encoded type II restriction-modification enzyme, type II m6A methylase, DNA-cytosine methyltransferase, DNA methylase, and hypothetical proteins. A unique 547-bp fragment sharing 93% identity with a hypothetical protein of Helicobacter cinaedi ATCC BAA-847 was not present in any other previous H. pylori strains. Phylogenetic analysis based on core genome single nucleotide polymorphisms shows that HLJ039 is defined as hspEAsia subgroup, which belongs to the hpEastAsia group. Conclusion DNA methylations, variations of the genomic regions involved in restriction and modification systems, are the “hot” regions that may be related to the mechanism of H. pylori-induced gastric cancer. The genome sequence will provide useful information for the deep mining of potential mechanisms related to East Asian gastric cancer. PMID:24565107
Kamiie, J; Sugahara, G; Yoshimoto, S; Aihara, N; Mineshige, T; Uetsuka, K; Shirota, K
2017-01-01
Here we report a pig with amyloid A (AA) amyloidosis associated with Streptococcus suis infection and identification of a unique amyloid sequence in the amyloid deposits in the tissue. Tissues from the 180-day-old underdeveloped pig contained foci of necrosis and suppurative inflammation associated with S. suis infection. Congo red stain, immunohistochemistry, and electron microscopy revealed intense AA deposition in the spleen and renal glomeruli. Mass spectrometric analysis of amyloid material extracted from the spleen showed serum AA 2 (SAA2) peptide as well as a unique peptide sequence previously reported in a pig with AA amyloidosis. The common detection of the unique amyloid sequence in the current and past cases of AA amyloidosis in pigs suggests that this amyloid sequence might play a key role in the development of porcine AA amyloidosis. An in vitro fibrillation assay demonstrated that the unique AA peptide formed typically rigid, long amyloid fibrils (10 nm wide) and the N-terminus peptide of SAA2 formed zigzagged, short fibers (7 nm wide). Moreover, the SAA2 peptide formed long, rigid amyloid fibrils in the presence of sonicated amyloid fibrils formed by the unique AA peptide. These findings indicate that the N-terminus of SAA2 as well as the AA peptide mediate the development of AA amyloidosis in pigs via cross-seeding polymerization.
Yamazaki, Tomohiro; Matsuo, Junji; Takahashi, Satoshi; Kumagai, Shouta; Shimoda, Tomoko; Abe, Kiyotaka; Minami, Kunihiro; Yamaguchi, Hiroyuki
2015-12-01
Although sexually transmitted disease due to Chlamydia trachomatis occurs similarly in both men and women, the female urogenital tract differs from that of males anatomically and physiologically, possibly leading to specific polymorphisms of the bacterial surface molecules. In the present study, we therefore characterized polymorphic features in a high-definition phylogenetic marker, polymorphic outer membrane protein (Pmp) F of C. trachomatis strains isolated from male urogenital tracts in Japan (Category: Japan-males, n = 12), when compared with those isolated from female cervical ducts in Japan (Category: Japan-females, n = 11), female cervical ducts in the other country (Category: Ref-females, n = 12) or homosexual male rectums in the other country (Category: Ref-males, n = 7), by general bioinformatics analysis tool with MAFFT software. As a result, phylogenetic reconstruction of the PmpF amino acid sequences showing three distinct clusters revealed that the Japan-males were limited into cluster 1 and 2, although there were only four clusters even though including an outgroup. Meanwhile, the phylogenetic distance values of PmpF passenger domain without hinge region, but not its full-length sequence, showed that the Japan-males were more stable and displayed less diversity when compared with the other categories, supported by the sequence conservation features. Thus, PmpF passenger domain is a useful phylogenetic maker, and the phylogenic features indicate that C. trachomatis strains isolated from male urogenital tracts in Japan may be unique, suggesting an adaptation depending on selective pressure, such as the presence or absence of microbial flora, furthermore possibly connecting to sexual differentiation. Copyright © 2015 Japanese Society of Chemotherapy and The Japanese Association for Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Aguado-Llera, David; Martínez-Gómez, Ana Isabel; Prieto, Jesús; Marenchino, Marco; Traverso, José Angel; Gómez, Javier; Chueca, Ana; Neira, José L.
2011-01-01
Thioredoxins (TRXs) are ubiquitous proteins involved in redox processes. About forty genes encode TRX or TRX-related proteins in plants, grouped in different families according to their subcellular localization. For instance, the h-type TRXs are located in cytoplasm or mitochondria, whereas f-type TRXs have a plastidial origin, although both types of proteins have an eukaryotic origin as opposed to other TRXs. Herein, we study the conformational and the biophysical features of TRXh1, TRXh2 and TRXf from Pisum sativum. The modelled structures of the three proteins show the well-known TRX fold. While sharing similar pH-denaturations features, the chemical and thermal stabilities are different, being PsTRXh1 (Pisum sativum thioredoxin h1) the most stable isoform; moreover, the three proteins follow a three-state denaturation model, during the chemical-denaturations. These differences in the thermal- and chemical-denaturations result from changes, in a broad sense, of the several ASAs (accessible surface areas) of the proteins. Thus, although a strong relationship can be found between the primary amino acid sequence and the structure among TRXs, that between the residue sequence and the conformational stability and biophysical properties is not. We discuss how these differences in the biophysical properties of TRXs determine their unique functions in pea, and we show how residues involved in the biophysical features described (pH-titrations, dimerizations and chemical-denaturations) belong to regions involved in interaction with other proteins. Our results suggest that the sequence demands of protein-protein function are relatively rigid, with different protein-binding pockets (some in common) for each of the three proteins, but the demands of structure and conformational stability per se (as long as there is a maintained core), are less so. PMID:21364950
Itoh, Masayuki; Ide, Shuhei; Iwasaki, Yuji; Saito, Takashi; Narita, Keishi; Dai, Hongmei; Yamakura, Shinji; Furue, Takeki; Kitayama, Hirotsugu; Maeda, Keiko; Takahashi, Eihiko; Matsui, Kiyoshi; Goto, Yu-Ichi; Takeda, Sen; Arima, Masataka
2018-04-01
Arima syndrome (AS) is a rare disease and its clinical features mimic those of Joubert syndrome or Joubert syndrome-related diseases (JSRD). Recently, we clarified the AS diagnostic criteria and its severe phenotype. However, genetic evidence of AS remains unknown. We explored causative genes of AS and compared the clinical and genetic features of AS with the other JSRD. We performed genetic analyses of 4 AS patients of 3 families with combination of whole-exome sequencing and Sanger sequencing. Furthermore, we studied cell biology with the cultured fibroblasts of 3 AS patients. All patients had a specific homozygous variant (c.6012-12T>A, p.Arg2004Serfs*7) or compound heterozygous variants (c.1711+1G>A; c.6012-12T>A, p.Gly570Aspfs*19;Arg2004Serfs*7) in centrosomal protein 290 kDa (CEP290) gene. These unique variants lead to abnormal splicing and premature termination. Morphological analysis of cultured fibroblasts from AS patients revealed a marked decrease of the CEP290-positive cell number with significantly longer cilium and naked and protruded ciliary axoneme without ciliary membrane into the cytoplasm. AS resulted in cilia dysfunction from centrosome disruption. The unique variant of CEP290 could be strongly linked to AS pathology. Here, we provided AS specific genetic evidence, which steers the structure and functions of centrosome that is responsible for normal ciliogenesis. This is the first report that has demonstrated the molecular basis of Arima syndrome. Copyright © 2017 The Japanese Society of Child Neurology. Published by Elsevier B.V. All rights reserved.
The genomes and comparative genomics of Lactobacillus delbrueckii phages.
Riipinen, Katja-Anneli; Forsman, Päivi; Alatossava, Tapani
2011-07-01
Lactobacillus delbrueckii phages are a great source of genetic diversity. Here, the genome sequences of Lb. delbrueckii phages LL-Ku, c5 and JCL1032 were analyzed in detail, and the genetic diversity of Lb. delbrueckii phages belonging to different taxonomic groups was explored. The lytic isometric group b phages LL-Ku (31,080 bp) and c5 (31,841 bp) showed a minimum nucleotide sequence identity of 90% over about three-fourths of their genomes. The genomic locations of their lysis modules were unique, and the genomes featured several putative overlapping transcription units of genes. LL-Ku and c5 virions displayed peptidoglycan hydrolytic activity associated with a ~36-kDa protein similar in size to the endolysin. Unexpectedly, the 49,433-bp genome of the prolate phage JCL1032 (temperate, group c) revealed a conserved gene order within its structural genes. Lb. delbrueckii phages representing groups a (a phage LL-H), b and c possessed only limited protein sequence homology. Genomic comparison of LL-Ku and c5 suggested that diversification of Lb. delbrueckii phages is mainly due to insertions, deletions and recombination. For the first time, the complete genome sequences of group b and c Lb. delbrueckii phages are reported.
ReadXplorer—visualization and analysis of mapped sequences
Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander
2014-01-01
Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157
Akkuratov, Evgeny E; Walters, Lorraine; Saha-Mandal, Arnab; Khandekar, Sushant; Crawford, Erin; Zirbel, Craig L; Leisner, Scott; Prakash, Ashwin; Fedorova, Larisa; Fedorov, Alexei
2014-09-10
Orthologous introns have identical positions relative to the coding sequence in orthologous genes of different species. By analyzing the complete genomes of five plants we generated a database of 40,512 orthologous intron groups of dicotyledonous plants, 28,519 orthologous intron groups of angiosperms, and 15,726 of land plants (moss and angiosperms). Multiple sequence alignments of each orthologous intron group were obtained using the Mafft algorithm. The number of conserved regions in plant introns appeared to be hundreds of times fewer than that in mammals or vertebrates. Approximately three quarters of conserved intronic regions among angiosperms and dicots, in particular, correspond to alternatively-spliced exonic sequences. We registered only a handful of conserved intronic ncRNAs of flowering plants. However, the most evolutionarily conserved intronic region, which is ubiquitous for all plants examined in this study, including moss, possessed multiple structural features of tRNAs, which caused us to classify it as a putative tRNA-like ncRNA. Intronic sequences encoding tRNA-like structures are not unique to plants. Bioinformatics examination of the presence of tRNA inside introns revealed an unusually long-term association of four glycine tRNAs inside the Vac14 gene of fish, amniotes, and mammals. Copyright © 2014 Elsevier B.V. All rights reserved.
Gioia, Jason; Qin, Xiang; Jiang, Huaiyang; Clinkenbeard, Kenneth; Lo, Reggie; Liu, Yamei; Fox, George E.; Yerrapragada, Shailaja; McLeod, Michael P.; McNeill, Thomas Z.; Hemphill, Lisa; Sodergren, Erica; Wang, Qiaoyan; Muzny, Donna M.; Homsi, Farah J.; Weinstock, George M.; Highlander, Sarah K.
2006-01-01
The draft genome sequence of Mannheimia haemolytica A1, the causative agent of bovine respiratory disease complex (BRDC), is presented. Strain ATCC BAA-410, isolated from the lung of a calf with BRDC, was the DNA source. The annotated genome includes 2,839 coding sequences, 1,966 of which were assigned a function and 436 of which are unique to M. haemolytica. Through genome annotation many features of interest were identified, including bacteriophages and genes related to virulence, natural competence, and transcriptional regulation. In addition to previously described virulence factors, M. haemolytica encodes adhesins, including the filamentous hemagglutinin FhaB and two trimeric autotransporter adhesins. Two dual-function immunoglobulin-protease/adhesins are also present, as is a third immunoglobulin protease. Genes related to iron acquisition and drug resistance were identified and are likely important for survival in the host and virulence. Analysis of the genome indicates that M. haemolytica is naturally competent, as genes for natural competence and DNA uptake signal sequences (USS) are present. Comparison of competence loci and USS in other species in the family Pasteurellaceae indicates that M. haemolytica, Actinobacillus pleuropneumoniae, and Haemophilus ducreyi form a lineage distinct from other Pasteurellaceae. This observation was supported by a phylogenetic analysis using sequences of predicted housekeeping genes. PMID:17015664
DOE Office of Scientific and Technical Information (OSTI.GOV)
Szoets, H.; Reithmueller, G.; Weiss, E.
1986-03-01
Antigen HLA-B27 is a high-risk genetic factor with respect to a group of rheumatoid disorders, especially ankylosing spondylitis. A cDNA library was constructed from an autozygous B-cell line expressing HLA-B27, HLA-Cw1, and the previously cloned HLA-A2 antigen. Clones detected with an HLA probe were isolated and sorted into homology groups by differential hybridization and restriction maps. Nucleotide sequencing allowed the unambiguous assignment of cDNAs to HLA-A, -B, and -C loci. The HLA-B27 mRNA has the structure features and the codon variability typical of an HLA class I transcript but it specifies two uncommon amino acid replacements: a cysteine in positionmore » 67 and a serine in position 131. The latter substitution may have functional consequences, because it occurs in a conserved region and at a position invariably occupied by a species-specific arginine in humans and lysine in mice. The availability of the complete sequence of HLA-B27 and of the partial sequence of HLA-Cw1 allows the recognition of locus-specific sequence markers, particularly, but not exclusively, in the transmembrane and cytoplasmic domains.« less
A frequency-domain seismic blind deconvolution based on Gini correlations
NASA Astrophysics Data System (ADS)
Wang, Zhiguo; Zhang, Bing; Gao, Jinghuai; Huo Liu, Qing
2018-02-01
In reflection seismic processing, the seismic blind deconvolution is a challenging problem, especially when the signal-to-noise ratio (SNR) of the seismic record is low and the length of the seismic record is short. As a solution to this ill-posed inverse problem, we assume that the reflectivity sequence is independent and identically distributed (i.i.d.). To infer the i.i.d. relationships from seismic data, we first introduce the Gini correlations (GCs) to construct a new criterion for the seismic blind deconvolution in the frequency-domain. Due to a unique feature, the GCs are robust in their higher tolerance of the low SNR data and less dependent on record length. Applications of the seismic blind deconvolution based on the GCs show their capacity in estimating the unknown seismic wavelet and the reflectivity sequence, whatever synthetic traces or field data, even with low SNR and short sample record.
Microbiological Features of KPC-Producing Enterobacter Isolates Identified in a U.S. Hospital System
Ahn, Chulsoo; Syed, Alveena; Hu, Fupin; O’Hara, Jessica A.; Rivera, Jesabel I.; Doi, Yohei
2014-01-01
Microbiological data regarding KPC-producing Enterobacter spp. are scarce. In this study, 11 unique KPC-producing Enterobacter isolates were identified among 44 ertapenem-non-susceptible Enterobacter isolates collected between 2009 and 2013 at a hospital system in Western Pennsylvania. All cases were healthcare-associated and occurred in medically complex patients. While pulsed-field gel electrophoresis (PFGE) showed diverse restriction patterns overall, multilocus sequence typing (MLST) identified Enterobacter cloacae isolates with sequence types (STs) 93 and 171 from two hospitals each. The levels of carbapenem minimum inhibitory concentrations were highly variable. All isolates remained susceptible to colistin, tigecycline, and the majority to amikacin and doxycycline. A blaKPC-carrying IncN plasmid conferring trimethoprim-sulfamethoxazole resistance was identified in three of the isolates. Spread of blaKPC in Enterobacter spp. appears to be due to a combination of plasmid-mediated and clonal processes. PMID:25053203
Expanded complexity of unstable repeat diseases
Polak, Urszula; McIvor, Elizabeth; Dent, Sharon Y.R.; Wells, Robert D.; Napierala, Marek
2015-01-01
Unstable Repeat Diseases (URDs) share a common mutational phenomenon of changes in the copy number of short, tandemly repeated DNA sequences. More than 20 human neurological diseases are caused by instability, predominantly expansion, of microsatellite sequences. Changes in the repeat size initiate a cascade of pathological processes, frequently characteristic of a unique disease or a small subgroup of the URDs. Understanding of both the mechanism of repeat instability and molecular consequences of the repeat expansions is critical to developing successful therapies for these diseases. Recent technological breakthroughs in whole genome, transcriptome and proteome analyses will almost certainly lead to new discoveries regarding the mechanisms of repeat instability, the pathogenesis of URDs, and will facilitate development of novel therapeutic approaches. The aim of this review is to give a general overview of unstable repeats diseases, highlight the complexities of these diseases, and feature the emerging discoveries in the field. PMID:23233240
The Sequence and Analysis of Duplication Rich Human Chromosome 16
DOE R&D Accomplishments Database
Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.
2004-01-01
We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.
From Sequence and Forces to Structure, Function and Evolution of Intrinsically Disordered Proteins
Forman-Kay, Julie D.; Mittag, Tanja
2015-01-01
Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales and compactness is shaping a unified understanding of structure-dynamics-disorder/function relationships. On the 20th anniversary of this journal, Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional and evolutionary properties. PMID:24010708
From sequence and forces to structure, function, and evolution of intrinsically disordered proteins.
Forman-Kay, Julie D; Mittag, Tanja
2013-09-03
Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics, and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales, and compactness are shaping a unified understanding of structure-dynamics-disorder/function relationships. In the 20(th) anniversary of Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional, and evolutionary properties. Copyright © 2013 Elsevier Ltd. All rights reserved.
Genome analysis of the platypus reveals unique signatures of evolution.
Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K
2008-05-08
We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
Genome analysis of the platypus reveals unique signatures of evolution
Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.
2009-01-01
We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734
Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.
Adamo, Mark E; Gerber, Scott A
2016-09-07
MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
A septal chromosome segregator protein evolved into a conjugative DNA-translocator protein
Sepulveda, Edgardo; Vogelmann, Jutta
2011-01-01
Streptomycetes, Gram-positive soil bacteria well known for the production of antibiotics feature a unique conjugative DNA transfer system. In contrast to classical conjugation which is characterized by the secretion of a pilot protein covalently linked to a single-stranded DNA molecule, in Streptomyces a double-stranded DNA molecule is translocated during conjugative transfer. This transfer involves a single plasmid encoded protein, TraB. A detailed biochemical and biophysical characterization of TraB, revealed a close relationship to FtsK, mediating chromosome segregation during bacterial cell division. TraB translocates plasmid DNA by recognizing 8-bp direct repeats located in a specific plasmid region clt. Similar sequences accidentally also occur on chromosomes and have been shown to be bound by TraB. We suggest that TraB mobilizes chromosomal genes by the interaction with these chromosomal clt-like sequences not relying on the integration of the conjugative plasmid into the chromosome. PMID:22479692
Rapid evaluation and quality control of next generation sequencing data with FaQCs.
Lo, Chien-Chi; Chain, Patrick S G
2014-11-19
Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
Informational structure of genetic sequences and nature of gene splicing
NASA Astrophysics Data System (ADS)
Trifonov, E. N.
1991-10-01
Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence ``talks'' to various bimolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus, being a composite structure in which one had the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of the Gnomic, language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their ``ontogenetic'' way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications is to resolve such conflicts. New data are presented strongly indicating that the gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
A cool tool for hot and sour Archaea: proteomics of Sulfolobus solfataricus.
Kort, Julia Christin; Esser, Dominik; Pham, Trong Khoa; Noirel, Josselin; Wright, Phillip C; Siebers, Bettina
2013-10-01
In recent years, much progress has been made in proteomic studies to unravel metabolic pathways and basic cellular processes. This is especially interesting for members of the Archaea, the third domain of life. Archaea exhibit extraordinary features and many of their cultivable representatives are adaptable to extreme environments. Archaea harbor many unique traits besides bacterial attributes, such as size, shape, and DNA structure and eukaryal characteristics like information processing. Sulfolobus solfataricus P2, a thermoacidophilic archaeal representative, is a well-established model organism adapted to low-pH environments (pH 2-3) and high temperatures (80°C). The genome has a size of 3 Mbp and its sequence has been deciphered. Approximately 3033 predicted open reading frames have been identified and the genome is characterized by a great number of diverse insertion sequence elements. In unraveling the organisms' metabolism and lifestyle, proteomic analyses have played a major role. Much effort has been directed at this organism and is reviewed here. With the help of proteomics, unique metabolic pathways were resolved in S. solfataricus, targets for regulatory protein phosphorylation identified, and cellular responses upon virus infection as well as oxidative stress analyzed. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Hepatitis A virus: host interactions, molecular epidemiology and evolution.
Vaughan, Gilberto; Goncalves Rossi, Livia Maria; Forbi, Joseph C; de Paula, Vanessa S; Purdy, Michael A; Xia, Guoliang; Khudyakov, Yury E
2014-01-01
Infection with hepatitis A virus (HAV) is the commonest viral cause of liver disease and presents an important public health problem worldwide. Several unique HAV properties and molecular mechanisms of its interaction with host were recently discovered and should aid in clarifying the pathogenesis of hepatitis A. Genetic characterization of HAV strains have resulted in the identification of different genotypes and subtypes, which exhibit a characteristic worldwide distribution. Shifts in HAV endemicity occurring in different parts of the world, introduction of genetically diverse strains from geographically distant regions, genotype displacement observed in some countries and population expansion detected in the last decades of the 20th century using phylogenetic analysis are important factors contributing to the complex dynamics of HAV infections worldwide. Strong selection pressures, some of which, like usage of deoptimized codons, are unique to HAV, limit genetic variability of the virus. Analysis of subgenomic regions has been proven useful for outbreak investigations. However, sharing short sequences among epidemiologically unrelated strains indicates that specific identification of HAV strains for molecular surveillance can be achieved only using whole-genome sequences. Here, we present up-to-date information on the HAV molecular epidemiology and evolution, and highlight the most relevant features of the HAV-host interactions. Published by Elsevier B.V.
Rasmussen, Thomas Bruun; Boniotti, Maria Beatrice; Papetti, Alice; Grasland, Béatrice; Frossard, Jean-Pierre; Dastjerdi, Akbar; Hulst, Marcel; Hanke, Dennis; Pohlmann, Anne; Blome, Sandra; van der Poel, Wim H. M.; Steinbach, Falko; Blanchard, Yannick; Lavazza, Antonio; Bøtner, Anette
2018-01-01
Porcine epidemic diarrhoea virus, strain CV777, was initially characterized in 1978 as the causative agent of a disease first identified in the UK in 1971. This coronavirus has been widely distributed among laboratories and has been passaged both within pigs and in cell culture. To determine the variability between different stocks of the PEDV strain CV777, sequencing of the full-length genome (ca. 28kb) has been performed in 6 different laboratories, using different protocols. Not surprisingly, each of the different full genome sequences were distinct from each other and from the reference sequence (Accession number AF353511) but they are >99% identical. Unique and shared differences between sequences were identified. The coding region for the surface-exposed spike protein showed the highest proportion of variability including both point mutations and small deletions. The predicted expression of the ORF3 gene product was more dramatically affected in three different variants of this virus through either loss of the initiation codon or gain of a premature termination codon. The genome of one isolate had a substantially rearranged 5´-terminal sequence. This rearrangement was validated through the analysis of sub-genomic mRNAs from infected cells. It is clearly important to know the features of the specific sample of CV777 being used for experimental studies. PMID:29494671
Bystry, Vojtech; Agathangelidis, Andreas; Bikos, Vasilis; Sutton, Lesley Ann; Baliakas, Panagiotis; Hadzidimitriou, Anastasia; Stamatopoulos, Kostas; Darzentas, Nikos
2015-12-01
An ever-increasing body of evidence supports the importance of B cell receptor immunoglobulin (BcR IG) sequence restriction, alias stereotypy, in chronic lymphocytic leukemia (CLL). This phenomenon accounts for ∼30% of studied cases, one in eight of which belong to major subsets, and extends beyond restricted sequence patterns to shared biologic and clinical characteristics and, generally, outcome. Thus, the robust assignment of new cases to major CLL subsets is a critical, and yet unmet, requirement. We introduce a novel application, ARResT/AssignSubsets, which enables the robust assignment of BcR IG sequences from CLL patients to major stereotyped subsets. ARResT/AssignSubsets uniquely combines expert immunogenetic sequence annotation from IMGT/V-QUEST with curation to safeguard quality, statistical modeling of sequence features from more than 7500 CLL patients, and results from multiple perspectives to allow for both objective and subjective assessment. We validated our approach on the learning set, and evaluated its real-world applicability on a new representative dataset comprising 459 sequences from a single institution. ARResT/AssignSubsets is freely available on the web at http://bat.infspire.org/arrest/assignsubsets/ nikos.darzentas@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Willis, Leslie G.; Siepp, Robyn; Stewart, Taryn M.
2005-08-01
The genome of the Trichoplusia ni single nucleopolyhedrovirus (TnSNPV), a group II NPV which infects the cabbage looper (T. ni), has been completely sequenced and analyzed. The TnSNPV DNA genome consists of 134,394 bp and has an overall G + C content of 39%. Gene analysis predicted 144 open reading frames (ORFs) of 150 nucleotides or greater that showed minimal overlap. Comparisons with previously sequenced baculoviruses indicate that 119 TnSNPV ORFs were homologues of previously reported viral gene sequences. Ninety-four TnSNPV ORFs returned an Autographa californica multiple NPV (AcMNPV) homologue while 25 ORFs returned poor or no sequence matches withmore » the current databases. A putative photolyase gene was also identified that had highest amino acid identity to the photolyase genes of Chrysodeixis chalcites NPV (ChchNPV) (47%) and Danio rerio (zebrafish) (40%). In addition unlike all other baculoviruses no obvious homologous repeat (hr) sequences were identified. Comparison of the TnSNPV and AcMNPV genomes provides a unique opportunity to examine two baculoviruses that are highly virulent for a common insect host (T. ni) yet belong to diverse baculovirus taxonomic groups and possess distinct biological features. In vitro fusion assays demonstrated that the TnSNPV F protein induces membrane fusion and syncytia formation and were compared to syncytia formed by AcMNPV GP64.« less
Influence of time and length size feature selections for human activity sequences recognition.
Fang, Hongqing; Chen, Long; Srinivasan, Raghavendiran
2014-01-01
In this paper, Viterbi algorithm based on a hidden Markov model is applied to recognize activity sequences from observed sensors events. Alternative features selections of time feature values of sensors events and activity length size feature values are tested, respectively, and then the results of activity sequences recognition performances of Viterbi algorithm are evaluated. The results show that the selection of larger time feature values of sensor events and/or smaller activity length size feature values will generate relatively better results on the activity sequences recognition performances. © 2013 ISA Published by ISA All rights reserved.
Mutations in RSPH1 cause primary ciliary dyskinesia with a unique clinical and ciliary phenotype.
Knowles, Michael R; Ostrowski, Lawrence E; Leigh, Margaret W; Sears, Patrick R; Davis, Stephanie D; Wolf, Whitney E; Hazucha, Milan J; Carson, Johnny L; Olivier, Kenneth N; Sagel, Scott D; Rosenfeld, Margaret; Ferkol, Thomas W; Dell, Sharon D; Milla, Carlos E; Randell, Scott H; Yin, Weining; Sannuti, Aruna; Metjian, Hilda M; Noone, Peadar G; Noone, Peter J; Olson, Christina A; Patrone, Michael V; Dang, Hong; Lee, Hye-Seung; Hurd, Toby W; Gee, Heon Yung; Otto, Edgar A; Halbritter, Jan; Kohl, Stefan; Kircher, Martin; Krischer, Jeffrey; Bamshad, Michael J; Nickerson, Deborah A; Hildebrandt, Friedhelm; Shendure, Jay; Zariwala, Maimoona A
2014-03-15
Primary ciliary dyskinesia (PCD) is a genetically heterogeneous recessive disorder of motile cilia, but the genetic cause is not defined for all patients with PCD. To identify disease-causing mutations in novel genes, we performed exome sequencing, follow-up characterization, mutation scanning, and genotype-phenotype studies in patients with PCD. Whole-exome sequencing was performed using NimbleGen capture and Illumina HiSeq sequencing. Sanger-based sequencing was used for mutation scanning, validation, and segregation analysis. We performed exome sequencing on an affected sib-pair with normal ultrastructure in more than 85% of cilia. A homozygous splice-site mutation was detected in RSPH1 in both siblings; parents were carriers. Screening RSPH1 in 413 unrelated probands, including 325 with PCD and 88 with idiopathic bronchiectasis, revealed biallelic loss-of-function mutations in nine additional probands. Five affected siblings of probands in RSPH1 families harbored the familial mutations. The 16 individuals with RSPH1 mutations had some features of PCD; however, nasal nitric oxide levels were higher than in patients with PCD with other gene mutations (98.3 vs. 20.7 nl/min; P < 0.0003). Additionally, individuals with RSPH1 mutations had a lower prevalence (8 of 16) of neonatal respiratory distress, and later onset of daily wet cough than typical for PCD, and better lung function (FEV1), compared with 75 age- and sex-matched PCD cases (73.0 vs. 61.8, FEV1 % predicted; P = 0.043). Cilia from individuals with RSPH1 mutations had normal beat frequency (6.1 ± Hz at 25°C), but an abnormal, circular beat pattern. The milder clinical disease and higher nasal nitric oxide in individuals with biallelic mutations in RSPH1 provides evidence of a unique genotype-phenotype relationship in PCD, and suggests that mutations in RSPH1 may be associated with residual ciliary function.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Yanfeng; Zheng, Yi; Qin, Ling
Beta-hydroxyacid dehydrogenase (β-HAD) genes have been identified in all sequenced genomes of eukaryotes and prokaryotes. Their gene products catalyze the NAD+- or NADP+-dependent oxidation of various β-hydroxy acid substrates into their corresponding semialdehyde. In many fungal and bacterial genomes, multiple β-HAD genes are observed leading to the hypothesis that these gene products may have unique, uncharacterized metabolic roles specific to their species. The genomes of Geobacter sulfurreducens and Geobacter metallireducens each contain two potential β-HAD genes. The protein sequences of one pair of these genes, Gs-βHAD (Q74DE4) and Gm-βHAD (Q39R98), have 65% sequence identity and 77% sequence similarity with eachmore » other. Both proteins reduce succinic semialdehyde, a metabolite of the GABA shunt. To further explore the structural and functional characteristics of these two β-HADs with a potentially unique substrate specificity, crystal structures for Gs-βHAD and Gm-βHAD in complex with NADP+ were determined to a resolution of 1.89 Å and 2.07 Å, respectively. The structure of both proteins are similar, composed of 14 α-helices and nine β-strands organized into two domains. Domain One (1-165) adopts a typical Rossmann fold composed of two α/β units: a six-strand parallel β-sheet surrounded by six α-helices (α1 – α6) followed by a mixed three-strand β-sheet surrounded by two α-helices (α7 and α8). Domain Two (166-287) is composed of a bundle of seven α-helices (α9 – α14). Four functional regions conserved in all β-HADs are spatially located near each other at the interdomain cleft in both Gs-βHAD and Gm-βHAD with a buried molecule of NADP+. The structural features of Gs-βHAD and Gm-βHAD are described in relation to the four conserved consensus sequences characteristic of β-HADs and the potential biochemical importance of these enzymes as an alternative pathway for the degradation of succinic semialdehyde.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Doyle, C Kuyler; Lykidis, A
2006-01-01
Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative, {alpha}-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, 17 putative pseudogenes, and a substantial proportion of noncoding sequence (27%). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences and a unique serine-threonine bias associated with the potential for O glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associatedmore » with immune evasion were identified, one of which contains poly(G-C) tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Genes associated with pathogen-host interactions were identified, including a small group encoding proteins (n = 12) with tandem repeats and another group encoding proteins with eukaryote-like ankyrin domains (n = 7).« less
Characterization of a novel variant of Mycobacterium chimaera.
van Ingen, J; Hoefsloot, W; Buijtels, P C A M; Tortoli, E; Supply, P; Dekhuijzen, P N R; Boeree, M J; van Soolingen, D
2012-09-01
In this study, nonchromogenic mycobacteria were isolated from pulmonary samples of three patients in the Netherlands. All isolates had identical, unique 16S rRNA gene and 16S-23S ITS sequences, which were closely related to those of Mycobacterium chimaera and Mycobacterium marseillense. The biochemical features of the isolates differed slightly from those of M. chimaera, suggesting that the isolates may represent a possible separate species within the Mycobacterium avium complex (MAC). However, the cell-wall mycolic acid pattern, analysed by HPLC, and the partial sequences of the hsp65 and rpoB genes were identical to those of M. chimaera. We concluded that the isolates represent a novel variant of M. chimaera. The results of this analysis have led us to question the currently used methods of species definition for members of the genus Mycobacterium, which are based largely on 16S rRNA or rpoB gene sequencing. Definitions based on a single genetic target are likely to be insufficient. Genetic divergence, especially in the MAC, yields strains that cannot be confidently assigned to a specific species based on the analysis of a single genetic target.
Genomic Diversity in the Endosymbiotic Bacterium Rhizobium leguminosarum.
Sánchez-Cañizares, Carmen; Jorrín, Beatriz; Durán, David; Nadendla, Suvarna; Albareda, Marta; Rubio-Sanz, Laura; Lanza, Mónica; González-Guerrero, Manuel; Prieto, Rosa Isabel; Brito, Belén; Giglio, Michelle G; Rey, Luis; Ruiz-Argüeso, Tomás; Palacios, José M; Imperial, Juan
2018-01-24
Rhizobium leguminosarum bv. viciae is a soil α-proteobacterium that establishes a diazotrophic symbiosis with different legumes of the Fabeae tribe. The number of genome sequences from rhizobial strains available in public databases is constantly increasing, although complete, fully annotated genome structures from rhizobial genomes are scarce. In this work, we report and analyse the complete genome of R. leguminosarum bv. viciae UPM791. Whole genome sequencing can provide new insights into the genetic features contributing to symbiotically relevant processes such as bacterial adaptation to the rhizosphere, mechanisms for efficient competition with other bacteria, and the ability to establish a complex signalling dialogue with legumes, to enter the root without triggering plant defenses, and, ultimately, to fix nitrogen within the host. Comparison of the complete genome sequences of two strains of R. leguminosarum bv. viciae , 3841 and UPM791, highlights the existence of different symbiotic plasmids and a common core chromosome. Specific genomic traits, such as plasmid content or a distinctive regulation, define differential physiological capabilities of these endosymbionts. Among them, strain UPM791 presents unique adaptations for recycling the hydrogen generated in the nitrogen fixation process.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K.; Kuyler Doyle, C.; Lykidis, A.
2005-09-01
Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative, a-proteobacterium is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, and 17 putative pseudogenes, and a substantial proportion of non-coding sequence (27 percent). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences, and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein familiesmore » associated with immune evasion were identified, one of which contains poly G:C tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Proteins associated with pathogen-host interactions were identified including a small group of proteins (12) with tandem repeats and another with eukaryotic-like ankyrin domains (7).« less
Wang, Qi-Ming; Zhang, Yong-Hong; Wang, Bo; Wang, Long
2016-01-04
Two new species isolated from plant leaves belonging to Talaromyces section Talaromyces are reported, namely T. neofusisporus (ex-type AS3.15415 (T) = CBS 139516 (T)) and T. qii (ex-type AS3.15414 (T) = CBS 139515 (T)). Morphologically, T. neofusisporus is featured by forming synnemata on CYA and YES, bearing appressed biverticillate penicilli and smooth-walled fusiform conidia about 3.5-4.5 × 2-2.5 μm; and T. qii is characterized by velutinous colony texture, yellowish green conidia, yellow mycelium and ovoid to subglobose echinulate conidia measuring 3-3.5 μm. Phylogenetically, T. neofusisporus is such a unique species that no close relatives are found according to CaM, BenA and ITS1-5.8S-ITS2 as well as the combined three-gene sequences; and T. qii is related to T. thailandensis according to CaM, BenA and the combined sequence matrices, whereas ITS1-5.8S-ITS2 sequences do not support the close relationship between T. qii and T. thailandensis.
Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning
2018-03-08
Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Chien -Chi; Chain, Patrick S. G.
Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly processmore » large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.« less
Processing Dynamic Image Sequences from a Moving Sensor.
1984-02-01
65 Roadsign Image Sequence ..... ................ ... 70 Roadsign Sequence with Redundant Features .. ........ . 79 Roadsign Subimage...Selected Feature Error Values .. ........ 66 2c. Industrial Image Selected Feature Local Search Values. .. .... 67 3ab. Roadsign Image Error Values...72 3c. Roadsign Image Local Search Values ............. 73 4ab. Roadsign Redundant Feature Error Values. ............ 8 4c. Roadsign
Li, Minghui; Goncearenco, Alexander; Panchenko, Anna R
2017-01-01
In this review we describe a protocol to annotate the effects of missense mutations on proteins, their functions, stability, and binding. For this purpose we present a collection of the most comprehensive databases which store different types of sequencing data on missense mutations, we discuss their relationships, possible intersections, and unique features. Next, we suggest an annotation workflow using the state-of-the art methods and highlight their usability, advantages, and limitations for different cases. Finally, we address a particularly difficult problem of deciphering the molecular mechanisms of mutations on proteins and protein complexes to understand the origins and mechanisms of diseases.
The Traffic Management Advisor
NASA Technical Reports Server (NTRS)
Nedell, William; Erzberger, Heinz; Neuman, Frank
1990-01-01
The traffic management advisor (TMA) is comprised of algorithms, a graphical interface, and interactive tools for controlling the flow of air traffic into the terminal area. The primary algorithm incorporated in it is a real-time scheduler which generates efficient landing sequences and landing times for arrivals within about 200 n.m. from touchdown. A unique feature of the TMA is its graphical interface that allows the traffic manager to modify the computer-generated schedules for specific aircraft while allowing the automatic scheduler to continue generating schedules for all other aircraft. The graphical interface also provides convenient methods for monitoring the traffic flow and changing scheduling parameters during real-time operation.
Recent Progress in Aptamer-Based Functional Probes for Bioanalysis and Biomedicine.
Zhang, Huimin; Zhou, Leiji; Zhu, Zhi; Yang, Chaoyong
2016-07-11
Nucleic acid aptamers are short synthetic DNA or RNA sequences that can bind to a wide range of targets with high affinity and specificity. In recent years, aptamers have attracted increasing research interest due to their unique features of high binding affinity and specificity, small size, excellent chemical stability, easy chemical synthesis, facile modification, and minimal immunogenicity. These properties make aptamers ideal recognition ligands for bioanalysis, disease diagnosis, and cancer therapy. This review highlights the recent progress in aptamer selection and the latest applications of aptamer-based functional probes in the fields of bioanalysis and biomedicine. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Characterization of a new apscaviroid from American persimmon.
Ito, Takao; Suzaki, Koichi; Nakano, Masaaki; Sato, Akihiko
2013-12-01
A unique circular molecule of 358 nucleotides was detected in American persimmon (Diospyros virginiana L.). The molecule was graft-transmissible and had genetic characteristics of members of the genus Apscaviroid. It had the highest sequence similarity (72-73 %) to citrus viroid VI (CVd-VI) and formed a clade with CVd-VI, citrus dwarfing viroid, and apple dimple fruit viroid in a phylogenetic tree. The molecule was not detected in citrus, unlike CVd-VI, which infects citrus and persimmon, and it was genetically distant from persimmon latent viroid, which infects persimmon only. The genetic and biological features indicated that the molecule may be a member of a new apscaviroid species.
Hong, Jungeui; Gresham, David
2017-11-01
Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
Partial bisulfite conversion for unique template sequencing
Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael
2018-01-01
Abstract We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. PMID:29161423
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sattison, M.B.; Blackman, H.S.; Novack, S.D.
The Office for Analysis and Evaluation of Operational Data (AEOD) has sought the assistance of the Idaho National Engineering Laboratory (INEL) to make some significant enhancements to the SAPHIRE-based Accident Sequence Precursor (ASP) models recently developed by the INEL. The challenge of this project is to provide the features of a full-scale PRA within the framework of the simplified ASP models. Some of these features include: (1) uncertainty analysis addressing the standard PRA uncertainties and the uncertainties unique to the ASP models and methods, (2) incorporation and proper quantification of individual human actions and the interaction among human actions, (3)more » enhanced treatment of common cause failures, and (4) extension of the ASP models to more closely mimic full-scale PRAs (inclusion of more initiators, explicitly modeling support system failures, etc.). This paper provides an overview of the methods being used to make the above improvements.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sattison, M.B.; Blackman, H.S.; Novack, S.D.
The Office for Analysis and Evaluation of Operational Data (AEOD) has sought the assistance of the Idaho National Engineering Laboratory (INEL) to make some significant enhancements to the SAPHIRE-based Accident Sequence Precursor (ASP) models recently developed by the INEL. The challenge of this project is to provide the features of a full-scale PRA within the framework of the simplified ASP models. Some of these features include: (1) uncertainty analysis addressing the standard PRA uncertainties and the uncertainties unique to the ASP models and methodology, (2) incorporation and proper quantification of individual human actions and the interaction among human actions, (3)more » enhanced treatment of common cause failures, and (4) extension of the ASP models to more closely mimic full-scale PRAs (inclusion of more initiators, explicitly modeling support system failures, etc.). This paper provides an overview of the methods being used to make the above improvements.« less
Shibata, Mami; Mekuchi, Miyuki; Mori, Kazuki; Muta, Shigeru; Chowdhury, Vishwajit Sur; Nakamura, Yoji; Ojima, Nobuhiko; Saitoh, Kenji; Kobayashi, Takanori; Wada, Tokio; Inouye, Kiyoshi; Kuhara, Satoru; Tashiro, Kosuke
2016-06-01
Bluefin tuna are high-performance swimmers and top predators in the open ocean. Their swimming is grounded by unique features including an exceptional glycolytic potential in white muscle, which is supported by high enzymatic activities. Here we performed high-throughput RNA sequencing (RNA-Seq) in muscles of the Pacific bluefin tuna (Thunnus orientalis) and Pacific cod (Gadus macrocephalus) and conducted a comparative transcriptomic analysis of genes related to energy production. We found that the total expression of glycolytic genes was much higher in the white muscle of tuna than in the other muscles, and that the expression of only six genes for glycolytic enzymes accounted for 83.4% of the total. These expression patterns were in good agreement with the patterns of enzyme activity previously reported. The findings suggest that the mRNA expression of glycolytic genes may contribute directly to the enzymatic activities in the muscles of tuna.
Carlos, Camila; Pereira, Letícia Bianca; Ottoboni, Laura Maria Mariscal
2017-06-01
One of the main goals of coral microbiology is to understand the ways in which coral-bacteria associations are established and maintained. This work describes the sequencing of the genome of Paracoccus sp. SM22M-07 isolated from the mucus of the endemic Brazilian coral species Mussismilia hispida. Comparative analysis was used to identify unique genomic features of SM22M-07 that might be involved in its adaptation to the marine ecosystem and the nutrient-rich environment provided by coral mucus, as well as in the establishment and strengthening of the interaction with the host. These features included genes related to the type IV protein secretion system, erythritol catabolism, and succinoglycan biosynthesis. We experimentally confirmed the production of succinoglycan by Paracoccus sp. SM22M-07 and we hypothesize that it may be involved in the association of the bacterium with coral surfaces.
Exon–intron organization of genes in the slime mold Physarum polycephalum
Trzcinska-Danielewicz, Joanna; Fronk, Jan
2000-01-01
The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon–intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon–intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon–intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P.polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3′-ends. PMID:10982858
SURVEY AND SUMMARY: exon-intron organization of genes in the slime mold Physarum polycephalum.
Trzcinska-Danielewicz, J; Fronk, J
2000-09-15
The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon-intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon-intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon-intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P.polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3'-ends.
Black Toenail Sign in MELAS Syndrome.
Whitehead, Matthew T; Wien, Michael; Lee, Bonmyong; Bass, Nancy; Gropman, Andrea
2017-10-01
Mitochondrial encephalopathy with lactic acidosis and stroke-like episodes (MELAS) syndrome is a mitochondrial disorder often causing progressive brain injury that is not confined to large arterial territories. Severe insults ultimately lead to gyral necrosis affecting the cortex and juxtacortical white matter; the neuroimaging correlate is partial gyral signal suppression on T2/FLAIR sequences that resemble black toenails. We aimed to characterize the imaging features and the natural history of MELAS-related gyral necrosis. Databases at two children's hospitals were searched for brain magnetic resonance imaging studies of individuals with MELAS. Examinations with motion artifact and those lacking T2/FLAIR sequences were excluded. The location, the cumulative number, and the maximum transverse diameter of necrotic gyral lesions were assessed using T2-weighted images and T2/FLAIR sequences. Wilcoxon signed-rank test was employed to evaluate the relationship between disease duration and the number of necrotic lesions. One hundred twenty-four examinations from patients with 14 unique MELAS patients (16 ± 3 years) were evaluated. Six of the eight patients who developed brain lesions also developed gyral necroses (mean 13, range 0 to 44). Necrotic lesions varied in maximal diameter from 4 to 25 mm. Cumulative necrotic lesions correlated with disease duration (P < 0.001). The black toenail sign signifying gyral necrosis is a common imaging feature in individuals with MELAS syndrome. The extent of gyral necrosis correlates with disease duration. Copyright © 2017 Elsevier Inc. All rights reserved.
Lin, Hao; Deng, En-Ze; Ding, Hui; Chen, Wei; Chou, Kuo-Chen
2014-01-01
The σ54 promoters are unique in prokaryotic genome and responsible for transcripting carbon and nitrogen-related genes. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the σ54 promoters. Here, a predictor called ‘iPro54-PseKNC’ was developed. In the predictor, the samples of DNA sequences were formulated by a novel feature vector called ‘pseudo k-tuple nucleotide composition’, which was further optimized by the incremental feature selection procedure. The performance of iPro54-PseKNC was examined by the rigorous jackknife cross-validation tests on a stringent benchmark data set. As a user-friendly web-server, iPro54-PseKNC is freely accessible at http://lin.uestc.edu.cn/server/iPro54-PseKNC. For the convenience of the vast majority of experimental scientists, a step-by-step protocol guide was provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented in this paper just for its integrity. Meanwhile, we also discovered through an in-depth statistical analysis that the distribution of distances between the transcription start sites and the translation initiation sites were governed by the gamma distribution, which may provide a fundamental physical principle for studying the σ54 promoters. PMID:25361964
Generation of a Maize B Centromere Minimal Map Containing the Central Core Domain.
Ellis, Nathanael A; Douglas, Ryan N; Jackson, Caroline E; Birchler, James A; Dawe, R Kelly
2015-10-28
The maize B centromere has been used as a model for centromere epigenetics and as the basis for building artificial chromosomes. However, there are no sequence resources for this important centromere. Here we used transposon display for the centromere-specific retroelement CRM2 to identify a collection of 40 sequence tags that flank CRM2 insertion points on the B chromosome. These were confirmed to lie within the centromere by assaying deletion breakpoints from centromere misdivision derivatives (intracentromere breakages caused by centromere fission). Markers were grouped together on the basis of their association with other markers in the misdivision series and assembled into a pseudocontig containing 10.1 kb of sequence. To identify sequences that interact directly with centromere proteins, we carried out chromatin immunoprecipitation using antibodies to centromeric histone H3 (CENH3), a defining feature of functional centromeric sequences. The CENH3 chromatin immunoprecipitation map was interpreted relative to the known transmission rates of centromere misdivision derivatives to identify a centromere core domain spanning 33 markers. A subset of seven markers was mapped in additional B centromere misdivision derivatives with the use of unique primer pairs. A derivative previously shown to have no canonical centromere sequences (Telo3-3) lacks these core markers. Our results provide a molecular map of the B chromosome centromere and identify key sequences within the map that interact directly with centromeric histone H3. Copyright © 2015 Ellis et al.
Target Site Recognition by a Diversity-Generating Retroelement
Guo, Huatao; Tse, Longping V.; Nieh, Angela W.; Czornyj, Elizabeth; Williams, Steven; Oukil, Sabrina; Liu, Vincent B.; Miller, Jeff F.
2011-01-01
Diversity-generating retroelements (DGRs) are in vivo sequence diversification machines that are widely distributed in bacterial, phage, and plasmid genomes. They function to introduce vast amounts of targeted diversity into protein-encoding DNA sequences via mutagenic homing. Adenine residues are converted to random nucleotides in a retrotransposition process from a donor template repeat (TR) to a recipient variable repeat (VR). Using the Bordetella bacteriophage BPP-1 element as a prototype, we have characterized requirements for DGR target site function. Although sequences upstream of VR are dispensable, a 24 bp sequence immediately downstream of VR, which contains short inverted repeats, is required for efficient retrohoming. The inverted repeats form a hairpin or cruciform structure and mutational analysis demonstrated that, while the structure of the stem is important, its sequence can vary. In contrast, the loop has a sequence-dependent function. Structure-specific nuclease digestion confirmed the existence of a DNA hairpin/cruciform, and marker coconversion assays demonstrated that it influences the efficiency, but not the site of cDNA integration. Comparisons with other phage DGRs suggested that similar structures are a conserved feature of target sequences. Using a kanamycin resistance determinant as a reporter, we found that transplantation of the IMH and hairpin/cruciform-forming region was sufficient to target the DGR diversification machinery to a heterologous gene. In addition to furthering our understanding of DGR retrohoming, our results suggest that DGRs may provide unique tools for directed protein evolution via in vivo DNA diversification. PMID:22194701
Saini, Vikram; Raghuvanshi, Saurabh; Khurana, Jitendra P.; Ahmed, Niyaz; Hasnain, Seyed E.; Tyagi, Akhilesh K.; Tyagi, Anil K.
2012-01-01
Understanding the evolutionary and genomic mechanisms responsible for turning the soil-derived saprophytic mycobacteria into lethal intracellular pathogens is a critical step towards the development of strategies for the control of mycobacterial diseases. In this context, Mycobacterium indicus pranii (MIP) is of specific interest because of its unique immunological and evolutionary significance. Evolutionarily, it is the progenitor of opportunistic pathogens belonging to M. avium complex and is endowed with features that place it between saprophytic and pathogenic species. Herein, we have sequenced the complete MIP genome to understand its unique life style, basis of immunomodulation and habitat diversification in mycobacteria. As a case of massive gene acquisitions, 50.5% of MIP open reading frames (ORFs) are laterally acquired. We show, for the first time for Mycobacterium, that MIP genome has mosaic architecture. These gene acquisitions have led to the enrichment of selected gene families critical to MIP physiology. Comparative genomic analysis indicates a higher antigenic potential of MIP imparting it a unique ability for immunomodulation. Besides, it also suggests an important role of genomic fluidity in habitat diversification within mycobacteria and provides a unique view of evolutionary divergence and putative bottlenecks that might have eventually led to intracellular survival and pathogenic attributes in mycobacteria. PMID:22965120
Wu, Guangxi; Zhao, He; Li, Chenhao; Rajapakse, Menaka Priyadarsani; Wong, Wing Cheong; Xu, Jun; Saunders, Charles W.; Reeder, Nancy L.; Reilman, Raymond A.; Scheynius, Annika; Sun, Sheng; Billmyre, Blake Robert; Li, Wenjun; Averette, Anna Floyd; Mieczkowski, Piotr; Heitman, Joseph; Theelen, Bart; Schröder, Markus S.; De Sessions, Paola Florez; Butler, Geraldine; Maurer-Stroh, Sebastian; Boekhout, Teun; Nagarajan, Niranjan; Dawson, Thomas L.
2015-01-01
Malassezia is a unique lipophilic genus in class Malasseziomycetes in Ustilaginomycotina, (Basidiomycota, fungi) that otherwise consists almost exclusively of plant pathogens. Malassezia are typically isolated from warm-blooded animals, are dominant members of the human skin mycobiome and are associated with common skin disorders. To characterize the genetic basis of the unique phenotypes of Malassezia spp., we sequenced the genomes of all 14 accepted species and used comparative genomics against a broad panel of fungal genomes to comprehensively identify distinct features that define the Malassezia gene repertoire: gene gain and loss; selection signatures; and lineage-specific gene family expansions. Our analysis revealed key gene gain events (64) with a single gene conserved across all Malassezia but absent in all other sequenced Basidiomycota. These likely horizontally transferred genes provide intriguing gain-of-function events and prime candidates to explain the emergence of Malassezia. A larger set of genes (741) were lost, with enrichment for glycosyl hydrolases and carbohydrate metabolism, concordant with adaptation to skin’s carbohydrate-deficient environment. Gene family analysis revealed extensive turnover and underlined the importance of secretory lipases, phospholipases, aspartyl proteases, and other peptidases. Combining genomic analysis with a re-evaluation of culture characteristics, we establish the likely lipid-dependence of all Malassezia. Our phylogenetic analysis sheds new light on the relationship between Malassezia and other members of Ustilaginomycotina, as well as phylogenetic lineages within the genus. Overall, our study provides a unique genomic resource for understanding Malassezia niche-specificity and potential virulence, as well as their abundance and distribution in the environment and on human skin. PMID:26539826
Wu, Guangxi; Zhao, He; Li, Chenhao; Rajapakse, Menaka Priyadarsani; Wong, Wing Cheong; Xu, Jun; Saunders, Charles W; Reeder, Nancy L; Reilman, Raymond A; Scheynius, Annika; Sun, Sheng; Billmyre, Blake Robert; Li, Wenjun; Averette, Anna Floyd; Mieczkowski, Piotr; Heitman, Joseph; Theelen, Bart; Schröder, Markus S; De Sessions, Paola Florez; Butler, Geraldine; Maurer-Stroh, Sebastian; Boekhout, Teun; Nagarajan, Niranjan; Dawson, Thomas L
2015-11-01
Malassezia is a unique lipophilic genus in class Malasseziomycetes in Ustilaginomycotina, (Basidiomycota, fungi) that otherwise consists almost exclusively of plant pathogens. Malassezia are typically isolated from warm-blooded animals, are dominant members of the human skin mycobiome and are associated with common skin disorders. To characterize the genetic basis of the unique phenotypes of Malassezia spp., we sequenced the genomes of all 14 accepted species and used comparative genomics against a broad panel of fungal genomes to comprehensively identify distinct features that define the Malassezia gene repertoire: gene gain and loss; selection signatures; and lineage-specific gene family expansions. Our analysis revealed key gene gain events (64) with a single gene conserved across all Malassezia but absent in all other sequenced Basidiomycota. These likely horizontally transferred genes provide intriguing gain-of-function events and prime candidates to explain the emergence of Malassezia. A larger set of genes (741) were lost, with enrichment for glycosyl hydrolases and carbohydrate metabolism, concordant with adaptation to skin's carbohydrate-deficient environment. Gene family analysis revealed extensive turnover and underlined the importance of secretory lipases, phospholipases, aspartyl proteases, and other peptidases. Combining genomic analysis with a re-evaluation of culture characteristics, we establish the likely lipid-dependence of all Malassezia. Our phylogenetic analysis sheds new light on the relationship between Malassezia and other members of Ustilaginomycotina, as well as phylogenetic lineages within the genus. Overall, our study provides a unique genomic resource for understanding Malassezia niche-specificity and potential virulence, as well as their abundance and distribution in the environment and on human skin.
Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping
2007-01-01
Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable at a highly performed, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730
Pilgrim, Jack; Ander, Mats; Garros, Claire; Baylis, Matthew; Hurst, Gregory D. D.
2017-01-01
Summary There is increasing interest in the heritable bacteria of invertebrate vectors of disease as they present novel targets for control initiatives. Previous studies on biting midges (Culicoides spp.), known to transmit several RNA viruses of veterinary importance, have revealed infections with the endosymbiotic bacteria, Wolbachia and Cardinium. However, rickettsial symbionts in these vectors are underexplored. Here, we present the genome of a previously uncharacterized Rickettsia endosymbiont from Culicoides newsteadi (RiCNE). This genome presents unique features potentially associated with host invasion and adaptation, including genes for the complete non‐oxidative phase of the pentose phosphate pathway, and others predicted to mediate lipopolysaccharides and cell wall modification. Screening of 414 Culicoides individuals from 29 Palearctic or Afrotropical species revealed that Rickettsia represent a widespread but previously overlooked association, reaching high frequencies in midge populations and present in 38% of the species tested. Sequence typing clusters the Rickettsia within the Torix group of the genus, a group known to infect several aquatic and hematophagous taxa. FISH analysis indicated the presence of Rickettsia bacteria in ovary tissue, indicating their maternal inheritance. Given the importance of biting midges as vectors, a key area of future research is to establish the impact of this endosymbiont on vector competence. PMID:28805302
Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals.
Popova, Olga V; Mikhailov, Kirill V; Nikitin, Mikhail A; Logacheva, Maria D; Penin, Aleksey A; Muntyan, Maria S; Kedrova, Olga S; Petrov, Nikolai B; Panchin, Yuri V; Aleoshin, Vladimir V
2016-01-01
Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha-an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia.
Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals
Popova, Olga V.; Mikhailov, Kirill V.; Nikitin, Mikhail A.; Logacheva, Maria D.; Penin, Aleksey A.; Muntyan, Maria S.; Kedrova, Olga S.; Petrov, Nikolai B.; Panchin, Yuri V.
2016-01-01
Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha—an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia. PMID:27755612
Genome Features of “Dark-Fly”, a Drosophila Line Reared Long-Term in a Dark Environment
Zhou, Jun; Sugiyama, Yuzo; Nishimura, Osamu; Aizu, Tomoyuki; Toyoda, Atsushi; Fujiyama, Asao; Agata, Kiyokazu
2012-01-01
Organisms are remarkably adapted to diverse environments by specialized metabolisms, morphology, or behaviors. To address the molecular mechanisms underlying environmental adaptation, we have utilized a Drosophila melanogaster line, termed “Dark-fly”, which has been maintained in constant dark conditions for 57 years (1400 generations). We found that Dark-fly exhibited higher fecundity in dark than in light conditions, indicating that Dark-fly possesses some traits advantageous in darkness. Using next-generation sequencing technology, we determined the whole genome sequence of Dark-fly and identified approximately 220,000 single nucleotide polymorphisms (SNPs) and 4,700 insertions or deletions (InDels) in the Dark-fly genome compared to the genome of the Oregon-R-S strain, a control strain. 1.8% of SNPs were classified as non-synonymous SNPs (nsSNPs: i.e., they alter the amino acid sequence of gene products). Among them, we detected 28 nonsense mutations (i.e., they produce a stop codon in the protein sequence) in the Dark-fly genome. These included genes encoding an olfactory receptor and a light receptor. We also searched runs of homozygosity (ROH) regions as putative regions selected during the population history, and found 21 ROH regions in the Dark-fly genome. We identified 241 genes carrying nsSNPs or InDels in the ROH regions. These include a cluster of alpha-esterase genes that are involved in detoxification processes. Furthermore, analysis of structural variants in the Dark-fly genome showed the deletion of a gene related to fatty acid metabolism. Our results revealed unique features of the Dark-fly genome and provided a list of potential candidate genes involved in environmental adaptation. PMID:22432011
Zhang, Bo; Zhang, Yan-Hong; Wang, Xin; Zhang, Hui-Xian; Lin, Qiang
2017-07-01
The deep sea is one of the most extensive ecosystems on earth. Organisms living there survive in an extremely harsh environment, and their mitochondrial energy metabolism might be a result of evolution. As one of the most important organelles, mitochondria generate energy through energy metabolism and play an important role in almost all biological activities. In this study, the mitogenome of a deep-sea sea anemone ( Bolocera sp.) was sequenced and characterized. Like other metazoans, it contained 13 energy pathway protein-coding genes and two ribosomal RNAs. However, it also exhibited some unique features: just two transfer RNA genes, two group I introns, two transposon-like noncanonical open reading frames (ORFs), and a control region-like (CR-like) element. All of the mitochondrial genes were coded by the same strand (the H-strand). The genetic order and orientation were identical to those of most sequenced actiniarians. Phylogenetic analyses showed that this species was closely related to Bolocera tuediae . Positive selection analysis showed that three residues (31 L and 42 N in ATP6 , 570 S in ND5 ) of Bolocera sp. were positively selected sites. By comparing these features with those of shallow sea anemone species, we deduced that these novel gene features may influence the activity of mitochondrial genes. This study may provide some clues regarding the adaptation of Bolocera sp. to the deep-sea environment.
Diaz, MA; Bik, EM; Carlin, KP; Venn-Watson, SK; Jensen, ED; Jones, SE; Gaston, EP; Relman, DA; Versalovic, J
2013-01-01
Aims In order to develop complementary health management strategies for marine mammals, we used culture-based and culture-independent approaches to identify gastrointestinal lactobacilli of the common bottlenose dolphin, Tursiops truncatus. Methods and Results We screened 307 bacterial isolates from oral and rectal swabs, milk and gastric fluid, collected from 38 dolphins in the U.S. Navy Marine Mammal Program, for potentially beneficial features. We focused our search on lactobacilli and evaluated their ability to modulate TNF secretion by host cells and inhibit growth of pathogens. We recovered Lactobacillus salivarius strains which secreted factors that stimulated TNF production by human monocytoid cells. These Lact. salivarius isolates inhibited growth of selected marine mammal and human bacterial pathogens. In addition, we identified a novel Lactobacillus species by culture and direct sequencing with 96·3% 16S rDNA sequence similarity to Lactobacillus ceti. Conclusions Dolphin-derived Lact. salivarius isolates possess features making them candidate probiotics for clinical studies in marine mammals. Significance and Impact of the Study This is the first study to isolate lactobacilli from dolphins, including a novel Lactobacillus species and a new strain of Lact. salivarius, with potential for veterinary probiotic applications. The isolation and identification of novel Lactobacillus spp. and other indigenous microbes from bottlenose dolphins will enable the study of the biology of symbiotic members of the dolphin microbiota and facilitate the understanding of the microbiomes of these unique animals. PMID:23855505
Boomsma, Wouter; Nielsen, Sofie V; Lindorff-Larsen, Kresten; Hartmann-Petersen, Rasmus; Ellgaard, Lars
2016-01-01
The ubiquitin-proteasome system targets misfolded proteins for degradation. Since the accumulation of such proteins is potentially harmful for the cell, their prompt removal is important. E3 ubiquitin-protein ligases mediate substrate ubiquitination by bringing together the substrate with an E2 ubiquitin-conjugating enzyme, which transfers ubiquitin to the substrate. For misfolded proteins, substrate recognition is generally delegated to molecular chaperones that subsequently interact with specific E3 ligases. An important exception is San1, a yeast E3 ligase. San1 harbors extensive regions of intrinsic disorder, which provide both conformational flexibility and sites for direct recognition of misfolded targets of vastly different conformations. So far, no mammalian ortholog of San1 is known, nor is it clear whether other E3 ligases utilize disordered regions for substrate recognition. Here, we conduct a bioinformatics analysis to examine >600 human and S. cerevisiae E3 ligases to identify enzymes that are similar to San1 in terms of function and/or mechanism of substrate recognition. An initial sequence-based database search was found to detect candidates primarily based on the homology of their ordered regions, and did not capture the unique disorder patterns that encode the functional mechanism of San1. However, by searching specifically for key features of the San1 sequence, such as long regions of intrinsic disorder embedded with short stretches predicted to be suitable for substrate interaction, we identified several E3 ligases with these characteristics. Our initial analysis revealed that another remarkable trait of San1 is shared with several candidate E3 ligases: long stretches of complete lysine suppression, which in San1 limits auto-ubiquitination. We encode these characteristic features into a San1 similarity-score, and present a set of proteins that are plausible candidates as San1 counterparts in humans. In conclusion, our work indicates that San1 is not a unique case, and that several other yeast and human E3 ligases have sequence properties that may allow them to recognize substrates by a similar mechanism as San1.
Genome Sequence of a Canadian Vibrio parahaemolyticus Isolate with Unique Mobilizing Capacity.
Bioteau, Audrey; Huguet, Kévin; Burrus, Vincent; Banerjee, Swapan
2018-06-14
Vibrio parahaemolyticus is a clinically significant marine bacterium implicated in gastroenteritis among consumers of raw or undercooked seafood. This report presents the whole-genome sequence of a unique strain of V. parahaemolyticus isolated from oysters harvested in Canada. © Crown copyright 2018.
Wiebe, Nicholas J P; Meyer, Irmtraud M
2010-06-24
The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few, experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate if evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequence not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way. This has the disadvantage of not being able to predict features as function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment.
Rapid evaluation and quality control of next generation sequencing data with FaQCs
Lo, Chien -Chi; Chain, Patrick S. G.
2014-12-01
Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly processmore » large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.« less
Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie
2017-05-15
B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.
Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen
2015-04-15
In order to develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent the DNA sequences. To facilitate the studies of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups composed of 15 features. The first group calculates three nucleic acid composition features describing the local sequence information by means of kmers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated based on both the built-in and user-defined properties via using repDNA. The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. bliu@insun.hit.edu.cn or kcchou@gordonlifescience.org Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
USDA-ARS?s Scientific Manuscript database
Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...
Jeong, Hyeonsoo; Kim, Kwondo; Caetano-Anollés, Kelsey; Kim, Heebal; Kim, Byung-Ki; Yi, Jun-Koo; Ha, Jae-Jung; Cho, Seoae; Oh, Dong Yep
2016-05-24
Chicken, Gallus gallus, is a valuable species both as a food source and as a model organism for scientific research. Here, we sequenced the genome of Gyeongbuk Araucana, a rare chicken breed with unique phenotypic characteristics including flight ability, large body size, and laying blue-shelled eggs, to identify its genomic features. We generated genomes of Gyeongbuk Araucana, Leghorn, and Korean Native Chicken at a total of 33.5, 35.82, and 33.23 coverage depth, respectively. Along with the genomes of 12 Chinese breeds, we identified genomic variants of 16.3 million SNVs and 2.3 million InDels in mapped regions. Additionally, through assembly of unmapped reads and selective sweep, we identified candidate genes that fall into heart, vasculature and muscle development and body growth categories, which provided insight into Gyeongbuk Araucana's phenotypic traits. Finally, genetic variation based on the transposable element insertion pattern was investigated to elucidate the features of transposable elements related to blue egg shell formation. This study presents results of the first genomic study on the Gyeongbuk Araucana breed; it has potential to serve as an invaluable resource for future research on the genomic characteristics of this chicken breed as well as others.
2014-01-01
Linear algebraic concept of subspace plays a significant role in the recent techniques of spectrum estimation. In this article, the authors have utilized the noise subspace concept for finding hidden periodicities in DNA sequence. With the vast growth of genomic sequences, the demand to identify accurately the protein-coding regions in DNA is increasingly rising. Several techniques of DNA feature extraction which involves various cross fields have come up in the recent past, among which application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this unique feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions completely eliminating background noise. Comparison of proposed method with existing sliding discrete Fourier transform (SDFT) method popularly known as modified periodogram method has been drawn on several genes from various organisms and the results show that the proposed method has better as well as an effective approach towards gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish superiority of least-norm gene prediction method over existing method. PMID:24386895
Herget, Stephan; Toukach, Philip V; Ranzinger, René; Hull, William E; Knirel, Yuriy A; von der Lieth, Claus-Wilhelm
2008-01-01
Background There are considerable differences between bacterial and mammalian glycans. In contrast to most eukaryotic carbohydrates, bacterial glycans are often composed of repeating units with diverse functions ranging from structural reinforcement to adhesion, colonization and camouflage. Since bacterial glycans are typically displayed at the cell surface, they can interact with the environment and, therefore, have significant biomedical importance. Results The sequence characteristics of glycans (monosaccharide composition, modifications, and linkage patterns) for the higher bacterial taxonomic classes have been examined and compared with the data for mammals, with both similarities and unique features becoming evident. Compared to mammalian glycans, the bacterial glycans deposited in the current databases have a more than ten-fold greater diversity at the monosaccharide level, and the disaccharide pattern space is approximately nine times larger. Specific bacterial subclasses exhibit characteristic glycans which can be distinguished on the basis of distinctive structural features or sequence properties. Conclusion For the first time a systematic database analysis of the bacterial glycome has been performed. This study summarizes the current knowledge of bacterial glycan architecture and diversity and reveals putative targets for the rational design and development of therapeutic intervention strategies by comparing bacterial and mammalian glycans. PMID:18694500
Defining the healthy "core microbiome" of oral microbial communities
2009-01-01
Background Most studies examining the commensal human oral microbiome are focused on disease or are limited in methodology. In order to diagnose and treat diseases at an early and reversible stage an in-depth definition of health is indispensible. The aim of this study therefore was to define the healthy oral microbiome using recent advances in sequencing technology (454 pyrosequencing). Results We sampled and sequenced microbiomes from several intraoral niches (dental surfaces, cheek, hard palate, tongue and saliva) in three healthy individuals. Within an individual oral cavity, we found over 3600 unique sequences, over 500 different OTUs or "species-level" phylotypes (sequences that clustered at 3% genetic difference) and 88 - 104 higher taxa (genus or more inclusive taxon). The predominant taxa belonged to Firmicutes (genus Streptococcus, family Veillonellaceae, genus Granulicatella), Proteobacteria (genus Neisseria, Haemophilus), Actinobacteria (genus Corynebacterium, Rothia, Actinomyces), Bacteroidetes (genus Prevotella, Capnocytophaga, Porphyromonas) and Fusobacteria (genus Fusobacterium). Each individual sample harboured on average 266 "species-level" phylotypes (SD 67; range 123 - 326) with cheek samples being the least diverse and the dental samples from approximal surfaces showing the highest diversity. Principal component analysis discriminated the profiles of the samples originating from shedding surfaces (mucosa of tongue, cheek and palate) from the samples that were obtained from solid surfaces (teeth). There was a large overlap in the higher taxa, "species-level" phylotypes and unique sequences among the three microbiomes: 84% of the higher taxa, 75% of the OTUs and 65% of the unique sequences were present in at least two of the three microbiomes. The three individuals shared 1660 of 6315 unique sequences. These 1660 sequences (the "core microbiome") contributed 66% of the reads. The overlapping OTUs contributed to 94% of the reads, while nearly all reads (99.8%) belonged to the shared higher taxa. Conclusions We obtained the first insight into the diversity and uniqueness of individual oral microbiomes at a resolution of next-generation sequencing. We showed that a major proportion of bacterial sequences of unrelated healthy individuals is identical, supporting the concept of a core microbiome at health. PMID:20003481
Cho, Young Sun; Choi, Buyl Nim; Ha, En-Mi; Kim, Ki Hong; Kim, Sung Koo; Kim, Dong Soo; Nam, Yoon Kwon
2005-01-01
Novel metallothionein (MT) complementary DNA and genomic sequences were isolated from a cartilaginous shark species, Scyliorhinus torazame. The full-length open reading frame (ORF) of shark MT cDNA encoded 68 amino acids with a high cysteine content (29%). The genomic ORF sequence (932 bp) of shark MT isolated by polymerase chain reaction (PCR) comprised 3 exons with 2 interventing introns. Shark MT sequence shared many conserved features with other vertebrate MTs: overall amino acid identities of shark MT ranged from 47% to 57% with fish MTs, and 41% to 62% with mammalian MTs. However, in addition to these conserved characteristics, shark MT sequence exhibited some unique characteristics. It contained 4 extra amino acids (Lys-Ala-Gly-Arg) at the end of the beta-domain, which have not been reported in any other vertebrate MTs. The last amino acid residue at the C-terminus was Ser, which also has not been reported in fish and mammalian MTs. The MT messenger RNA levels in shark liver and kidney, assessed by semiquantitative reverse transcriptase PCR and RNA blot hybridization, were significantly affected by experimental exposures to heavy metals (cadmium, copper, and zinc). Generally, the transcriptional activation of shark MT gene was dependent on the dose (0-10 mg/kg body weight for injection and 0-20 microM for immersion) and duration (1-10 days); zinc was a more potent inducer than copper and cadmium.
Pleurochrysome: A Web Database of Pleurochrysis Transcripts and Orthologs Among Heterogeneous Algae
Fujiwara, Shoko; Takatsuka, Yukiko; Hirokawa, Yasutaka; Tsuzuki, Mikio; Takano, Tomoyuki; Kobayashi, Masaaki; Suda, Kunihiro; Asamizu, Erika; Yokoyama, Koji; Shibata, Daisuke; Tabata, Satoshi; Yano, Kentaro
2016-01-01
Pleurochrysis is a coccolithophorid genus, which belongs to the Coccolithales in the Haptophyta. The genus has been used extensively for biological research, together with Emiliania in the Isochrysidales, to understand distinctive features between the two coccolithophorid-including orders. However, molecular biological research on Pleurochrysis such as elucidation of the molecular mechanism behind coccolith formation has not made great progress at least in part because of lack of comprehensive gene information. To provide such information to the research community, we built an open web database, the Pleurochrysome (http://bioinf.mind.meiji.ac.jp/phapt/), which currently stores 9,023 unique gene sequences (designated as UNIGENEs) assembled from expressed sequence tag sequences of P. haptonemofera as core information. The UNIGENEs were annotated with gene sequences sharing significant homology, conserved domains, Gene Ontology, KEGG Orthology, predicted subcellular localization, open reading frames and orthologous relationship with genes of 10 other algal species, a cyanobacterium and the yeast Saccharomyces cerevisiae. This sequence and annotation information can be easily accessed via several search functions. Besides fundamental functions such as BLAST and keyword searches, this database also offers search functions to explore orthologous genes in the 12 organisms and to seek novel genes. The Pleurochrysome will promote molecular biological and phylogenetic research on coccolithophorids and other haptophytes by helping scientists mine data from the primary transcriptome of P. haptonemofera. PMID:26746174
Quantum-Sequencing: Fast electronic single DNA molecule sequencing
NASA Astrophysics Data System (ADS)
Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant
2014-03-01
A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.
Unique Features of Ethnic Mongolian Gut Microbiome revealed by metagenomic analysis.
Liu, Wenjun; Zhang, Jiachao; Wu, Chunyan; Cai, Shunfeng; Huang, Weiqiang; Chen, Jing; Xi, Xiaoxia; Liang, Zebin; Hou, Qiangchuan; Zhou, Bing; Qin, Nan; Zhang, Heping
2016-10-06
The human gut microbiota varies considerably among world populations due to a variety of factors including genetic background, diet, cultural habits and socioeconomic status. Here we characterized 110 healthy Mongolian adults gut microbiota by shotgun metagenomic sequencing and compared the intestinal microbiome among Mongolians, the Hans and European cohorts. The results showed that the taxonomic profile of intestinal microbiome among cohorts revealed the Actinobaceria and Bifidobacterium were the key microbes contributing to the differences among Mongolians, the Hans and Europeans at the phylum level and genus level, respectively. Metagenomic species analysis indicated that Faecalibacterium prausnitzii and Coprococcus comeswere enrich in Mongolian people which might contribute to gut health through anti-inflammatory properties and butyrate production, respectively. On the other hand, the enriched genus Collinsella, biomarker in symptomatic atherosclerosis patients, might be associated with the high morbidity of cardiovascular and cerebrovascular diseases in Mongolian adults. At the functional level, a unique microbial metabolic pathway profile was present in Mongolian's gut which mainly distributed in amino acid metabolism, carbohydrate metabolism, energy metabolism, lipid metabolism, glycan biosynthesis and metabolism. We can attribute the specific signatures of Mongolian gut microbiome to their unique genotype, dietary habits and living environment.
Partial bisulfite conversion for unique template sequencing.
Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael; Levy, Dan
2018-01-25
We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
The zebrafish reference genome sequence and its relationship to the human genome.
Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L
2013-04-25
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
Artieri, Carlo G; Fraser, Hunter B
2014-12-01
The recent advent of ribosome profiling-sequencing of short ribosome-bound fragments of mRNA-has offered an unprecedented opportunity to interrogate the sequence features responsible for modulating translational rates. Nevertheless, numerous analyses of the first riboprofiling data set have produced equivocal and often incompatible results. Here we analyze three independent yeast riboprofiling data sets, including two with much higher coverage than previously available, and find that all three show substantial technical sequence biases that confound interpretations of ribosomal occupancy. After accounting for these biases, we find no effect of previously implicated factors on ribosomal pausing. Rather, we find that incorporation of proline, whose unique side-chain stalls peptide synthesis in vitro, also slows the ribosome in vivo. We also reanalyze a method that implicated positively charged amino acids as the major determinant of ribosomal stalling and demonstrate that it produces false signals of stalling in low-coverage data. Our results suggest that any analysis of riboprofiling data should account for sequencing biases and sparse coverage. To this end, we establish a robust methodology that enables analysis of ribosome profiling data without prior assumptions regarding which positions spanned by the ribosome cause stalling. © 2014 Artieri and Fraser; Published by Cold Spring Harbor Laboratory Press.
The zebrafish reference genome sequence and its relationship to the human genome
Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.
2013-01-01
Zebrafish have become a popular organism for the study of vertebrate gene function1,2. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease3–5. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes6, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Paliwoda, Rebecca E; Li, Feng; Reid, Michael S; Lin, Yanwen; Le, X Chris
2014-06-17
Functionalizing nanomaterials for diverse analytical, biomedical, and therapeutic applications requires determination of surface coverage (or density) of DNA on nanomaterials. We describe a sequential strand displacement beacon assay that is able to quantify specific DNA sequences conjugated or coconjugated onto gold nanoparticles (AuNPs). Unlike the conventional fluorescence assay that requires the target DNA to be fluorescently labeled, the sequential strand displacement beacon method is able to quantify multiple unlabeled DNA oligonucleotides using a single (universal) strand displacement beacon. This unique feature is achieved by introducing two short unlabeled DNA probes for each specific DNA sequence and by performing sequential DNA strand displacement reactions. Varying the relative amounts of the specific DNA sequences and spacing DNA sequences during their coconjugation onto AuNPs results in different densities of the specific DNA on AuNP, ranging from 90 to 230 DNA molecules per AuNP. Results obtained from our sequential strand displacement beacon assay are consistent with those obtained from the conventional fluorescence assays. However, labeling of DNA with some fluorescent dyes, e.g., tetramethylrhodamine, alters DNA density on AuNP. The strand displacement strategy overcomes this problem by obviating direct labeling of the target DNA. This method has broad potential to facilitate more efficient design and characterization of novel multifunctional materials for diverse applications.
novPTMenzy: a database for enzymes involved in novel post-translational modifications
Khater, Shradha; Mohanty, Debasisa
2015-01-01
With the recent discoveries of novel post-translational modifications (PTMs) which play important roles in signaling and biosynthetic pathways, identification of such PTM catalyzing enzymes by genome mining has been an area of major interest. Unlike well-known PTMs like phosphorylation, glycosylation, SUMOylation, no bioinformatics resources are available for enzymes associated with novel and unusual PTMs. Therefore, we have developed the novPTMenzy database which catalogs information on the sequence, structure, active site and genomic neighborhood of experimentally characterized enzymes involved in five novel PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. Based on a comprehensive analysis of the sequence and structural features of these known PTM catalyzing enzymes, we have created Hidden Markov Model profiles for the identification of similar PTM catalyzing enzymatic domains in genomic sequences. We have also created predictive rules for grouping them into functional subfamilies and deciphering their mechanistic details by structure-based analysis of their active site pockets. These analytical modules have been made available as user friendly search interfaces of novPTMenzy database. It also has a specialized analysis interface for some PTMs like AMPylation and Eliminylation. The novPTMenzy database is a unique resource that can aid in discovery of unusual PTM catalyzing enzymes in newly sequenced genomes. Database URL: http://www.nii.ac.in/novptmenzy.html PMID:25931459
The Human Transcript Database: A Catalogue of Full Length cDNA Inserts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bouckk John; Michael McLeod; Kim Worley
1999-09-10
The BCM Search Launcher provided improved access to web-based sequence analysis services during the granting period and beyond. The Search Launcher web site grouped analysis procedures by function and provided default parameters that provided reasonable search results for most applications. For instance, most queries were automatically masked for repeat sequences prior to sequence database searches to avoid spurious matches. In addition to the web-based access and arrangements that were made using the functions easier, the BCM Search Launcher provided unique value-added applications like the BEAUTY sequence database search tool that combined information about protein domains and sequence database search resultsmore » to give an enhanced, more complete picture of the reliability and relative value of the information reported. This enhanced search tool made evaluating search results more straight-forward and consistent. Some of the favorite features of the web site are the sequence utilities and the batch client functionality that allows processing of multiple samples from the command line interface. One measure of the success of the BCM Search Launcher is the number of sites that have adopted the models first developed on the site. The graphic display on the BLAST search from the NCBI web site is one such outgrowth, as is the display of protein domain search results within BLAST search results, and the design of the Biology Workbench application. The logs of usage and comments from users confirm the great utility of this resource.« less
Genomic analyses of Clostridium perfringens isolates from five toxinotypes.
Hassan, Karl A; Elbourne, Liam D H; Tetu, Sasha G; Melville, Stephen B; Rood, Julian I; Paulsen, Ian T
2015-05-01
Clostridium perfringens can be isolated from a range of environments, including soil, marine and fresh water sediments, and the gastrointestinal tracts of animals and humans. Some C. perfringens strains have attractive industrial applications, e.g., in the degradation of waste products or the production of useful chemicals. However, C. perfringens has been most studied as the causative agent of a range of enteric and soft tissue infections of varying severities in humans and animals. Host preference and disease type in C. perfringens are intimately linked to the production of key extracellular toxins and on this basis toxigenic C. perfringens strains have been classified into five toxinotypes (A-E). To date, twelve genome sequences have been generated for a diverse collection of C. perfringens isolates, including strains associated with human and animal infections, a human commensal strain, and a strain with potential industrial utility. Most of the sequenced strains are classified as toxinotype A. However, genome sequences of representative strains from each of the other four toxinotypes have also been determined. Analysis of this collection of sequences has highlighted a lack of features differentiating toxinotype A strains from the other isolates, indicating that the primary defining characteristic of toxinotype A strains is their lack of key plasmid-encoded extracellular toxin genes associated with toxinotype B to E strains. The representative B-E strains sequenced to date each harbour many unique genes. Additional genome sequences are needed to determine if these genes are characteristic of their respective toxinotypes. Copyright © 2014. Published by Elsevier Masson SAS.
Intrahaplotypic Variants Differentiate Complex Linkage Disequilibrium within Human MHC Haplotypes
Lam, Tze Hau; Tay, Matthew Zirui; Wang, Bei; Xiao, Ziwei; Ren, Ee Chee
2015-01-01
Distinct regions of long-range genetic fixation in the human MHC region, known as conserved extended haplotypes (CEHs), possess unique genomic characteristics and are strongly associated with numerous diseases. While CEHs appear to be homogeneous by SNP analysis, the nature of fine variations within their genomic structure is unknown. Using multiple, MHC-homozygous cell lines, we demonstrate extensive sequence conservation in two common Asian MHC haplotypes: A33-B58-DR3 and A2-B46-DR9. However, characterization of phase-resolved MHC haplotypes revealed unique intra-CEH patterns of variation and uncovered 127 single nucleotide variants (SNVs) which are missing from public databases. We further show that the strong linkage disequilibrium structure within the human MHC that typically confounds precise identification of genetic features can be resolved using intra-CEH variants, as evidenced by rs3129063 and rs448489, which affect expression of ZFP57, a gene important in methylation and epigenetic regulation. This study demonstrates an improved strategy that can be used towards genetic dissection of diseases. PMID:26593880
Topological Structure of the Space of Phenotypes: The Case of RNA Neutral Networks
Aguirre, Jacobo; Buldú, Javier M.; Stich, Michael; Manrubia, Susanna C.
2011-01-01
The evolution and adaptation of molecular populations is constrained by the diversity accessible through mutational processes. RNA is a paradigmatic example of biopolymer where genotype (sequence) and phenotype (approximated by the secondary structure fold) are identified in a single molecule. The extreme redundancy of the genotype-phenotype map leads to large ensembles of RNA sequences that fold into the same secondary structure and can be connected through single-point mutations. These ensembles define neutral networks of phenotypes in sequence space. Here we analyze the topological properties of neutral networks formed by 12-nucleotides RNA sequences, obtained through the exhaustive folding of sequence space. A total of 412 sequences fragments into 645 subnetworks that correspond to 57 different secondary structures. The topological analysis reveals that each subnetwork is far from being random: it has a degree distribution with a well-defined average and a small dispersion, a high clustering coefficient, and an average shortest path between nodes close to its minimum possible value, i.e. the Hamming distance between sequences. RNA neutral networks are assortative due to the correlation in the composition of neighboring sequences, a feature that together with the symmetries inherent to the folding process explains the existence of communities. Several topological relationships can be analytically derived attending to structural restrictions and generic properties of the folding process. The average degree of these phenotypic networks grows logarithmically with their size, such that abundant phenotypes have the additional advantage of being more robust to mutations. This property prevents fragmentation of neutral networks and thus enhances the navigability of sequence space. In summary, RNA neutral networks show unique topological properties, unknown to other networks previously described. PMID:22028856
A better sequence-read simulator program for metagenomics.
Johnson, Stephen; Trost, Brett; Long, Jeffrey R; Pittet, Vanessa; Kusalik, Anthony
2014-01-01
There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.
USDA-ARS?s Scientific Manuscript database
Antibody engineering requires the identification of antigen binding domains or variable regions (VR) unique to each antibody. It is the VR that define the unique antigen binding properties and proper sequence identification is essential for functional evaluation and performance of recombinant antibo...
FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.
Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A
2016-06-13
Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.
FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation
Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco; ...
2016-06-13
Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data formatmore » to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.« less
FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco
Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data formatmore » to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.« less
Wang, Baosheng; Khalili Mahani, Marjan; Ng, Wei Lun; Kusumi, Junko; Phi, Hai Hong; Inomata, Nobuyuki; Wang, Xiao-Ru; Szmidt, Alfred E
2014-01-01
Pinus krempfii Lecomte is a morphologically and ecologically unique pine, endemic to Vietnam. It is regarded as vulnerable species with distribution limited to just two provinces: Khanh Hoa and Lam Dong. Although a few phylogenetic studies have included this species, almost nothing is known about its genetic features. In particular, there are no studies addressing the levels and patterns of genetic variation in natural populations of P. krempfii. In this study, we sampled 57 individuals from six natural populations of P. krempfii and analyzed their sequence variation in ten nuclear gene regions (approximately 9 kb) and 14 mitochondrial (mt) DNA regions (approximately 10 kb). We also analyzed variation at seven chloroplast (cp) microsatellite (SSR) loci. We found very low haplotype and nucleotide diversity at nuclear loci compared with other pine species. Furthermore, all investigated populations were monomorphic across all mitochondrial DNA (mtDNA) regions included in our study, which are polymorphic in other pine species. Population differentiation at nuclear loci was low (5.2%) but significant. However, structure analysis of nuclear loci did not detect genetically differentiated groups of populations. Approximate Bayesian computation (ABC) using nuclear sequence data and mismatch distribution analysis for cpSSR loci suggested recent expansion of the species. The implications of these findings for the management and conservation of P. krempfii genetic resources were discussed. PMID:25360263
Yuan, Zhaohe; Fang, Yanming; Zhang, Taikui; Fei, Zhangjun; Han, Fengming; Liu, Cuiyu; Liu, Min; Xiao, Wei; Zhang, Wenjing; Wu, Shan; Zhang, Mengwei; Ju, Youhui; Xu, Huili; Dai, He; Liu, Yujun; Chen, Yanhui; Wang, Lili; Zhou, Jianqing; Guan, Dian; Yan, Ming; Xia, Yanhua; Huang, Xianbin; Liu, Dongyuan; Wei, Hongmin; Zheng, Hongkun
2017-12-22
Pomegranate (Punica granatum L.) has an ancient cultivation history and has become an emerging profitable fruit crop due to its attractive features such as the bright red appearance and the high abundance of medicinally valuable ellagitannin-based compounds in its peel and aril. However, the limited genomic resources have restricted further elucidation of genetics and evolution of these interesting traits. Here, we report a 274-Mb high-quality draft pomegranate genome sequence, which covers approximately 81.5% of the estimated 336-Mb genome, consists of 2177 scaffolds with an N50 size of 1.7 Mb and contains 30 903 genes. Phylogenomic analysis supported that pomegranate belongs to the Lythraceae family rather than the monogeneric Punicaceae family, and comparative analyses showed that pomegranate and Eucalyptus grandis share the paleotetraploidy event. Integrated genomic and transcriptomic analyses provided insights into the molecular mechanisms underlying the biosynthesis of ellagitannin-based compounds, the colour formation in both peels and arils during pomegranate fruit development, and the unique ovule development processes that are characteristic of pomegranate. This genome sequence provides an important resource to expand our understanding of some unique biological processes and to facilitate both comparative biology studies and crop breeding. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Thompson, Michael C.; Wheatley, Nicole M.; Jorda, Julien; Sawaya, Michael R.; Gidaniyan, Soheil D.; Ahmed, Hoda; Yang, Zhongyu; McCarty, Krystal N.; Whitelegge, Julian P.; Yeates, Todd O.
2014-01-01
Recently, progress has been made toward understanding the functional diversity of bacterial microcompartment (MCP) systems, which serve as protein-based metabolic organelles in diverse microbes. New types of MCPs have been identified, including the glycyl-radical propanediol (Grp) MCP. Within these elaborate protein complexes, BMC-domain shell proteins assemble to form a polyhedral barrier that encapsulates the enzymatic contents of the MCP. Interestingly, the Grp MCP contains a number of shell proteins with unusual sequence features. GrpU is one such shell protein, whose amino acid sequence is particularly divergent from other members of the BMC-domain superfamily of proteins that effectively defines all MCPs. Expression, purification, and subsequent characterization of the protein showed, unexpectedly, that it binds an iron-sulfur cluster. We determined X-ray crystal structures of two GrpU orthologs, providing the first structural insight into the homohexameric BMC-domain shell proteins of the Grp system. The X-ray structures of GrpU, both obtained in the apo form, combined with spectroscopic analyses and computational modeling, show that the metal cluster resides in the central pore of the BMC shell protein at a position of broken 6-fold symmetry. The result is a structurally polymorphic iron-sulfur cluster binding site that appears to be unique among metalloproteins studied to date. PMID:25102080
Temporality of Features in Near-Death Experience Narratives
Martial, Charlotte; Cassol, Héléna; Antonopoulos, Georgios; Charlier, Thomas; Heros, Julien; Donneau, Anne-Françoise; Charland-Verville, Vanessa; Laureys, Steven
2017-01-01
Background: After an occurrence of a Near-Death Experience (NDE), Near-Death Experiencers (NDErs) usually report extremely rich and detailed narratives. Phenomenologically, a NDE can be described as a set of distinguishable features. Some authors have proposed regular patterns of NDEs, however, the actual temporality sequence of NDE core features remains a little explored area. Objectives: The aim of the present study was to investigate the frequency distribution of these features (globally and according to the position of features in narratives) as well as the most frequently reported temporality sequences of features. Methods: We collected 154 French freely expressed written NDE narratives (i.e., Greyson NDE scale total score ≥ 7/32). A text analysis was conducted on all narratives in order to infer temporal ordering and frequency distribution of NDE features. Results: Our analyses highlighted the following most frequently reported sequence of consecutive NDE features: Out-of-Body Experience, Experiencing a tunnel, Seeing a bright light, Feeling of peace. Yet, this sequence was encountered in a very limited number of NDErs. Conclusion: These findings may suggest that NDEs temporality sequences can vary across NDErs. Exploring associations and relationships among features encountered during NDEs may complete the rigorous definition and scientific comprehension of the phenomenon. PMID:28659779
Temporality of Features in Near-Death Experience Narratives.
Martial, Charlotte; Cassol, Héléna; Antonopoulos, Georgios; Charlier, Thomas; Heros, Julien; Donneau, Anne-Françoise; Charland-Verville, Vanessa; Laureys, Steven
2017-01-01
Background: After an occurrence of a Near-Death Experience (NDE), Near-Death Experiencers (NDErs) usually report extremely rich and detailed narratives. Phenomenologically, a NDE can be described as a set of distinguishable features. Some authors have proposed regular patterns of NDEs, however, the actual temporality sequence of NDE core features remains a little explored area. Objectives: The aim of the present study was to investigate the frequency distribution of these features (globally and according to the position of features in narratives) as well as the most frequently reported temporality sequences of features. Methods: We collected 154 French freely expressed written NDE narratives (i.e., Greyson NDE scale total score ≥ 7/32). A text analysis was conducted on all narratives in order to infer temporal ordering and frequency distribution of NDE features. Results: Our analyses highlighted the following most frequently reported sequence of consecutive NDE features: Out-of-Body Experience, Experiencing a tunnel, Seeing a bright light, Feeling of peace. Yet, this sequence was encountered in a very limited number of NDErs. Conclusion: These findings may suggest that NDEs temporality sequences can vary across NDErs. Exploring associations and relationships among features encountered during NDEs may complete the rigorous definition and scientific comprehension of the phenomenon.
Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie
2015-01-01
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
2015-01-01
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations. PMID:25885272
Talkowski, Michael E; Ernst, Carl; Heilbut, Adrian; Chiang, Colby; Hanscom, Carrie; Lindgren, Amelia; Kirby, Andrew; Liu, Shangtao; Muddukrishna, Bhavana; Ohsumi, Toshiro K; Shen, Yiping; Borowsky, Mark; Daly, Mark J; Morton, Cynthia C; Gusella, James F
2011-04-08
The contribution of balanced chromosomal rearrangements to complex disorders remains unclear because they are not detected routinely by genome-wide microarrays and clinical localization is imprecise. Failure to consider these events bypasses a potentially powerful complement to single nucleotide polymorphism and copy-number association approaches to complex disorders, where much of the heritability remains unexplained. To capitalize on this genetic resource, we have applied optimized sequencing and analysis strategies to test whether these potentially high-impact variants can be mapped at reasonable cost and throughput. By using a whole-genome multiplexing strategy, rearrangement breakpoints could be delineated at a fraction of the cost of standard sequencing. For rearrangements already mapped regionally by karyotyping and fluorescence in situ hybridization, a targeted approach enabled capture and sequencing of multiple breakpoints simultaneously. Importantly, this strategy permitted capture and unique alignment of up to 97% of repeat-masked sequences in the targeted regions. Genome-wide analyses estimate that only 3.7% of bases should be routinely omitted from genomic DNA capture experiments. Illustrating the power of these approaches, the rearrangement breakpoints were rapidly defined to base pair resolution and revealed unexpected sequence complexity, such as co-occurrence of inversion and translocation as an underlying feature of karyotypically balanced alterations. These findings have implications ranging from genome annotation to de novo assemblies and could enable sequencing screens for structural variations at a cost comparable to that of microarrays in standard clinical practice. Copyright © 2011 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Quaranfil, Johnston Atoll, and Lake Chad viruses are novel members of the family Orthomyxoviridae.
Presti, Rachel M; Zhao, Guoyan; Beatty, Wandy L; Mihindukulasuriya, Kathie A; da Rosa, Amelia P A Travassos; Popov, Vsevolod L; Tesh, Robert B; Virgin, Herbert W; Wang, David
2009-11-01
Arboviral infections are an important cause of emerging infections due to the movements of humans, animals, and hematophagous arthropods. Quaranfil virus (QRFV) is an unclassified arbovirus originally isolated from children with mild febrile illness in Quaranfil, Egypt, in 1953. It has subsequently been isolated in multiple geographic areas from ticks and birds. We used high-throughput sequencing to classify QRFV as a novel orthomyxovirus. The genome of this virus is comprised of multiple RNA segments; five were completely sequenced. Proteins with limited amino acid similarity to conserved domains in polymerase (PA, PB1, and PB2) and hemagglutinin (HA) genes from known orthomyxoviruses were predicted to be present in four of the segments. The fifth sequenced segment shared no detectable similarity to any protein and is of uncertain function. The end-terminal sequences of QRFV are conserved between segments and are different from those of the known orthomyxovirus genera. QRFV is known to cross-react serologically with two other unclassified viruses, Johnston Atoll virus (JAV) and Lake Chad virus (LKCV). The complete open reading frames of PB1 and HA were sequenced for JAV, while a fragment of PB1 of LKCV was identified by mass sequencing. QRFV and JAV PB1 and HA shared 80% and 70% amino acid identity to each other, respectively; the LKCV PB1 fragment shared 83% amino acid identity with the corresponding region of QRFV PB1. Based on phylogenetic analyses, virion ultrastructural features, and the unique end-terminal sequences identified, we propose that QRFV, JAV, and LKCV comprise a novel genus of the family Orthomyxoviridae.
Quaranfil, Johnston Atoll, and Lake Chad Viruses Are Novel Members of the Family Orthomyxoviridae▿
Presti, Rachel M.; Zhao, Guoyan; Beatty, Wandy L.; Mihindukulasuriya, Kathie A.; Travassos da Rosa, Amelia P. A.; Popov, Vsevolod L.; Tesh, Robert B.; Virgin, Herbert W.; Wang, David
2009-01-01
Arboviral infections are an important cause of emerging infections due to the movements of humans, animals, and hematophagous arthropods. Quaranfil virus (QRFV) is an unclassified arbovirus originally isolated from children with mild febrile illness in Quaranfil, Egypt, in 1953. It has subsequently been isolated in multiple geographic areas from ticks and birds. We used high-throughput sequencing to classify QRFV as a novel orthomyxovirus. The genome of this virus is comprised of multiple RNA segments; five were completely sequenced. Proteins with limited amino acid similarity to conserved domains in polymerase (PA, PB1, and PB2) and hemagglutinin (HA) genes from known orthomyxoviruses were predicted to be present in four of the segments. The fifth sequenced segment shared no detectable similarity to any protein and is of uncertain function. The end-terminal sequences of QRFV are conserved between segments and are different from those of the known orthomyxovirus genera. QRFV is known to cross-react serologically with two other unclassified viruses, Johnston Atoll virus (JAV) and Lake Chad virus (LKCV). The complete open reading frames of PB1 and HA were sequenced for JAV, while a fragment of PB1 of LKCV was identified by mass sequencing. QRFV and JAV PB1 and HA shared 80% and 70% amino acid identity to each other, respectively; the LKCV PB1 fragment shared 83% amino acid identity with the corresponding region of QRFV PB1. Based on phylogenetic analyses, virion ultrastructural features, and the unique end-terminal sequences identified, we propose that QRFV, JAV, and LKCV comprise a novel genus of the family Orthomyxoviridae. PMID:19726499
Lakshmanan, Anupama; Cheong, Daniel W; Accardo, Angelo; Di Fabrizio, Enzo; Riekel, Christian; Hauser, Charlotte A E
2013-01-08
The self-assembly of abnormally folded proteins into amyloid fibrils is a hallmark of many debilitating diseases, from Alzheimer's and Parkinson diseases to prion-related disorders and diabetes type II. However, the fundamental mechanism of amyloid aggregation remains poorly understood. Core sequences of four to seven amino acids within natural amyloid proteins that form toxic fibrils have been used to study amyloidogenesis. We recently reported a class of systematically designed ultrasmall peptides that self-assemble in water into cross-β-type fibers. Here we compare the self-assembly of these peptides with natural core sequences. These include core segments from Alzheimer's amyloid-β, human amylin, and calcitonin. We analyzed the self-assembly process using circular dichroism, electron microscopy, X-ray diffraction, rheology, and molecular dynamics simulations. We found that the designed aliphatic peptides exhibited a similar self-assembly mechanism to several natural sequences, with formation of α-helical intermediates being a common feature. Interestingly, the self-assembly of a second core sequence from amyloid-β, containing the diphenylalanine motif, was distinctly different from all other examined sequences. The diphenylalanine-containing sequence formed β-sheet aggregates without going through the α-helical intermediate step, giving a unique fiber-diffraction pattern and simulation structure. Based on these results, we propose a simplified aliphatic model system to study amyloidosis. Our results provide vital insight into the nature of early intermediates formed and suggest that aromatic interactions are not as important in amyloid formation as previously postulated. This information is necessary for developing therapeutic drugs that inhibit and control amyloid formation.
Discriminative prediction of mammalian enhancers from DNA sequence
Lee, Dongwon; Karchin, Rachel; Beer, Michael A.
2011-01-01
Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers. PMID:21875935
NASA Astrophysics Data System (ADS)
Jaenisch, Holger M.; Handley, James W.
2010-04-01
Malware are analogs of viruses. Viruses are comprised of large numbers of polypeptide proteins. The shape and function of the protein strands determines the functionality of the segment, similar to a subroutine in malware. The full combination of subroutines is the malware organism, in analogous fashion as a collection of polypeptides forms protein structures that are information bearing. We propose to apply the methods of Bioinformatics to analyze malware to provide a rich feature set for creating a unique and novel detection and classification scheme that is originally applied to Bioinformatics amino acid sequencing. Our proposed methods enable real time in situ (in contrast to in vivo) detection applications.
Biosynthesis of enediyne antitumor antibiotics.
Van Lanen, Steven G; Shen, Ben
2008-01-01
The enediyne polyketides are secondary metabolites isolated from a variety of Actinomycetes. All members share very potent anticancer and antibiotic activity, and prospects for the clinical application of the enediynes has been validated with the recent marketing of two enediyne derivatives as anticancer agents. The biosynthesis of these compounds is of interest because of the numerous structural features that are unique to the enediyne family. The gene cluster for five enediynes has now been cloned and sequenced, providing the foundation to understand natures' means to biosynthesize such complex, exotic molecules. Presented here is a review of the current progress in delineating the biosynthesis of the enediynes with an emphasis on the model enediyne, C-1027.
Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui
2015-01-01
Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410
Dewhurst, Henry M.; Choudhury, Shilpa; Torres, Matthew P.
2015-01-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)—a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits—conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit–N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. PMID:26070665
Comparative genomic analysis of Acinetobacter strains isolated from murine colonic crypts.
Saffarian, Azadeh; Touchon, Marie; Mulet, Céline; Tournebize, Régis; Passet, Virginie; Brisse, Sylvain; Rocha, Eduardo P C; Sansonetti, Philippe J; Pédron, Thierry
2017-07-11
A restricted set of aerobic bacteria dominated by the Acinetobacter genus was identified in murine intestinal colonic crypts. The vicinity of such bacteria with intestinal stem cells could indicate that they protect the crypt against cytotoxic and genotoxic signals. Genome analyses of these bacteria were performed to better appreciate their biodegradative capacities. Two taxonomically different clusters of Acinetobacter were isolated from murine proximal colonic crypts, one was identified as A. modestus and the other as A. radioresistens. Their identification was performed through biochemical parameters and housekeeping gene sequencing. After selection of one strain of each cluster (A. modestus CM11G and A. radioresistens CM38.2), comparative genomic analysis was performed on whole-genome sequencing data. The antibiotic resistance pattern of these two strains is different, in line with the many genes involved in resistance to heavy metals identified in both genomes. Moreover whereas the operon benABCDE involved in benzoate metabolism is encoded by the two genomes, the operon antABC encoding the anthranilate dioxygenase, and the phenol hydroxylase gene cluster are absent in the A. modestus genomic sequence, indicating that the two strains have different capacities to metabolize xenobiotics. A common feature of the two strains is the presence of a type IV pili system, and the presence of genes encoding proteins pertaining to secretion systems such as Type I and Type II secretion systems. Our comparative genomic analysis revealed that different Acinetobacter isolated from the same biological niche, even if they share a large majority of genes, possess unique features that could play a specific role in the protection of the intestinal crypt.
Tenacibaculum litoreum sp. nov., isolated from tidal flat sediment.
Choi, Dong Han; Kim, Yoon-Gon; Hwang, Chung Yeon; Yi, Hana; Chun, Jongsik; Cho, Byung Cheol
2006-03-01
A rod-shaped bacterium, designated CL-TF13T, was isolated from a tidal flat in Ganghwa, Korea. Analysis of the 16S rRNA gene sequence revealed an affiliation with the genus Tenacibaculum. The sequence similarities between CL-TF13T and type strains of members of the genus Tenacibaculum were from 94.2 to 97.4%. Cells were motile by means of gliding. Strain CL-TF13T grew on solid medium as pale-yellow colonies with an irregular spreading edge. The strain was able to grow in NaCl at a range of 3-5%. They grew within a temperature range of 5-40 degrees C and at pH range of 6-10. The major fatty acids were summed feature 3 (C(16:1)omega7c and/or iso-C(15:0) 2-OH, 19.6%), iso-C(15:0) (18.8%) and iso-C(17:0) 3-OH (13.6%). Fatty acids such as C(18:3)omega6c (6,9,12) (1.5%) and summed feature 4 (iso I- and/or anteiso B-C(17:1), 1.3%) were uniquely found in minor quantities in CL-TF13T among Tenacibaculum species. The DNA G + C content was 30 mol%. According to physiological data, fatty-acid composition and 16S rRNA gene sequence, CL-TF13T could be assigned to the genus Tenacibaculum but distinguished from the recognized species of the genus. Therefore, strain CL-TF13T (= KCCM 42115T = JCM 13039T) represents a novel species, for which the name Tenacibaculum litoreum sp. nov. is proposed.
High-density, microsphere-based fiber optic DNA microarrays.
Epstein, Jason R; Leung, Amy P K; Lee, Kyong Hoon; Walt, David R
2003-05-01
A high-density fiber optic DNA microarray has been developed consisting of oligonucleotide-functionalized, 3.1-microm-diameter microspheres randomly distributed on the etched face of an imaging fiber bundle. The fiber bundles are comprised of 6000-50000 fused optical fibers and each fiber terminates with an etched well. The microwell array is capable of housing complementary-sized microspheres, each containing thousands of copies of a unique oligonucleotide probe sequence. The array fabrication process results in random microsphere placement. Determining the position of microspheres in the random array requires an optical encoding scheme. This array platform provides many advantages over other array formats. The microsphere-stock suspension concentration added to the etched fiber can be controlled to provide inherent sensor redundancy. Examining identical microspheres has a beneficial effect on the signal-to-noise ratio. As other sequences of interest are discovered, new microsphere sensing elements can be added to existing microsphere pools and new arrays can be fabricated incorporating the new sequences without altering the existing detection capabilities. These microarrays contain the smallest feature sizes (3 microm) of any DNA array, allowing interrogation of extremely small sample volumes. Reducing the feature size results in higher local target molecule concentrations, creating rapid and highly sensitive assays. The microsphere array platform is also flexible in its applications; research has included DNA-protein interaction profiles, microbial strain differentiation, and non-labeled target interrogation with molecular beacons. Fiber optic microsphere-based DNA microarrays have a simple fabrication protocol enabling their expansion into other applications, such as single cell-based assays.
Our evolving understanding of aeolian bedforms, based on observation of dunes on different worlds
NASA Astrophysics Data System (ADS)
Diniega, Serina; Kreslavsky, Mikhail; Radebaugh, Jani; Silvestro, Simone; Telfer, Matt; Tirsch, Daniela
2017-06-01
Dunes, dune fields, and ripples are unique and useful records of the interaction between wind and granular materials - finding such features on a planetary surface immediately suggests certain information about climate and surface conditions (at least during the dunes' formation and evolution). Additionally, studies of dune characteristics under non-Earth conditions allow for ;tests; of aeolian process models based primarily on observations of terrestrial features and dynamics, and refinement of the models to include consideration of a wider range of environmental and planetary conditions. To-date, the planetary aeolian community has found and studied dune fields on Mars, Venus, and the Saturnian moon Titan. Additionally, we have observed candidate ;aeolian bedforms; on Comet 67P/Churyumov-Gerasimenko, the Jovian moon Io, and - most recently - Pluto. In this paper, we hypothesize that the progression of investigations of aeolian bedforms and processes on a particular planetary body follows a consistent sequence - primarily set by the acquisition of data of particular types and resolutions, and by the maturation of knowledge about that planetary body. We define that sequence of generated knowledge and new questions (within seven investigation phases) and discuss examples from all of the studied bodies. The aim of such a sequence is to better define our past and current state of understanding about the aeolian bedforms of a particular body, to highlight the related assumptions that require re-analysis with data acquired during later investigations, and to use lessons learned from planetary and terrestrial aeolian studies to predict what types of investigations could be most fruitful in the future.
Lier, Clément; Baticle, Elodie; Horvath, Philippe; Haguenoer, Eve; Valentin, Anne-Sophie; Glaser, Philippe; Mereghetti, Laurent; Lanotte, Philippe
2015-01-01
CRISPR-Cas systems (clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins) are found in 90% of archaea and about 40% of bacteria. In this original system, CRISPR arrays comprise short, almost unique sequences called spacers that are interspersed with conserved palindromic repeats. These systems play a role in adaptive immunity and participate to fight non-self DNA such as integrative and conjugative elements, plasmids, and phages. In Streptococcus agalactiae, a bacterium implicated in colonization and infections in humans since the 1960s, two CRISPR-Cas systems have been described. A type II-A system, characterized by proteins Cas9, Cas1, Cas2, and Csn2, is ubiquitous, and a type I–C system, with the Cas8c signature protein, is present in about 20% of the isolates. Unlike type I–C, which appears to be non-functional, type II-A appears fully functional. Here we studied type II-A CRISPR-cas loci from 126 human isolates of S. agalactiae belonging to different clonal complexes that represent the diversity of the species and that have been implicated in colonization or infection. The CRISPR-cas locus was analyzed both at spacer and repeat levels. Major distinctive features were identified according to the phylogenetic lineages previously defined by multilocus sequence typing, especially for the sequence type (ST) 17, which is considered hypervirulent. Among other idiosyncrasies, ST-17 shows a significantly lower number of spacers in comparison with other lineages. This characteristic could reflect the peculiar virulence or colonization specificities of this lineage. PMID:26124774
NASA Technical Reports Server (NTRS)
Serebreny, S. M.; Evans, W. E.; Wiegman, E. J.
1974-01-01
The usefulness of dynamic display techniques in exploiting the repetitive nature of ERTS imagery was investigated. A specially designed Electronic Satellite Image Analysis Console (ESIAC) was developed and employed to process data for seven ERTS principal investigators studying dynamic hydrological conditions for diverse applications. These applications include measurement of snowfield extent and sediment plumes from estuary discharge, Playa Lake inventory, and monitoring of phreatophyte and other vegetation changes. The ESIAC provides facilities for storing registered image sequences in a magnetic video disc memory for subsequent recall, enhancement, and animated display in monochrome or color. The most unique feature of the system is the capability to time lapse the imagery and analytic displays of the imagery. Data products included quantitative measurements of distances and areas, binary thematic maps based on monospectral or multispectral decisions, radiance profiles, and movie loops. Applications of animation for uses other than creating time-lapse sequences are identified. Input to the ESIAC can be either digital or via photographic transparencies.
An unusual osteomyelitis caused by Moraxella osloensis: A case report.
Alkhatib, Nidal J; Younis, Manaf H; Alobaidi, Ahmad S; Shaath, Nebal M
2017-01-01
Moraxella osloensis is a gram-negative coccobacillus, that is saprophytic on skin and mucosa, and rarely causing human infections. Reported cases of human infections usually occur in immunocompromised patients. We report the second case of M. osloensis-caused-osteomyelitis in literature, occurring in a young healthy man. The organism was identified by sequencing analysis of the 16S ribosomal RNA gene. Our patient was treated successfully with surgical debridement and intravenous third-generation cephalosporins. M. osloensis has been rarely reported to cause local or invasive infections. Our case report is the second case in literature and it is different from the previously reported case in that our patient has no chronic medical problems, no history of trauma, with unique presentation and features on the MRI and intraoperative finding. Proper diagnosis is essential for appropriate treatment of osteomyelitis. RNA gene sequence analysis is the primary method of M. osloensis diagnosis. M. osloensis is usually susceptible to simple antibiotics. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.
Anand, Prachi; Grigoryan, Alexandre; Bhuiyan, Mohammed H; Ueberheide, Beatrix; Russell, Victoria; Quinoñez, Jose; Moy, Patrick; Chait, Brian T; Poget, Sébastien F; Holford, Mandë
2014-01-01
Disulfide-rich peptide toxins found in the secretions of venomous organisms such as snakes, spiders, scorpions, leeches, and marine snails are highly efficient and effective tools for novel therapeutic drug development. Venom peptide toxins have been used extensively to characterize ion channels in the nervous system and platelet aggregation in haemostatic systems. A significant hurdle in characterizing disulfide-rich peptide toxins from venomous animals is obtaining significant quantities needed for sequence and structural analyses. Presented here is a strategy for the structural characterization of venom peptide toxins from sample limited (4 ng) specimens via direct mass spectrometry sequencing, chemical synthesis and NMR structure elucidation. Using this integrated approach, venom peptide Tv1 from Terebra variegata was discovered. Tv1 displays a unique fold not witnessed in prior snail neuropeptides. The novel structural features found for Tv1 suggest that the terebrid pool of peptide toxins may target different neuronal agents with varying specificities compared to previously characterized snail neuropeptides.
From genes to genomes: a new paradigm for studying fungal pathogenesis in Magnaporthe oryzae.
Xu, Jin-Rong; Zhao, Xinhua; Dean, Ralph A
2007-01-01
Magnaporthe oryzae is the most destructive fungal pathogen of rice worldwide and because of its amenability to classical and molecular genetic manipulation, availability of a genome sequence, and other resources it has emerged as a leading model system to study host-pathogen interactions. This chapter reviews recent progress toward elucidation of the molecular basis of infection-related morphogenesis, host penetration, invasive growth, and host-pathogen interactions. Related information on genome analysis and genomic studies of plant infection processes is summarized under specific topics where appropriate. Particular emphasis is placed on the role of MAP kinase and cAMP signal transduction pathways and unique features in the genome such as repetitive sequences and expanded gene families. Emerging developments in functional genome analysis through large-scale insertional mutagenesis and gene expression profiling are detailed. The chapter concludes with new prospects in the area of systems biology, such as protein expression profiling, and highlighting remaining crucial information needed to fully appreciate host-pathogen interactions.
Context influences on TALE–DNA binding revealed by quantitative profiling
Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.
2015-01-01
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805
Context influences on TALE-DNA binding revealed by quantitative profiling.
Rogers, Julia M; Barrera, Luis A; Reyon, Deepak; Sander, Jeffry D; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L
2015-06-11
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE-DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000-20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE-DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.
Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons
Krishnaswami, Suguna Rani; Grindberg, Rashel V; Novotny, Mark; Venepally, Pratap; Lacar, Benjamin; Bhutani, Kunal; Linker, Sara B; Pham, Son; Erwin, Jennifer A; Miller, Jeremy A; Hodge, Rebecca; McCarthy, James K; Kelder, Martin; McCorrison, Jamison; Aevermann, Brian D; Fuertes, Francisco Diez; Scheuermann, Richard H; Lee, Jun; Lein, Ed S; Schork, Nicholas; McConnell, Michael J; Gage, Fred H; Lasken, Roger S
2016-01-01
A protocol is described for sequencing the transcriptome of a cell nucleus. Nuclei are isolated from specimens and sorted by FACS, cDNA libraries are constructed and RNA-seq is performed, followed by data analysis. Some steps follow published methods (Smart-seq2 for cDNA synthesis and Nextera XT barcoded library preparation) and are not described in detail here. Previous single-cell approaches for RNA-seq from tissues include cell dissociation using protease treatment at 30 °C, which is known to alter the transcriptome. We isolate nuclei at 4 °C from tissue homogenates, which cause minimal damage. Nuclear transcriptomes can be obtained from postmortem human brain tissue stored at −80 °C, making brain archives accessible for RNA-seq from individual neurons. The method also allows investigation of biological features unique to nuclei, such as enrichment of certain transcripts and precursors of some noncoding RNAs. By following this procedure, it takes about 4 d to construct cDNA libraries that are ready for sequencing. PMID:26890679
Genome Structure of the Legume, Lotus japonicus
Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi
2008-01-01
The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435
Freyhult, Eva; Moulton, Vincent; Ardell, David H.
2006-01-01
Sequence logos are stacked bar graphs that generalize the notion of consensus sequence. They employ entropy statistics very effectively to display variation in a structural alignment of sequences of a common function, while emphasizing its over-represented features. Yet sequence logos cannot display features that distinguish functional subclasses within a structurally related superfamily nor do they display under-represented features. We introduce two extensions to address these needs: function logos and inverse logos. Function logos display subfunctions that are over-represented among sequences carrying a specific feature. Inverse logos generalize both sequence logos and function logos by displaying under-represented, rather than over-represented, features or functions in structural alignments. To make inverse logos, a compositional inverse is applied to the feature or function frequency distributions before logo construction, where a compositional inverse is a mathematical transform that makes common features or functions rare and vice versa. We applied these methods to a database of structurally aligned bacterial tDNAs to create highly condensed, birds-eye views of potentially all so-called identity determinants and antideterminants that confer specific amino acid charging or initiator function on tRNAs in bacteria. We recovered both known and a few potentially novel identity elements. Function logos and inverse logos are useful tools for exploratory bioinformatic analysis of structure–function relationships in sequence families and superfamilies. PMID:16473848
Swallow Event Sequencing: Comparing Healthy Older and Younger Adults.
Herzberg, Erica G; Lazarus, Cathy L; Steele, Catriona M; Molfenter, Sonja M
2018-04-23
Previous research has established that a great deal of variation exists in the temporal sequence of swallowing events for healthy adults. Yet, the impact of aging on swallow event sequence is not well understood. Kendall et al. (Dysphagia 18(2):85-91, 2003) suggested there are 4 obligatory paired-event sequences in swallowing. We directly compared adherence to these sequences, as well as event latencies, and quantified the percentage of unique sequences in two samples of healthy adults: young (< 45) and old (> 65). The 8 swallowing events that contribute to the sequences were reliably identified from videofluoroscopy in a sample of 23 healthy seniors (10 male, mean age 74.7) and 20 healthy young adults (10 male, mean age 31.5) with no evidence of penetration-aspiration or post-swallow residue. Chi-square analyses compared the proportions of obligatory pairs and unique sequences by age group. Compared to the older subjects, younger subjects had significantly lower adherence to two obligatory sequences: Upper Esophageal Sphincter (UES) opening occurs before (or simultaneous with) the bolus arriving at the UES and UES maximum distention occurs before maximum pharyngeal constriction. The associated latencies were significantly different between age groups as well. Further, significantly fewer unique swallow sequences were observed in the older group (61%) compared with the young (82%) (χ 2 = 31.8; p < 0.001). Our findings suggest that paired swallow event sequences may not be robust across the age continuum and that variation in swallow sequences appears to decrease with aging. These findings provide normative references for comparisons to older individuals with dysphagia.
Novel numerical and graphical representation of DNA sequences and proteins.
Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D
2006-12-01
We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
Mahieu, Nathaniel G; Patti, Gary J
2017-10-03
When using liquid chromatography/mass spectrometry (LC/MS) to perform untargeted metabolomics, it is now routine to detect tens of thousands of features from biological samples. Poor understanding of the data, however, has complicated interpretation and masked the number of unique metabolites actually being measured in an experiment. Here we place an upper bound on the number of unique metabolites detected in Escherichia coli samples analyzed with one untargeted metabolomics method. We first group multiple features arising from the same analyte, which we call "degenerate features", using a context-driven annotation approach. Surprisingly, this analysis revealed thousands of previously unreported degeneracies that reduced the number of unique analytes to ∼2961. We then applied an orthogonal approach to remove nonbiological features from the data using the 13 C-based credentialing technology. This further reduced the number of unique analytes to less than 1000. Our 90% reduction in data is 5-fold greater than previously published studies. On the basis of the results, we propose an alternative approach to untargeted metabolomics that relies on thoroughly annotated reference data sets. To this end, we introduce the creDBle database ( http://creDBle.wustl.edu ), which contains accurate mass, retention time, and MS/MS fragmentation data as well as annotations of all credentialed features.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Y; Wang, C; Horton, J
Purpose: To investigate the feasibility of using classic textural feature extraction in radiotherapy response assessment, we studied a unique cohort of early stage breast cancer patients with paired pre - and post-radiation Diffusion Weighted MRI (DWI-MRI) and Dynamic Contrast Enhanced MRI (DCE-MRI). Methods: 15 female patients from our prospective phase I trial evaluating preoperative radiotherapy were included in this retrospective study. Each patient received a single-fraction radiation treatment, and DWI and DCE scans were conducted before and after the radiotherapy. DWI scans were acquired using a spin-echo EPI sequence with diffusion weighting factors of b = 0 and b =more » 500 mm{sup 2} /s, and the apparent diffusion coefficient (ADC) maps were calculated. DCE-MRI scans were acquired using a T{sub 1}-weighted 3D SPGR sequence with a temporal resolution of about 1 minute. The contrast agent (CA) was intravenously injected with a 0.1 mmol/kg bodyweight dose at 2 ml/s. Two parameters, volume transfer constant (K{sup trans} ) and k{sub ep} were analyzed using the two-compartment Tofts kinetic model. For DCE parametric maps and ADC maps, 33 textural features were generated from the clinical target volume (CTV) in a 3D fashion using the classic gray level co-occurrence matrix (GLCOM) and gray level run length matrix (GLRLM). Wilcoxon signed-rank test was used to determine the significance of each texture feature’s change after the radiotherapy. The significance was set to 0.05 with Bonferroni correction. Results: For ADC maps calculated from DWI-MRI, 24 out of 33 CTV features changed significantly after the radiotherapy. For DCE-MRI pharmacokinetic parameters, all 33 CTV features of K{sup trans} and 33 features of k{sub ep} changed significantly. Conclusion: Initial results indicate that those significantly changed classic texture features are sensitive to radiation-induced changes and can be used for assessment of radiotherapy response in breast cancer.« less
Real-Time PCR Assay for a Unique Chromosomal Sequence of Bacillus anthracis
2004-12-01
13061 Neisseria lactamica .............................................................. 23970 Bacillus coagulans ...NEG Bacillus coagulane 7050 NEG NEG Bacillus cereus 13472 NEG NEG Bacillus licheniforms 12759 NEG NEG Bacillus cereus 13824 NEG NEG Bacillus ...Assay for a Unique Chromosomal Sequence of Bacillus anthracis Elizabeth Bode,1 William Hurtle,2† and David Norwood1* United States Army Medical
Universal Entropy of Word Ordering Across Linguistic Families
Montemurro, Marcelo A.; Zanette, Damián H.
2011-01-01
Background The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. As is also the case in other natural information carriers, the resulting symbolic sequences show a delicate balance between order and disorder. That balance is determined by the interplay between the diversity of symbols and by their specific ordering in the sequences. Here we used entropy to quantify the contribution of different organizational levels to the overall statistical structure of language. Methodology/Principal Findings We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied for the different families considered, the relative entropy quantifying word ordering presented an almost constant value for all those families. Conclusions/Significance Our results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering in the structure of language is a statistical linguistic universal. PMID:21603637
Transterm: a database to aid the analysis of regulatory sequences in mRNAs
Jacobs, Grant H.; Chen, Augustine; Stevens, Stewart G.; Stockwell, Peter A.; Black, Michael A.; Tate, Warren P.; Brown, Chris M.
2009-01-01
Messenger RNAs, in addition to coding for proteins, may contain regulatory elements that affect how the protein is translated. These include protein and microRNA-binding sites. Transterm (http://mRNA.otago.ac.nz/Transterm.html) is a database of regions and elements that affect translation with two major unique components. The first is integrated results of analysis of general features that affect translation (initiation, elongation, termination) for species or strains in Genbank, processed through a standard pipeline. The second is curated descriptions of experimentally determined regulatory elements that function as translational control elements in mRNAs. Transterm focuses on protein binding sites, particularly those in 3′-untranslated regions (3′-UTR). For this release the interface has been extensively updated based on user feedback. The data is now accessible by strain rather than species, for example there are 10 Escherichia coli strains (genomes) analysed separately. In addition to providing a repository of data, the database also provides tools for users to query their own mRNA sequences. Users can search sequences for Transterm or user defined regulatory elements, including protein or miRNA targets. Transterm also provides a central core of links to related resources for complementary analyses. PMID:18984623
González-Pedrajo, Bertha; de la Mora, Javier; Ballado, Teresa; Camarena, Laura; Dreyfus, Georges
2002-11-13
In this work, we show evidence regarding the functionality of a large cluster of flagellar genes in Rhodobacter sphaeroides. The genes of this cluster, flgGHIJKL and orf-1, are mainly involved in the formation of the basal body, and flgK and flgL encode the hook-associated proteins HAP1 and HAP3. In general, these genes showed a good similarity as compared with those reported for Salmonella enterica. However, flgJ and flgK showed particular features that make them unique among the flagellar sequences already reported. flgJ is only a third of the size reported for flgJ from Salmonella; whereas flgK is about three times larger than any other flgK sequence previously known. Our results indicate that both genes are functional, and their products are essential for flagellar assembly. In contrast, the interruption of orf-1, did not affect motility suggesting that this sequence, if functional, is not indispensable for flagellar assembly. Finally, we present genetic evidence suggesting that the flgGHIJKL genes are expressed as a single transcriptional unit depending on the sigma-54 factor.
Plasmin on adherent cells: from microvesiculation to apoptosis
Doeuvre, Loïc; Plawinski, Laurent; Goux, Didier; Vivien, Denis; Anglés-Cano, Eduardo
2010-01-01
SYNOPSIS Cell activation by stressors is characterised by a sequence of detectable phenotypic cell changes. The strength of a given stimulus induces modifications in the activity of membrane phospholipids transporters and calpains, which leads to phosphatidylserine exposure, membrane blebbing and the release of microparticles (nanoscale membrane vesicles). This vesiculation could be considered as a warning signal that may be followed, if the stimulus is maintained, by cell detachment-induced apoptosis. In this study, plasminogen incubated onto adherent cells is activated into plasmin by constitutively expressed tPA or uPA. Plasmin formed on the cellular membrane then induces an unique response characterized by membrane blebbing and vesiculation. Hitherto unknown for plasmin, these membrane changes are similar to those induced by thrombin on platelets. If plasmin formation evolves, matrix proteins are then degraded, cells lose attachment and enter the apoptotic process, characterized by DNA fragmentation and electron microscopy features. This sequence of events was experimentally documented at all these stages. Since other proteolytic or inflammatory stimuli may evoke similar responses by distinct adherent cells, this sequence can be applied to distinguish activated adherent cells from cells entering the apoptotic process. This is a major definition crucial to the identification of mediators, inhibitors and potential therapeutic agents. PMID:20846121
Hao, Pei; Zheng, Huajun; Yu, Yao; Ding, Guohui; Gu, Wenyi; Chen, Shuting; Yu, Zhonghao; Ren, Shuangxi; Oda, Munehiro; Konno, Tomonobu; Wang, Shengyue; Li, Xuan; Ji, Zai-Si; Zhao, Guoping
2011-01-17
Lactobacillus delbrueckii subsp. bulgaricus (Lb. bulgaricus) is an important species of Lactic Acid Bacteria (LAB) used for cheese and yogurt fermentation. The genome of Lb. bulgaricus 2038, an industrial strain mainly used for yogurt production, was completely sequenced and compared against the other two ATCC collection strains of the same subspecies. Specific physiological properties of strain 2038, such as lysine biosynthesis, formate production, aspartate-related carbon-skeleton intermediate metabolism, unique EPS synthesis and efficient DNA restriction/modification systems, are all different from those of the collection strains that might benefit the industrial production of yogurt. Other common features shared by Lb. bulgaricus strains, such as efficient protocooperation with Streptococcus thermophilus and lactate production as well as well-equipped stress tolerance mechanisms may account for it being selected originally for yogurt fermentation industry. Multiple lines of evidence suggested that Lb. bulgaricus 2038 was genetically closer to the common ancestor of the subspecies than the other two sequenced collection strains, probably due to a strict industrial maintenance process for strain 2038 that might have halted its genome decay and sustained a gene network suitable for large scale yogurt production.
Ding, Guohui; Gu, Wenyi; Chen, Shuting; Yu, Zhonghao; Ren, Shuangxi; Oda, Munehiro; Konno, Tomonobu; Wang, Shengyue; Li, Xuan; Ji, Zai-Si; Zhao, Guoping
2011-01-01
Lactobacillus delbrueckii subsp. bulgaricus (Lb. bulgaricus) is an important species of Lactic Acid Bacteria (LAB) used for cheese and yogurt fermentation. The genome of Lb. bulgaricus 2038, an industrial strain mainly used for yogurt production, was completely sequenced and compared against the other two ATCC collection strains of the same subspecies. Specific physiological properties of strain 2038, such as lysine biosynthesis, formate production, aspartate-related carbon-skeleton intermediate metabolism, unique EPS synthesis and efficient DNA restriction/modification systems, are all different from those of the collection strains that might benefit the industrial production of yogurt. Other common features shared by Lb. bulgaricus strains, such as efficient protocooperation with Streptococcus thermophilus and lactate production as well as well-equipped stress tolerance mechanisms may account for it being selected originally for yogurt fermentation industry. Multiple lines of evidence suggested that Lb. bulgaricus 2038 was genetically closer to the common ancestor of the subspecies than the other two sequenced collection strains, probably due to a strict industrial maintenance process for strain 2038 that might have halted its genome decay and sustained a gene network suitable for large scale yogurt production. PMID:21264216
Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping
2017-12-12
In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature, called RS (RNA-seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.
Pechar, R; Killer, J; Švejstil, R; Salmonová, H; Geigerová, M; Bunešová, V; Rada, V; Benada, O
2017-07-01
Bacteria with potential probiotic applications are not yet sufficiently explored, even for animals with economic importance. Therefore, we decided to isolate and identify representatives of the family Bifidobacteriaceae, which inhabit the crop of laying hens. During the study, a fructose-6-phosphate phosphoketolase-positive strain, RP51T, with a regular/slightly irregular and sometimes an S-shaped slightly curved rod-like shape, was isolated from the crop of a 13 -month-old Hisex Brown hybrid laying hen. The best growth of the Gram-stain-positive bacterium, which was isolated using Bifidobacterium-selective mTPY agar, was found out to be under strictly anaerobic conditions, however an ability to grow under microaerophilic and aerobic conditions was also observed. Sequencing of the almost complete 16S rRNA gene (1444 bp) showed Alloscardovia omnicolens CCUG 31649T and Bombiscardovia coagulans BLAPIII/AGVT to be the most closely related species with similarities of 93.4 and 93.1 %, respectively. Lower sequence similarities were determined with other scardovial genera and other representatives of the genus Bifidobacterium. Taxonomic relationships with A. omnicolens and other members of the family Bifidobacteriaceaewere also demonstrated, based on the sequences of dnaK, fusA, hsp60 and rplB gene fragments. Low sequence similarities of phylogenetic markers to related scardovial genera and bifidobacteria along with unique features of the bacterial strain investigated within the family Bifidobacteriaceae(including the lowest DNA G+C value (44.3 mol%), a unique spectrum of cellular fatty acids and polar lipids, cellular morphology, the wide temperature range for growth (15-49 °C) and habitat) clearly indicate that strain RP51T is a representative of a novel genus within the family Bifidobacteriaceae for which the name Galliscardovia ingluviei gen. nov., sp. nov. (RP51T=DSM 100235T=LMG 28778T=CCM 8606T) is proposed.
Qin, Jiang-Bo; Liu, Zhenyu; Zhang, Hui; Shen, Chen; Wang, Xiao-Chun; Tan, Yan; Wang, Shuo; Wu, Xiao-Feng; Tian, Jie
2017-05-07
BACKGROUND Gliomas are the most common primary brain neoplasms. Misdiagnosis occurs in glioma grading due to an overlap in conventional MRI manifestations. The aim of the present study was to evaluate the power of radiomic features based on multiple MRI sequences - T2-Weighted-Imaging-FLAIR (FLAIR), T1-Weighted-Imaging-Contrast-Enhanced (T1-CE), and Apparent Diffusion Coefficient (ADC) map - in glioma grading, and to improve the power of glioma grading by combining features. MATERIAL AND METHODS Sixty-six patients with histopathologically proven gliomas underwent T2-FLAIR and T1WI-CE sequence scanning with some patients (n=63) also undergoing DWI scanning. A total of 114 radiomic features were derived with radiomic methods by using in-house software. All radiomic features were compared between high-grade gliomas (HGGs) and low-grade gliomas (LGGs). Features with significant statistical differences were selected for receiver operating characteristic (ROC) curve analysis. The relationships between significantly different radiomic features and glial fibrillary acidic protein (GFAP) expression were evaluated. RESULTS A total of 8 radiomic features from 3 MRI sequences displayed significant differences between LGGs and HGGs. FLAIR GLCM Cluster Shade, T1-CE GLCM Entropy, and ADC GLCM Homogeneity were the best features to use in differentiating LGGs and HGGs in each MRI sequence. The combined feature was best able to differentiate LGGs and HGGs, which improved the accuracy of glioma grading compared to the above features in each MRI sequence. A significant correlation was found between GFAP and T1-CE GLCM Entropy, as well as between GFAP and ADC GLCM Homogeneity. CONCLUSIONS The combined radiomic feature had the highest efficacy in distinguishing LGGs from HGGs.
A Bayesian nonparametric method for prediction in EST analysis
Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor
2007-01-01
Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445
Rhizobium acidisoli sp. nov., isolated from root nodules of Phaseolus vulgaris in acid soils.
Román-Ponce, Brenda; Jing Zhang, Yu; Soledad Vásquez-Murrieta, María; Hua Sui, Xin; Feng Chen, Wen; Carlos Alberto Padilla, Juan; Wu Guo, Xian; Lian Gao, Jun; Yan, Jun; Hong Wei, Ge; Tao Wang, En
2016-01-01
Two Gram-negative, aerobic, non-motile, rod-shaped bacterial strains, FH13T and FH23, representing a novel group of Rhizobium isolated from root nodules of Phaseolus vulgaris in Mexico, were studied by a polyphasic analysis. Phylogeny of 16S rRNA gene sequences revealed them to be members of the genus Rhizobium related most closely to 'Rhizobium anhuiense' CCBAU 23252 (99.7 % similarity), Rhizobium leguminosarum USDA 2370T (98.6 %), and Rhizobium sophorae CCBAU 03386T and others ( ≤ 98.3 %). In sequence analyses of the housekeeping genes recA, glnII and atpD, both strains formed a subclade distinct from all defined species of the genus Rhizobium at sequence similarities of 82.3-94.0 %, demonstrating that they represented a novel genomic species in the genus Rhizobium. Mean levels of DNA-DNA relatedness between the reference strain FH13T and the type strains of related species varied between 13.0 ± 2.0 and 52.1 ± 1.2 %. The DNA G+C content of strain FH13T was 63.5 mol% (Tm). The major cellular fatty acids were 16 : 0, 17 : 0 anteiso, 18 : 0, summed feature 2 (12 : 0 aldehyde/unknown 10.928) and summed feature 8 (18 : 1ω7c). The fatty acid 17 : 1ω5c was unique for this strain. Some phenotypic features, such as failure to utilize adonitol, l-arabinose, d-fructose and d-fucose, and ability to utilize d-galacturonic acid and itaconic acid as carbon source, could also be used to distinguish strain FH13T from the type strains of related species. Based upon these results, a novel species, Rhizobium acidisoli sp. nov., is proposed, with FH13T ( = CCBAU 101094T = HAMBI 3626T = LMG 28672T) as the type strain.
Knudsen, Erik S; Balaji, Uthra; Mannakee, Brian; Vail, Paris; Eslinger, Cody; Moxom, Christopher; Mansour, John; Witkiewicz, Agnieszka K
2018-03-01
Pancreatic ductal adenocarcinoma (PDAC) is a therapy recalcitrant disease with the worst survival rate of common solid tumours. Preclinical models that accurately reflect the genetic and biological diversity of PDAC will be important for delineating features of tumour biology and therapeutic vulnerabilities. 27 primary PDAC tumours were employed for genetic analysis and development of tumour models. Tumour tissue was used for derivation of xenografts and cell lines. Exome sequencing was performed on the originating tumour and developed models. RNA sequencing, histological and functional analyses were employed to determine the relationship of the patient-derived models to clinical presentation of PDAC. The cohort employed captured the genetic diversity of PDAC. From most cases, both cell lines and xenograft models were developed. Exome sequencing confirmed preservation of the primary tumour mutations in developed cell lines, which remained stable with extended passaging. The level of genetic conservation in the cell lines was comparable to that observed with patient-derived xenograft (PDX) models. Unlike historically established PDAC cancer cell lines, patient-derived models recapitulated the histological architecture of the primary tumour and exhibited metastatic spread similar to that observed clinically. Detailed genetic analyses of tumours and derived models revealed features of ex vivo evolution and the clonal architecture of PDAC. Functional analysis was used to elucidate therapeutic vulnerabilities of relevance to treatment of PDAC. These data illustrate that with the appropriate methods it is possible to develop cell lines that maintain genetic features of PDAC. Such models serve as important substrates for analysing the significance of genetic variants and create a unique biorepository of annotated cell lines and xenografts that were established simultaneously from same primary tumour. These models can be used to infer genetic and empirically determined therapeutic sensitivities that would be germane to the patient. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney
2012-01-01
RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amendable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
NASA Astrophysics Data System (ADS)
Andreasen, Daniel; Edmund, Jens M.; Zografos, Vasileios; Menze, Bjoern H.; Van Leemput, Koen
2016-03-01
In radiotherapy treatment planning that is only based on magnetic resonance imaging (MRI), the electron density information usually obtained from computed tomography (CT) must be derived from the MRI by synthesizing a so-called pseudo CT (pCT). This is a non-trivial task since MRI intensities are neither uniquely nor quantitatively related to electron density. Typical approaches involve either a classification or regression model requiring specialized MRI sequences to solve intensity ambiguities, or an atlas-based model necessitating multiple registrations between atlases and subject scans. In this work, we explore a machine learning approach for creating a pCT of the pelvic region from conventional MRI sequences without using atlases. We use a random forest provided with information about local texture, edges and spatial features derived from the MRI. This helps to solve intensity ambiguities. Furthermore, we use the concept of auto-context by sequentially training a number of classification forests to create and improve context features, which are finally used to train a regression forest for pCT prediction. We evaluate the pCT quality in terms of the voxel-wise error and the radiologic accuracy as measured by water-equivalent path lengths. We compare the performance of our method against two baseline pCT strategies, which either set all MRI voxels in the subject equal to the CT value of water, or in addition transfer the bone volume from the real CT. We show an improved performance compared to both baseline pCTs suggesting that our method may be useful for MRI-only radiotherapy.
Genome-wide phylogenetic analysis of the pathogenic potential of Vibrio furnissii
Lux, Thomas M.; Lee, Rob; Love, John
2014-01-01
We recently reported the genome sequence of a free-living strain of Vibrio furnissii (NCTC 11218) harvested from an estuarine environment. V. furnissii is a widespread, free-living proteobacterium and emerging pathogen that can cause acute gastroenteritis in humans and lethal zoonoses in aquatic invertebrates, including farmed crustaceans and molluscs. Here we present the analyses to assess the potential pathogenic impact of V. furnissii. We compared the complete genome of V. furnissii with 8 other emerging and pathogenic Vibrio species. We selected and analyzed more deeply 10 genomic regions based upon unique or common features, and used 3 of these regions to construct a phylogenetic tree. Thus, we positioned V. furnissii more accurately than before and revealed a closer relationship between V. furnissii and V. cholerae than previously thought. However, V. furnissii lacks several important features normally associated with virulence in the human pathogens V. cholera and V. vulnificus. A striking feature of the V. furnissii genome is the hugely increased Super Integron, compared to the other Vibrio. Analyses of predicted genomic islands resulted in the discovery of a protein sequence that is present only in Vibrio associated with diseases in aquatic animals. We also discovered evidence of high levels horizontal gene transfer in V. furnissii. V. furnissii seems therefore to have a dynamic and fluid genome that could quickly adapt to environmental perturbation or increase its pathogenicity. Taken together, these analyses confirm the potential of V. furnissii as an emerging marine and possible human pathogen, especially in the developing, tropical, coastal regions that are most at risk from climate change. PMID:25191313
Genome-wide phylogenetic analysis of the pathogenic potential of Vibrio furnissii.
Lux, Thomas M; Lee, Rob; Love, John
2014-01-01
We recently reported the genome sequence of a free-living strain of Vibrio furnissii (NCTC 11218) harvested from an estuarine environment. V. furnissii is a widespread, free-living proteobacterium and emerging pathogen that can cause acute gastroenteritis in humans and lethal zoonoses in aquatic invertebrates, including farmed crustaceans and molluscs. Here we present the analyses to assess the potential pathogenic impact of V. furnissii. We compared the complete genome of V. furnissii with 8 other emerging and pathogenic Vibrio species. We selected and analyzed more deeply 10 genomic regions based upon unique or common features, and used 3 of these regions to construct a phylogenetic tree. Thus, we positioned V. furnissii more accurately than before and revealed a closer relationship between V. furnissii and V. cholerae than previously thought. However, V. furnissii lacks several important features normally associated with virulence in the human pathogens V. cholera and V. vulnificus. A striking feature of the V. furnissii genome is the hugely increased Super Integron, compared to the other Vibrio. Analyses of predicted genomic islands resulted in the discovery of a protein sequence that is present only in Vibrio associated with diseases in aquatic animals. We also discovered evidence of high levels horizontal gene transfer in V. furnissii. V. furnissii seems therefore to have a dynamic and fluid genome that could quickly adapt to environmental perturbation or increase its pathogenicity. Taken together, these analyses confirm the potential of V. furnissii as an emerging marine and possible human pathogen, especially in the developing, tropical, coastal regions that are most at risk from climate change.
Sequencing Needs for Viral Diagnostics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, S N; Lam, M; Mulakken, N J
2004-01-26
We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives (''nearmore » neighbors'') that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.« less
Pilgrim, Jack; Ander, Mats; Garros, Claire; Baylis, Matthew; Hurst, Gregory D D; Siozios, Stefanos
2017-10-01
There is increasing interest in the heritable bacteria of invertebrate vectors of disease as they present novel targets for control initiatives. Previous studies on biting midges (Culicoides spp.), known to transmit several RNA viruses of veterinary importance, have revealed infections with the endosymbiotic bacteria, Wolbachia and Cardinium. However, rickettsial symbionts in these vectors are underexplored. Here, we present the genome of a previously uncharacterized Rickettsia endosymbiont from Culicoides newsteadi (RiCNE). This genome presents unique features potentially associated with host invasion and adaptation, including genes for the complete non-oxidative phase of the pentose phosphate pathway, and others predicted to mediate lipopolysaccharides and cell wall modification. Screening of 414 Culicoides individuals from 29 Palearctic or Afrotropical species revealed that Rickettsia represent a widespread but previously overlooked association, reaching high frequencies in midge populations and present in 38% of the species tested. Sequence typing clusters the Rickettsia within the Torix group of the genus, a group known to infect several aquatic and hematophagous taxa. FISH analysis indicated the presence of Rickettsia bacteria in ovary tissue, indicating their maternal inheritance. Given the importance of biting midges as vectors, a key area of future research is to establish the impact of this endosymbiont on vector competence. © 2017 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.
Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang
2018-02-01
Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .
Sequencing Adventure Activities: A New Perspective.
ERIC Educational Resources Information Center
Bisson, Christian
Sequencing in adventure education involves putting activities in an order appropriate to the needs of the group. Contrary to the common assumption that each adventure sequence is unique, a review of literature concerning five sequencing models reveals a certain universality. These models present sequences that move through four phases: group…
Stranieri, Andrew; Abawajy, Jemal; Kelarev, Andrei; Huda, Shamsul; Chowdhury, Morshed; Jelinek, Herbert F
2013-07-01
This article addresses the problem of determining optimal sequences of tests for the clinical assessment of cardiac autonomic neuropathy (CAN). We investigate the accuracy of using only one of the recommended Ewing tests to classify CAN and the additional accuracy obtained by adding the remaining tests of the Ewing battery. This is important as not all five Ewing tests can always be applied in each situation in practice. We used new and unique database of the diabetes screening research initiative project, which is more than ten times larger than the data set used by Ewing in his original investigation of CAN. We utilized decision trees and the optimal decision path finder (ODPF) procedure for identifying optimal sequences of tests. We present experimental results on the accuracy of using each one of the recommended Ewing tests to classify CAN and the additional accuracy that can be achieved by adding the remaining tests of the Ewing battery. We found the best sequences of tests for cost-function equal to the number of tests. The accuracies achieved by the initial segments of the optimal sequences for 2, 3 and 4 categories of CAN are 80.80, 91.33, 93.97 and 94.14, and respectively, 79.86, 89.29, 91.16 and 91.76, and 78.90, 86.21, 88.15 and 88.93. They show significant improvement compared to the sequence considered previously in the literature and the mathematical expectations of the accuracies of a random sequence of tests. The complete outcomes obtained for all subsets of the Ewing features are required for determining optimal sequences of tests for any cost-function with the use of the ODPF procedure. We have also found two most significant additional features that can increase the accuracy when some of the Ewing attributes cannot be obtained. The outcomes obtained can be used to determine the optimal sequences of tests for each individual cost-function by following the ODPF procedure. The results show that the best single Ewing test for diagnosing CAN is the deep breathing heart rate variation test. Optimal sequences found for the cost-function equal to the number of tests guarantee that the best accuracy is achieved after any number of tests and provide an improvement in comparison with the previous ordering of tests or a random sequence. Copyright © 2013 Elsevier B.V. All rights reserved.
Liaskou, Evaggelia; Klemsdal Henriksen, Eva Kristine; Holm, Kristian; Kaveh, Fatemeh; Hamm, David; Fear, Janine; Viken, Marte K; Hov, Johannes Roksund; Melum, Espen; Robins, Harlan; Olweus, Johanna; Karlsen, Tom H; Hirschfield, Gideon M
2016-05-01
Hepatic T-cell infiltrates and a strong genetic human leukocyte antigen association represent characteristic features of various immune-mediated liver diseases. Conceptually the presence of disease-associated antigens is predicted to be reflected in T-cell receptor (TCR) repertoires. Here, we aimed to determine if disease-associated TCRs could be identified in the nonviral chronic liver diseases primary biliary cirrhosis (PBC), primary sclerosing cholangitis (PSC), and alcoholic liver disease (ALD). We performed high-throughput sequencing of the TCRβ chain complementarity-determining region 3 of liver-infiltrating T cells from PSC (n = 20), PBC (n = 10), and ALD (n = 10) patients, alongside genomic human leukocyte antigen typing. The frequency of TCRβ nucleotide sequences was significantly higher in PSC samples (2.53 ± 0.80, mean ± standard error of the mean) compared to PBC samples (1.13 ± 0.17, P < 0.0001) and ALD samples (0.62 ± 0.10, P < 0.0001). An average clonotype overlap of 0.85% was detected among PSC samples, significantly higher compared to the average overlap of 0.77% seen within the PBC (P = 0.024) and ALD groups (0.40%, P < 0.0001). From eight to 42 clonotypes were uniquely detected in each of the three disease groups (≥30% of the respective patient samples). Multiple, unique sequences using different variable family genes encoded the same amino acid clonotypes, providing additional support for antigen-driven selection. In PSC and PBC, disease-associated clonotypes were detected among patients with human leukocyte antigen susceptibility alleles. We demonstrate liver-infiltrating disease-associated clonotypes in all three diseases evaluated, and evidence for antigen-driven clonal expansions. Our findings indicate that differential TCR signatures, as determined by high-throughput sequencing, may represent an imprint of distinctive antigenic repertoires present in the different chronic liver diseases; this thereby opens up the prospect of studying disease-relevant T cells in order to better understand and treat liver disease. © 2015 by the American Association for the Study of Liver Diseases.
Ibrahim, Wisam; Abadeh, Mohammad Saniee
2017-05-21
Protein fold recognition is an important problem in bioinformatics to predict three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition problem is the extraction of efficient features from the amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework PCA-DELM-LDA to extract feature vectors from the amino-acid sequences. Principal Component Analysis PCA has been implemented to reduce the number of extracted features. The extracted feature vectors have been used with original features to improve the performance of the Deep Extreme Learning Machine DELM in the second stage. Four new features have been extracted from the second stage and used in the third stage by Linear Discriminant Analysis LDA to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets in SCOP datasets. The experimental results show that extracted feature vectors in the first stage could improve the performance of DELM in extracting new useful features in second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.
The neXtProt peptide uniqueness checker: a tool for the proteomics community.
Schaeffer, Mathieu; Gateau, Alain; Teixeira, Daniel; Michel, Pierre-André; Zahn-Zabal, Monique; Lane, Lydie
2017-11-01
The neXtProt peptide uniqueness checker allows scientists to define which peptides can be used to validate the existence of human proteins, i.e. map uniquely versus multiply to human protein sequences taking into account isobaric substitutions, alternative splicing and single amino acid variants. The pepx program is available at https://github.com/calipho-sib/pepx and can be launched from the command line or through a cgi web interface. Indexing requires a sequence file in FASTA format. The peptide uniqueness checker tool is freely available on the web at https://www.nextprot.org/tools/peptide-uniqueness-checker and from the neXtProt API at https://api.nextprot.org/. lydie.lane@sib.swiss. © The Author(s) 2017. Published by Oxford University Press.
Hypersonic Vehicle Trajectory Optimization and Control
NASA Technical Reports Server (NTRS)
Balakrishnan, S. N.; Shen, J.; Grohs, J. R.
1997-01-01
Two classes of neural networks have been developed for the study of hypersonic vehicle trajectory optimization and control. The first one is called an 'adaptive critic'. The uniqueness and main features of this approach are that: (1) they need no external training; (2) they allow variability of initial conditions; and (3) they can serve as feedback control. This is used to solve a 'free final time' two-point boundary value problem that maximizes the mass at the rocket burn-out while satisfying the pre-specified burn-out conditions in velocity, flightpath angle, and altitude. The second neural network is a recurrent network. An interesting feature of this network formulation is that when its inputs are the coefficients of the dynamics and control matrices, the network outputs are the Kalman sequences (with a quadratic cost function); the same network is also used for identifying the coefficients of the dynamics and control matrices. Consequently, we can use it to control a system whose parameters are uncertain. Numerical results are presented which illustrate the potential of these methods.
Direct Evidence of Intrinsic Blue Fluorescence from Oligomeric Interfaces of Human Serum Albumin.
Bhattacharya, Arpan; Bhowmik, Soumitra; Singh, Amit K; Kodgire, Prashant; Das, Apurba K; Mukherjee, Tushar Kanti
2017-10-10
The molecular origin behind the concentration-dependent intrinsic blue fluorescence of human serum albumin (HSA) is not known yet. This unusual blue fluorescence is believed to be a characteristic feature of amyloid-like fibrils of protein/peptide and originates due to the delocalization of peptide bond electrons through the extended hydrogen bond networks of cross-β-sheet structure. Herein, by combining the results of spectroscopy, size exclusion chromatography, native gel electrophoresis, and confocal microscopy, we have shown that the intrinsic blue fluorescence of HSA exclusively originates from oligomeric interfaces devoid of any amyloid-like fibrillar structure. Our study suggests that this low energy fluorescence band is not due to any particular residue/sequence, but rather it is a common feature of self-assembled peptide bonds. The present findings of intrinsic blue fluorescence from oligomeric interfaces pave the way for future applications of this unique visual phenomenon for early stage detection of various protein aggregation related human diseases.
Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D
2015-07-07
Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
Layered Deposits of Arabia Terra and Meridiani Planum: Keys to the Habitability of Ancient Mars
NASA Technical Reports Server (NTRS)
Allen, Carlton C.; Oehler, Dorothy Z.; Paris, Kristen N.; Venechuk, Elizabeth M.
2006-01-01
Understanding the habitability of ancient Mars is a key goal in the exploration of that planet. Evidence for conditions favorable to early life must be sought in ancient sedimentary rocks, such as those of Arabia Terra and Meridiani Planum. Arabia Terra, the northernmost extension of the ancient highlands, is dominated by cratered plains and minor ridged units. These plains extend south into the adjacent Meridiani Planum. The Opportunity rover landed in northern Meridiani, close to the border with Arabia. High resolution MOC images reveal extensive layered sequences across much of the Arabia and Meridiani region. These layers have been interpreted as eroded remnants of sedimentary rock deposits (Edgett, 2005). The layered sequences are concentrated in the SW quadrant of Arabia and in northern Meridiani. Preliminary mapping by Edgett (2005) distinguished four large scale layered sequences in the Arabia and Meridiani region. These have dimensions of hundreds to more than 1,000 km. MOLA altimetry shows that each of the sequences can attain a thickness of 200 to 400 m, with a total thickness greater than 1 km. The sequences are generally flat lying, with regional slopes of a few degrees. Much finer layering is evident within a number of craters. The plains and ridged units of the Arabia and Meridiani region were originally mapped as Noachian based on crater statistics, particularly the number of large craters (Scott and Carr, 1978). The layered sequences in the current study postdate many, but not all, of these large craters. The layered sequences have partially or totally filled a number of craters with diameters ranging from 20 to over 50 km. The topmost layered sequence, as well as the lower two sequences, have intermediate thermal inertia, as derived from THEMIS, indicative of moderate induration. The TES spectra from the lower sequences include features indicative of basalt. Some areas of the topmost sequence, which includes the Opportunity landing site, have TES spectra dominated by hematite. Just below this topmost sequence lies a sequence with higher thermal inertia, indicative of more indurated or coarser grained material. The TES spectra of this sequence lack distinctive mineral features, and the rocks may be obscured by a thin coating of dust. The layers have been extensively eroded. The uppermost sequences are characterized by deeply scalloped boundaries. Filled craters have been partially exhumed. Finely layered deposits within craters have been strongly dissected. Landforms uniquely attributable to wind erosion are rare, but erosive styles and geomorphology characteristic of water and possibly ice are present. The layered sequences in Arabia Terra and Meridiani Planum likely reflect an epoch when the planet was much more habitable than it is today. Several areas in these layered sequences are under intensive study as candidate landing sites for the 2009 Mars Science Laboratory.
Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun
2013-01-01
Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representative 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes express express preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
Gupta, R S
1998-12-01
The presence of shared conserved insertion or deletions (indels) in protein sequences is a special type of signature sequence that shows considerable promise for phylogenetic inference. An alternative model of microbial evolution based on the use of indels of conserved proteins and the morphological features of prokaryotic organisms is proposed. In this model, extant archaebacteria and gram-positive bacteria, which have a simple, single-layered cell wall structure, are termed monoderm prokaryotes. They are believed to be descended from the most primitive organisms. Evidence from indels supports the view that the archaebacteria probably evolved from gram-positive bacteria, and I suggest that this evolution occurred in response to antibiotic selection pressures. Evidence is presented that diderm prokaryotes (i.e., gram-negative bacteria), which have a bilayered cell wall, are derived from monoderm prokaryotes. Signature sequences in different proteins provide a means to define a number of different taxa within prokaryotes (namely, low G+C and high G+C gram-positive, Deinococcus-Thermus, cyanobacteria, chlamydia-cytophaga related, and two different groups of Proteobacteria) and to indicate how they evolved from a common ancestor. Based on phylogenetic information from indels in different protein sequences, it is hypothesized that all eukaryotes, including amitochondriate and aplastidic organisms, received major gene contributions from both an archaebacterium and a gram-negative eubacterium. In this model, the ancestral eukaryotic cell is a chimera that resulted from a unique fusion event between the two separate groups of prokaryotes followed by integration of their genomes.
Gupta, Radhey S.
1998-01-01
The presence of shared conserved insertion or deletions (indels) in protein sequences is a special type of signature sequence that shows considerable promise for phylogenetic inference. An alternative model of microbial evolution based on the use of indels of conserved proteins and the morphological features of prokaryotic organisms is proposed. In this model, extant archaebacteria and gram-positive bacteria, which have a simple, single-layered cell wall structure, are termed monoderm prokaryotes. They are believed to be descended from the most primitive organisms. Evidence from indels supports the view that the archaebacteria probably evolved from gram-positive bacteria, and I suggest that this evolution occurred in response to antibiotic selection pressures. Evidence is presented that diderm prokaryotes (i.e., gram-negative bacteria), which have a bilayered cell wall, are derived from monoderm prokaryotes. Signature sequences in different proteins provide a means to define a number of different taxa within prokaryotes (namely, low G+C and high G+C gram-positive, Deinococcus-Thermus, cyanobacteria, chlamydia-cytophaga related, and two different groups of Proteobacteria) and to indicate how they evolved from a common ancestor. Based on phylogenetic information from indels in different protein sequences, it is hypothesized that all eukaryotes, including amitochondriate and aplastidic organisms, received major gene contributions from both an archaebacterium and a gram-negative eubacterium. In this model, the ancestral eukaryotic cell is a chimera that resulted from a unique fusion event between the two separate groups of prokaryotes followed by integration of their genomes. PMID:9841678
Asrat, Asfawossen; Bahain, Jean-Jacques; Chapon, Cécile; Douville, Eric; Fragnol, Carole; Hernandez, Marion; Hovers, Erella; Leplongeon, Alice; Martin, Loïc; Pleurdeau, David; Pearson, Osbjorn; Puaud, Simon; Assefa, Zelalem
2017-01-01
Goda Buticha is a cave site near Dire Dawa in southeastern Ethiopia that contains an archaeological sequence sampling the late Pleistocene and Holocene of the region. The sedimentary sequence displays complex cultural, chronological and sedimentological histories that seem incongruent with one another. A first set of radiocarbon ages suggested a long sedimentological gap from the end of Marine Isotopic Stage (MIS) 3 to the mid-Holocene. Macroscopic observations suggest that the main sedimentological change does not coincide with the chronostratigraphic hiatus. The cultural sequence shows technological continuity with a late persistence of artifacts that are usually attributed to the Middle Stone Age into the younger parts of the stratigraphic sequence, yet become increasingly associated with lithic artifacts typically related to the Later Stone Age. While not a unique case, this combination of features is unusual in the Horn of Africa. In order to evaluate the possible implications of these observations, sedimentological analyses combined with optically stimulated luminescence (OSL) were conducted. The OSL data now extend the radiocarbon chronology up to 63 ± 7 ka; they also confirm the existence of the chronological gap between 24.8 ± 2.6 ka and 7.5 ± 0.3 ka. The sedimentological analyses suggest that the origin and mode of deposition were largely similar throughout the whole sequence, although the anthropic and faunal activities increased in the younger levels. Regional climatic records are used to support the sedimentological observations and interpretations. We discuss the implications of the sedimentological and dating analyses for understanding cultural processes in the region. PMID:28125597
Tribolo, Chantal; Asrat, Asfawossen; Bahain, Jean-Jacques; Chapon, Cécile; Douville, Eric; Fragnol, Carole; Hernandez, Marion; Hovers, Erella; Leplongeon, Alice; Martin, Loïc; Pleurdeau, David; Pearson, Osbjorn; Puaud, Simon; Assefa, Zelalem
2017-01-01
Goda Buticha is a cave site near Dire Dawa in southeastern Ethiopia that contains an archaeological sequence sampling the late Pleistocene and Holocene of the region. The sedimentary sequence displays complex cultural, chronological and sedimentological histories that seem incongruent with one another. A first set of radiocarbon ages suggested a long sedimentological gap from the end of Marine Isotopic Stage (MIS) 3 to the mid-Holocene. Macroscopic observations suggest that the main sedimentological change does not coincide with the chronostratigraphic hiatus. The cultural sequence shows technological continuity with a late persistence of artifacts that are usually attributed to the Middle Stone Age into the younger parts of the stratigraphic sequence, yet become increasingly associated with lithic artifacts typically related to the Later Stone Age. While not a unique case, this combination of features is unusual in the Horn of Africa. In order to evaluate the possible implications of these observations, sedimentological analyses combined with optically stimulated luminescence (OSL) were conducted. The OSL data now extend the radiocarbon chronology up to 63 ± 7 ka; they also confirm the existence of the chronological gap between 24.8 ± 2.6 ka and 7.5 ± 0.3 ka. The sedimentological analyses suggest that the origin and mode of deposition were largely similar throughout the whole sequence, although the anthropic and faunal activities increased in the younger levels. Regional climatic records are used to support the sedimentological observations and interpretations. We discuss the implications of the sedimentological and dating analyses for understanding cultural processes in the region.
Inforna 2.0: A Platform for the Sequence-Based Design of Small Molecules Targeting Structured RNAs.
Disney, Matthew D; Winkelsas, Audrey M; Velagapudi, Sai Pradeep; Southern, Mark; Fallahi, Mohammad; Childs-Disney, Jessica L
2016-06-17
The development of small molecules that target RNA is challenging yet, if successful, could advance the development of chemical probes to study RNA function or precision therapeutics to treat RNA-mediated disease. Previously, we described Inforna, an approach that can mine motifs (secondary structures) within target RNAs, which is deduced from the RNA sequence, and compare them to a database of known RNA motif-small molecule binding partners. Output generated by Inforna includes the motif found in both the database and the desired RNA target, lead small molecules for that target, and other related meta-data. Lead small molecules can then be tested for binding and affecting cellular (dys)function. Herein, we describe Inforna 2.0, which incorporates all known RNA motif-small molecule binding partners reported in the scientific literature, a chemical similarity searching feature, and an improved user interface and is freely available via an online web server. By incorporation of interactions identified by other laboratories, the database has been doubled, containing 1936 RNA motif-small molecule interactions, including 244 unique small molecules and 1331 motifs. Interestingly, chemotype analysis of the compounds that bind RNA in the database reveals features in small molecule chemotypes that are privileged for binding. Further, this updated database expanded the number of cellular RNAs to which lead compounds can be identified.
NASA Astrophysics Data System (ADS)
Lindner, Robert; Lou, Xinghua; Reinstein, Jochen; Shoeman, Robert L.; Hamprecht, Fred A.; Winkler, Andreas
2014-06-01
Hydrogen-deuterium exchange (HDX) experiments analyzed by mass spectrometry (MS) provide information about the dynamics and the solvent accessibility of protein backbone amide hydrogen atoms. Continuous improvement of MS instrumentation has contributed to the increasing popularity of this method; however, comprehensive automated data analysis is only beginning to mature. We present Hexicon 2, an automated pipeline for data analysis and visualization based on the previously published program Hexicon (Lou et al. 2010). Hexicon 2 employs the sensitive NITPICK peak detection algorithm of its predecessor in a divide-and-conquer strategy and adds new features, such as chromatogram alignment and improved peptide sequence assignment. The unique feature of deuteration distribution estimation was retained in Hexicon 2 and improved using an iterative deconvolution algorithm that is robust even to noisy data. In addition, Hexicon 2 provides a data browser that facilitates quality control and provides convenient access to common data visualization tasks. Analysis of a benchmark dataset demonstrates superior performance of Hexicon 2 compared with its predecessor in terms of deuteration centroid recovery and deuteration distribution estimation. Hexicon 2 greatly reduces data analysis time compared with manual analysis, whereas the increased number of peptides provides redundant coverage of the entire protein sequence. Hexicon 2 is a standalone application available free of charge under http://hx2.mpimf-heidelberg.mpg.de.
van de Guchte, M; Penaud, S; Grimaldi, C; Barbe, V; Bryson, K; Nicolas, P; Robert, C; Oztas, S; Mangenot, S; Couloux, A; Loux, V; Dervyn, R; Bossy, R; Bolotin, A; Batto, J-M; Walunas, T; Gibrat, J-F; Bessières, P; Weissenbach, J; Ehrlich, S D; Maguin, E
2006-06-13
Lactobacillus delbrueckii ssp. bulgaricus (L. bulgaricus) is a representative of the group of lactic acid-producing bacteria, mainly known for its worldwide application in yogurt production. The genome sequence of this bacterium has been determined and shows the signs of ongoing specialization, with a substantial number of pseudogenes and incomplete metabolic pathways and relatively few regulatory functions. Several unique features of the L. bulgaricus genome support the hypothesis that the genome is in a phase of rapid evolution. (i) Exceptionally high numbers of rRNA and tRNA genes with regard to genome size may indicate that the L. bulgaricus genome has known a recent phase of important size reduction, in agreement with the observed high frequency of gene inactivation and elimination; (ii) a much higher GC content at codon position 3 than expected on the basis of the overall GC content suggests that the composition of the genome is evolving toward a higher GC content; and (iii) the presence of a 47.5-kbp inverted repeat in the replication termination region, an extremely rare feature in bacterial genomes, may be interpreted as a transient stage in genome evolution. The results indicate the adaptation of L. bulgaricus from a plant-associated habitat to the stable protein and lactose-rich milk environment through the loss of superfluous functions and protocooperation with Streptococcus thermophilus.
Defensins and the convergent evolution of platypus and reptile venom genes.
Whittington, Camilla M; Papenfuss, Anthony T; Bansal, Paramjit; Torres, Allan M; Wong, Emily S W; Deakin, Janine E; Graves, Tina; Alsop, Amber; Schatzkamer, Kyriena; Kremitzki, Colin; Ponting, Chris P; Temple-Smith, Peter; Warren, Wesley C; Kuchel, Philip W; Belov, Katherine
2008-06-01
When the platypus (Ornithorhynchus anatinus) was first discovered, it was thought to be a taxidermist's hoax, as it has a blend of mammalian and reptilian features. It is a most remarkable mammal, not only because it lays eggs but also because it is venomous. Rather than delivering venom through a bite, as do snakes and shrews, male platypuses have venomous spurs on each hind leg. The platypus genome sequence provides a unique opportunity to unravel the evolutionary history of many of these interesting features. While searching the platypus genome for the sequences of antimicrobial defensin genes, we identified three Ornithorhynchus venom defensin-like peptide (OvDLP) genes, which produce the major components of platypus venom. We show that gene duplication and subsequent functional diversification of beta-defensins gave rise to these platypus OvDLPs. The OvDLP genes are located adjacent to the beta-defensins and share similar gene organization and peptide structures. Intriguingly, some species of snakes and lizards also produce venoms containing similar molecules called crotamines and crotamine-like peptides. This led us to trace the evolutionary origins of other components of platypus and reptile venom. Here we show that several venom components have evolved separately in the platypus and reptiles. Convergent evolution has repeatedly selected genes coding for proteins containing specific structural motifs as templates for venom molecules.
Defensins and the convergent evolution of platypus and reptile venom genes
Whittington, Camilla M.; Papenfuss, Anthony T.; Bansal, Paramjit; Torres, Allan M.; Wong, Emily S.W.; Deakin, Janine E.; Graves, Tina; Alsop, Amber; Schatzkamer, Kyriena; Kremitzki, Colin; Ponting, Chris P.; Temple-Smith, Peter; Warren, Wesley C.; Kuchel, Philip W.; Belov, Katherine
2008-01-01
When the platypus (Ornithorhynchus anatinus) was first discovered, it was thought to be a taxidermist’s hoax, as it has a blend of mammalian and reptilian features. It is a most remarkable mammal, not only because it lays eggs but also because it is venomous. Rather than delivering venom through a bite, as do snakes and shrews, male platypuses have venomous spurs on each hind leg. The platypus genome sequence provides a unique opportunity to unravel the evolutionary history of many of these interesting features. While searching the platypus genome for the sequences of antimicrobial defensin genes, we identified three Ornithorhynchus venom defensin-like peptide (OvDLP) genes, which produce the major components of platypus venom. We show that gene duplication and subsequent functional diversification of beta-defensins gave rise to these platypus OvDLPs. The OvDLP genes are located adjacent to the beta-defensins and share similar gene organization and peptide structures. Intriguingly, some species of snakes and lizards also produce venoms containing similar molecules called crotamines and crotamine-like peptides. This led us to trace the evolutionary origins of other components of platypus and reptile venom. Here we show that several venom components have evolved separately in the platypus and reptiles. Convergent evolution has repeatedly selected genes coding for proteins containing specific structural motifs as templates for venom molecules. PMID:18463304
Lindner, Robert; Lou, Xinghua; Reinstein, Jochen; Shoeman, Robert L; Hamprecht, Fred A; Winkler, Andreas
2014-06-01
Hydrogen-deuterium exchange (HDX) experiments analyzed by mass spectrometry (MS) provide information about the dynamics and the solvent accessibility of protein backbone amide hydrogen atoms. Continuous improvement of MS instrumentation has contributed to the increasing popularity of this method; however, comprehensive automated data analysis is only beginning to mature. We present Hexicon 2, an automated pipeline for data analysis and visualization based on the previously published program Hexicon (Lou et al. 2010). Hexicon 2 employs the sensitive NITPICK peak detection algorithm of its predecessor in a divide-and-conquer strategy and adds new features, such as chromatogram alignment and improved peptide sequence assignment. The unique feature of deuteration distribution estimation was retained in Hexicon 2 and improved using an iterative deconvolution algorithm that is robust even to noisy data. In addition, Hexicon 2 provides a data browser that facilitates quality control and provides convenient access to common data visualization tasks. Analysis of a benchmark dataset demonstrates superior performance of Hexicon 2 compared with its predecessor in terms of deuteration centroid recovery and deuteration distribution estimation. Hexicon 2 greatly reduces data analysis time compared with manual analysis, whereas the increased number of peptides provides redundant coverage of the entire protein sequence. Hexicon 2 is a standalone application available free of charge under http://hx2.mpimf-heidelberg.mpg.de.
NASA Astrophysics Data System (ADS)
Sarkar, Subir; Banerjee, Santanu; Samanta, Pradip; Chakraborty, Nivedita; Chakraborty, Partha Pratim; Mukhopadhyay, Soumik; Singh, Arvind K.
2014-09-01
Microbial mat-related structures (MRS) in siliciclastics have been investigated from four Proterozic formations in India, namely the Marwar Supergroup, the Vindhyan Supergroup, the Chhatisgarh Supergroup and the Khariar Group for their spectral variations, genetic aspects, palaeo-environmental significance and influence on sequence stratigraphic architecture. The maximum diversification of MRS has been experienced in shallow marine coastal Precambrian successions. Observations made from modern environment as well as Precambrian rock records clearly indicates that the features like petee ridges, sand-cracks, gas domes, multi-directed ripples, reticulate surfaces, sieve-like surfaces and setulf are most likely to form in the shallowest part of the marine basins, in upper intertidal to supratidal conditions while wrinkle structures, roll-up structures and patchy ripples had a broader range of palaeogeographic settings from the supratidal to subtidal conditions. Discoidal microbial colony (DMC) represents a special variety of the mat-layer feature in modern environment that may have diverse internal architecture, sometimes falsely resembles Ediacaran medusoids. The uniqueness in sequence stratigraphic architecture of the microbial mat-covered sediment is reflected by the presence of more amalgamated HSTs compare to that of TSTs. The preservation of forced and normal regressive deposits on low-gradient epeiric shelf under low continental freeboard indicates microbial mat-infested sea-floor impedes erosion and concomitant sediment supply may facilitate formation and preservation of regressive packages.
Greene, Michelle R; Baldassano, Christopher; Fei-Fei, Li; Beck, Diane M; Baker, Chris I
2018-01-01
Inherent correlations between visual and semantic features in real-world scenes make it difficult to determine how different scene properties contribute to neural representations. Here, we assessed the contributions of multiple properties to scene representation by partitioning the variance explained in human behavioral and brain measurements by three feature models whose inter-correlations were minimized a priori through stimulus preselection. Behavioral assessments of scene similarity reflected unique contributions from a functional feature model indicating potential actions in scenes as well as high-level visual features from a deep neural network (DNN). In contrast, similarity of cortical responses in scene-selective areas was uniquely explained by mid- and high-level DNN features only, while an object label model did not contribute uniquely to either domain. The striking dissociation between functional and DNN features in their contribution to behavioral and brain representations of scenes indicates that scene-selective cortex represents only a subset of behaviorally relevant scene information. PMID:29513219
Groen, Iris Ia; Greene, Michelle R; Baldassano, Christopher; Fei-Fei, Li; Beck, Diane M; Baker, Chris I
2018-03-07
Inherent correlations between visual and semantic features in real-world scenes make it difficult to determine how different scene properties contribute to neural representations. Here, we assessed the contributions of multiple properties to scene representation by partitioning the variance explained in human behavioral and brain measurements by three feature models whose inter-correlations were minimized a priori through stimulus preselection. Behavioral assessments of scene similarity reflected unique contributions from a functional feature model indicating potential actions in scenes as well as high-level visual features from a deep neural network (DNN). In contrast, similarity of cortical responses in scene-selective areas was uniquely explained by mid- and high-level DNN features only, while an object label model did not contribute uniquely to either domain. The striking dissociation between functional and DNN features in their contribution to behavioral and brain representations of scenes indicates that scene-selective cortex represents only a subset of behaviorally relevant scene information.
Greiner, Stephan; Wang, Xi; Herrmann, Reinhold G; Rauwolf, Uwe; Mayer, Klaus; Haberer, Georg; Meurer, Jörg
2008-09-01
A unique combination of genetic features and a rich stock of information make the flowering plant genus Oenothera an appealing model to explore the molecular basis of speciation processes including nucleus-organelle coevolution. From representative species, we have recently reported complete nucleotide sequences of the 5 basic and genetically distinguishable plastid chromosomes of subsection Oenothera (I-V). In nature, Oenothera plastid genomes are associated with 6 distinct, either homozygous or heterozygous, diploid nuclear genotypes of the 3 basic genomes A, B, or C. Artificially produced plastome-genome combinations that do not occur naturally often display interspecific plastome-genome incompatibility (PGI). In this study, we compare formal genetic data available from all 30 plastome-genome combinations with sequence differences between the plastomes to uncover potential determinants for interspecific PGI. Consistent with an active role in speciation, a remarkable number of genes have high Ka/Ks ratios. Different from the Solanacean cybrid model Atropa/tobacco, RNA editing seems not to be relevant for PGIs in Oenothera. However, predominantly sequence polymorphisms in intergenic segments are proposed as possible sources for PGI. A single locus, the bidirectional promoter region between psbB and clpP, is suggested to contribute to compartmental PGI in the interspecific AB hybrid containing plastome I (AB-I), consistent with its perturbed photosystem II activity.
Draft Genome Sequencing and Comparative Analysis of Aspergillus sojae NBRC4239
Sato, Atsushi; Oshima, Kenshiro; Noguchi, Hideki; Ogawa, Masahiro; Takahashi, Tadashi; Oguma, Tetsuya; Koyama, Yasuji; Itoh, Takehiko; Hattori, Masahira; Hanya, Yoshiki
2011-01-01
We conducted genome sequencing of the filamentous fungus Aspergillus sojae NBRC4239 isolated from the koji used to prepare Japanese soy sauce. We used the 454 pyrosequencing technology and investigated the genome with respect to enzymes and secondary metabolites in comparison with other Aspergilli sequenced. Assembly of 454 reads generated a non-redundant sequence of 39.5-Mb possessing 13 033 putative genes and 65 scaffolds composed of 557 contigs. Of the 2847 open reading frames with Pfam domain scores of >150 found in A. sojae NBRC4239, 81.7% had a high degree of similarity with the genes of A. oryzae. Comparative analysis identified serine carboxypeptidase and aspartic protease genes unique to A. sojae NBRC4239. While A. oryzae possessed three copies of α-amyalse gene, A. sojae NBRC4239 possessed only a single copy. Comparison of 56 gene clusters for secondary metabolites between A. sojae NBRC4239 and A. oryzae revealed that 24 clusters were conserved, whereas 32 clusters differed between them that included a deletion of 18 508 bp containing mfs1, mao1, dmaT, and pks-nrps for the cyclopiazonic acid (CPA) biosynthesis, explaining the no productivity of CPA in A. sojae. The A. sojae NBRC4239 genome data will be useful to characterize functional features of the koji moulds used in Japanese industries. PMID:21659486
Wang, Xiaoli; Xie, Yingzhou; Li, Gang; Liu, Jialin; Li, Xiaobin; Tian, Lijun; Sun, Jingyong; Ou, Hong-Yu; Qu, Hongping
2018-01-01
Hypervirulent K. pneumoniae variants (hvKP) have been increasingly reported worldwide, causing metastasis of severe infections such as liver abscesses and bacteremia. The capsular serotype K2 hvKP strains show diverse multi-locus sequence types (MLSTs), but with limited genetics and virulence information. In this study, we report a hypermucoviscous K. pneumoniae strain, RJF293, isolated from a human bloodstream sample in a Chinese hospital. It caused a metastatic infection and fatal septic shock in a critical patient. The microbiological features and genetic background were investigated with multiple approaches. The Strain RJF293 was determined to be multilocis sequence type (ST) 374 and serotype K2, displayed a median lethal dose (LD50) of 1.5 × 10 2 CFU in BALB/c mice and was as virulent as the ST23 K1 serotype hvKP strain NTUH-K2044 in a mouse lethality assay. Whole genome sequencing revealed that the RJF293 genome codes for 32 putative virulence factors and exhibits a unique presence/absence pattern in comparison to the other 105 completely sequenced K. pneumoniae genomes. Whole genome SNP-based phylogenetic analysis revealed that strain RJF293 formed a single clade, distant from those containing either ST66 or ST86 hvKP. Compared to the other sequenced hvKP chromosomes, RJF293 contains several strain-variable regions, including one prophage, one ICEKp1 family integrative and conjugative element and six large genomic islands. The sequencing of the first complete genome of an ST374 K2 hvKP clinical strain should reinforce our understanding of the epidemiology and virulence mechanisms of this bloodstream infection-causing hvKP with clinical significance.
Wang, Xiaoli; Xie, Yingzhou; Li, Gang; Liu, Jialin; Li, Xiaobin; Tian, Lijun; Sun, Jingyong; Qu, Hongping
2018-01-01
ABSTRACT Hypervirulent K. pneumoniae variants (hvKP) have been increasingly reported worldwide, causing metastasis of severe infections such as liver abscesses and bacteremia. The capsular serotype K2 hvKP strains show diverse multi-locus sequence types (MLSTs), but with limited genetics and virulence information. In this study, we report a hypermucoviscous K. pneumoniae strain, RJF293, isolated from a human bloodstream sample in a Chinese hospital. It caused a metastatic infection and fatal septic shock in a critical patient. The microbiological features and genetic background were investigated with multiple approaches. The Strain RJF293 was determined to be multilocis sequence type (ST) 374 and serotype K2, displayed a median lethal dose (LD50) of 1.5 × 102 CFU in BALB/c mice and was as virulent as the ST23 K1 serotype hvKP strain NTUH-K2044 in a mouse lethality assay. Whole genome sequencing revealed that the RJF293 genome codes for 32 putative virulence factors and exhibits a unique presence/absence pattern in comparison to the other 105 completely sequenced K. pneumoniae genomes. Whole genome SNP-based phylogenetic analysis revealed that strain RJF293 formed a single clade, distant from those containing either ST66 or ST86 hvKP. Compared to the other sequenced hvKP chromosomes, RJF293 contains several strain-variable regions, including one prophage, one ICEKp1 family integrative and conjugative element and six large genomic islands. The sequencing of the first complete genome of an ST374 K2 hvKP clinical strain should reinforce our understanding of the epidemiology and virulence mechanisms of this bloodstream infection-causing hvKP with clinical significance. PMID:29338592
The complete chloroplast genome sequence of date palm (Phoenix dactylifera L.).
Yang, Meng; Zhang, Xiaowei; Liu, Guiming; Yin, Yuxin; Chen, Kaifu; Yun, Quanzheng; Zhao, Duojun; Al-Mssallem, Ibrahim S; Yu, Jun
2010-09-15
Date palm (Phoenix dactylifera L.), a member of Arecaceae family, is one of the three major economically important woody palms--the two other palms being oil palm and coconut tree--and its fruit is a staple food among Middle East and North African nations, as well as many other tropical and subtropical regions. Here we report a complete sequence of the data palm chloroplast (cp) genome based on pyrosequencing. After extracting 369,022 cp sequencing reads from our whole-genome-shotgun data, we put together an assembly and validated it with intensive PCR-based verification, coupled with PCR product sequencing. The date palm cp genome is 158,462 bp in length and has a typical quadripartite structure of the large (LSC, 86,198 bp) and small single-copy (SSC, 17,712 bp) regions separated by a pair of inverted repeats (IRs, 27,276 bp). Similar to what has been found among most angiosperms, the date palm cp genome harbors 112 unique genes and 19 duplicated fragments in the IR regions. The junctions between LSC/IRs and SSC/IRs show different features of sequence expansion in evolution. We identified 78 SNPs as major intravarietal polymorphisms within the population of a specific cp genome, most of which were located in genes with vital functions. Based on RNA-sequencing data, we also found 18 polycistronic transcription units and three highly expression-biased genes--atpF, trnA-UGC, and rrn23. Unlike most monocots, date palm has a typical cp genome similar to that of tobacco--with little rearrangement and gene loss or gain. High-throughput sequencing technology facilitates the identification of intravarietal variations in cp genomes among different cultivars. Moreover, transcriptomic analysis of cp genes provides clues for uncovering regulatory mechanisms of transcription and translation in chloroplasts.
ERIC Educational Resources Information Center
Batzli, Janet M.
2005-01-01
''Why four semesters? How does this track differ from the two-semester course sequence?'' These are the most common questions students have when they learn about the Biology Core Curriculum (Biocore), a unique four-semester honors biology sequence at University of Wisconsin-Madison (UW-Madison). Biocore was first taught at University of Wisconsin…
Tai, Huanhuan; Lu, Xin; Opitz, Nina; Marcon, Caroline; Paschold, Anja; Lithio, Andrew; Nettleton, Dan; Hochholdinger, Frank
2016-01-01
Maize develops a complex root system composed of embryonic and post-embryonic roots. Spatio-temporal differences in the formation of these root types imply specific functions during maize development. A comparative transcriptomic study of embryonic primary and seminal, and post-embryonic crown roots of the maize inbred line B73 by RNA sequencing along with anatomical studies were conducted early in development. Seminal roots displayed unique anatomical features, whereas the organization of primary and crown roots was similar. For instance, seminal roots displayed fewer cortical cell files and their stele contained more meta-xylem vessels. Global expression profiling revealed diverse patterns of gene activity across all root types and highlighted the unique transcriptome of seminal roots. While functions in cell remodeling and cell wall formation were prominent in primary and crown roots, stress-related genes and transcriptional regulators were over-represented in seminal roots, suggesting functional specialization of the different root types. Dynamic expression of lignin biosynthesis genes and histochemical staining suggested diversification of cell wall lignification among the three root types. Our findings highlight a cost-efficient anatomical structure and a unique expression profile of seminal roots of the maize inbred line B73 different from primary and crown roots. PMID:26628518
Han, Limin; Chen, Chen; Wang, Zhezhi
2018-01-01
Epipremnum aureum is an important foliage plant in the Araceae family. In this study, we have sequenced the complete chloroplast genome of E. aureum by using Illumina Hiseq sequencing platforms. This genome is a double-stranded circular DNA sequence of 164,831 bp that contains 35.8% GC. The two inverted repeats (IRa and IRb; 26,606 bp) are spaced by a small single-copy region (22,868 bp) and a large single-copy region (88,751 bp). The chloroplast genome has 131 (113 unique) functional genes, including 86 (79 unique) protein-coding genes, 37 (30 unique) tRNA genes, and eight (four unique) rRNA genes. Tandem repeats comprise the majority of the 43 long repetitive sequences. In addition, 111 simple sequence repeats are present, with mononucleotides being the most common type and di- and tetranucleotides being infrequent events. Positive selection pressure on rps12 in the E. aureum chloroplast has been demonstrated via synonymous and nonsynonymous substitution rates and selection pressure sites analyses. Ycf15 and infA are pseudogenes in this species. We constructed a Maximum Likelihood phylogenetic tree based on the complete chloroplast genomes of 38 species from 13 families. Those results strongly indicated that E. aureum is positioned as the sister of Colocasia esculenta within the Araceae family. This work may provide information for further study of the molecular phylogenetic relationships within Araceae, as well as molecular markers and breeding novel varieties by chloroplast genetic-transformation of E. aureum in particular. PMID:29529038
Cross-cutting Relationships of Surface Features on Europa
NASA Technical Reports Server (NTRS)
1997-01-01
This image of Jupiter's moon Europa shows a very complex terrain of ridges and fractures. The absence of large craters and the low number of small craters indicates that this surface is geologically young. The relative ages of the ridges can be determined by using the principle of cross-cutting relationships; i.e. older features are cross-cut by younger features. Using this principle, planetary geologists are able to unravel the sequence of events in this seemingly chaotic terrain to unfold Europa's unique geologic history.
The spacecraft Galileo obtained this image on February 20, 1997. The area covered in this image is approximately 11 miles (18 kilometers) by 8.5 miles (14 kilometers) across, near 15 North, 273 West. North is toward the top of the image, with the sun illuminating from the right.The Jet Propulsion Laboratory, Pasadena, CA manages the mission for NASA's Office of Space Science, Washington, DC.This image and other images and data received from Galileo are posted on the World Wide Web, on the Galileo mission home page at URL http://galileo.jpl.nasa.gov. Background information and educational context for the images can be found at URL http://www.jpl.nasa.gov/galileo/sepoExtraction of Molecular Features through Exome to Transcriptome Alignment
Mudvari, Prakriti; Kowsari, Kamran; Cole, Charles; Mazumder, Raja; Horvath, Anelia
2014-01-01
Integrative Next Generation Sequencing (NGS) DNA and RNA analyses have very recently become feasible, and the published to date studies have discovered critical disease implicated pathways, and diagnostic and therapeutic targets. A growing number of exomes, genomes and transcriptomes from the same individual are quickly accumulating, providing unique venues for mechanistic and regulatory features analysis, and, at the same time, requiring new exploration strategies. In this study, we have integrated variation and expression information of four NGS datasets from the same individual: normal and tumor breast exomes and transcriptomes. Focusing on SNPcentered variant allelic prevalence, we illustrate analytical algorithms that can be applied to extract or validate potential regulatory elements, such as expression or growth advantage, imprinting, loss of heterozygosity (LOH), somatic changes, and RNA editing. In addition, we point to some critical elements that might bias the output and recommend alternative measures to maximize the confidence of findings. The need for such strategies is especially recognized within the growing appreciation of the concept of systems biology: integrative exploration of genome and transcriptome features reveal mechanistic and regulatory insights that reach far beyond linear addition of the individual datasets. PMID:24791251
Genomes as geography: using GIS technology to build interactive genome feature maps
Dolan, Mary E; Holden, Constance C; Beard, M Kate; Bult, Carol J
2006-01-01
Background Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. Results We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. Conclusion Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries. PMID:16984652
Seismicity of the Adriatic microplate
Console, R.; Di, Giovambattista R.; Favali, P.; Presgrave, B.W.; Smriglio, G.
1993-01-01
The Adriatic microplate was previously considered to be a unique block, tectonically active only along its margins. The seismic sequences that took place in the basin from 1986 to 1990 give new information about the geodynamics of this area. Three subsets of well recorded events were relocated by the joint hypocentre determination technique. On the whole, this seismic activity was concentrated in a belt crossing the southern Adriatic sea around latitude 42??, in connection with regional E-W fault systems. Some features of this seismicity, similar to those observed in other well known active margins of the Adriatic plate, support a model of a southern Adriatic lithospheric block, detached from the Northern one. Other geophysical information provides evidence of a transitional zone at the same latitude. ?? 1993.
Mapping, fine mapping, and molecular dissection of quantitative trait Loci in domestic animals.
Georges, Michel
2007-01-01
Artificial selection has created myriad breeds of domestic animals, each characterized by unique phenotypes pertaining to behavior, morphology, physiology, and disease. Most domestic animal populations share features with isolated founder populations, making them well suited for positional cloning. Genome sequences are now available for most domestic species, and with them a panoply of tools including high-density single-nucleotide polymorphism panels. As a result, domestic animal populations are becoming invaluable resources for studying the molecular architecture of complex traits and of adaptation. Here we review recent progress and issues in the positional identification of genes underlying complex traits in domestic animals. As many phenotypes studied in animals are quantitative, we focus on mapping, fine mapping, and cloning of quantitative trait loci.
Prediction of enhancer-promoter interactions via natural language processing.
Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui
2018-05-09
Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. We propose a novel computational framework EP2vec to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method in natural language processing. Then, we train a classifier to predict EPIs using the learned representations in supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841~ 0.933 on different datasets, which outperforms existing methods. We prove the robustness of sequence embedding features by carrying out sensitivity analysis. Besides, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting attention mechanism. Last, we show that even superior performance with F1 scores 0.889~ 0.940 can be achieved by combining sequence embedding features and experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPIs identification.
The management of nonunion and malunion of the distal humerus--a 30-year experience.
Jupiter, Jesse B
2008-01-01
This personal series of nonunions of the distal humerus reviews unique features of this problem, categorizes them according to unique anatomic features, and presents pitfalls and pearls in the management of these complex reconstructive problems.
The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika
2010-01-27
Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences are important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~;;30K unique sequences (UniSeqs) representing ~;;19K clusters were generated from ~;;98K high quality ESTs from a set ofmore » tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases.Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.« less
Nair, Divya N; Prasad, Rajesh; Singhal, Neha; Bhattacharjee, Manish; Sudhakar, Renu; Singh, Pushpa; Thanumalayan, Subramonian; Kiran, Uday; Sharma, Yogendra; Sijwali, Puran Singh
2018-06-01
Plasmodium falciparum DJ1 (PfDJ1) belongs to the DJ-1/ThiJ/PfpI superfamily whose members are present in all the kingdoms of life and exhibit diverse cellular functions and biochemical activities. The common feature of the superfamily is the class I glutamine amidotransferase domain with a conserved redox-active cysteine residue, which mediates various activities of the superfamily members, including anti-oxidative activity in PfDJ1 and human DJ1 (hDJ1). As the superfamily members represent diverse functional classes, to investigate if there is any sequence feature unique to hDJ1-like proteins, sequences of the representative proteins of different functional classes were compared and analysed. A novel motif unique to PfDJ1 and several other hDJ1-like proteins, with the consensus sequence of TSXGPX5FXLX5L, was identified that we designated as the hDJ1-subfamily motif (DJSM). Several mutations that have been associated with Parkinson's disease are also present in DJSM, suggesting its functional importance in hDJ1-like proteins. Mutations of the conserved residues of DJSM of PfDJ1 did not significantly affect overall secondary structure, but caused both a significant loss (S151A and P154A) and gain (L168A) of anti-oxidative activity. We also report that PfDJ1 has deglycase activity, which was significantly decreased in its mutants of the catalytic cysteine (C106A) and DJSM (S151A and P154A). Episomal expression of the catalytic cysteine (C106A) or DJSM (P154A) mutant decreased growth rates of parasites as compared to that of wild type parasites or parasites expressing wild type PfDJ1. S151 appears to properly position the nucleophilic elbow containing C106 and P154 forms a hydrogen bond with C106, which could be a reason for the loss of activities of PfDJ1 upon their mutations. Taken together, DJSM delineates PfDJ1 and other hDJ1-subfamily proteins from the remaining superfamily, and is critical for anti-oxidative and deglycase activities of PfDJ1. Copyright © 2018 Elsevier B.V. All rights reserved.
Deng, Ke-Jun; Yang, Zu-Jun; Liu, Cheng; Zhao, Wei; Liu, Chang; Feng, Juan; Ren, Zheng-Long
2007-03-01
Genetic characterization of 9 populations of Rhodiola crenulata, R. fastigiata and R. sachalinensis (Crassulaceae) species from Sichuan and Jilin Provinces of China, was investigated using the conserved primer of nad7 intron 2. All PCR products about 800 bp long were shorter than other Crassulaceae plants, which were used as molecular markers to identify the Rhodiola species. The sequence of the products indicated that total exon of 53 bp and intron of 738 bp exhibit only 9 nucleotide variations. Blasting the nad7 sequences to GenBank and the phylogenetic analysis showed that the sequence of Rhodiola species was clusted independently, and the length was smaller than all the registered sequences of higher plants. The result suggests that the Rhiodola species had a unique sequence in this gene region, which might be related to the special growth condition.
Conservation of the Human Integrin-Type Beta-Propeller Domain in Bacteria
Chouhan, Bhanupratap; Denesyuk, Alexander; Heino, Jyrki; Johnson, Mark S.; Denessiouk, Konstantin
2011-01-01
Integrins are heterodimeric cell-surface receptors with key functions in cell-cell and cell-matrix adhesion. Integrin α and β subunits are present throughout the metazoans, but it is unclear whether the subunits predate the origin of multicellular organisms. Several component domains have been detected in bacteria, one of which, a specific 7-bladed β-propeller domain, is a unique feature of the integrin α subunits. Here, we describe a structure-derived motif, which incorporates key features of each blade from the X-ray structures of human αIIbβ3 and αVβ3, includes elements of the FG-GAP/Cage and Ca2+-binding motifs, and is specific only for the metazoan integrin domains. Separately, we searched for the metazoan integrin type β-propeller domains among all available sequences from bacteria and unicellular eukaryotic organisms, which must incorporate seven repeats, corresponding to the seven blades of the β-propeller domain, and so that the newly found structure-derived motif would exist in every repeat. As the result, among 47 available genomes of unicellular eukaryotes we could not find a single instance of seven repeats with the motif. Several sequences contained three repeats, a predicted transmembrane segment, and a short cytoplasmic motif associated with some integrins, but otherwise differ from the metazoan integrin α subunits. Among the available bacterial sequences, we found five examples containing seven sequential metazoan integrin-specific motifs within the seven repeats. The motifs differ in having one Ca2+-binding site per repeat, whereas metazoan integrins have three or four sites. The bacterial sequences are more conserved in terms of motif conservation and loop length, suggesting that the structure is more regular and compact than those example structures from human integrins. Although the bacterial examples are not full-length integrins, the full-length metazoan-type 7-bladed β-propeller domains are present, and sometimes two tandem copies are found. PMID:22022374
Zhang, Min; Wei, Zhiyi; Chang, Shaojie; Teng, Maikun; Gong, Weimin
2006-04-21
A 31kDa cysteine protease, SPE31, was isolated from the seeds of a legume plant, Pachyrizhus erosus. The protein was purified, crystallized and the 3D structure solved using molecular replacement. The cDNA was obtained by RT PCR followed by amplification using mRNA isolated from the seeds of the legume plant as a template. Analysis of the cDNA sequence and the 3D structure indicated the protein to belong to the papain family. Detailed analysis of the structure revealed an unusual replacement of the conserved catalytic Cys with Gly. Replacement of another conserved residue Ala/Gly by a Phe sterically blocks the access of the substrate to the active site. A polyethyleneglycol molecule and a natural peptide fragment were bound to the surface of the active site. Asn159 was found to be glycosylated. The SPE31 cDNA sequence shares several features with P34, a protein found in soybeans, that is implicated in plant defense mechanisms as an elicitor receptor binding to syringolide. P34 has also been shown to interact with vegetative storage proteins and NADH-dependent hydroxypyruvate reductase. These roles suggest that SPE31 and P34 form a unique subfamily within the papain family. The crystal structure of SPE31 complexed with a natural peptide ligand reveals a unique active site architecture. In addition, the clear evidence of glycosylated Asn159 provides useful information towards understanding the functional mechanism of SPE31/P34.
Kusada, Hiroyuki; Kameyama, Keishi; Meng, Xian-Ying; Kamagata, Yoichi; Tamaki, Hideyuki
2017-12-22
Our previous study shows that an anaerobic intestinal bacterium strain AJ110941 P contributes to type 2 diabetes development in mice. Here we phylogenetically and physiologically characterized this unique mouse gut bacterium. The 16S rRNA gene analysis revealed that the strain belongs to the family Lachnospiraceae but shows low sequence similarities ( < 92.5%) to valid species, and rather formed a distinct cluster with uncultured mouse gut bacteria clones. In metagenomic database survey, the 16S sequence of AJ110941 P also matched with mouse gut-derived datasets (56% of total datasets) with > 99% similarity, suggesting that AJ110941 P -related bacteria mainly reside in mouse digestive tracts. Strain AJ110941 P shared common physiological traits (e.g., Gram-positive, anaerobic, mesophilic, and fermentative growth with carbohydrates) with relative species of the Lachnospiraceae. Notably, the biofilm-forming capacity was found in both AJ110941 P and relative species. However, AJ110941 P possessed far more strong ability to produce biofilm than relative species and formed unique structure of extracellular polymeric substances. Furthermore, AJ110941 P cells are markedly long fusiform-shaped rods (9.0-62.5 µm) with multiple flagella that have never been observed in any other Lachnospiraceae members. Based on the phenotypic and phylogenetic features, we propose a new genus and species, Fusimonas intestini gen. nov., sp. nov. for strain AJ110941 P (FERM BP-11443).
Pandey, Manmohan; Kumar, Ravindra; Srivastava, Prachi; Agarwal, Suyash; Srivastava, Shreya; Nagpure, Naresh S; Jena, Joy K; Kushwaha, Basdeo
2018-03-16
Mining and characterization of Simple Sequence Repeat (SSR) markers from whole genomes provide valuable information about biological significance of SSR distribution and also facilitate development of markers for genetic analysis. Whole genome sequencing (WGS)-SSR Annotation Tool (WGSSAT) is a graphical user interface pipeline developed using Java Netbeans and Perl scripts which facilitates in simplifying the process of SSR mining and characterization. WGSSAT takes input in FASTA format and automates the prediction of genes, noncoding RNA (ncRNA), core genes, repeats and SSRs from whole genomes followed by mapping of the predicted SSRs onto a genome (classified according to genes, ncRNA, repeats, exonic, intronic, and core gene region) along with primer identification and mining of cross-species markers. The program also generates a detailed statistical report along with visualization of mapped SSRs, genes, core genes, and RNAs. The features of WGSSAT were demonstrated using Takifugu rubripes data. This yielded a total of 139 057 SSR, out of which 113 703 SSR primer pairs were uniquely amplified in silico onto a T. rubripes (fugu) genome. Out of 113 703 mined SSRs, 81 463 were from coding region (including 4286 exonic and 77 177 intronic), 7 from RNA, 267 from core genes of fugu, whereas 105 641 SSR and 601 SSR primer pairs were uniquely mapped onto the medaka genome. WGSSAT is tested under Ubuntu Linux. The source code, documentation, user manual, example dataset and scripts are available online at https://sourceforge.net/projects/wgssat-nbfgr.
Poczai, Péter; Hyvönen, Jaakko
2017-01-01
Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots aiding the absorption of water and nutrients directly from the atmosphere and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and present the comparison of genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp), two inverted regions (IRa and IRb, 26,803 bp) and short single copy (SSC, 18,612 bp) region. The plastid genome had 37.2% GC content and 134 genes with 88 being unique protein-coding genes and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early diverging lineages of Poales do not have high substitution rates as compared to grasses, and plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of accD and ycf genes in the Graminid clade. Further structural rearrangements appeared in the graminids lacking in Spanish moss, which include a 28-kb inversion between the trnG-UCC-rps14 region and 6-kb in the trnG-UCC-psbD, followed by a third <1kb inversion in the trnT sequence.
Hyvönen, Jaakko
2017-01-01
Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots aiding the absorption of water and nutrients directly from the atmosphere and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and present the comparison of genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp), two inverted regions (IRa and IRb, 26,803 bp) and short single copy (SSC, 18,612 bp) region. The plastid genome had 37.2% GC content and 134 genes with 88 being unique protein-coding genes and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early diverging lineages of Poales do not have high substitution rates as compared to grasses, and plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of accD and ycf genes in the Graminid clade. Further structural rearrangements appeared in the graminids lacking in Spanish moss, which include a 28-kb inversion between the trnG-UCC–rps14 region and 6-kb in the trnG-UCC–psbD, followed by a third <1kb inversion in the trnT sequence. PMID:29095905
PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.
Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred
2018-01-01
The design of highly diverse phage display libraries is based on assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing randomness and therefore diversity of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences from base to residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage both at the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats, a C++ source code package for compilation and integration into existing bioinformatics pipelines and precompiled binaries for ease of use.
Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases
Assmus, Jens; Kleffe, Jürgen; Schmitt, Armin O.; Brockmann, Gudrun A.
2013-01-01
There is considerable interest in studying sequenced variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the location of insertions and deletions still poses problems. Each insertion and deletion causes a change of sequence. Yet, due to low complexity or repetitive sequence structures, the same indel can sometimes be annotated in different ways. Two indels which differ in allele sequence and position can be one and the same, i.e. the alternative sequence of the whole chromosome is identical in both cases and, therefore, the two deletions are biologically equivalent. In such a case, it is impossible to identify the exact position of an indel merely based on sequence alignment. Thus, variation entries in a mutation database are not necessarily uniquely defined. We prove the existence of a contiguous region around an indel in which all deletions of the same length are biologically identical. Databases often show only one of several possible locations for a given variation. Furthermore, different data base entries can represent equivalent variation events. We identified 1,045,590 such problematic entries of insertions and deletions out of 5,860,408 indel entries in the current human database of Ensembl. Equivalent indels are found in sequence regions of different functions like exons, introns or 5' and 3' UTRs. One and the same variation can be assigned to several different functional classifications of which only one is correct. We implemented an algorithm that determines for each indel database entry its complete set of equivalent indels which is uniquely characterized by the indel itself and a given interval of the reference sequence. PMID:23658777
Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah
2012-01-01
Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.
Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah
2012-01-01
Background Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. Methodology/Principal Findings To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. Conclusions The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple. PMID:23091603
repRNA: a web server for generating various feature vectors of RNA sequences.
Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen
2016-02-01
With the rapid growth of RNA sequences generated in the postgenomic age, it is highly desired to develop a flexible method that can generate various kinds of vectors to represent these sequences by focusing on their different features. This is because nearly all the existing machine-learning methods, such as SVM (support vector machine) and KNN (k-nearest neighbor), can only handle vectors but not sequences. To meet the increasing demands and speed up the genome analyses, we have developed a new web server, called "representations of RNA sequences" (repRNA). Compared with the existing methods, repRNA is much more comprehensive, flexible and powerful, as reflected by the following facts: (1) it can generate 11 different modes of feature vectors for users to choose according to their investigation purposes; (2) it allows users to select the features from 22 built-in physicochemical properties and even those defined by users' own; (3) the resultant feature vectors and the secondary structures of the corresponding RNA sequences can be visualized. The repRNA web server is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repRNA/ .
Impact origin of sediments at the Opportunity landing site on Mars.
Knauth, L Paul; Burt, Donald M; Wohletz, Kenneth H
2005-12-22
Mars Exploration Rover Opportunity discovered sediments with layered structures thought to be unique to aqueous deposition and with minerals attributed to evaporation of an acidic salty sea. Remarkable iron-rich spherules were ascribed to later groundwater alteration, and the inferred abundance of water reinforced optimism that Mars was once habitable. The layered structures, however, are not unique to water deposition, and the scenario encounters difficulties in accounting for highly soluble salts admixed with less soluble salts, the lack of clay minerals from acid-rock reactions, high sphericity and near-uniform sizes of the spherules and the absence of a basin boundary. Here we present a simple alternative explanation involving deposition from a ground-hugging turbulent flow of rock fragments, salts, sulphides, brines and ice produced by meteorite impact. Subsequent weathering by intergranular water films can account for all of the features observed without invoking shallow seas, lakes or near-surface aquifers. Layered sequences observed elsewhere on heavily cratered Mars and attributed to wind, water or volcanism may well have formed similarly. If so, the search for past life on Mars should be reassessed accordingly.
Kilias, Stephanos P.; Nomikou, Paraskevi; Papanikolaou, Dimitrios; Polymenakou, Paraskevi N.; Godelitsas, Athanasios; Argyraki, Ariadne; Carey, Steven; Gamaletsos, Platon; Mertzimekis, Theo J.; Stathopoulou, Eleni; Goettlicher, Joerg; Steininger, Ralph; Betzelou, Konstantina; Livanos, Isidoros; Christakis, Christos; Bell, Katherine Croff; Scoullos, Michael
2013-01-01
We report on integrated geomorphological, mineralogical, geochemical and biological investigations of the hydrothermal vent field located on the floor of the density-stratified acidic (pH ~ 5) crater of the Kolumbo shallow-submarine arc-volcano, near Santorini. Kolumbo features rare geodynamic setting at convergent boundaries, where arc-volcanism and seafloor hydrothermal activity are occurring in thinned continental crust. Special focus is given to unique enrichments of polymetallic spires in Sb and Tl (±Hg, As, Au, Ag, Zn) indicating a new hybrid seafloor analogue of epithermal-to-volcanic-hosted-massive-sulphide deposits. Iron microbial-mat analyses reveal dominating ferrihydrite-type phases, and high-proportion of microbial sequences akin to "Nitrosopumilus maritimus", a mesophilic Thaumarchaeota strain capable of chemoautotrophic growth on hydrothermal ammonia and CO2. Our findings highlight that acidic shallow-submarine hydrothermal vents nourish marine ecosystems in which nitrifying Archaea are important and suggest ferrihydrite-type Fe3+-(hydrated)-oxyhydroxides in associated low-temperature iron mats are formed by anaerobic Fe2+-oxidation, dependent on microbially produced nitrate. PMID:23939372
Mosaic Graphs and Comparative Genomics in Phage Communities
Belcaid, Mahdi; Bergeron, Anne
2010-01-01
Abstract Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breath and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications on phage metagenomics assembly, on the horizontal gene transfer paradigm, and more generally on the understanding of the composition and evolutionary dynamics of virus communities. PMID:20874413
Production of Supra-regular Spatial Sequences by Macaque Monkeys.
Jiang, Xinjian; Long, Tenghai; Cao, Weicong; Li, Junru; Dehaene, Stanislas; Wang, Liping
2018-06-18
Understanding and producing embedded sequences in language, music, or mathematics, is a central characteristic of our species. These domains are hypothesized to involve a human-specific competence for supra-regular grammars, which can generate embedded sequences that go beyond the regular sequences engendered by finite-state automata. However, is this capacity truly unique to humans? Using a production task, we show that macaque monkeys can be trained to produce time-symmetrical embedded spatial sequences whose formal description requires supra-regular grammars or, equivalently, a push-down stack automaton. Monkeys spontaneously generalized the learned grammar to novel sequences, including longer ones, and could generate hierarchical sequences formed by an embedding of two levels of abstract rules. Compared to monkeys, however, preschool children learned the grammars much faster using a chunking strategy. While supra-regular grammars are accessible to nonhuman primates through extensive training, human uniqueness may lie in the speed and learning strategy with which they are acquired. Copyright © 2018 Elsevier Ltd. All rights reserved.
2011-01-01
Background Brachyspira spp. colonize the intestines of some mammalian and avian species and show different degrees of enteropathogenicity. Brachyspira intermedia can cause production losses in chickens and strain PWS/AT now becomes the fourth genome to be completed in the genus Brachyspira. Results 15 classes of unique and shared genes were analyzed in B. intermedia, B. murdochii, B. hyodysenteriae and B. pilosicoli. The largest number of unique genes was found in B. intermedia and B. murdochii. This indicates the presence of larger pan-genomes. In general, hypothetical protein annotations are overrepresented among the unique genes. A 3.2 kb plasmid was found in B. intermedia strain PWS/AT. The plasmid was also present in the B. murdochii strain but not in nine other Brachyspira isolates. Within the Brachyspira genomes, genes had been translocated and also frequently switched between leading and lagging strands, a process that can be followed by different AT-skews in the third positions of synonymous codons. We also found evidence that bacteriophages were being remodeled and genes incorporated into them. Conclusions The accessory gene pool shapes species-specific traits. It is also influenced by reductive genome evolution and horizontal gene transfer. Gene-transfer events can cross both species and genus boundaries and bacteriophages appear to play an important role in this process. A mechanism for horizontal gene transfer appears to be gene translocations leading to remodeling of bacteriophages in combination with broad tropism. PMID:21816042
Liu, Q; Yong, C B; Astell, C R
1994-06-01
Previous characterization of the terminal sequences of the minute virus of mice (MVM) genome demonstrated that the right hand palindrome contains two sequences, each the inverted complement of the other. However, the left hand palindrome was shown to exist as a unique sequence [Astell et al., J. Virol. 54: 179-185 (1985)]. The modified rolling hairpin (MRH) model for MVM replication provided an explanation of how the right hand palindrome could undergo hairpin transfer to generate two sequences, while the left end palindrome within the dimer bridge could undergo asymmetric resolution and retain the unique left end sequence. This report describes in vitro resolution of the wild-type dimer bridge sequence of MVM using recombinant (baculovirus) expressed NS-1 and a replication extract from LA9 cells. The resolution products are consistent with those predicted by the MRH model, providing support for this replication mechanism. In addition, mutant dimer bridge clones were constructed and used in the resolution assay. The mutant structures included removal of the asymmetry in the hairpin stem, inversion of the sequence at the initiating nick site, and a 2-bp deletion within one stem of the dimer bridge. In all cases, the mutant dimer bridge structures are resolved; however, the resolution pattern observed with the mutant dimer bridge compared with the wild-type dimer bridge is shifted toward symmetrical resolution. These results suggest that sequences within the left hand hairpin (and hence dimer bridge sequence) are responsible for asymmetric resolution and conservation of the unique sequence within the left hand palindrome of the MVM genome.
NASA Astrophysics Data System (ADS)
Yang, Hongxin; Su, Fulin
2018-01-01
We propose a moving target analysis algorithm using speeded-up robust features (SURF) and regular moment in inverse synthetic aperture radar (ISAR) image sequences. In our study, we first extract interest points from ISAR image sequences by SURF. Different from traditional feature point extraction methods, SURF-based feature points are invariant to scattering intensity, target rotation, and image size. Then, we employ a bilateral feature registering model to match these feature points. The feature registering scheme can not only search the isotropic feature points to link the image sequences but also reduce the error matching pairs. After that, the target centroid is detected by regular moment. Consequently, a cost function based on correlation coefficient is adopted to analyze the motion information. Experimental results based on simulated and real data validate the effectiveness and practicability of the proposed method.
Unique Variants in OPN1LW Cause Both Syndromic and Nonsyndromic X-Linked High Myopia Mapped to MYP1.
Li, Jiali; Gao, Bei; Guan, Liping; Xiao, Xueshan; Zhang, Jianguo; Li, Shiqiang; Jiang, Hui; Jia, Xiaoyun; Yang, Jianhua; Guo, Xiangming; Yin, Ye; Wang, Jun; Zhang, Qingjiong
2015-06-01
MYP1 is a locus for X-linked syndromic and nonsyndromic high myopia. Recently, unique haplotypes in OPN1LW were found to be responsible for X-linked syndromic high myopia mapped to MYP1. The current study is to test if such variants in OPN1LW are also responsible for X-linked nonsyndromic high myopia mapped to MYP1. The proband of the family previously mapped to MYP1 was initially analyzed using whole-exome sequencing and whole-genome sequencing. Additional probands with early-onset high myopia were analyzed using whole-exome sequencing. Variants in OPN1LW were selected and confirmed by Sanger sequencing. Long-range and second PCR were used to determine the haplotype and the first gene of the red-green gene array. Candidate variants were further validated in family members and controls. The unique LVAVA haplotype in OPN1LW was detected in the family with X-linked nonsyndromic high myopia mapped to MYP1. In addition, this haplotype and a novel frameshift mutation (c.617_620dup, p.Phe208Argfs*51) in OPN1LW were detected in two other families with X-linked high myopia. The unique haplotype cosegregated with high myopia in the two families, with a maximum LOD score of 3.34 and 2.31 at θ = 0. OPN1LW with the variants in these families was the first gene in the red-green gene array and was not present in 247 male controls. Reevaluation of the clinical data in both families with the unique haplotype suggested nonsyndromic high myopia. Our study confirms the findings that unique variants in OPN1LW are responsible for both syndromic and nonsyndromic X-linked high myopia mapped to MYP1.
Moody, Colleen L; Tretyachenko-Ladokhina, Vira; Laue, Thomas M; Senear, Donald F; Cocco, Melanie J
2011-08-09
The cytidine repressor (CytR) is a member of the LacR family of bacterial repressors with distinct functional features. The Escherichia coli CytR regulon comprises nine operons whose palindromic operators vary in both sequence and, most significantly, spacing between the recognition half-sites. This suggests a strong likelihood that protein folding would be coupled to DNA binding as a mechanism to accommodate the variety of different operator architectures to which CytR is targeted. Such coupling is a common feature of sequence-specific DNA-binding proteins, including the LacR family repressors; however, there are no significant structural rearrangements upon DNA binding within the three-helix DNA-binding domains (DBDs) studied to date. We used nuclear magnetic resonance (NMR) spectroscopy to characterize the CytR DBD free in solution and to determine the high-resolution structure of a CytR DBD monomer bound specifically to one DNA half-site of the uridine phosphorylase (udp) operator. We find that the free DBD populates multiple distinct conformations distinguished by up to four sets of NMR peaks per residue. This structural heterogeneity is previously unknown in the LacR family. These stable structures coalesce into a single, more stable udp-bound form that features a three-helix bundle containing a canonical helix-turn-helix motif. However, this structure differs from all other LacR family members whose structures are known with regard to the packing of the helices and consequently their relative orientations. Aspects of CytR activity are unique among repressors; we identify here structural properties that are also distinct and that might underlie the different functional properties. © 2011 American Chemical Society
Rossi, Mari; El-Khechen, Dima; Black, Mary Helen; Farwell Hagman, Kelly D; Tang, Sha; Powis, Zöe
2017-05-01
Exome sequencing has recently been proved to be a successful diagnostic method for complex neurodevelopmental disorders. However, the diagnostic yield of exome sequencing for autism spectrum disorders has not been extensively evaluated in large cohorts to date. We performed diagnostic exome sequencing in a cohort of 163 individuals with autism spectrum disorder (66.3%) or autistic features (33.7%). The diagnostic yield observed in patients in our cohort was 25.8% (42 of 163) for positive or likely positive findings in characterized disease genes, while a candidate genetic etiology was reported for an additional 3.3% (4 of 120) of patients. Among the positive findings in the patients with autism spectrum disorder or autistic features, 61.9% were the result of de novo mutations. Patients presenting with psychiatric conditions or ataxia or paraplegia in addition to autism spectrum disorder or autistic features were significantly more likely to receive positive results compared with patients without these clinical features (95.6% vs 27.1%, P < 0.0001; 83.3% vs 21.2%, P < 0.0001, respectively). The majority of the positive findings were in recently identified autism spectrum disorder genes, supporting the importance of diagnostic exome sequencing for patients with autism spectrum disorder or autistic features as the causative genes might evade traditional sequential or panel testing. These results suggest that diagnostic exome sequencing would be an efficient primary diagnostic method for patients with autism spectrum disorders or autistic features. Moreover, our data may aid clinicians to better determine which subset of patients with autism spectrum disorder with additional clinical features would benefit the most from diagnostic exome sequencing. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
2011-01-01
Background Panax notoginseng (Burk) F.H. Chen is important medicinal plant of the Araliacease family. Triterpene saponins are the bioactive constituents in P. notoginseng. However, available genomic information regarding this plant is limited. Moreover, details of triterpene saponin biosynthesis in the Panax species are largely unknown. Results Using the 454 pyrosequencing technology, a one-quarter GS FLX titanium run resulted in 188,185 reads with an average length of 410 bases for P. notoginseng root. These reads were processed and assembled by 454 GS De Novo Assembler software into 30,852 unique sequences. A total of 70.2% of unique sequences were annotated by Basic Local Alignment Search Tool (BLAST) similarity searches against public sequence databases. The Kyoto Encyclopedia of Genes and Genomes (KEGG) assignment discovered 41 unique sequences representing 11 genes involved in triterpene saponin backbone biosynthesis in the 454-EST dataset. In particular, the transcript encoding dammarenediol synthase (DS), which is the first committed enzyme in the biosynthetic pathway of major triterpene saponins, is highly expressed in the root of four-year-old P. notoginseng. It is worth emphasizing that the candidate cytochrome P450 (Pn02132 and Pn00158) and UDP-glycosyltransferase (Pn00082) gene most likely to be involved in hydroxylation or glycosylation of aglycones for triterpene saponin biosynthesis were discovered from 174 cytochrome P450s and 242 glycosyltransferases by phylogenetic analysis, respectively. Putative transcription factors were detected in 906 unique sequences, including Myb, homeobox, WRKY, basic helix-loop-helix (bHLH), and other family proteins. Additionally, a total of 2,772 simple sequence repeat (SSR) were identified from 2,361 unique sequences, of which, di-nucleotide motifs were the most abundant motif. Conclusion This study is the first to present a large-scale EST dataset for P. notoginseng root acquired by next-generation sequencing (NGS) technology. The candidate genes involved in triterpene saponin biosynthesis, including the putative CYP450s and UGTs, were obtained in this study. Additionally, the identification of SSRs provided plenty of genetic makers for molecular breeding and genetics applications in this species. These data will provide information on gene discovery, transcriptional regulation and marker-assisted selection for P. notoginseng. The dataset establishes an important foundation for the study with the purpose of ensuring adequate drug resources for this species. PMID:22369100
Protein location prediction using atomic composition and global features of the amino acid sequence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.
2010-01-22
Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition is effectivelymore » used with, multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes prediction with an accuracy of 100, 82.47, 88.81 for self-consistency test, jackknife test and independent data test respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from N-terminal alone. Considering the features as a distribution within the entire sequence will bring out underlying property distribution to a greater detail to enhance the prediction accuracy.« less
Tuteja, Reetu; Saxena, Rachit K; Davila, Jaime; Shah, Trushar; Chen, Wenbin; Xiao, Yong-Li; Fan, Guangyi; Saxena, K B; Alverson, Andrew J; Spillane, Charles; Town, Christopher; Varshney, Rajeev K
2013-10-01
The hybrid pigeonpea (Cajanus cajan) breeding technology based on cytoplasmic male sterility (CMS) is currently unique among legumes and displays major potential for yield increase. CMS is defined as a condition in which a plant is unable to produce functional pollen grains. The novel chimeric open reading frames (ORFs) produced as a results of mitochondrial genome rearrangements are considered to be the main cause of CMS. To identify these CMS-related ORFs in pigeonpea, we sequenced the mitochondrial genomes of three C. cajan lines (the male-sterile line ICPA 2039, the maintainer line ICPB 2039, and the hybrid line ICPH 2433) and of the wild relative (Cajanus cajanifolius ICPW 29). A single, circular-mapping molecule of length 545.7 kb was assembled and annotated for the ICPA 2039 line. Sequence annotation predicted 51 genes, including 34 protein-coding and 17 RNA genes. Comparison of the mitochondrial genomes from different Cajanus genotypes identified 31 ORFs, which differ between lines within which CMS is present or absent. Among these chimeric ORFs, 13 were identified by comparison of the related male-sterile and maintainer lines. These ORFs display features that are known to trigger CMS in other plant species and to represent the most promising candidates for CMS-related mitochondrial rearrangements in pigeonpea.
Biomining active cellulases from a mining bioremediation system.
Mewis, Keith; Armstrong, Zachary; Song, Young C; Baldwin, Susan A; Withers, Stephen G; Hallam, Steven J
2013-09-20
Functional metagenomics has emerged as a powerful method for gene model validation and enzyme discovery from natural and human engineered ecosystems. Here we report development of a high-throughput functional metagenomic screen incorporating bioinformatic and biochemical analyses features. A fosmid library containing 6144 clones sourced from a mining bioremediation system was screened for cellulase activity using 2,4-dinitrophenyl β-cellobioside, a previously proven cellulose model substrate. Fifteen active clones were recovered and fully sequenced revealing 9 unique clones with the ability to hydrolyse 1,4-β-D-glucosidic linkages. Transposon mutagenesis identified genes belonging to glycoside hydrolase (GH) 1, 3, or 5 as necessary for mediating this activity. Reference trees for GH 1, 3, and 5 families were generated from sequences in the CAZy database for automated phylogenetic analysis of fosmid end and active clone sequences revealing known and novel cellulase encoding genes. Active cellulase genes recovered in functional screens were subcloned into inducible high copy plasmids, expressed and purified to determine enzymatic properties including thermostability, pH optima, and substrate specificity. The workflow described here provides a general paradigm for recovery and characterization of microbially derived genes and gene products based on genetic logic and contemporary screening technologies developed for model organismal systems. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
Pragmatic turn in biology: From biological molecules to genetic content operators.
Witzany, Guenther
2014-08-26
Erwin Schrödinger's question "What is life?" received the answer for decades of "physics + chemistry". The concepts of Alain Turing and John von Neumann introduced a third term: "information". This led to the understanding of nucleic acid sequences as a natural code. Manfred Eigen adapted the concept of Hammings "sequence space". Similar to Hilbert space, in which every ontological entity could be defined by an unequivocal point in a mathematical axiomatic system, in the abstract "sequence space" concept each point represents a unique syntactic structure and the value of their separation represents their dissimilarity. In this concept molecular features of the genetic code evolve by means of self-organisation of matter. Biological selection determines the fittest types among varieties of replication errors of quasi-species. The quasi-species concept dominated evolution theory for many decades. In contrast to this, recent empirical data on the evolution of DNA and its forerunners, the RNA-world and viruses indicate cooperative agent-based interactions. Group behaviour of quasi-species consortia constitute de novo and arrange available genetic content for adaptational purposes within real-life contexts that determine epigenetic markings. This review focuses on some fundamental changes in biology, discarding its traditional status as a subdiscipline of physics and chemistry.
Aspergillus Section Fumigati Typing by PCR-Restriction Fragment Polymorphism▿
Staab, Janet F.; Balajee, S. Arunmozhi; Marr, Kieren A.
2009-01-01
Recent studies have shown that there are multiple clinically important members of the Aspergillus section Fumigati that are difficult to distinguish on the basis of morphological features (e.g., Aspergillus fumigatus, A. lentulus, and Neosartorya udagawae). Identification of these organisms may be clinically important, as some species vary in their susceptibilities to antifungal agents. In a prior study, we utilized multilocus sequence typing to describe A. lentulus as a species distinct from A. fumigatus. The sequence data show that the gene encoding β-tubulin, benA, has high interspecies variability at intronic regions but is conserved among isolates of the same species. These data were used to develop a PCR-restriction fragment length polymorphism (PCR-RFLP) method that rapidly and accurately distinguishes A. fumigatus, A. lentulus, and N. udagawae, three major species within the section Fumigati that have previously been implicated in disease. Digestion of the benA amplicon with BccI generated unique banding patterns; the results were validated by screening a collection of clinical strains and by in silico analysis of the benA sequences of Aspergillus spp. deposited in the GenBank database. PCR-RFLP of benA is a simple method for the identification of clinically important, similar morphotypes of Aspergillus spp. within the section Fumigati. PMID:19403766
Aspergillus section Fumigati typing by PCR-restriction fragment polymorphism.
Staab, Janet F; Balajee, S Arunmozhi; Marr, Kieren A
2009-07-01
Recent studies have shown that there are multiple clinically important members of the Aspergillus section Fumigati that are difficult to distinguish on the basis of morphological features (e.g., Aspergillus fumigatus, A. lentulus, and Neosartorya udagawae). Identification of these organisms may be clinically important, as some species vary in their susceptibilities to antifungal agents. In a prior study, we utilized multilocus sequence typing to describe A. lentulus as a species distinct from A. fumigatus. The sequence data show that the gene encoding beta-tubulin, benA, has high interspecies variability at intronic regions but is conserved among isolates of the same species. These data were used to develop a PCR-restriction fragment length polymorphism (PCR-RFLP) method that rapidly and accurately distinguishes A. fumigatus, A. lentulus, and N. udagawae, three major species within the section Fumigati that have previously been implicated in disease. Digestion of the benA amplicon with BccI generated unique banding patterns; the results were validated by screening a collection of clinical strains and by in silico analysis of the benA sequences of Aspergillus spp. deposited in the GenBank database. PCR-RFLP of benA is a simple method for the identification of clinically important, similar morphotypes of Aspergillus spp. within the section Fumigati.
Tuteja, Reetu; Saxena, Rachit K.; Davila, Jaime; Shah, Trushar; Chen, Wenbin; Xiao, Yong-Li; Fan, Guangyi; Saxena, K. B.; Alverson, Andrew J.; Spillane, Charles; Town, Christopher; Varshney, Rajeev K.
2013-01-01
The hybrid pigeonpea (Cajanus cajan) breeding technology based on cytoplasmic male sterility (CMS) is currently unique among legumes and displays major potential for yield increase. CMS is defined as a condition in which a plant is unable to produce functional pollen grains. The novel chimeric open reading frames (ORFs) produced as a results of mitochondrial genome rearrangements are considered to be the main cause of CMS. To identify these CMS-related ORFs in pigeonpea, we sequenced the mitochondrial genomes of three C. cajan lines (the male-sterile line ICPA 2039, the maintainer line ICPB 2039, and the hybrid line ICPH 2433) and of the wild relative (Cajanus cajanifolius ICPW 29). A single, circular-mapping molecule of length 545.7 kb was assembled and annotated for the ICPA 2039 line. Sequence annotation predicted 51 genes, including 34 protein-coding and 17 RNA genes. Comparison of the mitochondrial genomes from different Cajanus genotypes identified 31 ORFs, which differ between lines within which CMS is present or absent. Among these chimeric ORFs, 13 were identified by comparison of the related male-sterile and maintainer lines. These ORFs display features that are known to trigger CMS in other plant species and to represent the most promising candidates for CMS-related mitochondrial rearrangements in pigeonpea. PMID:23792890
Extensive Mobilome-Driven Genome Diversification in Mouse Gut-Associated Bacteroides vulgatus mpk
Lange, Anna; Beier, Sina; Steimle, Alex; Autenrieth, Ingo B.; Huson, Daniel H.; Frick, Julia-Stefanie
2016-01-01
Like many other Bacteroides species, Bacteroides vulgatus strain mpk, a mouse fecal isolate which was shown to promote intestinal homeostasis, utilizes a variety of mobile elements for genome evolution. Based on sequences collected by Pacific Biosciences SMRT sequencing technology, we discuss the challenges of assembling and studying a bacterial genome of high plasticity. Additionally, we conducted comparative genomics comparing this commensal strain with the B. vulgatus type strain ATCC 8482 as well as multiple other Bacteroides and Parabacteroides strains to reveal the most important differences and identify the unique features of B. vulgatus mpk. The genome of B. vulgatus mpk harbors a large and diverse set of mobile element proteins compared with other sequenced Bacteroides strains. We found evidence of a number of different horizontal gene transfer events and a genome landscape that has been extensively altered by different mobilization events. A CRISPR/Cas system could be identified that provides a possible mechanism for preventing the integration of invading external DNA. We propose that the high genome plasticity and the introduced genome instabilities of B. vulgatus mpk arising from the various mobilization events might play an important role not only in its adaptation to the challenging intestinal environment in general, but also in its ability to interact with the gut microbiota. PMID:27071651
USDA-ARS?s Scientific Manuscript database
Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high accuracy genetic variants f...
Durack, Juliana; Lynch, Susan V; Nariya, Snehal; Bhakta, Nirav R; Beigelman, Avraham; Castro, Mario; Dyer, Anne-Marie; Israel, Elliot; Kraft, Monica; Martin, Richard J; Mauger, David T; Rosenberg, Sharon R; Sharp-King, Tonya; White, Steven R; Woodruff, Prescott G; Avila, Pedro C; Denlinger, Loren C; Holguin, Fernando; Lazarus, Stephen C; Lugogo, Njira; Moore, Wendy C; Peters, Stephen P; Que, Loretta; Smith, Lewis J; Sorkness, Christine A; Wechsler, Michael E; Wenzel, Sally E; Boushey, Homer A; Huang, Yvonne J
2017-07-01
Compositional differences in the bronchial bacterial microbiota have been associated with asthma, but it remains unclear whether the findings are attributable to asthma, to aeroallergen sensitization, or to inhaled corticosteroid treatment. We sought to compare the bronchial bacterial microbiota in adults with steroid-naive atopic asthma, subjects with atopy but no asthma, and nonatopic healthy control subjects and to determine relationships of the bronchial microbiota to phenotypic features of asthma. Bacterial communities in protected bronchial brushings from 42 atopic asthmatic subjects, 21 subjects with atopy but no asthma, and 21 healthy control subjects were profiled by using 16S rRNA gene sequencing. Bacterial composition and community-level functions inferred from sequence profiles were analyzed for between-group differences. Associations with clinical and inflammatory variables were examined, including markers of type 2-related inflammation and change in airway hyperresponsiveness after 6 weeks of fluticasone treatment. The bronchial microbiome differed significantly among the 3 groups. Asthmatic subjects were uniquely enriched in members of the Haemophilus, Neisseria, Fusobacterium, and Porphyromonas species and the Sphingomonodaceae family and depleted in members of the Mogibacteriaceae family and Lactobacillales order. Asthma-associated differences in predicted bacterial functions included involvement of amino acid and short-chain fatty acid metabolism pathways. Subjects with type 2-high asthma harbored significantly lower bronchial bacterial burden. Distinct changes in specific microbiota members were seen after fluticasone treatment. Steroid responsiveness was linked to differences in baseline compositional and functional features of the bacterial microbiome. Even in subjects with mild steroid-naive asthma, differences in the bronchial microbiome are associated with immunologic and clinical features of the disease. The specific differences identified suggest possible microbiome targets for future approaches to asthma treatment or prevention. Copyright © 2016 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
Kim, Jihun; Kim, Deokhoon; Chun, Sung-Min; Kim, Jiyun; Kim, Tae Won; Park, Inja; Yu, Chang-Sik; Jang, Se Jin
2016-01-01
Early-onset colorectal cancers (EOCRCs) may have biological or genomic features distinct from late-onset CRCs (LOCRCs). Previous studies have mostly focused on the germline predisposition conditions of EOCRCs, but we hypothesized that EOCRCs may have distinct somatic aberrations that accelerate cancer development. To identify the somatic aberrations that accelerate cancer development at an early age, we conducted whole exome sequencing for 28 polyposis-unrelated, microsatellite stable (MSS) EOCRCs with no known germline predisposition conditions. Surprisingly, we found two distinct groups in the context of mutational burden: 6 hypermutated cases with 2325 to 10973 mutations and 22 nonhypermutated cases with 47 to 154 mutations. Further analysis revealed that four of the six hypermutated cases had the same POLE P286R mutation. We validated this finding in 83 MSS EOCRCs and 27 MSS LOCRCs, which revealed that 7.2% of EOCRCs (6/83) had the POLE P286R mutation, which was not found in LOCRCs. Clinicopathologically, EOCRCs with POLE mutations occurred far more frequently in the right colon than in the left colon, affecting men more frequently than women. In summary, we have identified a unique subclass of colon cancer characterized by a hypermutation associated with the POLE mutation. The acquisition of the POLE mutation leading to hypermutation can accelerate cancer development. Clinically, this subset with hypermutation may be susceptible to immune checkpoint blockade. PMID:27612425
A 3D sequence-independent representation of the protein data bank.
Fischer, D; Tsai, C J; Nussinov, R; Wolfson, H
1995-10-01
Here we address the following questions. How many structurally different entries are there in the Protein Data Bank (PDB)? How do the proteins populate the structural universe? To investigate these questions a structurally non-redundant set of representative entries was selected from the PDB. Construction of such a dataset is not trivial: (i) the considerable size of the PDB requires a large number of comparisons (there were more than 3250 structures of protein chains available in May 1994); (ii) the PDB is highly redundant, containing many structurally similar entries, not necessarily with significant sequence homology, and (iii) there is no clear-cut definition of structural similarity. The latter depend on the criteria and methods used. Here, we analyze structural similarity ignoring protein topology. To date, representative sets have been selected either by hand, by sequence comparison techniques which ignore the three-dimensional (3D) structures of the proteins or by using sequence comparisons followed by linear structural comparison (i.e. the topology, or the sequential order of the chains, is enforced in the structural comparison). Here we describe a 3D sequence-independent automated and efficient method to obtain a representative set of protein molecules from the PDB which contains all unique structures and which is structurally non-redundant. The method has two novel features. The first is the use of strictly structural criteria in the selection process without taking into account the sequence information. To this end we employ a fast structural comparison algorithm which requires on average approximately 2 s per pairwise comparison on a workstation. The second novel feature is the iterative application of a heuristic clustering algorithm that greatly reduces the number of comparisons required. We obtain a representative set of 220 chains with resolution better than 3.0 A, or 268 chains including lower resolution entries, NMR entries and models. The resulting set can serve as a basis for extensive structural classification and studies of 3D recurring motifs and of sequence-structure relationships. The clustering algorithm succeeds in classifying into the same structural family chains with no significant sequence homology, e.g. all the globins in one single group, all the trypsin-like serine proteases in another or all the immunoglobulin-like folds into a third. In addition, unexpected structural similarities of interest have been automatically detected between pairs of chains. A cluster analysis of the representative structures demonstrates the way the "structural universe' is populated.
Using the self-select paradigm to delineate the nature of speech motor programming.
Wright, David L; Robin, Don A; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H; Fox, Peter T
2009-06-01
The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial order demands of longer sequences. A modified reaction time paradigm was used to assess INT and SEQ demands. Specifically, syllable complexity was dependent on syllable structure, whereas sequence complexity involved either repeated or unique syllabi within an utterance. INT execution was slowed when articulating single syllables in the form CCCV compared to simpler CV syllables. Planning unique syllables within a multisyllabic utterance rather than repetitions of the same syllable slowed INT but not SEQ. The INT speech motor programming process, important for mental syllabary access, is sensitive to changes in both syllable structure and the number of unique syllables in an utterance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weinzierl, Marion; Yeates, Anthony R.; Mackay, Duncan H.
2016-05-20
In this paper, we develop a new technique for driving global non-potential simulations of the Sun’s coronal magnetic field solely from sequences of radial magnetic maps of the solar photosphere. A primary challenge to driving such global simulations is that the required horizontal electric field cannot be uniquely determined from such maps. We show that an “inductive” electric field solution similar to that used by previous authors successfully reproduces specific features of the coronal field evolution in both single and multiple bipole simulations. For these cases, the true solution is known because the electric field was generated from a surfacemore » flux-transport model. The match for these cases is further improved by including the non-inductive electric field contribution from surface differential rotation. Then, using this reconstruction method for the electric field, we show that a coronal non-potential simulation can be successfully driven from a sequence of ADAPT maps of the photospheric radial field, without including additional physical observations which are not routinely available.« less
Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki
2010-01-01
A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total 612 kb of the genome comprise group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057
Improved microarray methods for profiling the yeast knockout strain collection
Yuan, Daniel S.; Pan, Xuewen; Ooi, Siew Loon; Peyser, Brian D.; Spencer, Forrest A.; Irizarry, Rafael A.; Boeke, Jef D.
2005-01-01
A remarkable feature of the Yeast Knockout strain collection is the presence of two unique 20mer TAG sequences in almost every strain. In principle, the relative abundances of strains in a complex mixture can be profiled swiftly and quantitatively by amplifying these sequences and hybridizing them to microarrays, but TAG microarrays have not been widely used. Here, we introduce a TAG microarray design with sophisticated controls and describe a robust method for hybridizing high concentrations of dye-labeled TAGs in single-stranded form. We also highlight the importance of avoiding PCR contamination and provide procedures for detection and eradication. Validation experiments using these methods yielded false positive (FP) and false negative (FN) rates for individual TAG detection of 3–6% and 15–18%, respectively. Analysis demonstrated that cross-hybridization was the chief source of FPs, while TAG amplification defects were the main cause of FNs. The materials, protocols, data and associated software described here comprise a suite of experimental resources that should facilitate the use of TAG microarrays for a wide variety of genetic screens. PMID:15994458
ROTATING STARS FROM KEPLER OBSERVED WITH GAIA DR1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davenport, James R. A.
2017-01-20
Astrometric data from the recent Gaia Data Release 1 have been matched against the sample of stars from Kepler with known rotation periods. A total of 1299 bright rotating stars were recovered from the subset of Gaia sources with good astrometric solutions, most with temperatures above 5000 K. From these sources, 894 were selected as lying near the main sequence using their absolute G -band magnitudes. These main-sequence stars show a bimodality in their rotation period distribution, centered roughly around a 600 Myr rotation isochrone. This feature matches the bimodal period distribution found in cooler stars with Kepler , butmore » was previously undetected for solar-type stars due to sample contamination by subgiants. A tenuous connection between the rotation period and total proper motion is found, suggesting that the period bimodality is due to the age distribution of stars within ∼300 pc of the Sun, rather than a phase of rapid angular momentum loss. This work emphasizes the unique power for understanding stellar populations that is created by combining temporal monitoring from Kepler with astrometric data from Gaia .« less
Improved maize reference genome with single-molecule technologies.
Jiao, Yinping; Peluso, Paul; Shi, Jinghua; Liang, Tiffany; Stitzer, Michelle C; Wang, Bo; Campbell, Michael S; Stein, Joshua C; Wei, Xuehong; Chin, Chen-Shan; Guill, Katherine; Regulski, Michael; Kumari, Sunita; Olson, Andrew; Gent, Jonathan; Schneider, Kevin L; Wolfgruber, Thomas K; May, Michael R; Springer, Nathan M; Antoniou, Eric; McCombie, W Richard; Presting, Gernot G; McMullen, Michael; Ross-Ibarra, Jeffrey; Dawe, R Kelly; Hastie, Alex; Rank, David R; Ware, Doreen
2017-06-22
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
NASA Technical Reports Server (NTRS)
Lanyi, J. K.
1986-01-01
The archaebacteria occupy a unique place in phylogenetic trees constructed from analyses of sequences from key informational macromolecules, and their study continues to yield interesting ideas on the early evolution and divergence of biological forms. It is now known that the halobacteria among these species contain various retinal-proteins, resembling eukaryotic rhodopsins, but with different functions. Two of these pigments, located in the cytoplasmic membranes of the bacteria, are bacteriorhodopsin (a light-driven proton pump) and halorhodopsin (a light-driven chloride pump). Comparison of these systems is expected to reveal structure/function relationships in these simple (primitive?) energy transducing membrane components and evolutionary relationships which had produced the structural features which allow the divergent functions. Findings indicate that very different primary structures are needed for these proteins to accomplish their different functions. Indeed, analysis of partial amino acid sequences from halo-opsin shows already that few if any long segments exist which are homologous to bacterio-opsin. Either these proteins diverged a very long time ago to allow for the observed differences, or the evolutionary clock in the halobacteria runs faster than usual.
A tick-borne segmented RNA virus contains genome segments derived from unsegmented viral ancestors
Qin, Xin-Cheng; Shi, Mang; Tian, Jun-Hua; Lin, Xian-Dan; Gao, Dong-Ya; He, Jin-Rong; Wang, Jian-Bo; Li, Ci-Xiu; Kang, Yan-Jun; Yu, Bin; Zhou, Dun-Jin; Xu, Jianguo; Plyusnin, Alexander; Holmes, Edward C.; Zhang, Yong-Zhen
2014-01-01
Although segmented and unsegmented RNA viruses are commonplace, the evolutionary links between these two very different forms of genome organization are unclear. We report the discovery and characterization of a tick-borne virus—Jingmen tick virus (JMTV)—that reveals an unexpected connection between segmented and unsegmented RNA viruses. The JMTV genome comprises four segments, two of which are related to the nonstructural protein genes of the genus Flavivirus (family Flaviviridae), whereas the remaining segments are unique to this virus, have no known homologs, and contain a number of features indicative of structural protein genes. Remarkably, homology searching revealed that sequences related to JMTV were present in the cDNA library from Toxocara canis (dog roundworm; Nematoda), and that shared strong sequence and structural resemblances. Epidemiological studies showed that JMTV is distributed in tick populations across China, especially Rhipicephalus and Haemaphysalis spp., and experiences frequent host-switching and genomic reassortment. To our knowledge, JMTV is the first example of a segmented RNA virus with a genome derived in part from unsegmented viral ancestors. PMID:24753611
PopHuman: the human population genomics browser
Mulet, Roger; Villegas-Mirón, Pablo; Hervas, Sergi; Sanz, Esteve; Velasco, Daniel; Bertranpetit, Jaume; Laayouni, Hafid
2018-01-01
Abstract The 1000 Genomes Project (1000GP) represents the most comprehensive world-wide nucleotide variation data set so far in humans, providing the sequencing and analysis of 2504 genomes from 26 populations and reporting >84 million variants. The availability of this sequence data provides the human lineage with an invaluable resource for population genomics studies, allowing the testing of molecular population genetics hypotheses and eventually the understanding of the evolutionary dynamics of genetic variation in human populations. Here we present PopHuman, a new population genomics-oriented genome browser based on JBrowse that allows the interactive visualization and retrieval of an extensive inventory of population genetics metrics. Efficient and reliable parameter estimates have been computed using a novel pipeline that faces the unique features and limitations of the 1000GP data, and include a battery of nucleotide variation measures, divergence and linkage disequilibrium parameters, as well as different tests of neutrality, estimated in non-overlapping windows along the chromosomes and in annotated genes for all 26 populations of the 1000GP. PopHuman is open and freely available at http://pophuman.uab.cat. PMID:29059408
He, Qiye; Johnston, Jeff; Zeitlinger, Julia
2014-01-01
Understanding how eukaryotic enhancers are bound and regulated by specific combinations of transcription factors is still a major challenge. To better map transcription factor binding genome-wide at nucleotide resolution in vivo, we have developed a robust ChIP-exo protocol called ChIP experiments with nucleotide resolution through exonuclease, unique barcode and single ligation (ChIP-nexus), which utilizes an efficient DNA self-circularization step during library preparation. Application of ChIP-nexus to four proteins—human TBP and Drosophila NFkB, Twist and Max— demonstrates that it outperforms existing ChIP protocols in resolution and specificity, pinpoints relevant binding sites within enhancers containing multiple binding motifs and allows the analysis of in vivo binding specificities. Notably, we show that Max frequently interacts with DNA sequences next to its motif, and that this binding pattern correlates with local DNA sequence features such as DNA shape. ChIP-nexus will be broadly applicable to studying in vivo transcription factor binding specificity and its relationship to cis-regulatory changes in humans and model organisms. PMID:25751057
Katoh, Hiroshi; Miyata, Shin-ichi; Inoue, Hiromitsu; Iwanami, Toru
2014-01-01
Citrus greening (huanglongbing) is the most destructive disease of citrus worldwide. It is spread by citrus psyllids and is associated with phloem-limited bacteria of three species of α-Proteobacteria, namely, ‘Candidatus Liberibacter asiaticus’, ‘Ca. L. americanus’, and ‘Ca. L. africanus’. Recent findings suggested that some Japanese strains lack the bacteriophage-type DNA polymerase region (DNA pol), in contrast to the Floridian psy62 strain. The whole genome sequence of the pol-negative ‘Ca. L. asiaticus’ Japanese isolate Ishi-1 was determined by metagenomic analysis of DNA extracted from ‘Ca. L. asiaticus’-infected psyllids and leaf midribs. The 1.19-Mb genome has an average 36.32% GC content. Annotation revealed 13 operons encoding rRNA and 44 tRNA genes, but no typical bacterial pathogenesis-related genes were located within the genome, similar to the Floridian psy62 and Chinese gxpsy. In contrast to other ‘Ca. L. asiaticus’ strains, the genome of the Japanese Ishi-1 strain lacks a prophage-related region. PMID:25180586
n-CoDeR concept: unique types of antibodies for diagnostic use and therapy.
Carlsson, R; Söderlind, E
2001-05-01
The n-CoDeR recombinant antibody gene libraries are built on a single master framework, into which diverse in vivo-formed complementarity determining regions (CDRs) are allowed to recombine. These CDRs are sampled from in vivo-processed and proof-read gene sequences, thus ensuring an optimal level of correctly folded and functional molecules. By the modularized assembly process, up to six CDRs can be varied at the same time, providing a possibility for the creation of a hitherto undescribed genetic and functional variation. The n-CoDeR antibody gene libraries can be used to select highly specific, human antibody fragments with specificities to virtually any antigen, including carbohydrates and human self-proteins and with affinities down into the subnanomolar range. Furthermore, combining CDRs sampled from in vivo-processed sequences into a single framework result in molecules exhibiting a lower immunogenicity compared to normal human immunoglobulins, as determined by computer analyses. The distinguished features of the n-CoDeR libraries in the therapeutic and diagnostic areas are discussed.
Dosmann, Michael; Groover, Andrew
2012-01-01
Living botanical collections include germplasm repositories, long-term experimental plantings, and botanical gardens. We present here a series of vignettes to illustrate the central role that living collections have played in plant biology research, including evo-devo research. Looking toward the future, living collections will become increasingly important in support of future evo-devo research. The driving force behind this trend is nucleic acid sequencing technologies, which are rapidly becoming more powerful and cost-effective, and which can be applied to virtually any species. This allows for more extensive sampling, including non-model organisms with unique biological features and plants from diverse phylogenetic positions. Importantly, a major challenge for sequencing-based evo-devo research is to identify, access, and propagate appropriate plant materials. We use a vignette of the ongoing 1,000 Transcriptomes project as an example of the challenges faced by such projects. We conclude by identifying some of the pinch points likely to be encountered by future evo-devo researchers, and how living collections can help address them. PMID:22737158
Methylotrophic Methylobacterium Bacteria Nodulate and Fix Nitrogen in Symbiosis with Legumes
Sy, Abdoulaye; Giraud, Eric; Jourand, Philippe; Garcia, Nelly; Willems, Anne; de Lajudie, Philippe; Prin, Yves; Neyra, Marc; Gillis, Monique; Boivin-Masson, Catherine; Dreyfus, Bernard
2001-01-01
Rhizobia described so far belong to three distinct phylogenetic branches within the α-2 subclass of Proteobacteria. Here we report the discovery of a fourth rhizobial branch involving bacteria of the Methylobacterium genus. Rhizobia isolated from Crotalaria legumes were assigned to a new species, “Methylobacterium nodulans,” within the Methylobacterium genus on the basis of 16S ribosomal DNA analyses. We demonstrated that these rhizobia facultatively grow on methanol, which is a characteristic of Methylobacterium spp. but a unique feature among rhizobia. Genes encoding two key enzymes of methylotrophy and nodulation, the mxaF gene, encoding the α subunit of the methanol dehydrogenase, and the nodA gene, encoding an acyltransferase involved in Nod factor biosynthesis, were sequenced for the type strain, ORS2060. Plant tests and nodA amplification assays showed that “M. nodulans” is the only nodulating Methylobacterium sp. identified so far. Phylogenetic sequence analysis showed that “M. nodulans” NodA is closely related to Bradyrhizobium NodA, suggesting that this gene was acquired by horizontal gene transfer. PMID:11114919
Xu, Tingting; Zhou, Cong-Zhao; Xiao, Jianxi; Liu, Jinsong
2018-02-20
Naturally occurring interruptions in nonfibrillar collagen play key roles in molecular flexibility, collagen degradation, and ligand binding. The structural feature of the interruption sequences and the molecular basis for their functions have not been well studied. Here, we focused on a G5G type natural interruption sequence G-POALO-G from human type XIX collagen, a homotrimer collagen, as this sequence possesses distinct properties compared with those of a pathological similar Gly mutation sequence in collagen mimic peptides. We determined the crystal structures of the host-guest peptide (GPO) 3 -GPOALO-(GPO) 4 to 1.03 Å resolution in two crystal forms. In these structures, the interruption zone brings localized disruptions to the triple helix and introduces a light 6-8° bend with the same directional preference to the whole molecule, which may correspond structurally to the first physiological kink site in type XIX collagen. Furthermore, at the G5G interruption site, the presence of Ala and Leu residues, both with free N-H groups, allows the formation of more direct and water-mediated interchain hydrogen bonds than in the related Gly → Ala structure. These could partly explain the difference in thermal stability between the different interruptions. In addition, our structures provide a detailed view of the dynamic property of such an interrupted zone with respect to hydrogen bonding topology, torsion angles, and helical parameters. Our results, for the first time, also identified the binding of zinc to the end of the triple helix. These findings will shed light on how the interruption sequence influences the conformation of the collagen molecule and provide a structural basis for further functional studies.
Wilson, Benjamin; Smith, Kenny; Petkov, Christopher I
2015-03-01
Artificial grammars (AG) can be used to generate rule-based sequences of stimuli. Some of these can be used to investigate sequence-processing computations in non-human animals that might be related to, but not unique to, human language. Previous AG learning studies in non-human animals have used different AGs to separately test for specific sequence-processing abilities. However, given that natural language and certain animal communication systems (in particular, song) have multiple levels of complexity, mixed-complexity AGs are needed to simultaneously evaluate sensitivity to the different features of the AG. Here, we tested humans and Rhesus macaques using a mixed-complexity auditory AG, containing both adjacent (local) and non-adjacent (longer-distance) relationships. Following exposure to exemplary sequences generated by the AG, humans and macaques were individually tested with sequences that were either consistent with the AG or violated specific adjacent or non-adjacent relationships. We observed a considerable level of cross-species correspondence in the sensitivity of both humans and macaques to the adjacent AG relationships and to the statistical properties of the sequences. We found no significant sensitivity to the non-adjacent AG relationships in the macaques. A subset of humans was sensitive to this non-adjacent relationship, revealing interesting between- and within-species differences in AG learning strategies. The results suggest that humans and macaques are largely comparably sensitive to the adjacent AG relationships and their statistical properties. However, in the presence of multiple cues to grammaticality, the non-adjacent relationships are less salient to the macaques and many of the humans. © 2015 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Vázquez, Martín; Ben-Dov, Claudia; Lorenzi, Hernan; Moore, Troy; Schijman, Alejandro; Levin, Mariano J.
2000-01-01
The short interspersed repetitive element (SIRE) of Trypanosoma cruzi was first detected when comparing the sequences of loci that encode the TcP2β genes. It is present in about 1,500–3,000 copies per genome, depending on the strain, and it is distributed in all chromosomes. An initial analysis of SIRE sequences from 21 genomic fragments allowed us to derive a consensus nucleotide sequence and structure for the element, consisting of three regions (I, II, and III) each harboring distinctive features. Analysis of 158 transcribed SIREs demonstrates that the consensus is highly conserved. The sequences of 51 cDNAs show that SIRE is included in the 3′ end of several mRNAs, always transcribed from the sense strand, contributing the polyadenylation site in 63% of the cases. This study led to the characterization of VIPER (vestigial interposed retroelement), a 2,326-bp-long unusual retroelement. VIPER's 5′ end is formed by the first 182 bp of SIRE, whereas its 3′ end is formed by the last 220 bp of the element. Both SIRE moieties are connected by a 1,924-bp-long fragment that carries a unique ORF encoding a complete reverse transcriptase-RNase H gene whose 15 C-terminal amino acids derive from codons specified by SIRE's region II. The amino acid sequence of VIPER's reverse transcriptase-RNase H shares significant homology to that of long terminal repeat retrotransposons. The fact that SIRE and VIPER sequences are found only in the T. cruzi genome may be of relevance for studies concerning the evolution and the genome flexibility of this protozoan parasite. PMID:10688909
Madsen, James A.; Xu, Hua; Robinson, Michelle R.; Horton, Andrew P.; Shaw, Jared B.; Giles, David K.; Kaoud, Tamer S.; Dalby, Kevin N.; Trent, M. Stephen; Brodbelt, Jennifer S.
2013-01-01
The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of peptide cations, and when these methods were combined into a single search, an increase of up to 13% sequence coverage was observed for the kinases. The ability to sequence peptide anions and cations in alternating scans in the same chromatographic run was also demonstrated. Because ETD has a significant bias toward identifying highly basic peptides, negative UVPD was used to improve the identification of the more acidic peptides in conjunction with positive ETD for the more basic species. In this case, tryptic peptides from the cytosolic section of HeLa cells were analyzed by polarity switching nanoLC-MS/MS utilizing ETD for cation sequencing and UVPD for anion sequencing. Relative to searching using ETD alone, positive/negative polarity switching significantly improved sequence coverages across identified proteins, resulting in a 33% increase in unique peptide identifications and more than twice the number of peptide spectral matches. PMID:23695934
The DNA Methylome of Human Peripheral Blood Mononuclear Cells
Ye, Mingzhi; Zheng, Hancheng; Yu, Jian; Wu, Honglong; Sun, Jihua; Zhang, Hongyu; Chen, Quan; Luo, Ruibang; Chen, Minfeng; He, Yinghua; Jin, Xin; Zhang, Qinghui; Yu, Chang; Zhou, Guangyu; Sun, Jinfeng; Huang, Yebo; Zheng, Huisong; Cao, Hongzhi; Zhou, Xiaoyu; Guo, Shicheng; Hu, Xueda; Li, Xin; Kristiansen, Karsten; Bolund, Lars; Xu, Jiujin; Wang, Wen; Yang, Huanming; Wang, Jian; Li, Ruiqiang; Beck, Stephan; Wang, Jun; Zhang, Xiuqing
2010-01-01
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies. PMID:21085693
Song, B K; Pan, M Z; Lau, Y L; Wan, K L
2014-07-29
Commercial flocks infected by Eimeria species parasites, including Eimeria maxima, have an increased risk of developing clinical or subclinical coccidiosis; an intestinal enteritis associated with increased mortality rates in poultry. Currently, infection control is largely based on chemotherapy or live vaccines; however, drug resistance is common and vaccines are relatively expensive. The development of new cost-effective intervention measures will benefit from unraveling the complex genetic mechanisms that underlie host-parasite interactions, including the identification and characterization of genes encoding proteins such as phosphatidylinositol 4-phosphate 5-kinase (PIP5K). We previously identified a PIP5K coding sequence within the E. maxima genome. In this study, we analyzed two bacterial artificial chromosome clones presenting a ~145-kb E. maxima (Weybridge strain) genomic region spanning the PIP5K gene locus. Sequence analysis revealed that ~95% of the simple sequence repeats detected were located within regions comparable to the previously described feature-rich segments of the Eimeria tenella genome. Comparative sequence analysis with the orthologous E. maxima (Houghton strain) region revealed a moderate level of conserved synteny. Unique segmental organizations and telomere-like repeats were also observed in both genomes. A number of incomplete transposable elements were detected and further scrutiny of these elements in both orthologous segments revealed interesting nesting events, which may play a role in facilitating genome plasticity in E. maxima. The current analysis provides more detailed information about the genome organization of E. maxima and may help to reveal genotypic differences that are important for expression of traits related to pathogenicity and virulence.
Suzuki, Masaharu; Ketterling, Matthew G; McCarty, Donald R
2005-09-01
We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
Frequency of the first feature in action sequences influences feature binding.
Mattson, Paul S; Fournier, Lisa R; Behmer, Lawrence P
2012-10-01
We investigated whether binding among perception and action feature codes is a preliminary step toward creating a more durable memory trace of an action event. If so, increasing the frequency of a particular event (e.g., a stimulus requiring a movement with the left or right hand in an up or down direction) should increase the strength and speed of feature binding for this event. The results from two experiments, using a partial-repetition paradigm, confirmed that feature binding increased in strength and/or occurred earlier for a high-frequency (e.g., left hand moving up) than for a low-frequency (e.g., right hand moving down) event. Moreover, increasing the frequency of the first-specified feature in the action sequence alone (e.g., "left" hand) increased the strength and/or speed of action feature binding (e.g., between the "left" hand and movement in an "up" or "down" direction). The latter finding suggests an update to the theory of event coding, as not all features in the action sequence equally determine binding strength. We conclude that action planning involves serial binding of features in the order of action feature execution (i.e., associations among features are not bidirectional but are directional), which can lead to a more durable memory trace. This is consistent with physiological evidence suggesting that serial order is preserved in an action plan executed from memory and that the first feature in the action sequence may be critical in preserving this serial order.
Visualization of protein sequence features using JavaScript and SVG with pViz.js.
Mukhyala, Kiran; Masselot, Alexandre
2014-12-01
pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on github at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Adhikari, Badri; Hou, Jie; Cheng, Jianlin
2018-03-01
In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
Evaluation of microbial community in hydrothermal field by direct DNA sequencing
NASA Astrophysics Data System (ADS)
Kawarabayasi, Y.; Maruyama, A.
2002-12-01
Many extremophiles have been discovered from terrestrial and marine hydrothermal fields. Some thermophiles can grow beyond 90°C in culture, while direct microscopic analysis occasionally indicates that microbes may survive in much hotter hydrothermal fluids. However, it is very difficult to isolate and cultivate such microbes from the environments, i.e., over 99% of total microbes remains undiscovered. Based on experiences of entire microbial genome analysis (Y.K.) and microbial community analysis (A.M.), we started to find out unique microbes/genes in hydrothermal fields through direct sequencing of environmental DNA fragments. At first, shotgun plasmid libraries were directly constructed with the DNA molecules prepared from mixed microbes collected by an in situ filtration system from low-temperature fluids at RM24 in the Southern East Pacific Rise (S-EPR). A gene amplification (PCR) technique was not used for preventing mutation in the process. The nucleotide sequences of 285 clones indicated that no sequence had identical data in public databases. Among 27 clones determined entire sequences, no ORF was identified on 14 clones like intron in Eukaryote. On four clones, tetra-nucleotide-long multiple tandem repetitive sequences were identified. This type of sequence was identified in some familiar disease in human. The result indicates that living/dead materials with eukaryotic features may exist in this low temperature field. Secondly, shotgun plasmid libraries were constructed from the environmental DNA prepared from Beppu hot springs. In randomly-selected 143 clones used for sequencing, no known sequence was identified. Unlike the clones in S-EPR library, clear ORFs were identified on all nine clones determined the entire sequence. It was found that one clone, H4052, contained the complete Aspartyl-tRNA synthetase. Phylogenetic analysis using amino acid sequences of this gene indicated that this gene was separated from other Euryarchaea before the differentiation of species. Thus, some novel archaeal species are expected to be in this field. The present direct cloning and sequencing technique is now opening a window to the new world in hydrothermal microbial community analysis.
BioSAVE: display of scored annotation within a sequence context.
Pollock, Richard F; Adryan, Boris
2008-03-20
Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for visual display of scored features (annotation) within a sequence context. The program reads sequence and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow for live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Possible applications of the program include display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences and many more. The dynamic visualisation of these annotations is useful, e.g., for the determination of cutoff values of predicted features to match experimental data. Program, source code and exemplary files are freely available at the BioSAVE homepage.
BioSAVE: Display of scored annotation within a sequence context
Pollock, Richard F; Adryan, Boris
2008-01-01
Background Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. Results We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for visual display of scored features (annotation) within a sequence context. The program reads sequence and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow for live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Conclusion Possible applications of the program include display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences and many more. The dynamic visualisation of these annotations is useful, e.g., for the determination of cutoff values of predicted features to match experimental data. Program, source code and exemplary files are freely available at the BioSAVE homepage. PMID:18366701
Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries
Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.
2012-01-01
Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and trouble shooting approaches we implemented to optimize sequencing of multiplexed libraries on an a RRBS background. PMID:23193365
Novel application of the MSSCP method in biodiversity studies.
Tomczyk-Żak, Karolina; Kaczanowski, Szymon; Górecka, Magdalena; Zielenkiewicz, Urszula
2012-02-01
Analysis of 16S rRNA sequence diversity is widely performed for characterizing the biodiversity of microbial samples. The number of determined sequences has a considerable impact on complete results. Although the cost of mass sequencing is decreasing, it is often still too high for individual projects. We applied the multi-temperature single-strand conformational polymorphism (MSSCP) method to decrease the number of analysed sequences. This was a novel application of this method. As a control, the same sample was analysed using random sequencing. In this paper, we adapted the MSSCP technique for screening of unique sequences of the 16S rRNA gene library and bacterial strains isolated from biofilms growing on the walls of an ancient gold mine in Poland and determined whether the results obtained by both methods differed and whether random sequencing could be replaced by MSSCP. Although it was biased towards the detection of rare sequences in the samples, the qualitative results of MSSCP were not different than those of random sequencing. Unambiguous discrimination of unique clones and strains creates an opportunity to effectively estimate the biodiversity of natural communities, especially in populations which are numerous but species poor. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Effective Feature Selection for Classification of Promoter Sequences.
K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish
2016-01-01
Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I
2017-01-01
A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffer when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T
2017-10-01
Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.
Identification Of Minangkabau Landscape Characters
NASA Astrophysics Data System (ADS)
Asrina, M.; Gunawan, A.; Aris, Munandar
2017-10-01
Minangkabau is one of cultures in indonesia which occupies landscape intact. Landscape of Minangkabau have a very close relationship with the culture of the people. Uniqueness of Minangkabau culture and landscape forming an inseparable characterunity. The landscape is necessarily identified to know the inherent landscape characters. The objective of this study was to identify the character of the Minangkabau landscape characterizes its uniqueness. The study was conducted by using descriptive method comprised literature review and field observasion. Observed the landscape characters comprised two main features, they were major and minor features. Indetification of the features was conducted in two original areas (darek) of the Minangkabau traditional society. The research results showed that major features or natural features of the landscape were predominantly landform, landcover, and hidrology. All luhak (districts) of Minangkabau showed similar main features such as hill, canyon, lake, valley, and forest. The existence of natural features such as hills, canyon and valleys characterizes the nature of minangkabau landscape. Minor features formed by Minangkabau cultural society were agricultural land and settlement. Rumah gadang (big house) is one of famous minor features characterizes the Minangkabau culture. In addition, several historical artefacts of building and others structure may strengthen uniqueness of the Minangkabau landscape character, such as The royal palace, inscription, and tunnels.
Life in hot acid: Pathway analyses in extremely thermoacidophilic archaea
Auernik, Kathryne S.; Cooper, Charlotte R.; Kelly, Robert M.
2013-01-01
SUMMARY The extremely thermoacidophilic archaea are a particularly intriguing group of microorganisms that must simultaneously cope with biologically extreme pHs (≤ 4) and temperatures (Topt ≥ 60°C) in their natural environments. Their expandi ng biotechnological significance relates to their role in biomining of base and precious metals and their unique mechanisms of survival in hot acid, at both the cellular and biomolecular levels. Recent developments, such as advances in understanding of heavy metal tolerance mechanisms, implementation of a genetic system, and discovery of a new carbon fixation pathway, have been facilitated by availability of genome sequence data and molecular genetic systems. As a result, new insights into the metabolic pathways and physiological features that define extreme thermoacidophily have been obtained, in some cases suggesting prospects for biotechnological opportunities. PMID:18760359
A revision of the subtract-with-borrow random number generators
NASA Astrophysics Data System (ADS)
Sibidanov, Alexei
2017-12-01
The most popular and widely used subtract-with-borrow generator, also known as RANLUX, is reimplemented as a linear congruential generator using large integer arithmetic with the modulus size of 576 bits. Modern computers, as well as the specific structure of the modulus inferred from RANLUX, allow for the development of a fast modular multiplication - the core of the procedure. This was previously believed to be slow and have too high cost in terms of computing resources. Our tests show a significant gain in generation speed which is comparable with other fast, high quality random number generators. An additional feature is the fast skipping of generator states leading to a seeding scheme which guarantees the uniqueness of random number sequences. Licensing provisions: GPLv3 Programming language: C++, C, Assembler
Poltev, V; Anisimov, V M; Dominguez, V; Gonzalez, E; Deriabina, A; Garcia, D; Rivas, F; Polteva, N A
2018-02-01
Deciphering the mechanism of functioning of DNA as the carrier of genetic information requires identifying inherent factors determining its structure and function. Following this path, our previous DFT studies attributed the origin of unique conformational characteristics of right-handed Watson-Crick duplexes (WCDs) to the conformational profile of deoxydinucleoside monophosphates (dDMPs) serving as the minimal repeating units of DNA strand. According to those findings, the directionality of the sugar-phosphate chain and the characteristic ranges of dihedral angles of energy minima combined with the geometric differences between purines and pyrimidines determine the dependence on base sequence of the three-dimensional (3D) structure of WCDs. This work extends our computational study to complementary deoxydinucleotide-monophosphates (cdDMPs) of non-standard conformation, including those of Z-family, Hoogsteen duplexes, parallel-stranded structures, and duplexes with mispaired bases. For most of these systems, except Z-conformation, computations closely reproduce experimental data within the tolerance of characteristic limits of dihedral parameters for each conformation family. Computation of cdDMPs with Z-conformation reveals that their experimental structures do not correspond to the internal energy minimum. This finding establishes the leading role of external factors in formation of the Z-conformation. Energy minima of cdDMPs of non-Watson-Crick duplexes demonstrate different sequence-dependence features than those known for WCDs. The obtained results provide evidence that the biologically important regularities of 3D structure distinguish WCDs from duplexes having non-Watson-Crick nucleotide pairing.
Steckelberg, Anna-Lena; Akiyama, Benjamin M; Costantino, David A; Sit, Tim L; Nix, Jay C; Kieft, Jeffrey S
2018-06-19
Folded RNA elements that block processive 5' → 3' cellular exoribonucleases (xrRNAs) to produce biologically active viral noncoding RNAs have been discovered in flaviviruses, potentially revealing a new mode of RNA maturation. However, whether this RNA structure-dependent mechanism exists elsewhere and, if so, whether a singular RNA fold is required, have been unclear. Here we demonstrate the existence of authentic RNA structure-dependent xrRNAs in dianthoviruses, plant-infecting viruses unrelated to animal-infecting flaviviruses. These xrRNAs have no sequence similarity to known xrRNAs; thus, we used a combination of biochemistry and virology to characterize their sequence requirements and mechanism of stopping exoribonucleases. By solving the structure of a dianthovirus xrRNA by X-ray crystallography, we reveal a complex fold that is very different from that of the flavivirus xrRNAs. However, both versions of xrRNAs contain a unique topological feature, a pseudoknot that creates a protective ring around the 5' end of the RNA structure; this may be a defining structural feature of xrRNAs. Single-molecule FRET experiments reveal that the dianthovirus xrRNAs undergo conformational changes and can use "codegradational remodeling," exploiting the exoribonucleases' degradation-linked helicase activity to help form their resistant structure; such a mechanism has not previously been reported. Convergent evolution has created RNA structure-dependent exoribonuclease resistance in different contexts, which establishes it as a general RNA maturation mechanism and defines xrRNAs as an authentic functional class of RNAs.
Conflict Background Triggered Congruency Sequence Effects in Graphic Judgment Task
Zhao, Liang; Wang, Yonghui
2013-01-01
Congruency sequence effects refer to the reduction of congruency effects when following an incongruent trial than following a congruent trial. The conflict monitoring account, one of the most influential contributions to this effect, assumes that the sequential modulations are evoked by response conflict. The present study aimed at exploring the congruency sequence effects in the absence of response conflict. We found congruency sequence effects occurred in graphic judgment task, in which the conflict stimuli acted as irrelevant information. The findings reveal that processing task-irrelevant conflict stimulus features could also induce sequential modulations of interference. The results do not support the interpretation of conflict monitoring and favor a feature integration account that the congruency sequence effects are attributed to the repetitions of stimulus and response features. PMID:23372766
A new family of β-helix proteins with similarities to the polysaccharide lyases
Close, Devin W.; D'Angelo, Sara; Bradbury, Andrew R. M.
2014-09-27
Microorganisms that degrade biomass produce diverse assortments of carbohydrate-active enzymes and binding modules. Despite tremendous advances in the genomic sequencing of these organisms, many genes do not have an ascribed function owing to low sequence identity to genes that have been annotated. Consequently, biochemical and structural characterization of genes with unknown function is required to complement the rapidly growing pool of genomic sequencing data. A protein with previously unknown function (Cthe_2159) was recently isolated in a genome-wide screen using phage display to identify cellulose-binding protein domains from the biomass-degrading bacterium Clostridium thermocellum. Here, the crystal structure of Cthe_2159 is presentedmore » and it is shown that it is a unique right-handed parallel β-helix protein. Despite very low sequence identity to known β-helix or carbohydrate-active proteins, Cthe_2159 displays structural features that are very similar to those of polysaccharide lyase (PL) families 1, 3, 6 and 9. Cthe_2159 is conserved across bacteria and some archaea and is a member of the domain of unknown function family DUF4353. This suggests that Cthe_2159 is the first representative of a previously unknown family of cellulose and/or acid-sugar binding β-helix proteins that share structural similarities with PLs. More importantly, these results demonstrate how functional annotation by biochemical and structural analysis remains a critical tool in the characterization of new gene products.« less
A new family of β-helix proteins with similarities to the polysaccharide lyases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Close, Devin W.; D'Angelo, Sara; Bradbury, Andrew R. M.
Microorganisms that degrade biomass produce diverse assortments of carbohydrate-active enzymes and binding modules. Despite tremendous advances in the genomic sequencing of these organisms, many genes do not have an ascribed function owing to low sequence identity to genes that have been annotated. Consequently, biochemical and structural characterization of genes with unknown function is required to complement the rapidly growing pool of genomic sequencing data. A protein with previously unknown function (Cthe_2159) was recently isolated in a genome-wide screen using phage display to identify cellulose-binding protein domains from the biomass-degrading bacterium Clostridium thermocellum. Here, the crystal structure of Cthe_2159 is presentedmore » and it is shown that it is a unique right-handed parallel β-helix protein. Despite very low sequence identity to known β-helix or carbohydrate-active proteins, Cthe_2159 displays structural features that are very similar to those of polysaccharide lyase (PL) families 1, 3, 6 and 9. Cthe_2159 is conserved across bacteria and some archaea and is a member of the domain of unknown function family DUF4353. This suggests that Cthe_2159 is the first representative of a previously unknown family of cellulose and/or acid-sugar binding β-helix proteins that share structural similarities with PLs. More importantly, these results demonstrate how functional annotation by biochemical and structural analysis remains a critical tool in the characterization of new gene products.« less
Biased Gene Conversion and GC-Content Evolution in the Coding Sequences of Reptiles and Vertebrates
Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas
2015-01-01
Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrates species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834
Di, Yanming; Schafer, Daniel W.; Wilhelm, Larry J.; Fox, Samuel E.; Sullivan, Christopher M.; Curzon, Aron D.; Carrington, James C.; Mockler, Todd C.; Chang, Jeff H.
2011-01-01
GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts. PMID:21998647
Dewhurst, Henry M; Choudhury, Shilpa; Torres, Matthew P
2015-08-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)--a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits--conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit-N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Methods, systems and devices for detecting threatening objects and for classifying magnetic data
Kotter, Dale K [Shelley, ID; Roybal, Lyle G [Idaho Falls, ID; Rohrbaugh, David T [Idaho Falls, ID; Spencer, David F [Idaho Falls, ID
2012-01-24
A method for detecting threatening objects in a security screening system. The method includes a step of classifying unique features of magnetic data as representing a threatening object. Another step includes acquiring magnetic data. Another step includes determining if the acquired magnetic data comprises a unique feature.
Algorithms exploiting ultrasonic sensors for subject classification
NASA Astrophysics Data System (ADS)
Desai, Sachi; Quoraishee, Shafik
2009-09-01
Proposed here is a series of techniques exploiting micro-Doppler ultrasonic sensors capable of characterizing various detected mammalian targets based on their physiological movements captured a series of robust features. Employed is a combination of unique and conventional digital signal processing techniques arranged in such a manner they become capable of classifying a series of walkers. These processes for feature extraction develops a robust feature space capable of providing discrimination of various movements generated from bipeds and quadrupeds and further subdivided into large or small. These movements can be exploited to provide specific information of a given signature dividing it in a series of subset signatures exploiting wavelets to generate start/stop times. After viewing a series spectrograms of the signature we are able to see distinct differences and utilizing kurtosis, we generate an envelope detector capable of isolating each of the corresponding step cycles generated during a walk. The walk cycle is defined as one complete sequence of walking/running from the foot pushing off the ground and concluding when returning to the ground. This time information segments the events that are readily seen in the spectrogram but obstructed in the temporal domain into individual walk sequences. This walking sequence is then subsequently translated into a three dimensional waterfall plot defining the expected energy value associated with the motion at particular instance of time and frequency. The value is capable of being repeatable for each particular class and employable to discriminate the events. Highly reliable classification is realized exploiting a classifier trained on a candidate sample space derived from the associated gyrations created by motion from actors of interest. The classifier developed herein provides a capability to classify events as an adult humans, children humans, horses, and dogs at potentially high rates based on the tested sample space. The algorithm developed and described will provide utility to an underused sensor modality for human intrusion detection because of the current high-rate of generated false alarms. The active ultrasonic sensor coupled in a multi-modal sensor suite with binary, less descriptive sensors like seismic devices realizing a greater accuracy rate for detection of persons of interest for homeland purposes.
Martin, Jodi; Bureau, Jean-François; Yurkowski, Kim; Fournier, Tania Renaud; Lafontaine, Marie-France; Cloutier, Paula
2016-06-01
The current investigation addressed the potential for unique influences of perceived childhood maltreatment, adverse family-life events, and parent-child relational trauma on the lifetime occurrence and addictive features of non-suicidal self-injury (NSSI). Participants included 957 undergraduate students (747 females; M = 20.14 years, SD = 3.88) who completed online questionnaires regarding the key variables under study. Although self-injuring youth reported more experiences with each family-based risk factor, different patterns of association were found when lifetime engagement in NSSI or its addictive features were under study. Perceived parent-child relational trauma was uniquely linked with NSSI behavior after accounting for perceived childhood maltreatment; adverse family-life events had an additional unique association. In contrast, perceived paternal maltreatment was uniquely related with NSSI's addictive features. Findings underline the importance of studying inter-related family-based risk factors of NSSI simultaneously for a comprehensive understanding of familial correlates of NSSI behavior and its underlying features. Copyright © 2016 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.
Awua, Adolf K; Adanu, Richard M K; Wiredu, Edwin K; Afari, Edwin A; Zubuch, Vanessa A; Asmah, Richard H; Severini, Alberto
2017-04-21
In addition to being useful for classification, sequence variations of human Papillomavirus (HPV) genotypes have been implicated in differential oncogenic potential and a differential association with the different histological forms of invasive cervical cancer. These associations have also been indicated for HPV genotype lineages and sub-lineages. In order to better understand the potential implications of lineage variation in the occurrence of cervical cancers in Ghana, we studied the lineages of the three most prevalent HPV genotypes among women with normal cytology as baseline to further studies. Of previously collected self- and health personnel-collected cervical specimen, 54, which were positive for HPV16, 18 and 45, were selected and the long control region (LCR) of each HPV genotype was separately amplified by a nested PCR. DNA sequences of 41 isolates obtained with the forward and reverse primers by Sanger sequencing were analysed. Nucleotide sequence variations of the HPV16 genotypes were observed at 30 positions within the LCR (7460 - 7840). Of these, 19 were the known variations for the lineages B and C (African lineages), while the other 11 positions had variations unique to the HPV16 isolates of this study. For the HPV18 isolates, the variations were at 35 positions, 22 of which were known variations of Africa lineages and the other 13 were unique variations observed for the isolates obtained in this study (at positions 7799 and 7813). HPV45 isolates had variations at 35 positions and 2 (positions 7114 and 97) were unique to the isolates of this study. This study provides the first data on the lineages of HPV 16, 18 and 45 isolates from Ghana. Although the study did not obtain full genome sequence data for a comprehensive comparison with known lineages, these genotypes were predominately of the Africa lineages and had some unique sequence variations at positions that suggest potential oncogenic implications. These data will be useful for comparison with lineages of these genotypes from women with cervical lesion and all the forms of invasive cervical cancers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, Richard A.; Panyala, Ajay R.; Glass, Kevin A.
MerCat is a parallel, highly scalable and modular property software package for robust analysis of features in next-generation sequencing data. MerCat inputs include assembled contigs and raw sequence reads from any platform resulting in feature abundance counts tables. MerCat allows for direct analysis of data properties without reference sequence database dependency commonly used by search tools such as BLAST and/or DIAMOND for compositional analysis of whole community shotgun sequencing (e.g. metagenomes and metatranscriptomes).
Musical melody and speech intonation: singing a different tune.
Zatorre, Robert J; Baum, Shari R
2012-01-01
Music and speech are often cited as characteristically human forms of communication. Both share the features of hierarchical structure, complex sound systems, and sensorimotor sequencing demands, and both are used to convey and influence emotions, among other functions [1]. Both music and speech also prominently use acoustical frequency modulations, perceived as variations in pitch, as part of their communicative repertoire. Given these similarities, and the fact that pitch perception and production involve the same peripheral transduction system (cochlea) and the same production mechanism (vocal tract), it might be natural to assume that pitch processing in speech and music would also depend on the same underlying cognitive and neural mechanisms. In this essay we argue that the processing of pitch information differs significantly for speech and music; specifically, we suggest that there are two pitch-related processing systems, one for more coarse-grained, approximate analysis and one for more fine-grained accurate representation, and that the latter is unique to music. More broadly, this dissociation offers clues about the interface between sensory and motor systems, and highlights the idea that multiple processing streams are a ubiquitous feature of neuro-cognitive architectures.
Lu, Xi; Nahum-Shani, Inbal; Kasari, Connie; Lynch, Kevin G.; Oslin, David W.; Pelham, William E.; Fabiano, Gregory; Almirall, Daniel
2016-01-01
A dynamic treatment regime (DTR) is a sequence of decision rules, each of which recommends a treatment based on a patient’s past and current health status. Sequential, multiple assignment, randomized trials (SMARTs) are multi-stage trial designs that yield data specifically for building effective DTRs. Modeling the marginal mean trajectories of a repeated-measures outcome arising from a SMART presents challenges, because traditional longitudinal models used for randomized clinical trials do not take into account the unique design features of SMART. We discuss modeling considerations for various forms of SMART designs, emphasizing the importance of considering the timing of repeated measures in relation to the treatment stages in a SMART. For illustration, we use data from three SMART case studies with increasing level of complexity, in autism, child attention deficit hyperactivity disorder (ADHD), and adult alcoholism. In all three SMARTs we illustrate how to accommodate the design features along with the timing of the repeated measures when comparing DTRs based on mean trajectories of the repeated-measures outcome. PMID:26638988
The "handedness" of language: Directional symmetry breaking of sign usage in words.
Ashraf, Md Izhar; Sinha, Sitabhra
2018-01-01
Language, which allows complex ideas to be communicated through symbolic sequences, is a characteristic feature of our species and manifested in a multitude of forms. Using large written corpora for many different languages and scripts, we show that the occurrence probability distributions of signs at the left and right ends of words have a distinct heterogeneous nature. Characterizing this asymmetry using quantitative inequality measures, viz. information entropy and the Gini index, we show that the beginning of a word is less restrictive in sign usage than the end. This property is not simply attributable to the use of common affixes as it is seen even when only word roots are considered. We use the existence of this asymmetry to infer the direction of writing in undeciphered inscriptions that agrees with the archaeological evidence. Unlike traditional investigations of phonotactic constraints which focus on language-specific patterns, our study reveals a property valid across languages and writing systems. As both language and writing are unique aspects of our species, this universal signature may reflect an innate feature of the human cognitive phenomenon.
Klinger, Christen M.; Ramirez-Macias, Inmaculada; Herman, Emily K.; Turkewitz, Aaron P.; Field, Mark C.; Dacks, Joel B.
2016-01-01
With advances in DNA sequencing technology, it is increasingly common and tractable to informatically look for genes of interest in the genomic databases of parasitic organisms and infer cellular states. Assignment of a putative gene function based on homology to functionally characterized genes in other organisms, though powerful, relies on the implicit assumption of functional homology, i.e. that orthology indicates conserved function. Eukaryotes reveal a dazzling array of cellular features and structural organization, suggesting a concomitant diversity in their underlying molecular machinery. Significantly, examples of novel functions for pre-existing or new paralogues are not uncommon. Do these examples undermine the basic assumption of functional homology, especially in parasitic protists, which are often highly derived? Here we examine the extent to which functional homology exists between organisms spanning the eukaryotic lineage. By comparing membrane trafficking proteins between parasitic protists and traditional model organisms, where direct functional evidence is available, we find that function is indeed largely conserved between orthologues, albeit with significant adaptation arising from the unique biological features within each lineage. PMID:27444378
The “handedness” of language: Directional symmetry breaking of sign usage in words
2018-01-01
Language, which allows complex ideas to be communicated through symbolic sequences, is a characteristic feature of our species and manifested in a multitude of forms. Using large written corpora for many different languages and scripts, we show that the occurrence probability distributions of signs at the left and right ends of words have a distinct heterogeneous nature. Characterizing this asymmetry using quantitative inequality measures, viz. information entropy and the Gini index, we show that the beginning of a word is less restrictive in sign usage than the end. This property is not simply attributable to the use of common affixes as it is seen even when only word roots are considered. We use the existence of this asymmetry to infer the direction of writing in undeciphered inscriptions that agrees with the archaeological evidence. Unlike traditional investigations of phonotactic constraints which focus on language-specific patterns, our study reveals a property valid across languages and writing systems. As both language and writing are unique aspects of our species, this universal signature may reflect an innate feature of the human cognitive phenomenon. PMID:29342176
Lu, Xi; Nahum-Shani, Inbal; Kasari, Connie; Lynch, Kevin G; Oslin, David W; Pelham, William E; Fabiano, Gregory; Almirall, Daniel
2016-05-10
A dynamic treatment regime (DTR) is a sequence of decision rules, each of which recommends a treatment based on a patient's past and current health status. Sequential, multiple assignment, randomized trials (SMARTs) are multi-stage trial designs that yield data specifically for building effective DTRs. Modeling the marginal mean trajectories of a repeated-measures outcome arising from a SMART presents challenges, because traditional longitudinal models used for randomized clinical trials do not take into account the unique design features of SMART. We discuss modeling considerations for various forms of SMART designs, emphasizing the importance of considering the timing of repeated measures in relation to the treatment stages in a SMART. For illustration, we use data from three SMART case studies with increasing level of complexity, in autism, child attention deficit hyperactivity disorder, and adult alcoholism. In all three SMARTs, we illustrate how to accommodate the design features along with the timing of the repeated measures when comparing DTRs based on mean trajectories of the repeated-measures outcome. Copyright © 2015 John Wiley & Sons, Ltd.
The Unique Biosynthetic Route from Lupinus β-Conglutin Gene to Blad
Monteiro, Sara; Freitas, Regina; Rajasekhar, Baru T.; Teixeira, Artur R.; Ferreira, Ricardo B.
2010-01-01
Background During seed germination, β-conglutin undergoes a major cycle of limited proteolysis in which many of its constituent subunits are processed into a 20 kDa polypeptide termed blad. Blad is the main component of a glycooligomer, accumulating exclusively in the cotyledons of Lupinus species, between days 4 and 12 after the onset of germination. Principal Findings The sequence of the gene encoding β-conglutin precursor (1791 nucleotides) is reported. This gene, which shares 44 to 57% similarity and 20 to 37% identity with other vicilin-like protein genes, includes several features in common with these globulins, but also specific hallmarks. Most notable is the presence of an ubiquitin interacting motif (UIM), which possibly links the unique catabolic route of β-conglutin to the ubiquitin/proteasome proteolytic pathway. Significance Blad forms through a unique route from and is a stable intermediary product of its precursor, β-conglutin, the major Lupinus seed storage protein. It is composed of 173 amino acid residues, is encoded by an intron-containing, internal fragment of the gene that codes for β-conglutin precursor (nucleotides 394 to 913) and exhibits an isoelectric point of 9.6 and a molecular mass of 20,404.85 Da. Consistent with its role as a storage protein, blad contains an extremely high proportion of the nitrogen-rich amino acids. PMID:20066045
Tai, Huanhuan; Lu, Xin; Opitz, Nina; Marcon, Caroline; Paschold, Anja; Lithio, Andrew; Nettleton, Dan; Hochholdinger, Frank
2016-02-01
Maize develops a complex root system composed of embryonic and post-embryonic roots. Spatio-temporal differences in the formation of these root types imply specific functions during maize development. A comparative transcriptomic study of embryonic primary and seminal, and post-embryonic crown roots of the maize inbred line B73 by RNA sequencing along with anatomical studies were conducted early in development. Seminal roots displayed unique anatomical features, whereas the organization of primary and crown roots was similar. For instance, seminal roots displayed fewer cortical cell files and their stele contained more meta-xylem vessels. Global expression profiling revealed diverse patterns of gene activity across all root types and highlighted the unique transcriptome of seminal roots. While functions in cell remodeling and cell wall formation were prominent in primary and crown roots, stress-related genes and transcriptional regulators were over-represented in seminal roots, suggesting functional specialization of the different root types. Dynamic expression of lignin biosynthesis genes and histochemical staining suggested diversification of cell wall lignification among the three root types. Our findings highlight a cost-efficient anatomical structure and a unique expression profile of seminal roots of the maize inbred line B73 different from primary and crown roots. © The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology.
The sea cucumber genome provides insights into morphological evolution and visceral regeneration
Dai, Hui; Hamel, Jean-François; Liu, Chengzhang; Yu, Yang; Liu, Shilin; Lin, Wenchao; Guo, Kaimin; Jin, Songjun; Xu, Peng; Storey, Kenneth B.; Huan, Pin; Zhang, Tao; Zhou, Yi; Zhang, Jiquan; Lin, Chenggang; Li, Xiaoni; Xing, Lili; Huo, Da; Sun, Mingzhe; Wang, Lei; Mercier, Annie; Li, Fuhua; Yang, Hongsheng
2017-01-01
Apart from sharing common ancestry with chordates, sea cucumbers exhibit a unique morphology and exceptional regenerative capacity. Here we present the complete genome sequence of an economically important sea cucumber, A. japonicus, generated using Illumina and PacBio platforms, to achieve an assembly of approximately 805 Mb (contig N50 of 190 Kb and scaffold N50 of 486 Kb), with 30,350 protein-coding genes and high continuity. We used this resource to explore key genetic mechanisms behind the unique biological characters of sea cucumbers. Phylogenetic and comparative genomic analyses revealed the presence of marker genes associated with notochord and gill slits, suggesting that these chordate features were present in ancestral echinoderms. The unique shape and weak mineralization of the sea cucumber adult body were also preliminarily explained by the contraction of biomineralization genes. Genome, transcriptome, and proteome analyses of organ regrowth after induced evisceration provided insight into the molecular underpinnings of visceral regeneration, including a specific tandem-duplicated prostatic secretory protein of 94 amino acids (PSP94)-like gene family and a significantly expanded fibrinogen-related protein (FREP) gene family. This high-quality genome resource will provide a useful framework for future research into biological processes and evolution in deuterostomes, including remarkable regenerative abilities that could have medical applications. Moreover, the multiomics data will be of prime value for commercial sea cucumber breeding programs. PMID:29023486
The sea cucumber genome provides insights into morphological evolution and visceral regeneration.
Zhang, Xiaojun; Sun, Lina; Yuan, Jianbo; Sun, Yamin; Gao, Yi; Zhang, Libin; Li, Shihao; Dai, Hui; Hamel, Jean-François; Liu, Chengzhang; Yu, Yang; Liu, Shilin; Lin, Wenchao; Guo, Kaimin; Jin, Songjun; Xu, Peng; Storey, Kenneth B; Huan, Pin; Zhang, Tao; Zhou, Yi; Zhang, Jiquan; Lin, Chenggang; Li, Xiaoni; Xing, Lili; Huo, Da; Sun, Mingzhe; Wang, Lei; Mercier, Annie; Li, Fuhua; Yang, Hongsheng; Xiang, Jianhai
2017-10-01
Apart from sharing common ancestry with chordates, sea cucumbers exhibit a unique morphology and exceptional regenerative capacity. Here we present the complete genome sequence of an economically important sea cucumber, A. japonicus, generated using Illumina and PacBio platforms, to achieve an assembly of approximately 805 Mb (contig N50 of 190 Kb and scaffold N50 of 486 Kb), with 30,350 protein-coding genes and high continuity. We used this resource to explore key genetic mechanisms behind the unique biological characters of sea cucumbers. Phylogenetic and comparative genomic analyses revealed the presence of marker genes associated with notochord and gill slits, suggesting that these chordate features were present in ancestral echinoderms. The unique shape and weak mineralization of the sea cucumber adult body were also preliminarily explained by the contraction of biomineralization genes. Genome, transcriptome, and proteome analyses of organ regrowth after induced evisceration provided insight into the molecular underpinnings of visceral regeneration, including a specific tandem-duplicated prostatic secretory protein of 94 amino acids (PSP94)-like gene family and a significantly expanded fibrinogen-related protein (FREP) gene family. This high-quality genome resource will provide a useful framework for future research into biological processes and evolution in deuterostomes, including remarkable regenerative abilities that could have medical applications. Moreover, the multiomics data will be of prime value for commercial sea cucumber breeding programs.
DSAP: deep-sequencing small RNA analysis pipeline.
Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus
2010-07-01
DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
Possible ancient giant basin and related water enrichment in the Arabia Terra province, Mars
Dohm, J.M.; Barlow, N.G.; Anderson, R.C.; Williams, J.-P.; Miyamoto, H.; Ferris, J.C.; Strom, R.G.; Taylor, G.J.; Fairen, A.G.; Baker, V.R.; Boynton, W.V.; Keller, J.M.; Kerry, K.; Janes, D.; Rodriguez, J.A.P.; Hare, T.M.
2007-01-01
A circular albedo feature in the Arabia Terra province was first hypothesized as an ancient impact basin using Viking-era information. To test this unpublished hypothesis, we have analyzed the Viking era-information together with layers of new data derived from the Mars Global Surveyor (MGS) and Mars Odyssey (MO) missions. Our analysis indicates that Arabia Terra is an ancient geologic province of Mars with many distinct characteristics, including predominantly Noachian materials, a unique part of the highland-lowland boundary, a prominent paleotectonic history, the largest region of fretted terrain on the planet, outflow channels with no obvious origins, extensive exposures of eroded layered sedimentary deposits, and notable structural, albedo, thermal inertia, gravity, magnetic, and elemental signatures. The province also is marked by special impact crater morphologies, which suggest a persistent volatile-rich substrate. No one characteristic provides definitive answers to the dominant event(s) that shaped this unique province. Collectively the characteristics reported here support the following hypothesized sequence of events in Arabia Terra: (1) an enormous basin, possibly of impact origin, formed early in martian history when the magnetic dynamo was active and the lithosphere was relatively thin, (2) sediments and other materials were deposited in the basin during high erosion rates while maintaining isostatic equilibrium, (3) sediments became water enriched during the Noachian Period, and (4) basin materials were uplifted in response to the growth of the Tharsis Bulge, resulting in differential erosion exposing ancient stratigraphic sequences. Parts of the ancient basin remain water-enriched to the present day. ?? 2007 Elsevier Inc. All rights reserved.
Saad, A S A; Massoud, M A; Abdel-Megeed, A A M; Hamid, N A; Mourad, A K K; Barakat, A S T
2007-01-01
Field trails were conducted to determine the performance of three different sequences as a unique solution for the control of the leaf miner Liriomyza trifolii (Burgess) (Diptera: Agromyzidae) infesting garden beans (Phaseolus vulgaris L.) during the two successive seasons of 2004 and 2005. Furthermore, during the evaluation period, the side effect against the ectoparasite Diglyphus isaea (Walker) (Hymenoptera: Eulophidae) was put into consideration. Meanwhile, the comparative evaluation of the pesticides alone showed that abamectin and azadirachtin were highly effective against Liriomyza trifolii, while carbosulfan, pymetrozine and thiamethoxam provided to be of a moderate effect. Moreover, carbosulfan showed harmful effect to the larvae of the ectoparasite Diglyphus isaea (Walker), while abamectin and azadirachtin gave a moderate effect. Thiamethoxam and the the detergent (Masrol 410) had slight effect in this respect. The highly effective sequence among the sequences was abamectin, pymetrozine and azadirachtin, against Liriomyza trifolii (Burgess), with slight harmful effect on Diglyphus isaea (Walker). However the sequence of azadirachtin, pymetrozine and abamectin had a moderate effect on Liriomyza trifolii (Burgess) and exhibited a slight toxic effect on Diglyphus isaea (Walker). In contrast, the sequence of carbosulfan, thiamethoxam and pymetrozine was the least effective and represented a slight effect on Diglyphus isaea (Walker). From this study, it was concluded that abamectin, pymetrozine and azadirachtin sequence has proved to be a unique solution for the control of the leaf miner Liriomyza trifolii (Burgess) infesting garden beans (Phaseolus vulgaris L.) in Egypt.
Analysis of synonymous codon usage patterns in the genus Rhizobium.
Wang, Xinxin; Wu, Liang; Zhou, Ping; Zhu, Shengfeng; An, Wei; Chen, Yu; Zhao, Lin
2013-11-01
The codon usage patterns of rhizobia have received increasing attention. However, little information is available regarding the conserved features of the codon usage patterns in a typical rhizobial genus. The codon usage patterns of six completely sequenced strains belonging to the genus Rhizobium were analysed as model rhizobia in the present study. The relative neutrality plot showed that selection pressure played a role in codon usage in the genus Rhizobium. Spearman's rank correlation analysis combined with correspondence analysis (COA) showed that the codon adaptation index and the effective number of codons (ENC) had strong correlation with the first axis of the COA, which indicated the important role of gene expression level and the ENC in the codon usage patterns in this genus. The relative synonymous codon usage of Cys codons had the strongest correlation with the second axis of the COA. Accordingly, the usage of Cys codons was another important factor that shaped the codon usage patterns in Rhizobium genomes and was a conserved feature of the genus. Moreover, the comparison of codon usage between highly and lowly expressed genes showed that 20 unique preferred codons were shared among Rhizobium genomes, revealing another conserved feature of the genus. This is the first report of the codon usage patterns in the genus Rhizobium.
Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database
2017-01-01
Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and initial value sensitiveness, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack. PMID:28392799
Robust k-mer frequency estimation using gapped k-mers
Ghandi, Mahmoud; Mohammad-Noori, Morteza
2013-01-01
Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the constuction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome. PMID:23861010
Robust k-mer frequency estimation using gapped k-mers.
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Beer, Michael A
2014-08-01
Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the constuction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome.
NASA Astrophysics Data System (ADS)
Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen
2017-01-01
Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape epigenetic landscape through interactions with chromatin modifying proteins. However, mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to have physical interactions with Polycomb repressive complex 2 (PRC2), and systematically investigated the sequence features of these lncRNAs by developing a new computational pipeline for sequences composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Through that, PRC2-binding lncRNAs were found to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be utilized to distinguish them from the others with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.
Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers.
Girardot, Charles; Scholtalbers, Jelle; Sauer, Sajoscha; Su, Shu-Yi; Furlong, Eileen E M
2016-10-08
The yield obtained from next generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifier or UMIs) are often used to distinguish between PCR duplicates and transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end to either increase the number of multiplexed samples in the library or to use one of the barcodes as UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must take read duplicates and sample identity into account, by identifying UMIs. Existing tools do not support these complex barcoding configurations and custom code development is frequently required. Here, we present Je, a suite of tools that accommodates complex barcoding strategies, extracts UMIs and filters read duplicates taking UMIs into account. Using Je on publicly available scRNA-seq and iCLIP data containing UMIs, the number of unique reads increased by up to 36 %, compared to when UMIs are ignored. Je is implemented in JAVA and uses the Picard API. Code, executables and documentation are freely available at http://gbcs.embl.de/Je . Je can also be easily installed in Galaxy through the Galaxy toolshed.
Leterrier, Marina; Holappa, Lynn D; Broglie, Karen E; Beckles, Diane M
2008-01-01
Background Starch is of great importance to humans as a food and biomaterial, and the amount and structure of starch made in plants is determined in part by starch synthase (SS) activity. Five SS isoforms, SSI, II, III, IV and Granule Bound SSI, have been identified, each with a unique catalytic role in starch synthesis. The basic mode of action of SSs is known; however our knowledge of several aspects of SS enzymology at the structural and mechanistic level is incomplete. To gain a better understanding of the differences in SS sequences that underscore their specificity, the previously uncharacterised SSIVb from wheat was cloned and extensive bioinformatics analyses of this and other SSs sequences were done. Results The wheat SSIV cDNA is most similar to rice SSIVb with which it shows synteny and shares a similar exon-intron arrangement. The wheat SSIVb gene was preferentially expressed in leaf and was not regulated by a circadian clock. Phylogenetic analysis showed that in plants, SSIV is closely related to SSIII, while SSI, SSII and Granule Bound SSI clustered together and distinctions between the two groups can be made at the genetic level and included chromosomal location and intron conservation. Further, identified differences at the amino acid level in their glycosyltransferase domains, predicted secondary structures, global conformations and conserved residues might be indicative of intragroup functional associations. Conclusion Based on bioinformatics analysis of the catalytic region of 36 SSs and 3 glycogen synthases (GSs), it is suggested that the valine residue in the highly conserved K-X-G-G-L motif in SSIII and SSIV may be a determining feature of primer specificity of these SSs as compared to GBSSI, SSI and SSII. In GBSSI, the Ile485 residue may partially explain that enzyme's unique catalytic features. The flexible 380s Loop in the starch catalytic domain may be important in defining the specificity of action for each different SS and the G-X-G in motif VI could define SSIV and SSIII action particularly. PMID:18826586
Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs
2014-01-01
Background We introduce Sequence Bundles--a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicates the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395
Using the Self-Select Paradigm to Delineate the Nature of Speech Motor Programming
Wright, David L.; Robin, Don A.; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H.; Fox, Peter T.
2015-01-01
Purpose The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial order demands of longer sequences. Method A modified reaction time paradigm was used to assess INT and SEQ demands. Specifically, syllable complexity was dependent on syllable structure, whereas sequence complexity involved either repeated or unique syllabi within an utterance. Results INT execution was slowed when articulating single syllables in the form CCCV compared to simpler CV syllables. Planning unique syllables within a multisyllabic utterance rather than repetitions of the same syllable slowed INT but not SEQ. Conclusions The INT speech motor programming process, important for mental syllabary access, is sensitive to changes in both syllable structure and the number of unique syllables in an utterance. PMID:19474396
RUCS: rapid identification of PCR primers for unique core sequences.
Thomsen, Martin Christen Frølund; Hasman, Henrik; Westh, Henrik; Kaya, Hülya; Lund, Ole
2017-12-15
Designing PCR primers to target a specific selection of whole genome sequenced strains can be a long, arduous and sometimes impractical task. Such tasks would benefit greatly from an automated tool to both identify unique targets, and to validate the vast number of potential primer pairs for the targets in silico. Here we present RUCS, a program that will find PCR primer pairs and probes for the unique core sequences of a positive genome dataset complement to a negative genome dataset. The resulting primer pairs and probes are in addition to simple selection also validated through a complex in silico PCR simulation. We compared our method, which identifies the unique core sequences, against an existing tool called ssGeneFinder, and found that our method was 6.5-20 times more sensitive. We used RUCS to design primer pairs that would target a set of genomes known to contain the mcr-1 colistin resistance gene. Three of the predicted pairs were chosen for experimental validation using PCR and gel electrophoresis. All three pairs successfully produced an amplicon with the target length for the samples containing mcr-1 and no amplification products were produced for the negative samples. The novel methods presented in this manuscript can reduce the time needed to identify target sequences, and provide a quick virtual PCR validation to eliminate time wasted on ambiguously binding primers. Source code is freely available on https://bitbucket.org/genomicepidemiology/rucs. Web service is freely available on https://cge.cbs.dtu.dk/services/RUCS. mcft@cbs.dtu.dk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao
2012-05-01
Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information, these features measure the interactions between residues a certain distant apart in the protein sequences from different aspects. To achieve better performance, the principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobater pylori datasets show that our method is very promising to predict PPIs and may at least be a useful supplement tool to existing methods.
Zepeda-Mendoza, Marie Lisandra; Bohmann, Kristine; Carmona Baez, Aldo; Gilbert, M Thomas P
2016-05-03
DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5'-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way. We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe. DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.