subsequent sequence analysis: Topics by Science.gov

Sample records for subsequent sequence analysis

Hierarchical Traces for Reduced NSM Memory Requirements

NASA Astrophysics Data System (ADS)

Dahl, Torbjørn S.

This paper presents work on using hierarchical long-term memory to reduce the memory requirements of nearest sequence memory (NSM) learning, a previously published, instance-based reinforcement learning algorithm. A hierarchical memory representation reduces the memory requirements by allowing traces to share common sub-sequences. We present moderated mechanisms for estimating discounted future rewards and for dealing with hidden state using hierarchical memory. We also present an experimental analysis of how the sub-sequence length affects the memory compression achieved and show that the reduced memory requirements do not affect the speed of learning. Finally, we analyse and discuss the persistence of the sub-sequences independent of specific trace instances.
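The memory saving from letting traces share common sub-sequences can be illustrated with a toy model. The sketch below is hypothetical and is not Dahl's NSM implementation: it assumes each trace is a list of discrete steps (e.g. observation/action symbols), cuts traces into fixed-length chunks of length `k`, and stores each distinct chunk only once. The names `build_shared_memory`, `decode` and `memory_cost`, and the cost model (one unit per stored step plus one per chunk reference), are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Chunk = Tuple[str, ...]


def build_shared_memory(
    traces: Sequence[Sequence[str]], k: int
) -> Tuple[Dict[Chunk, int], List[List[int]]]:
    """Cut each trace into consecutive sub-sequences of length k (the last one
    may be shorter) and store every distinct sub-sequence once; each trace
    becomes a list of chunk ids referring into the shared table."""
    table: Dict[Chunk, int] = {}
    encoded: List[List[int]] = []
    for trace in traces:
        ids: List[int] = []
        for start in range(0, len(trace), k):
            chunk = tuple(trace[start:start + k])
            # A new chunk gets the next free id; a repeated chunk reuses its id.
            ids.append(table.setdefault(chunk, len(table)))
        encoded.append(ids)
    return table, encoded


def decode(table: Dict[Chunk, int], ids: List[int]) -> List[str]:
    """Rebuild the original trace from its chunk references (lossless)."""
    inverse = {i: chunk for chunk, i in table.items()}
    return [step for i in ids for step in inverse[i]]


def memory_cost(table: Dict[Chunk, int], encoded: List[List[int]]) -> int:
    """Toy cost model (assumption): one unit per stored step plus one per reference."""
    return sum(len(chunk) for chunk in table) + sum(len(ids) for ids in encoded)


if __name__ == "__main__":
    traces = [list("abcabcabd"), list("abcabcxyz")]
    table, encoded = build_shared_memory(traces, k=3)
    print(sum(len(t) for t in traces), memory_cost(table, encoded))  # 18 15
```

With `k = 3` the two example traces reuse the chunk `abc`, so the shared table plus references cost 15 units instead of the 18 needed to store both traces flat. Changing `k` trades the number of references against the chance of exact chunk matches, which is the kind of effect a sub-sequence-length experiment would probe.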
Faster sequence homology searches by clustering subsequences.

PubMed

Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

2015-04-15

Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. We developed a fast homology search method based on database subsequence clustering and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on the triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When measured with metagenomic data, GHOSTZ was ∼2.2-2.8 times faster than RAPSearch and ∼185-261 times faster than BLASTX. The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ (contact: akiyama@cs.titech.ac.jp). Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
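The pruning idea behind subsequence clustering can be sketched under simplifying assumptions: subsequences are equal-length strings compared by Hamming distance (GHOSTZ itself works with protein alignment scores, which this sketch does not model), and clusters are formed greedily around representatives with a fixed `radius`. All function names are hypothetical. Because every member lies within `radius` of its representative, the triangle inequality gives d(q, s) ≥ d(q, rep) − radius, so a whole cluster can be skipped without comparing any of its members.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_subsequences(subseqs: List[str], radius: int) -> Dict[str, List[str]]:
    """Greedy clustering: each subsequence joins the first representative
    within `radius`; otherwise it becomes a new representative."""
    clusters: Dict[str, List[str]] = {}
    for s in subseqs:
        for rep in clusters:
            if hamming(s, rep) <= radius:
                clusters[rep].append(s)
                break
        else:
            clusters[s] = [s]
    return clusters


def search(
    query: str, clusters: Dict[str, List[str]], radius: int, max_dist: int
) -> Tuple[List[str], int]:
    """Return (hits, number of member comparisons actually performed)."""
    hits: List[str] = []
    comparisons = 0
    for rep, members in clusters.items():
        # Lower bound on d(query, s) for every member s of this cluster.
        if hamming(query, rep) - radius > max_dist:
            continue  # no member can be within max_dist, skip the whole cluster
        for s in members:
            comparisons += 1
            if hamming(query, s) <= max_dist:
                hits.append(s)
    return hits, comparisons


if __name__ == "__main__":
    subseqs = ["ACDE", "ACDF", "ACDG", "WXYZ", "WXYV"]
    clusters = cluster_subsequences(subseqs, radius=1)
    print(search("ACDF", clusters, radius=1, max_dist=1))
```

In the example the `WXYZ` cluster is rejected after a single comparison with its representative, so only three of the five stored subsequences are examined, yet the hits match a brute-force scan.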
Utilization of sequence on relatives to improve analysis of individuals' low-coverage NGS data

USDA-ARS's Scientific Manuscript database

Low-coverage sequence data is expected to have low call rates under the prevailing paradigm that genotypes are first “called” from sequence data of each individual independently and subsequent analyses (including determination of haplotypes) are dependent on those called genotypes. However, provide...
Individual sequences in large sets of gene sequences may be distinguished efficiently by combinations of shared sub-sequences

PubMed Central

Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J

2005-01-01

Background: Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with, the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results: We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion: The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134
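Using the presence/absence of sub-sequences like the questions of a dichotomous key can be sketched with a greedy selection. This is an assumed heuristic, not necessarily the authors' method: it enumerates exact k-mers, then repeatedly picks the k-mer that separates the most still-indistinguishable pairs of sequences. Greedy choice keeps the set small but does not guarantee a minimal set; `kmers` and `greedy_key` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct sub-sequences of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_key(seqs: Dict[str, str], k: int) -> List[str]:
    """Greedily choose k-mers until every pair of sequences differs in the
    presence/absence of at least one chosen k-mer (small, not guaranteed minimal)."""
    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    candidates = sorted(set().union(*profiles.values()))
    unresolved = set(combinations(sorted(seqs), 2))

    def separates(m: str, a: str, b: str) -> bool:
        return (m in profiles[a]) != (m in profiles[b])

    chosen: List[str] = []
    while unresolved:
        # Pick the k-mer that splits the largest number of still-unresolved pairs.
        best = max(candidates, key=lambda m: sum(separates(m, a, b) for a, b in unresolved))
        if not any(separates(best, a, b) for a, b in unresolved):
            raise ValueError("some sequences have identical k-mer content")
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved if not separates(best, a, b)}
    return chosen


if __name__ == "__main__":
    seqs = {"s1": "AACGT", "s2": "AACGA", "s3": "TTCGA", "s4": "TTCGT"}
    key = greedy_key(seqs, k=2)
    for name, s in seqs.items():
        print(name, tuple(m in kmers(s, 2) for m in key))
```

Each sequence's tuple of presence/absence answers for the chosen k-mers acts as its identifying signature, analogous to a path through a dichotomous key; here two 2-mers suffice for four sequences.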
Designing a Bioengine for Detection and Analysis of Base String on an Affected Sequence in High-Concentration Regions

PubMed Central

Mandal, Bijoy Kumar; Kim, Tai-hoon

2013-01-01

We designed an algorithm for a bioengine. The program enables searching for optimal alignments between two sequences: the host sequence (normal plant) and the query sequence (virus). Searching for homologues has become a routine operation on biological sequences; here, the search uses 4 × 4 combinations with different subsequence (word) sizes. The program takes advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. Its main aim is to detect overlapping reading frames. The program also identifies highly infected clones by selecting the highest-matching regions with minimal gap or mismatch zones, as well as unique virus clone matches. It is a small, portable, interactive front-end program intended for finding the regions of matching between the host sequence and query subsequences. All operations are carried out in fractions of a second, depending on the required task and the sequence length. PMID:24000321
Respiratory motion compensation algorithm of ultrasound hepatic perfusion data acquired in free-breathing

NASA Astrophysics Data System (ADS)

Wu, Kaizhi; Zhang, Xuming; Chen, Guangxie; Weng, Fei; Ding, Mingyue

2013-10-01

Images acquired in free breathing using contrast-enhanced ultrasound exhibit a periodic motion that must be compensated for if accurate quantification of hepatic perfusion is to be performed. In this work, we present an algorithm that compensates for respiratory motion by combining the PCA (Principal Component Analysis) method and the block matching method. The respiratory kinetics of the ultrasound hepatic perfusion image sequences were first extracted using PCA. Then, the optimal phase of the obtained respiratory kinetics was detected after normalizing the motion amplitude and determining the image subsequences of the original image sequences. The image subsequences were registered by block matching, using cross-correlation as the similarity measure. Finally, the motion-compensated contrast images were obtained by position mapping, and the algorithm was evaluated by comparing the TICs (time-intensity curves) extracted from the original image sequences and from the compensated image subsequences. Quantitative comparisons demonstrated that the average fitting error of the ROIs (regions of interest) was reduced from 10.9278 ± 6.2756 to 5.1644 ± 3.3431 after compensation.
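The registration step (block matching with cross-correlation as the similarity measure) can be sketched in plain Python. The sketch assumes integer-pixel translations only, uses normalized cross-correlation (NCC) as the cross-correlation variant, and searches exhaustively within ±`search` pixels. The paper's block size, search range and PCA-based phase selection are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Square block of side `size` whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def ncc(a: Image, b: Image) -> float:
    """Normalized cross-correlation of two equally sized blocks (1.0 = perfect match)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0


def match_block(
    ref: Image, cur: Image, top: int, left: int, size: int, search: int
) -> Tuple[Tuple[int, int], float]:
    """Find the (dy, dx) shift within +/-search pixels that maximizes NCC
    between a reference block and the current frame."""
    template = crop(ref, top, left, size)
    best_score, best_shift = -math.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > len(cur) or l + size > len(cur[0]):
                continue  # candidate block would fall outside the frame
            score = ncc(template, crop(cur, t, l, size))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift, best_score


if __name__ == "__main__":
    # Synthetic frame, then the same frame translated down 1 row and right 2 columns.
    ref = [[((31 * y + 17 * x) ** 2) % 101 for x in range(12)] for y in range(12)]
    cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)] for y in range(12)]
    print(match_block(ref, cur, top=3, left=3, size=4, search=3))
```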
Determination of the sequences of protein-derived peptides and peptide mixtures by mass spectrometry

PubMed Central

Morris, Howard R.; Williams, Dudley H.; Ambler, Richard P.

1971-01-01

Micro-quantities of protein-derived peptides have been converted into N-acetylated permethyl derivatives, and their sequences determined by low-resolution mass spectrometry without prior knowledge of their amino acid compositions or lengths. A new strategy is suggested for the mass spectrometric sequencing of oligopeptides or proteins, involving gel filtration of protein hydrolysates and subsequent sequence analysis of peptide mixtures. Finally, results are given that demonstrate for the first time the use of mass spectrometry for the analysis of a protein-derived peptide mixture, again without prior knowledge of the protein or components within the mixture. PMID:5158904
Bacterial population dynamics during the ensiling of Medicago sativa (alfalfa) and subsequent exposure to air.

PubMed

McGarvey, J A; Franco, R B; Palumbo, J D; Hnasko, R; Stanker, L; Mitloehner, F M

2013-06-01

To describe, at high resolution, the bacterial population dynamics and chemical transformations during the ensiling of alfalfa and subsequent exposure to air. Samples of alfalfa, ensiled alfalfa and silage exposed to air were collected and their bacterial population structures compared using 16S rRNA gene libraries containing approximately 1900 sequences each. Cultural and chemical analyses were also performed to complement the 16S gene sequence data. Sequence analysis revealed significant differences (P < 0·05) in the bacterial populations at each time point. The alfalfa-derived library contained mostly sequences associated with the Gammaproteobacteria (including the genera: Enterobacter, Erwinia and Pantoea); the ensiled material contained mostly sequences associated with the lactic acid bacteria (LAB) (including the genera: Lactobacillus, Pediococcus and Lactococcus). Exposure to air resulted in even greater percentages of LAB, especially among the genus Lactobacillus, and a significant drop in bacterial diversity. In-depth 16S rRNA gene sequence analysis revealed significant bacterial population structure changes during ensiling and again during exposure to air. This in-depth description of the bacterial population dynamics that occurred during ensiling and simulated feed out expands our knowledge of these processes. © 2013 The Society for Applied Microbiology No claim to US Government works.
Understanding Number Sequences Leads to Understanding Mathematics Concepts

ERIC Educational Resources Information Center

Pasnak, Robert; Schmerold, Katrina Lea; Robinson, Melissa Fetterer; Gadzichowski, K. Marinka; Bock, Allison M.; O'Brien, Sarah Eva; Kidd, Julie K.; Gallington, Deb A.

2016-01-01

Ninety-six first grade students in an urban school system were tested in October and May on reading, mathematics, and their understanding of sequences of letters and numbers. A time lag analysis was subsequently conducted. In such analyses, cross-correlations between the first measurement of one variable and the second measurement of another are…
galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

PubMed

Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

2004-06-12

The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes that the reference database has extensive sequence sampling. This is often not the case, which limits the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are available at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

PubMed

Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

2007-02-14

The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot subsequently be assigned to their original sources. We use conventional PCR with 5'-nucleotide-tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing on the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5'-tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that depends on the 5' nucleotide of the tag. In particular, primers 5'-labelled with a cytosine are heavily overrepresented among the final sequences, while those 5'-labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'-primer tagging is a useful method by which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products.
The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
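Assigning each read back to its source by its 5' tag amounts to a prefix lookup. A minimal sketch, assuming exact tag matches at the very start of each read (no sequencing errors within the tag) and a hypothetical tag-to-specimen table; the function name `demultiplex` and the example tags are illustrative, not the authors' pipeline.

```python
from typing import Dict, List, Tuple


def demultiplex(reads: List[str], tags: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each read to a specimen by its 5' tag and strip the tag.
    Longer tags are tried first; reads matching no tag are returned as unassigned."""
    tag_lengths = sorted({len(t) for t in tags}, reverse=True)
    assigned: Dict[str, List[str]] = {specimen: [] for specimen in tags.values()}
    unassigned: List[str] = []
    for read in reads:
        for n in tag_lengths:
            specimen = tags.get(read[:n])
            if specimen is not None:
                assigned[specimen].append(read[n:])  # keep the read without its tag
                break
        else:
            unassigned.append(read)
    return assigned, unassigned


if __name__ == "__main__":
    tags = {"CA": "specimen_1", "GT": "specimen_2"}  # hypothetical dinucleotide tags
    print(demultiplex(["CAACGT", "GTTTGA", "AAACGT"], tags))
```

Counting the reads collected under each tag is also the natural starting point for checking the tag-dependent representation bias reported in the abstract.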
Effects of Sequences of Cognitions on Group Performance Over Time

PubMed Central

Molenaar, Inge; Chiu, Ming Ming

2017-01-01

Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions. PMID:28490854
Effects of Sequences of Cognitions on Group Performance Over Time.

PubMed

Molenaar, Inge; Chiu, Ming Ming

2017-04-01

A Fly-Inspired Mushroom Bodies Model for Sensory-Motor Control Through Sequence and Subsequence Learning.

PubMed

Arena, Paolo; Calí, Marco; Patané, Luca; Portera, Agnese; Strauss, Roland

2016-09-01

Classification and sequence learning are relevant capabilities used by living beings to extract complex information from the environment for behavioral control. The insect world is full of examples where the presentation time of specific stimuli shapes the behavioral response. On the basis of previously developed neural models inspired by Drosophila melanogaster, a new architecture for classification and sequence learning is presented here from the perspective of the Neural Reuse theory. Classification of relevant input stimuli is performed through resonant neurons, activated by the complex dynamics generated in a lattice of recurrent spiking neurons modeling the insect Mushroom Bodies neuropile. The network devoted to context formation is able to reconstruct the learned sequence and also to trace the subsequences present in the provided input. A sensitivity analysis to parameter variation and noise is reported. Experiments on a roving robot show the capabilities of the architecture when used as a neural controller.
Emergence and subsequent functional specialization of kindlins during evolution of cell adhesiveness

PubMed Central

Meller, Julia; Rogozin, Igor B.; Poliakov, Eugenia; Meller, Nahum; Bedanov-Pack, Mark; Plow, Edward F.; Qin, Jun; Podrez, Eugene A.; Byzova, Tatiana V.

2015-01-01

Kindlins are integrin-interacting proteins essential for integrin-mediated cell adhesiveness. In this study, we focused on the evolutionary origin and functional specialization of kindlins as a part of the evolutionary adaptation of cell adhesive machinery. Database searches revealed that many members of the integrin machinery (including talin and integrins) existed before kindlin emergence in evolution. Among the analyzed species, all metazoan lineages—but none of the premetazoans—had at least one kindlin-encoding gene, whereas talin was present in several premetazoan lineages. Kindlin appears to originate from a duplication of the sequence encoding the N-terminal fragment of talin (the talin head domain) with a subsequent insertion of the PH domain of separate origin. Sequence analysis identified a member of the actin filament–associated protein 1 (AFAP1) superfamily as the most likely origin of the kindlin PH domain. The functional divergence between kindlin paralogues was assessed using the sequence swap (chimera) approach. Comparison of kindlin 2 (K2)/kindlin 3 (K3) chimeras revealed that the F2 subdomain, in particular its C-terminal part, is crucial for the differential functional properties of K2 and K3. The presence of this segment enables K2 but not K3 to localize to focal adhesions. Sequence analysis of the C-terminal part of the F2 subdomain of K3 suggests that insertion of a variable glycine-rich sequence in vertebrates contributed to the loss of constitutive K3 targeting to focal adhesions. Thus emergence and subsequent functional specialization of kindlins allowed multicellular organisms to develop additional tissue-specific adaptations of cell adhesiveness. PMID:25540429
Dissecting Sequences of Regulation and Cognition: Statistical Discourse Analysis of Primary School Children's Collaborative Learning

ERIC Educational Resources Information Center

Molenaar, Inge; Chiu, Ming Ming

2014-01-01

Extending past research showing that regulative activities (metacognitive and relational) can aid learning, this study tests whether sequences of cognitive, metacognitive and relational activities affect subsequent cognition. Scaffolded by a computer avatar, 54 primary school students (working in 18 groups of 3) discussed writing a report about a…
The ability of human nuclear DNA to cause false positive low-abundance heteroplasmy calls varies across the mitochondrial genome.

PubMed

Albayrak, Levent; Khanipov, Kamil; Pimenova, Maria; Golovko, George; Rojas, Mark; Pavlidis, Ioannis; Chumakov, Sergei; Aguilar, Gerardo; Chávez, Arturo; Widger, William R; Fofanov, Yuriy

2016-12-12

Low-abundance mutations in mitochondrial populations (mutations with minor allele frequency ≤ 1%) are associated with cancer, aging, and neurodegenerative disorders. While recent progress in high-throughput sequencing technology has significantly improved the heteroplasmy identification process, the ability of this technology to detect low-abundance mutations can be affected by the presence of similar sequences originating from nuclear DNA (nDNA). To determine to what extent nDNA can cause false positive low-abundance heteroplasmy calls, we identified the mitochondrial locations of all subsequences that are common or similar (one mismatch allowed) between nDNA and mitochondrial DNA (mtDNA). The analysis revealed up to a 25-fold variation in the lengths of the longest common and longest similar (one mismatch allowed) subsequences across the mitochondrial genome. In several regions of the mitochondrial genome, the longest subsequences shared between nDNA and mtDNA were found to be as short as 11 bases, which not only allows these regions to be used to design new, very specific PCR primers, but also supports the hypothesis of non-random introduction of mtDNA into human nuclear DNA. Analysis of the mitochondrial locations of the subsequences shared between nDNA and mtDNA suggested that even very short (36-base) single-end sequencing reads can be used to identify low-abundance variation in 20.4% of the mitochondrial genome. For longer (76- and 150-base) reads, the proportion of the mitochondrial genome where nDNA presence will not interfere was found to be 44.5% and 67.9%, respectively, whereas low-abundance mutations at 100% of locations can be identified using 417-base single reads. This observation suggests that the analysis of low-abundance variation in mitochondrial populations can be extended to a variety of large data collections such as the NCBI Sequence Read Archive, the European Nucleotide Archive, The Cancer Genome Atlas, and the International Cancer Genome Consortium.
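The core quantity here, how long a stretch starting at each mitochondrial position can grow before it no longer occurs in nuclear DNA, can be sketched with a naive scan. Assumptions: exact matches only (the study also considered one mismatch), toy-sized strings (real genomes would need an index such as a suffix array), and hypothetical function names. A read of length L starting at position i cannot be explained by an exact nDNA match when the longest shared stretch starting at i is shorter than L.

```python
from typing import List


def longest_shared_from(mt: str, nuc: str) -> List[int]:
    """For each start position i in `mt`, the length of the longest substring
    mt[i:i+L] that also occurs somewhere in `nuc` (exact matches only)."""
    result: List[int] = []
    for i in range(len(mt)):
        length = 0
        # If mt[i:i+L+1] occurs in nuc, so does every shorter prefix, so grow greedily.
        while i + length < len(mt) and mt[i:i + length + 1] in nuc:
            length += 1
        result.append(length)
    return result


def nuclear_safe_starts(mt: str, nuc: str, read_len: int) -> List[int]:
    """Start positions where a read of `read_len` cannot match nuc exactly."""
    shared = longest_shared_from(mt, nuc)
    return [i for i in range(len(mt) - read_len + 1) if shared[i] < read_len]


if __name__ == "__main__":
    mt, nuc = "ACGTACGGTT", "TTACGTA"  # toy stand-ins for mtDNA and nDNA
    print(longest_shared_from(mt, nuc))
    print(nuclear_safe_starts(mt, nuc, read_len=3))
```

Raising `read_len` enlarges the set of safe start positions, which mirrors the abstract's observation that longer reads leave a larger fraction of the mitochondrial genome unaffected by nDNA.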
RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

PubMed

Scheuch, Matthias; Höper, Dirk; Beer, Martin

2015-03-03

Fuelled by the advent and subsequent development of next-generation sequencing technologies, metagenomics has become a powerful tool for the analysis of microbial communities, both scientifically and diagnostically. The biggest challenge is extracting relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome this bottleneck, we developed an automated computational workflow called RIEMS (Reliable Information Extraction from Metagenomic Sequence datasets). RIEMS taxonomically assigns every individual read sequence within a dataset by cascading different sequence analyses with decreasing stringency of the assignments, using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS were demonstrated in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists in data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets were demonstrated with an early version of RIEMS in 2011, when it was used to detect the orthobunyavirus sequences that led to the discovery of Schmallenberg virus.
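The cascading strategy, with stages of decreasing stringency where each stage only sees reads that earlier stages could not assign, can be outlined as below. The classifiers are hypothetical placeholders for the external tools RIEMS runs, which this sketch does not model; `cascade_assign` and the stage names are illustrative.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A classifier returns a taxon name, or None if it cannot assign the read.
Classifier = Callable[[str], Optional[str]]


def cascade_assign(
    reads: Dict[str, str], stages: Sequence[Tuple[str, Classifier]]
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Apply stages in order of decreasing stringency. Each read keeps the first
    assignment it gets (with the name of the stage that made it); only reads
    still unassigned are passed on to the next, less stringent stage."""
    assignments: Dict[str, Tuple[str, str]] = {}
    pending = list(reads)  # read ids, in input order
    for stage_name, classify in stages:
        unassigned: List[str] = []
        for read_id in pending:
            taxon = classify(reads[read_id])
            if taxon is None:
                unassigned.append(read_id)
            else:
                assignments[read_id] = (taxon, stage_name)
        pending = unassigned
    return assignments, pending


if __name__ == "__main__":
    def exact(seq: str) -> Optional[str]:  # hypothetical strict stage
        return {"ACGTACGT": "Virus_X"}.get(seq)

    def relaxed(seq: str) -> Optional[str]:  # hypothetical permissive stage
        return "Bacteria" if seq.startswith("GG") else None

    reads = {"r1": "ACGTACGT", "r2": "GGAT", "r3": "TTTT"}
    print(cascade_assign(reads, [("exact", exact), ("relaxed", relaxed)]))
```

Recording which stage produced each assignment keeps the stringency of every call visible in the final taxonomic summary.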
Factors Associated With Surgery Clerkship Performance and Subsequent USMLE Step Scores.

PubMed

Dong, Ting; Copeland, Annesley; Gangidine, Matthew; Schreiber-Gregory, Deanna; Ritter, E Matthew; Durning, Steven J

2018-03-12

We conducted an in-depth empirical investigation to better understand the surgery clerkship from multiple perspectives, including the influence of clerkship sequence on performance, the relationship between self-logged work hours and performance, and the association between surgery clerkship performance and subsequent USMLE Step exam scores. The study cohort consisted of medical students graduating between 2015 and 2018 (n = 687). The primary measures of interest were clerkship sequence (internal medicine clerkship before or after surgery clerkship), self-logged work hours during the surgery clerkship, surgery NBME subject exam score, surgery clerkship overall grade, and Step 1, Step 2 CK, and Step 3 exam scores. We reported descriptive statistics and conducted correlation analysis, stepwise linear regression analysis, and variable selection analysis of logistic regression to answer the research questions. Students who completed the internal medicine clerkship before the surgery clerkship performed better on the surgery subject exam. The subject exam score explained an additional 28% of the variance of the Step 2 CK score, and the clerkship overall score accounted for an additional 24% of the variance after MCAT scores and undergraduate GPA were controlled for. Our findings suggest that clerkship sequence does matter for performance on the surgery NBME subject exam. Performance on the surgery subject exam is predictive of subsequent performance on USMLE Step exams. Copyright © 2018 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.

PubMed

Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie

2014-02-01

Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides high-speed genome mapping, ISRNA provides statistics on the genomic location, length distribution and nucleotide composition bias of sequence reads. It also reports the number of reads mapped to known microRNAs and other classes of short non-coding RNAs, the coverage of short reads on genes, and the expression abundance of sequence reads, and supports some other analysis functions. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.

A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1

PubMed Central

Reisman, Steven; Hatzopoulos, Thomas; Läufer, Konstantin; Thiruvathukal, George K.; Putonti, Catherine

2016-01-01

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. By taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, the pipeline aggregates sequence data from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses conducted for >6,000 HIV-1 sequences revealed that spatial and temporal factors influence the evolution of each individual gene in a distinct way. Nevertheless, signatures of origin can be extrapolated despite increased globalization. The approach developed here can easily be customized for any species of interest. PMID:26819543
REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

PubMed Central

Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

2009-01-01

The phylogenetic analysis of nucleotide sequences, and increasingly that of amino acid sequences, is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time-consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. After phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
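The relabelling workflow (rename FASTA headers with a species-plus-accession label, then apply the same labels to tree files) can be sketched as two small functions. This is not the REFGEN or TREENAMER code; it assumes the accession is the first word of each FASTA header and that tree files are in Newick format, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def rename_fasta(fasta: str, labels: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Replace each FASTA header with a user-chosen label, looked up by the
    accession (assumed to be the first word of the header). Returns the
    renamed FASTA text and a map from each new label to the original header."""
    out: List[str] = []
    original: Dict[str, str] = {}
    for line in fasta.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            accession = words[0] if words else ""
            label = labels.get(accession, accession)
            original[label] = line[1:]
            out.append(">" + label)
        else:
            out.append(line)  # sequence lines are copied unchanged
    return "\n".join(out) + "\n", original


def relabel_newick(tree: str, mapping: Dict[str, str]) -> str:
    """Rename labels in a Newick string; tokens not in `mapping`
    (including branch lengths) are left untouched."""
    return re.sub(r"[^(),:;\s]+", lambda m: mapping.get(m.group(0), m.group(0)), tree)


if __name__ == "__main__":
    species = {"AB123.1": "Homo_sapiens"}  # hypothetical accession -> species table
    labels = {acc: f"{sp}_{acc}" for acc, sp in species.items()}
    fasta = ">AB123.1 cytochrome b\nACGT\n>XY999.2\nGGCC\n"
    print(rename_fasta(fasta, labels)[0])
    print(relabel_newick("(AB123.1:0.1,XY999.2:0.2);", labels))
```

Keeping the label-to-original map makes the renaming reversible, so the same table can later restore accession-only names if a downstream tool requires them.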
A novel model for DNA sequence similarity analysis based on graph theory.

PubMed

Qi, Xingqin; Wu, Qin; Zhang, Yusen; Fuller, Eddie; Zhang, Cun-Quan

2011-01-01

Determination of sequence similarity is one of the major steps in computational phylogenetic studies. During evolutionary history, not only mutations of individual nucleotides but also subsequent rearrangements have occurred. Developing novel mathematical descriptors for similarity analysis that capture information about these various mutation phenomena simultaneously has been one of the major tasks of computational biologists. In this paper, unlike traditional methods that use, e.g., nucleotide frequencies or geometric representations as the basis for constructing mathematical descriptors, we construct novel mathematical descriptors based on graph theory. In particular, for each DNA sequence we set up a weighted directed graph. The adjacency matrix of the directed graph is used to induce a representative vector for the DNA sequence. This new approach measures similarity based on both the ordering and the frequency of nucleotides, so that much more information is captured. As an application, the method is tested on a set of 0.9-kb mtDNA sequences from twelve different primate species. All output phylogenetic trees obtained with various distance estimations have the same topology and are generally consistent with results reported in earlier studies, which supports the new method's effectiveness; we also test the new method on a simulated data set, which shows that it performs better than the traditional global alignment method when subsequent rearrangements happen frequently during evolutionary history.
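A simplified descriptor in the spirit of this approach: build a weighted directed graph on the four nucleotides, flatten its adjacency matrix into a vector, and compare vectors by Euclidean distance. The paper's exact edge weighting is not given here, so this sketch assumes edges weighted by normalized counts of adjacent nucleotide pairs, which reflects both ordering and frequency; the function names are hypothetical.

```python
import math
from typing import List

BASES = "ACGT"


def transition_vector(seq: str) -> List[float]:
    """Weighted directed graph on A, C, G, T: edge u->v is weighted by how often
    v immediately follows u (assumed weighting). The 4x4 adjacency matrix is
    normalized to sum to 1 and flattened row by row into a 16-dim vector."""
    index = {b: i for i, b in enumerate(BASES)}
    adj = [[0.0] * 4 for _ in range(4)]
    for u, v in zip(seq, seq[1:]):
        if u in index and v in index:  # skip ambiguous symbols such as N
            adj[index[u]][index[v]] += 1
    total = sum(map(sum, adj)) or 1.0
    return [w / total for row in adj for w in row]


def distance(a: str, b: str) -> float:
    """Euclidean distance between the two sequences' descriptor vectors."""
    return math.dist(transition_vector(a), transition_vector(b))


if __name__ == "__main__":
    seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"}
    for x in seqs:
        print(x, [round(distance(seqs[x], seqs[y]), 3) for y in seqs])
```

A distance matrix like this one could then be handed to a standard tree-building method (e.g. neighbor joining), which is how such descriptors are typically used for phylogenetic comparison.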
Identification of Bacillus Probiotics Isolated from Soil Rhizosphere Using 16S rRNA, recA, rpoB Gene Sequencing and RAPD-PCR.

PubMed

Mohkam, Milad; Nezafat, Navid; Berenjian, Aydin; Mobasher, Mohammad Ali; Ghasemi, Younes

2016-03-01

Some Bacillus species, especially those in the Bacillus subtilis and Bacillus pumilus groups, have highly similar 16S rRNA gene sequences, which makes them hard to identify by 16S rDNA sequence analysis. To overcome this limitation, rpoB and recA sequence analysis, together with randomly amplified polymorphic DNA (RAPD) fingerprinting, was examined as an alternative method for differentiating Bacillus species. The 16S rRNA, rpoB and recA genes were amplified by polymerase chain reaction using gene-specific primers. The resulting PCR amplicons were sequenced, and phylogenetic analysis was performed with MEGA 6 software. Identification based on 16S rRNA gene sequencing was supported by rpoB and recA gene sequencing as well as by the RAPD-PCR technique. Subsequently, concatenation and phylogenetic analysis showed that the extent of diversity and similarity was better resolved with the rpoB and recA primers, a result reinforced by RAPD-PCR. However, in one case these approaches failed to identify an isolate; combining them with phenotypic methods resolved this issue. Overall, RAPD fingerprinting and rpoB, recA and concatenated gene sequence analysis discriminated closely related Bacillus species, which highlights the value of a multigenic method for distinguishing Bacillus strains more precisely. This research emphasizes the advantage of RAPD fingerprinting and rpoB and recA sequence analysis over 16S rRNA gene sequence analysis for the suitable and effective identification of Bacillus species, as recommended for probiotic products.
Molecular identification and phylogenetic analysis of Wuchereria bancrofti from human blood samples in Egypt.

PubMed

Abdel-Shafi, Iman R; Shoieb, Eman Y; Attia, Samar S; Rubio, José M; Ta-Tang, Thuy-Huong; El-Badry, Ayman A

2017-03-01

Lymphatic filariasis (LF) is a serious vector-borne health problem; Wuchereria bancrofti (W.b) is the major cause of LF worldwide and is focally endemic in Egypt. Identification of filarial infection using traditional morphologic and immunological criteria can be difficult and can lead to misdiagnosis. The aim of the present study was the molecular detection of W.b in residents of endemic areas in Egypt, together with sequence variance analysis and phylogenetic analysis of W.b DNA. Blood samples collected from residents of filariasis-endemic areas in five governorates were subjected to semi-nested PCR targeting a repeated DNA sequence for detection of W.b DNA. PCR products were sequenced, and a phylogenetic analysis of the obtained sequences was subsequently performed. Of 300 blood samples, W.b DNA was identified in 48 (16%). Sequencing confirmed the PCR results, identifying only W.b species. Sequence alignment and phylogenetic analysis indicated genetically distinct clusters of W.b within the study population. The results demonstrated that semi-nested PCR is an effective diagnostic tool for accurate and rapid detection of W.b infections in nano-epidemics and is applicable to samples collected in the daytime as well as at night. Sequencing of PCR products and phylogenetic analysis revealed three different nucleotide sequence variants. Further genetic studies of W.b in Egypt and other endemic areas are needed to distinguish related strains and the various ecological and drug effects exerted on them, in support of W.b elimination.
Identification of RAN1 orthologue associated with sex determination through whole genome sequencing analysis in fig (Ficus carica L.).

PubMed

Mori, Kazuki; Shirasawa, Kenta; Nogata, Hitoshi; Hirata, Chiharu; Tashiro, Kosuke; Habu, Tsuyoshi; Kim, Sangwan; Himeno, Shuichi; Kuhara, Satoru; Ikegami, Hidetoshi

2017-01-25

With the aim of identifying sex determinants of fig, we generated the first draft genome sequence of fig and conducted subsequent analyses. Linkage analysis with a high-density genetic map established by a restriction-site associated sequencing technique, and a genome-wide association study followed by whole-genome resequencing analysis, identified two missense mutations in an orthologue of RESPONSIVE-TO-ANTAGONIST1 (RAN1), encoding a copper-transporting ATPase, that were completely associated with the sex phenotypes of the investigated figs. This result suggests that RAN1 is a candidate sex determinant in the fig genome. The genomic resources and genetic findings obtained in this study can contribute to a general understanding of Ficus species and provide insight into the sex determination systems of fig and other plants.
SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

PubMed

Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

2014-08-08

Next-generation sequencers (NGSs) have become one of the main tools of current biology. To obtain useful insights from NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in the sequencing fluidics. We developed SUGAR (subtile-based GUI-assisted refiner), software that can handle ultra-high-throughput data through a user-friendly graphical user interface (GUI) with interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during sequencing. Sequencing data generated from error-affected regions of a flowcell can be selectively removed by automated analysis or by GUI-assisted operations implemented in SUGAR. The automated data-cleaning function, based on sequence read quality (Phred) scores, was applied to a public whole human genome sequencing dataset, and we showed that the overall mapping quality improved. The detailed data evaluation and cleaning enabled by SUGAR should reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. The software will therefore be especially useful for controlling the quality of variant calls in cell populations present at low frequency in a sample (e.g., cancer cells) when the sequencing procedure is affected by technical errors.
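SUGAR's own cleaning works on flowcell subtiles; the sketch below shows only the underlying idea of Phred-based filtering at the level of individual reads. It assumes Phred+33 ASCII encoding, and the mean-quality threshold of 20 (a 1% per-base error probability) is a hypothetical choice, not a SUGAR default.

```python
def phred_scores(quality, offset=33):
    """Decode an ASCII quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_quality(quality, offset=33):
    """Average Phred score of a read; 0.0 for an empty quality string."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(reads, min_mean_q=20, offset=33):
    """Keep (read_id, sequence, quality) tuples whose mean Phred score >= min_mean_q."""
    return [r for r in reads if mean_quality(r[2], offset) >= min_mean_q]
```

A mean-based cutoff is the simplest choice; per-position trimming or, as in SUGAR, removal of reads from error-affected flowcell regions preserves more good data.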
Matrix metalloproteinases: structures, evolution, and diversification.

PubMed

Massova, I; Kotra, L P; Fridman, R; Mobashery, S

1998-09-01

A comprehensive sequence alignment of 64 members of the matrix metalloproteinase (MMP) family has been performed, first for the entire sequences and subsequently for the catalytic and hemopexin-like domains. The 64 MMPs were selected from plants, invertebrates, and vertebrates. The analyses revealed that as many as 23 distinct subfamilies of these proteins exist. Information from the sequence alignments was correlated with structures, both crystallographic and computational, of the catalytic domains of the 23 representative members of the MMP family. The metal-binding sites and two loops containing variable amino acid sequences, which are important for substrate interactions, are surveyed and discussed. The collective data support the proposal that the assembly of the domains into multidomain enzymes was likely an early evolutionary event. This was followed by diversification, perhaps in parallel among the MMPs, on a subsequent evolutionary time scale. The analysis indicates that a retrograde structural simplification may account for the evolution of MMPs with simple domain constituents, such as matrilysin, from larger and more elaborate enzymes.
KRAS Mutation Test in Korean Patients with Colorectal Carcinomas: A Methodological Comparison between Sanger Sequencing and a Real-Time PCR-Based Assay.

PubMed

Lee, Sung Hak; Chung, Arthur Minwoo; Lee, Ahwon; Oh, Woo Jin; Choi, Yeong Jin; Lee, Youn-Soo; Jung, Eun Sun

2017-01-01

Mutations in the KRAS gene have been identified in approximately 50% of colorectal cancers (CRCs). KRAS mutations are well-established biomarkers in anti-epidermal growth factor receptor therapy. Therefore, assessment of KRAS mutations is needed in CRC patients to ensure appropriate treatment. We compared the analytical performance of the cobas test to Sanger sequencing in 264 CRC cases. In addition, discordant specimens were evaluated by 454 pyrosequencing. KRAS mutations in codons 12/13 were detected in 43.2% of cases (114/264) by Sanger sequencing. Of 257 specimens evaluable for comparison, KRAS mutations were detected in 112 cases (43.6%) by Sanger sequencing and 118 cases (45.9%) by the cobas test. Concordance between the cobas test and Sanger sequencing for each lot was 93.8% positive percent agreement (PPA) and 91.0% negative percent agreement (NPA) for codons 12/13. Results from the cobas test and Sanger sequencing were discordant for 20 cases (7.8%). These 20 discrepant cases were subsequently subjected to 454 pyrosequencing. After comprehensive analysis of the results from combined Sanger sequencing-454 pyrosequencing and the cobas test, PPA was 97.5% and NPA was 100%. The cobas test is an accurate and sensitive test for detecting KRAS-activating mutations and has analytical power equivalent to Sanger sequencing. Prescreening using the cobas test with subsequent application of Sanger sequencing is the best strategy for routine detection of KRAS mutations in CRC.
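The abstract does not list the individual cells of the 2×2 comparison table, but they follow uniquely from the reported totals: with 112 Sanger-positive and 118 cobas-positive cases among 257, and 20 discordant cases, 7 are Sanger-positive/cobas-negative, 13 are Sanger-negative/cobas-positive, 105 are positive by both and 132 negative by both. The sketch below, with a hypothetical helper name and Sanger sequencing taken as the reference, recomputes the reported agreement figures from those counts.

```python
def percent_agreement(both_pos, ref_pos_only, test_pos_only, both_neg):
    """PPA, NPA and overall discordance of a test against a reference, in percent.

    ref_pos_only:  reference positive, test negative
    test_pos_only: reference negative, test positive
    """
    ppa = 100 * both_pos / (both_pos + ref_pos_only)  # agreement on reference positives
    npa = 100 * both_neg / (both_neg + test_pos_only)  # agreement on reference negatives
    total = both_pos + ref_pos_only + test_pos_only + both_neg
    discordant = 100 * (ref_pos_only + test_pos_only) / total
    return ppa, npa, discordant


# Counts implied by the reported totals (Sanger as reference, cobas as test).
ppa, npa, disc = percent_agreement(both_pos=105, ref_pos_only=7, test_pos_only=13, both_neg=132)
print(f"PPA {ppa:.1f}%, NPA {npa:.1f}%, discordant {disc:.1f}%")
# prints: PPA 93.8%, NPA 91.0%, discordant 7.8%
```

PPA and NPA are used instead of sensitivity and specificity because the reference method (Sanger sequencing) is itself imperfect, which is why the discordant cases were resolved by a third method.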
Forensic massively parallel sequencing data analysis tool: Implementation of MyFLq as a standalone web- and Illumina BaseSpace(®)-application.

PubMed

Van Neste, Christophe; Gansemans, Yannick; De Coninck, Dieter; Van Hoofstat, David; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

2015-03-01

Routine use of massively parallel sequencing (MPS) for forensic genomics is on the horizon. In the last few years, several algorithms and workflows have been developed to analyze forensic MPS data. However, none have yet been tailored to the needs of forensic analysts who do not possess an extensive bioinformatics background. We developed our previously published forensic MPS data analysis framework MyFLq (My-Forensic-Loci-queries) into an open-source, user-friendly, web-based application. It can be installed as a standalone web application or run directly from the Illumina BaseSpace environment. In the former case, laboratories can keep their data on-site; in the latter, data from forensic samples sequenced on an Illumina sequencer can be uploaded to BaseSpace during acquisition and subsequently analyzed using the published MyFLq BaseSpace application. Additional features were implemented, such as an interactive graphical report of the results, an interactive threshold selection bar, and an allele length-based analysis in addition to the sequence-based analysis. Practical use of the application is demonstrated through the analysis of four 16-plex short tandem repeat (STR) samples, showing the complementarity between the sequence- and length-based analyses of the same MPS data. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
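A toy illustration (not MyFLq code) of why sequence-based and length-based views of the same STR reads complement each other: reads of identical length but with a variant repeat unit collapse into one allele when grouped by length, yet remain distinct when grouped by exact sequence. The read strings and the helper name are made-up examples.

```python
from collections import Counter


def allele_counts(reads):
    """Tally STR reads two ways: by exact sequence and by length only."""
    by_sequence = Counter(reads)
    by_length = Counter(len(r) for r in reads)
    return by_sequence, by_length


reads = ["AGAT" * 3, "AGAT" * 3, "AGAC" + "AGAT" * 2]
by_seq, by_len = allele_counts(reads)
print(len(by_seq), dict(by_len))  # two sequence alleles, one length allele
```

The length-based view corresponds to what capillary electrophoresis reports, so it eases comparison with existing STR profiles, while the sequence-based view exposes the extra variation.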
The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

NASA Astrophysics Data System (ADS)

Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

2017-07-01

Earthquake clustering is an essential part of almost any statistical analysis of the spatial and temporal properties of seismic activity. The nature of earthquake clusters and the subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology for identifying earthquake clusters that uses an adaptive point process for spatio-temporal cluster identification. It uses the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation, and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures, such as the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest-neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. In total, more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Using the knowledge of cluster classification, the method has been adapted to provide an earthquake declustering algorithm, which has been compared with existing methods; its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and for general aftershock activity metrics.
Dialog detection in narrative video by shot and face analysis

NASA Astrophysics Data System (ADS)

Kroon, B.; Nesvadba, J.; Hanjalic, A.

2007-01-01

The proliferation of captured personal and broadcast content in personal consumer archives necessitates convenient access to stored audiovisual content. Intuitive retrieval and navigation solutions, however, require a semantic level that cannot be reached by generic multimedia content analysis alone. Fusion with film grammar rules can significantly boost reliability. This paper describes the fusion of low-level content analysis cues, including face parameters and inter-shot similarities, to segment commercial content into film-grammar-rule-based entities and subsequently classify those sequences into so-called shot reverse shots, i.e. dialog sequences. Moreover, mid-level cues specific to shot reverse shots are analyzed, augmenting the shot reverse shot information with dialog-specific descriptions.
Evidence for tyrosine-linked glycosaminoglycan in a bacterial surface protein.

PubMed

Peters, J; Rudolf, S; Oschkinat, H; Mengele, R; Sumper, M; Kellermann, J; Lottspeich, F; Baumeister, W

1992-04-01

The S-layer protein of Acetogenium kivui was subjected to proteolysis with different proteases, and several high-molecular-mass glycosaminoglycan peptides containing glucose, galactosamine and an unidentified sugar-related component were separated by molecular sieve chromatography and reversed-phase HPLC and subjected to N-terminal sequence analysis. Methylation analysis showed that glucose was uniformly 1,6-linked, whereas galactosamine was exclusively 1,4-linked. Hydrazinolysis and subsequent amino-acid analysis, as well as two-dimensional NMR spectroscopy, were used to demonstrate that in these peptides the carbohydrate was covalently linked to tyrosine. As all four Tyr-glycosylation sites were found to be preceded by valine, a new recognition sequence for glycosylation is suggested.
Analysis of the Genome and Chromium Metabolism-Related Genes of Serratia sp. S2.

PubMed

Dong, Lanlan; Zhou, Simin; He, Yuan; Jia, Yan; Bai, Qunhua; Deng, Peng; Gao, Jieying; Li, Yingli; Xiao, Hong

2018-05-01

This study investigated the genome sequence of Serratia sp. S2. The genomic DNA of Serratia sp. S2 was extracted and a sequencing library was constructed. Sequencing was carried out on an Illumina 2000, and complete genomic sequences were obtained. Gene function annotation and bioinformatics analysis were performed by comparison with known databases. The genome size of Serratia sp. S2 was 5,604,115 bp and the G+C content was 57.61%. There were 5373 protein-coding genes, of which 3732, 3614, and 3942 were annotated in the GO, KEGG, and COG databases, respectively. Twelve genes related to chromium metabolism were found in the Serratia sp. S2 genome. The whole genome sequence of Serratia sp. S2 has been submitted to the GenBank database under accession number LNRP00000000. Our findings may provide a theoretical basis for the subsequent development of new biotechnology for remediating environmental chromium pollution.
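The G+C content quoted above is the fraction of G and C bases in the assembled sequence. A minimal sketch of that calculation follows; ignoring ambiguity codes such as N in the denominator is an assumption, since pipelines differ in how they treat them, and the function name is hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous A/C/G/T bases (other symbols ignored)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / acgt
```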
Identification of the ancestral haplotype for apolipoprotein B suggests an African origin of Homo sapiens sapiens and traces their subsequent migration to Europe and the Pacific.

PubMed Central

Rapacz, J; Chen, L; Butler-Brunner, E; Wu, M J; Hasler-Rapacz, J O; Butler, R; Schumaker, V N

1991-01-01

The probable ancestral haplotype for human apolipoprotein B (apoB) has been identified through immunological analysis of chimpanzee and gorilla serum and sequence analysis of their DNA. Moreover, the frequency of this ancestral apoB haplotype among different human populations provides strong support for the African origin of Homo sapiens sapiens and their subsequent migration from Africa to Europe and to the Pacific. The approach used here for the identification of the ancestral human apoB haplotype is likely to be applicable to many other genes. PMID:1996341
Identification of the ancestral haplotype for apolipoprotein B suggests an African origin of Homo sapiens sapiens and traces their subsequent migration to Europe and the Pacific

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rapacz, J.; Hasler-Rapacz, J.O.; Chen, L.

1991-02-15

The probable ancestral haplotype for human apolipoprotein B (apoB) has been identified through immunological analysis of chimpanzee and gorilla serum and sequence analysis of their DNA. Moreover, the frequency of this ancestral apoB haplotype among different human populations provides strong support for the African origin of Homo sapiens sapiens and their subsequent migration from Africa to Europe and to the Pacific. The approach used here for the identification of the ancestral human apoB haplotype is likely to be applicable to many other genes.
Evaluation and Adaptation of a Laboratory-Based cDNA Library Preparation Protocol for Retrospective Sequencing of Archived MicroRNAs from up to 35-Year-Old Clinical FFPE Specimens

PubMed Central

Loudig, Olivier; Wang, Tao; Ye, Kenny; Lin, Juan; Wang, Yihong; Ramnauth, Andrew; Liu, Christina; Stark, Azadeh; Chitale, Dhananjay; Greenlee, Robert; Multerer, Deborah; Honda, Stacey; Daida, Yihe; Spencer Feigelson, Heather; Glass, Andrew; Couch, Fergus J.; Rohan, Thomas; Ben-Dov, Iddo Z.

2017-01-01

Formalin-fixed paraffin-embedded (FFPE) specimens, when used in conjunction with patient clinical data history, represent an invaluable resource for molecular studies of cancer. Even though nucleic acids extracted from archived FFPE tissues are degraded, their molecular analysis has become possible. In this study, we optimized a laboratory-based next-generation sequencing barcoded cDNA library preparation protocol for analysis of small RNAs recovered from archived FFPE tissues. Using matched fresh and FFPE specimens, we evaluated the robustness and reproducibility of our optimized approach, as well as its applicability to archived clinical specimens stored for up to 35 years. We then evaluated this cDNA library preparation protocol by performing a miRNA expression analysis of archived breast ductal carcinoma in situ (DCIS) specimens, selected for their relation to the risk of subsequent breast cancer development and obtained from six different institutions. Our analyses identified six miRNAs (miR-29a, miR-221, miR-375, miR-184, miR-363, miR-455-5p) differentially expressed between DCIS lesions from women who subsequently developed an invasive breast cancer (cases) and women who did not develop invasive breast cancer within the same time interval (controls). Our thorough evaluation and application of this laboratory-based miRNA sequencing analysis indicates that the preparation of small RNA cDNA libraries can reliably be performed on older, archived, clinically classified specimens. PMID:28335433
Codebook-based electrooculography data analysis towards cognitive activity recognition.

PubMed

Lagodzinski, P; Shirahama, K; Grzegorzek, M

2018-04-01

With advances in mobile and wearable technology, people have started to use a variety of sensing devices to track their daily activities as well as their health and fitness in order to improve their quality of life. This work addresses eye movement analysis, which, owing to its strong correlation with cognitive tasks, can be successfully utilized in activity recognition. Eye movements are recorded using an electrooculographic (EOG) system built into the frames of glasses, which can be worn more unobtrusively and comfortably than other devices. Since the obtained information is low-level sensor data expressed as a sequence of values sampled at constant intervals (100 Hz), the cognitive activity recognition problem is formulated as sequence classification. However, it is unclear which features are useful for accurate cognitive activity recognition. Thus, a codebook approach is applied, which, instead of focusing on feature engineering, uses a distribution of characteristic subsequences (codewords) to describe sequences of recorded EOG data, where the codewords are obtained by clustering a large number of subsequences. Further, statistical analysis of the codeword distribution reveals features that are characteristic of a certain activity class. Experimental results demonstrate good accuracy of codebook-based cognitive activity recognition, reflecting the effective use of the codewords. Copyright © 2017 Elsevier Ltd. All rights reserved.
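The assignment step of a codebook approach can be sketched as follows: slide a window over the signal, map each subsequence to its nearest codeword, and describe the whole sequence by the normalized codeword histogram. This assumes the codebook has already been learned (the clustering step is not shown), and the window width, step and squared Euclidean distance are illustrative assumptions rather than the paper's settings. All function names are hypothetical.

```python
def sliding_windows(signal, width, step=1):
    """All subsequences of length `width`, taken every `step` samples."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]


def nearest_codeword(codebook, sub):
    """Index of the codeword with the smallest squared Euclidean distance to `sub`."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], sub)),
    )


def codeword_histogram(signal, codebook, width, step=1):
    """Normalized frequency of each codeword among the signal's subsequences."""
    counts = [0] * len(codebook)
    subs = sliding_windows(signal, width, step)
    for sub in subs:
        counts[nearest_codeword(codebook, sub)] += 1
    n = len(subs) or 1  # avoid division by zero for signals shorter than `width`
    return [c / n for c in counts]
```

The resulting fixed-length histogram can be passed to any standard classifier, which is what lets the approach sidestep hand-crafted feature engineering.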
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

PubMed Central

Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many studies, underscoring the value of sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts helps in understanding membrane protein folding and the retention of the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information on interacting sequence parts. Applied to aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and, ultimately, the investigation of natural variants that cause diseases such as nephrogenic diabetes insipidus. In this work we demonstrate a simple method for large-scale membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts that are constrained by protein evolution. We present a simple graphical visualization for the representation of evolutionary influenced interaction pattern pairs (EIPPs), adapted to mutagenesis investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder nephrogenic diabetes insipidus, and of membrane proteins in general. Furthermore, we present a new method for deriving new evolutionary variations within EIPPs that can be used in further mutagenesis laboratory investigations. PMID:26180540
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus.

PubMed

Grunert, Steffen; Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many studies, underscoring the value of sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts helps in understanding membrane protein folding and the retention of the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information on interacting sequence parts. Applied to aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and, ultimately, the investigation of natural variants that cause diseases such as nephrogenic diabetes insipidus. In this work we demonstrate a simple method for large-scale membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts that are constrained by protein evolution. We present a simple graphical visualization for the representation of evolutionary influenced interaction pattern pairs (EIPPs), adapted to mutagenesis investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder nephrogenic diabetes insipidus, and of membrane proteins in general. Furthermore, we present a new method for deriving new evolutionary variations within EIPPs that can be used in further mutagenesis laboratory investigations.

A sequential analysis of classroom discourse in Italian primary schools: the many faces of the IRF pattern.

PubMed

Molinari, Luisa; Mameli, Consuelo; Gnisci, Augusto

2013-09-01

A sequential analysis of classroom discourse is needed to investigate the conditions under which the triadic initiation-response-feedback (IRF) pattern may host different teaching orientations. The purpose of the study is twofold: first, to describe the characteristics of classroom discourse and, second, to identify and explore the different interactive sequences that can be captured with a sequential statistical analysis. Twelve whole-class activities were video recorded in three Italian primary schools. We observed classroom interaction as it occurs naturally on an everyday basis. In total, we collected 587 min of video recordings. Subsequently, 828 triadic IRF patterns were extracted from this material and analysed with the programme Generalized Sequential Query (GSEQ). The results indicate that classroom discourse may unfold in different ways. In particular, we identified and described four types of sequences. Dialogic sequences were triggered by authentic questions, and continued through further relaunches. Monologic sequences were directed to fulfil the teachers' pre-determined didactic purposes. Co-constructive sequences fostered deduction, reasoning, and thinking. Scaffolding sequences helped and sustained children with difficulties. The application of sequential analyses allowed us to show that interactive sequences may account for a variety of meanings, thus making a significant contribution to the literature and research practice in classroom discourse. © 2012 The British Psychological Society.
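A minimal sketch of the lag-1 transitional probabilities that underlie a sequential analysis of coded discourse, assuming each turn has already been coded with a category label. The codes used here (I, R, F) are illustrative, the function name is hypothetical, and GSEQ itself offers further statistics that this sketch does not reproduce.

```python
from collections import Counter


def transition_probabilities(codes):
    """Lag-1 transitional probabilities P(next = b | current = a) from a coded event sequence."""
    pairs = Counter(zip(codes, codes[1:]))  # counts of each (antecedent, consequent) pair
    antecedents = Counter(codes[:-1])  # how often each code is followed by anything
    return {(a, b): n / antecedents[a] for (a, b), n in pairs.items()}


events = ["I", "R", "F", "I", "R", "I"]
print(transition_probabilities(events))
```

Patterns such as a feedback move that reliably triggers a new initiation show up as high conditional probabilities, which is the kind of regularity a sequential analysis is designed to detect.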
Data compression of discrete sequence: A tree based approach using dynamic programming

NASA Technical Reports Server (NTRS)

Shivaram, Gurusrasad; Seetharaman, Guna; Rao, T. R. N.

1994-01-01

A dynamic programming based approach for data compression of a 1D sequence is presented. An input sequence of size N is compressed to a smaller size k by dividing it into k subsequences and replacing each subsequence by its average value. The partitioning of the input sequence is carried out so as to minimize the mean squared error of the reconstructed sequence. The complexity of finding the partitions that yield such an optimal compressed sequence is reduced by the dynamic programming approach presented here.
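A minimal sketch of the optimal k-segment partition by dynamic programming, assuming the criterion is the total squared error (minimizing it also minimizes the mean squared error, since N is fixed). This is the straightforward O(k·N²) recurrence with prefix sums, not necessarily the paper's exact tree-based formulation; `compress_sequence` is a hypothetical name.

```python
def compress_sequence(seq, k):
    """Split `seq` into k contiguous segments minimizing total squared error,
    replacing each segment by its mean. Returns (means, boundaries, error)."""
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(seq)")
    # Prefix sums of x and x^2 give any segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(seq):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def seg_error(i, j):
        """Squared error of seq[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best error for seq[:j] using exactly m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # last segment is seq[i:j]
                c = cost[m - 1][i] + seg_error(i, j)
                if c < cost[m][j]:
                    cost[m][j], split[m][j] = c, i

    # Walk the split table backwards to recover segment boundaries.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = split[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    means = [(s1[j] - s1[i]) / (j - i) for i, j in bounds]
    return means, bounds, cost[k][n]
```

For example, `compress_sequence([1, 1, 1, 5, 5, 5], 2)` splits at index 3 and returns the means `[1.0, 5.0]` with zero error. Exhaustive search over all boundary placements would grow combinatorially with k, whereas the table above reuses each sub-solution.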
Whole exome sequencing: a state-of-the-art approach for defining (and exploring!) genetic landscapes in pediatric nephrology.

PubMed

Gulati, Ashima; Somlo, Stefan

2018-05-01

The genesis of whole exome sequencing as a powerful tool for detailing the protein-coding sequence of the human genome was conceptualized on the basis of next-generation sequencing technology and knowledge of the human reference genome. Pediatric nephrology, a field rich in molecularly unsolved phenotypes, lends itself to the clinical and research application of whole exome sequencing to enable novel gene discovery and to correct phenotypic misclassification. Recent studies in the field have shown that newer high-throughput sequencing techniques are likely to be of high yield when applied in conjunction with conventional genomic approaches, such as linkage analysis, and other strategies used to focus subsequent analysis. They have also emphasized the need to validate novel genetic findings in large collaborative cohorts and to produce robust corroborative biological data. The well-structured application of comprehensive genomic testing in clinical and research arenas will hopefully continue to advance patient care and precision medicine, but it calls for attention to its attendant challenges.
Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences

NASA Technical Reports Server (NTRS)

Budalakoti, Suratna; Srivastava, Ashok N.; Akella, Ram; Turkov, Eugene

2006-01-01

This paper addresses the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. The approach taken uses unsupervised clustering of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by detailed analysis of outliers to detect anomalies. As the LCS measure is expensive to compute, the first part of the paper discusses existing algorithms, such as the Hunt-Szymanski algorithm, that have low time-complexity. We then discuss why these algorithms often do not work well in practice and present a new hybrid algorithm for computing the LCS that, in our tests, outperforms the Hunt-Szymanski algorithm by a factor of five. The second part of the paper presents new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence, compared to more normal sequences. The algorithms we present are general and domain-independent, so we discuss applications in related areas such as anomaly detection.
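A minimal sketch of the normalized LCS similarity, using the classic O(mn) dynamic program rather than the Hunt-Szymanski algorithm or the paper's hybrid algorithm. Normalizing by the geometric mean of the two lengths, |LCS| / sqrt(|a|·|b|), is an assumption here, and the function names are hypothetical. The code works on any sequences of hashable symbols, such as lists of event codes.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence via the classic O(len(a)*len(b)) DP,
    keeping only one previous row in memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            if x == y:
                cur.append(prev[j] + 1)  # extend a common subsequence diagonally
            else:
                cur.append(max(prev[j + 1], cur[j]))  # best of skipping x or y
        prev = cur
    return prev[-1]


def normalized_lcs(a, b):
    """Similarity in [0, 1]: |LCS| / sqrt(|a| * |b|) (normalization assumed)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / math.sqrt(len(a) * len(b))
```

Using `1 - normalized_lcs(a, b)` as a distance gives a matrix that an unsupervised clustering step can consume; sequences far from every cluster then become outlier candidates.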
Two general models that generate long range correlation

NASA Astrophysics Data System (ADS)

Gan, Xiaocong; Han, Zhangang

2012-06-01

In this paper we study two models that generate sequences with LRC (long range correlation). For the IFT (inverse Fourier transform) model, we conclude that the low-frequency part leads to LRC, while the high-frequency part tends to eliminate it. Therefore, a typical method for generating a sequence with LRC is to multiply the spectrum of a white noise sequence by a decaying function. A special case is analyzed: the linear combination of a smooth curve and a white noise sequence, for which the DFA plot consists of two line segments. For the patch model, we conclude that long subsequences lead to LRC, while short subsequences tend to eliminate it. Therefore, a sequence with LRC can be generated by using a fat-tailed PDF (probability distribution function) for the lengths of the subsequences. A special case is also analyzed: if a patch model with long subsequences is mixed with a white noise sequence, the DFA plot consists of two line segments. We have checked known models and actual data and found that they are all consistent with this study.
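A minimal sketch of a patch-model generator, written under assumptions the abstract does not fix: each patch is a run of a constant value +1 or -1 chosen at random, and patch lengths are drawn from a fat-tailed Pareto distribution with exponent `alpha`. A smaller `alpha` gives a heavier tail and therefore more very long patches, which by the abstract's conclusion should strengthen LRC. Verifying LRC would require a DFA fit, which is not shown. `patch_sequence` is a hypothetical name.

```python
import random


def patch_sequence(n, alpha=1.5, seed=None):
    """Hypothetical patch model: consecutive constant patches of value +1 or -1
    whose lengths follow a fat-tailed Pareto distribution."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        length = int(rng.paretovariate(alpha))  # paretovariate returns values >= 1
        value = rng.choice((-1.0, 1.0))
        out.extend([value] * min(length, n - len(out)))  # truncate the final patch
    return out
```

Replacing the Pareto draw with a thin-tailed one (for example, geometric lengths) would, by the same reasoning, produce mostly short patches and little long range correlation.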
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

PubMed Central

Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

2015-01-01

Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly, followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence to below 1% at each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
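A minimal sketch of the final minority-variant step only, assuming reads have already been mapped and summarized as a per-position pileup (the de novo assembly, iterative mapping and error correction described above are not reproduced). The pileup input format and the function name are hypothetical; the strict ">1%" threshold follows the abstract.

```python
from collections import Counter


def minority_variants(pileup, min_freq=0.01):
    """Report non-consensus bases above `min_freq` at each position.

    pileup[i] is a string of the bases observed at position i (hypothetical input
    format). Returns (position, consensus_base, variant_base, frequency) tuples.
    """
    calls = []
    for pos, bases in enumerate(pileup):
        counts = Counter(bases)
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage at this position
        consensus = counts.most_common(1)[0][0]
        for base, n in sorted(counts.items()):
            freq = n / depth
            if base != consensus and freq > min_freq:
                calls.append((pos, consensus, base, freq))
    return calls
```

A fixed frequency cutoff is only meaningful when residual sequencing error stays below it, which is why the error-correction step that keeps erroneous base prevalence under 1% matters before such a threshold is applied.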
Somatic mutations in benign breast disease tissue and risk of subsequent invasive breast cancer.

PubMed

Rohan, Thomas E; Miller, Christopher A; Li, Tiandao; Wang, Yihong; Loudig, Olivier; Ginsberg, Mindy; Glass, Andrew; Mardis, Elaine

2018-06-06

Insights into the molecular pathogenesis of breast cancer might come from molecular analysis of tissue from early stages of the disease. We conducted a case-control study nested in a cohort of women who had biopsy-confirmed benign breast disease (BBD) diagnosed between 1971 and 2006 at Kaiser Permanente Northwest and who were followed to mid-2015 to ascertain subsequent invasive breast cancer (IBC); cases (n = 218) were women with BBD who developed subsequent IBC and controls, individually matched (1:1) to cases, were women with BBD who did not develop IBC in the same follow-up interval as that for the corresponding case. Targeted sequence capture and sequencing were performed for 83 genes of importance in breast cancer. There were no significant case-control differences in mutation burden overall, for non-silent mutations, for individual genes, or with respect either to the nature of the gene mutations or to mutational enrichment at the pathway level. For seven subjects with DNA from the BBD and ipsilateral IBC, virtually no mutations were shared. This study, the first to use a targeted multi-gene sequencing approach on early breast cancer precursor lesions to investigate the genomic basis of the disease, showed that somatic mutations detected in BBD tissue were not associated with breast cancer risk.
Pichia stipitis genomics, transcriptomics, and gene clusters

Treesearch

Thomas W. Jeffries; Jennifer R. Headman Van Vleet

2009-01-01

Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...
Array of nucleic acid probes on biological chips for diagnosis of HIV and methods of using the same

DOEpatents

Chee, Mark; Gingeras, Thomas R.; Fodor, Stephen P. A.; Hubble, Earl A.; Morris, MacDonald S.

1999-01-19

The invention provides an array of oligonucleotide probes immobilized on a solid support for analysis of a target sequence from a human immunodeficiency virus. The array comprises at least four sets of oligonucleotide probes 9 to 21 nucleotides in length. A first probe set has a probe corresponding to each nucleotide in a reference sequence from a human immunodeficiency virus. A probe is related to its corresponding nucleotide by being exactly complementary to a subsequence of the reference sequence that includes the corresponding nucleotide. Thus, each probe has a position, designated an interrogation position, that is occupied by a complementary nucleotide to the corresponding nucleotide. The three additional probe sets each have a corresponding probe for each probe in the first probe set. Thus, for each nucleotide in the reference sequence, there are four corresponding probes, one from each of the probe sets. The three corresponding probes in the three additional probe sets are identical to the corresponding probe from the first probe set or a subsequence thereof that includes the interrogation position, except that the interrogation position is occupied by a different nucleotide in each of the four corresponding probes.
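The four-probe tiling described above can be sketched in a few lines. This assumes probes are written 5'→3' as the reverse complement of a reference window centred on each nucleotide (clamped at the sequence ends) and uses a single fixed probe length; `tile_probes` and the 13-nt default are illustrative choices, not taken from the patent.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def tile_probes(reference, probe_len=13):
    """For each reference position return (interrogation_index, [4 probes]).

    probes[0] is exactly complementary to a window containing the position;
    probes[1:] carry each of the other three nucleotides at the
    interrogation index. Hypothetical helper for illustration only.
    """
    ref = reference.upper()
    if len(ref) < probe_len:
        raise ValueError("reference shorter than probe length")
    half = probe_len // 2
    tiles = []
    for i in range(len(ref)):
        # Centre the window on position i, clamped to the sequence ends.
        start = min(max(0, i - half), len(ref) - probe_len)
        window = ref[start:start + probe_len]
        base_probe = reverse_complement(window)
        # After reverse complementing, ref[i] pairs with this probe index.
        q = probe_len - 1 - (i - start)
        probes = [base_probe]
        for nt in "ACGT":
            if nt != base_probe[q]:
                probes.append(base_probe[:q] + nt + base_probe[q + 1:])
        tiles.append((q, probes))
    return tiles
```

Hybridization intensity would then be compared across the four probes at each position; the probe that binds best indicates the base present in the target at that position.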
Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

PubMed Central

Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

2013-01-01

Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) showed similarity to sequences in the NCBI database. Of these 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the databases of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 randomly chosen primer pairs, 81 markers produced amplicons and 20 are polymorphic for genotype analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
Rapid gene identification in sugar beet using deep sequencing of DNA from phenotypic pools selected from breeding panels.

PubMed

Ries, David; Holtgräwe, Daniela; Viehöver, Prisca; Weisshaar, Bernd

2016-03-15

The combination of bulk segregant analysis (BSA) and next generation sequencing (NGS), also known as mapping by sequencing (MBS), has been shown to significantly accelerate the identification of causal mutations for species with a reference genome sequence. The usual approach is to cross homozygous parents that differ in the monogenic trait of interest, to perform deep sequencing of DNA from F2 plants pooled according to their phenotype, and subsequently to analyze the allele frequency distribution based on a marker table for the parents studied. The method has been successfully applied to EMS-induced mutations as well as natural variation. Here, we show that pooling genetically diverse breeding lines according to a contrasting phenotype also allows high resolution mapping of the causal gene in a crop species. The test case was the monogenic locus causing red vs. green hypocotyl color in Beta vulgaris (R locus). We determined the allele frequencies of polymorphic sequences using sequence data from two diverging phenotypic pools of 180 B. vulgaris accessions each. A single interval of about 31 kbp across the nine chromosomes was identified, which indeed contained the causative mutation. By applying a variation of the mapping by sequencing approach, we demonstrated that phenotype-based pooling of diverse accessions from breeding panels and subsequent direct determination of the allele frequency distribution can be successfully applied for gene identification in a crop species. Our approach made it possible to identify a small interval around the causative gene. Sequencing of parents or individual lines was not necessary. Whenever the appropriate plant material is available, the approach described saves time compared to the generation of an F2 population. In addition, we provide clues for planning similar experiments with regard to pool size and the sequencing depth required.
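Comparing allele frequencies between two phenotypic pools can be sketched as below. This is a minimal illustration, assuming per-marker read counts for the two alleles in each pool are already available; the tuple layout, the `min_delta=0.8` cutoff and both function names are hypothetical and not the parameters used in the study.

```python
def delta_allele_frequency(markers, min_delta=0.8):
    """Return (chrom, pos, delta) for markers where |AF_A - AF_B| >= min_delta.

    markers: iterable of (chrom, pos, ref_a, alt_a, ref_b, alt_b) read counts,
    where a/b denote the two phenotypic pools (hypothetical layout).
    """
    hits = []
    for chrom, pos, ref_a, alt_a, ref_b, alt_b in markers:
        depth_a, depth_b = ref_a + alt_a, ref_b + alt_b
        if depth_a == 0 or depth_b == 0:
            continue  # no information in one of the pools
        delta = abs(alt_a / depth_a - alt_b / depth_b)
        if delta >= min_delta:
            hits.append((chrom, pos, delta))
    return hits


def candidate_interval(hits):
    """Span (min pos, max pos) of strongly differentiated markers per chromosome."""
    spans = {}
    for chrom, pos, _ in hits:
        lo, hi = spans.get(chrom, (pos, pos))
        spans[chrom] = (min(lo, pos), max(hi, pos))
    return spans


markers = [("chr2", 100, 10, 10, 11, 9),
           ("chr2", 200, 0, 20, 19, 1),
           ("chr2", 300, 1, 19, 20, 0),
           ("chr5", 50, 5, 5, 5, 5)]
print(candidate_interval(delta_allele_frequency(markers)))  # {'chr2': (200, 300)}
```

Near the causal locus each pool should be close to fixed for the allele linked to its phenotype, so the frequency difference approaches 1, while unlinked markers stay near 0.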
Rapid Detection of Rare Deleterious Variants by Next Generation Sequencing with Optional Microarray SNP Genotype Data

PubMed Central

Watson, Christopher M.; Crinnion, Laura A.; Gurgel‐Gianetti, Juliana; Harrison, Sally M.; Daly, Catherine; Antanavicuite, Agne; Lascelles, Carolina; Markham, Alexander F.; Pena, Sergio D. J.; Bonthron, David T.

2015-01-01

ABSTRACT Autozygosity mapping is a powerful technique for the identification of rare, autosomal recessive, disease‐causing genes. The ease with which this category of disease gene can be identified has greatly increased through the availability of genome‐wide SNP genotyping microarrays and subsequently of exome sequencing. Although these methods have simplified the generation of experimental data, their analysis, particularly when disparate data types must be integrated, remains time consuming. Moreover, the huge volume of sequence variant data generated from next generation sequencing experiments opens up the possibility of using these data instead of microarray genotype data to identify disease loci. To allow these two types of data to be used in an integrated fashion, we have developed AgileVCFMapper, a program that performs both the mapping of disease loci by SNP genotyping and the analysis of potentially deleterious variants using exome sequence variant data, in a single step. This method does not require microarray SNP genotype data, although analysis with a combination of microarray and exome genotype data enables more precise delineation of disease loci, due to superior marker density and distribution. PMID:26037133
Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

PubMed

Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

2001-08-15

This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
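Ordered-dipeptide counting with a simple bias score can be sketched as follows. The bias here is an assumption: observed count divided by the count expected if residues were paired independently (total pairs × frequency of the first residue × frequency of the second). AAPAIR.TAB may define bias differently; `dipeptide_bias` is a hypothetical name.

```python
from collections import Counter


def dipeptide_bias(sequences):
    """Return {dipeptide: (count, observed/expected)} for ordered dipeptides.

    Expected counts assume independent residues; this bias definition is an
    illustrative assumption, not necessarily the one used by AAPAIR.TAB.
    """
    residues = Counter()
    pairs = Counter()
    for seq in sequences:
        residues.update(seq)
        # Overlapping, ordered pairs: "GY" and "YG" are counted separately.
        pairs.update(seq[i:i + 2] for i in range(len(seq) - 1))
    n_res = sum(residues.values())
    n_pairs = sum(pairs.values())
    result = {}
    for dp, count in pairs.items():
        expected = n_pairs * (residues[dp[0]] / n_res) * (residues[dp[1]] / n_res)
        result[dp] = (count, count / expected)
    return result
```

For example, `dipeptide_bias(["GYGYGY"])["GY"]` gives a count of 3 and a bias of 2.4, because only 1.25 GY pairs are expected from the residue composition.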
Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

PubMed

Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

2017-01-01

Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.
The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

PubMed

Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

2012-03-15

Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, delaying data dissemination and subsequent biological understanding. In particular, database interfaces with transcriptome analysis modules that go beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within- and between-transcriptome analysis modules at the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments that have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.
Targeted exome sequencing and chromosomal microarray for the molecular diagnosis of nevoid basal cell carcinoma syndrome.

PubMed

Matsudate, Yoshihiro; Naruto, Takuya; Hayashi, Yumiko; Minami, Mitsuyoshi; Tohyama, Mikiko; Yokota, Kenji; Yamada, Daisuke; Imoto, Issei; Kubo, Yoshiaki

2017-06-01

Nevoid basal cell carcinoma syndrome (NBCCS) is an autosomal dominant disorder mainly caused by heterozygous mutations of PTCH1. In addition to characteristic clinical features, detection of a mutation in causative genes is reliable for the diagnosis of NBCCS; however, no mutations have been identified in some patients using conventional methods. The aim of this study was to improve the method for the molecular diagnosis of NBCCS. We performed targeted exome sequencing (TES) analysis using a multi-gene panel, including PTCH1, PTCH2, SUFU, and other sonic hedgehog signaling pathway-related genes, based on next-generation sequencing (NGS) technology in 8 cases in whom possible causative mutations were not detected by previously performed conventional analysis and 2 recent cases of NBCCS. Subsequent analysis of gross deletion within or around PTCH1 detected by TES was performed using chromosomal microarray (CMA). Through TES analysis, specific single nucleotide variants or small indels of PTCH1 causing inferred amino acid changes were identified in 2 novel cases and 2 undiagnosed cases, whereas gross deletions within or around PTCH1, which were validated by CMA, were found in 3 undiagnosed cases. However, no mutations were detected even by TES in 3 cases. Among 3 cases with gross deletions of PTCH1, deletions containing the entire PTCH1 and additional neighboring genes were detected in 2 cases, one of which exhibited atypical clinical features, such as severe mental retardation, likely associated especially with genes located within the 4.3-Mb deleted region. TES-based simultaneous evaluation of sequences and copy number status in all targeted coding exons by NGS is likely to be more useful for the molecular diagnosis of NBCCS than conventional methods. CMA is recommended as a subsequent analysis for validation and detailed mapping of deleted regions, which may explain the atypical clinical features of NBCCS cases. Copyright © 2017 Japanese Society for Investigative Dermatology. Published by Elsevier B.V. All rights reserved.
The Application of Next-Generation Sequencing for Mutation Detection in Autosomal-Dominant Hereditary Hearing Impairment.

PubMed

Gürtler, Nicolas; Röthlisberger, Benno; Ludin, Katja; Schlegel, Christoph; Lalwani, Anil K

2017-07-01

The objective was to identify the causative mutation using next-generation sequencing in autosomal-dominant hereditary hearing impairment, because mutation analysis in hereditary hearing impairment by classic genetic methods is hindered by the high heterogeneity of the disease. The study included two Swiss families with autosomal-dominant hereditary hearing impairment. Amplified DNA libraries for next-generation sequencing were constructed from extracted genomic DNA, derived from peripheral blood, and enriched by a custom-made sequence capture library. Validated, pooled libraries were sequenced on an Illumina MiSeq instrument (300 cycles, paired-end sequencing). Technical data analysis was performed with SeqMonk, and variant analysis with GeneTalk or VariantStudio. Mutations in genes related to hearing loss detected by next-generation sequencing were subsequently confirmed by specific polymerase chain reaction and Sanger sequencing. The main outcome measure was mutation detection in hearing-loss-related genes. The first family harbored the mutation c.5383+5delGTGA in the TECTA gene. In the second family, a novel mutation c.2614-2625delCATGGCGCCGTG in the WFS1 gene and a second mutation, TCOF1 c.1028G>A, were identified. Next-generation sequencing successfully identified the causative mutation in families with autosomal-dominant hereditary hearing impairment. The results helped to clarify the pathogenic role of a known mutation and led to the detection of a novel one. NGS represents a feasible approach with great future potential in the diagnostics of hereditary hearing impairment, even in smaller labs.
Bayesian selection of Markov models for symbol sequences: application to microsaccadic eye movements.

PubMed

Bettenbühl, Mario; Rusconi, Marco; Engbert, Ralf; Holschneider, Matthias

2012-01-01

Complex biological dynamics often generate sequences of discrete events which can be described as a Markov process. The order of the underlying Markovian stochastic process is fundamental for characterizing statistical dependencies within sequences. As an example for this class of biological systems, we investigate the Markov order of sequences of microsaccadic eye movements from human observers. We calculate the integrated likelihood of a given sequence for various orders of the Markov process and use this in a Bayesian framework for statistical inference on the Markov order. Our analysis shows that data from most participants are best explained by a first-order Markov process. This is compatible with recent findings of a statistical coupling of subsequent microsaccade orientations. Our method might prove to be useful for a broad class of biological systems.
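The integrated likelihood of a symbol sequence under a Markov chain of order k has a closed form when each context's transition probabilities get a Dirichlet prior. The sketch below assumes a symmetric Dirichlet(alpha) prior and a uniform prior over orders, and conditions on the first `max_order` symbols so every order is scored on the same observations; these choices and the function names are assumptions, not necessarily the authors' exact formulation.

```python
from collections import defaultdict
from math import exp, lgamma


def log_evidence(seq, order, alphabet, max_order, alpha=1.0):
    """Log marginal likelihood of seq under an order-`order` Markov chain.

    Each context has a symmetric Dirichlet(alpha) prior on its next-symbol
    probabilities, so the integral over parameters is a Dirichlet-multinomial.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for t in range(max_order, len(seq)):
        context = seq[t - order:t]  # empty string when order == 0
        counts[context][seq[t]] += 1
    a = len(alphabet)
    total = 0.0
    for nexts in counts.values():
        n = sum(nexts.values())
        total += lgamma(a * alpha) - lgamma(a * alpha + n)
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in nexts.values())
    return total


def order_posterior(seq, alphabet, max_order=3, alpha=1.0):
    """Posterior over orders 0..max_order under a uniform prior on orders."""
    logs = [log_evidence(seq, k, alphabet, max_order, alpha)
            for k in range(max_order + 1)]
    m = max(logs)  # subtract the maximum before exponentiating for stability
    weights = [exp(v - m) for v in logs]
    z = sum(weights)
    return [w / z for w in weights]
```

For a strictly alternating sequence such as `"ab" * 50`, `order_posterior(seq, "ab", max_order=1)` places almost all mass on order 1, since each symbol fully determines the next.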
“Shovel-ready” Sequences as a Stimulus for the Next Generation of Life Scientists

PubMed Central

Boyle, Michael D.

2010-01-01

Genomics and bioinformatics are dynamic fields well-suited for capturing the imagination of undergraduates in both research laboratories and classrooms. Currently, raw nucleotide sequence is being provided, as part of several genomics research initiatives, for undergraduate research and teaching. These initiatives could be easily extended and much more effective if the source of the sequenced material and the subsequent focus of the data analysis were aligned with the research interests of individual faculty at undergraduate institutions. By judicious use of surplus capacity in existing nucleotide sequencing cores, raw sequence data could be generated to support ongoing research efforts involving undergraduates. This would allow these students to participate actively in discovery research, with a goal of making novel contributions to their field through original research while nurturing the next generation of talented research scientists. PMID:23653696
"Shovel-ready" Sequences as a Stimulus for the Next Generation of Life Scientists.

PubMed

Boyle, Michael D

2010-01-01

Genomics and bioinformatics are dynamic fields well-suited for capturing the imagination of undergraduates in both research laboratories and classrooms. Currently, raw nucleotide sequence is being provided, as part of several genomics research initiatives, for undergraduate research and teaching. These initiatives could be easily extended and much more effective if the source of the sequenced material and the subsequent focus of the data analysis were aligned with the research interests of individual faculty at undergraduate institutions. By judicious use of surplus capacity in existing nucleotide sequencing cores, raw sequence data could be generated to support ongoing research efforts involving undergraduates. This would allow these students to participate actively in discovery research, with a goal of making novel contributions to their field through original research while nurturing the next generation of talented research scientists.

Application of the MIDAS approach for analysis of lysine acetylation sites.

PubMed

Evans, Caroline A; Griffiths, John R; Unwin, Richard D; Whetton, Anthony D; Corfe, Bernard M

2013-01-01

Multiple Reaction Monitoring Initiated Detection and Sequencing (MIDAS™) is a mass spectrometry-based technique for the detection and characterization of specific post-translational modifications (Unwin et al. 4:1134-1144, 2005), for example acetylated lysine residues (Griffiths et al. 18:1423-1428, 2007). The MIDAS™ technique has application for discovery and analysis of acetylation sites. It is a hypothesis-driven approach that requires a priori knowledge of the primary sequence of the target protein and a proteolytic digest of this protein. MIDAS essentially performs a targeted search for the presence of modified, for example acetylated, peptides. The detection is based on the combination of the predicted molecular weight (measured as mass-charge ratio) of the acetylated proteolytic peptide and a diagnostic fragment (product ion of m/z 126.1), which is generated by specific fragmentation of acetylated peptides during collision-induced dissociation performed in tandem mass spectrometry (MS) analysis. Sequence information is subsequently obtained, which enables acetylation site assignment. The technique of MIDAS was later trademarked by ABSciex for targeted protein analysis where an MRM scan is combined with a full MS/MS product ion scan to enable sequence confirmation.
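The predicted precursor m/z of a candidate acetylated peptide follows from standard monoisotopic masses: sum the residue masses, add one water, add 42.0106 Da per acetylated lysine, then add one proton per charge and divide by the charge. This sketch assumes unmodified cysteines and no other modifications; `precursor_mz` is a hypothetical helper, not part of any MIDAS software.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # added once for the peptide's free termini
PROTON = 1.007276    # mass added per positive charge
ACETYL = 42.010565   # mass shift of lysine acetylation (C2H2O)


def precursor_mz(peptide, charge, acetylated_lysines=0):
    """Predicted m/z of [M + zH]z+ for a peptide with n acetylated lysines."""
    if acetylated_lysines > peptide.count("K"):
        raise ValueError("more acetyl groups than lysine residues")
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    mass += ACETYL * acetylated_lysines
    return (mass + charge * PROTON) / charge
```

Pairing each predicted precursor m/z with the 126.1 diagnostic product ion would give one MRM transition per candidate peptide.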
Zaba: a novel miniature transposable element present in genomes of legume plants.

PubMed

Macas, J; Neumann, P; Pozárková, D

2003-08-01

A novel family of miniature transposable elements, named Zaba, was identified in pea (Pisum sativum) and subsequently also in other legume species using computer analysis of their DNA sequences. Zaba elements are 141-190 bp long, generate 10-bp target site duplications, and their terminal inverted repeats make up most of the sequence. Zaba elements thus resemble class 3 foldback transposons. The elements are only moderately repetitive in pea (tens to hundreds of copies per haploid genome), but they are present in up to thousands of copies in the genomes of several Medicago and Vicia species. More detailed analysis of the elements from pea, including isolation of new sequences from a genomic library, revealed that a fraction of these elements are truncated, and that their last transposition probably did not occur recently. A search for Zaba sequences in EST databases showed that at least some elements are transcribed, most probably due to their association with genic regions.
DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules.

PubMed

Eernisse, D J

1992-04-01

DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator is able to convert documented GenBank or EMBL sequences into linearized, rescalable gene maps whose gene sequences are extractable by clicking on the corresponding map button or by selection from a scrolling list. Provided gene maps, complete with extractable sequences, consist of nine metazoan, one yeast, and one ciliate mitochondrial DNAs and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid in phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction with flexible support of alternate genetic codes and ambiguous nucleotide symbols. Multiple aligned sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences that toggles between display of matched characters and normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency relative to all other, or only synonymous, codons for approximately 70 select organism-organelle combinations. Codon usage data is compatible with spreadsheet or UWGCG formats for incorporation of additional molecules of interest. The complete package is available via anonymous ftp and is free for non-commercial uses.
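Nucleic-acid-to-protein translation with a swappable genetic code can be sketched with the compact 64-character code string used by NCBI translation tables (codons ordered TTT, TTC, TTA, TTG, TCT, ...). This is an illustration, not DNA Translator's implementation: any codon containing an ambiguous symbol is simply rendered as `X`, which is cruder than the flexible ambiguity handling the abstract describes.

```python
BASES = "TCAG"
# NCBI translation table 1 (standard code); '*' marks stop codons.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna, code=STANDARD_CODE):
    """Translate a DNA/RNA sequence in frame 1; pass another 64-char table for alternate codes."""
    dna = dna.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(b in BASES for b in codon):
            idx = (16 * BASES.index(codon[0])
                   + 4 * BASES.index(codon[1])
                   + BASES.index(codon[2]))
            protein.append(code[idx])
        else:
            protein.append("X")  # ambiguous nucleotide: residue left unresolved
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA*
```

Supporting a mitochondrial code then only means supplying that table's 64-character string as `code`.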
Sequence analysis to assess labour market participation following vocational rehabilitation: an observational study among patients sick-listed with low back pain from a randomised clinical trial in Denmark

PubMed Central

Lindholdt, Louise; Labriola, Merete; Nielsen, Claus Vinther; Horsbøl, Trine Allerslev; Lund, Thomas

2017-01-01

Introduction The return-to-work (RTW) process after long-term sickness absence is often complex and long and implies multiple shifts between different labour market states for the absentee. Standard methods in RTW research typically rely on the analysis of one outcome measure at a time, which will not capture the many possible states and transitions the absentee can go through. The purpose of this study was to explore the potential added value of sequence analysis in supplement to standard regression analysis of a multidisciplinary RTW intervention among patients with low back pain (LBP). Methods The study population consisted of 160 patients randomly allocated to either a hospital-based brief or a multidisciplinary intervention. Data on labour market participation following intervention were obtained from a national register and analysed in two ways: as a binary outcome expressed as active or passive relief at a 1-year follow-up and as four different categories for labour market participation. Logistic regression and sequence analysis were performed. Results The logistic regression analysis showed no difference in labour market participation for patients in the two groups after 1 year. Applying sequence analysis showed differences in subsequent labour market participation 2 years after baseline in favour of the brief intervention group versus the multidisciplinary intervention group. Conclusion The study indicated that sequence analysis could provide added analytical value as a supplement to traditional regression analysis in prospective studies of RTW among patients with LBP. PMID:28729315
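A common building block of social-science sequence analysis is optimal matching: an edit distance between state sequences that can then feed clustering. The abstract does not say which dissimilarity the authors used, so this is an assumed example with constant costs (indel = 1, substitution = 2) and hypothetical single-letter state codes, e.g. "W" for work and "S" for sickness absence.

```python
def optimal_matching(seq_a, seq_b, indel=1.0, sub=2.0):
    """Edit distance between two state sequences with constant indel/substitution costs."""
    n, m = len(seq_a), len(seq_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # delete all of seq_a's first i states
    for j in range(1, m + 1):
        d[0][j] = j * indel  # insert all of seq_b's first j states
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if seq_a[i - 1] == seq_b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,      # deletion
                          d[i][j - 1] + indel,      # insertion
                          d[i - 1][j - 1] + cost)   # match or substitution
    return d[n][m]


# Hypothetical weekly states: S = sick-listed, W = working.
print(optimal_matching("SSSS", "SSSW"))  # 2.0
```

With a substitution cost of twice the indel cost, a substitution and an insertion-plus-deletion cost the same, a frequent default; other cost schemes weight timing shifts and state changes differently.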
An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

PubMed

Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

2011-01-01

cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which are a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.

PubMed

Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

2013-08-01

High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.
DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

PubMed Central

Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

2013-01-01

High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. PMID:23657089
A basic analysis toolkit for biological sequences

PubMed Central

Giancarlo, Raffaele; Siragusa, Alessandro; Siragusa, Enrico; Utro, Filippo

2007-01-01

This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks, namely local alignments via approximate string matching, and global alignments via longest common subsequence and via alignments with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and establish their statistical significance via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not previously been implemented in a single, consistent software package, as we do here. Our main contribution is therefore to fill this gap between algorithmic theory and practice by providing an extensible and easy-to-use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at under the GNU GPL. PMID:17877802
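BATS is described as bundling textbook algorithms, among them longest common subsequence (LCS) for global alignment and z-scores for statistical significance. Its actual C/C++ and Perl interfaces are not described here, so the following is only a minimal Python sketch of those two textbook ideas, not BATS code; the function names `lcs` and `z_score` are hypothetical.

```python
from statistics import mean, stdev

def lcs(a: str, b: str) -> str:
    """Textbook O(len(a) * len(b)) dynamic program for one longest common subsequence."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back from the bottom-right cell to recover one optimal subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

def z_score(observed: float, background: list[float]) -> float:
    """Standard score of an observed value against a background sample."""
    return (observed - mean(background)) / stdev(background)
```

The table costs O(nm) time and memory; a library aimed at long sequences would more likely use a linear-space variant such as Hirschberg's algorithm.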
Detecting cooperative sequences in the binding of RNA Polymerase-II

NASA Astrophysics Data System (ADS)

Glass, Kimberly; Rozenberg, Julian; Girvan, Michelle; Losert, Wolfgang; Ott, Ed; Vinson, Charles

2008-03-01

Regulation of gene expression levels is a key biological process controlled largely by the 1000 base pair (bp) sequence preceding each gene (the promoter region). Within that region, transcription factor binding sites (TFBS), sequences 5-10 bp long, act individually or cooperatively to recruit RNA Polymerase-II (RNAP) and thereby drive subsequent gene transcription. We have measured the binding of RNAP to promoters on a genome-wide basis using Chromatin Immunoprecipitation (ChIP-on-Chip) microarray assays. Using all 8-bp sequences as a test set, we have identified the DNA sequences that are enriched in promoters with high RNAP binding values. We are able to demonstrate that virtually all sequences enriched in such promoters contain a CpG dinucleotide, indicating that TFBS containing the CpG dinucleotide are involved in RNAP binding to promoters. Further analysis shows that pairs of CpG-containing sequences cooperate to enhance the binding of RNAP to the promoter.
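The abstract does not spell out its enrichment statistic. As a hedged illustration only, the sketch below counts every 8-mer in a set of high-RNAP-binding promoters and in a low-binding background, reports a pseudocount-smoothed frequency ratio, and flags CpG-containing k-mers. The function names, the pseudocount, and the toy sequences in the usage are assumptions, not the authors' method or data.

```python
from collections import Counter

def kmer_counts(seqs, k=8):
    """Count every overlapping k-mer (upper-cased) across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enrichment(high, low, k=8, pseudo=1.0):
    """Map each k-mer to (frequency in `high`) / (frequency in `low`).

    A pseudocount keeps k-mers that are absent from one set finite.
    """
    hc, lc = kmer_counts(high, k), kmer_counts(low, k)
    h_tot, l_tot = sum(hc.values()), sum(lc.values())
    return {
        kmer: ((hc[kmer] + pseudo) / h_tot) / ((lc[kmer] + pseudo) / l_tot)
        for kmer in set(hc) | set(lc)
    }

def has_cpg(kmer):
    """True if the k-mer contains the CpG dinucleotide."""
    return "CG" in kmer
```

Sorting the result by ratio and checking `has_cpg` on the top entries mirrors, in miniature, the kind of observation the abstract reports.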
Genome Sequencing and Analysis of the Tasmanian Devil and Its Transmissible Cancer

PubMed Central

Murchison, Elizabeth P.; Schulz-Trieglaff, Ole B.; Ning, Zemin; Alexandrov, Ludmil B.; Bauer, Markus J.; Fu, Beiyuan; Hims, Matthew; Ding, Zhihao; Ivakhno, Sergii; Stewart, Caitlin; Ng, Bee Ling; Wong, Wendy; Aken, Bronwen; White, Simon; Alsop, Amber; Becq, Jennifer; Bignell, Graham R.; Cheetham, R. Keira; Cheng, William; Connor, Thomas R.; Cox, Anthony J.; Feng, Zhi-Ping; Gu, Yong; Grocock, Russell J.; Harris, Simon R.; Khrebtukova, Irina; Kingsbury, Zoya; Kowarsky, Mark; Kreiss, Alexandre; Luo, Shujun; Marshall, John; McBride, David J.; Murray, Lisa; Pearse, Anne-Maree; Raine, Keiran; Rasolonjatovo, Isabelle; Shaw, Richard; Tedder, Philip; Tregidgo, Carolyn; Vilella, Albert J.; Wedge, David C.; Woods, Gregory M.; Gormley, Niall; Humphray, Sean; Schroth, Gary; Smith, Geoffrey; Hall, Kevin; Searle, Stephen M.J.; Carter, Nigel P.; Papenfuss, Anthony T.; Futreal, P. Andrew; Campbell, Peter J.; Yang, Fengtang; Bentley, David R.; Evers, Dirk J.; Stratton, Michael R.

2012-01-01

The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations. PMID:22341448
Staphylococcus nepalensis in the guano of bats (Mammalia).

PubMed

Vandžurová, A; Bačkor, P; Javorský, P; Pristaš, P

2013-05-31

Thirty randomly selected mesophilic isolates from a six-year-old guano sample from a mixed Myotis myotis and M. blythii summer roost colony were isolated and identified as Staphylococcus nepalensis using MALDI-TOF analysis. 16S rRNA gene sequencing of five selected isolates and subsequent phylogenetic analysis confirmed that all sequences showed the highest similarity to S. nepalensis sequences. The tested isolates exhibited several virulence factors, mainly capsule formation and resistance to tetracycline, ampicillin, gentamicin, and chloramphenicol. Our experiments show that the majority of cultivable mesophilic bacteria from bat guano belong to the species S. nepalensis. This is the first report of this species in bat guano, and our results indicate that guano accumulated near or directly in human dwellings and buildings may represent a significant risk to human health. Copyright © 2013 Elsevier B.V. All rights reserved.
Computational and experimental analysis of DNA shuffling

PubMed Central

Maheshri, Narendra; Schaffer, David V.

2003-01-01

We describe a computational model of DNA shuffling based on the thermodynamics and kinetics of this process. The model independently tracks a representative ensemble of DNA molecules and records their states at every stage of a shuffling reaction. These data can subsequently be analyzed to yield information on any relevant metric, including reassembly efficiency, crossover number, type and distribution, and DNA sequence length distributions. The predictive ability of the model was validated by comparison to three independent sets of experimental data, and analysis of the simulation results led to several unique insights into the DNA shuffling process. We examine a tradeoff between crossover frequency and reassembly efficiency and illustrate the effects of experimental parameters on this relationship. Furthermore, we discuss conditions that promote the formation of useless “junk” DNA sequences or multimeric sequences containing multiple copies of the reassembled product. This model will therefore aid in the design of optimal shuffling reaction conditions. PMID:12626764
Haemagglutinin and neuraminidase sequencing delineate nosocomial influenza outbreaks with accuracy equivalent to whole genome sequencing.

PubMed

Houghton, Rebecca; Ellis, Joanna; Galiano, Monica; Clark, Tristan W; Wyllie, Sarah

2017-04-01

We describe haemagglutinin (HA) and neuraminidase (NA) sequencing in an apparent cross-site influenza A(H1N1) outbreak in renal transplant and haemodialysis patients, confirmed with whole genome sequencing (WGS). Isolates were sequenced from influenza-positive individuals. Phylogenetic trees were constructed using HA and NA sequencing and subsequently WGS. Sequence data were analysed to determine the genetic relatedness of viruses obtained from inpatient and outpatient cohorts and compared with epidemiological outbreak information. There were 6 patient cases of influenza in the inpatient renal ward cohort (associated with 3 deaths) and 9 patient cases in the outpatient haemodialysis unit cohort (no deaths). WGS confirmed clustered transmission of two genetically different influenza A(H1N1)pdm09 strains initially identified by analysis of HA and NA genes. WGS took longer and, in this case, was not required to determine whether the two seemingly linked outbreaks were related. Rapid sequencing of HA and NA genes may be sufficient to aid early influenza outbreak investigation, making it appealing for future outbreak investigations. However, as next-generation sequencing becomes cheaper and more widely available, and bioinformatics software is now freely accessible, next-generation whole genome analysis may increasingly become a valuable tool for real-time influenza outbreak investigation. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Integrated systems analysis reveals a molecular network underlying autism spectrum disorders

PubMed Central

Li, Jingjing; Shi, Minyi; Ma, Zhihai; Zhao, Shuchun; Euskirchen, Ghia; Ziskin, Jennifer; Urban, Alexander; Hallmayer, Joachim; Snyder, Michael

2014-01-01

Autism is a complex disease whose etiology remains elusive. We integrated previously and newly generated data and developed a systems framework involving the interactome, gene expression and genome sequencing to identify a protein interaction module with members strongly enriched for autism candidate genes. Sequencing of 25 patients confirmed the involvement of this module in autism, which was subsequently validated using an independent cohort of over 500 patients. Expression of this module was dichotomized into a ubiquitously expressed subcomponent and another subcomponent preferentially expressed in the corpus callosum, which was significantly affected by our identified mutations in the network center. RNA sequencing of the corpus callosum from patients with autism revealed extensive gene mis-expression in this module, and our immunochemical analysis showed that the human corpus callosum is predominantly populated by oligodendrocyte cells. Analysis of functional genomic data further revealed a significant involvement of this module in the development of oligodendrocyte cells in mouse brain. Our analysis delineates a natural network involved in autism, helps uncover novel candidate genes for this disease and improves our understanding of its molecular pathology. PMID:25549968
Next-generation sequencing for identification of candidate genes for Fusarium wilt and sterility mosaic disease in pigeonpea (Cajanus cajan).

PubMed

Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Kumar, Vinay; Kale, Sandip M; Sinha, Pallavi; Chitikineni, Annapurna; Pazhamala, Lekha T; Garg, Vanika; Sharma, Mamta; Sameer Kumar, Chanda Venkata; Parupalli, Swathi; Vechalapu, Suryanarayana; Patil, Suyash; Muniswamy, Sonnappa; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Dharmaraj, Pallavi Subbanna; Varshney, Rajeev K

2016-05-01

To map resistance genes for Fusarium wilt (FW) and sterility mosaic disease (SMD) in pigeonpea, sequencing-based bulked segregant analysis (Seq-BSA) was used. Resistant (R) and susceptible (S) bulks from the extreme recombinant inbred lines of ICPL 20096 × ICPL 332 were sequenced. Subsequently, the SNP index was calculated between R- and S-bulks with the help of the draft genome sequence and a reference-guided assembly of ICPL 20096 (the resistant parent). Seq-BSA provided seven candidate SNPs for FW and SMD resistance in pigeonpea. In parallel, four additional genotypes were re-sequenced, and their combined analysis with the R- and S-bulks provided a total of 8362 nonsynonymous (ns) SNPs. Of these 8362 nsSNPs, 60 were found within the 2-Mb flanking regions of the seven candidate SNPs identified through Seq-BSA. Haplotype analysis narrowed these down to eight nsSNPs in seven genes. These eight nsSNPs were further validated by re-sequencing 11 genotypes that are resistant or susceptible to FW and SMD. This analysis revealed an association of four candidate nsSNPs in four genes with FW resistance and four candidate nsSNPs in three genes with SMD resistance. Further, in silico protein analysis and expression profiling identified the two most promising candidate genes, namely C.cajan_01839 for SMD resistance and C.cajan_03203 for FW resistance. The identified candidate genomic regions/SNPs will be useful for genomics-assisted breeding in pigeonpea. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
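The abstract does not give the SNP-index formula. The sketch below assumes the widely used QTL-seq-style definition: at each SNP, the SNP index is the fraction of reads carrying the allele that differs from the reference (here, for example, the resistant parent's assembly), and Δ(SNP index) is the resistant-bulk value minus the susceptible-bulk value. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AlleleDepth:
    ref: int  # reads supporting the reference allele at this position
    alt: int  # reads supporting the alternative allele

def snp_index(d: AlleleDepth) -> float:
    """Fraction of reads carrying the alternative allele at one position."""
    total = d.ref + d.alt
    if total == 0:
        raise ValueError("no reads cover this position")
    return d.alt / total

def delta_snp_index(r_bulk: AlleleDepth, s_bulk: AlleleDepth) -> float:
    """Difference in SNP index between the resistant and susceptible bulks."""
    return snp_index(r_bulk) - snp_index(s_bulk)
```

Values near 0 suggest both bulks share the same allele mix; values near ±1 mark positions where the bulks carry opposite alleles, which is what makes a SNP a candidate.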
Molecular evolution of an Avirulence Homolog (Avh) gene subfamily in Phytophthora ramorum

Treesearch

Goss, Erica M.; Press, Caroline M.; Grünwald, Niklaus J.

2008-01-01

Pathogen effectors can serve a virulence function on behalf of the pathogen or trigger a rapid defense response in resistant hosts. Sequencing of the Phytophthora ramorum genome and subsequent analysis identified a diverse superfamily of approximately 350 genes that are homologous to the four known avirulence genes in plant pathogenic oomycetes and...
Comparative Modeling of Proteins: A Method for Engaging Students' Interest in Bioinformatics Tools

ERIC Educational Resources Information Center

Badotti, Fernanda; Barbosa, Alan Sales; Reis, André Luiz Martins; do Valle, Ítalo Faria; Ambrósio, Lara; Bitar, Mainá

2014-01-01

The huge increase in data being produced in the genomic era has created a need to incorporate computers into the research process. Sequence generation and its subsequent storage, interpretation, and analysis are now entirely computer-dependent tasks. Universities from all over the world have been challenged to seek a way of encouraging students to…
Selection and Validation of a Multilocus Variable-Number Tandem-Repeat Analysis Panel for Typing Shigella spp.

PubMed Central

Gorgé, Olivier; Lopez, Stéphanie; Hilaire, Valérie; Lisanti, Olivier; Ramisse, Vincent; Vergnaud, Gilles

2008-01-01

The Shigella genus has historically been separated into four species based on biochemical assays. The classification within each species relies on serotyping. Recently, genome sequencing and DNA assays, in particular the multilocus sequence typing (MLST) approach, have greatly improved current knowledge of the origin and phylogenetic evolution of Shigella spp. The Shigella and Escherichia genera are now considered to belong to a single genomospecies. Multilocus variable-number tandem-repeat (VNTR) analysis (MLVA) provides valuable polymorphic markers for genotyping and performing phylogenetic analyses of highly homogeneous bacterial pathogens. Here, we assess the capability of MLVA for Shigella typing. Thirty-two potentially polymorphic VNTRs were selected by in silico analysis of five Shigella genome sequences and subsequently evaluated. Ultimately, a panel of 15 VNTRs was selected (i.e., MLVA15 analysis). MLVA15 analysis of 78 strains or genome sequences of Shigella spp. and 11 strains or genome sequences of Escherichia coli distinguished 83 genotypes. Shigella population cluster analysis gave results consistent with MLST. MLVA15 analysis also proved useful for E. coli typing, classifying the pathogenic and nonpathogenic E. coli strains included in the study. The resulting data can be queried on our genotyping webpage (http://mlva.u-psud.fr). The MLVA15 assay is rapid, highly discriminatory, and reproducible for Shigella and Escherichia strains, suggesting that it could significantly contribute to epidemiological trace-back analysis of Shigella infections and pathogenic Escherichia outbreaks. Typing was performed on strains obtained mostly from collections. Further studies should include strains of much more diverse origins, including all pathogenic E. coli types. PMID:18216214
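MLVA genotypes are usually reported as repeat copy numbers per locus. A minimal sketch, assuming the common convention that copy number is (amplicon size minus the fixed flanking length) divided by the repeat-unit length, and a simple count of differing loci as the distance between two profiles. The locus names, sizes, and helper names are hypothetical, not values from the MLVA15 panel.

```python
def repeat_copies(amplicon_bp: int, flank_bp: int, unit_bp: int) -> int:
    """Estimate VNTR copy number from a PCR amplicon size (nearest whole repeat)."""
    # Note: Python's round() resolves exact .5 ties to the nearest even integer.
    return round((amplicon_bp - flank_bp) / unit_bp)

def profile_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Number of loci typed in both profiles whose copy numbers differ."""
    return sum(a[locus] != b[locus] for locus in a.keys() & b.keys())
```

Counting differing loci treats every allele change as equally distant, which is the usual starting point for clustering MLVA profiles; real panels also have to handle incomplete repeats and missing loci.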
Common Amino Acid Subsequences in a Universal Proteome—Relevance for Food Science

PubMed Central

Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Sokołowska, Jolanta; Starowicz, Piotr; Bucholska, Justyna; Hrynkiewicz, Monika

2015-01-01

A common subsequence is a fragment of the amino acid chain that occurs in more than one protein. Common subsequences may be an object of interest for food scientists as biologically active peptides, epitopes, and/or protein markers that are used in comparative proteomics. An individual bioactive fragment, in particular the shortest fragment containing two or three amino acid residues, may occur in many protein sequences. An individual linear epitope may also be present in multiple sequences of precursor proteins. Although recent recommendations for prediction of allergenicity and cross-reactivity include not only sequence identity, but also similarities in secondary and tertiary structures surrounding the common fragment, local sequence identity may be used to screen protein sequence databases for potential allergens in silico. The main weakness of the screening process is that it overlooks allergens and cross-reactivity cases without identical fragments corresponding to linear epitopes. A single peptide may also serve as a marker of a group of allergens that belong to the same family and, possibly, reveal cross-reactivity. This review article discusses the benefits for food scientists that follow from the common subsequences concept. PMID:26340620
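Screening a database for common subsequences amounts to indexing every fragment of a chosen length and keeping those that occur in more than one protein. A minimal sketch, assuming exact (identical) fragments only; the protein IDs and sequences in the usage are made up, and real allergen screening would run this over a full sequence database.

```python
from collections import defaultdict

def common_fragments(proteins: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each length-k fragment to the IDs of proteins containing it,
    keeping only fragments shared by more than one protein."""
    where = defaultdict(set)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            where[seq[i:i + k]].add(pid)
    return {frag: ids for frag, ids in where.items() if len(ids) > 1}
```

With k = 2 or 3, almost every fragment is shared, which matches the abstract's point that the shortest bioactive fragments occur in many proteins; longer k, such as the length of a linear epitope, makes shared hits far more informative.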
Structure-related statistical singularities along protein sequences: a correlation study.

PubMed

Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro

2005-01-01

A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) protein sequences embed specific information present neither in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder than the data set average. A study of how average determinism scales with the RQA setting parameters (embedding dimension and radius) allows the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
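To make the two RQA variables concrete, the sketch below computes them for one numeric profile (for example, a protein coded residue by residue with some hydrophobicity scale; no scale values are included here). Conventions vary between RQA tools, so the choices below are assumptions: delay embedding with delay 1, Euclidean distance, the main diagonal excluded, and diagonal lines of length at least 2 counted as deterministic. Results are fractions, not the percentages many tools report.

```python
import math

def embed(x, m, tau=1):
    """Time-delay embedding: vectors (x[i], x[i+tau], ..., x[i+(m-1)*tau])."""
    n = len(x) - (m - 1) * tau
    return [tuple(x[i + j * tau] for j in range(m)) for i in range(n)]

def recurrence_matrix(x, m, radius, tau=1):
    """R[i][j] is True when embedded points i != j lie within `radius`."""
    v = embed(x, m, tau)
    n = len(v)
    return [[i != j and math.dist(v[i], v[j]) <= radius for j in range(n)]
            for i in range(n)]

def rqa(x, m=3, radius=1.0, lmin=2, tau=1):
    """Return (recurrence rate, determinism) for a 1-D numeric profile."""
    R = recurrence_matrix(x, m, radius, tau)
    n = len(R)
    rec_points = sum(sum(row) for row in R)
    off_diagonal = n * n - n
    recurrence = rec_points / off_diagonal if off_diagonal else 0.0
    # Determinism: share of recurrent points on diagonal lines of length >= lmin.
    det_upper = 0
    for offset in range(1, n):  # walk each diagonal above the main one
        run = 0
        for i in range(n - offset):
            if R[i][i + offset]:
                run += 1
            else:
                if run >= lmin:
                    det_upper += run
                run = 0
        if run >= lmin:
            det_upper += run
    det_points = 2 * det_upper  # R is symmetric: the lower triangle mirrors the upper
    determinism = det_points / rec_points if rec_points else 0.0
    return recurrence, determinism
```

Sweeping `m` and `radius` and watching how determinism changes is the kind of parameter study the abstract describes, here on a toy scale.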

Identification of differentially expressed genes through RNA sequencing in goats (Capra hircus) at different postnatal stages

PubMed Central

Li, Qian; Lin, Sen

2017-01-01

Intramuscular fat (IMF) content and fatty acid composition of longissimus dorsi muscle (LM) change with growth, which partially determines the flavor and nutritional value of goat (Capra hircus) meat. However, unlike for cattle, little information is available on transcriptome-wide changes during different postnatal stages in small ruminants, especially goats. In this study, the sequencing reads of goat LM tissues collected from kid, youth, and adult periods were mapped to the goat genome. Of a total of 24 689 unigenes, 20 435 were annotated. Based on the expected number of fragments per kilobase of transcript sequence per million mapped fragments (FPKM), 111 annotated differentially expressed genes (DEGs) were identified among different postnatal stages, which were subsequently assigned to 16 possible expression patterns by series-cluster analysis. Functional classification by Gene Ontology (GO) analysis was used to select the lipid metabolism-related genes showing the highest expression. Finally, we identified the node genes for lipid metabolism regulation using co-expression analysis. In conclusion, these data may uncover candidate genes with functional roles in regulating goat muscle development and lipid metabolism during the various growth stages. PMID:28800357
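FPKM normalizes a transcript's fragment count by both transcript length (in kilobases) and sequencing depth (in millions of mapped fragments), so that expression can be compared across genes and samples. A minimal sketch of that standard definition follows; the function name and numbers are hypothetical, and tools such as Cufflinks report FPKM from statistically estimated ("expected") fragment counts rather than the raw counts used here.

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    kilobases = length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)
```

For example, 500 fragments on a 2 kb transcript in a library of 10 million mapped fragments gives 500 / (2 × 10) = 25 FPKM.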
Identification of differentially expressed genes through RNA sequencing in goats (Capra hircus) at different postnatal stages.

PubMed

Lin, Yaqiu; Zhu, Jiangjiang; Wang, Yong; Li, Qian; Lin, Sen

2017-01-01

Intramuscular fat (IMF) content and fatty acid composition of longissimus dorsi muscle (LM) change with growth, which partially determines the flavor and nutritional value of goat (Capra hircus) meat. However, unlike for cattle, little information is available on transcriptome-wide changes during different postnatal stages in small ruminants, especially goats. In this study, the sequencing reads of goat LM tissues collected from kid, youth, and adult periods were mapped to the goat genome. Of a total of 24 689 unigenes, 20 435 were annotated. Based on the expected number of fragments per kilobase of transcript sequence per million mapped fragments (FPKM), 111 annotated differentially expressed genes (DEGs) were identified among different postnatal stages, which were subsequently assigned to 16 possible expression patterns by series-cluster analysis. Functional classification by Gene Ontology (GO) analysis was used to select the lipid metabolism-related genes showing the highest expression. Finally, we identified the node genes for lipid metabolism regulation using co-expression analysis. In conclusion, these data may uncover candidate genes with functional roles in regulating goat muscle development and lipid metabolism during the various growth stages.
Vander Lugt correlation of DNA sequence data

NASA Astrophysics Data System (ADS)

Christens-Barry, William A.; Hawk, James F.; Martin, James C.

1990-12-01

DNA, the molecule containing the genetic code of an organism, is a linear chain of subunits. It is the sequence of subunits, of which there are four kinds, that constitutes the unique blueprint of an individual. This sequence is the focus of a large number of analyses performed by an army of geneticists, biologists, and computer scientists. Most of these analyses entail searches for specific subsequences within the larger set of sequence data; thus, most analyses are essentially pattern recognition or correlation tasks. Yet such analyses have special features that influence the strategy and methods of an optical pattern recognition approach. While the serial processing employed in digital electronic computers remains the main engine of sequence analyses, there is no fundamental reason that more efficient parallel methods cannot be used. We describe an approach using optical pattern recognition (OPR) techniques based on matched spatial filtering, which allows parallel comparison of large blocks of sequence data. In this study we have simulated a Vander Lugt [1] architecture implementing our approach. We perform searches for specific target sequence strings within a block of DNA sequence from the ColE1 plasmid [2].
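A Vander Lugt correlator evaluates a matched filter at every offset in parallel. As an illustration of the same idea, not a model of the optical system or of the authors' simulation, the serial sketch below one-hot encodes the four bases and cross-correlates a target string against a longer sequence: the score at each offset is the number of matching bases, so a perfect match peaks at the target length. The function names and sequences are hypothetical.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as four 0/1 channels, one per base (other symbols -> zeros)."""
    seq = seq.upper()
    return {b: [1 if c == b else 0 for c in seq] for b in BASES}

def correlate(sequence, target):
    """Entry k counts positions where `target`, placed at offset k, matches `sequence`."""
    s, t = one_hot(sequence), one_hot(target)
    n, m = len(sequence), len(target)
    return [sum(s[b][k + i] * t[b][i] for b in BASES for i in range(m))
            for k in range(n - m + 1)]

def find_matches(sequence, target, max_mismatch=0):
    """Offsets whose correlation peak is within `max_mismatch` of a perfect match."""
    m = len(target)
    return [k for k, score in enumerate(correlate(sequence, target))
            if score >= m - max_mismatch]
```

Raising `max_mismatch` turns the same correlation plane into an approximate search, one reason correlation-based matching suits sequence data.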
Microfluidic droplet enrichment for targeted sequencing

PubMed Central

Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

2015-01-01

Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629
Sequence analysis to assess labour market participation following vocational rehabilitation: an observational study among patients sick-listed with low back pain from a randomised clinical trial in Denmark.

PubMed

Lindholdt, Louise; Labriola, Merete; Nielsen, Claus Vinther; Horsbøl, Trine Allerslev; Lund, Thomas

2017-07-20

The return-to-work (RTW) process after long-term sickness absence is often long and complex and implies multiple shifts between different labour market states for the absentee. Standard methods in RTW research typically rely on the analysis of one outcome measure at a time, which will not capture the many possible states and transitions the absentee can go through. The purpose of this study was to explore the potential added value of sequence analysis as a supplement to standard regression analysis of a multidisciplinary RTW intervention among patients with low back pain (LBP). The study population consisted of 160 patients randomly allocated to either a hospital-based brief or a multidisciplinary intervention. Data on labour market participation following intervention were obtained from a national register and analysed in two ways: as a binary outcome expressed as active or passive relief at a 1-year follow-up and as four different categories for labour market participation. Logistic regression and sequence analysis were performed. The logistic regression analysis showed no difference in labour market participation between the two groups after 1 year. Sequence analysis showed differences in subsequent labour market participation 2 years after baseline in favour of the brief intervention group versus the multidisciplinary intervention group. The study indicated that sequence analysis could provide added analytical value as a supplement to traditional regression analysis in prospective studies of RTW among patients with LBP. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
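In sequence analysis, each patient's follow-up becomes a string of states (one per week or month), and the whole trajectory is analysed instead of a single end-point. The abstract does not give the study's exact method. The sketch below shows two hedged building blocks: the share of patients in each state at one time point, and a Hamming dissimilarity between two trajectories. Hamming distance is a simpler stand-in for optimal matching, which also allows insertions and deletions. The state labels are hypothetical, since the four categories are not named here.

```python
from collections import Counter

def state_distribution(sequences, t):
    """Share of individuals in each state at time index t."""
    counts = Counter(seq[t] for seq in sequences)
    n = len(sequences)
    return {state: c / n for state, c in counts.items()}

def hamming(a, b):
    """Number of time points at which two equal-length state sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same time points")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical monthly states: "W" = working, "S" = sick-listed, "B" = other benefits.
patients = [["S", "S", "W", "W"], ["S", "W", "W", "W"], ["S", "S", "S", "B"]]
```

Computing `state_distribution` for every month gives a state-distribution plot, while pairwise distances feed the clustering step that groups similar trajectories.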
Use of Genome Sequence Information for Meat Quality Trait QTL Mining for Causal Genes and Mutations on Pig Chromosome 17

PubMed Central

Hu, Zhi-Liang; Ramos, Antonio M.; Humphray, Sean J.; Rogers, Jane; Reecy, James M.; Rothschild, Max F.

2011-01-01

The newly available pig genome sequence has provided new information to fine map quantitative trait loci (QTL) in order to eventually identify causal variants. With targeted genomic sequencing efforts, we were able to obtain high quality BAC sequences that cover a region on pig chromosome 17 where a number of meat quality QTL have been previously discovered. Sequences from 70 BAC clones were assembled to form an 8-Mbp contig. Subsequently, we successfully mapped five previously identified QTL, three for meat color and two for lactate related traits, to the contig. With an additional 25 genetic markers that were identified by sequence comparison, we were able to carry out further linkage disequilibrium analysis to narrow down the genomic locations of these QTL, which allowed identification of the chromosomal regions that likely contain the causative variants. This research has provided one practical approach to combine genetic and molecular information for QTL mining. PMID:22303339
Jupyter and Galaxy: Easing entry barriers into complex data analyses for biomedical researchers.

PubMed

Grüning, Björn A; Rasche, Eric; Rebolledo-Jaramillo, Boris; Eberhard, Carl; Houwaart, Torsten; Chilton, John; Coraor, Nate; Backofen, Rolf; Taylor, James; Nekrutenko, Anton

2017-05-01

What does it take to convert a heap of sequencing data into a publishable result? First, common tools are employed to reduce primary data (sequencing reads) to a form suitable for further analyses (i.e., the list of variable sites). The subsequent exploratory stage is much more ad hoc and requires the development of custom scripts and pipelines, making it problematic for biomedical researchers. Here, we describe a hybrid platform combining common analysis pathways with the ability to explore data interactively. It aims to fully encompass and simplify the "raw data-to-publication" pathway and make it reproducible.
Full-genome sequence and analysis of a novel human rhinovirus strain within a divergent HRV-A clade.

PubMed

Rathe, Jennifer A; Liu, Xinyue; Tallon, Luke J; Gern, James E; Liggett, Stephen B

2010-01-01

Genome sequences of human rhinoviruses (HRV) have primarily been from stocks collected in the 1960s, with genomes and phylogeny of modern HRVs remaining undefined. Here, two modern isolates (hrv-A101 and hrv-A101-v1) collected approximately 8 years apart were sequenced in their entirety. Incorporation into our full-genome HRV alignment with subsequent phylogenetic network inference indicated that these represent a unique HRV-A, localized within a distinct divergent clade. They appear to have resulted from recombination of the hrv-65 and hrv-78 lineages. These results support our contention that there are unrecognized distinct HRV-A strains, and that recombination is evident in currently circulating strains.
Nuclear counterparts of the cytoplasmic mitochondrial 12S rRNA gene: a problem of ancient DNA and molecular phylogenies.

PubMed

van der Kuyl, A C; Kuiken, C L; Dekker, J T; Perizonius, W R; Goudsmit, J

1995-06-01

Monkey mummy bones and teeth originating from the North Saqqara Baboon Galleries (Egypt), soft tissue from a mummified baboon in a museum collection, and nineteenth/twentieth-century skin fragments from mangabeys were used for DNA extraction and PCR amplification of part of the mitochondrial 12S rRNA gene. Sequences aligning with the 12S rRNA gene were recovered but were only distantly related to contemporary monkey mitochondrial 12S rRNA sequences. However, many of these sequences were identical or closely related to human nuclear DNA sequences resembling mitochondrial 12S rRNA (isolated from a cell line depleted in mitochondria) and therefore have to be considered contamination. Subsequently, in a separate study, we were able to recover genuine mitochondrial 12S rRNA sequences from many extant species of nonhuman Old World primates, as well as sequences closely resembling the human nuclear integrations. Analysis of all sequences by the neighbor-joining (NJ) method indicated that mitochondrial DNA sequences and their nuclear counterparts can be divided into two distinct clusters. One cluster contained all contemporary cytoplasmic mitochondrial DNA sequences and approximately half of the monkey nuclear mitochondrial-like sequences. A second cluster contained most human nuclear sequences and the other half of the monkey nuclear sequences, with a separate branch leading to human and gorilla mitochondrial and nuclear sequences. Sequences recovered from ancient materials were equally divided between the two clusters. These results constitute a warning for work with ancient DNA or phylogenetic analysis using mitochondrial DNA as a target sequence: nuclear counterparts of mitochondrial genes may lead to faulty interpretation of results.
Phylogeny of sipunculan worms: A combined analysis of four gene regions and morphology.

PubMed

Schulze, Anja; Cutler, Edward B; Giribet, Gonzalo

2007-01-01

The intra-phyletic relationships of sipunculan worms were analyzed based on DNA sequence data from four gene regions and 58 morphological characters. Initially we analyzed the data under direct optimization using parsimony as the optimality criterion. An implied alignment resulting from the direct optimization analysis was subsequently used to perform a Bayesian analysis with mixed models for the different data partitions; for this we applied a doublet model for the stem regions of the 18S rRNA. Both analyses support the monophyly of Sipuncula and most of the same clades within the phylum. The analyses differ with respect to the relationships among the major groups: whereas the deep nodes in the direct optimization analysis generally show low jackknife support, they are supported by 100% posterior probability in the Bayesian analysis. Direct optimization has been useful for handling sequences of unequal length and generating conservative phylogenetic hypotheses, whereas the Bayesian analysis under mixed models provided high resolution in the basal nodes of the tree.
[EST-SSR identification, markers development of Ligusticum chuanxiong based on Ligusticum chuanxiong transcriptome sequences].

PubMed

Yuan, Can; Peng, Fang; Yang, Ze-Mao; Zhong, Wen-Juan; Mou, Fang-Sheng; Gong, Yi-Yun; Ji, Pei-Cheng; Pu, De-Qiang; Huang, Hai-Yan; Yang, Xiao; Zhang, Chao

2017-09-01

Ligusticum chuanxiong is a well-known traditional Chinese medicine plant, and the development of molecular markers and the study of its germplasm resources are very important. In this study, we obtained 24 422 unigenes by assembling transcriptome sequencing reads of L. chuanxiong root. EST-SSRs were detected and 4 073 SSR loci were identified. Analysis of EST-SSR distribution and characteristics showed that mononucleotide repeats were the main repeat type, accounting for 41.0%. In addition, the SSR-containing sequences were functionally annotated with Gene Ontology (GO) and KEGG pathways and were assigned to 49 GO categories and 242 KEGG pathways; 2 201 of these sequences were annotated against the Nr database. Of 235 EST-SSRs validated, 74 primer pairs were ultimately shown to give high-quality amplification. Subsequently, genetic diversity analysis, UPGMA cluster analysis, PCoA and population structure analysis of 34 L. chuanxiong germplasm resources were carried out with these 74 primer pairs. In both the UPGMA tree and the PCoA results, L. chuanxiong resources were clustered into two groups, which are believed to be partially related to their geographical distribution. This study is the first to identify EST-SSRs in L. chuanxiong, and the newly developed molecular markers would contribute significantly to further genetic diversity studies, purity detection, gene mapping and molecular breeding. Copyright© by the Chinese Pharmaceutical Association.
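EST-SSR detection of the kind described above amounts to scanning sequences for perfect tandem repeats of short motifs. The sketch below is a minimal illustration, assuming MISA-style minimum repeat counts (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides); these thresholds and the helper names are assumptions, not the parameters used in the study.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (MISA-style defaults)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def _is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return any(k % p == 0 and motif == motif[:p] * (k // p) for p in range(1, k))


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs, 0-based starts."""
    seq = seq.upper()
    hits, covered = [], set()
    for k in sorted(MIN_REPEATS):
        # A k-base motif (group 2) followed by at least MIN_REPEATS[k] - 1 copies of itself
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, MIN_REPEATS[k] - 1))
        for m in pattern.finditer(seq):
            start, end = m.span()
            motif = m.group(2)
            # Skip reducible motifs and regions already claimed by a shorter motif
            if _is_compound(motif) or covered.intersection(range(start, end)):
                continue
            hits.append((start, motif, (end - start) // k))
            covered.update(range(start, end))
    return sorted(hits)


def repeat_type_fractions(hits):
    """Fraction of SSRs by motif length (1 = mono-, 2 = di-, ...)."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    if not total:
        return {}
    return {k: counts[k] / total for k in sorted(counts)}
```

For example, `find_ssrs("GG" + "A" * 12 + "GC" * 7 + "TTT" + "CAG" * 5 + "C")` reports one mono-, one di- and one trinucleotide SSR. Only perfect repeats are found; compound or interrupted SSRs would need extra merging logic.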
High-resolution biophysical analysis of the dynamics of nucleosome formation

PubMed Central

Hatakeyama, Akiko; Hartmann, Brigitte; Travers, Andrew; Nogues, Claude; Buckle, Malcolm

2016-01-01

We describe a biophysical approach that enables changes in the structure of DNA to be followed during nucleosome formation in in vitro reconstitution with either the canonical “Widom” sequence or a judiciously mutated sequence. The rapid non-perturbing photochemical analysis presented here provides ‘snapshots’ of the DNA configuration at any given moment in time during nucleosome formation under a very broad range of reaction conditions. Changes in DNA photochemical reactivity upon protein binding are interpreted as being mainly induced by alterations in individual base pair roll angles. The results strengthen the importance of the role of an initial (H3/H4)2 histone tetramer-DNA interaction and highlight the modulation of this early event by the DNA sequence. (H3/H4)2 binding precedes and dictates subsequent H2A/H2B-DNA interactions, which are less affected by the DNA sequence, leading to the final octameric nucleosome. Overall, our results provide a novel, exciting way to investigate those biophysical properties of DNA that constitute a crucial component in nucleosome formation and stabilization. PMID:27263658
A Characterization of Banach Spaces Containing l1

PubMed Central

Rosenthal, Haskell P.

1974-01-01

It is proved that a Banach space contains a subspace isomorphic to l1 if (and only if) it has a bounded sequence with no weak-Cauchy subsequence. The proof yields that a sequence of subsets of a given set has a subsequence that is either convergent or Boolean independent. PMID:16592162
[Complete sequence analysis of the Chinese attenuated yellow fever 17D vaccine strain and the WHO standard yellow fever 17D vaccine strain].

PubMed

Li, Jing; Yu, Yong-Xin; Dong, Guan-Mu

2009-04-01

To compare the molecular characteristics of the Chinese attenuated yellow fever 17D vaccine strain and the WHO reference yellow fever 17D vaccine strain. Primers were designed according to the published nucleotide sequences of YFV 17D strains in GenBank. Total RNA was extracted with Trizol and reverse transcribed. Each fragment of the YFV genome was amplified by PCR and subsequently sequenced. The fragments at the 5' and 3' ends of the two strains were cloned into the pGEM T-easy vector and then sequenced. The nucleotide and amino acid sequences of the two strains were 99% homologous to each other. No obvious nucleotide changes were found across the entire genome of either 17D strain. Moreover, there were no obvious changes in the E protein genes. However, E173 of YF17D Tiantan, which is associated with virulence, carried mutations. The two live attenuated yellow fever 17D vaccine strains fell into the same lineage in the phylogenetic analysis. The results indicate that the two attenuated yellow fever 17D vaccine viruses accumulate mutations at a very low frequency and that their genomes are relatively stable.
HLA genotyping by next-generation sequencing of complementary DNA.

PubMed

Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya

2017-11-28

Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to the high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high data throughput, it has the advantage that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, the use of cDNA could enable genotyping at reduced cost in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with the Ion Torrent PGM. The sequence data were then subjected to automated analysis. The principle of the analysis was to construct candidate sequences from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and a countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
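The ranking step described above (enumerate every combination of variable bases, then sort candidates by supporting reads) can be sketched as follows. This is an illustration under stated assumptions: reads are taken to be already aligned to the target region's coordinates, candidates are represented only by their bases at the variable sites, and `rank_candidates`, `assign_two_haplotypes` and the `min_reads` cutoff are hypothetical names, not the authors' code.

```python
from itertools import product


def rank_candidates(reads, variable_sites):
    """Rank every combination of variable bases by the number of supporting reads.

    reads: read strings aligned to the target region (same coordinates).
    variable_sites: dict mapping position -> iterable of possible bases there.
    Returns (candidate, read_count) pairs in decreasing order of read count.
    """
    positions = sorted(variable_sites)
    ranked = []
    # Each combination of variable bases is one candidate sequence
    for combo in product(*(sorted(variable_sites[p]) for p in positions)):
        support = sum(
            all(read[p] == base for p, base in zip(positions, combo))
            for read in reads
        )
        ranked.append(("".join(combo), support))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def assign_two_haplotypes(ranked, min_reads):
    """Return the two best-supported candidates, or None if fewer than two pass."""
    top = [cand for cand, n in ranked[:2] if n >= min_reads]
    return top if len(top) == 2 else None
```

Returning `None` when fewer than two candidates pass the cutoff mirrors the abstract's statement that such cases were forwarded to additional processing steps; those steps are not modelled here. Note that the number of candidates grows exponentially with the number of variable sites, which is one reason to split each gene into several shorter target regions.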
Isolation and characterization of full-length putative alcohol dehydrogenase genes from Polygonum minus

NASA Astrophysics Data System (ADS)

Hamid, Nur Athirah Abd; Ismail, Ismanizan

2013-11-01

Polygonum minus, locally known as kesum, is an aromatic herb that is high in secondary metabolite content. Alcohol dehydrogenase (ADH) is an important enzyme that catalyzes the reversible interconversion of alcohols and aldehydes with NAD(P)(H) as cofactor. The main focus of this research was to identify the ADH gene. Total RNA was extracted from leaves of P. minus treated with 150 μM jasmonic acid. The full-length cDNA sequence of ADH was isolated via rapid amplification of cDNA ends (RACE). Subsequently, in silico analysis was conducted on the full-length cDNA sequence, and PCR was performed on genomic DNA to determine the exon and intron organization. Two ADH sequences, designated PmADH1 and PmADH2, were successfully isolated. Both sequences have an 801 bp ORF encoding 266 amino acid residues. Nucleotide sequence comparison of PmADH1 and PmADH2 indicated that both sequences are highly similar in the ORF region but divergent in the 3' untranslated region (UTR). The deduced amino acid sequences differ at residue 107: PmADH1 contains Gly (G), whereas PmADH2 contains Cys (C). The intron-exon organization of both sequences is also the same, with 3 introns and 4 exons. Based on in silico analysis, both sequences contain the "classical" short-chain alcohol dehydrogenase/reductase ((c) SDR) conserved domain. The results suggest that both sequences are members of the short-chain alcohol dehydrogenase family.
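The ORF arithmetic above can be checked directly: an 801 bp ORF holds 267 codons, one of which is the stop codon, leaving 266 residues, and residue 107 is encoded by ORF nucleotides 319 to 321. The sketch assumes the 801 bp figure includes the stop codon. The last function only restates the standard genetic code by listing Gly and Cys codons that differ at a single position; the actual codons of PmADH1 and PmADH2 are not given in the abstract.

```python
def residues_encoded(orf_length_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length."""
    if orf_length_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_bp // 3
    # The stop codon contributes no residue
    return codons - 1 if includes_stop else codons


def codon_span(residue):
    """1-based (first, last) ORF nucleotide positions of a residue's codon."""
    first = 3 * (residue - 1) + 1
    return first, first + 2


# Standard-code codons for glycine and cysteine
GLY = {"GGT", "GGC", "GGA", "GGG"}
CYS = {"TGT", "TGC"}


def one_step_pairs(codons_a, codons_b):
    """Codon pairs that differ at exactly one nucleotide."""
    return sorted(
        (a, b) for a in codons_a for b in codons_b
        if sum(x != y for x, y in zip(a, b)) == 1
    )
```

`residues_encoded(801)` returns 266 and `codon_span(107)` returns `(319, 321)`; `one_step_pairs(GLY, CYS)` shows that a Gly/Cys difference is compatible with a single first-position substitution (GGC/TGC or GGT/TGT).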
Hydraulic fracturing and the Crooked Lake Sequences: Insights gleaned from regional seismic networks

NASA Astrophysics Data System (ADS)

Schultz, Ryan; Stern, Virginia; Novakovic, Mark; Atkinson, Gail; Gu, Yu Jeffrey

2015-04-01

Within central Alberta, Canada, a new sequence of earthquakes has been recognized since 1 December 2013 in a region of previous seismic quiescence near Crooked Lake, ~30 km west of the town of Fox Creek. We use a cross-correlation detection algorithm to detect more than 160 events through the end of 2014, which fall into five temporally distinct subsequences. This observation is corroborated by the uniqueness of waveforms clustered by subsequence. The Crooked Lake Sequences have come under scrutiny due to their strong temporal correlation (>99.99%) with the timing of hydraulic fracturing operations in the Duvernay Formation. We assert that individual subsequences are related to fracturing stimulation and, despite adverse initial station geometry, double-difference techniques allow us to spatially relate each cluster back to a unique horizontal well. Overall, we find that seismicity in the Crooked Lake Sequences is consistent with first-order observations of hydraulic fracturing induced seismicity.
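The cross-correlation detection named above is a form of template matching: a waveform from a known event is slid along continuous data, and windows whose normalized correlation exceeds a threshold are declared detections. The sketch below works on a single channel in pure Python; the 0.8 threshold, the one-pick-per-template-length rule and the function names are assumptions, not the authors' detector settings.

```python
import math


def normalized_xcorr(template, trace):
    """Pearson correlation of the template with each equal-length window of the trace."""
    n = len(template)
    t_mean = sum(template) / n
    t_dev = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t_dev))
    scores = []
    for start in range(len(trace) - n + 1):
        window = trace[start:start + n]
        w_mean = sum(window) / n
        w_dev = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_dev))
        if t_norm == 0 or w_norm == 0:
            scores.append(0.0)  # a flat window carries no waveform information
            continue
        scores.append(sum(a * b for a, b in zip(t_dev, w_dev)) / (t_norm * w_norm))
    return scores


def detect(template, trace, threshold=0.8, min_gap=None):
    """(start_index, correlation) picks above threshold, one per min_gap samples."""
    min_gap = len(template) if min_gap is None else min_gap
    picks = []
    for i, cc in enumerate(normalized_xcorr(template, trace)):
        if cc < threshold:
            continue
        if picks and i - picks[-1][0] < min_gap:
            # Same event as the previous pick: keep whichever correlates better
            if cc > picks[-1][1]:
                picks[-1] = (i, cc)
        else:
            picks.append((i, cc))
    return picks
```

Because the correlation is normalized, a smaller or larger copy of the template waveform scores the same as an exact copy, which is why this approach can find weak events that share a source with the template.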
[Molecular and prenatal diagnosis of a family with Fanconi anemia by next generation sequencing].

PubMed

Gong, Zhuwen; Yu, Yongguo; Zhang, Qigang; Gu, Xuefan

2015-04-01

To provide prenatal diagnosis, using combined next-generation sequencing (NGS) and Sanger sequencing, for a pregnant woman who had given birth to a child with Fanconi anemia. For the affected child, potential mutations of the FANCA gene were analyzed with NGS. Suspected mutations were verified with Sanger sequencing. For prenatal diagnosis, genomic DNA was extracted from cultured fetal amniotic fluid cells and analyzed for the same mutations. A low-frequency frameshift mutation c.989_995del7 (p.H330LfsX2, inherited from the father) and a missense mutation c.3971C>T (p.P1324L, inherited from the mother) were identified in the affected child and considered to be pathogenic. The two mutations were subsequently verified by Sanger sequencing. Upon prenatal diagnosis, the fetus was found to carry the two mutations. Combined next-generation sequencing and Sanger sequencing can reduce the time to diagnosis and identify Fanconi anemia subtypes and mutation sites, which has enabled reliable prenatal diagnosis of this disease.
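The coding-DNA (c.) and protein (p.) positions quoted above can be cross-checked with simple arithmetic. A minimal sketch, assuming c.1 is the A of the ATG start codon (the usual HGVS convention) and that positions lie within the coding sequence: c.989 falls in codon 330, matching p.H330; c.3971 is the second base of codon 1324, matching p.P1324L; and a 7-nucleotide deletion is not a multiple of 3, so it shifts the reading frame.

```python
def codon_of(c_pos):
    """Map a coding-DNA position c.N (N >= 1) to (codon number, position 1-3 in codon)."""
    if c_pos < 1:
        raise ValueError("only positions inside the coding sequence are handled")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def deletion_shifts_frame(first, last):
    """True if deleting c.first_last removes a number of bases not divisible by 3."""
    return (last - first + 1) % 3 != 0
```

`codon_of(989)` gives `(330, 2)`, `codon_of(3971)` gives `(1324, 2)`, and `deletion_shifts_frame(989, 995)` is `True`. A C>T change at the second position of a proline codon (CCN) yields CTN, a leucine codon, consistent with p.P1324L.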
RNA-ID, a Powerful Tool for Identifying and Characterizing Regulatory Sequences.

PubMed

Brule, C E; Dean, K M; Grayhack, E J

2016-01-01

The identification and analysis of sequences that regulate gene expression is critical because regulated gene expression underlies biology. RNA-ID is an efficient and sensitive method to discover and investigate regulatory sequences in the yeast Saccharomyces cerevisiae, using fluorescence-based assays to detect green fluorescent protein (GFP) relative to a red fluorescent protein (RFP) control in individual cells. Putative regulatory sequences can be inserted either in-frame or upstream of a superfolder GFP fusion protein whose expression, like that of RFP, is driven by the bidirectional GAL1,10 promoter. In this chapter, we describe the methodology to identify and study cis-regulatory sequences in the RNA-ID system, explaining features and variations of the RNA-ID reporter, as well as some applications of this system. We describe in detail the methods to analyze a single regulatory sequence, from construction of a single GFP variant to assay of variants by flow cytometry, as well as modifications required to screen libraries of different strains simultaneously. We also describe subsequent analyses of regulatory sequences. © 2016 Elsevier Inc. All rights reserved.
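Because RNA-ID reads out GFP relative to an RFP control in each cell, a natural per-strain summary is a per-cell GFP/RFP ratio. The sketch below is an assumption-laden illustration, not the published analysis: it takes already background-corrected fluorescence values (background correction is not shown), summarizes each strain by its median ratio, and reports a test strain relative to a hypothetical reference strain.

```python
from statistics import median


def relative_expression(cells, reference_cells):
    """Median per-cell GFP/RFP of a test strain, relative to a reference strain.

    cells, reference_cells: lists of (gfp, rfp) fluorescence values, one per cell.
    Cells with non-positive RFP are skipped to avoid dividing by zero.
    """
    def ratios(data):
        return [gfp / rfp for gfp, rfp in data if rfp > 0]

    return median(ratios(cells)) / median(ratios(reference_cells))
```

Using the median rather than the mean keeps a few very bright or very dim cells from dominating the summary, which matters for flow cytometry data with skewed distributions.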
Norrie disease: first mutation report and prenatal diagnosis in an Indian family.

PubMed

Ghosh, Manju; Sharma, Shipra; Shastri, Shivaram; Arora, Sadhna; Shukla, Rashmi; Gupta, Neerja; Deka, Deepika; Kabra, Madhulika

2012-11-01

Norrie disease (ND) is a rare X-linked recessive disorder characterised by congenital blindness due to severe retinal dysgenesis. Hearing loss and intellectual disability are present in 30-50% of cases. ND is caused by mutations in the NDP gene, located at Xp11.3. The authors describe mutation analysis of a proband with ND and subsequent prenatal diagnosis. Sequence analysis of the NDP gene revealed a hemizygous missense mutation, arginine to serine in codon 41 (p.Arg41Ser), in the affected child. The mother was a carrier of the mutation. In a subsequent dichorionic diamniotic pregnancy, the authors performed prenatal diagnosis by mutation analysis on a chorionic villus sample at 11 wk of gestation. The fetuses were unaffected. This is the first mutation report and prenatal diagnosis of a familial case of Norrie disease from India. The importance of genetic testing in Norrie disease for confirmation, carrier testing, prenatal diagnosis and genetic counseling is emphasized.

Scanning sequences after Gibbs sampling to find multiple occurrences of functional elements

PubMed Central

Tharakaraman, Kannan; Mariño-Ramírez, Leonardo; Sheetlin, Sergey L; Landsman, David; Spouge, John L

2006-01-01

Background Many DNA regulatory elements occur as multiple instances within a target promoter. Gibbs sampling programs for finding DNA regulatory elements de novo can be prohibitively slow in locating all instances of such an element in a sequence set. Results We describe an improvement to the A-GLAM computer program, which predicts regulatory elements within DNA sequences with Gibbs sampling. The improvement adds an optional "scanning step" after Gibbs sampling. Gibbs sampling produces a position specific scoring matrix (PSSM). The new scanning step resembles an iterative PSI-BLAST search based on the PSSM. First, it assigns an "individual score" to each subsequence of appropriate length within the input sequences using the initial PSSM. Second, it computes an E-value from each individual score, to assess the agreement between the corresponding subsequence and the PSSM. Third, it permits subsequences with E-values falling below a threshold to contribute to the underlying PSSM, which is then updated using the Bayesian calculus. A-GLAM iterates its scanning step to convergence, at which point no new subsequences contribute to the PSSM. After convergence, A-GLAM reports predicted regulatory elements within each sequence in order of increasing E-values, so users have a statistical evaluation of the predicted elements in a convenient presentation. Thus, although the Gibbs sampling step in A-GLAM finds at most one regulatory element per input sequence, the scanning step can now rapidly locate further instances of the element in each sequence. Conclusion Datasets from experiments determining the binding sites of transcription factors were used to evaluate the improvement to A-GLAM. Typically, the datasets included several sequences containing multiple instances of a regulatory motif. The improvements to A-GLAM permitted it to predict the multiple instances. PMID:16961919
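The scanning step can be sketched as an iterative loop: score every window with the current PSSM, keep windows above a threshold, refit the PSSM from the kept windows, and stop when the set of sites no longer changes. This is a simplified illustration, not A-GLAM itself: it uses a raw log-odds score threshold in place of A-GLAM's E-values and refits with simple pseudocounts rather than A-GLAM's Bayesian update; the pseudocount, background and threshold values are assumptions.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds PSSM (one dict per column) from aligned, equal-length sites."""
    width = len(sites[0])
    pssm = []
    for i in range(width):
        column = [s[i] for s in sites]
        total = len(column) + 4 * pseudocount
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                     for b in BASES})
    return pssm


def score(pssm, word):
    """Sum of column scores; word must be uppercase ACGT of the PSSM's width."""
    return sum(col[b] for col, b in zip(pssm, word))


def scan_to_convergence(sequences, seed_sites, threshold, max_rounds=50):
    """Rescan all sequences, refitting the PSSM until the site set stops changing.

    Returns (pssm, sorted list of (sequence_index, start) sites).
    """
    width = len(seed_sites[0])
    pssm = build_pssm(seed_sites)
    current = None
    for _ in range(max_rounds):
        hits = {
            (seq_id, start)
            for seq_id, seq in enumerate(sequences)
            for start in range(len(seq) - width + 1)
            if score(pssm, seq[start:start + width]) >= threshold
        }
        if not hits or hits == current:
            break  # converged: no new site contributes to the PSSM
        current = hits
        pssm = build_pssm([sequences[i][s:s + width] for i, s in sorted(hits)])
    return pssm, sorted(current or [])
```

As in the abstract, several instances per input sequence can be returned, for example two TATAAT sites in one sequence when seeded with a single TATAAT site.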
The spectrum of genomic signatures: from dinucleotides to chaos game representation.

PubMed

Wang, Yingwei; Hill, Kathleen; Singh, Shiva; Kari, Lila

2005-02-14

In the post-genomic era, access to complete genome sequence data for numerous diverse species has opened multiple avenues for examining and comparing primary DNA sequence organization of entire genomes. Previously, the concept of a genomic signature was introduced with the observation of species-type-specific Dinucleotide Relative Abundance Profiles (DRAPs); dinucleotides were identified as the subsequences with the greatest bias in representation in a majority of genomes. Herein, we demonstrate that DRAP is one particular genomic signature contained within a broader spectrum of signatures. Within this spectrum, an alternative genomic signature, Chaos Game Representation (CGR), provides a unique visualization of patterns in sequence organization. A genomic signature is associated with a particular integer order or subsequence length that represents a measure of the resolution or granularity in the analysis of primary DNA sequence organization. We quantitatively explore the organizational information provided by genomic signatures of different orders through different distance measures, including a novel Image Distance. The Image Distance and other existing distance measures are evaluated by comparing the phylogenetic trees they generate for 26 complete mitochondrial genomes from a diversity of species. The phylogenetic tree generated by the Image Distance is compatible with the known relatedness of species. Quantitative evaluation of the spectrum of genomic signatures may be used to ultimately gain insight into the determinants and biological relevance of the genome signatures.
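Both signatures mentioned above are straightforward to compute. The dinucleotide relative abundance is conventionally rho_XY = f_XY / (f_X * f_Y), the observed dinucleotide frequency divided by the product of the mononucleotide frequencies; the sketch computes it on one strand only, whereas many published definitions also include the reverse complement. For the CGR it assumes one common corner assignment (A, C, G, T at (0,0), (0,1), (1,1), (1,0)); binning the points into a 2^k x 2^k grid gives the order-k signature, because each cell corresponds to exactly one k-mer. The paper's Image Distance is not reproduced.

```python
from collections import Counter
from itertools import product


def dinucleotide_relative_abundance(seq):
    """rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides, single strand."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = sum(mono[b] for b in "ACGT")
    n_di = sum(di[x + y] for x, y in product("ACGT", repeat=2))
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        rho[x + y] = (di[x + y] / n_di) / (fx * fy) if fx and fy else float("nan")
    return rho


# Assumed corner convention for the chaos game
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """CGR points: start at the centre, move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # ambiguous bases are skipped (an assumption)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Order-k frequency CGR: point counts in a 2**k x 2**k grid (rows index y)."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for i, (x, y) in enumerate(cgr_points(seq)):
        if i >= k - 1:  # point i lies in the cell of the k-mer ending at base i
            grid[min(int(y * size), size - 1)][min(int(x * size), size - 1)] += 1
    return grid
```

Raising `k` moves along the "spectrum" the abstract describes: k = 2 captures dinucleotide structure, while larger k resolves longer subsequences at the cost of sparser counts.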
FOUNTAIN: A JAVA open-source package to assist large sequencing projects

PubMed Central

Buerstedde, Jean-Marie; Prill, Florian

2001-01-01

Background Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Being open source and largely platform and database independent, FOUNTAIN is offered in the hope that it will be improved and extended in a community effort. PMID:11591214
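The clone, template and reaction bookkeeping described above maps naturally onto a few linked relational tables. FOUNTAIN itself is written in Java; the sketch below uses Python's built-in `sqlite3` purely to illustrate one possible schema, and every table, column and function name is hypothetical rather than taken from FOUNTAIN.

```python
import sqlite3

# Hypothetical schema: each clone has templates, each template has reactions,
# and each reaction yields a read that can later be annotated (e.g. from BLAST).
SCHEMA = """
CREATE TABLE clones    (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE templates (id INTEGER PRIMARY KEY,
                        clone_id INTEGER NOT NULL REFERENCES clones(id),
                        prep_date TEXT);
CREATE TABLE reactions (id INTEGER PRIMARY KEY,
                        template_id INTEGER NOT NULL REFERENCES templates(id),
                        primer TEXT);
CREATE TABLE reads     (id INTEGER PRIMARY KEY,
                        reaction_id INTEGER NOT NULL REFERENCES reactions(id),
                        sequence TEXT NOT NULL,
                        best_blast_hit TEXT);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the parent/child links
    conn.executescript(SCHEMA)
    return conn


def add(conn, table, **values):
    """Insert one row; table names come from code, never from user input."""
    cols = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    cur = conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                       tuple(values.values()))
    return cur.lastrowid


def reads_for_clone(conn, clone_name):
    """All read sequences derived from one clone, in insertion order."""
    return conn.execute(
        """SELECT reads.sequence FROM reads
           JOIN reactions ON reads.reaction_id = reactions.id
           JOIN templates ON reactions.template_id = templates.id
           JOIN clones ON templates.clone_id = clones.id
           WHERE clones.name = ? ORDER BY reads.id""",
        (clone_name,),
    ).fetchall()
```

Keeping each stage in its own table means a failed reaction can be repeated on the same template without duplicating clone information, and annotation columns can be filled in later as BLAST results arrive.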
VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

PubMed

Schäffer, Alejandro A; Nawrocki, Eric P; Choi, Yoon; Kitts, Paul A; Karsch-Mizrachi, Ilene; McVeigh, Richard

2018-03-01

Nucleic acid sequences in public databases should not contain vector contamination, but many sequences in GenBank do (or did) contain vectors. The National Center for Biotechnology Information uses the program VecScreen to screen submitted sequences for contamination. Additional tools are needed to distinguish true-positive (contamination) from false-positive (not contamination) VecScreen matches. A principal reason for false-positive VecScreen matches is that the sequence and the matching vector subsequence originate from closely related or identical organisms (for example, both originate in Escherichia coli). We collected information on the taxonomy of sources of vector segments in the UniVec database used by VecScreen. We used that information in two overlapping software pipelines for retrospective analysis of contamination in GenBank and for prospective analysis of contamination in new sequence submissions. Using the retrospective pipeline, we identified and corrected over 8000 contaminated sequences in the nonredundant nucleotide database. The prospective analysis pipeline has been in production use since April 2017 to evaluate some new GenBank submissions. Data on the sources of UniVec entries were included in release 10.0 (ftp://ftp.ncbi.nih.gov/pub/UniVec/). The main software is freely available at https://github.com/aaschaffer/vecscreen_plus_taxonomy. aschaffe@helix.nih.gov. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
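The taxonomy check described above can be sketched as a lowest-common-ancestor test: if the query sequence and the organism a matching UniVec segment came from are closely related, the match is more likely a false positive. The toy taxonomy, the `max_steps` cutoff and the function names below are hypothetical; this is not the logic of VecScreen_plus_taxonomy.

```python
def lineage(taxon, parent):
    """The taxon followed by its ancestors, using a child -> parent map."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lowest_common_ancestor(a, b, parent):
    ancestors_of_a = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in ancestors_of_a:
            return node
    return None


def classify_match(query_taxon, segment_source_taxa, parent, max_steps=1):
    """Flag a vector match as a likely false positive if any source organism of the
    matching segment shares an ancestor within max_steps levels above the query."""
    close = set(lineage(query_taxon, parent)[: max_steps + 1])
    for source in segment_source_taxa:
        if lowest_common_ancestor(query_taxon, source, parent) in close:
            return "likely false positive"
    return "possible contamination"


# Toy taxonomy for illustration only
PARENT = {
    "E. coli K-12": "Escherichia coli", "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae", "Enterobacteriaceae": "Bacteria",
    "Bacteria": "root", "Homo sapiens": "Homo", "Homo": "Eukaryota",
    "Eukaryota": "root",
}
```

With this toy map, an E. coli query matching a segment sourced from E. coli K-12 is flagged as a likely false positive, while a human query matching the same segment remains possible contamination, which is the distinction the abstract describes.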
Genetic heterogeneity in patients with Bartter syndrome type 1

PubMed Central

Sun, Mingran; Ning, Jing; Xu, Weihong; Zhang, Han; Zhao, Kaishu; Li, Wenfu; Li, Guiying; Li, Shibo

2017-01-01

Bartter syndrome (BS) type 1 is an autosomal recessive kidney disorder caused by loss-of-function mutations in the solute carrier family 12 member 1 (SLC12A1) gene. To date, 72 BS type 1 patients harboring SLC12A1 mutations have been documented. Of these 144 alleles studied, 68 different disease-causing mutations have been detected in 129 alleles, and no mutation was detected in the remaining 15 alleles. The mutation types included missense/nonsense mutations, splicing mutations and small insertions and deletions ranging from 1 to 4 nucleotides. A large deletion encompassing a whole exon in the SLC12A1 gene has not yet been reported. The current study initially identified a previously undocumented homozygous frameshift mutation (c.1833delT) by Sanger sequencing analysis of a single infant with BS type 1. However, in a subsequent analysis, the mutation was detected only in the father's DNA. Upon further investigation using a next-generation sequencing approach, a deletion of exons 14 and 15 was detected in both the patient and the patient's mother. The deletion was subsequently confirmed by long-range polymerase chain reaction and was determined to be 3.16 kb in size based on sequencing of the junction fragment. The results of the present study demonstrated that pathogenic variants of SLC12A1 are heterogeneous. Large deletions appear to serve an etiological role in BS type 1, and may be more prevalent than previously thought. PMID:28000888
Droplet barcoding for single cell transcriptomics applied to embryonic stem cells

PubMed Central

Klein, Allon M; Mazutis, Linas; Akartuna, Ilke; Tallapragada, Naren; Veres, Adrian; Li, Victor; Peshkin, Leonid; Weitz, David A; Kirschner, Marc W

2015-01-01

Summary It has long been the dream of biologists to map gene expression at the single cell level. With such data one might track heterogeneous cell sub-populations, and infer regulatory relationships between genes and pathways. Recently, RNA sequencing has achieved single cell resolution. What has been limiting is the lack of an effective way to routinely isolate and process large numbers of individual cells for quantitative in-depth sequencing. We have developed a high-throughput droplet-microfluidic approach for barcoding the RNA from thousands of individual cells for subsequent analysis by next-generation sequencing. The method shows a surprisingly low noise profile and is readily adaptable to other sequencing-based assays. We analyzed mouse embryonic stem cells, revealing in detail the population structure and the heterogeneous onset of differentiation after LIF withdrawal. The reproducibility of these high-throughput single cell data allowed us to deconstruct cell populations and infer gene expression relationships. PMID:26000487
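Downstream of droplet barcoding, reads have to be grouped by cell before expression can be counted. A minimal sketch, assuming a hypothetical read layout in which each read's tag carries a cell barcode followed by a unique molecular identifier (UMI), with the gene assignment already made; barcode and UMI sequencing errors are ignored, and the lengths and function name are assumptions, not the authors' pipeline.

```python
from collections import defaultdict


def count_matrix(reads, barcode_len, umi_len):
    """Per-cell, per-gene counts of distinct UMIs.

    reads: (tag, gene) pairs, where tag = cell barcode + UMI (hypothetical layout).
    Returns {cell_barcode: {gene: number_of_distinct_umis}}.
    """
    umis = defaultdict(lambda: defaultdict(set))
    for tag, gene in reads:
        cell = tag[:barcode_len]
        umi = tag[barcode_len:barcode_len + umi_len]
        umis[cell][gene].add(umi)  # PCR duplicates share a UMI and collapse here
    return {cell: {gene: len(u) for gene, u in genes.items()}
            for cell, genes in umis.items()}
```

Counting distinct UMIs instead of raw reads reduces the influence of uneven PCR amplification, one of the noise sources that single cell methods have to control.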
SBLOCA outside containment at Browns Ferry Unit One: accident sequence analysis. [Small break

DOE Office of Scientific and Technical Information (OSTI.GOV)

Condon, W.A.; Harrington, R.M.; Greene, S.R.

1982-11-01

This study describes the predicted response of Unit 1 at the Browns Ferry Nuclear Plant to a postulated small-break loss-of-coolant accident outside of the primary containment. The break has been assumed to occur in the scram discharge volume piping immediately following a reactor scram that cannot be reset. The events before core uncovering are discussed for both the worst-case accident sequence without operator action and for the more likely sequences with operator action. Without operator action, the events after core uncovering would include core meltdown and subsequent containment failure, and this event sequence has been determined through use of the MARCH code. An estimate of the magnitude and timing of the concomitant release of the noble gas, cesium, and iodine-based fission products to the environment is provided in Volume 2 of this report.
Whole-Exome Sequencing to Identify Novel Biological Pathways Associated With Infertility After Pelvic Inflammatory Disease.

PubMed

Taylor, Brandie D; Zheng, Xiaojing; Darville, Toni; Zhong, Wujuan; Konganti, Kranti; Abiodun-Ojo, Olayinka; Ness, Roberta B; O'Connell, Catherine M; Haggerty, Catherine L

2017-01-01

Ideal management of sexually transmitted infections (STI) may require risk markers for pathology or vaccine development. Previously, we identified common genetic variants associated with chlamydial pelvic inflammatory disease (PID) and reduced fecundity. As this explains only a proportion of the long-term morbidity risk, we used whole-exome sequencing to identify biological pathways that may be associated with STI-related infertility. We obtained stored DNA from 43 non-Hispanic black women with PID from the PID Evaluation and Clinical Health Study. Infertility was assessed at a mean of 84 months. Principal component analysis revealed no population stratification. Potential covariates did not significantly differ between groups. The sequencing kernel association test was used to examine associations between aggregates of variants on a single gene and infertility. The results from the sequencing kernel association test were used to choose "focus genes" (P < 0.01; n = 150) for subsequent Ingenuity Pathway Analysis to identify "gene sets" that are enriched in biologically relevant pathways. Pathway analysis revealed that focus genes were enriched in canonical pathways including IL-1 signaling, P2Y purinergic receptor signaling, and bone morphogenetic protein signaling. Focus genes were enriched in pathways that impact innate and adaptive immunity, protein kinase A activity, cellular growth, and DNA repair. These may alter host resistance or immunopathology after infection. Targeted sequencing of biological pathways identified in this study may provide insight into STI-related infertility.
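Pathway enrichment of a focus-gene list is commonly scored with a one-sided hypergeometric (Fisher's exact) test: given a background of genes, a pathway of a given size, and a list of focus genes, how surprising is the observed overlap? The sketch below shows that calculation only; it is not a reimplementation of Ingenuity Pathway Analysis, and the choice of background gene set is an assumption the user must make.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, focus_genes, overlap):
    """One-sided hypergeometric P(X >= overlap), i.e. Fisher's exact right tail.

    total_genes: size of the background gene set
    pathway_genes: background genes annotated to the pathway
    focus_genes: genes in the focus list (e.g. those passing P < 0.01)
    overlap: focus genes that fall in the pathway
    """
    upper = min(pathway_genes, focus_genes)
    denominator = comb(total_genes, focus_genes)
    return sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, focus_genes - k)
        for k in range(overlap, upper + 1)
    ) / denominator
```

With a toy background of 20 genes, a 5-gene pathway and 4 focus genes, an overlap of 3 gives (10 x 15 + 5) / 4845, about 0.032. When many pathways are tested, these P values would normally be corrected for multiple testing.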
TRDistiller: a rapid filter for enrichment of sequence datasets with proteins containing tandem repeats.

PubMed

Richard, François D; Kajava, Andrey V

2014-06-01

The dramatic growth of sequencing data creates an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists have been devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so-called tandem repeats, or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required, since the conventional approaches developed for globular domains have had limited success when applied to TR regions. Protein TRs are frequently imperfect, containing a number of mutations, and some of them cannot be easily identified. Several algorithms have been developed to detect such "hidden" repeats. However, the most sensitive among them are time-consuming and therefore inappropriate for large-scale proteome analysis. To speed up TR detection, we developed a rapid filter based on comparing the composition and order of short strings in adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset while enriching it with TR-containing proteins, allowing faster subsequent TR detection by other methods. The program is available upon request. Copyright © 2014 Elsevier Inc. All rights reserved.
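The idea of comparing short strings in adjacent sequence segments can be illustrated by measuring the k-mer overlap between neighboring windows: tandem repeats make adjacent windows of roughly the repeat length share many short strings. This sketch is an illustration, not TRDistiller's scoring scheme; the Jaccard measure, k = 3, the window range and the 0.5 cutoff are all assumptions.

```python
def kmer_set(s, k):
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def adjacent_similarity(seq, window, k=3):
    """Best Jaccard similarity of k-mer sets between two adjacent windows."""
    best = 0.0
    for start in range(len(seq) - 2 * window + 1):
        left = kmer_set(seq[start:start + window], k)
        right = kmer_set(seq[start + window:start + 2 * window], k)
        union = left | right
        if union:
            best = max(best, len(left & right) / len(union))
    return best


def keep_for_tr_search(seq, windows=range(5, 41), k=3, cutoff=0.5):
    """Keep a protein for slower TR detection if any window length shows
    adjacent windows with similar short-string content."""
    return any(adjacent_similarity(seq, w, k) >= cutoff for w in windows)
```

A sequence built from four copies of a 7-residue motif is kept, while a short sequence whose 3-mers are all distinct is discarded. Because only k-mer sets are compared, the check tolerates a few substitutions between repeat copies, which matters for the imperfect repeats described in the abstract.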
Implementation of Amplicon Parallel Sequencing Leads to Improvement of Diagnosis and Therapy of Lung Cancer Patients.

PubMed

König, Katharina; Peifer, Martin; Fassunke, Jana; Ihle, Michaela A; Künstlinger, Helen; Heydt, Carina; Stamm, Katrin; Ueckeroth, Frank; Vollbrecht, Claudia; Bos, Marc; Gardizi, Masyar; Scheffler, Matthias; Nogova, Lucia; Leenders, Frauke; Albus, Kerstin; Meder, Lydia; Becker, Kerstin; Florin, Alexandra; Rommerscheidt-Fuss, Ursula; Altmüller, Janine; Kloth, Michael; Nürnberg, Peter; Henkel, Thomas; Bikár, Sven-Ernö; Sos, Martin L; Geese, William J; Strauss, Lewis; Ko, Yon-Dschun; Gerigk, Ulrich; Odenthal, Margarete; Zander, Thomas; Wolf, Jürgen; Merkelbach-Bruse, Sabine; Buettner, Reinhard; Heukamp, Lukas C

2015-07-01

The Network Genomic Medicine Lung Cancer was set up to rapidly translate scientific advances into early clinical trials of targeted therapies in lung cancer and performs molecular analyses of more than 3500 patients annually. Because sequential analysis of the relevant driver mutations on fixed samples is challenging in terms of workload, tissue availability, and cost, we established multiplex parallel sequencing in routine diagnostics. The aim was to analyze all therapeutically relevant mutations in lung cancer samples in a high-throughput fashion while significantly reducing turnaround time and amount of input DNA compared with conventional dideoxy sequencing of single polymerase chain reaction amplicons. In this study, we demonstrate the feasibility of a 102-amplicon multiplex polymerase chain reaction followed by sequencing on an Illumina sequencer on formalin-fixed paraffin-embedded tissue in routine diagnostics. Analysis of a validation cohort of 180 samples showed this approach to require significantly less input material and to be more reliable, robust, and cost-effective than conventional dideoxy sequencing. Subsequently, 2657 lung cancer patients were analyzed. We observed that comprehensive biomarker testing provided novel information in addition to histological diagnosis and clinical staging. In 2657 consecutively analyzed lung cancer samples, we identified driver mutations at the expected prevalence. Furthermore, we found potentially targetable DDR2 mutations at a frequency of 3% in both adenocarcinomas and squamous cell carcinomas. Overall, our data demonstrate the utility of systematic sequencing analysis in a clinical routine setting and highlight the dramatic impact of such an approach on the availability of therapeutic strategies for the targeted treatment of individual cancer patients.
Interaction of healthcare worker hands and portable medical equipment: a sequence analysis to show potential transmission opportunities.

PubMed

Jinadatha, Chetan; Villamaria, Frank C; Coppin, John D; Dale, Charles R; Williams, Marjory D; Whitworth, Ryan; Stibich, Mark

2017-12-28

While research has demonstrated the importance of a clean health care environment, there is a lack of research on the role that portable medical equipment (PME) plays in the transmission cycle of healthcare-acquired infections (HAIs). This study investigated the patterns and sequence of contact events among health care workers, patients, surfaces, and medical equipment in a hospital environment. Research staff observed patient care events over six different 24 h periods on six different hospital units. Each encounter was recorded as a sequence of events, analyzed using sequence analysis, and visually represented by network plots. In addition, a point prevalence microbial sample was taken from the computer on wheels (COW). The most touched items during patient care were the individual patient (850), bedrail (375), bed surface (302), and bedside table (223). Three of the ten most common subsequences involved touching PME and the patient: COW ➔ patient (62 of 274 total sequences, 22.6%, contained this sequence), patient ➔ COW (20.4%), and patient ➔ IV pump (16.1%). The network plots revealed large interconnectedness among objects in the room, the patient, PME, and the healthcare worker. Our results demonstrated that PME such as the COW and IV pump were two of the most highly touched items during patient care. Even with proper hand sanitization and personal protective equipment, this sequence analysis reveals the potential for contamination to pass from the patient and environment to a vector such as portable medical equipment, and ultimately to another patient in the hospital.
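The subsequence percentages above are the share of recorded encounters in which one touch is followed by another. A minimal Python sketch of that count, assuming each encounter is stored as an ordered list of touched items and that "contained this sequence" means two consecutive events (the abstract does not define it further); the function names and the toy encounters are hypothetical:

```python
from typing import List, Sequence, Tuple


def contains_transition(events: Sequence[str], first: str, second: str) -> bool:
    """True if `first` is immediately followed by `second` somewhere in one encounter."""
    return any(a == first and b == second for a, b in zip(events, events[1:]))


def transition_prevalence(
    encounters: List[Sequence[str]], first: str, second: str
) -> Tuple[int, float]:
    """Number and fraction of encounters that contain the two-event subsequence."""
    hits = sum(contains_transition(e, first, second) for e in encounters)
    return hits, hits / len(encounters)


# Hypothetical encounters; item labels follow the abstract's abbreviations.
encounters = [
    ["COW", "patient", "bedrail"],
    ["patient", "COW", "patient"],
    ["bedrail", "IV pump", "patient"],
    ["patient", "IV pump"],
]

hits, share = transition_prevalence(encounters, "COW", "patient")
print(hits, f"{share:.1%}")  # 2 50.0%

# The reported figure for COW -> patient: 62 of 274 sequences.
print(f"{62 / 274:.1%}")  # 22.6%
```

Counting each encounter at most once (rather than counting every occurrence) matches a "share of sequences containing" figure; counting every occurrence would instead give transition frequencies of the kind a network plot is usually built from.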
Mutation Analysis of SLC26A4 for Pendred Syndrome and Nonsyndromic Hearing Loss by High-Resolution Melting

PubMed Central

Chen, Neng; Tranebjærg, Lisbeth; Rendtorff, Nanna Dahl; Schrijver, Iris

2011-01-01

Pendred syndrome and DFNB4 (autosomal recessive nonsyndromic congenital deafness, locus 4) are associated with autosomal recessive congenital sensorineural hearing loss and mutations in the SLC26A4 gene. Extensive allelic heterogeneity, however, necessitates analysis of all exons and splice sites to identify mutations for individual patients. Although Sanger sequencing is the gold standard for mutation detection, screening methods supplemented with targeted sequencing can provide a cost-effective alternative. One such method, denaturing high-performance liquid chromatography, was developed for clinical mutation detection in SLC26A4. However, this method inherently cannot distinguish homozygous changes from wild-type sequences. High-resolution melting (HRM), on the other hand, can detect heterozygous and homozygous changes cost-effectively, without any post-PCR modifications. We developed a closed-tube HRM mutation detection method specific for SLC26A4 that can be used in the clinical diagnostic setting. Twenty-eight primer pairs were designed to cover all 21 SLC26A4 exons and splice junction sequences. Using the resulting amplicons, initial HRM analysis detected all 45 variants previously identified by sequencing. Subsequently, a 384-well plate format was designed for up to three patient samples per run. Blinded HRM testing on these plates of patient samples collected over 1 year in a clinical diagnostic laboratory accurately detected all variants identified by sequencing. In conclusion, HRM with targeted sequencing is a reliable, simple, and cost-effective method for SLC26A4 mutation screening and detection. PMID:21704276
A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

PubMed

Xue, Yun; Liao, Zhengling; Li, Meihang; Luo, Jie; Kuang, Qiuhua; Hu, Xiaohui; Li, Tiechen

2015-01-01

Order-preserving submatrices (OPSMs) have been applied as an important unsupervised learning model in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems. Unfortunately, because the problem is NP-complete, most existing methods are heuristic algorithms that cannot reveal all OPSMs. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequences (ACS) between every two row sequences, so that no deep OPSMs are missed. Then, an improved prefix-tree data structure was used to store and traverse the ACS, and the Apriori principle was employed to mine frequent sequential patterns efficiently. Finally, experiments were performed on gene and synthetic datasets. The results demonstrated the effectiveness and efficiency of this method.
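In the OPSM setting, each matrix row can be read as the sequence of its column indices ordered by value, so a set of columns that keeps the same order in two rows is a common subsequence of their orderings. A minimal Python sketch, assuming ties within a row are broken by column index and enumerating common subsequences by depth-first search; it is not the authors' adjusted ACS algorithm or their prefix-tree structure, and the function names and matrix values are hypothetical:

```python
from typing import List, Sequence, Tuple


def row_ordering(row: Sequence[float]) -> List[int]:
    """Column indices sorted by ascending value (ties broken by column index)."""
    return sorted(range(len(row)), key=lambda c: (row[c], c))


def all_common_subsequences(
    a: List[int], b: List[int], min_len: int = 2
) -> List[Tuple[int, ...]]:
    """Every common subsequence of two column orderings with at least min_len columns."""
    pos_a = {c: i for i, c in enumerate(a)}
    pos_b = {c: i for i, c in enumerate(b)}
    found: List[Tuple[int, ...]] = []

    def extend(prefix: Tuple[int, ...]) -> None:
        if len(prefix) >= min_len:
            found.append(prefix)
        last = prefix[-1]
        # A candidate column must come after `last` in both orderings.
        for c in a[pos_a[last] + 1:]:
            if c in pos_b and pos_b[c] > pos_b[last]:
                extend(prefix + (c,))

    for start in a:
        if start in pos_b:
            extend((start,))
    return found


def support(pattern: Sequence[int], orderings: List[List[int]]) -> int:
    """Number of rows whose ordering contains the pattern as a subsequence."""
    def contains(ordering: List[int]) -> bool:
        it = iter(ordering)
        return all(c in it for c in pattern)  # `in` advances the iterator
    return sum(contains(o) for o in orderings)


matrix = [
    [0.1, 0.4, 0.2, 0.9],  # ordering [0, 2, 1, 3]
    [0.3, 0.5, 0.1, 0.8],  # ordering [2, 0, 1, 3]
    [0.2, 0.6, 0.7, 0.9],  # ordering [0, 1, 2, 3]
]
orderings = [row_ordering(r) for r in matrix]
acs = all_common_subsequences(orderings[0], orderings[1])
longest = [p for p in acs if len(p) == max(map(len, acs))]
for p in longest:
    print(p, support(p, orderings))  # (0, 1, 3) 3, then (2, 1, 3) 2
```

The support count is where the Apriori principle applies: a longer pattern can never be supported by more rows than any of its subsequences, so a pattern that falls below the support threshold lets every extension of it be pruned.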
High-performance liquid chromatography-mass spectrometry for mapping and sequencing glycosaminoglycan-derived oligosaccharides

PubMed Central

Volpi, Nicola; Linhardt, Robert J

2012-01-01

Glycosaminoglycans (GAGs) have proven to be very difficult to analyze and characterize because of their high negative charge density, polydispersity and sequence heterogeneity. As the specificity of the interactions between GAGs and proteins results from the structure of these polysaccharides, an understanding of GAG structure is essential for developing a structure–activity relationship. Electrospray ionization (ESI) mass spectrometry (MS) is particularly promising for the analysis of oligosaccharides chemically or enzymatically generated from GAGs because of its relatively soft ionization capacity. Furthermore, on-line high-performance liquid chromatography (HPLC)-MS greatly enhances the characterization of complex mixtures of GAG-derived oligosaccharides, providing important structural information and affording their disaccharide composition. A detailed protocol for producing oligosaccharides from various GAGs, using controlled, specific enzymatic or chemical depolymerization, is presented, together with their HPLC separation, using volatile reversed-phase ion-pairing reagents and on-line ESI-MS structural identification. This analysis provides an oligosaccharide map together with sequence information from a reading frame beginning at the nonreducing end of the GAG chains. The preparation of oligosaccharides can be carried out in 10 h, with subsequent HPLC analysis in 1–2 h and HPLC-MS analysis taking another 2 h. PMID:20448545
Compositional segmentation and complexity measurement in stock indices

NASA Astrophysics Data System (ADS)

Wang, Haifeng; Shang, Pengjian; Xia, Jianan

2016-01-01

In this paper, we introduce into the analysis of financial time series a complexity measure based on entropic segmentation, called sequence compositional complexity (SCC). SCC was first used to deal directly with the complex heterogeneity of nonstationary DNA sequences, where it was found to be higher in sequences with long-range correlation than in those with weak long-range correlation. Applying this method to financial index data, we find that the SCC values of some mature stock indices, such as the S&P 500 (abbreviated S&P below) and the HSI, tend to be lower than the SCC value of Chinese index data (such as the SSE). Moreover, if we classify the indices by SCC, the Hong Kong financial market has more in common with mature foreign markets than with Chinese ones. We therefore believe that there is a good correspondence between the SCC of an index sequence and the complexity of the market involved.
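In the entropic segmentation method from which SCC comes, a sequence S of length n is split recursively at the point that maximizes the Jensen-Shannon divergence between the two parts, and SCC = H(S) − Σᵢ (nᵢ/n) H(Sᵢ), where the Sᵢ are the final segments and H is the Shannon entropy of their symbol composition. A minimal Python sketch under two loudly stated assumptions: index levels are coded as hypothetical up/down symbols (the abstract does not give the paper's symbolization), and splitting stops at a fixed divergence threshold instead of the statistical significance criterion used in the original method:

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def entropy(symbols: Sequence[str]) -> float:
    """Shannon entropy (bits) of the symbol composition."""
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in Counter(symbols).values())


def scc(segments: List[Sequence[str]]) -> float:
    """Entropy of the whole sequence minus the length-weighted segment entropies."""
    whole = [s for seg in segments for s in seg]
    n = len(whole)
    return entropy(whole) - sum(len(seg) / n * entropy(seg) for seg in segments)


def best_split(seq: Sequence[str], min_len: int) -> Tuple[Optional[int], float]:
    """Cut point that maximizes the Jensen-Shannon divergence of the two parts."""
    best_pos, best_d = None, 0.0
    for i in range(min_len, len(seq) - min_len + 1):
        d = scc([seq[:i], seq[i:]])
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(seq: Sequence[str], threshold: float, min_len: int = 2) -> List[Sequence[str]]:
    """Recursive binary segmentation; the fixed threshold is a simplification."""
    pos, d = best_split(seq, min_len)
    if pos is None or d < threshold:
        return [seq]
    return segment(seq[:pos], threshold, min_len) + segment(seq[pos:], threshold, min_len)


def symbolize(prices: Sequence[float]) -> List[str]:
    """Hypothetical coding: 'U' for a rise, 'D' otherwise (a flat step counts as 'D')."""
    return ["U" if b > a else "D" for a, b in zip(prices, prices[1:])]


prices = [10, 11, 12, 13, 14, 13, 12, 11, 10]  # hypothetical index levels
moves = symbolize(prices)                     # four 'U' followed by four 'D'
parts = segment(moves, threshold=0.5)
print(len(parts), scc(parts))  # 2 1.0
```

A series that splits into segments with sharply contrasting compositions yields a high SCC, while a compositionally homogeneous series is never split and has SCC = 0; comparing indices this way presumes symbol series of comparable length.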
Evolution and Diversity of the Human Hepatitis D Virus Genome

PubMed Central

Huang, Chi-Ruei; Lo, Szecheng J.

2010-01-01

Human hepatitis delta virus (HDV) has the smallest genome among RNA viruses. The HDV genome is divided into a viroid-like sequence and a protein-coding sequence, which may have originated from different sources; the HDV genome was eventually constituted through RNA recombination. The genome subsequently diversified through the accumulation of mutations, selected by interactions of the mutated RNA and proteins with host factors, so that infectious virions could successfully form. Therefore, we propose that the conservation of the HDV nucleotide sequence is closely related to its functionality. Genome analysis of known HDV isolates shows that the C-terminal coding sequences of the large delta antigen (LDAg) are more diverse than other protein-coding regions, yet they retain the biological function of interacting with the clathrin heavy chain, which can be selected for and maintained. Since viruses interact with many host factors, including those involved in escaping the host immune response, designing a program to predict RNA genome evolution is a great challenge. PMID:20204073
Genomic Identification and Analysis of Shared Cis-regulator Elements in a Developmentally Critical homeobox Cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chris Amemiya

2003-04-01

The goals of this project were to isolate, characterize, and sequence the Dlx3/Dlx7 bigene cluster from twelve different species of mammals. The Dlx3 and Dlx7 genes are known to encode homeobox transcription factors involved in patterning structures in the vertebrate jaw as well as vertebrate limbs. Genomic sequences from the respective taxa will subsequently be compared in order to identify conserved non-coding sequences that are potential cis-regulatory elements. Based on these comparisons, transgenic mouse experiments will be designed to functionally test the strength of the potential cis-regulatory elements. A goal of the project is to attempt to identify those elements that may coordinately regulate both Dlx3 and Dlx7.
The golden ratio and Loshu-Fibonacci Diagram: novel research view on relationship of Chinese medicine and modern biology.

PubMed

Chen, Zhao-xue; Huang, Yun-kun; Sun, Ying

2014-02-01

By associating the geometric arrangement of the 9 Loshu numbers modulo 5 and investigating the properties of golden rectangles and the characteristics of the Fibonacci sequence modulo 10, as well as the two subsequences of its modular sequence modulo 5, this paper creates the Loshu-Fibonacci Diagram through strict logical deduction. The diagram fully reveals the inherent relationship among the Taiji sign, the Loshu, and the Fibonacci sequence modulo 10, and unites such key ideas of Chinese medicine as holism, symmetry, holographic thought, and the pursuit of yin-yang balance into a whole. Based on further analysis and reasoning, the authors find that, with the golden ratio and the Loshu-Fibonacci Diagram as a link, there is a profound and universal association between research in Chinese medicine and modern biology.
SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.

2002-01-01

Genome-wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits, and a vast number of SNPs have been generated for this purpose. Because of the cost and throughput limitations of genotyping large numbers of SNPs, and the statistical issues raised by the large number of dependent tests on the same data set, it has been proposed that, to make association analysis practical, SNPs should be prioritized by likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs), and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization because of their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs, is the lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse and human genomic sequences to identify putative gene regulatory elements, and subsequently localized SNPs within these sequences in a 1 megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11, containing the interleukin cluster.

Phylogenomics of Phrynosomatid Lizards: Conflicting Signals from Sequence Capture versus Restriction Site Associated DNA Sequencing

PubMed Central

Leaché, Adam D.; Chavez, Andreas S.; Jones, Leonard N.; Grummer, Jared A.; Gottscho, Andrew D.; Linkem, Charles W.

2015-01-01

Sequence capture and restriction site associated DNA sequencing (RADseq) are popular methods for obtaining large numbers of loci for phylogenetic analysis. These methods are typically used to collect data at different evolutionary timescales; sequence capture is primarily used for obtaining conserved loci, whereas RADseq is designed for discovering single nucleotide polymorphisms (SNPs) suitable for population genetic or phylogeographic analyses. Phylogenetic questions that span both “recent” and “deep” timescales could benefit from either type of data, but studies that directly compare the two approaches are lacking. We compared phylogenies estimated from sequence capture and double digest RADseq (ddRADseq) data for North American phrynosomatid lizards, a species-rich and diverse group containing nine genera that began diversifying approximately 55 Ma. Sequence capture resulted in 584 loci that provided a consistent and strong phylogeny using concatenation and species tree inference. However, the phylogeny estimated from the ddRADseq data was sensitive to the bioinformatics steps used for determining homology, detecting paralogs, and filtering missing data. The topological conflicts among the SNP trees were not restricted to any particular timescale, but instead were associated with short internal branches. Species tree analysis of the largest SNP assembly, which also included the most missing data, supported a topology that matched the sequence capture tree. This preferred phylogeny provides strong support for the paraphyly of the earless lizard genera Holbrookia and Cophosaurus, suggesting that the earless morphology either evolved twice or evolved once and was subsequently lost in Callisaurus. PMID:25663487
Next-Generation Sequence Analysis Reveals Transfer of Methicillin Resistance to a Methicillin-Susceptible Staphylococcus aureus Strain That Subsequently Caused a Methicillin-Resistant Staphylococcus aureus Outbreak: a Descriptive Study.

PubMed

Weterings, Veronica; Bosch, Thijs; Witteveen, Sandra; Landman, Fabian; Schouls, Leo; Kluytmans, Jan

2017-09-01

Resistance to methicillin in Staphylococcus aureus is caused primarily by the mecA gene, which is carried on a mobile genetic element, the staphylococcal cassette chromosome mec (SCCmec). Horizontal transfer of this element is thought to be an important factor in the emergence of new clones of methicillin-resistant Staphylococcus aureus (MRSA) but has rarely been observed in real time. In 2012, an outbreak occurred involving a health care worker (HCW) and three patients, all carrying a fusidic acid-resistant MRSA strain. The husband of the HCW was screened for MRSA carriage, but only a methicillin-susceptible S. aureus (MSSA) strain, which was also resistant to fusidic acid, was detected. Multiple-locus variable-number tandem-repeat analysis (MLVA) typing showed that both the MSSA and MRSA isolates were MT4053-MC0005. This finding led to the hypothesis that the MSSA strain acquired the SCCmec and subsequently caused an outbreak. To support this hypothesis, next-generation sequencing of the MSSA and MRSA isolates was performed. This study showed that the MSSA isolate clustered closely with the outbreak isolates based on whole-genome multilocus sequence typing and single-nucleotide polymorphism (SNP) analysis, with genetic distances of 17 genes and 44 SNPs, respectively. Remarkably, there were relatively large differences in the mobile genetic elements in strains within and between individuals. The limited genetic distance between the MSSA and MRSA isolates, combined with a clear epidemiologic link, supports the hypothesis that the MSSA isolate acquired an SCCmec and that the resulting MRSA strain caused an outbreak. Copyright © 2017 American Society for Microbiology.
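The two distances reported above count different things: loci with differing alleles in a whole-genome MLST profile, and differing nucleotide positions between aligned sequences. A minimal Python sketch of both, assuming that loci not called in either isolate and ambiguous (N) or gap positions are skipped, which is a common convention but not one stated in the abstract; the function names, profiles, and sequences are hypothetical:

```python
from typing import Dict, Optional

AMBIGUOUS = set("N-")  # assumption: N and gap characters are not counted


def allele_distance(p: Dict[str, Optional[int]], q: Dict[str, Optional[int]]) -> int:
    """Loci called in both isolates whose allele numbers differ."""
    return sum(
        1
        for locus, allele in p.items()
        if allele is not None and q.get(locus) is not None and q[locus] != allele
    )


def snp_distance(a: str, b: str) -> int:
    """Differing positions in two aligned sequences, ignoring N and gap characters."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        x != y
        for x, y in zip(a.upper(), b.upper())
        if x not in AMBIGUOUS and y not in AMBIGUOUS
    )


mssa = {"locus1": 3, "locus2": 7, "locus3": None, "locus4": 1}
mrsa = {"locus1": 3, "locus2": 8, "locus3": 2, "locus4": 5}
print(allele_distance(mssa, mrsa))             # 2
print(snp_distance("ACGTNACGT", "ACCTAACGA"))  # 2
```

Because one recombination or mobile-element event can change many nucleotides within a single locus, the gene count and the SNP count need not move together, which is one reason both are reported.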
Comment on "Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage".

PubMed

Nakagome, Shigeki; Mano, Shuhei; Hasegawa, Masami

2013-03-29

Based on nuclear and mitochondrial DNA, Hailer et al. (Reports, 20 April 2012, p. 344) suggested early divergence of polar bears from a common ancestor with brown bears and subsequent introgression. Our population genetic analysis that traces each of the genealogies in the independent nuclear loci does not support the evolutionary model proposed by the authors.
Development and integration of block operations for data invariant automation of digital preprocessing and analysis of biological and biomedical Raman spectra.

PubMed

Schulze, H Georg; Turner, Robin F B

2015-06-01

High-throughput information extraction from large numbers of Raman spectra is becoming an increasingly taxing problem due to the proliferation of new applications enabled by advances in instrumentation. Fortunately, in many of these applications, the entire process can be automated, yielding reproducibly good results with significant time and cost savings. Information extraction consists of two stages, preprocessing and analysis. We focus here on the preprocessing stage, which typically involves several steps, such as calibration, background subtraction, baseline flattening, artifact removal, smoothing, and so on, before the resulting spectra can be further analyzed. Because the results of some of these steps can affect the performance of subsequent ones, attention must be given to the sequencing of steps, the compatibility of these sequences, and the propensity of each step to generate spectral distortions. We outline here important considerations to effect full automation of Raman spectral preprocessing: what is considered full automation; putative general principles to effect full automation; the proper sequencing of processing and analysis steps; conflicts and circularities arising from sequencing; and the need for, and approaches to, preprocessing quality control. These considerations are discussed and illustrated with biological and biomedical examples reflecting both successful and faulty preprocessing.
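The point that earlier steps change what later steps see can be made concrete with two of the steps named above, artifact (spike) removal and smoothing. A minimal Python sketch, assuming a median-based filter for positive spikes and a moving-average smoother, both chosen for illustration rather than taken from the paper, applied to a hypothetical flat spectrum with one cosmic-ray-like spike; all function names are hypothetical:

```python
from statistics import median
from typing import Callable, List, Sequence

Step = Callable[[List[float]], List[float]]


def despike(y: List[float], half_window: int = 2, threshold: float = 5.0) -> List[float]:
    """Replace points that rise more than `threshold` above their neighbours' median."""
    out = list(y)
    for i in range(len(y)):
        lo, hi = max(0, i - half_window), min(len(y), i + half_window + 1)
        m = median([y[j] for j in range(lo, hi) if j != i])
        if y[i] - m > threshold:
            out[i] = m
    return out


def smooth(y: List[float], half_window: int = 1) -> List[float]:
    """Centered moving average, truncated at the edges."""
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half_window), min(len(y), i + half_window + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def run_pipeline(y: Sequence[float], steps: List[Step]) -> List[float]:
    """Apply preprocessing steps in the given order."""
    result = list(y)
    for step in steps:
        result = step(result)
    return result


spectrum = [1.0] * 11
spectrum[5] = 100.0  # hypothetical cosmic-ray spike

spike_first = run_pipeline(spectrum, [despike, smooth])
smooth_first = run_pipeline(spectrum, [smooth, despike])
print(max(spike_first), max(smooth_first))  # 1.0 17.5
```

Smoothing first spreads the spike over three neighbouring points, so no single point stands out from its neighbourhood any more and the filter only partly removes it; removing the spike first leaves a clean flat baseline.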
Hand gesture recognition by analysis of codons

NASA Astrophysics Data System (ADS)

Ramachandra, Poornima; Shrikhande, Neelima

2007-09-01

The problem of recognizing gestures from images using computers can be approached by closely understanding how the human brain tackles it. A full-fledged gesture recognition system would completely replace the mouse and keyboard. Humans can recognize most gestures by looking at the characteristic external shape, or silhouette, of the fingers. Many previous techniques for recognizing gestures dealt with motion and geometric features of hands. In this thesis, gestures are recognized by the Codon-list pattern extracted from the object contour. All edges of an image are described in terms of a sequence of Codons. The Codons are defined in terms of the relationships between maxima, minima, and zeros of curvature encountered as one traverses the boundary of the object. We have concentrated on a catalog of 24 gesture images from the American Sign Language alphabet (letters J and Z are excluded because they are represented using motion) [2]. The query image given as input to the system is analyzed and tested against the Codon-lists, which are shape descriptors for external parts of a hand gesture. To match the Codon-lists, we have used the Weighted Frequency Indexing Transform (WFIT) approach, which is used in DNA sequence matching. The matching algorithm consists of two steps: 1) the query sequences are converted to short sequences and assigned weights, and 2) all the sequences of query gestures are pruned into match and mismatch subsequences by the frequency indexing tree, based on the weights of the subsequences. The Codon sequences with the most weight are used to determine the most precise match. Once a match is found, the identified gesture and corresponding interpretation are shown as output.
Molecular characterization of canine parvovirus (CPV) infection in dogs in Turkey.

PubMed

Timurkan, Mehmet; Oğuzoğlu, Tuba

2015-01-01

This study provides data about canine parvovirus (CPV) types circulating among dogs in Turkey. Sixty-five samples from dogs with and without clinical signs of parvovirus infection were collected between April 2009 and February 2010. The samples were subsequently tested for CPV using polymerase chain reaction (PCR). Twenty-five samples (38.4%) were positive; when positive samples were characterized by sequence analysis, results showed that both CPV-2a (17/25, 68%) and CPV-2b (8/25, 32%) strains are circulating among domestic dogs in Turkey. This is the first molecular characterization study of CPVs from dogs based on partial VP2 gene sequences in Turkey.
A Simulation Based Approach for Contingency Planning for Aircraft Turnaround Operation System Activities in Airline Hubs

NASA Technical Reports Server (NTRS)

Adeleye, Sanya; Chung, Christopher

2006-01-01

Commercial aircraft undergo a significant number of maintenance and logistical activities during the turnaround operation at the departure gate. By analyzing the sequencing of these activities, more effective turnaround contingency plans may be developed for logistical and maintenance disruptions. Turnaround contingency plans are particularly important because any delay in a hub-based system may cascade into further delays on subsequent connections. The contingency sequencing of the maintenance and logistical turnaround activities was analyzed using a combined network and computer simulation modeling approach. Experimental analysis of both current and alternative policies provides a framework to aid more effective tactical decision making.
A Sequence-Independent Strategy for Detection and Cloning of Circular DNA Virus Genomes by Using Multiply Primed Rolling-Circle Amplification

PubMed Central

Rector, Annabel; Tachezy, Ruth; Van Ranst, Marc

2004-01-01

The discovery of novel viruses has often been accomplished by using hybridization-based methods that necessitate the availability of a previously characterized virus genome probe or knowledge of the viral nucleotide sequence to construct consensus or degenerate PCR primers. In their natural replication cycle, certain viruses employ a rolling-circle mechanism to propagate their circular genomes, and multiply primed rolling-circle amplification (RCA) with φ29 DNA polymerase has recently been applied in the amplification of circular plasmid vectors used in cloning. We employed an isothermal RCA protocol that uses random hexamer primers to amplify the complete genomes of papillomaviruses without the need for prior knowledge of their DNA sequences. We optimized this RCA technique with extracted human papillomavirus type 16 (HPV-16) DNA from W12 cells, using a real-time quantitative PCR assay to determine amplification efficiency, and obtained a 2.4 × 10⁴-fold increase in HPV-16 DNA concentration. We were able to clone the complete HPV-16 genome from this multiply primed RCA product. The optimized protocol was subsequently applied to a bovine fibropapillomatous wart tissue sample. Whereas no papillomavirus DNA could be detected by restriction enzyme digestion of the original sample, multiply primed RCA enabled us to obtain a sufficient amount of papillomavirus DNA for restriction enzyme analysis, cloning, and subsequent sequencing of a novel variant of bovine papillomavirus type 1. The multiply primed RCA method allows the discovery of previously unknown papillomaviruses, and possibly also other circular DNA viruses, without a priori sequence information. PMID:15113879
Detection of rat hepatitis E virus in wild Norway rats (Rattus norvegicus) and Black rats (Rattus rattus) from 11 European countries.

PubMed

Ryll, René; Bernstein, Samuel; Heuser, Elisa; Schlegel, Mathias; Dremsek, Paul; Zumpe, Maxi; Wolf, Sandro; Pépin, Michel; Bajomi, Daniel; Müller, Gabi; Heiberg, Ann-Charlotte; Spahr, Carina; Lang, Johannes; Groschup, Martin H; Ansorge, Hermann; Freise, Jona; Guenther, Sebastian; Baert, Kristof; Ruiz-Fons, Francisco; Pikula, Jiri; Knap, Nataša; Tsakmakidis, Ioannis; Dovas, Chrysostomos; Zanet, Stefania; Imholt, Christian; Heckel, Gerald; Johne, Reimar; Ulrich, Rainer G

2017-09-01

Rat hepatitis E virus (HEV) is genetically only distantly related to hepeviruses found in other mammalian reservoirs and in humans. It was initially detected in Norway rats (Rattus norvegicus) from Germany, and subsequently in rats from Vietnam, the USA, Indonesia, China, Denmark and France. Here, we report on a molecular survey of Norway rats and Black rats (Rattus rattus) from 12 European countries for ratHEV and human pathogenic hepeviruses. RatHEV-specific real-time and conventional RT-PCR investigations revealed the presence of ratHEV in 63 of 508 (12.4%) rats at the majority of sites in 11 of 12 countries. In contrast, a real-time RT-PCR specific for human pathogenic HEV genotypes 1-4 and a nested broad-spectrum (NBS) RT-PCR with subsequent sequence determination did not detect any infections with these genotypes. A rabbit HEV-like genotype 3 sequence was detected in only a single Norway rat from Belgium. Phylogenetic analysis indicated a clustering of all other novel Norway and Black rat-derived sequences with ratHEV sequences from Europe, the USA and a Black rat-derived sequence from Indonesia within the proposed ratHEV genotype 1. No difference in infection status was detected related to age, sex, rat species or density of human settlements and zoological gardens. In conclusion, our investigation shows a broad geographical distribution of ratHEV in Norway and Black rats from Europe and its presence in all settlement types investigated. Copyright © 2017 Elsevier B.V. All rights reserved.
Draft Sequences of the Radish (Raphanus sativus L.) Genome

PubMed Central

Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

2014-01-01

Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. PMID:24848699
Lessons learned from the initial sequencing of the pig genome: comparative analysis of an 8 Mb region of pig chromosome 17

PubMed Central

Hart, Elizabeth A; Caccamo, Mario; Harrow, Jennifer L; Humphray, Sean J; Gilbert, James GR; Trevanion, Steve; Hubbard, Tim; Rogers, Jane; Rothschild, Max F

2007-01-01

Background We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage. Results Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs. Conclusion We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS. PMID:17705864
Contribution of silent mutations to thermal adaptation of RNA bacteriophage Qβ.

PubMed

Kashiwagi, Akiko; Sugawara, Ryu; Sano Tsushima, Fumie; Kumagai, Tomofumi; Yomo, Tetsuya

2014-10-01

Changes in protein function and other biological properties, such as RNA structure, are crucial for adaptation of organisms to novel or inhibitory environments. To investigate how mutations that do not alter amino acid sequence may be positively selected, we performed a thermal adaptation experiment using the single-stranded RNA bacteriophage Qβ in which the culture temperature was increased from 37.2°C to 41.2°C and finally to an inhibitory temperature of 43.6°C in a stepwise manner in three independent lines. Whole-genome analysis revealed 31 mutations, including 14 mutations that did not result in amino acid sequence alterations, in this thermal adaptation. Eight of the 31 mutations were observed in all three lines. Reconstruction and fitness analyses of Qβ strains containing only mutations observed in all three lines indicated that five mutations that did not result in amino acid sequence changes but increased the amplification ratio appeared in the course of adaptation to growth at 41.2°C. Moreover, these mutations provided a suitable genetic background for subsequent mutations, altering the fitness contribution from deleterious to beneficial. These results clearly showed that mutations that do not alter the amino acid sequence play important roles in adaptation of this single-stranded RNA virus to elevated temperature. Recent studies using whole-genome analysis technology suggested the importance of mutations that do not alter the amino acid sequence for adaptation of organisms to novel environmental conditions. It is necessary to investigate how these mutations may be positively selected and to determine to what degree such mutations that do not alter amino acid sequences contribute to adaptive evolution. Here, we report the roles of these silent mutations in thermal adaptation of RNA bacteriophage Qβ based on experimental evolution during which Qβ showed adaptation to growth at an inhibitory temperature. 
Intriguingly, four synonymous mutations and one mutation in the untranslated region that spread widely in the Qβ population during the adaptation process at moderately high temperature provided a suitable genetic background to alter the fitness contribution of subsequent mutations from deleterious to beneficial at a higher temperature. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Noninvasive Prenatal Testing and Incidental Detection of Occult Maternal Malignancies.

PubMed

Bianchi, Diana W; Chudova, Darya; Sehnert, Amy J; Bhatt, Sucheta; Murray, Kathryn; Prosen, Tracy L; Garber, Judy E; Wilkins-Haug, Louise; Vora, Neeta L; Warsof, Stephen; Goldberg, James; Ziainia, Tina; Halks-Miller, Meredith

2015-07-14

Understanding the relationship between aneuploidy detection on noninvasive prenatal testing (NIPT) and occult maternal malignancies may explain results that are discordant with the fetal karyotype and improve maternal clinical care. To evaluate massively parallel sequencing data for patterns of copy-number variations that might prospectively identify occult maternal malignancies. Case series identified from 125,426 samples submitted between February 15, 2012, and September 30, 2014, from asymptomatic pregnant women who underwent plasma cell-free DNA sequencing for clinical prenatal aneuploidy screening. Analyses were conducted in a clinical laboratory that performs DNA sequencing. Among the clinical samples, abnormal results were detected in 3757 (3%); these were reported to the ordering physician with recommendations for further evaluation. NIPT for fetal aneuploidy screening (chromosomes 13, 18, 21, X, and Y). Detailed genome-wide bioinformatics analysis was performed on available sequencing data from 8 of 10 women with known cancers. Genome-wide copy-number changes in the original NIPT samples and in subsequent serial samples from individual patients when available are reported. Copy-number changes detected in NIPT sequencing data in the known cancer cases were compared with the types of aneuploidies detected in the overall cohort. From a cohort of 125,426 NIPT results, 3757 (3%) were positive for 1 or more aneuploidies involving chromosomes 13, 18, 21, X, or Y. From this set of 3757 samples, 10 cases of maternal cancer were identified. Detailed clinical and sequencing data were obtained in 8. Maternal cancers most frequently occurred with the rare NIPT finding of more than 1 aneuploidy detected (7 known cancers among 39 cases of multiple aneuploidies by NIPT, 18% [95% CI, 7.5%-33.5%]). All 8 cases that underwent further bioinformatics analysis showed unique patterns of nonspecific copy-number gains and losses across multiple chromosomes. 
In 1 case, blood was sampled after completion of treatment for colorectal cancer and the abnormal pattern was no longer evident. In this preliminary study, a small number of cases of occult malignancy were subsequently diagnosed among pregnant women whose noninvasive prenatal testing results showed discordance with the fetal karyotype. The clinical importance of these findings will require further research.
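The interval reported for 7 cancers among 39 multiple-aneuploidy cases (18% [95% CI, 7.5%-33.5%]) matches what an exact binomial (Clopper-Pearson) interval gives. The abstract does not name the method, so that is an assumption. A minimal standard-library sketch inverts the binomial CDF by bisection:

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0 for k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(g, target: float, increasing: bool, tol: float = 1e-12) -> float:
    """Find p in [0, 1] with g(p) == target for a monotone function g."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # For increasing g, a value below target means the root lies above mid;
        # for decreasing g, a value at or above target means the same.
        if (g(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion k/n."""
    # Lower bound solves P(X >= k | p) = alpha/2, which increases with p.
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= k | p) = alpha/2, which decreases with p.
    upper = 1.0 if k == n else _bisect(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


lo, hi = clopper_pearson(7, 39)
print(f"7/39 = {7 / 39:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

Exact intervals are a conservative choice for small counts like these. Normal-approximation intervals would be noticeably narrower and less reliable here.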
Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

PubMed

Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

2016-05-01

The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, in these cases, two single mutations that each yield a functional enzyme can be combined into a functional protein. Most of these quartets share the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
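A quartet consists of a parent, two single mutants and the combined double mutant. Predicting a quartet from a triplet can be sketched as a set operation on variants, each represented as the set of substitutions relative to a reference. This is an illustrative reconstruction, not the LacED implementation. The variant encoding, the rule of skipping two substitutions at the same residue, and the toy data are assumptions.

```python
from itertools import combinations

Variant = frozenset  # hypothetical encoding: set of substitutions vs. a reference, e.g. {"E104K"}


def position(mutation: str) -> int:
    """Residue number of a substitution written as <wt><pos><mut>, e.g. 'G238S' -> 238."""
    return int("".join(ch for ch in mutation if ch.isdigit()))


def predict_quartets(known: set[Variant]) -> set[Variant]:
    """Complete each triplet (base, base+a, base+b) with the missing double mutant base+a+b."""
    predicted = set()
    for base in known:
        # Single-step neighbours of `base` that add exactly one substitution.
        added = [next(iter(v - base)) for v in known if base < v and len(v - base) == 1]
        for a, b in combinations(added, 2):
            if position(a) == position(b):
                continue  # assumption: two substitutions at one residue cannot be combined
            candidate = base | {a, b}
            if candidate not in known:
                predicted.add(candidate)
    return predicted


# Toy data (illustrative only): a reference and three single mutants.
known = {Variant(), Variant({"E104K"}), Variant({"G238S"}), Variant({"G238A"})}
for v in sorted(predict_quartets(known), key=sorted):
    print(sorted(v))
```

Candidates that complete several different triplets would correspond to the abstract's "variants [that] complement multiple quartets". Counting how often each candidate is generated is a straightforward extension.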
Prediction of the translocon-mediated membrane insertion free energies of protein sequences.

PubMed

Park, Yungki; Helms, Volkhard

2008-05-15

Helical membrane proteins (HMPs) play crucial roles in a variety of cellular processes. Unlike water-soluble proteins, HMPs must not only fold but also be inserted into the membrane to be fully functional. This process of membrane insertion is mediated by the translocon complex. Thus, it is of great interest to develop computational methods for predicting the translocon-mediated membrane insertion free energies of protein sequences. We have developed Membrane Insertion (MINS), a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. A benchmark test on 357 known cases gives a correlation coefficient of 0.74 between predicted and observed free energies and a mean unsigned error of 0.41 kcal/mol. These results are significantly better than those obtained by traditional hydropathy analysis. Moreover, the ability of MINS to reasonably predict membrane insertion free energies of protein sequences allows for effective identification of transmembrane (TM) segments. Subsequently, MINS was applied to predict the membrane insertion free energies of 316 TM segments found in known structures. An in-depth analysis of the predicted free energies reveals a number of interesting findings about the biogenesis and structural stability of HMPs. A web server for MINS is available at http://service.bioinformatik.uni-saarland.de/mins
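The two benchmark statistics quoted above are Pearson's correlation coefficient and the mean unsigned (absolute) error. They can be computed as follows. The free-energy values in the usage lines are made up for illustration and are not MINS data.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_unsigned_error(pred: list[float], obs: list[float]) -> float:
    """Average of |predicted - observed|, in the units of the inputs (here kcal/mol)."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)


# Hypothetical predicted vs. observed insertion free energies (kcal/mol).
predicted = [-1.2, 0.3, 1.5, 2.1, -0.4]
observed = [-0.9, 0.1, 1.9, 1.8, -0.2]
r = pearson_r(predicted, observed)
mue = mean_unsigned_error(predicted, observed)
print(f"r = {r:.2f}, MUE = {mue:.2f} kcal/mol")
```

The two metrics are complementary. The correlation coefficient measures linear agreement regardless of any constant offset or scale, whereas the mean unsigned error reports the typical absolute deviation in kcal/mol.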
A Single Early Introduction of HIV-1 Subtype B into Central America Accounts for Most Current Cases

PubMed Central

Murillo, Wendy; Veras, Nazle; Prosperi, Mattia; de Rivera, Ivette Lorenzana; Paz-Bailey, Gabriela; Morales-Miranda, Sonia; Juarez, Sandra I.; Yang, Chunfu; DeVos, Joshua; Marín, José Pablo; Mild, Mattias; Albert, Jan

2013-01-01

Human immunodeficiency virus type 1 (HIV-1) variants show considerable geographical separation across the world, but there is limited information from Central America. We provide the first detailed investigation of the genetic diversity and molecular epidemiology of HIV-1 in six Central American countries. Phylogenetic analysis was performed on 625 HIV-1 pol gene sequences collected between 2002 and 2010 in Honduras, El Salvador, Nicaragua, Costa Rica, Panama, and Belize. Published sequences from neighboring countries (n = 57) and the rest of the world (n = 740) were included as controls. Maximum likelihood methods were used to explore phylogenetic relationships. Bayesian coalescence-based methods were used to time HIV-1 introductions. Nearly all (98.9%) Central American sequences were of subtype B. Phylogenetic analysis revealed that 437 (70%) sequences clustered within five significantly supported monophyletic clades formed essentially by Central American sequences. One clade contained 386 (62%) sequences from all six countries; the other four clades were smaller and more country specific, suggesting discrete subepidemics. The existence of one large well-supported Central American clade provides evidence that a single introduction of HIV-1 subtype B in Central America accounts for most current cases. An introduction during the early phase of the HIV-1 pandemic may explain its epidemiological success. Moreover, the smaller clades suggest a subsequent regional spread related to specific transmission networks within each country. PMID:23616665
CDR3 analysis of TCR Vβ repertoire of CD8⁺ T cells from chickens infected with Eimeria maxima.

PubMed

Ren, Chao; Yin, Guangwen; Qin, Mei; Suo, Jingxia; Lv, Qiyao; Xie, Li; Wang, Yunzhou; Huang, Xiaoxi; Chen, Yuchen; Liu, Xianyong; Suo, Xun

2014-08-01

CD8(+) T cells play a major role in protecting the host against reinfection with Eimeria maxima, the most immunogenic species of eimerian parasites in chickens. To explore the dominant complementarity-determining regions 3 (CDR3) of CD8(+) T cell populations induced by infection with this parasite, this study performed sequence analysis of the CDR3 of CD8(+) T cells from E. maxima-infected chickens. Five days after the third or fourth infection, intraepithelial lymphocytes were isolated from the jejunum of each bird. CD3(+)CD8(+) T cells were sorted and subjected to total RNA isolation and cDNA preparation. The region between Vβ1 and Cβ was PCR-amplified and cloned for subsequent sequencing of the T cell receptor (TCR) CDR3. After the fourth infection, 2 birds shared the same two frequent TCR CDR3 sequences, AKQDWGTGGYSNMI and AGRVLNIQY, while the third bird showed two different frequent TCR CDR3 sequences, AKQGARGHTPLN and AKQDIEVRGPNTPLN. No frequent CDR3 sequence was detected in uninfected birds, although AGRVLNIQY was also found in two uninfected birds. Our results provide preliminary evidence that frequent CDR3 sequences may exist in E. maxima-immunized chickens, encouraging the mining of immunodominant CD8(+) T cells against E. maxima infection. Copyright © 2014 Elsevier Inc. All rights reserved.
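Finding "frequent" CDR3 sequences amounts to tallying clonotypes per bird and keeping those above a frequency cutoff. The abstract does not state the cutoff, so the 20% threshold below is hypothetical. The clone lists are also invented: they reuse the two shared sequences from the abstract and pad them with made-up singletons.

```python
from collections import Counter


def frequent_cdr3(clones: list[str], min_fraction: float = 0.20) -> dict[str, float]:
    """CDR3 amino-acid sequences whose share of sequenced clones is >= min_fraction (hypothetical cutoff)."""
    counts = Counter(clones)
    total = len(clones)
    return {seq: n / total for seq, n in counts.most_common() if n / total >= min_fraction}


# Hypothetical clone lists (one CDR3 per sequenced clone); singletons are made up.
birds = {
    "bird_1": ["AKQDWGTGGYSNMI"] * 4 + ["AGRVLNIQY"] * 3 + ["AKQGLNT", "ASSLGY", "AKQRYE"],
    "bird_2": ["AKQDWGTGGYSNMI"] * 3 + ["AGRVLNIQY"] * 2
    + ["AKQTTE", "AGSSYQ", "AKQEEL", "AGRDYN", "AKQLMP"],
}

# Sequences that are frequent in every bird.
shared = set.intersection(*(set(frequent_cdr3(c)) for c in birds.values()))
print(sorted(shared))
```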
Rapid phylogenetic dissection of prokaryotic community structure in tidal flat using pyrosequencing.

PubMed

Kim, Bong-Soo; Kim, Byung Kwon; Lee, Jae-Hak; Kim, Myungjin; Lim, Young Woon; Chun, Jongsik

2008-08-01

Dissection of prokaryotic community structure is a prerequisite for understanding the ecological roles of these communities. Various methods are available for this purpose, of which amplification and sequencing of 16S rRNA genes has gained popularity. However, conventional methods based on the Sanger sequencing technique require cloning prior to sequencing and are expensive and labor-intensive. We investigated prokaryotic community structure in tidal flat sediments in Korea using pyrosequencing and a subsequent automated bioinformatic pipeline for rapid and accurate taxonomic assignment of each amplicon. The combination of pyrosequencing and bioinformatic analysis showed that bacterial and archaeal communities were more diverse than previously reported in clone library studies. Pyrosequencing analysis revealed 21 bacterial divisions and 37 candidate divisions. Proteobacteria was the most abundant division in the bacterial community, of which Gamma- and Delta-Proteobacteria were the most abundant. Similarly, 4 archaeal divisions were found in tidal flat sediments. Euryarchaeota was the most abundant division in the archaeal sequences, which were further divided into 8 classes and 11 unclassified euryarchaeota groups. The system developed here provides a simple, in-depth and automated way of dissecting a prokaryotic community structure without extensive pretreatment such as cloning.
Defining objective clusters for rabies virus sequences using affinity propagation clustering

PubMed Central

Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

2018-01-01

Rabies is caused by lyssaviruses and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses face technical difficulties in defining verifiable criteria for allocating sequences to clusters in phylogenetic trees, which invites a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. AP is computationally fast and works with any meaningful measure of similarity between data samples. It has therefore previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data, but cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World, Arctic/Arctic-like, Cosmopolitan, and Asian, as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform, transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361
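Affinity propagation, as introduced by Frey and Dueck, works by exchanging "responsibility" and "availability" messages between data points until a set of exemplars emerges. The number of clusters is not fixed in advance. It follows from the similarities and from a "preference" value placed on the diagonal of the similarity matrix. The pure-Python sketch below uses negative squared distance on toy one-dimensional points. For RABV genomes, a sequence similarity such as an alignment score would be substituted. The similarity measure, preference and damping used in the study are not given here, so the values below are assumptions.

```python
def affinity_propagation(S: list[list[float]], damping: float = 0.5, iterations: int = 200):
    """Cluster n items from an n x n similarity matrix S whose diagonal holds the preferences.

    Returns (exemplars, labels), where labels[i] is the index of item i's exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of max(0, r(i',k)) over i' not in {i,k}) for i != k;
        # a(k,k) = sum of max(0, r(i',k)) over i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        raise ValueError("no exemplars emerged; try a higher preference or more iterations")
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels


def similarity_matrix(points: list[float], preference: float) -> list[list[float]]:
    """Negative squared distance off the diagonal, a shared preference on it."""
    return [[preference if i == j else -((p - q) ** 2) for j, q in enumerate(points)]
            for i, p in enumerate(points)]


# Toy data: two well-separated groups; the preference of -50 is an arbitrary choice.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
exemplars, labels = affinity_propagation(similarity_matrix(points, preference=-50.0))
print(exemplars, labels)
```

Raising the preference yields more, smaller clusters. Lowering it yields fewer, larger clusters. That single knob replaces choosing a cut-off height in a tree.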
Intestinal flora of FAP patients containing APC-like sequences.

PubMed

Hainova, K; Adamcikova, Z; Ciernikova, S; Stevurkova, V; Tyciakova, S; Zajac, V

2014-01-01

Colorectal cancer is one of the most common causes of cancer-related mortality. Multiple risk factors are associated with colorectal cancer, including hereditary, environmental and inflammatory syndromes affecting the gastrointestinal tract. Familial adenomatous polyposis (FAP) is characterized by the emergence of hundreds to thousands of colorectal adenomatous polyps, and FAP syndrome is caused by mutations within the adenomatous polyposis coli (APC) tumor suppressor gene. We analyzed 21 rectal bacterial subclones isolated from FAP patient 41-1, with a confirmed 5-bp ACAAA deletion within codons 1060-1063, for the presence of APC-like sequences in the longest exon, exon 15. The studied section, defined by primers 15Efor-15Erev, corresponds to the mutation cluster region (MCR), in which 75% of all APC germline mutations have been detected. More than 90% homology was shown by sequencing and subsequent software comparison. The expression of APC-like sequences was demonstrated by Western blot analysis using monoclonal and polyclonal antibodies against APC protein. To study the missing link between the DNA analysis (PCR, DNA sequencing) and the protein expression experiments (Western blotting), we analyzed bacterial transcripts containing the 15Efor-15Erev sequence of the APC gene by reverse transcription-PCR, which indicated that an APC gene-derived fragment may be produced. We observed 97-100% homology after computer comparison of cDNA PCR products. Our results suggest that the presence of APC-like sequences in intestinal/rectal bacteria represents an enrichment of bacterial genetic information in which horizontal gene transfer between humans and microflora plays an important role.
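"Homology" reported as a percentage usually means percent identity between aligned sequences. A minimal sketch computes it for two sequences that are already aligned. The study does not name its comparison software, so the gap handling (columns with a gap are skipped) and the toy fragments are assumptions.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper()) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (hypothetical, not APC exon 15 data).
print(f"{percent_identity('ACAAAGTTCCA-GT', 'ACAAAGTACCATGT'):.1f}%")
```

Tools differ in the denominator they use, for example aligned columns, alignment length including gaps, or the length of the shorter sequence. Identity values from different programs are therefore not always directly comparable.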

Object-oriented parsing of biological databases with Python.

PubMed

Ramu, C; Gemünd, C; Gibson, T J

2000-07-01

While database activities in the biological area are increasing rapidly, rather little has been done to parse these databases in a simple and object-oriented way. We present here an elegant, simple yet powerful way of parsing biological flat-file databases. We have taken EMBL, SWISSPROT and GENBANK as examples. EMBL and SWISS-PROT do not differ much in their format structure, whereas GENBANK has a very different format structure from EMBL and SWISS-PROT. Extracting the desired fields in an entry (for example, a sub-sequence with an associated feature) for later analysis is a constant need in the biological sequence-analysis community: this is illustrated with tools to make new splice-site databases. The interface to the parser is abstract in the sense that access to all the databases is independent of their different formats, since the parsing instructions are hidden.
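The abstract's central idea is a single abstract interface behind which the format-specific parsing rules are hidden. It can be sketched with Python's `abc` module. This is not the authors' code. The class names, the three extracted fields and the simplified line rules are assumptions, and real EMBL, SWISS-PROT and GenBank records carry many more fields.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Entry:
    """Format-independent view of one record (hypothetical minimal field set)."""
    accession: str
    description: str
    sequence: str


def _residues(line: str) -> str:
    """Keep only sequence letters, dropping position numbers and spacing."""
    return "".join(ch for ch in line if ch.isalpha())


class FlatFileParser(ABC):
    """Callers iterate over Entry objects; format-specific rules live in subclasses."""

    def parse(self, lines: Iterable[str]) -> Iterator[Entry]:
        block: list[str] = []
        for line in lines:
            if line.startswith("//"):  # record terminator in EMBL, SWISS-PROT and GenBank
                if block:
                    yield self.parse_entry(block)
                block = []
            else:
                block.append(line.rstrip("\n"))

    @abstractmethod
    def parse_entry(self, block: list[str]) -> Entry: ...


class EmblParser(FlatFileParser):
    """EMBL / SWISS-PROT style (simplified): two-letter line codes; sequence follows 'SQ'."""

    def parse_entry(self, block: list[str]) -> Entry:
        accession, description, sequence, in_seq = "", [], [], False
        for line in block:
            code, rest = line[:2], line[5:].strip()
            if in_seq:
                sequence.append(_residues(line))
            elif code == "AC" and not accession:
                accession = rest.split(";")[0]
            elif code == "DE":
                description.append(rest)
            elif code == "SQ":
                in_seq = True
        return Entry(accession, " ".join(description), "".join(sequence).upper())


class GenBankParser(FlatFileParser):
    """GenBank style (simplified): keyword in columns 1-12; sequence follows 'ORIGIN'."""

    def parse_entry(self, block: list[str]) -> Entry:
        accession, description, sequence, in_seq, current = "", [], [], False, ""
        for line in block:
            if in_seq:
                sequence.append(_residues(line))
                continue
            key, rest = line[:12].strip(), line[12:].strip()
            current = key or current  # indented continuation lines keep the previous keyword
            if current == "ACCESSION" and rest and not accession:
                accession = rest.split()[0]
            elif current == "DEFINITION":
                description.append(rest)
            elif current == "ORIGIN":
                in_seq = True
        return Entry(accession, " ".join(description), "".join(sequence).upper())


# One interface for every format: pick a parser, then treat all records alike.
PARSERS = {"embl": EmblParser(), "swissprot": EmblParser(), "genbank": GenBankParser()}

record = ["AC   X00001;", "DE   Toy entry", "SQ   Sequence 8 BP;", "     acgtacgt         8", "//"]
for entry in PARSERS["embl"].parse(record):
    print(entry)
```

Adding a new format means adding one subclass. Downstream code that consumes `Entry` objects does not change, which is the format independence the abstract describes.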
Biallelic Mutations in NBAS Cause Recurrent Acute Liver Failure with Onset in Infancy.

PubMed

Haack, Tobias B; Staufner, Christian; Köpke, Marlies G; Straub, Beate K; Kölker, Stefan; Thiel, Christian; Freisinger, Peter; Baric, Ivo; McKiernan, Patrick J; Dikow, Nicola; Harting, Inga; Beisse, Flemming; Burgard, Peter; Kotzaeridou, Urania; Kühr, Joachim; Himbert, Urban; Taylor, Robert W; Distelmaier, Felix; Vockley, Jerry; Ghaloul-Gonzalez, Lina; Zschocke, Johannes; Kremer, Laura S; Graf, Elisabeth; Schwarzmayr, Thomas; Bader, Daniel M; Gagneur, Julien; Wieland, Thomas; Terrile, Caterina; Strom, Tim M; Meitinger, Thomas; Hoffmann, Georg F; Prokisch, Holger

2015-07-02

Acute liver failure (ALF) in infancy and childhood is a life-threatening emergency. Few conditions are known to cause recurrent acute liver failure (RALF), and in about 50% of cases, the underlying molecular cause remains unresolved. Exome sequencing in five unrelated individuals with fever-dependent RALF revealed biallelic mutations in NBAS. Subsequent Sanger sequencing of NBAS in 15 additional unrelated individuals with RALF or ALF identified compound heterozygous mutations in an additional six individuals from five families. Immunoblot analysis of mutant fibroblasts showed reduced protein levels of NBAS and its proposed interaction partner p31, both involved in retrograde transport between endoplasmic reticulum and Golgi. We recommend NBAS analysis in individuals with acute infantile liver failure, especially if triggered by fever. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Unravelling the complexity of microRNA-mediated gene regulation in black pepper (Piper nigrum L.) using high-throughput small RNA profiling.

PubMed

Asha, Srinivasan; Sreekumar, Sweda; Soniya, E V

2016-01-01

Analysis of high-throughput small RNA deep sequencing data, in combination with black pepper transcriptome sequences, revealed microRNA-mediated gene regulation in black pepper (Piper nigrum L.). Black pepper is an important spice crop, and its berries are used worldwide as a natural food additive that contributes a unique flavour to foods. In the present study, to characterize microRNAs from black pepper, we generated a small RNA library from black pepper leaf and sequenced it using Illumina high-throughput sequencing technology. MicroRNAs belonging to a total of 303 conserved miRNA families were identified from the sRNAome data. Subsequent analysis of the recently sequenced black pepper transcriptome confirmed precursor sequences of 50 conserved miRNAs and four potential novel miRNA candidates. Stem-loop qRT-PCR experiments demonstrated differential expression of eight conserved miRNAs in black pepper. Computational analysis of the miRNA targets identified 223 potential black pepper unigene targets that encode diverse transcription factors and enzymes involved in plant development, disease resistance, and metabolic and signalling pathways. RLM-RACE experiments further mapped miRNA-mediated cleavage at five of the mRNA targets. In addition, miRNA isoforms corresponding to 18 miRNA families were also identified from black pepper. This study presents the first large-scale identification of microRNAs from black pepper and provides the foundation for future studies of miRNA-mediated gene regulation of stress responses and diverse metabolic processes in black pepper.
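Plant miRNAs typically pair with their targets with near-perfect complementarity. A very simple way to screen a transcript is therefore to slide the reverse complement of the miRNA along it and report windows with few mismatches. The sketch below is a deliberately simplified stand-in for dedicated target-prediction tools: it ignores G:U wobble pairs and position-dependent penalties. The mismatch cutoff and the sequences are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A<->U, C<->G); T is read as U."""
    return seq.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def find_target_sites(mirna: str, mrna: str, max_mismatches: int = 3) -> list[tuple[int, int]]:
    """Return (0-based start, mismatch count) for mRNA windows nearly complementary to the miRNA."""
    site = reverse_complement_rna(mirna)
    mrna = mrna.upper().replace("T", "U")  # accept cDNA-style input as well
    hits = []
    for start in range(len(mrna) - len(site) + 1):
        window = mrna[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


mirna = "UGACAGAAGAGAGUGAGCAC"  # illustrative 20-nt miRNA sequence
target = "AAAA" + reverse_complement_rna(mirna) + "GGGG"  # toy transcript with one perfect site
print(find_target_sites(mirna, target, max_mismatches=0))
```

Computational hits like these are only candidates. Experimental mapping of the cleavage site, as done with RLM-RACE in the study, is what confirms them.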
Re-examination of population structure and phylogeography of hawksbill turtles in the wider Caribbean using longer mtDNA sequences.

PubMed

Leroux, Robin A; Dutton, Peter H; Abreu-Grobois, F Alberto; Lagueux, Cynthia J; Campbell, Cathi L; Delcroix, Eric; Chevalier, Johan; Horrocks, Julia A; Hillis-Starr, Zandy; Troëng, Sebastian; Harrison, Emma; Stapleton, Seth

2012-01-01

Management of the critically endangered hawksbill turtle in the Wider Caribbean (WC) has been hampered by knowledge gaps regarding stock structure. We carried out a comprehensive stock structure re-assessment of 11 WC hawksbill rookeries using longer mtDNA sequences, larger sample sizes (N = 647), and additional rookeries compared to previous surveys. The additional variation between populations detected by 740 bp sequences allowed us to differentiate populations such as Barbados-Windward and Guadeloupe (F_ST = 0.683, P < 0.05) that appeared genetically indistinguishable based on shorter 380 bp sequences. POWSIM analysis showed that longer sequences improved power to detect population structure and that when N < 30, increasing the variation detected was as effective in increasing power as increasing sample size. Geographic patterns of genetic variation suggest a model of periodic long-distance colonization coupled with region-wide dispersal and subsequent secondary contact within the WC. Mismatch analysis results for individual clades suggest a general population expansion in the WC following a historic bottleneck about 100 000-300 000 years ago. We estimated an effective female population size (N_ef) of 6000-9000 for the WC, similar to the current estimated numbers of breeding females, highlighting the importance of these regional rookeries to maintaining genetic diversity in hawksbills. Our results provide a basis for standardizing future work to 740 bp sequence reads and establish a more complete baseline for determining stock boundaries in this migratory marine species. Finally, our findings illustrate the value of maintaining an archive of specimens for re-analysis as new markers become available.
Whole Exome Analysis of Early Onset Alzheimer’s Disease

DTIC Science & Technology

2013-04-01

FTD), FTD with Parkinsonism, and early-onset Alzheimer Disease (EOAD)-like presentations. Using whole exome capture with subsequent sequencing, we...dementia. The MAPT R406W mutation is associated with EOAD-like symptoms and Parkinsonism without FTD, as well as distinct cognitive courses. KEY...OUTCOMES: Carney RM, Kohli MA, Kunkle BW, Naj AC, Gilbert JR, Züchner S, PERICAK-VANCE MA, Parkinsonism and distinct dementia patterns in a
Analysis of ERTS imagery using special electronic viewing/measuring equipment

NASA Technical Reports Server (NTRS)

Evans, W. E.; Serebreny, S. M.

1973-01-01

An electronic satellite image analysis console (ESIAC) is being employed to process imagery for use by USGS investigators in several different disciplines studying dynamic hydrologic conditions. The ESIAC provides facilities for storing registered image sequences in a magnetic video disc memory for subsequent recall, enhancement, and animated display in monochrome or color. Quantitative measurements of distances, areas, and brightness profiles can be extracted digitally under operator supervision. Initial results are presented for the display and measurement of snowfield extent, glacier development, sediment plumes from estuary discharge, playa inventory, phreatophyte and other vegetative changes.
Homozygous/Compound Heterozygous Triadin Mutations Associated With Autosomal-Recessive Long-QT Syndrome and Pediatric Sudden Cardiac Arrest: Elucidation of the Triadin Knockout Syndrome.

PubMed

Altmann, Helene M; Tester, David J; Will, Melissa L; Middha, Sumit; Evans, Jared M; Eckloff, Bruce W; Ackerman, Michael J

2015-06-09

Long-QT syndrome (LQTS) may result in syncope, seizures, or sudden cardiac arrest. Although 16 LQTS-susceptibility genes have been discovered, 20% to 25% of LQTS remains genetically elusive. We performed whole-exome sequencing child-parent trio analysis followed by recessive and sporadic inheritance modeling and disease-network candidate analysis gene ranking to identify a novel underlying genetic mechanism for LQTS. Subsequent mutational analysis of the candidate gene was performed with polymerase chain reaction, denaturing high-performance liquid chromatography, and DNA sequencing on a cohort of 33 additional unrelated patients with genetically elusive LQTS. After whole-exome sequencing and variant filtration, a homozygous p.D18fs*13 TRDN-encoded triadin frameshift mutation was discovered in a 10-year-old female patient with LQTS with a QTc of 500 milliseconds who experienced recurrent exertion-induced syncope/cardiac arrest beginning at 1 year of age. Subsequent mutational analysis of TRDN revealed either homozygous or compound heterozygous frameshift mutations in 4 of 33 unrelated cases of LQTS (12%). All 5 TRDN-null patients displayed extensive T-wave inversions in precordial leads V1 through V4, with either persistent or transient QT prolongation and severe disease expression of exercise-induced cardiac arrest in early childhood (≤3 years of age), and required aggressive therapy. The overall yield of TRDN mutations was significantly greater in patients ≤10 years of age (5 of 10, 50%) compared with older patients (0 of 24, 0%; P=0.0009). We identified TRDN as a novel underlying genetic basis for recessively inherited LQTS. All TRDN-null patients had strikingly similar phenotypes. Given the recurrent nature of potentially lethal arrhythmias, patients fitting this phenotypic profile should undergo cardiac TRDN genetic testing. © 2015 American Heart Association, Inc.
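The reported P=0.0009 for 5 of 10 younger versus 0 of 24 older patients matches a two-sided Fisher exact test on the corresponding 2x2 table. The 10 + 24 = 34 patients are the index case plus the 33-patient cohort. The abstract does not name the test, so this is an inference. A standard-library sketch:

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # P(top-left cell == x) under the null, with row and column totals fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are not lost to rounding
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# rows: age <= 10 years, older; columns: TRDN mutation found, not found
p = fisher_exact_two_sided(5, 5, 0, 24)
print(f"P = {p:.4f}")
```

Here the observed table, with all 5 mutation carriers in the younger group, is the only one at least as extreme. The P value is therefore simply its probability, C(10,5)/C(34,5).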
Classification of Fowl Adenovirus Serotypes by Use of High-Resolution Melting-Curve Analysis of the Hexon Gene Region

PubMed Central

Steer, Penelope A.; Kirkpatrick, Naomi C.; O'Rourke, Denise; Noormohammadi, Amir H.

2009-01-01

Identification of fowl adenovirus (FAdV) serotypes is of importance in epidemiological studies of disease outbreaks and the adoption of vaccination strategies. In this study, real-time PCR and subsequent high-resolution melting (HRM)-curve analysis of three regions of the hexon gene were developed and assessed for their potential in differentiating 12 FAdV reference serotypes. The results were compared to previously described PCR and restriction enzyme analyses of the hexon gene. Both HRM-curve analysis of a 191-bp region of the hexon gene and restriction enzyme analysis failed to distinguish a number of serotypes used in this study. In addition, PCR of the region spanning nucleotides (nt) 144 to 1040 failed to amplify FAdV-5 in sufficient quantities for further analysis. However, HRM-curve analysis of the region spanning nt 301 to 890 proved a sensitive and specific method of differentiating all 12 serotypes. All melt curves were highly reproducible, and replicates of each serotype were correctly genotyped with a mean confidence value of more than 99% using normalized HRM curves. Sequencing analysis revealed that each profile was related to a unique sequence, with some sequences sharing greater than 94% identity. Melting-curve profiles were found to be related mainly to GC composition and distribution throughout the amplicons, regardless of sequence identity. The results presented in this study show that the closed-tube method of PCR and HRM-curve analysis provides an accurate, rapid, and robust genotyping technique for the identification of FAdV serotypes and can be used as a model for developing genotyping techniques for other pathogens. PMID:19036935
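Melting behaviour was linked mainly to GC composition and its distribution along the amplicon. A sliding-window GC profile is therefore a quick way to compare amplicons before running HRM. The window and step sizes below are arbitrary illustrative choices, and a GC profile is only a rough proxy, not a melting-curve prediction.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_profile(seq: str, window: int = 50, step: int = 25) -> list[tuple[int, float]]:
    """GC fraction in windows of `window` bases, reported by 0-based window start."""
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, max(len(seq) - window, 0) + 1, step)]


# Two hypothetical amplicons with identical overall GC but different distribution.
amplicon_a = "GC" * 25 + "AT" * 25
amplicon_b = "GCAT" * 25
print(gc_content(amplicon_a), gc_content(amplicon_b))
print(gc_profile(amplicon_a))
print(gc_profile(amplicon_b))
```

The two toy amplicons share the same overall GC content, yet their local profiles differ sharply. This mirrors the abstract's observation that melting profiles track GC distribution rather than overall sequence identity.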
Four distinct types of E.C. 1.2.1.30 enzymes can catalyze the reduction of carboxylic acids to aldehydes.

PubMed

Stolterfoht, Holly; Schwendenwein, Daniel; Sensen, Christoph W; Rudroff, Florian; Winkler, Margit

2017-09-10

Increasing demand for chemicals from renewable resources calls for the development of new biotechnological methods for the reduction of oxidized bio-based compounds. Enzymatic carboxylate reduction is highly selective, in terms of both chemo- and product selectivity, but few carboxylate reductase enzymes (CARs) have been identified at the sequence level to date. Thus far, their phylogeny is unexplored and very little is known about their structure-function relationship. CARs minimally contain an adenylation domain, a phosphopantetheinylation domain and a reductase domain. We have recently identified new enzymes of fungal origin, using similarity searches against genomic sequences from organisms in which aldehydes were detected upon incubation with carboxylic acids. Analysis of sequences with known CAR functionality and CAR enzymes recently identified in our laboratory suggests that the three-domain architecture mentioned above is modular. The construction of a distance tree with a subsequent 1000-replicate bootstrap analysis showed that the CAR sequences included in our study fall into four distinct subgroups (one of bacterial origin and three of fungal origin), each with a bootstrap value of 100%. The multiple sequence alignment of all experimentally confirmed CAR protein sequences revealed fingerprint sequences of residues which are likely to be involved in substrate and co-substrate binding and one of the three catalytic substeps, respectively. The fingerprint sequences broaden our understanding of the amino acids that might be essential for the reduction of organic acids to the corresponding aldehydes in CAR proteins. Copyright © 2017 Elsevier B.V. All rights reserved.
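In a bootstrap analysis of a distance tree, each replicate works in three steps:

1. Resample alignment columns with replacement.
2. Recompute the distance matrix.
3. Rebuild the tree.

A clade's bootstrap value is the percentage of replicates in which that clade reappears. The sketch below covers only the column resampling and distance steps. Tree building and clade counting are omitted. The seed, the toy alignment and the choice of uncorrected p-distance are assumptions, not the study's settings.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences (gaps ignored)."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites) if sites else 0.0


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of sequence names."""
    names = sorted(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


# Hypothetical toy protein alignment (not real CAR sequences).
aln = {"bact_1": "MKTLAVG", "fung_1": "MRTLSVG", "fung_2": "MRSLSVA"}
rng = random.Random(1)
replicates = [distance_matrix(bootstrap_alignment(aln, rng)) for _ in range(1000)]
print(distance_matrix(aln))
print(len(replicates), "replicate distance matrices")
```

Each replicate matrix would then be passed to the same tree-building method used for the original data, for example neighbour joining, so that clade frequencies can be counted across the replicates.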
Cerebellar activation during motor sequence learning is associated with subsequent transfer to new sequences.

PubMed

Shimizu, Renee E; Wu, Allan D; Knowlton, Barbara J

2016-12-01

Effective learning results not only in improved performance on a practiced task, but also in the ability to transfer the acquired knowledge to novel, similar tasks. Using a modified serial reaction time (RT) task, the authors examined the ability to transfer to novel sequences after practicing sequences in a repetitive order versus a nonrepeating interleaved order. Interleaved practice resulted in better performance on new sequences than repetitive practice. In a second study, participants practiced interleaved sequences in a functional MRI (fMRI) scanner and received a transfer test of novel sequences. Transfer ability was positively correlated with cerebellar blood oxygen level dependent activity during practice, indicating that greater cerebellar engagement during training resulted in better subsequent transfer performance. Interleaved practice may thus result in a more generalized representation that is robust to interference, and the degree of activation in the cerebellum may be a reflection of the instantiation and engagement of internal models. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

PubMed

Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

2010-01-01

Large-scale sequence comparison is still the most powerful tool in sequence analysis for predicting protein function and reconstructing evolutionary history. Because of the exponential growth in the number of known protein sequences and the resulting quadratic growth of the similarity matrix, computing the Similarity Matrix of Proteins (SIMAP) has become a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL, as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP, and the data access and query functions have been improved. SIMAP helps biologists query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
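The quadratic growth follows from the number of unordered pairs, n(n-1)/2, that an all-against-all comparison must score. A quick illustration uses the 23 million non-redundant sequences mentioned above. It gives a rough upper bound and ignores any shortcuts SIMAP may use to avoid scoring every pair.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs in an all-against-all comparison of n sequences."""
    return n * (n - 1) // 2


for n in (1_000_000, 23_000_000, 46_000_000):
    print(f"{n:>11,} sequences -> {pair_count(n):.3e} pairs")
```

Doubling the number of sequences roughly quadruples the number of pairs. Incremental updates that compare only new sequences against the existing set are therefore attractive.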
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.

PubMed

Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H

2013-12-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy among the family's individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pairwise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. To this end, missing gene expression and protein-protein interaction data were completed experimentally, and a comprehensive prediction of potentially functionally redundant TCP pairs was then performed. Subsequently, redundant functions were confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence, acting redundantly with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

PubMed Central

Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.

2013-01-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy among the family's individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pairwise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. To this end, missing gene expression and protein–protein interaction data were completed experimentally, and a comprehensive prediction of potentially functionally redundant TCP pairs was then performed. Subsequently, redundant functions were confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence, acting redundantly with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704
Splice-site mutations identified in PDE6A responsible for retinitis pigmentosa in consanguineous Pakistani families

PubMed Central

Khan, Shahid Y.; Ali, Shahbaz; Naeem, Muhammad Asif; Khan, Shaheen N.; Husnain, Tayyab; Butt, Nadeem H.; Qazi, Zaheeruddin A.; Akram, Javed; Riazuddin, Sheikh; Ayyagari, Radha; Hejtmancik, J. Fielding

2015-01-01

Purpose This study was conducted to localize and identify causal mutations associated with autosomal recessive retinitis pigmentosa (RP) in consanguineous familial cases of Pakistani origin. Methods Ophthalmic examinations that included funduscopy and electroretinography (ERG) were performed to confirm the affection status. Blood samples were collected from all participating individuals, and genomic DNA was extracted. A genome-wide scan was performed, and two-point logarithm of odds (LOD) scores were calculated. Sanger sequencing was performed to identify the causative variants. Subsequently, we performed whole exome sequencing to rule out the possibility of a second causal variant within the linkage interval. Sequence conservation was assessed by alignment analyses of PDE6A orthologs, and in silico splicing analysis was completed with Human Splicing Finder version 2.4.1. Results A large multigenerational consanguineous family diagnosed with early-onset RP was ascertained. An ophthalmic clinical examination consisting of fundus photography and electroretinography confirmed the diagnosis of RP. A genome-wide scan was performed, and suggestive two-point LOD scores were observed with markers on chromosome 5q. Haplotype analyses identified the region; however, the region did not segregate with the disease phenotype in the family. Subsequently, we performed a second genome-wide scan that excluded the entire genome except the chromosome 5q region harboring PDE6A. Next-generation whole exome sequencing identified a splice acceptor site mutation in intron 16: c.2028-1G>A, which was completely conserved in PDE6A orthologs and was absent from 350 ethnically matched control chromosomes, the 1000 Genomes database, and the NHLBI Exome Sequencing Project. Subsequently, we investigated our entire cohort of RP familial cases and identified a second family who harbored a splice acceptor site mutation in intron 10: c.1408-2A>G. 
In silico analysis suggested that these mutations eliminate the wild-type splice acceptor sites, resulting in either skipping of the respective exon or the creation of a new cryptic splice acceptor site; both possibilities would result in retinal photoreceptor cells that lack wild-type PDE6A protein. Conclusions We report two splice acceptor site variations in PDE6A in consanguineous Pakistani families who manifested cardinal symptoms of RP. Taken together with our previously published work, our data suggest that mutations in PDE6A account for about 2% of the total genetic load of RP in our cohort and possibly in the Pakistani population as well. PMID:26321862
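The abstract reports two-point LOD scores without stating the model. As a textbook illustration only, assuming phase-known, fully informative meioses (an assumption; consanguineous pedigrees are normally analyzed with dedicated linkage software), the LOD score at recombination fraction θ compares the likelihood of linkage with that of free recombination (θ = 0.5): Z(θ) = log10[θ^R (1 − θ)^(N − R) / 0.5^N], where R of N meioses are recombinant. By convention, Z ≥ 3 is taken as significant evidence for linkage. The counts in the usage lines are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for linkage at recombination fraction theta vs. free recombination (0.5).

    Simplification: phase-known, fully informative meioses only.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        if recombinants > 0:
            return float("-inf")  # any recombinant rules out complete linkage
        return meioses * math.log10(2.0)  # 1**N / 0.5**N
    log_linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1.0 - theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical counts: 1 recombinant among 12 informative meioses.
thetas = [i / 100 for i in range(51)]
best_theta = max(thetas, key=lambda t: two_point_lod(1, 12, t))
print(best_theta, round(two_point_lod(1, 12, best_theta), 2))
```

The grid maximum sits near R/N, the maximum-likelihood estimate of θ for this simple model.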
Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.

PubMed

Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V

2018-02-01

Many proteins must recognize specific DNA sequences in order to function. The number of recognition sites and their distribution along the DNA may be biologically important. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence exceptional if its frequency in a genome differs significantly from the frequency predicted by some mathematical model. An exceptional sequence can be either under- or over-represented, depending on whether its observed frequency is lower or higher than the predicted one. Exceptional sequences may be biologically meaningful, for example as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods are used to predict the frequency of a short sequence in a genome from the observed frequencies of certain of its subsequences; the most popular are based on Markov chain models. However, no rigorous comparison of these methods had previously been performed. We compared three methods for predicting short-sequence frequencies: the maximum-order Markov chain model-based method, the method that uses the geometric mean of extended Markovian estimates, and the method that uses the frequencies of all subsequences, including discontiguous ones. We applied them to restriction sites in the complete genomes of 2500 prokaryotic species and showed that the results depend greatly on the method used: lists of the 5% most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which uses all subsequences of the sequence, showed higher precision than the other two methods, both on prokaryotic genomes and on randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detecting exceptional sequences in prokaryotic genomes.
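The maximum-order Markov estimate is commonly written, for a word w of length k, as E(w) = N(w[:-1]) · N(w[1:]) / N(w[1:-1]), where N counts overlapping occurrences in the genome. A minimal Python sketch under that formulation (single strand only; the geometric-mean and all-subsequence methods compared in the study are not implemented, and the function names are illustrative):

```python
from collections import Counter


def count_words(seq: str, k: int) -> Counter:
    """Count overlapping words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected(seq: str, word: str) -> float:
    """Expected count of `word` under the maximal-order Markov model (order len(word) - 2).

    E(w) = N(w[:-1]) * N(w[1:]) / N(w[1:-1])
    """
    k = len(word)
    if k < 3:
        raise ValueError("word must have length >= 3")
    flanks = count_words(seq, k - 1)  # counts of the (k-1)-mer prefix and suffix
    core = count_words(seq, k - 2)    # counts of the shared (k-2)-mer core
    denom = core[word[1:-1]]
    if denom == 0:
        return 0.0
    return flanks[word[:-1]] * flanks[word[1:]] / denom


def representation_ratio(seq: str, word: str) -> float:
    """Observed/expected ratio; values well below 1 suggest under-representation."""
    expected = markov_expected(seq, word)
    observed = count_words(seq, len(word))[word]
    return observed / expected if expected > 0 else float("nan")
```

Ranking restriction sites by this ratio yields the kind of "most under-represented" list the abstract compares across methods.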
A novel method for landslide displacement prediction by integrating advanced computational intelligence algorithms.

PubMed

Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Fu, Xiaolin

2018-05-08

Landslide displacement prediction is considered an essential component in developing early warning systems. Conventional forecasting methods require large amounts of monitoring data, which limits their application. To predict displacement accurately from limited data, a novel method is proposed that integrates three computational intelligence algorithms: the wavelet transform (WT), the artificial bee colony (ABC) algorithm, and the kernel-based extreme learning machine (KELM). First, the total displacement was decomposed into several sub-sequences with different frequencies using the WT. Next, each sub-sequence was predicted separately by a KELM whose parameters were optimized by the ABC. Finally, the predicted total displacement was obtained by summing all the predicted sub-sequences. The Shuping landslide in the Three Gorges Reservoir area in China was taken as a case study. The performance of the new method was compared with that of the WT-ELM, ABC-KELM, ELM, and support vector machine (SVM) methods. The results show that prediction accuracy can be improved by decomposing the total displacement into sub-sequences with various frequencies and predicting them separately. The ABC-KELM algorithm showed the highest prediction capacity, followed by the ELM and SVM. Overall, the proposed method achieved excellent performance in terms of both accuracy and stability.
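The decompose, predict, and sum scheme can be illustrated with a simple additive multiresolution split. This sketch uses a stationary ("à trous") Haar decomposition in pure Python, chosen because its components sum back exactly to the input; the study's actual wavelet, number of levels, and ABC-tuned KELM are not reproduced, and `predict_next` is a hypothetical placeholder for the per-component model.

```python
def atrous_haar(x, levels):
    """Split x into a smooth approximation plus `levels` detail series.

    The split is additive: x[i] == approx[i] + sum(d[i] for d in details).
    """
    if levels < 1 or 2 ** (levels - 1) >= len(x):
        raise ValueError("need levels >= 1 and 2**(levels-1) < len(x)")
    approx = [float(v) for v in x]
    details = []
    for j in range(levels):
        step = 2 ** j
        # Shift right by `step`, repeating the first value at the start
        # (this edge handling is an assumption of the sketch).
        shifted = [approx[0]] * step + approx[:-step]
        smoother = [0.5 * (a + s) for a, s in zip(approx, shifted)]
        details.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    return approx, details


def predict_next(component):
    """Hypothetical stand-in for a per-component KELM: linear extrapolation."""
    return 2 * component[-1] - component[-2]


def predict_total(x, levels=3):
    """Forecast each component separately, then add the forecasts."""
    approx, details = atrous_haar(x, levels)
    return sum(predict_next(c) for c in [approx, *details])
```

Because this placeholder predictor is linear, summing its per-component forecasts gives the same result as applying it to the raw series; the gain reported in the abstract comes from fitting a separate nonlinear model to each frequency band.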
Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp.

PubMed

Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

2015-03-01

The ChrT gene encodes a chromate reductase enzyme that catalyzes the reduction of Cr(VI). This chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and to analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR) based on the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high-efficiency TAIL-PCR, and the full-length ChrT gene was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the chromate reduction ability of the gene. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
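The reported numbers are internally consistent: a 567-bp ORF is 189 codons, and excluding the stop codon leaves 188 amino acids. A simplified forward-strand ORF scan illustrating this kind of analysis is sketched below. It is not the algorithm used by ORF Finder, which also scans the reverse strand and can accept alternative start codons; the function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG...stop ORF on the forward strand (stop codon included).

    Simplified: three forward frames, standard genetic code, ATG starts only.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


def protein_length(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    return orf_bp // 3 - 1


print(protein_length(567))
```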
Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp

PubMed Central

Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

2015-01-01

The ChrT gene encodes a chromate reductase enzyme that catalyzes the reduction of Cr(VI). This chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and to analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR) based on the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high-efficiency TAIL-PCR, and the full-length ChrT gene was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the chromate reduction ability of the gene. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630
De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses.

PubMed

Meena, Seema; Kumar, Sarma R; Venkata Rao, D K; Dwivedi, Varun; Shilpashree, H B; Rastogi, Shubhra; Shasany, Ajit K; Nagegowda, Dinesh A

2016-01-01

Aromatic grasses of the genus Cymbopogon (Poaceae family) represent a unique group of plants that produce monoterpene-rich essential oils of diverse composition, which have great value in the flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As a first step toward understanding essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) using Illumina sequencing. Mining of the transcriptome data and subsequent phylogenetic analysis led to the identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of the identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of the identified protein sequences in aroma formation in Cymbopogon. Simple sequence repeats were also found in the transcriptome, many of them linked to terpene pathway genes, including genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a valuable resource for biotechnological and molecular breeding approaches to modulate essential oil composition.
De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses

PubMed Central

Meena, Seema; Kumar, Sarma R.; Venkata Rao, D. K.; Dwivedi, Varun; Shilpashree, H. B.; Rastogi, Shubhra; Shasany, Ajit K.; Nagegowda, Dinesh A.

2016-01-01

Aromatic grasses of the genus Cymbopogon (Poaceae family) represent a unique group of plants that produce monoterpene-rich essential oils of diverse composition, which have great value in the flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As a first step toward understanding essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) using Illumina sequencing. Mining of the transcriptome data and subsequent phylogenetic analysis led to the identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of the identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of the identified protein sequences in aroma formation in Cymbopogon. Simple sequence repeats were also found in the transcriptome, many of them linked to terpene pathway genes, including genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a valuable resource for biotechnological and molecular breeding approaches to modulate essential oil composition. PMID:27516768

Characterization of Hepatitis C Virus (HCV) Envelope Diversification from Acute to Chronic Infection within a Sexually Transmitted HCV Cluster by Using Single-Molecule, Real-Time Sequencing

PubMed Central

Ho, Cynthia K. Y.; Raghwani, Jayna; Koekkoek, Sylvie; Liang, Richard H.; Van der Meer, Jan T. M.; Van Der Valk, Marc; De Jong, Menno; Pybus, Oliver G.

2016-01-01

ABSTRACT In contrast to other available next-generation sequencing platforms, PacBio single-molecule, real-time (SMRT) sequencing has the advantage of generating long reads, albeit with a relatively higher error rate in unprocessed data. Using this platform, we longitudinally sampled and sequenced the hepatitis C virus (HCV) envelope genome region (1,680 nucleotides [nt]) from individuals belonging to a cluster of sexually transmitted cases. All five subjects were coinfected with HIV-1 and a closely related strain of HCV genotype 4d. In total, 50 samples were analyzed by SMRT sequencing. Using 7 passes of circular consensus sequencing, the error rate was reduced to 0.37%, and the median number of sequences was 612 per sample. Insertions were further reduced by alignment against a sample-specific reference sequence. However, in vitro recombination during PCR amplification could not be excluded. Phylogenetic analysis supported close relationships among HCV sequences from the four male subjects and subsequent transmission from one subject to his female partner. Transmission was characterized by a strong genetic bottleneck. Viral genetic diversity was low during acute infection and increased upon progression to chronicity, but subsequently fluctuated during chronic infection owing to the alternating detection of distinct coexisting lineages. SMRT sequencing combines long reads with sufficient depth for many phylogenetic analyses and can therefore provide insights into within-host HCV evolutionary dynamics without the need for haplotype reconstruction using statistical algorithms. IMPORTANCE Next-generation sequencing has revolutionized the study of genetically variable RNA virus populations, but phylogenetic and evolutionary analyses benefit from sequences longer than those generated by most available platforms, with the intrinsic error rate kept to a minimum. 
Here, we demonstrate for the first time that PacBio SMRT sequencing technology can be used to generate full-length HCV envelope sequences at the single-molecule level, providing a data set with large sequencing depth for the characterization of intrahost viral dynamics. The selection of consensus reads derived from at least 7 full circular consensus sequencing rounds significantly reduced the intrinsic high error rate of this method. We used this method to genetically characterize a unique transmission cluster of sexually transmitted HCV infections, providing insight into the distinct evolutionary pathways in each patient over time and identifying the transmission-associated genetic bottleneck as well as fluctuations in viral genetic diversity over time, accompanied by dynamic shifts in viral subpopulations. PMID:28077634
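Why repeated passes over one circular molecule lower the error rate can be illustrated with a per-position majority vote. This is a deliberately simplified sketch: it assumes the passes are already aligned and of equal length and models substitutions only, whereas real PacBio consensus calling must also handle insertions and deletions. Ties go to the base seen first in the column.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(passes: Sequence[str]) -> str:
    """Per-position majority vote over equal-length, pre-aligned passes."""
    if not passes or len({len(p) for p in passes}) != 1:
        raise ValueError("need one or more passes of equal length")
    consensus = []
    for column in zip(*passes):
        base, _ = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


def error_rate(read: str, reference: str) -> float:
    """Fraction of mismatched positions (substitutions only)."""
    if len(read) != len(reference) or not reference:
        raise ValueError("read and reference must be non-empty and equal in length")
    mismatches = sum(a != b for a, b in zip(read, reference))
    return mismatches / len(reference)
```

With independent errors, a position is miscalled only when most passes err at it, which is why requiring several full passes (7 in the study) reduces the error rate so sharply.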
Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells.

PubMed

Klein, Allon M; Mazutis, Linas; Akartuna, Ilke; Tallapragada, Naren; Veres, Adrian; Li, Victor; Peshkin, Leonid; Weitz, David A; Kirschner, Marc W

2015-05-21

It has long been the dream of biologists to map gene expression at the single-cell level. With such data one might track heterogeneous cell sub-populations, and infer regulatory relationships between genes and pathways. Recently, RNA sequencing has achieved single-cell resolution. What is limiting is an effective way to routinely isolate and process large numbers of individual cells for quantitative in-depth sequencing. We have developed a high-throughput droplet-microfluidic approach for barcoding the RNA from thousands of individual cells for subsequent analysis by next-generation sequencing. The method shows a surprisingly low noise profile and is readily adaptable to other sequencing-based assays. We analyzed mouse embryonic stem cells, revealing in detail the population structure and the heterogeneous onset of differentiation after leukemia inhibitory factor (LIF) withdrawal. The reproducibility of these high-throughput single-cell data allowed us to deconstruct cell populations and infer gene expression relationships. Copyright © 2015 Elsevier Inc. All rights reserved.
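After sequencing, barcoded reads are grouped back into cells. A minimal Python sketch, assuming reads have already been parsed into (cell barcode, UMI, gene) tuples; the unique molecular identifier (UMI) collapsing step, the tuple layout, and the example barcodes are assumptions, since the abstract mentions only cell barcoding:

```python
from collections import defaultdict


def count_matrix(reads):
    """Collapse (cell_barcode, umi, gene) records into molecule counts per cell and gene.

    Reads sharing barcode, UMI and gene are treated as PCR duplicates of one
    captured molecule. Barcode and UMI error correction are omitted.
    """
    molecules = defaultdict(set)  # (cell, gene) -> set of distinct UMIs
    for cell, umi, gene in reads:
        molecules[(cell, gene)].add(umi)
    counts = defaultdict(dict)
    for (cell, gene), umis in molecules.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


# Hypothetical reads: two PCR copies of one molecule plus one distinct molecule.
reads = [
    ("AAAC", "GT1", "Nanog"),
    ("AAAC", "GT1", "Nanog"),
    ("AAAC", "CA2", "Nanog"),
    ("TTGC", "GT1", "Pou5f1"),
]
print(count_matrix(reads))
```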
Alkaptonuria and Pompe disease in one patient: metabolic and molecular analysis.

PubMed

Zouheir Habbal, Mohammad; Bou Assi, Tarek; Mansour, Hicham

2013-04-29

Pompe disease is characterised by deficiency of acid α-glucosidase that results in abnormal glycogen deposition in the muscles. Alkaptonuria is caused by a defect in the enzyme homogentisate 1,2-dioxygenase with subsequent accumulation of homogentisic acid. We report the case of a 6-year-old boy diagnosed with Pompe disease and alkaptonuria. Urine organic acids and α-glucosidase were measured. Homogentisate 1,2-dioxygenase (HGO) and acid alpha-glucosidase (GAA) genes were sequenced by Sanger DNA sequencing. The level of α-glucosidase in white blood cells was markedly decreased (4 nm/mg) while the level of homogentisic acid was markedly increased (15 027 mmol/mol creatinine). GAA sequencing detected two heterozygous GAA mutations (c.670C>T and c.1064T>C) while HGO sequencing revealed three polymorphisms in exons 4, 5 and 6, respectively. To the best of our knowledge, this is the first reported instance of Pompe disease and alkaptonuria occurring in the same individual.
Alkaptonuria and Pompe disease in one patient: metabolic and molecular analysis

PubMed Central

Habbal, Mohammad Zouheir; Bou Assi, Tarek; Mansour, Hicham

2013-01-01

Pompe disease is characterised by deficiency of acid α-glucosidase that results in abnormal glycogen deposition in the muscles. Alkaptonuria is caused by a defect in the enzyme homogentisate 1,2-dioxygenase with subsequent accumulation of homogentisic acid. We report the case of a 6-year-old boy diagnosed with Pompe disease and alkaptonuria. Urine organic acids and α-glucosidase were measured. Homogentisate 1,2-dioxygenase (HGO) and acid alpha-glucosidase (GAA) genes were sequenced by Sanger DNA sequencing. The level of α-glucosidase in white blood cells was markedly decreased (4 nm/mg) while the level of homogentisic acid was markedly increased (15 027 mmol/mol creatinine). GAA sequencing detected two heterozygous GAA mutations (c.670C>T and c.1064T>C) while HGO sequencing revealed three polymorphisms in exons 4, 5 and 6, respectively. To the best of our knowledge, this is the first reported instance of Pompe disease and alkaptonuria occurring in the same individual. PMID:23632174
The DNA sequence of the human X chromosome

PubMed Central

Ross, Mark T.; Grafham, Darren V.; Coffey, Alison J.; Scherer, Steven; McLay, Kirsten; Muzny, Donna; Platzer, Matthias; Howell, Gareth R.; Burrows, Christine; Bird, Christine P.; Frankish, Adam; Lovell, Frances L.; Howe, Kevin L.; Ashurst, Jennifer L.; Fulton, Robert S.; Sudbrak, Ralf; Wen, Gaiping; Jones, Matthew C.; Hurles, Matthew E.; Andrews, T. Daniel; Scott, Carol E.; Searle, Stephen; Ramser, Juliane; Whittaker, Adam; Deadman, Rebecca; Carter, Nigel P.; Hunt, Sarah E.; Chen, Rui; Cree, Andrew; Gunaratne, Preethi; Havlak, Paul; Hodgson, Anne; Metzker, Michael L.; Richards, Stephen; Scott, Graham; Steffen, David; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Ainscough, Rachael; Ambrose, Kerrie D.; Ansari-Lari, M. Ali; Aradhya, Swaroop; Ashwell, Robert I. S.; Babbage, Anne K.; Bagguley, Claire L.; Ballabio, Andrea; Banerjee, Ruby; Barker, Gary E.; Barlow, Karen F.; Barrett, Ian P.; Bates, Karen N.; Beare, David M.; Beasley, Helen; Beasley, Oliver; Beck, Alfred; Bethel, Graeme; Blechschmidt, Karin; Brady, Nicola; Bray-Allen, Sarah; Bridgeman, Anne M.; Brown, Andrew J.; Brown, Mary J.; Bonnin, David; Bruford, Elspeth A.; Buhay, Christian; Burch, Paula; Burford, Deborah; Burgess, Joanne; Burrill, Wayne; Burton, John; Bye, Jackie M.; Carder, Carol; Carrel, Laura; Chako, Joseph; Chapman, Joanne C.; Chavez, Dean; Chen, Ellson; Chen, Guan; Chen, Yuan; Chen, Zhijian; Chinault, Craig; Ciccodicola, Alfredo; Clark, Sue Y.; Clarke, Graham; Clee, Chris M.; Clegg, Sheila; Clerc-Blankenburg, Kerstin; Clifford, Karen; Cobley, Vicky; Cole, Charlotte G.; Conquer, Jen S.; Corby, Nicole; Connor, Richard E.; David, Robert; Davies, Joy; Davis, Clay; Davis, John; Delgado, Oliver; DeShazo, Denise; Dhami, Pawandeep; Ding, Yan; Dinh, Huyen; Dodsworth, Steve; Draper, Heather; Dugan-Rocha, Shannon; Dunham, Andrew; Dunn, Matthew; Durbin, K. 
James; Dutta, Ireena; Eades, Tamsin; Ellwood, Matthew; Emery-Cohen, Alexandra; Errington, Helen; Evans, Kathryn L.; Faulkner, Louisa; Francis, Fiona; Frankland, John; Fraser, Audrey E.; Galgoczy, Petra; Gilbert, James; Gill, Rachel; Glöckner, Gernot; Gregory, Simon G.; Gribble, Susan; Griffiths, Coline; Grocock, Russell; Gu, Yanghong; Gwilliam, Rhian; Hamilton, Cerissa; Hart, Elizabeth A.; Hawes, Alicia; Heath, Paul D.; Heitmann, Katja; Hennig, Steffen; Hernandez, Judith; Hinzmann, Bernd; Ho, Sarah; Hoffs, Michael; Howden, Phillip J.; Huckle, Elizabeth J.; Hume, Jennifer; Hunt, Paul J.; Hunt, Adrienne R.; Isherwood, Judith; Jacob, Leni; Johnson, David; Jones, Sally; de Jong, Pieter J.; Joseph, Shirin S.; Keenan, Stephen; Kelly, Susan; Kershaw, Joanne K.; Khan, Ziad; Kioschis, Petra; Klages, Sven; Knights, Andrew J.; Kosiura, Anna; Kovar-Smith, Christie; Laird, Gavin K.; Langford, Cordelia; Lawlor, Stephanie; Leversha, Margaret; Lewis, Lora; Liu, Wen; Lloyd, Christine; Lloyd, David M.; Loulseged, Hermela; Loveland, Jane E.; Lovell, Jamieson D.; Lozado, Ryan; Lu, Jing; Lyne, Rachael; Ma, Jie; Maheshwari, Manjula; Matthews, Lucy H.; McDowall, Jennifer; McLaren, Stuart; McMurray, Amanda; Meidl, Patrick; Meitinger, Thomas; Milne, Sarah; Miner, George; Mistry, Shailesh L.; Morgan, Margaret; Morris, Sidney; Müller, Ines; Mullikin, James C.; Nguyen, Ngoc; Nordsiek, Gabriele; Nyakatura, Gerald; O’Dell, Christopher N.; Okwuonu, Geoffery; Palmer, Sophie; Pandian, Richard; Parker, David; Parrish, Julia; Pasternak, Shiran; Patel, Dina; Pearce, Alex V.; Pearson, Danita M.; Pelan, Sarah E.; Perez, Lesette; Porter, Keith M.; Ramsey, Yvonne; Reichwald, Kathrin; Rhodes, Susan; Ridler, Kerry A.; Schlessinger, David; Schueler, Mary G.; Sehra, Harminder K.; Shaw-Smith, Charles; Shen, Hua; Sheridan, Elizabeth M.; Shownkeen, Ratna; Skuce, Carl D.; Smith, Michelle L.; Sotheran, Elizabeth C.; Steingruber, Helen E.; Steward, Charles A.; Storey, Roy; Swann, R. 
Mark; Swarbreck, David; Tabor, Paul E.; Taudien, Stefan; Taylor, Tineace; Teague, Brian; Thomas, Karen; Thorpe, Andrea; Timms, Kirsten; Tracey, Alan; Trevanion, Steve; Tromans, Anthony C.; d’Urso, Michele; Verduzco, Daniel; Villasana, Donna; Waldron, Lenee; Wall, Melanie; Wang, Qiaoyan; Warren, James; Warry, Georgina L.; Wei, Xuehong; West, Anthony; Whitehead, Siobhan L.; Whiteley, Mathew N.; Wilkinson, Jane E.; Willey, David L.; Williams, Gabrielle; Williams, Leanne; Williamson, Angela; Williamson, Helen; Wilming, Laurens; Woodmansey, Rebecca L.; Wray, Paul W.; Yen, Jennifer; Zhang, Jingkun; Zhou, Jianling; Zoghbi, Huda; Zorilla, Sara; Buck, David; Reinhardt, Richard; Poustka, Annemarie; Rosenthal, André; Lehrach, Hans; Meindl, Alfons; Minx, Patrick J.; Hillier, LaDeana W.; Willard, Huntington F.; Wilson, Richard K.; Waterston, Robert H.; Rice, Catherine M.; Vaudin, Mark; Coulson, Alan; Nelson, David L.; Weinstock, George; Sulston, John E.; Durbin, Richard; Hubbard, Tim; Gibbs, Richard A.; Beck, Stephan; Rogers, Jane; Bentley, David R.

2009-01-01

The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence. PMID:15772651
Management of familial cancer: sequencing, surveillance and society.

PubMed

Samuel, Nardin; Villani, Anita; Fernandez, Conrad V; Malkin, David

2014-12-01

The clinical management of familial cancer begins with recognition of patterns of cancer occurrence suggestive of genetic susceptibility in a proband or pedigree, to enable subsequent investigation of the underlying DNA mutations. In this regard, next-generation sequencing of DNA continues to transform cancer diagnostics, by enabling screening for cancer-susceptibility genes in the context of known and emerging familial cancer syndromes. Increasingly, not only are candidate cancer genes sequenced, but also entire 'healthy' genomes are mapped in children with cancer and their family members. Although large-scale genomic analysis is considered intrinsic to the success of cancer research and discovery, a number of accompanying ethical and technical issues must be addressed before this approach can be adopted widely in personalized therapy. In this Perspectives article, we describe our views on how the emergence of new sequencing technologies and cancer surveillance strategies is altering the framework for the clinical management of hereditary cancer. Genetic counselling and disclosure issues are discussed, and strategies for approaching ethical dilemmas are proposed.
Loss of DHR sequences at Browns Ferry Unit One - accident-sequence analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cook, D.H.; Grene, S.R.; Harrington, R.M.

1983-05-01

This study describes the predicted response of Unit One at the Browns Ferry Nuclear Plant to a postulated loss of decay heat removal (DHR) capability following scram from full power with the power conversion system unavailable. In accident sequences without DHR capability, the residual heat removal (RHR) system functions of pressure suppression pool cooling and reactor vessel shutdown cooling are unavailable. Consequently, all decay heat energy is stored in the pressure suppression pool with a concomitant increase in pool temperature and primary containment pressure. With the assumption that DHR capability is not regained during the lengthy course of this accident sequence, the containment ultimately fails by overpressurization. Although unlikely, this catastrophic failure might lead to loss of the ability to inject cooling water into the reactor vessel, causing subsequent core uncovery and meltdown. The timing of these events and the effective mitigating actions that might be taken by the operator are discussed in this report.
Computationally assisted screening and design of cell-interactive peptides by a cell-based assay using peptide arrays and a fuzzy neural network algorithm.

PubMed

Kaga, Chiaki; Okochi, Mina; Tomita, Yasuyuki; Kato, Ryuji; Honda, Hiroyuki

2008-03-01

We developed an effective peptide-screening method that combines experiments with computational analysis. The method is based on the concept that screening efficiency can be enhanced, even from limited data, by using a model derived from computational analysis as a guide to screening and combining that model with subsequent repeated experiments. Here we focus on cell-adhesion peptides as a model application of this peptide-screening strategy. Cell-adhesion peptides were screened using a cell-based assay on a peptide array. Starting with the screening data obtained from a limited, random 5-mer library (643 sequences), a rule regarding the structural characteristics of cell-adhesion peptides was extracted by fuzzy neural network (FNN) analysis. According to this rule, peptides with unfavored residues at certain positions, which led to inefficient binding, were eliminated from the random sequences. In the restricted second random library (273 sequences), the yield of cell-adhesion peptides with an adhesion rate more than 1.5-fold that of the basal array support was significantly higher (31%) than in the unrestricted random library (20%). In the restricted third library (50 sequences), the yield of cell-adhesion peptides increased to 84%. We conclude that a repeated cycle of experiments screening limited numbers of peptides can be assisted by the rule-extracting feature of FNN.
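The library-restriction step (dropping random sequences that carry unfavored residues at particular positions) can be illustrated with a small rejection-sampling sketch. The position/residue rule below is a hypothetical placeholder, not the rule the FNN actually extracted, and the FNN itself is not modeled.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL rule: 0-based position -> residues treated as unfavored there.
# These entries are placeholders, not the rule extracted in the study.
UNFAVORED = {
    0: set("DE"),
    2: set("P"),
    4: set("DEG"),
}


def passes_rule(peptide: str, unfavored: dict) -> bool:
    """True if no position carries a residue that the rule marks as unfavored."""
    return all(peptide[pos] not in bad for pos, bad in unfavored.items())


def random_peptide(rng: random.Random, length: int = 5) -> str:
    """One peptide drawn uniformly over the 20 standard residues."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def restricted_library(size: int, unfavored: dict, length: int = 5, seed: int = 0) -> list:
    """Rejection-sample random peptides until `size` of them satisfy the rule."""
    rng = random.Random(seed)
    library = []
    while len(library) < size:
        peptide = random_peptide(rng, length)
        if passes_rule(peptide, unfavored):
            library.append(peptide)
    return library


second_round = restricted_library(273, UNFAVORED)
print(len(second_round), second_round[:3])
```

Rejection sampling keeps the surviving sequences uniformly random among those the rule allows, which matches the idea of a "restricted random library".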
Use of wavelet-packet transforms to develop an engineering model for multifractal characterization of mutation dynamics in pathological and nonpathological gene sequences

NASA Astrophysics Data System (ADS)

Walker, David Lee

1999-12-01

This study uses dynamical analysis to examine quantitatively the information-coding mechanism in DNA sequences. This goes beyond the simple approach of modeling the mechanism by treating DNA sequence walks as fractal Brownian motion (fBm) processes. The 2-D mappings of the DNA sequences used in this research are Iterated Function System (IFS) mappings, also known as the "Chaos Game Representation" (CGR). This technique converts a 1-D sequence into a 2-D representation that preserves subsequence structure and provides a visual representation. The second step of this analysis applies Wavelet Packet Transforms, a recently developed technique from the field of signal processing. A multifractal model is built by using wavelet transforms to estimate the Hurst exponent, H. The Hurst exponent is a non-parametric measurement of the dynamism of a system. This procedure is used to evaluate gene-coding events in the DNA sequence of cystic fibrosis mutations. The H exponent is calculated for various mutation sites in this gene. The results of this study indicate the presence of anti-persistent, random walks and persistent "sub-periods" in the sequence. This indicates that the hypothesis of a multifractal model of DNA information encoding warrants further consideration. This work examines the model's behavior in both pathological (mutated) and non-pathological (healthy) base-pair sequences of the cystic fibrosis gene. These mutations, both natural and synthetic, were introduced by computer manipulation of the original base-pair text files. The results show that disease severity and system "information dynamics" correlate. These results have implications for genetic engineering as well as for mathematical biology. They suggest that there is scope for developing further multifractal models.
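The CGR mapping can be sketched in a few lines: start at the centre of the unit square and, for each base, move halfway toward that base's corner. The corner assignment below is one common convention (papers differ), and skipping ambiguity codes such as N is a simplifying assumption. The wavelet-packet Hurst estimation is not shown.

```python
# One common corner convention; other authors place the bases differently.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_representation(seq: str) -> list:
    """Map a DNA string to CGR points: each point is the midpoint between the
    previous point and the corner of the current base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip N and other ambiguity codes (simplifying choice)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(chaos_game_representation("ACGT"))
```

Because each step halves the distance to a corner, the last k bases determine which 1/2^k by 1/2^k sub-square a point falls in; this is the sense in which the 2-D map preserves subsequence structure.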
Peptide Array X-Linking (PAX): A New Peptide-Protein Identification Approach

PubMed Central

Okada, Hirokazu; Uezu, Akiyoshi; Soderblom, Erik J.; Moseley, M. Arthur; Gertler, Frank B.; Soderling, Scott H.

2012-01-01

Many protein interaction domains bind short peptides based on canonical sequence consensus motifs. Here we report the development of a peptide array-based proteomics tool to identify proteins directly interacting with ligand peptides from cell lysates. Array-formatted bait peptides containing an amino acid-derived cross-linker are photo-induced to crosslink with interacting proteins from lysates of interest. Indirect associations are removed by high stringency washes under denaturing conditions. Covalently trapped proteins are subsequently identified by LC-MS/MS and screened by cluster analysis and domain scanning. We apply this methodology to peptides with different proline-containing consensus sequences and show successful identifications from brain lysates of known and novel proteins containing polyproline motif-binding domains such as EH, EVH1, SH3, and WW domains. These results suggest that the capacity of arrayed peptide ligands to capture and subsequently identify proteins by mass spectrometry is relatively broad and robust. Additionally, the approach is rapid and applicable to cell or tissue fractions from any source, making the approach a flexible tool for initial protein-protein interaction discovery. PMID:22606326
Sequence Composition and Gene Content of the Short Arm of Rye (Secale cereale) Chromosome 1

PubMed Central

Fluch, Silvia; Kopecky, Dieter; Burg, Kornel; Šimková, Hana; Taudien, Stefan; Petzold, Andreas; Kubaláková, Marie; Platzer, Matthias; Berenyi, Maria; Krainer, Siegfried; Doležel, Jaroslav; Lelley, Tamas

2012-01-01

Background: The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale), with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundred bread wheat varieties worldwide. Methodology/Principal Findings: Multiple Displacement Amplification of 1RS DNA, obtained from 1RS chromosomes flow-sorted from a 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp of sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with an estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, with Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites, mostly located in gene-related sequence reads, were identified for possible marker development. The existence of chloroplast insertions in 1RS was verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS against the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS with the barley 1HS chromosome arm revealed high conservation of genes related to rice chromosome 5. Conclusions: The present study revealed the gene content and potential gene functions on this chromosome arm and identified numerous sequence elements, such as SSRs and gene-related sequences, which can be used in future research as well as in wheat and rye breeding. PMID:22328922
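As a sanity check on the reported numbers, fold coverage is total sequenced bases divided by target size, so the abstract's figures imply a 1RS arm of roughly 454 Mb. The sketch also applies the standard Poisson (Lander-Waterman) approximation, under which about 35% of bases would be hit by at least one read at 0.43× coverage. The abstract does not say how its 95% gene-detection figure was derived, so this sketch does not attempt to reproduce it.

```python
import math


def coverage_summary(total_bases: int, fold_coverage: float) -> tuple:
    """Return (implied target size in bp, expected fraction of bases covered).

    The second value uses the Poisson (Lander-Waterman) approximation
    1 - exp(-c); it is an illustration, not the study's own calculation.
    """
    implied_size = total_bases / fold_coverage
    fraction_covered = 1.0 - math.exp(-fold_coverage)
    return implied_size, fraction_covered


size_bp, covered = coverage_summary(195_313_589, 0.43)
print(f"implied 1RS size ~ {size_bp / 1e6:.1f} Mb; bases covered ~ {covered:.1%}")
```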
Modulations of neural activity in auditory streaming caused by spectral and temporal alternation in subsequent stimuli: a magnetoencephalographic study.

PubMed

Chakalov, Ivan; Draganova, Rossitza; Wollbrink, Andreas; Preissl, Hubert; Pantev, Christo

2012-06-20

The aim of the present study was to identify a specific neuronal correlate underlying the pre-attentive auditory stream segregation of subsequent sound patterns alternating in spectral or temporal cues. Fifteen participants with normal hearing were presented with series of two consecutive ABA auditory tone-triplet sequences, the initial triplets being the Adaptation sequence and the subsequent triplets being the Test sequence. In the first experiment, the frequency separation (delta-f) between A and B tones in the sequences was varied among 2, 4, and 10 semitones. In the second experiment, a constant delta-f of 6 semitones was maintained but the inter-stimulus intervals (ISIs) between A and B tones were varied. Auditory evoked magnetic fields (AEFs) were recorded using magnetoencephalography (MEG). Participants watched a muted video of their choice and ignored the auditory stimuli. In a subsequent behavioral study, both MEG experiments were replicated to provide information about the participants' perceptual state. MEG measurements showed a significant increase in the amplitude of the B-tone-related P1 component of the AEFs as delta-f increased. This effect was seen predominantly in the left hemisphere. A significant increase in the amplitude of the N1 component was obtained only for a Test sequence delta-f of 10 semitones with a prior Adaptation sequence of 2 semitones. This effect was more pronounced in the right hemisphere. The additional behavioral data indicated an increased probability of two-stream perception for delta-f = 4 and delta-f = 10 semitones with a preceding Adaptation sequence of 2 semitones. However, neither the neural activity nor the perception of the successive streaming sequences was modulated when the ISIs were alternated. Our MEG experiment demonstrated differences in the behavior of the P1 and N1 components during the automatic segregation of sounds when induced by an initial Adaptation sequence. The P1 component appeared enhanced in all Test conditions and thus demonstrates the preceding context effect, whereas N1 was specifically modulated only by large delta-f Test sequences induced by a preceding small delta-f Adaptation sequence. These results suggest that P1 and N1 components represent at least partially different systems that underlie the neural representation of auditory streaming.
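A semitone separation delta-f corresponds to an equal-tempered frequency ratio of 2^(delta-f/12), so B = A × 2^(delta-f/12). The sketch below builds an "ABA-" event schedule from that relation. The A frequency, tone duration, ISI and the silent slot after each triplet are hypothetical parameters; the abstract does not report them.

```python
def semitones_to_ratio(delta_f: float) -> float:
    """Equal-tempered frequency ratio for a separation of delta_f semitones."""
    return 2 ** (delta_f / 12)


def aba_schedule(f_a: float, delta_f: float, n_triplets: int,
                 tone_ms: float, isi_ms: float) -> list:
    """Return (onset_ms, frequency_hz) events for ABA- triplets.

    Assumes each tone slot lasts tone_ms + isi_ms and that a silent slot of the
    same length follows every triplet (the classic 'ABA-' pattern).
    """
    f_b = f_a * semitones_to_ratio(delta_f)
    events = []
    t = 0.0
    for _ in range(n_triplets):
        for f in (f_a, f_b, f_a):
            events.append((t, f))
            t += tone_ms + isi_ms
        t += tone_ms + isi_ms  # silent slot completing the ABA- pattern
    return events


# Hypothetical values: A = 500 Hz, delta-f = 6 semitones, 100 ms tones, 20 ms ISI.
for onset, freq in aba_schedule(500.0, 6, 2, 100.0, 20.0):
    print(f"{onset:6.1f} ms  {freq:7.2f} Hz")
```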
Cloning and expression of Bartonella henselae sucB gene encoding an immunogenic dihydrolipoamide succinyltransferase homologous protein.

PubMed

Kabeya, Hidenori; Maruyama, Soichi; Hirano, Kouji; Mikami, Takeshi

2003-01-01

Immunoscreening of a ZAP genomic library of Bartonella henselae strain Houston-1 expressed in Escherichia coli resulted in the isolation of a clone containing a 3.5-kb BamHI genomic DNA fragment. This 3.5-kb DNA fragment was found to contain a gene encoding a protein with significant homology to the dihydrolipoamide succinyltransferase of Brucella melitensis (sucB). Subsequent cloning and DNA sequence analysis revealed that the deduced amino acid sequence of the cloned gene showed 66.5% identity to the SucB protein of B. melitensis, and 43.4% and 47.2% identity to those of Coxiella burnetii and E. coli, respectively. The gene was expressed as a His-NusA-tagged fusion protein. Western blot analysis with sera from B. henselae-immunized mice showed the recombinant SucB protein (rSucB) to be an immunoreactive protein of about 115 kDa. Therefore, rSucB may be a candidate antigen for a specific serological diagnosis of B. henselae infection.
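Percent-identity figures such as those above depend on how the denominator is defined. The following sketch assumes a pairwise alignment is already available and divides identical residues by the number of alignment columns that are not gaps in both rows. That is one convention among several; some tools divide by the shorter sequence length instead. The example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over alignment columns, excluding columns where both rows are gaps.

    Conventions differ between tools; this is one reasonable choice.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


print(percent_identity("MKT-A", "MKTLA"))  # hypothetical toy alignment
```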
Genetic Variation and Population Differentiation in a Medical Herb Houttuynia cordata in China Revealed by Inter-Simple Sequence Repeats (ISSRs)

PubMed Central

Wei, Lin; Wu, Xian-Jin

2012-01-01

Houttuynia cordata is an important traditional Chinese herb with unresolved genetics and taxonomy, which leads to potential problems in the conservation and utilization of the resource. Inter-simple sequence repeat (ISSR) markers were used to assess the level and distribution of genetic diversity in 226 individuals from 15 populations of H. cordata in China. ISSR analysis revealed low genetic variation within populations but high genetic differentiation among populations. This genetic structure probably mainly reflects the historical association among populations. Genetic cluster analysis showed that the basal clade is composed of populations from Southwest China, and the other populations have continuous and eastward distributions. The structure of genetic diversity in H. cordata suggested that this species might have survived in Southwest China during the glacial age and subsequently experienced an eastward postglacial expansion. Based on the results of the genetic analysis, it was proposed that as many populations as possible be targeted for conservation. PMID:22942696
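"Low variation within, high differentiation among populations" is usually quantified with a statistic such as Nei's G_ST. The abstract does not name the statistic used, so this is a sketch under stated assumptions. It assumes Nei's G_ST computed from dominant ISSR band scores, with the null-allele frequency estimated as the square root of the band-absence frequency. That estimate presumes Hardy-Weinberg equilibrium, which dominant markers cannot test.

```python
import math


def nei_gst(populations: list) -> float:
    """Nei's G_ST from dominant (0/1) band data.

    populations: list of populations; each population is a list of individuals;
    each individual is a list of 0/1 band scores over the same loci.
    """
    n_loci = len(populations[0][0])
    hs_total = ht_total = 0.0
    for locus in range(n_loci):
        q_values = []
        for pop in populations:
            absent = sum(1 for ind in pop if ind[locus] == 0) / len(pop)
            q_values.append(math.sqrt(absent))  # null-allele frequency under HWE
        # Within-population diversity h = 2pq, averaged over populations.
        hs = sum(2 * q * (1 - q) for q in q_values) / len(q_values)
        q_bar = sum(q_values) / len(q_values)
        ht = 2 * q_bar * (1 - q_bar)  # total diversity from pooled frequency
        hs_total += hs
        ht_total += ht
    return (ht_total - hs_total) / ht_total if ht_total > 0 else 0.0


# Two toy populations fixed for opposite band states at one locus.
print(nei_gst([[[0], [0]], [[1], [1]]]))
```

Summing H_S and H_T over loci before taking the ratio is a common way to combine loci. Per-locus averaging of G_ST is an alternative.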
Frame sequences analysis technique of linear objects movement

NASA Astrophysics Data System (ADS)

Oshchepkova, V. Y.; Berg, I. A.; Shchepkin, D. V.; Kopylova, G. V.

2017-12-01

Obtaining data by noninvasive methods is often necessary in many fields of science and engineering. This is achieved through video recording at various frame rates and in various light spectra. Quantitative analysis of the movement of the objects being studied thus becomes an important component of the research. This work discusses the analysis of the motion of linear objects on a two-dimensional plane. The problem becomes more complex when the frame contains numerous objects whose images may overlap. This study uses a sequence of 30 frames at a resolution of 62 × 62 pixels and a frame rate of 2 Hz. The goal was to determine the average velocity of the objects' motion. This velocity was found as the average velocity of 8-12 objects, with an error of 15%. After processing, the dependences of the average velocity on control parameters were determined. The processing was performed in the GMimPro software environment, with subsequent fitting of the data to the Hill equation.
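GMimPro's internals are not described here. The sketch below assumes each object has already been tracked as one centroid position per frame, in hypothetical pixel units. It computes the mean speed per object and across objects at the stated 2 Hz frame rate. It then fits the Hill equation v = v_max·x^n / (K^n + x^n) by linear regression on the Hill plot, log(v / (v_max − v)) = n·log x − n·log K, assuming v_max is known. A nonlinear least-squares fit (for example `scipy.optimize.curve_fit`) would normally be preferred when v_max is unknown.

```python
import math

FRAME_RATE_HZ = 2.0  # frame rate stated in the abstract


def mean_speed(track: list, frame_rate: float = FRAME_RATE_HZ) -> float:
    """Path length divided by elapsed time for one track of (x, y) centroids."""
    if len(track) < 2:
        raise ValueError("need at least two positions")
    path = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    return path / ((len(track) - 1) / frame_rate)


def average_velocity(tracks: list, frame_rate: float = FRAME_RATE_HZ) -> float:
    """Mean of the per-object speeds (units: position units per second)."""
    speeds = [mean_speed(t, frame_rate) for t in tracks]
    return sum(speeds) / len(speeds)


def hill(x: float, v_max: float, k: float, n: float) -> float:
    """Hill equation."""
    return v_max * x**n / (k**n + x**n)


def fit_hill_plot(xs: list, vs: list, v_max: float) -> tuple:
    """Estimate (K, n) by least squares on the linearized Hill plot, v_max known."""
    pts = [(math.log(x), math.log(v / (v_max - v)))
           for x, v in zip(xs, vs) if x > 0 and 0 < v < v_max]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    n = sum((a - mx) * (b - my) for a, b in pts) / sum((a - mx) ** 2 for a, _ in pts)
    intercept = my - n * mx  # equals -n * log(K)
    return math.exp(-intercept / n), n


print(average_velocity([[(0, 0), (1, 0), (2, 0)], [(0, 0), (0, 3)]]))
```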
Lactobacillus apodemi sp. nov., a tannase-producing species isolated from wild mouse faeces.

PubMed

Osawa, Ro; Fujisawa, Tomohiko; Pukall, Rüdiger

2006-07-01

A Gram-positive, rod-shaped, non-endospore-forming bacterium, strain ASB1(T), able to degrade tannin, was isolated from faeces of the Japanese large wood mouse, Apodemus speciosus. Comparative analysis of the 16S rRNA gene sequence revealed that the strain could be assigned to the genus Lactobacillus. The nearest phylogenetic neighbours were identified as Lactobacillus animalis DSM 20602(T) (98.9 % 16S rRNA gene sequence similarity) and Lactobacillus murinus ASF 361 (98.9 %). Subsequent polyphasic analysis, including automated ribotyping and DNA-DNA hybridization experiments, confirmed that the isolate represents a novel species, for which the name Lactobacillus apodemi sp. nov. is proposed. The DNA G+C content of the novel strain is 38.5 mol%. The cell-wall peptidoglycan is of type A4α L-Lys-D-Asp. The type strain is ASB1(T) (=DSM 16634(T)=CIP 108913(T)).
Chiasmatic and achiasmatic inverted meiosis of plants with holocentric chromosomes

PubMed Central

Cabral, Gabriela; Marques, André; Schubert, Veit; Pedrosa-Harand, Andrea; Schlögelhofer, Peter

2014-01-01

Meiosis is a specialized cell division that precedes gamete formation in sexually reproducing organisms. Following DNA replication, the canonical sequence in species with monocentric chromosomes is characterized by reductional segregation of homologous chromosomes during the first meiotic division and equational segregation of sister chromatids during the second. Species with holocentric chromosomes employ specific adaptations to ensure regular disjunction during meiosis. Here we present the analysis of two closely related plant species with holocentric chromosomes that display an inversion of the canonical meiotic sequence, with the equational division preceding the reductional one. In-depth analysis of the meiotic divisions of Rhynchospora pubera and R. tenuis reveals that during meiosis I sister chromatids are bi-oriented, display amphitelic attachment to the spindle, and are subsequently separated. During prophase II, chromatids are connected by thin chromatin threads that appear instrumental for the regular disjunction of homologous non-sister chromatids in meiosis II. PMID:25295686
PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context

PubMed Central

Zhou, Jiyun; Xu, Ruifeng; He, Yulan; Lu, Qin; Wang, Hongpeng; Kong, Bing

2016-01-01

Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches use only the sequence context of the target residue for prediction. In the present study, for each target residue, we used both the spatial context and the sequence context to construct the feature space. Latent Semantic Analysis (LSA) was then applied to remove redundancies in the feature space. Finally, a predictor (PDNAsite) was developed by integrating a support vector machine (SVM) classifier with ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining the two yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between neighboring binding sites are important for protein-DNA recognition and binding ability. A comparison between PDNAsite and existing methods indicates that PDNAsite outperforms most of them and is a useful tool for DNA-binding site identification. A web server for the predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely accessible to the biological research community. PMID:27282833
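The distinction between sequence context and spatial context can be made concrete with a small feature-extraction sketch. Everything here is an illustrative assumption rather than PDNAsite's actual feature set. The sequence context is a one-hot window of ±3 residues, and the spatial context is the residue-type composition of all residues whose representative atom lies within a hypothetical 8 Å of the target. The subsequent LSA reduction and SVM ensemble are not shown.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_SLOTS = len(AMINO_ACIDS) + 1  # extra slot for padding / non-standard residues


def one_hot(residue: str) -> list:
    """21-dim indicator vector; unknown residues and padding map to the last slot."""
    vec = [0] * N_SLOTS
    idx = AMINO_ACIDS.find(residue)
    vec[idx if idx >= 0 else N_SLOTS - 1] = 1
    return vec


def sequence_context(seq: str, i: int, half_window: int = 3) -> list:
    """Concatenated one-hot vectors for residues i-half_window .. i+half_window."""
    feats = []
    for j in range(i - half_window, i + half_window + 1):
        feats.extend(one_hot(seq[j] if 0 <= j < len(seq) else "-"))
    return feats


def spatial_neighbors(coords: list, i: int, radius: float = 8.0) -> list:
    """Indices of residues whose representative atom lies within `radius` of residue i."""
    return [j for j, c in enumerate(coords) if j != i and math.dist(coords[i], c) <= radius]


def spatial_context(seq: str, coords: list, i: int, radius: float = 8.0) -> list:
    """Residue-type counts among the spatial neighbors of residue i."""
    counts = [0] * N_SLOTS
    for j in spatial_neighbors(coords, i, radius):
        counts = [a + b for a, b in zip(counts, one_hot(seq[j]))]
    return counts


def residue_features(seq: str, coords: list, i: int) -> list:
    """Sequence-context block followed by spatial-context block for residue i."""
    return sequence_context(seq, i) + spatial_context(seq, coords, i)


coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
print(len(residue_features("ACDE", coords, 1)))
```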
Modeling participation duration, with application to the North American Breeding Bird Survey

USGS Publications Warehouse

Link, William; Sauer, John

2014-01-01

We consider “participation histories,” binary sequences consisting of alternating finite sequences of 1s and 0s, ending with an infinite sequence of 0s. Our work is motivated by a study of observer tenure in the North American Breeding Bird Survey (BBS). In our analysis, $j$ indexes an observer’s years of service and $X_j$ is an indicator of participation in the survey; 0s interspersed among 1s correspond to years when observers did not participate but subsequently returned to service. Of interest is the observer’s duration $D = \max\{j : X_j = 1\}$. Because observed records $X = (X_1, X_2, \ldots, X_n)$ are of finite length, all that we can directly infer about duration is that $D \geq \max\{j \leq n : X_j = 1\}$; model-based analysis is required for inference about $D$. We propose models in which the lengths of 0s and 1s sequences have distributions determined by the index $j$ at which they begin; 0s sequences are infinite with positive probability, an estimable parameter. We found that BBS observers’ lengths of service vary greatly, with 25.3% participating for only a single year, 49.5% serving for 4 or fewer years, and an average duration of 8.7 years, producing an average of 7.7 counts.
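The observable quantities in this setup can be extracted from a finite record with two small helpers (names are illustrative). The first returns the lower bound max{j ≤ n : X_j = 1} on D. The second splits the record into runs tagged with the index j at which each run begins, which is the quantity the proposed run-length distributions depend on. The final run of 0s in an observed record is right-censored: the data alone cannot tell whether it is finite or infinite.

```python
def observed_duration_bound(record: list) -> int:
    """Largest 1-based j with X_j = 1, i.e. the lower bound on D.

    Returns 0 if the observer never participated within the observed window.
    """
    last = 0
    for j, x in enumerate(record, start=1):
        if x == 1:
            last = j
    return last


def runs(record: list) -> list:
    """Split a 0/1 record into (value, start_j, length) runs with 1-based starts."""
    out = []
    start = 1
    for j in range(1, len(record) + 1):
        if j == len(record) or record[j] != record[j - 1]:
            out.append((record[j - 1], start, j - start + 1))
            start = j + 1
    return out


history = [1, 1, 0, 1, 0, 0]
print(observed_duration_bound(history), sum(history), runs(history))
```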

Predicting Flavonoid UGT Regioselectivity

PubMed Central

Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip

2011-01-01

Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849
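The idea of turning subsequences into "index sequences" and classifying them with a time series distance can be sketched as follows. The abstract names neither the distance nor the indices in this form, so two assumptions apply. Dynamic time warping (DTW) is used as one common time series distance, and the Kyte-Doolittle hydropathy scale stands in for the paper's novel graph-based indices. The training subsequences and class labels below are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in residue index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_index_sequence(peptide: str, scale: dict = KYTE_DOOLITTLE) -> list:
    """Map each residue to its numeric index value."""
    return [scale[aa] for aa in peptide]


def dtw_distance(a: list, b: list) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def nearest_neighbor_label(query: str, labeled: list) -> str:
    """1-NN classification; labeled is a list of (subsequence, label) pairs."""
    q = to_index_sequence(query)
    return min(labeled, key=lambda item: dtw_distance(q, to_index_sequence(item[0])))[1]


# Hypothetical training subsequences and regioselectivity labels.
training = [("ILVF", "3-O"), ("DEKR", "7-O")]
print(nearest_neighbor_label("IVLF", training))
```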
A DNA 'barcode blitz': rapid digitization and sequencing of a natural history collection.

PubMed

Hebert, Paul D N; Dewaard, Jeremy R; Zakharov, Evgeny V; Prosser, Sean W J; Sones, Jayme E; McKeown, Jaclyn T A; Mantle, Beth; La Salle, John

2013-01-01

DNA barcoding protocols require the linkage of each sequence record to a voucher specimen that has, whenever possible, been authoritatively identified. Natural history collections would seem an ideal resource for barcode library construction, but they have never seen large-scale analysis because of concerns linked to DNA degradation. The present study examines the strength of this barrier, carrying out a comprehensive analysis of moth and butterfly (Lepidoptera) species in the Australian National Insect Collection. Protocols were developed that enabled tissue samples, specimen data, and images to be assembled rapidly. Using these methods, a five-person team processed 41,650 specimens representing 12,699 species in 14 weeks. Subsequent molecular analysis took about six months, reflecting the need for multiple rounds of PCR as sequence recovery was impacted by age, body size, and collection protocols. Despite these variables and the fact that specimens averaged 30.4 years old, barcode records were obtained from 86% of the species. In fact, one or more barcode-compliant sequences (>487 bp) were recovered from virtually all species represented by five or more individuals, even when the youngest was 50 years old. By assembling specimen images, distributional data, and DNA barcode sequences on a web-accessible informatics platform, this study has greatly advanced accessibility to information on thousands of species. Moreover, much of the specimen data became publicly accessible within days of its acquisition, while most sequence results saw release within three months. As such, this study reveals the speed with which DNA barcode workflows can mobilize biodiversity data, often providing the first web-accessible information for a species. These results further suggest that existing collections can enable the rapid development of a comprehensive DNA barcode library for the most diverse compartment of terrestrial biodiversity: insects.
Open-Source Sequence Clustering Methods Improve the State Of the Art.

PubMed

Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

2016-01-01

Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
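As a rough illustration of what "clustering reads into OTUs" means, here is a greedy, abundance-sorted, centroid-based clustering at 97% identity. It is similar in spirit to, but not an implementation of, UCLUST, SUMACLUST or any other tool benchmarked above. The identity measure is an assumption: it uses a plain edit distance normalized by the longer sequence, whereas real tools use alignment scores plus speed heuristics.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a rolling single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Fraction identity derived from edit distance (a simplifying assumption)."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def greedy_otus(reads: dict, threshold: float = 0.97) -> dict:
    """reads: sequence -> abundance. Returns centroid -> list of member sequences.

    Sequences are visited from most to least abundant; each joins the first
    existing centroid within the threshold or else founds a new OTU.
    """
    otus = {}
    for seq in sorted(reads, key=lambda s: (-reads[s], s)):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus


base = "ACGT" * 25
variant = "C" + base[1:]  # one substitution: 99% identity to base
print({c[:8]: len(m) for c, m in greedy_otus({base: 10, variant: 2, "T" * 100: 5}).items()})
```

Visiting abundant sequences first makes likely-true sequences the centroids, which is one reason greedy methods of this kind are order-sensitive.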
Locating and Activating Molecular ‘Time Bombs’: Induction of Mycolata Prophages

PubMed Central

Dyson, Zoe A.; Brown, Teagan L.; Farrar, Ben; Doyle, Stephen R.; Tucci, Joseph; Seviour, Robert J.; Petrovski, Steve

2016-01-01

Little is known about the prevalence, functionality and ecological roles of temperate phages for members of the mycolic acid-producing bacteria, the Mycolata. While many lytic phages infective for these organisms have been isolated and assessed for their suitability as biological control agents of activated sludge foaming, no studies have investigated how temperate phages might be induced for this purpose. Bioinformatic analysis using the PHAge Search Tool (PHAST) on Mycolata whole genome sequence data in GenBank for members of the genera Gordonia, Mycobacterium, Nocardia, Rhodococcus, and Tsukamurella revealed that 83% contained putative prophage DNA sequences. Subsequent prophage inductions using mitomycin C were conducted on 17 Mycolata strains. This led to the isolation and genome characterization of three novel Caudovirales temperate phages, namely GAL1, GMA1, and TPA4, induced from Gordonia alkanivorans, Gordonia malaquae, and Tsukamurella paurometabola, respectively. All possessed highly distinctive dsDNA genome sequences. PMID:27487243
GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

PubMed

Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

2016-01-01

Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate such analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. We therefore mapped the time-consuming steps of GHOSTZ, a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented the result as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation using metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.
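GHOSTZ, as summarized in its own entry in this listing, clusters database subsequences and uses the triangle inequality to discard alignment candidates. The pruning idea can be sketched on equal-length k-mers with Hamming distance, a true metric chosen here for simplicity. GHOSTZ's own similarity measure and the GPU offloading are not reproduced. If every member lies within `radius` of its representative, then d(q, m) ≥ d(q, rep) − radius, so a whole cluster can be skipped once d(q, rep) > max_dist + radius.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_clusters(kmers: list, radius: int) -> list:
    """Greedy clustering: each k-mer joins the first representative within radius."""
    clusters = []  # list of (representative, members)
    for kmer in kmers:
        for rep, members in clusters:
            if hamming(kmer, rep) <= radius:
                members.append(kmer)
                break
        else:
            clusters.append((kmer, [kmer]))
    return clusters


def search(query: str, clusters: list, radius: int, max_dist: int) -> list:
    """All members within max_dist of query, skipping clusters ruled out
    by the triangle inequality."""
    hits = []
    for rep, members in clusters:
        if hamming(query, rep) > max_dist + radius:
            continue  # every member is provably farther than max_dist
        hits.extend(m for m in members if hamming(query, m) <= max_dist)
    return hits


kmers = [a + b + c + d for a in "ACGT" for b in "ACGT" for c in "ACGT" for d in "ACGT"]
clusters = build_clusters(kmers, radius=1)
print(len(clusters), sorted(search("ACGT", clusters, radius=1, max_dist=1)))
```

The pruning is exact: it returns the same hits as a brute-force scan and only reduces the number of distance computations on members.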
A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML).

PubMed

Kottmann, Renzo; Gray, Tanya; Murphy, Sean; Kagan, Leonid; Kravitz, Saul; Lombardot, Thierry; Field, Dawn; Glöckner, Frank Oliver

2008-06-01

The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the "Minimum Information about a Genome Sequence" (MIGS) specification and its extension, the "Minimum Information about a Metagenome Sequence" (MIMS). GCDML is an XML Schema for generating MIGS/MIMS compliant reports for data entry, exchange, and storage. When mature, this sample-centric, strongly-typed schema will provide a diverse set of descriptors for describing the exact origin and processing of a biological sample, from sampling to sequencing, and subsequent analysis. Here we describe the need for such a project, outline design principles required to support the project, and make an open call for participation in defining the future content of GCDML. GCDML is freely available, and can be downloaded, along with documentation, from the GSC Web site (http://gensc.org).
Exome Sequence Reveals Mutations in CoA Synthase as a Cause of Neurodegeneration with Brain Iron Accumulation

PubMed Central

Dusi, Sabrina; Valletta, Lorella; Haack, Tobias B.; Tsuchiya, Yugo; Venco, Paola; Pasqualato, Sebastiano; Goffrini, Paola; Tigano, Marco; Demchenko, Nikita; Wieland, Thomas; Schwarzmayr, Thomas; Strom, Tim M.; Invernizzi, Federica; Garavaglia, Barbara; Gregory, Allison; Sanford, Lynn; Hamada, Jeffrey; Bettencourt, Conceição; Houlden, Henry; Chiapparini, Luisa; Zorzi, Giovanna; Kurian, Manju A.; Nardocci, Nardo; Prokisch, Holger; Hayflick, Susan; Gout, Ivan; Tiranti, Valeria

2014-01-01

Neurodegeneration with brain iron accumulation (NBIA) comprises a clinically and genetically heterogeneous group of disorders with progressive extrapyramidal signs and neurological deterioration, characterized by iron accumulation in the basal ganglia. Exome sequencing revealed the presence of recessive missense mutations in COASY, which encodes coenzyme A (CoA) synthase, in one NBIA-affected subject. A second unrelated individual carrying mutations in COASY was identified by Sanger sequence analysis. CoA synthase is a bifunctional enzyme catalyzing the final steps of CoA biosynthesis: it couples phosphopantetheine with ATP to form dephospho-CoA and then phosphorylates dephospho-CoA to generate CoA. We demonstrate alterations in RNA and protein expression levels of CoA synthase, as well as CoA amount, in fibroblasts derived from the two clinical cases and in yeast. This is the second inborn error of coenzyme A biosynthesis to be implicated in NBIA. PMID:24360804
Morphological identification and COI barcodes of adult flies help determine species identities of chironomid larvae (Diptera, Chironomidae).

PubMed

Failla, A J; Vasquez, A A; Hudson, P; Fujimoto, M; Ram, J L

2016-02-01

Establishing reliable methods for the identification of benthic chironomid communities is important due to their significant contribution to biomass, ecology and the aquatic food web. Immature larval specimens are more difficult to identify to species level by traditional morphological methods than their fully developed adult counterparts, and few keys are available to identify the larval species. In order to develop molecular criteria to identify species of chironomid larvae, larval and adult chironomids from Western Lake Erie were subjected to both molecular and morphological taxonomic analysis. Mitochondrial cytochrome c oxidase I (COI) barcode sequences of 33 adults that were identified to species level by morphological methods were grouped with COI sequences of 189 larvae in a neighbor-joining taxon-ID tree. Most of these larvae could be identified only to genus level by morphological taxonomy (only 22 of the 189 sequenced larvae could be identified to species level). The taxon-ID tree of larval sequences had 45 operational taxonomic units (OTUs, defined as clusters with >97% identity or individual sequences differing from nearest neighbors by >3%; supported by analysis of all larval pairwise differences), of which seven could be identified to species or 'species group' level by larval morphology. Reference sequences from the GenBank and BOLD databases assigned six larval OTUs with presumptive species level identifications and confirmed one previously assigned species level identification. Sequences from morphologically identified adults in the present study grouped with and further classified the identity of 13 larval OTUs. The use of morphological identification and subsequent DNA barcoding of adult chironomids proved to be beneficial in revealing possible species level identifications of larval specimens. 
Sequence data from this study also contribute to currently inadequate public databases relevant to the Great Lakes region, while the neighbor-joining analysis reported here describes the application and confirmation of a useful tool that can accelerate identification and bioassessment of chironomid communities.
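The OTU rule quoted above groups sequences that share more than 97% identity. Sequences that differ from their nearest neighbour by more than 3% are kept as singletons. The sketch below applies that threshold with single-linkage clustering on uncorrected p-distances. This is a swapped-in technique, not the neighbor-joining taxon-ID tree the authors used. It assumes the sequences are already aligned to equal length, and the sample names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def cluster_otus(seqs: dict[str, str], max_dist: float = 0.03) -> list[set[str]]:
    """Single-linkage clustering: link two sequences if they differ by less than max_dist
    (i.e. share more than 97% identity at the default threshold)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < max_dist:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

For example, two larval sequences that differ at 1 of 100 sites fall into one OTU. A sequence that differs from both at most sites stays on its own.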
Morphological identification and COI barcodes of adult flies help determine species identities of chironomid larvae (Diptera, Chironomidae)

USGS Publications Warehouse

Failla, Andrew Joseph; Vasquez, Adrian Amelio; Hudson, Patrick L.; Fujimoto, Masanori; Ram, Jeffrey L.

2016-01-01

Establishing reliable methods for the identification of benthic chironomid communities is important due to their significant contribution to biomass, ecology and the aquatic food web. Immature larval specimens are more difficult to identify to species level by traditional morphological methods than their fully developed adult counterparts, and few keys are available to identify the larval species. In order to develop molecular criteria to identify species of chironomid larvae, larval and adult chironomids from Western Lake Erie were subjected to both molecular and morphological taxonomic analysis. Mitochondrial cytochrome c oxidase I (COI) barcode sequences of 33 adults that were identified to species level by morphological methods were grouped with COI sequences of 189 larvae in a neighbor-joining taxon-ID tree. Most of these larvae could be identified only to genus level by morphological taxonomy (only 22 of the 189 sequenced larvae could be identified to species level). The taxon-ID tree of larval sequences had 45 operational taxonomic units (OTUs, defined as clusters with >97% identity or individual sequences differing from nearest neighbors by >3%; supported by analysis of all larval pairwise differences), of which seven could be identified to species or ‘species group’ level by larval morphology. Reference sequences from the GenBank and BOLD databases assigned six larval OTUs with presumptive species level identifications and confirmed one previously assigned species level identification. Sequences from morphologically identified adults in the present study grouped with and further classified the identity of 13 larval OTUs. The use of morphological identification and subsequent DNA barcoding of adult chironomids proved to be beneficial in revealing possible species level identifications of larval specimens. 
Sequence data from this study also contribute to currently inadequate public databases relevant to the Great Lakes region, while the neighbor-joining analysis reported here describes the application and confirmation of a useful tool that can accelerate identification and bioassessment of chironomid communities.
BioVLAB-MMIA-NGS: microRNA-mRNA integrated analysis using high-throughput sequencing data.

PubMed

Chae, Heejoon; Rhee, Sungmin; Nephew, Kenneth P; Kim, Sun

2015-01-15

It is now well established that microRNAs (miRNAs) play a critical role in regulating gene expression in a sequence-specific manner, and genome-wide efforts are underway to predict known and novel miRNA targets. However, integrated miRNA-mRNA analysis remains a major computational challenge, requiring powerful informatics systems and bioinformatics expertise. The objective of this study was to make our widely recognized Web server for integrated mRNA-miRNA analysis (MMIA), and its subsequent deployment on the Amazon cloud (BioVLAB-MMIA), compatible with high-throughput platforms, including next-generation sequencing (NGS) data (e.g. RNA-seq). We developed a new version called BioVLAB-MMIA-NGS, deployed both on the Amazon cloud and on a high-performance, publicly available server called MAHA. By using NGS data and integrating various bioinformatics tools and databases, BioVLAB-MMIA-NGS offers several advantages. First, sequencing data are more accurate than array-based methods for determining miRNA expression levels. Second, potential novel miRNAs can be detected by using various computational methods for characterizing miRNAs. Third, because miRNA-mediated gene regulation is due to hybridization of an miRNA to its target mRNA, sequencing data can be used to identify many-to-many relationships between miRNAs and target genes with high accuracy. http://epigenomics.snu.ac.kr/biovlab_mmia_ngs/. © The Author 2014. Published by Oxford University Press. All rights reserved.
Microbial evolution of sulphate reduction when lateral gene transfer is geographically restricted.

PubMed

Chi Fru, E

2011-07-01

Lateral gene transfer (LGT) is an important mechanism by which micro-organisms acquire new functions. This process has been suggested to be central to prokaryotic evolution in various environments. However, the influence of geographical constraints on the evolution of laterally acquired genes in microbial metabolic evolution is not yet well understood. In this study, the influence of geographical isolation on the evolution of laterally acquired dissimilatory sulphite reductase (dsr) gene sequences in the sulphate-reducing micro-organisms (SRM) was investigated. Sequences on four continental blocks related to SRM known to have received dsr by LGT were analysed using standard phylogenetic and multidimensional statistical methods. Sequences related to lineages with large genetic diversity correlated positively with habitat divergence. Those affiliated to Thermodesulfobacterium indicated strong biogeographical delineation; hydrothermal-vent sequences clustered independently from hot-spring sequences. Some of the hydrothermal-vent and hot-spring sequences suggested to have been acquired from a common ancestral source may have diverged upon isolation within distinct habitats. In contrast, analysis of some Desulfotomaculum sequences indicated they could have been transferred from different ancestral sources but converged upon isolation within the same niche. These results hint that, after lateral acquisition of dsr genes, barriers to gene flow probably play a strong role in their subsequent evolution.
Dissection of the Octoploid Strawberry Genome by Deep Sequencing of the Genomes of Fragaria Species

PubMed Central

Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N.

2014-01-01

Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis, by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence, to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has proved successful in dissecting the genome of octoploid F. x ananassa and appears promising for the analysis of other polyploid plant species. PMID:24282021
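The N50 reported for FANhybrid_r1.2 is a standard assembly-contiguity statistic. It is the length L such that sequences of length L or longer together cover at least half of the total assembly length. Here is a minimal sketch of that definition. The input lengths are toy values, not data from the study.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until half is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always covers the full total")
```

For lengths `[2, 3, 4, 5, 6]` (total 20), the two longest sequences reach 11 ≥ 10, so the N50 is 5.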
GFam: a platform for automatic annotation of gene families.

PubMed

Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto

2012-10-01

We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and it offers a seamless approach to propagating functional annotation across periodic genome updates. GFam is a hybrid approach. A greedy algorithm chains component domains from the InterPro annotation provided by its 12 member resources, and a sequence-based connected component analysis of un-annotated sequence regions follows. Together these derive a consensus domain architecture for each sequence, from which families are generated based on common architectures. For the Arabidopsis proteome, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro. The true power of GFam lies in maximizing the annotation provided by the different InterPro data sources, which offer resource-specific coverage for different regions of a sequence. GFam's higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
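The abstract says GFam chains domains greedily but does not give the selection rule. The sketch below shows one generic greedy chaining scheme. It accepts hits from best to worst E-value and skips any hit that overlaps an accepted one. It then reports the fraction of residues covered. The ranking by E-value, the `DomainHit` type and the example hits are all hypothetical illustrations, not GFam's actual implementation.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    source: str    # member database that reported the hit (hypothetical labels below)
    start: int     # 1-based inclusive start position on the protein
    end: int       # 1-based inclusive end position
    evalue: float  # lower is better; used here as the (assumed) ranking criterion


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    """True if two inclusive intervals share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def greedy_chain(hits: list[DomainHit]) -> list[DomainHit]:
    """Accept hits best-first, skipping any that overlap an already accepted hit;
    return the chain in sequence order."""
    chosen: list[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, c) for c in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h.start)


def residue_coverage(chain: list[DomainHit], seq_len: int) -> float:
    """Fraction of residues covered; valid because chained hits never overlap."""
    return sum(h.end - h.start + 1 for h in chain) / seq_len
```

With three hits at residues 1–50 (E = 1e-10), 40–90 (1e-5) and 60–120 (1e-8), the second hit is rejected because it overlaps the first. The chain covers 111 of 200 residues.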
Evolutionary history of the enolase gene family.

PubMed

Tracy, M R; Hedges, S B

2000-12-23

The enzyme enolase [EC 4.2.1.11] is found in all organisms, with vertebrates exhibiting tissue-specific isozymes encoded by three genes: alpha (α), beta (β), and gamma (γ) enolase. Limited taxonomic sampling of enolase has obscured the timing of gene duplication events. To help clarify the evolutionary history of the gene family, cDNAs were sequenced from six taxa representing major lineages of vertebrates: Chiloscyllium punctatum (shark), Amia calva (bowfin), Salmo trutta (trout), Latimeria chalumnae (coelacanth), Lepidosiren paradoxa (South American lungfish), and Neoceratodus forsteri (Australian lungfish). Phylogenetic analysis of all enolase and related gene sequences revealed an early gene duplication event prior to the last common ancestor of living organisms. Several distantly related archaebacterial sequences were designated 'enolase-2', whereas all other enolase sequences were designated 'enolase-1'. Two of the three isozymes of enolase-1, α- and β-enolase, were discovered in actinopterygian, sarcopterygian, and chondrichthyan fishes. Phylogenetic analysis of vertebrate enolases revealed that the two gene duplications leading to the three isozymes of enolase-1 occurred after the divergence of living agnathans, near the Proterozoic/Phanerozoic boundary (approximately 550 Mya). Two copies of α-enolase, designated α1 and α2, were found in the trout and are presumed to be the result of a genome duplication event.
Computational analysis of stochastic heterogeneity in PCR amplification efficiency revealed by single molecule barcoding

PubMed Central

Best, Katharine; Oakes, Theres; Heather, James M.; Shawe-Taylor, John; Chain, Benny

2015-01-01

The polymerase chain reaction (PCR) is one of the most widely used techniques in molecular biology. In combination with High Throughput Sequencing (HTS), PCR is widely used to quantify transcript abundance for RNA-seq, and in the analysis of T and B cell receptor repertoires. In this study, we combine DNA barcoding with HTS to quantify PCR output from individual target molecules. We develop computational tools that simulate both the PCR branching process itself and the subsequent subsampling that typically occurs during HTS. We explore the influence of different types of heterogeneity on sequencing output and compare them to experimental results in which the efficiency of amplification is measured by barcodes uniquely identifying each molecule of starting template. Our results demonstrate that the PCR process introduces substantial amplification heterogeneity, independent of primer sequence and bulk experimental conditions. This heterogeneity can be attributed both to inherited differences between different template DNA molecules and to the inherent stochasticity of the PCR process. The results demonstrate that PCR heterogeneity arises even when reaction and substrate conditions are kept as constant as possible. Single molecule barcoding is therefore essential to derive reproducible quantitative results from any protocol combining PCR with HTS. PMID:26459131
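The two sources of heterogeneity named above are per-template differences and per-cycle chance. Both can be illustrated with a small branching-process simulation followed by read subsampling. This is a minimal sketch, not the authors' simulation tool. The Beta-distributed per-molecule efficiency, its parameters, and sampling reads with replacement as an approximation of HTS subsampling are all assumptions made for illustration.

```python
import random
from collections import Counter


def amplify(n_molecules: int, cycles: int, rng: random.Random,
            mean_eff: float = 0.8, concentration: float = 20.0) -> list[int]:
    """Simulate PCR as a branching process; return final copy numbers per barcoded template."""
    # Beta(a, b) has mean a / (a + b) = mean_eff; this distribution is an assumption.
    a = mean_eff * concentration
    b = (1 - mean_eff) * concentration
    copies = []
    for _ in range(n_molecules):
        eff = rng.betavariate(a, b)  # inherited, per-template amplification efficiency
        n = 1
        for _ in range(cycles):
            # Each existing copy is duplicated independently with probability eff.
            n += sum(1 for _ in range(n) if rng.random() < eff)
        copies.append(n)
    return copies


def subsample(copies: list[int], n_reads: int, rng: random.Random) -> Counter:
    """Draw reads from the pooled amplicons (with replacement); return reads per barcode index."""
    barcodes = rng.choices(range(len(copies)), weights=copies, k=n_reads)
    return Counter(barcodes)
```

Running `amplify(20, 10, random.Random(0))` gives copy numbers that vary between templates, even though every template starts from a single molecule under identical settings.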
Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing.

PubMed

Legendre, Matthieu; Santini, Sébastien; Rico, Alain; Abergel, Chantal; Claverie, Jean-Michel

2011-03-04

Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2-Mb genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. We now report a much deeper analysis using the SOLiD™ technology, combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 million reads) and a complete genome re-sequencing (45.3 million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes. It opens the door to a complete, single-nucleotide-resolution description of their transcriptional activity, and to realistic modeling of viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.
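The 11% figure is the relative increase of the new gene count over the 917 genes originally predicted:

$$\frac{1018 - 917}{917} = \frac{101}{917} \approx 0.110,$$

which is an increase of about 11%.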
Molecular cloning and expression of the calmodulin gene from guinea pig hearts.

PubMed

Feng, Rui; Liu, Yan; Sun, Xuefei; Wang, Yan; Hu, Huiyuan; Guo, Feng; Zhao, Jinsheng; Hao, Liying

2015-06-01

The aim of the present study was to isolate and characterize a complementary DNA (cDNA) clone encoding the calmodulin (CaM; GenBank accession no. FJ012165) gene from guinea pig hearts. The CaM gene was amplified from cDNA collected from guinea pig hearts and inserted into a pGEM®-T Easy vector. Subsequently, CaM nucleotide and protein sequence similarity analysis was conducted between guinea pigs and other species. In addition, reverse transcription-polymerase chain reaction (RT-PCR) was performed to investigate CaM3 expression patterns in different guinea pig tissues. Sequence analysis revealed that the CaM gene isolated from the guinea pig heart had ∼90% sequence identity with the CaM3 genes in humans, mice and rats. Furthermore, the deduced peptide sequence of CaM3 in the guinea pig showed 100% homology to the CaM proteins from other species. In addition, the RT-PCR results indicated that CaM3 was widely and differentially expressed in guinea pigs. In conclusion, the current study provided valuable information on the cloning and expression of CaM3 in guinea pig hearts. These findings may be helpful for understanding the function of CaM3 and its possible role in cardiovascular diseases.
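Nucleotide identity can be well below 100% while the deduced proteins are identical (∼90% versus 100% above). This happens because synonymous codon changes alter the DNA without changing the amino acid. The sketch below shows this with a standard-genetic-code translator and a simple percent-identity function. The two 12-nt sequences are hypothetical and are not CaM sequences. Skipping gap columns is only one of several identity conventions, assumed here.

```python
from itertools import product

# Standard genetic code in TCAG order (third codon position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence codon by codon ('*' marks a stop codon);
    trailing bases that do not form a full codon are ignored."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns, skipping columns where either has a gap '-'."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)
```

For example, `GCTGGTCTGAAA` and `GCCGGCCTCAAG` differ at every third position, giving about 66.7% nucleotide identity. Both still translate to `AGLK`, so protein identity is 100%.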
From Genome to Function: Systematic Analysis of the Soil Bacterium Bacillus Subtilis

PubMed Central

Crawshaw, Samuel G.; Wipat, Anil

2001-01-01

Bacillus subtilis is a sporulating Gram-positive bacterium that lives primarily in the soil and associated water sources. Whilst this bacterium has been studied extensively in the laboratory, relatively few studies have been undertaken to study its activity in natural environments. The publication of the B. subtilis genome sequence and subsequent systematic functional analysis programme have provided an opportunity to develop tools for analysing the role and expression of Bacillus genes in situ. In this paper we discuss analytical approaches that are being developed to relate genes to function in environments such as the rhizosphere. PMID:18628943
Molecular characterization of an Akabane virus isolate from West Java, Indonesia.

PubMed

Purnomo Edi, Suryo; Ibrahim, Afif; Sukoco, Rinto; Bunali, Lukman; Taguchi, Masaji; Kato, Tomoko; Yanase, Tohru; Shirafuji, Hiroaki

2017-04-08

We isolated an arbovirus from bovine blood in Indonesia. The arbovirus was obtained from the plasma of a cow showing no clinical symptoms in West Java in February 2014, and was identified as Akabane virus (AKAV) by AKAV-specific RT-PCR and subsequent sequence analysis. Phylogenetic analysis based on the partial S segment indicated that the AKAV isolate, WJ-1SA/P/2014, was most closely related to two isolates from Israel and Turkey reported in 2001 and 2015, respectively, and that the WJ-1SA/P/2014 isolate belongs to AKAV genogroup Ib. This is the first isolation of AKAV from Indonesia.
Model for turbidite-to-contourite continuum and multiple process transport in deep marine settings: examples in the rock record

NASA Astrophysics Data System (ADS)

Stanley, Daniel Jean

1993-01-01

Petrological analysis of geological sections in St. Croix in the Caribbean, the Niesenflysch in Switzerland and the Annot Sandstone in the French Maritime Alps sheds light on multiple process transport in deep marine settings. A model depicting a turbidite-to-contourite continuum of stratal types is applied to these three rock units. Recognition of a diverse suite of bedforms, coupled with analysis of paleocurrents, helps to better interpret depositional origin and basin paleogeography. The St. Croix strata record emplacement by gravity flows and, subsequently, by bottom currents flowing parallel to the base of slope; these sediments accumulated on a lower slope apron. A Niesenflysch section in the Swiss Alps west of Adelboden includes turbidites which were deposited at fairly regular intervals beyond the base of slope, in a setting more distal than that of the St. Croix sequences. Most of these turbidites appear to have been partially reworked by bottom currents related to basin circulation or to density flows from the basin margins. In the Annot Sandstone, reworked turbidites (termed transitional variants) and packets of entirely rippled strata are observed in submarine fan and slope sequences in the Peira-Cava area. In contrast to those in St. Croix and the Niesenflysch, the current-emplaced deposits of the Annot Sandstone are directly associated with fan-valley deposits. Such rippled strata in channels are deposits of gravity flow origin which were subsequently reworked downslope by currents generated by successive gravity flows; they also occur on levees by overbank flow. Consideration of multiple process transport is of special help to interpret sections which are poorly exposed, or which can be examined in cores, or which are located in sequences that have been highly deformed structurally.

Arterial signal timing optimization using PASSER II-87

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chang, E.C.P.; Messer, C.J.; Garza, R.U.

1988-11-01

PASSER is the acronym for the Progression Analysis and Signal System Evaluation Routine. PASSER II was originally developed by the Texas Transportation Institute (TTI) for the Dallas Corridor Project. The Texas State Department of Highways and Public Transportation (SDHPT) has sponsored subsequent program development on both mainframe computers and microcomputers. The theory, model structure, methodology, and logic of PASSER II have been evaluated and well documented. PASSER II is widely used because it can easily select multiple-phase sequences by adjusting the background cycle length and progression speeds. This lets it find the optimal timing plans (cycle, green split, phase sequence, and offsets) that efficiently maximize the two-way progression bands.
Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

PubMed

Pan, Xiaoyong; Shen, Hong-Bin

2018-05-02

RNA-binding proteins (RBPs) account for 5–10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-consuming and costly. Computational prediction of RBP binding sites, using patterns learned from existing annotation knowledge, is a faster alternative. From a biological point of view, the local structural context derived from local sequences is recognized by specific RBPs. However, to the best of our knowledge, deep learning models have so far employed only global representations of entire RNA sequences, and local sequence information has been ignored in the model construction process. In this study, we present a computational method, iDeepE, to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs on the multiple subsequences and on the padded sequences, respectively, to learn high-level features. Finally, the outputs of the local and global CNNs are combined to improve the prediction. iDeepE outperforms state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that, when using GPUs, the local CNN runs 1.8 times faster than the global CNN with comparable performance. Our results show that iDeepE has captured experimentally verified binding motifs. https://github.com/xypan1232/iDeepE. xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
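The input preparation described above has two parts. Sequences are padded to a common length for the global CNN and split into overlapping fixed-length windows for the local CNN. A minimal sketch of both steps, plus one-hot encoding, follows. The window size, stride, the `N` padding character and the uniform 0.25 encoding for non-ACGU positions are assumptions for illustration, not iDeepE's actual settings. The function names are hypothetical.

```python
def pad(seq: str, length: int, fill: str = "N") -> str:
    """Truncate or right-pad a sequence to a fixed length (global-CNN style input)."""
    return seq[:length] + fill * max(0, length - len(seq))


def split_windows(seq: str, window: int, stride: int) -> list[str]:
    """Overlapping fixed-length subsequences (local-CNN style input);
    the final window is padded if the sequence does not divide evenly."""
    if len(seq) <= window:
        return [pad(seq, window)]
    starts = list(range(0, len(seq) - window + 1, stride))
    if starts[-1] + window < len(seq):
        # Add one more window so the tail of the sequence is not dropped.
        starts.append(starts[-1] + stride)
    return [pad(seq[s:s + window], window) for s in starts]


def one_hot(seq: str) -> list[list[float]]:
    """Encode as rows of [A, C, G, U]; T is read as U, other symbols get 0.25 each (assumed)."""
    index = {"A": 0, "C": 1, "G": 2, "U": 3}
    rows = []
    for base in seq.upper().replace("T", "U"):
        if base in index:
            row = [0.0] * 4
            row[index[base]] = 1.0
        else:
            row = [0.25] * 4
        rows.append(row)
    return rows
```

For instance, `split_windows("ACGUACGUAC", 4, 2)` yields four overlapping windows starting at positions 0, 2, 4 and 6. Each window would be one-hot encoded and stacked as a separate channel.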
Characterization of a human X-linked gene from the DXS732E locus in the candidate region for the anhidrotic ectodermal dysplasia (EDA) gene (Xq13.1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gault, J.; Zonana, J.; Zeltinger, J.

A conserved mouse genomic clone was used to identify a homologous human genomic clone (the DXS732E locus), which was subsequently employed to isolate cDNAs from a human fetal brain library. Nine unique overlapping cDNAs were isolated, and sequence analysis of 3.9 kb identified a putative 1 kb ORF. GRAIL analysis of the sequence supported the hypothesis that the putative ORF was coding sequence, and Prosite analysis of the putative ORF identified potential glycosylation and phosphorylation sites. The 5′ end of the gene maps within a CpG island, and comparison of cDNA sequences indicates that the gene is alternatively spliced at its 3′ end. Northern analysis and RT-PCR indicate that two differently sized messages appear to be expressed, and that the gene is expressed in human fetal kidney, intestine, brain, and muscle. The gene is expressed in 77-day human skin, a time when hair follicle formation occurs. Anhidrotic ectodermal dysplasia (EDA) results in the abnormal morphogenesis of hair, teeth and eccrine sweat glands. A positional cloning strategy had been used to clone the EDA gene, and deletion and X-autosome translocation patients have been useful in further delimiting the EDA region. The present gene at the DXS732E locus is partially deleted in one EDA patient who does not have other apparent abnormalities. No rearrangements of the gene have been detected in two female X-autosome translocation EDA patients, nor in four additional male patients with submicroscopic molecular deletions.
A Single Molecular Beacon Probe Is Sufficient for the Analysis of Multiple Nucleic Acid Sequences

PubMed Central

Gerasimova, Yulia V.; Hayson, Aaron; Ballantyne, Jack; Kolpashchikov, Dmitry M.

2010-01-01

Molecular beacon (MB) probes are dual-labeled hairpin-shaped oligodeoxyribonucleotides that are extensively used for real-time detection of specific RNA/DNA analytes. In the MB probe, the loop fragment is complementary to the analyte; therefore, a unique probe is required for the analysis of each new analyte sequence. The conjugation of an oligonucleotide with two dyes and the subsequent purification procedures add to the cost of MB probes, thus limiting their application in multiplex formats. Here we demonstrate how one MB probe can be used for the analysis of an arbitrary nucleic acid. The approach takes advantage of two oligonucleotide adaptor strands, each of which contains a fragment complementary to the analyte and a fragment complementary to the MB probe. The presence of the analyte leads to association of the MB probe with the two DNA strands in a quadripartite complex. The MB probe fluorescently reports the formation of this complex. In this design, the MB does not bind the analyte directly; therefore, the MB sequence is independent of the analyte. In this study, one universal MB probe was used to genotype three human polymorphic sites. This approach promises to reduce the cost of multiplex real-time assays and improve the accuracy of single-nucleotide polymorphism genotyping. PMID:20665615
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

PubMed

Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

2015-08-01

RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis, based on simultaneous alignment and folding, suffers from an extreme time complexity of $O(n^6)$. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity ($O(n^4)$, quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. All existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions. SPARSE, in contrast, is the first to transfer the Sankoff algorithm completely to the lightweight energy model. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low-sequence-identity instances substantially more accurately than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.
aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity.

PubMed

Kuraku, Shigehiro; Zmasek, Christian M; Nishimura, Osamu; Katoh, Kazutaka

2013-07-01

We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered across separate web sites, some project-based and some multi-species. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update makes it possible to switch between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The workflow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology.
aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity

PubMed Central

Kuraku, Shigehiro; Zmasek, Christian M.; Nishimura, Osamu; Katoh, Kazutaka

2013-01-01

We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered across separate web sites, some project-based and some multi-species. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update makes it possible to switch between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The workflow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology. PMID:23677614
Identification of the PLA2G6 c.1579G>A Missense Mutation in Papillon Dog Neuroaxonal Dystrophy Using Whole Exome Sequencing Analysis

PubMed Central

Tsuboi, Masaya; Watanabe, Manabu; Nibe, Kazumi; Yoshimi, Natsuko; Kato, Akihisa; Sakaguchi, Masahiro; Yamato, Osamu; Tanaka, Miyuu; Kuwamura, Mitsuru; Kushida, Kazuya; Harada, Tomoyuki; Chambers, James Kenn; Sugano, Sumio; Uchida, Kazuyuki; Nakayama, Hiroyuki

2017-01-01

Whole exome sequencing (WES) has become a common tool for identifying genetic causes of human inherited disorders, and it has also recently been applied to canine genome research. We conducted WES analysis of neuroaxonal dystrophy (NAD), a neurodegenerative disease that occurs sporadically worldwide in Papillon dogs. The disease is considered an autosomal recessive monogenic disease and is histopathologically characterized by severe axonal swelling, known as “spheroids,” throughout the nervous system. By sequencing eleven DNA samples, from one NAD-affected Papillon dog and her parents, two unrelated NAD-affected Papillon dogs, and six unaffected control Papillon dogs, we identified 10 candidate mutations. Among them, three candidates were determined to be “deleterious” by in silico pathogenicity evaluation. In subsequent large-scale screening by TaqMan genotyping, only the PLA2G6 c.1579G>A mutation was associated with the presence or absence of the disease, suggesting that it may be a causal mutation of canine NAD. As the human homologue of this gene is a causative gene for infantile neuroaxonal dystrophy, this canine phenotype may serve as a good animal model for the human disease. The results of this study also indicate that WES analysis is a powerful tool for exploring canine hereditary diseases, especially rare monogenic ones. PMID:28107443
Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project

PubMed Central

Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J.; Almeida, Jeff; Forbes, Simon; Gilbert, James G. R.; Halls, Karen; Harrow, Jennifer L.; Hart, Elizabeth; Howe, Kevin; Jackson, David K.; Palmer, Sophie; Roberts, Anne N.; Sims, Sarah; Stewart, C. Andrew; Traherne, James A.; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J.; Elliott, John F.; Sawcer, Stephen; Todd, John A.; Trowsdale, John

2008-01-01

The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions, of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line), designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds) and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine. PMID:18193213
Microbial forensics: fiber optic microarray subtyping of Bacillus anthracis

NASA Astrophysics Data System (ADS)

Shepard, Jason R. E.

2009-05-01

The past decade has seen increased development and subsequent adoption of rapid molecular techniques involving DNA analysis for the detection of pathogenic microorganisms, also termed microbial forensics. The continued accumulation of microbial sequence information in genomic databases now better positions high-throughput DNA analysis to proceed in a more manageable fashion. As technology continues to develop, these databases can be built on to enable more rapid, cost-effective analyses. This wealth of genetic information, together with new technologies, has the potential to address some of the current problems and key issues in the DNA analysis of pathogenic microorganisms. To this end, a high-density fiber optic microarray has been employed that houses numerous DNA sequences simultaneously for the detection of various pathogenic microorganisms, including Bacillus anthracis, among others. Each organism is analyzed with multiple sequences and can be subtyped against other closely related organisms. For public health labs, real-time PCR methods have been developed as a preliminary screen, but culture and growth are still considered the gold standard. Technologies with higher throughput than these standard methods are better suited to capitalize on the potential offered by sequence information. Microarray analysis is one such format, and our array platform is reusable, allowing repeated tests on a single array, which increases throughput and decreases cost while providing certainty of detection down to the individual strain level.
Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis

PubMed Central

Miotto, Olivo; Heiny, AT; Tan, Tin Wee; August, J Thomas; Brusic, Vladimir

2008-01-01

Background: The identification of mutations that confer unique properties to a pathogen, such as host range, is of fundamental importance in the fight against disease. This paper describes a novel method for identifying amino acid sites that distinguish specific sets of protein sequences, by comparative analysis of matched alignments. The use of mutual information to identify distinctive residues responsible for functional variants makes this approach highly suitable for analyzing large sets of sequences. To support mutual information analysis, we developed the AVANA software, which utilizes sequence annotations to select sets for comparison, according to user-specified criteria. The method presented was applied to an analysis of influenza A PB2 protein sequences, with the objective of identifying the components of adaptation to human-to-human transmission, and reconstructing the mutation history of these components. Results: We compared over 3,000 PB2 protein sequences of human-transmissible and avian isolates, to produce a catalogue of sites involved in adaptation to human-to-human transmission. This analysis identified 17 characteristic sites, five of which have been present in human-transmissible strains since the 1918 Spanish flu pandemic. Sixteen of these sites are located in functional domains, suggesting they may play functional roles in host-range specificity. The catalogue of characteristic sites was used to derive sequence signatures from historical isolates. These signatures, arranged in chronological order, reveal an evolutionary timeline for the adaptation of the PB2 protein to human hosts. Conclusion: By providing the most complete elucidation to date of the functional components participating in PB2 protein adaptation to humans, this study demonstrates that mutual information is a powerful tool for comparative characterization of sequence sets.
In addition to confirming previously reported findings, the study reports several novel characteristic sites within PB2. Sequence signatures generated using the catalogue of characteristic sites concisely describe the adaptation features of individual isolates. Evolutionary timelines derived from signatures of early human influenza isolates suggest that characteristic variants emerged rapidly and remained remarkably stable through subsequent pandemics. In addition, the signatures of human-infecting H5N1 isolates suggest that this avian subtype has low pandemic potential at present, although it presents more human adaptation components than most avian subtypes. PMID:18315849
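As an illustration only, the per-site scoring idea in the abstract above can be sketched as the mutual information between the residue observed at one alignment column and the set a sequence belongs to (for example, human-transmissible versus avian). This is a generic textbook computation, not the AVANA software or the authors' exact procedure; it assumes the two sets are already aligned to equal length, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(group_a, group_b, position):
    """Mutual information (in bits) between the residue at `position`
    and membership in group A or group B. Assumes pre-aligned sequences;
    gap characters such as '-' are treated as ordinary symbols."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one sequence")
    pairs = [(seq[position], "A") for seq in group_a]
    pairs += [(seq[position], "B") for seq in group_b]
    n = len(pairs)
    joint = Counter(pairs)                      # counts of (residue, group)
    residues = Counter(r for r, _ in pairs)     # marginal residue counts
    labels = Counter(g for _, g in pairs)       # marginal group counts
    mi = 0.0
    for (residue, label), count in joint.items():
        p_joint = count / n
        p_indep = (residues[residue] / n) * (labels[label] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def rank_sites(group_a, group_b):
    """Score every alignment column and return (position, MI) pairs,
    most set-distinguishing columns first."""
    length = len(group_a[0])
    scores = [(pos, column_mutual_information(group_a, group_b, pos))
              for pos in range(length)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy, hypothetical alignments: only column 1 separates the two sets.
human_like = ["MKA", "MKA"]
avian_like = ["MEA", "MEA"]
print(rank_sites(human_like, avian_like))
```

With two equal-sized sets, a column that separates them perfectly scores 1 bit and a column that is identical in both scores 0, so the top-ranked positions are candidate "characteristic sites" in the sense used above.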
A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia

NASA Astrophysics Data System (ADS)

Radhakrishnan, Regunathan; Divakaran, Ajay; Xiong, Ziyou; Otsuka, Isao

2006-12-01

We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.
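A simplified sketch of the outlier idea described above, under stated assumptions: each non-overlapping window of a one-dimensional feature series is modelled as a single Gaussian, pairwise affinities are derived from a symmetrised KL divergence, and windows are scored by their component in the dominant eigenvector of the affinity matrix (obtained by power iteration). Windows weakly connected to the dominant "background" cluster receive small scores. The choice of Gaussian models, the divergence, the `scale` parameter and all function names are hypothetical illustrations, not the authors' framework.

```python
import math


def window_models(series, window):
    """Fit a 1-D Gaussian (mean, variance) to each non-overlapping window."""
    models = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        mean = sum(chunk) / window
        # Small variance floor avoids division by zero for constant windows.
        var = sum((x - mean) ** 2 for x in chunk) / window + 1e-6
        models.append((mean, var))
    return models


def symmetric_kl(p, q):
    """Symmetrised KL divergence between two 1-D Gaussians (log terms cancel)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    diff2 = (mean_p - mean_q) ** 2
    return (var_p + diff2) / (2 * var_q) + (var_q + diff2) / (2 * var_p) - 1.0


def principal_eigenvector(matrix, iterations=200):
    """Power iteration for the dominant eigenvector of a non-negative symmetric matrix."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def detect_outlier_windows(series, window, top_k=1, scale=1.0):
    """Return indices of the `top_k` windows least connected to the background."""
    models = window_models(series, window)
    affinity = [[math.exp(-symmetric_kl(a, b) / scale) for b in models]
                for a in models]
    scores = principal_eigenvector(affinity)
    ranked = sorted(range(len(models)), key=lambda i: scores[i])
    return ranked[:top_k]


# Hypothetical feature series: a repeating background with one unusual window.
background = [0.0, 1.0, 0.0, 1.0]
series = background * 2 + [5.0, 6.0, 5.0, 6.0] + background * 2
print(detect_outlier_windows(series, 4))
```

The eigenvector scores also give a natural ranking of detected outliers by how far they depart from the background, loosely mirroring the confidence-based ranking described in the abstract.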
Transcriptomic Analysis of Paulownia Infected by Paulownia Witches'-Broom Phytoplasma

PubMed Central

Zhu, Shui-Fang; Lin, Cai-Li; Tian, Guo-Zhong; Xu, Xia; Zhao, Wen-Jun

2013-01-01

Phytoplasmas are plant pathogenic bacteria that have no cell wall and are responsible for major crop losses throughout the world. Phytoplasma-infected plants show a variety of symptoms and the mechanisms they use to physiologically alter the host plants are of considerable interest, but poorly understood. In this study we undertook a detailed analysis of Paulownia infected by Paulownia witches’-broom (PaWB) Phytoplasma using high-throughput mRNA sequencing (RNA-Seq) and digital gene expression (DGE). RNA-Seq analysis identified 74,831 unigenes, which were subsequently used as reference sequences for DGE analysis of diseased and healthy Paulownia in field grown and tissue cultured plants. Our study revealed that dramatic changes occurred in the gene expression profile of Paulownia after PaWB Phytoplasma infection. Genes encoding key enzymes in cytokinin biosynthesis, such as isopentenyl diphosphate isomerase and isopentenyltransferase, were significantly induced in the infected Paulownia. Genes involved in cell wall biosynthesis and degradation were largely up-regulated and genes related to photosynthesis were down-regulated after PaWB Phytoplasma infection. Our systematic analysis provides comprehensive transcriptomic data about plants infected by Phytoplasma. This information will help further our understanding of the detailed interaction mechanisms between plants and Phytoplasma. PMID:24130859
Mass spectrometric analysis of O-linked oligosaccharides from various recombinant expression systems.

PubMed

Kenny, Diarmuid T; Gaunitz, Stefan; Hayes, Catherine A; Gustafsson, Anki; Sjöblom, Magnus; Holgersson, Jan; Karlsson, Niclas G

2013-01-01

Analysis of O-linked glycosylation is one of the main challenges during structural validation of recombinant glycoproteins. While methods are available for N-linked glycosylation, both for oligosaccharide analysis and for glycopeptide mapping, O-linked glycan analysis remains challenging. Here, we present mass spectrometric methodology for O-linked oligosaccharides released by reductive β-elimination. Using LC-MS and LC-MS(2) with graphitized carbon columns, oligosaccharides are analyzed without derivatization. This approach provides a high-throughput method for screening during clonal selection, as well as for product structure verification, without impairing sequencing ability. The protocols are exemplified by analysis of glycoproteins from mammalian cell cultures (CHO cells) as well as insect cells and yeast. The data show that the method can be successfully applied to both neutral and acidic O-linked oligosaccharides, in which sialic acid, hexuronic acid, and sulfate are common substituents. O-glycans can be further characterized using permethylation. Direct infusion of permethylated O-linked oligosaccharides into the mass spectrometer provides information about oligosaccharide composition, and subsequent MS(n) experiments can be carried out to elucidate oligosaccharide structure, including linkage information and sequence.
Mosaic CREBBP mutation causes overlapping clinical features of Rubinstein-Taybi and Filippi syndromes.

PubMed

de Vries, Tamar I; Monroe, Glen R; van Belzen, Martine J; van der Lans, Christian A; Savelberg, Sanne Mc; Newman, William G; van Haaften, Gijs; Nievelstein, Rutger A; van Haelst, Mieke M

2016-08-01

Rubinstein-Taybi syndrome (RTS, OMIM 180849) and Filippi syndrome (FLPIS, OMIM 272440) are both rare syndromes with multiple congenital anomalies and intellectual deficit (MCA/ID). We present a patient with intellectual deficit, short stature, bilateral syndactyly of the hands and feet, broad thumbs, ocular abnormalities, and dysmorphic facial features. These clinical features suggest both RTS and FLPIS. Initial analysis of DNA isolated from blood did not identify variants confirming either diagnosis. Whole-exome sequencing identified a homozygous variant in C9orf173, which was novel at the time of analysis. However, further Sanger sequencing of FLPIS cases that had tested negative for CKAP2L variants revealed no additional variants. Subsequent analysis of DNA isolated from buccal mucosa revealed a mosaic variant in CREBBP. This report highlights the importance of excluding mosaic variants in patients with a strong but atypical clinical presentation of an MCA/ID syndrome if no disease-causing variants can be detected in DNA isolated from blood samples. As the striking syndactyly observed in the present case is typical of FLPIS, we suggest CREBBP analysis in saliva samples for FLPIS cases in which no causal CKAP2L variant is detected.
Schizosaccharomyces pombe Polysome Profile Analysis and RNA Purification.

PubMed

Wolf, Dieter A; Bähler, Jürg; Wise, Jo Ann

2017-04-03

Polysome profile analysis is widely used by investigators studying the mechanism and regulation of translation. The method described here uses high-velocity centrifugation of whole cell extracts on linear sucrose gradients to separate 40S and 60S ribosomal subunits from 80S monosomes and polysomes. Cycloheximide is included in the lysis buffer to "freeze" polysomes by blocking translation. After centrifugation, the gradient is fractionated and RNA (and/or protein) is prepared from each fraction for subsequent analysis of individual species using northern or western blots. The entire RNA population in each fraction can be analyzed by hybridization to microarrays or by high-throughput RNA sequencing, and the proteins present can be identified by mass spectrometry analysis. © 2017 Cold Spring Harbor Laboratory Press.
Tracking and Motion Analysis of Crack Propagations in Crystals for Molecular Dynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tsap, L V; Duchaineau, M; Goldgof, D B

2001-05-14

This paper presents a quantitative analysis for a discovery in molecular dynamics. Recent simulations have shown that the velocities of crack propagation in crystals can, under certain conditions, become supersonic, contrary to classical physics. In this research, the authors present a framework for tracking and motion analysis of crack propagation in crystals. It includes line segment extraction based on Canny edge maps, feature selection based on physical properties, and subsequent tracking of primary and secondary wavefronts. The tracking is completely automated; it runs in real time on three 834-image sequences using forty 250 MHz processors. Results supporting the physical observations are presented in terms of both feature tracking and velocity analysis.
On the value of Mendelian laws of segregation in families: data quality control, imputation and beyond

PubMed Central

Blue, Elizabeth Marchani; Sun, Lei; Tintle, Nathan L.; Wijsman, Ellen M.

2014-01-01

When analyzing family data, we dream of perfectly informative data, even whole genome sequences (WGS) for all family members. Reality intervenes, and we find that next-generation sequence (NGS) data contain errors and are often too expensive or impossible to collect on everyone. Genetic Analysis Workshop 18 groups “Quality Control” and “Dropping WGS through families using GWAS framework” focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single nucleotide polymorphism, NGS, and imputed data are generally concordant, but that errors are particularly likely at rare variants, at homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated the identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Both genotype and pedigree errors had an adverse effect on subsequent analyses. Computationally fast rules-based imputation was accurate, but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions. Topics include improving communication between those performing data collection and analysis, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184
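The abstract above relies on Mendelian transmission to flag errors in family data. A minimal sketch of that check, assuming biallelic sites with genotypes coded as the number of alternate alleles (0, 1 or 2) for a father-mother-child trio, could look like this. The function names and site labels are hypothetical, and real pipelines must also handle missing calls, sex chromosomes and multi-allelic sites, which this sketch ignores.

```python
from itertools import product


def transmissible_alleles(genotype):
    """Alleles (0 = reference, 1 = alternate) a parent can pass on,
    given the parent's alternate-allele count."""
    if genotype not in (0, 1, 2):
        raise ValueError(f"genotype must be 0, 1 or 2, got {genotype!r}")
    if genotype == 0:
        return {0}
    if genotype == 2:
        return {1}
    return {0, 1}


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele per parent."""
    if child not in (0, 1, 2):
        raise ValueError(f"genotype must be 0, 1 or 2, got {child!r}")
    possible = {a + b for a, b in product(transmissible_alleles(father),
                                          transmissible_alleles(mother))}
    return child in possible


def flag_inconsistent_sites(trio_genotypes):
    """Return site IDs whose (father, mother, child) genotypes violate
    Mendelian inheritance: genotyping errors, pedigree errors or de novo events."""
    return [site for site, (f, m, c) in trio_genotypes.items()
            if not is_mendelian_consistent(f, m, c)]


# Hypothetical trio calls at three sites.
calls = {"site_a": (0, 0, 1), "site_b": (1, 1, 2), "site_c": (2, 2, 0)}
print(flag_inconsistent_sites(calls))
```

A flagged site cannot by itself distinguish a genotyping error from a pedigree error or a true de novo mutation; that is why the abstract treats such inconsistencies both as an error signal and as a basis for estimating the de novo mutation rate.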
Alternate assembly sequence databook for the Tier 2 Bus-1 option of the International Space Station

NASA Technical Reports Server (NTRS)

Brewer, L. M.; Cirillo, W. M.; Cruz, J. N.; Hall, J. B.; Troutman, P. A.; Monell, D. W.; Garn, M. A.; Heck, M. L.; Kumar, R. R.; Llewellyn, C. P.

1995-01-01

The JSC International Space Station program office requested that SSB prepare a databook to document the alternate space station assembly sequence known as Tier 2, which assumes that Russian participation has been eliminated and that the functions previously supplied by the Russians (propulsion, resupply, initial attitude control, communications, etc.) are now supplied by the U.S. Tier 2 uses the Lockheed Bus-1 to replace much of the missing Russian functionality. The space station at each stage of its buildup during the Tier 2 assembly sequence is characterized in terms of properties, functionality, resource balances, operations, logistics, attitude control, microgravity environment and propellant usage. The assembly sequence as analyzed was defined by JSC as a first iteration, with subsequent iterations required to address some of the issues identified by the analysis in this databook. Several significant issues were identified, including less than desirable orbit lifetimes, shortage of EVA, large flight attitudes, poor microgravity environments, and reboost propellant shortages. Many of these issues can be resolved, but at the cost of possible baseline modifications and revisions to the proposed Tier 2 assembly sequence.
Identification of trimannoside-recognizing peptide sequences from a T7 phage display screen using a QCM device.

PubMed

Nishiyama, Kazusa; Takakusagi, Yoichi; Kusayanagi, Tomoe; Matsumoto, Yuki; Habu, Shiori; Kuramochi, Kouji; Sugawara, Fumio; Sakaguchi, Kengo; Takahashi, Hideyo; Natsugari, Hideaki; Kobayashi, Susumu

2009-01-01

Here, we report on the identification of trimannoside-recognizing peptide sequences from a T7 phage display screen using a quartz-crystal microbalance (QCM) device. A trimannoside derivative that can form a self-assembled monolayer (SAM) was synthesized and immobilized on the gold electrode surface of a QCM sensor chip. After six sets of one-cycle affinity selection, T7 phage particles displaying PSVGLFTH (8-mer) and SVGLGLGFSTVNCF (14-mer) were found to be enriched at rates of 17/44 and 9/44, respectively, suggesting that these peptides specifically recognize trimannoside. Binding tests using the respective single T7 phage clones and synthetic peptides also confirmed the specific binding of these sequences to the trimannoside-SAM. Subsequent analysis revealed that these sequences correspond to parts of the primary amino acid sequences found in many mannose- or hexose-related proteins. Taken together, these results demonstrate the effectiveness of our T7 phage display environment for affinity selection of binding peptides. We anticipate that this screening result will also be extremely useful in the development of inhibitors or drug delivery systems targeting polysaccharides, as well as in further investigations into the function of carbohydrates in vivo.

RNA-seq analysis of Rubus idaeus cv. Nova: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

PubMed

Hyun, Tae Kyung; Lee, Sarah; Kumar, Dhinesh; Rim, Yeonggil; Kumar, Ritesh; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean

2014-10-01

Using Illumina sequencing technology, we generated large-scale transcriptome sequencing data containing abundant information on genes involved in the metabolic pathways of R. idaeus cv. Nova fruits. Rubus idaeus (red raspberry) is an economically important crop that possesses numerous nutrients, micronutrients and phytochemicals with essential health benefits to humans. The molecular mechanism underlying the ripening process and phytochemical biosynthesis in red raspberry is attributed to changes in gene expression, but very limited transcriptomic and genomic information is available in public databases. To address this issue, we generated more than 51 million sequencing reads from R. idaeus cv. Nova fruit using Illumina RNA-Seq technology. After de novo assembly, we obtained 42,604 unigenes with an average length of 812 bp. At the protein level, the Nova fruit transcriptome showed 77 and 68 % sequence similarity with Rubus coreanus and Fragaria vesca, respectively, indicating the evolutionary relationship between them. In addition, 69 % of assembled unigenes were annotated using public databases including NCBI non-redundant, Clusters of Orthologous Groups and Gene Ontology, suggesting that our transcriptome dataset provides a valuable resource for investigating metabolic processes in red raspberry. To analyze the relationship between several novel transcripts and the amounts of metabolites such as γ-aminobutyric acid and anthocyanins, real-time PCR and targeted metabolite analysis were performed at two different ripening stages of Nova. This is the first use of the Illumina sequencing platform for RNA sequencing and de novo assembly of Nova fruit without a reference genome. Our data provide the most comprehensive transcriptome resource available for Rubus fruits and will be useful for understanding the ripening process and for breeding R. idaeus cultivars with improved fruit quality.
RNA-Seq Analysis of Cocos nucifera: Transcriptome Sequencing and De Novo Assembly for Subsequent Functional Genomics Approaches

PubMed Central

Xia, Wei; Mason, Annaliese S.; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

2013-01-01

Background: Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. Methodology/Principal Findings: To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the Genbank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Conclusions/Significance: Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species. PMID:23555859
Detection of integrated papillomavirus sequences by ligation-mediated PCR (DIPS-PCR) and molecular characterization in cervical cancer cells.

PubMed

Luft, F; Klaes, R; Nees, M; Dürst, M; Heilmann, V; Melsheimer, P; von Knebel Doeberitz, M

2001-04-01

Human papillomavirus (HPV) genomes usually persist as episomal molecules in HPV-associated preneoplastic lesions, whereas they are frequently integrated into the host cell genome in HPV-related cancer cells. This suggests that malignant conversion of HPV-infected epithelia is linked to recombination of cellular and viral sequences. Due to technical limitations, precise sequence information on viral-cellular junctions had been obtained for only a few cell lines and primary lesions. To facilitate the molecular analysis of genomic HPV integration, we established a ligation-mediated PCR assay for the detection of integrated papillomavirus sequences (DIPS-PCR). DIPS-PCR was initially used to amplify genomic viral-cellular junctions from HPV-associated cervical cancer cell lines (C4-I, C4-II, SW756, and HeLa) and HPV-immortalized keratinocyte lines (HPKIA, HPKII). In addition to junctions already reported in public databases, various new fusion fragments were identified. Subsequently, 22 different viral-cellular junctions were amplified from 17 cervical carcinomas and 1 vulval intraepithelial neoplasia (VIN III). Sequence analysis of each junction revealed that the viral E1 open reading frame (ORF) was fused to cellular sequences in 20 of 22 (91%) cases. Chromosomal integration loci mapped to chromosomes 1 (n = 2), 2 (n = 3), 7 (n = 2), 8 (n = 3), 10 (n = 1), 14 (n = 5), 16 (n = 1), 17 (n = 2), and mitochondrial DNA (n = 1), suggesting a random distribution of chromosomal integration sites. The precise sequence information obtained by DIPS-PCR was further used to monitor the monoclonal origin of 4 cervical cancers, 1 case of recurrent premalignant lesions and 1 lymph node metastasis. DIPS-PCR might therefore allow efficient therapy control and prediction of relapse in patients with HPV-associated anogenital cancers. Copyright 2001 Wiley-Liss, Inc.
RNA-Seq analysis of Cocos nucifera: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

PubMed

Fan, Haikuo; Xiao, Yong; Yang, Yaodong; Xia, Wei; Mason, Annaliese S; Xia, Zhihui; Qiao, Fei; Zhao, Songlin; Tang, Haoru

2013-01-01

Cocos nucifera (coconut), a member of the Arecaceae family, is an economically important woody palm grown in tropical regions. Despite its agronomic importance, previous germplasm assessment studies have relied solely on morphological and agronomical traits. Molecular biology techniques have been scarcely used in assessment of genetic resources and for improvement of important agronomic and quality traits in Cocos nucifera, mostly due to the absence of available sequence information. To provide basic information for molecular breeding and further molecular biological analysis in Cocos nucifera, we applied RNA-seq technology and de novo assembly to gain a global overview of the Cocos nucifera transcriptome from mixed tissue samples. Using Illumina sequencing, we obtained 54.9 million short reads and conducted de novo assembly to obtain 57,304 unigenes with an average length of 752 base pairs. Sequence comparison between assembled unigenes and released cDNA sequences of Cocos nucifera and Elaeis guineensis indicated that the assembled sequences were of high quality. Approximately 99.9% of unigenes were novel compared to the released coconut EST sequences. Using BLASTX, 68.2% of unigenes were successfully annotated based on the Genbank non-redundant (Nr) protein database. The annotated unigenes were then further classified using the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Our study provides a large quantity of novel genetic information for Cocos nucifera. This information will act as a valuable resource for further molecular genetic studies and breeding in coconut, as well as for isolation and characterization of functional genes involved in different biochemical pathways in this important tropical crop species.
First Report on Circulation of Echinococcus ortleppi in the one Humped Camel (Camelus dromedaries), Sudan

PubMed Central

2013-01-01

Background: Echinococcus granulosus (EG) complex, the cause of cystic echinococcosis (CE), infects humans and several other animal species worldwide, and the disease is therefore of public health importance. Ten genetic variants, or genotypes (designated G1-G10), have been distinguished on the basis of genetic diversity and are distributed worldwide. The objective of this study was to provide sequence data and a phylogeny of EG isolates recovered from the Sudanese one-humped camel (Camelus dromedarius). Fifty samples of hydatid cysts were collected from one-humped camels at the Taboul slaughterhouse, central Sudan. DNA was extracted from protoscolices and/or associated germinal layers of hydatid cysts using a commercial kit. The mitochondrial NADH dehydrogenase subunit 1 (NADH1) gene and the cytochrome C oxidase subunit 1 (cox1) gene were used as targets for polymerase chain reaction (PCR) amplification. The PCR products were purified and partial sequences were generated. The sequences were then examined by sequence and phylogenetic analysis to compare them with those from known EG strains circulating globally. Results: The identity of the PCR products was confirmed as NADH1 and cox1 nucleotide sequences using the Basic Local Alignment Search Tool (BLAST) of NCBI (National Center for Biotechnology Information, Bethesda, MD). The phylogenetic analysis showed that 98% (n = 49) of the isolates clustered with Echinococcus canadensis genotype 6 (G6), whereas only one isolate (2%) clustered with Echinococcus ortleppi (G5). Conclusions: This investigation expands the existing sequence data generated from EG isolates recovered from camels in Sudan. The circulation of the cattle genotype (G5) in the one-humped camel is reported here for the first time. PMID:23800362
First report on circulation of Echinococcus ortleppi in the one humped camel (Camelus dromedaries), Sudan.

PubMed

Ahmed, Mohamed E; Eltom, Kamal H; Musa, Nasreen O; Ali, Ibtisam A; Elamin, Fatima M; Grobusch, Martin P; Aradaib, Imadeldin E

2013-06-25

The Echinococcus granulosus (EG) complex, the cause of cystic echinococcosis (CE), infects humans and several other animal species worldwide, and the disease is therefore of public health importance. Ten genetic variants, or genotypes (designated G1-G10), distinguished by their genetic diversity, are distributed worldwide. The objective of this study was to provide sequence data and a phylogeny of EG isolates recovered from the Sudanese one-humped camel (Camelus dromedaries). Fifty hydatid cyst samples were collected from one-humped camels (Camelus dromedaries) at the Taboul slaughterhouse, central Sudan. DNA was extracted from protoscolices and/or associated germinal layers of the hydatid cysts using a commercial kit. The mitochondrial NADH dehydrogenase subunit 1 (NADH1) gene and the cytochrome C oxidase subunit 1 (cox1) gene were used as targets for polymerase chain reaction (PCR) amplification. The PCR products were purified and partial sequences were generated. The sequences were then examined by sequence analysis and subsequent phylogenetic analysis to compare them with those from known EG strains circulating globally. The identity of the PCR products was confirmed as NADH1 and cox1 nucleotide sequences using the Basic Local Alignment Search Tool (BLAST) of NCBI (National Center for Biotechnology Information, Bethesda, MD). The phylogenetic analysis showed that 98% (n = 49) of the isolates clustered with Echinococcus canadensis genotype 6 (G6), whereas only one isolate (2%) clustered with Echinococcus ortleppi (G5). This investigation expands on the existing sequence data generated from EG isolates recovered from camels in the Sudan. The circulation of the cattle genotype (G5) in the one-humped camel is reported here for the first time.
Isolation of Cryptococcus gattii from a Castanopsis argyrophylla tree hollow (Mai-Kaw), Chiang Mai, Thailand.

PubMed

Khayhan, Kantarawee; Hagen, Ferry; Norkaew, Treepradab; Puengchan, Tanpalang; Boekhout, Teun; Sriburee, Pojana

2017-04-01

The pathogenic yeast Cryptococcus gattii was isolated from a tree hollow of a Castanopsis argyrophylla King ex Hook.f. (Fagaceae) in Chiang Mai, Thailand. Molecular characterization with amplified fragment length polymorphism analysis and multi-locus sequence typing showed that this isolate belonged to genotype AFLP4/VGI representing C. gattii sensu stricto. Subsequent comparison of the environmental isolate with those from clinical samples from Thailand showed that they grouped closely together in a single cluster.
Microbial Diversity of Septic Tank Effluent and a Soil Biomat

PubMed Central

Tomaras, Jill; Sahl, Jason W.; Siegrist, Robert L.; Spear, John R.

2009-01-01

Microbial diversity of septic tank effluent (STE) and the biomat that is formed as a result of STE infiltration on soil were characterized by 16S rRNA gene sequence analysis. Results indicate that microbial communities differ among control soil, STE, and the biomat, and that microbes found in STE are not found in the biomat. The development of a stable soil biomat appears to provide the best on-site water treatment or protection for subsequent groundwater interactions of STE. PMID:19304840
Microbial diversity of septic tank effluent and a soil biomat.

PubMed

Tomaras, Jill; Sahl, Jason W; Siegrist, Robert L; Spear, John R

2009-05-01

Microbial diversity of septic tank effluent (STE) and the biomat that is formed as a result of STE infiltration on soil were characterized by 16S rRNA gene sequence analysis. Results indicate that microbial communities differ among control soil, STE, and the biomat, and that microbes found in STE are not found in the biomat. The development of a stable soil biomat appears to provide the best on-site water treatment or protection for subsequent groundwater interactions of STE.
Prescreening of microbial populations for the assessment of sequencing potential.

PubMed

Hanning, Irene B; Ricke, Steven C

2011-01-01

Next-generation sequencing (NGS) is a powerful tool that can be used to profile and compare microbial populations. By amplifying a target gene present in all bacteria and subsequently sequencing the amplicons, the bacterial genera present in the populations can be identified and compared. In some scenarios, little to no difference may exist among the microbial populations being compared, in which case a prescreening method would be practical to determine which populations are suitable for further analysis by NGS. Denaturing gradient gel electrophoresis (DGGE) is cheaper than NGS, and the data comparing microbial populations can be viewed immediately after electrophoresis. DGGE follows essentially the same initial methodology as NGS, targeting and amplifying the 16S rRNA gene. However, instead of being sequenced, DGGE amplicons are analyzed by electrophoresis. Prescreening microbial populations with DGGE allows NGS methods to be used more efficiently. In this chapter, we outline the DGGE protocol targeting the same gene (16S rRNA) that would be targeted for NGS, in order to compare and determine differences in microbial populations from a wide range of ecosystems.
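DGGE lanes are commonly compared by scoring the presence or absence of bands. As a minimal sketch, assuming bands have already been called and matched to shared position bins (the sample names and band positions below are hypothetical), a Dice similarity between lanes could flag which pairs of populations differ enough to be worth sequencing. The 0.8 cutoff is an arbitrary illustrative value, not one taken from the chapter.

```python
def dice_similarity(bands_a: set[int], bands_b: set[int]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) on matched band positions."""
    if not bands_a and not bands_b:
        return 1.0
    shared = len(bands_a & bands_b)
    return 2 * shared / (len(bands_a) + len(bands_b))


def pairs_worth_sequencing(profiles: dict[str, set[int]], cutoff: float = 0.8):
    """Return sample pairs whose DGGE profiles are dissimilar (Dice < cutoff)."""
    names = sorted(profiles)
    selected = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dice_similarity(profiles[a], profiles[b]) < cutoff:
                selected.append((a, b))
    return selected


# Hypothetical band positions, already matched across lanes
profiles = {
    "soil_A": {3, 7, 12, 18, 25},
    "soil_B": {3, 7, 12, 18, 26},
    "gut_C": {2, 9, 14, 30},
}
print(pairs_worth_sequencing(profiles))  # [('gut_C', 'soil_A'), ('gut_C', 'soil_B')]
```

Here soil_A and soil_B share 4 of 5 bands each (Dice = 0.8), so under this illustrative cutoff they would not be prioritized for NGS.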
The First Isolation and Whole Genome Sequencing of Murray Valley Encephalitis Virus from Cerebrospinal Fluid of a Patient with Encephalitis.

PubMed

Russell, Jessica S; Caly, Leon; Kostecki, Renata; McGuinness, Sarah L; Carter, Glen; Bulach, Dieter; Seemann, Torsten; Stinear, Tim P; Baird, Rob; Catton, Mike; Druce, Julian

2018-06-11

Murray Valley Encephalitis virus (MVEV) is a mosquito-borne Flavivirus. Clinical presentation is rare but severe, with a case fatality rate of 15-30%. Here we report a case of MVEV from the cerebrospinal fluid (CSF) of a patient in the Northern Territory, Australia. Initial diagnosis was made using both MVEV-specific real-time and pan-Flavivirus conventional polymerase chain reaction (PCR), with confirmation by Sanger sequencing. Subsequent isolation, the first from CSF, was conducted in Vero cells, and the observed cytopathic effect was confirmed by an increasing viral titre in the real-time PCR. Isolation allowed full genome sequencing using the Scriptseq V2 RNASeq library preparation kit. A consensus genome for VIDRL-MVE was generated, and phylogenetic analysis identified it as Genotype 2. This is the first reported isolation and full genome sequencing of MVEV from CSF. It is also the first time Genotype 2 has been identified in humans. As such, this case has significant implications for public health surveillance, epidemiology, and the understanding of MVEV evolution.
[Molecular cloning and characterization in silico of phospholipase A(2) transcript isolated from Lachesis muta peruvian snake venom].

PubMed

Jimenez, Karim L; Zavaleta, Amparo I; Izaguirre, Victor; Yarleque, Armando; Inga, Rosio R

2010-01-01

The aim was to isolate and characterize in silico the phospholipase A(2) (PLA(2)) gene from Lachesis muta venom of the Peruvian Amazon. RT-PCR was performed on total RNA using specific primers, and the amplified DNA product was inserted into the pGEM vector for subsequent sequencing. Bioinformatic analysis identified an open reading frame of 414 nucleotides encoding 138 amino acids, including a 16-amino-acid signal peptide; the molecular weight and pI were 13.976 kDa and 5.66, respectively. The amino acid sequence, named Lm-PLA(2)-Peru, contains an aspartate at position 49; this residue, together with other conserved residues such as Tyr-28, Gly-30, Gly-32, His-48, Tyr-52 and Asp-99, is important for enzymatic activity. Comparison with amino acid sequence databanks showed similarity to PLA(2) from Lachesis stenophrys (93%) and to other snake venom PLA(2)s, and over 80% similarity to other sPLA(2)s from Viperidae venoms. A phylogenetic analysis showed that Lm-PLA(2)-Peru grouped with other acidic [Asp(49)] sPLA(2)s previously isolated from Bothriechis schlegelii venom, showing 89% nucleotide sequence identity. Finally, computer modeling indicated that the enzyme has the characteristic structure of group II sPLA(2)s, consisting of three α-helices, a β-wing, a short helix and a calcium-binding loop. This nucleotide sequence corresponds to the first PLA(2) gene transcript cloned from the venom of Lachesis muta, a snake of the Peruvian rainforest.
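An open reading frame like the one reported (414 nucleotides, i.e. 414 / 3 = 138 codons) can be located with a simple codon scan. This is a minimal sketch assuming the standard stop codons, the forward strand only, and an ORF defined as an ATG up to but not including the first in-frame stop; ORFs without a stop are ignored. The toy sequence is hypothetical and is not the Lm-PLA(2)-Peru transcript.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG-initiated ORF on the forward strand (stop codon excluded)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if i - start > len(best):
                    best = seq[start:i]
                start = None  # look for the next ATG after this stop
    return best


orf = longest_orf("CCATGAAATTTGGGTAGCC")
print(orf, len(orf), len(orf) // 3)  # ATGAAATTTGGG 12 4
```

With this convention an ORF of 414 nucleotides corresponds to 138 codons, matching the count given in the abstract.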
Characterization of a novel ADAM protease expressed by Pneumocystis carinii.

PubMed

Kennedy, Cassie C; Kottom, Theodore J; Limper, Andrew H

2009-08-01

Pneumocystis species are opportunistic fungal pathogens that cause severe pneumonia in immunocompromised hosts. Recent evidence has suggested that unidentified proteases are involved in Pneumocystis life cycle regulation. Proteolytically active ADAM (named for "a disintegrin and metalloprotease") family molecules have been identified in some fungal organisms, such as Aspergillus fumigatus and Schizosaccharomyces pombe, and some have been shown to participate in life cycle regulation. Accordingly, we sought to characterize ADAM-like molecules in the fungal opportunistic pathogen Pneumocystis carinii (PcADAM). After an in silico search of the P. carinii genome sequencing project identified a 329-bp partial sequence with homology to known ADAM proteins, the full-length PcADAM sequence was obtained by PCR extension cloning, yielding a final coding sequence of 1,650 bp. Sequence analysis detected the presence of a typical ADAM catalytic active site (HEXXHXXGXXHD). Expression of PcADAM over the Pneumocystis life cycle was analyzed by Northern blot. Southern and contour-clamped homogeneous electric field blot analysis demonstrated its presence in the P. carinii genome. Expression of PcADAM was observed to be increased in Pneumocystis cysts compared to trophic forms. The full-length gene was subsequently cloned and heterologously expressed in Saccharomyces cerevisiae. Purified PcADAMp protein was proteolytically active in casein zymography, requiring divalent zinc. Furthermore, native PcADAMp extracted directly from freshly isolated Pneumocystis organisms also exhibited protease activity. This is the first report of protease activity attributable to a specific, characterized protein in the clinically important opportunistic fungal pathogen Pneumocystis.
Human papillomavirus type 18 variant lineages in United States populations characterized by sequence analysis of LCR-E6, E2, and L1 regions.

PubMed

Arias-Pulido, Hugo; Peyton, Cheri L; Torrez-Martínez, Norah; Anderson, D Nelson; Wheeler, Cosette M

2005-07-20

While HPV 16 variant lineages have been well characterized, knowledge of HPV 18 variants is limited. In this study, HPV 18 nucleotide variations in the E2 hinge region were characterized by sequence analysis in 47 control and 51 tumor specimens. Fifty of these specimens were randomly selected for sequencing of an LCR-E6 segment, and 20 samples representative of LCR-E6 and E2 sequence variants were examined across the L1 region. A total of 2770 nucleotides per HPV 18 variant genome were considered in this study. HPV 18 variant nucleotides were linked among all gene segments analyzed and grouped into three main branches: Asian-American (AA), European (E), and African (Af). These three branches were equally distributed among controls and cases, including when stratified by Hispanic and non-Hispanic ethnicity. Among invasive cervical cancer cases, no significant differences in the three HPV variant branches were observed among ethnic groups or when stratified by histopathology (squamous vs. adenocarcinoma). The Af branch showed the greatest nucleotide variability when compared to the HPV 18 reference sequence and was more closely related to HPV 45 than either the AA or E branches. Our data also characterize nucleotide and amino acid variations in the L1 capsid gene among HPV 18 variants, which may be relevant to vaccine strategies and to subsequent studies of naturally occurring HPV 18 variants. Several novel HPV 18 nucleotide variations were identified in this study.
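Comparing a variant with the HPV 18 reference sequence amounts to listing the aligned positions where the two differ. A minimal sketch, assuming the sequences are already aligned to equal length with "-" marking gaps; the toy sequences are hypothetical and are not real HPV 18 segments. The p-distance shown is one simple summary of nucleotide variability, not necessarily the measure used in the study.

```python
def variant_positions(reference: str, isolate: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, isolate base) for every differing column, gaps included."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference.upper(), isolate.upper()), start=1)
        if ref != alt
    ]


def p_distance(reference: str, isolate: str) -> float:
    """Proportion of differing aligned sites, skipping columns that contain a gap."""
    pairs = [(r, a) for r, a in zip(reference.upper(), isolate.upper()) if "-" not in (r, a)]
    if not pairs:
        return 0.0
    return sum(r != a for r, a in pairs) / len(pairs)


ref = "ACGTACGTAC"
iso = "ACGAACGTGC"
print(variant_positions(ref, iso))  # [(4, 'T', 'A'), (9, 'A', 'G')]
print(p_distance(ref, iso))  # 0.2
```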
Relationships between functional genes in Lactobacillus delbrueckii ssp. bulgaricus isolates and phenotypic characteristics associated with fermentation time and flavor production in yogurt elucidated using multilocus sequence typing.

PubMed

Liu, Wenjun; Yu, Jie; Sun, Zhihong; Song, Yuqin; Wang, Xueni; Wang, Hongmei; Wuren, Tuoya; Zha, Musu; Menghe, Bilige; Heping, Zhang

2016-01-01

Lactobacillus delbrueckii ssp. bulgaricus (L. bulgaricus) is well known for its worldwide application in yogurt production. Flavor production and acid production are considered the most important characteristics in starter culture screening. To our knowledge, this is the first study to apply multilocus sequence typing of functional genes to predict the fermentation and flavor-producing characteristics of yogurt-producing bacteria. In the present study, phenotypic characteristics of 35 L. bulgaricus strains were quantified during the fermentation of milk to yogurt and during its subsequent storage; these included fermentation time, acidification rate, pH, titratable acidity, and flavor characteristics (acetaldehyde concentration). Furthermore, multilocus sequence typing analysis of 7 functional genes associated with fermentation time, acid production, and flavor formation was performed to elucidate the phylogeny and genetic evolution of the same L. bulgaricus isolates. The results showed that the strains differed significantly in fermentation time, acidification rate, and acetaldehyde production. Combining functional gene sequence analysis with phenotypic characteristics demonstrated that groups of strains established using genotype data were consistent with groups identified based on their phenotypic traits. This study has established an efficient and rapid molecular genotyping method to identify strains with good fermentation traits; it has the potential to replace time-consuming conventional methods based on direct measurement of phenotypic traits. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
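In multilocus sequence typing, each distinct sequence at a locus receives an allele number, and the combination of allele numbers across loci (the allelic profile) defines a sequence type (ST). A minimal sketch, assuming three hypothetical loci named geneA to geneC with toy sequences (the study's 7 functional genes are not named here), and numbering alleles in order of first appearance rather than from a curated scheme database.

```python
from collections import defaultdict


def assign_sequence_types(isolates: dict[str, dict[str, str]]) -> dict[str, int]:
    """Map isolate -> sequence type (ST) from per-locus DNA sequences."""
    loci = sorted(next(iter(isolates.values())))
    allele_ids: dict[str, dict[str, int]] = {locus: {} for locus in loci}
    st_ids: dict[tuple[int, ...], int] = {}
    result = {}
    for name in sorted(isolates):
        profile = []
        for locus in loci:
            seq = isolates[name][locus]
            alleles = allele_ids[locus]
            # A sequence not seen before at this locus gets the next allele number
            alleles.setdefault(seq, len(alleles) + 1)
            profile.append(alleles[seq])
        key = tuple(profile)
        # A new allelic profile gets the next ST number
        st_ids.setdefault(key, len(st_ids) + 1)
        result[name] = st_ids[key]
    return result


def group_by_st(sts: dict[str, int]) -> dict[int, list[str]]:
    """Collect isolates that share a sequence type."""
    groups = defaultdict(list)
    for name, st in sorted(sts.items()):
        groups[st].append(name)
    return dict(groups)


isolates = {
    "strain1": {"geneA": "ACGT", "geneB": "TTGA", "geneC": "CCAT"},
    "strain2": {"geneA": "ACGT", "geneB": "TTGA", "geneC": "CCAT"},
    "strain3": {"geneA": "ACGA", "geneB": "TTGA", "geneC": "CCAT"},
}
sts = assign_sequence_types(isolates)
print(sts)  # {'strain1': 1, 'strain2': 1, 'strain3': 2}
print(group_by_st(sts))  # {1: ['strain1', 'strain2'], 2: ['strain3']}
```

The genotype groups produced this way are what would then be compared against phenotype-based groupings.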
MAGIC database and interfaces: an integrated package for gene discovery and expression.

PubMed

Cordonnier-Pratt, Marie-Michèle; Liang, Chun; Wang, Haiming; Kolychev, Dmitri S; Sun, Feng; Freeman, Robert; Sullivan, Robert; Pratt, Lee H

2004-01-01

The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.
Glaciotectonic deformation and reinterpretation of the Worth Point stratigraphic sequence: Banks Island, NT, Canada

NASA Astrophysics Data System (ADS)

Vaughan, Jessica M.; England, John H.; Evans, David J. A.

2014-05-01

Hill-hole pairs, comprising an ice-pushed hill and associated source depression, cluster in a belt along the west coast of Banks Island, NT. Ongoing coastal erosion at Worth Point, southwest Banks Island, has exposed a section (6 km long and ~30 m high) through an ice-pushed hill that was transported ~2 km from a corresponding source depression to the southeast. The exposed stratigraphic sequence is polydeformed and comprises folded and faulted rafts of Early Cretaceous and Late Tertiary bedrock, a prominent organic raft, Quaternary glacial sediments, and buried glacial ice. Three distinct structural domains can be identified within the stratigraphic sequence that represent proximal to distal deformation in an ice-marginal setting. Complex thrust sequences, interfering fold-sets, brecciated bedrock and widespread shear structures superimposed on this ice-marginally deformed sequence record subsequent deformation in a subglacial shear zone. Analysis of cross-cutting relationships within the stratigraphic sequence, combined with OSL dating, indicates that the Worth Point hill-hole pair was deformed during two separate glaciotectonic events. Firstly, ice sheet advance constructed the hill-hole pair and glaciotectonized the strata ice-marginally, producing a proximal to distal deformation sequence. A glacioisostatically forced marine transgression resulted in extensive reworking of the strata and the deposition of a glaciomarine diamict. A readvance during this initial stage redeformed the strata in a subglacial shear zone, overprinting complex deformation structures and depositing a glaciotectonite ~20 m thick. Outwash channels that incise the subglacially deformed strata record a deglacial marine regression, whereas aggradation of glaciofluvial sand and gravel infilling the channels records a subsequent marine transgression.
Secondly, a later, largely non-erosive ice margin overrode Worth Point, deforming only the most surficial units in the section and depositing a capping till. The investigation of the Worth Point stratigraphic sequence provides the first detailed description of the internal architecture of a polydeformed hill-hole pair, and as such provides an insight into the formation and evolution of an enigmatic landform. Notably, the stratigraphic sequence documents ice-marginal and subglacial glaciotectonics in permafrost terrain, as well as regional glacial and relative sea level histories. The reinterpreted stratigraphy fundamentally rejects the long-established paleoenvironmental history of Worth Point that assumed a simple ‘layer-cake’ stratigraphy including the type-site for an organically rich, preglacial interval (Worth Point Fm).
Bioinformatic Workflows for Generating Complete Plastid Genome Sequences-An Example from Cabomba (Cabombaceae) in the Context of the Phylogenomic Analysis of the Water-Lily Clade.

PubMed

Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas

2018-06-21

The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at a breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking three complete plastid genomes of the aquatic plant genus Cabomba, as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings with a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.
Identification of sequence-related amplified polymorphism markers linked to the red leaf trait in ornamental kale (Brassica oleracea L. var. acephala).

PubMed

Wang, Y S; Liu, Z Y; Li, Y F; Zhang, Y; Yang, X F; Feng, H

2013-04-02

Decorative, diverse leaf color is an important agronomic trait that affects the market value of ornamental kale. In the present study, genetic analysis showed that a single dominant gene, Re (red leaf), determines the red leaf trait in ornamental kale. An F2 population of 500 individuals from a cross between the red leaf double-haploid line 'D05' and the white leaf double-haploid line 'D10' was analyzed for the red leaf trait. By combining bulked segregant analysis and sequence-related amplified polymorphism technology, we identified 3 markers linked to the Re/re locus. A genetic map of the Re locus was constructed using these sequence-related amplified polymorphism markers. Two of the markers, Me8Em4 and Me8Em17, were located on one side of Re/re at distances of 2.2 and 6.4 cM, whereas the third marker, Me9Em11, was located on the other side of Re/re at a distance of 3.7 cM. These markers could be helpful for the subsequent cloning of the red leaf trait gene and for marker-assisted selection in ornamental kale breeding programs.
BBMerge – Accurate paired shotgun read merging via overlap

DOE PAGES

Bushnell, Brian; Rood, Jonathan; Singer, Esther

2017-10-26

Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets, and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
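The general idea of overlap-based merging can be sketched briefly: reverse-complement read 2, test each candidate overlap length against the tail of read 1, and accept the overlap with the lowest mismatch rate. This is a minimal illustration of the technique under stated assumptions, not BBMerge's actual algorithm (which also uses base qualities and, optionally, k-mer-based gap assembly). The parameters min_overlap and max_mismatch_frac are hypothetical, and the sketch only handles inserts longer than each read (no adapter read-through).

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge a read pair via its best-scoring overlap, or return None if none qualifies."""
    r1 = read1.upper()
    r2 = reverse_complement(read2)  # read 2 now on the same strand as read 1
    best = None  # (mismatch fraction, overlap length); ties favor longer overlaps
    for olap in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        tail = r1[len(r1) - olap:]
        head = r2[:olap]
        frac = sum(a != b for a, b in zip(tail, head)) / olap
        if frac <= max_mismatch_frac and (best is None or frac < best[0]):
            best = (frac, olap)
    if best is None:
        return None
    # Overlapping bases are taken from read 1; a real tool would reconcile them using qualities
    return r1 + r2[best[1]:]


# Toy pair from the fragment AAACCCGGGTTTACGTACGT (8-base overlap)
print(merge_pair("AAACCCGGGTTTAC", "ACGTACGTAAACCC", min_overlap=5))  # AAACCCGGGTTTACGTACGT
```

Returning None when no overlap passes the threshold mirrors the usual behavior of leaving such pairs unmerged.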

Use of life course work-family profiles to predict mortality risk among US women.

PubMed

Sabbath, Erika L; Guevara, Ivan Mejía; Glymour, M Maria; Berkman, Lisa F

2015-04-01

We examined relationships between US women's exposure to midlife work-family demands and subsequent mortality risk. We used data from women born 1935 to 1956 in the Health and Retirement Study to calculate employment, marital, and parenthood statuses for each age between 16 and 50 years. We used sequence analysis to identify 7 prototypical work-family trajectories. We calculated age-standardized mortality rates and hazard ratios (HRs) for mortality associated with work-family sequences, with adjustment for covariates and potentially explanatory later-life factors. Married women staying home with children briefly before reentering the workforce had the lowest mortality rates. In comparison, after adjustment for age, race/ethnicity, and education, HRs for mortality were 2.14 (95% confidence interval [CI] = 1.58, 2.90) among single nonworking mothers, 1.48 (95% CI = 1.06, 1.98) among single working mothers, and 1.36 (95% CI = 1.02, 1.80) among married nonworking mothers. Adjustment for later-life behavioral and economic factors partially attenuated risks. Sequence analysis is a promising exposure assessment tool for life course research. This method permitted identification of certain lifetime work-family profiles associated with mortality risk before age 75 years.
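In life course research, sequence analysis usually encodes each person's status at each age as a string of states, computes pairwise dissimilarities (classically by optimal matching, OM), and then clusters the sequences into prototypical trajectories. The abstract does not state which dissimilarity or costs were used, so this is a minimal sketch assuming OM with a constant substitution cost of 2 and an indel cost of 1; the one-letter state codes and the 10-year sequences are hypothetical.

```python
def om_distance(a: str, b: str, sub_cost: float = 2.0, indel_cost: float = 1.0) -> float:
    """Optimal matching distance: minimum total cost of insertions, deletions and substitutions."""
    n, m = len(a), len(b)
    prev = [j * indel_cost for j in range(m + 1)]  # row for the empty prefix of a
    for i in range(1, n + 1):
        curr = [i * indel_cost] + [0.0] * m
        for j in range(1, m + 1):
            subst = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else sub_cost)
            curr[j] = min(subst, prev[j] + indel_cost, curr[j - 1] + indel_cost)
        prev = curr
    return prev[m]


# Hypothetical yearly states: S = single, not working; W = working;
# M = married, working; H = married, at home with children
seq1 = "SSWWWMMHHM"
seq2 = "SSWWMMMHHM"
print(om_distance(seq1, seq2))  # 2.0
```

A full analysis would fill a pairwise distance matrix with this function and pass it to a clustering method to obtain trajectory groups.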
Two-color, 30 second microwave-accelerated Metal-Enhanced Fluorescence DNA assays: a new Rapid Catch and Signal (RCS) technology.

PubMed

Dragan, Anatoliy I; Golberg, Karina; Elbaz, Amit; Marks, Robert; Zhang, Yongxia; Geddes, Chris D

2011-03-07

For analyses of DNA fragment sequences in solution, we introduce a 2-color DNA assay that combines the Metal-Enhanced Fluorescence (MEF) effect with microwave-accelerated DNA hybridization. The assay is based on a new "Catch and Signal" technology, i.e. the simultaneous specific recognition of two target DNA sequences in one well by complementary anchor-ssDNAs attached to silver island films (SiFs). It is shown that fluorescent labels (Alexa 488 and Alexa 594), covalently attached to ssDNA fragments, act as biosensor recognition probes and show a strong response upon DNA hybridization, which places the fluorophores in close proximity to the silver NPs, an arrangement ideal for MEF. Consequently, the emission increases dramatically, while the excited-state lifetime decreases. It is also shown that 30 s of microwave irradiation of wells containing DNA molecules speeds up the highly selective hybridization of DNA fragments considerably (~1000-fold) at ambient temperature. The 2-color "Catch and Signal" DNA assay platform can radically expedite quantitative analysis of genomic DNA sequences, creating a simple and fast biomedical platform for nucleic acid analysis. Copyright © 2010 Elsevier B.V. All rights reserved.
BBMerge – Accurate paired shotgun read merging via overlap

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bushnell, Brian; Rood, Jonathan; Singer, Esther

Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets, and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
Artificial Immune System Approach for Airborne Vehicle Maneuvering

NASA Technical Reports Server (NTRS)

Kaneshige, John T. (Inventor); Krishnakumar, Kalmanje S. (Inventor)

2014-01-01

A method and system for control of a first aircraft relative to a second aircraft. A desired location and desired orientation are estimated for the first aircraft, relative to the second aircraft, at a subsequent time t = t2 following the present time t = t1, where the second aircraft either continues its present velocity during the interval t1 ≤ t ≤ t2 or takes evasive action. Action command sequences are examined, and an optimal sequence is chosen to bring the first aircraft to the desired location and desired orientation relative to the second aircraft at time t = t2. The method applies to control of combat aircraft and/or of aircraft in a congested airspace.
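The steps in the abstract (predict where the first aircraft should be at t2, examine candidate action-command sequences, choose the best) can be illustrated with a toy planner. The artificial-immune-system search itself is not described here, so this sketch swaps in plain exhaustive enumeration; the three-command set, 2-D point-mass kinematics, constant-velocity prediction, and cost function are all hypothetical assumptions, not the patented method.

```python
import itertools
import math

# Hypothetical command set: heading change (radians) applied at the start of each step
COMMANDS = {"left": math.radians(15), "straight": 0.0, "right": -math.radians(15)}


def desired_state(other_xy, other_vel, horizon, offset_xy, desired_heading):
    """Target at t2: second aircraft extrapolated at constant velocity, plus a desired offset."""
    ox, oy = other_xy
    vx, vy = other_vel
    return (ox + vx * horizon + offset_xy[0], oy + vy * horizon + offset_xy[1], desired_heading)


def simulate(state, commands, speed=1.0, dt=1.0):
    """Point-mass kinematics: apply each heading change, then fly straight for dt."""
    x, y, heading = state
    for cmd in commands:
        heading += COMMANDS[cmd]
        x += speed * dt * math.cos(heading)
        y += speed * dt * math.sin(heading)
    return x, y, heading


def cost(final, target, heading_weight=1.0):
    """Position error plus weighted heading error (wrapped to [-pi, pi])."""
    x, y, heading = final
    tx, ty, th = target
    dh = math.atan2(math.sin(heading - th), math.cos(heading - th))
    return math.hypot(x - tx, y - ty) + heading_weight * abs(dh)


def best_sequence(state, target, steps=4):
    """Score every command sequence of the given length and return the cheapest."""
    candidates = itertools.product(COMMANDS, repeat=steps)
    return min(candidates, key=lambda seq: cost(simulate(state, seq), target))


# Second aircraft at (2, 0) moving at (1, 0); hold a position 2 units behind it at t2 = t1 + 4
target = desired_state((2.0, 0.0), (1.0, 0.0), horizon=4.0, offset_xy=(-2.0, 0.0), desired_heading=0.0)
print(target)  # (4.0, 0.0, 0.0)
print(best_sequence((0.0, 0.0, 0.0), target))  # ('straight', 'straight', 'straight', 'straight')
```

Exhaustive enumeration grows as 3^steps, which is one reason a guided search (such as the immune-system approach named in the title) would be used for longer horizons.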
Melodic Priming of Motor Sequence Performance: The Role of the Dorsal Premotor Cortex.

PubMed

Stephan, Marianne A; Brown, Rachel; Lega, Carlotta; Penhune, Virginia

2016-01-01

The purpose of this study was to determine whether exposure to specific auditory sequences leads to the induction of new motor memories and to investigate the role of the dorsal premotor cortex (dPMC) in this crossmodal learning process. Fifty-two young healthy non-musicians were familiarized with the sound-to-key-press mapping on a computer keyboard and tested on their baseline motor performance. Each participant subsequently received either continuous theta burst stimulation (cTBS) or sham stimulation over the dPMC and was then asked to remember a 12-note melody without moving. For half of the participants, the contour of the memorized melody was congruent with a subsequently performed, but never practiced, finger movement sequence (Congruent group). For the other half, the memorized melody was incongruent with the subsequent finger movement sequence (Incongruent group). Hearing a congruent melody led to significantly faster performance of a motor sequence immediately thereafter compared to hearing an incongruent melody. In addition, cTBS sped up motor performance in both groups, possibly by relieving motor consolidation from interference by the declarative melody memorization task. Our findings substantiate recent evidence that exposure to a movement-related tone sequence can induce specific, crossmodal encoding of a movement sequence representation. They further suggest that cTBS over the dPMC may enhance early offline procedural motor skill consolidation in cognitive states where motor consolidation would normally be disturbed by concurrent declarative memory processes. These findings may contribute to a better understanding of auditory-motor system interactions and have implications for the development of new motor rehabilitation approaches using sound and non-invasive brain stimulation as neuromodulatory tools.
Mapping the pericentric heterochromatin by comparative genomic hybridization analysis and chromosome deletions in Drosophila melanogaster

PubMed Central

He, Bing; Caudy, Amy; Parsons, Lance; Rosebrock, Adam; Pane, Attilio; Raj, Sandeep; Wieschaus, Eric

2012-01-01

Heterochromatin represents a significant portion of eukaryotic genomes and has essential structural and regulatory functions. Its molecular organization is largely unknown due to difficulties in sequencing through and assembling repetitive sequences enriched in the heterochromatin. Here we developed a novel strategy using chromosomal rearrangements and embryonic phenotypes to position unmapped Drosophila melanogaster heterochromatic sequence to specific chromosomal regions. By excluding sequences that can be mapped to the assembled euchromatic arms, we identified sequences that are specific to heterochromatin and used them to design heterochromatin specific probes (“H-probes”) for microarray. By comparative genomic hybridization (CGH) analyses of embryos deficient for each chromosome or chromosome arm, we were able to map most of our H-probes to specific chromosome arms. We also positioned sequences mapped to the second and X chromosomes to finer intervals by analyzing smaller deletions with breakpoints in heterochromatin. Using this approach, we were able to map >40% (13.9 Mb) of the previously unmapped heterochromatin sequences assembled by the whole-genome sequencing effort on arm U and arm Uextra to specific locations. We also identified and mapped 110 kb of novel heterochromatic sequences. Subsequent analyses revealed that sequences located within different heterochromatic regions have distinct properties, such as sequence composition, degree of repetitiveness, and level of underreplication in polytenized tissues. Surprisingly, although heterochromatin is generally considered to be transcriptionally silent, we detected region-specific temporal patterns of transcription in heterochromatin during oogenesis and early embryonic development. Our study provides a useful approach to elucidate the molecular organization and function of heterochromatin and reveals region-specific variation of heterochromatin. PMID:22745230
Principles of Quantitative MR Imaging with Illustrated Review of Applicable Modular Pulse Diagrams.

PubMed

Mills, Andrew F; Sakai, Osamu; Anderson, Stephan W; Jara, Hernan

2017-01-01

Continued improvements in diagnostic accuracy using magnetic resonance (MR) imaging will require development of methods for tissue analysis that complement traditional qualitative MR imaging studies. Quantitative MR imaging is based on measurement and interpretation of tissue-specific parameters independent of experimental design, compared with qualitative MR imaging, which relies on interpretation of tissue contrast that results from experimental pulse sequence parameters. Quantitative MR imaging represents a natural next step in the evolution of MR imaging practice, since quantitative MR imaging data can be acquired using currently available qualitative imaging pulse sequences without modifications to imaging equipment. The article presents a review of the basic physical concepts used in MR imaging and how quantitative MR imaging is distinct from qualitative MR imaging. Subsequently, the article reviews the hierarchical organization of major applicable pulse sequences used in this article, with the sequences organized into conventional, hybrid, and multispectral sequences capable of calculating the main tissue parameters of T1, T2, and proton density. While this new concept offers the potential for improved diagnostic accuracy and workflow, awareness of this extension to qualitative imaging is generally low. This article reviews the basic physical concepts in MR imaging, describes commonly measured tissue parameters in quantitative MR imaging, and presents the major available pulse sequences used for quantitative MR imaging, with a focus on the hierarchical organization of these sequences. © RSNA, 2017.
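As a rough illustration of what "calculating" a tissue parameter can mean, the sketch below estimates T2 from a multi-echo acquisition by assuming mono-exponential decay, S(TE) = S0·exp(-TE/T2), and fitting a straight line to ln S versus TE. This is an assumption-laden sketch, not one of the pulse sequences or estimation methods reviewed in the article; the helper name `fit_t2` and the synthetic echo times and signal values are hypothetical, and real data would also need noise-floor handling.

```python
import numpy as np

def fit_t2(te_ms, signal):
    """Hypothetical helper: log-linear fit of S(TE) = S0 * exp(-TE / T2).

    Returns (T2 in ms, S0). Assumes mono-exponential decay and strictly
    positive signal values (no noise-floor correction).
    """
    te = np.asarray(te_ms, dtype=float)
    s = np.asarray(signal, dtype=float)
    # ln S = ln S0 - TE / T2, so the fitted slope equals -1 / T2.
    slope, intercept = np.polyfit(te, np.log(s), 1)
    return float(-1.0 / slope), float(np.exp(intercept))

# Synthetic, made-up example: T2 = 60 ms, S0 = 1000.
te = np.array([10.0, 20.0, 40.0, 80.0])
signal = 1000.0 * np.exp(-te / 60.0)
t2, s0 = fit_t2(te, signal)
print(f"T2 ~ {t2:.1f} ms, S0 ~ {s0:.0f}")
```

A log-linear fit is the simplest choice; with noisy magnitude data a nonlinear least-squares fit is usually preferred.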
Molecular characterization of an Akabane virus isolate from West Java, Indonesia

PubMed Central

Purnomo Edi, Suryo; Ibrahim, Afif; Sukoco, Rinto; Bunali, Lukman; Taguchi, Masaji; Kato, Tomoko; Yanase, Tohru; Shirafuji, Hiroaki

2017-01-01

We isolated an arbovirus from bovine blood in Indonesia. The arbovirus was obtained in February 2014 from the plasma of a cow in West Java showing no clinical symptoms, and was identified as Akabane virus (AKAV) by AKAV-specific RT-PCR and subsequent sequence analysis. Phylogenetic analysis based on the partial S segment indicated that the AKAV isolate, WJ-1SA/P/2014, was most closely related to two isolates from Israel and Turkey, reported in 2001 and 2015, respectively, and that the WJ-1SA/P/2014 isolate belongs to AKAV genogroup Ib. This is the first isolation of AKAV from Indonesia. PMID:28302930
Displacement measurement with nanoscale resolution using a coded micro-mark and digital image correlation

NASA Astrophysics Data System (ADS)

Huang, Wei; Ma, Chengfu; Chen, Yuhang

2014-12-01

A simple and reliable method for displacement measurement with nanoscale resolution is proposed. The measurement combines conventional optical microscopy imaging of a specially coded, nonperiodic microstructure, namely a two-dimensional zero-reference mark (2-D ZRM), with subsequent correlation analysis of the resulting image sequence. The autocorrelation peak contrast of the ZRM code is maximized with well-developed artificial intelligence algorithms, which enables robust and accurate displacement determination. To improve the resolution, subpixel image correlation analysis is employed. Finally, we experimentally demonstrate the quasi-static and dynamic displacement characterization capability of a micro 2-D ZRM.
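The correlation-plus-subpixel idea can be illustrated in one dimension. The sketch below is an assumption, not the authors' 2-D ZRM pipeline: it estimates the shift between a reference intensity profile and a displaced one by FFT-based circular cross-correlation, then refines the integer correlation peak to subpixel precision with a three-point parabolic fit. The function name `subpixel_shift` and the Gaussian test profile are hypothetical.

```python
import numpy as np

def subpixel_shift(ref, cur):
    """Hypothetical sketch: estimate d such that cur[x] ~ ref[x - d].

    Uses circular cross-correlation via FFT on two equal-length 1-D
    profiles, then a 3-point parabolic fit around the integer peak.
    """
    ref = np.asarray(ref, dtype=float)
    cur = np.asarray(cur, dtype=float)
    # Remove the mean so a constant background does not dominate the peak.
    ref = ref - ref.mean()
    cur = cur - cur.mean()
    n = ref.size
    # corr[k] = sum_x cur[(x + k) % n] * ref[x]
    corr = np.fft.ifft(np.fft.fft(cur) * np.conj(np.fft.fft(ref))).real
    k = int(np.argmax(corr))
    y0, y1, y2 = corr[(k - 1) % n], corr[k], corr[(k + 1) % n]
    denom = y0 - 2.0 * y1 + y2
    # Vertex of the parabola through (-1, y0), (0, y1), (1, y2).
    frac = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
    # Map the circular index to a signed lag.
    lag = k if k <= n // 2 else k - n
    return float(lag + frac)

# Synthetic profiles: a Gaussian "mark" displaced by 3.3 pixels.
x = np.arange(256)
ref = np.exp(-((x - 128.0) ** 2) / (2 * 10.0**2))
cur = np.exp(-((x - 128.0 - 3.3) ** 2) / (2 * 10.0**2))
print(subpixel_shift(ref, cur))
```

The parabolic fit is only one common subpixel estimator; the paper does not say which interpolation it uses.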
[Differentiation of geographic biovariants of smallpox virus by PCR].

PubMed

Babkin, I V; Babkina, I N

2010-01-01

Comparative analysis of the amino acid and nucleotide sequences of ORFs located in extended segments of the terminal variable regions of the variola virus (VARV) genome identified a promising locus for genotyping the virus according to geographic origin: ORF O1L of VARV. PCR primers were designed to amplify a fragment of this ORF, which makes it possible to distinguish the South America-Western Africa genotype from other VARV strains. Subsequent RFLP analysis reliably differentiated Asian strains from African strains (except Western African isolates). The method was tested using 16 VARV strains from various geographic regions. The developed approach is simple, fast, and reliable.
Phenotypic and genotypic characterisation of drug-resistant Plasmodium vivax

PubMed Central

Price, Ric N.; Auburn, Sarah; Marfurt, Jutta; Cheng, Qin

2015-01-01

In this review we present recent developments in the analysis of Plasmodium vivax clinical trials and ex vivo drug-susceptibility assays, as well as approaches currently being used to identify molecular markers of drug resistance. Clinical trials incorporating the measurement of in vivo drug concentrations and parasite clearance times are needed to detect early signs of resistance. Analysis of P. vivax growth dynamics ex vivo has defined the criteria for acceptable assay thresholds for drug susceptibility testing and for their subsequent interpretation. Genotyping and next-generation sequencing studies in P. vivax field isolates are set to transform our understanding of the molecular mechanisms of drug resistance. PMID:23044287
Non-coding RNAs in virology: an RNA genomics approach.

PubMed

Isaac, Christopher; Patel, Trushar R; Zovoilis, Athanasios

2018-04-01

Advances in sequencing technologies and bioinformatic analysis techniques have greatly improved our understanding of various classes of RNAs and their functions. Despite not coding for proteins, non-coding RNAs (ncRNAs) are emerging as essential biomolecules fundamental for cellular functions and cell survival. Interestingly, ncRNAs produced by viruses not only control the expression of viral genes, but also influence host cell regulation and circumvent host innate immune response. Correspondingly, ncRNAs produced by the host genome can play a key role in host-virus interactions. In this article, we will first discuss a number of types of viral and mammalian ncRNAs associated with viral infections. Subsequently, we also describe the new possibilities and opportunities that RNA genomics and next-generation sequencing technologies provide for studying ncRNAs in virology.
Genetic variability in isolates of Chromobacterium violaceum from pulmonary secretion, water, and soil.

PubMed

Santini, A C; Magalhães, J T; Cascardo, J C M; Corrêa, R X

2016-04-28

Chromobacterium violaceum is a free-living Gram-negative bacillus usually found in the water and soil of tropical regions, which causes infections in humans. Chromobacteriosis is characterized by rapid dissemination and high mortality. The aim of this study was to detect genetic variability among the C. violaceum type strain ATCC 12472, seven environmental isolates, and one isolate from the pulmonary secretion of a chromobacteriosis patient from Ilhéus, Bahia. The molecular characterization of all samples was performed by polymerase chain reaction (PCR) sequencing and 16S rDNA analysis. Primers specific for two ATCC 12472 pathogenicity genes, hilA and yscD, as well as random amplified polymorphic DNA (RAPD) primers, were used for PCR amplification and comparative sequencing of the products. For a more specific approach, the PCR products of 16S rDNA were digested with restriction enzymes. Seven of the samples, including type strain ATCC 12472, were amplified by the hilA primers, and these products were subsequently sequenced. Gene yscD was amplified only in type strain ATCC 12472. MspI and AluI digestion revealed 16S rDNA polymorphisms. These data allowed the generation of a dendrogram for each analysis. The C. violaceum isolates show variability in random genomic regions, as demonstrated by RAPD. These isolates also show variability in pathogenicity genes, as demonstrated by sequencing and restriction enzyme digestion.
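The restriction-digestion step rests on simple sequence logic: each enzyme cuts at a fixed recognition site, so a sequence difference that creates or destroys a site changes the fragment-length pattern. The sketch below performs an in silico digest using the standard recognition sites of MspI (C^CGG) and AluI (AG^CT). The helper name `digest` and the toy sequence are hypothetical; real RFLP analysis reads fragment sizes from a gel, not from a sequence.

```python
import re

# Recognition site and cut offset within the site (top strand).
# Both sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "MspI": ("CCGG", 1),  # C^CGG
    "AluI": ("AGCT", 2),  # AG^CT
}

def digest(seq: str, enzyme: str) -> list[int]:
    """Hypothetical helper: fragment lengths after a complete digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # Lookahead so that overlapping occurrences are also found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

toy = "AAACCGGTTTAGCTAA"  # made-up 16-nt sequence
print(digest(toy, "MspI"), digest(toy, "AluI"))
```

Comparing these fragment-length lists across isolates is, in essence, what an RFLP gel does physically.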
Corona cell RNA sequencing from individual oocytes revealed transcripts and pathways linked to euploid oocyte competence and live birth.

PubMed

Parks, Jason C; Patton, Alyssa L; McCallie, Blair R; Griffin, Darren K; Schoolcraft, William B; Katz-Jaffe, Mandy G

2016-05-01

Corona cells surround the oocyte and maintain a close relationship with it through transzonal processes and gap junctions, and may be used to assess oocyte competence. In this study, the corona cell transcriptome of individual cumulus oocyte complexes (COCs) was investigated. Corona cells were isolated from COCs that developed into euploid blastocysts, which were transferred in a subsequent frozen embryo transfer. Ten corona cell samples underwent RNA sequencing to generate unique gene expression profiles. Live birth was compared with negative implantation after the transfer of a euploid blastocyst using bioinformatics and statistical analysis. Individual corona cell samples produced a mean of 21.2 million sequence reads, and 307 transcripts were differentially expressed (P < 0.05; fold change ≥ 2). Enriched pathway analysis showed that Wnt signalling, mitogen-activated protein kinase signalling, focal adhesion, and the tricarboxylic acid cycle were affected by implantation outcome. The Wnt/beta-catenin signalling pathway, including the genes APC, AXIN, and GSK3B, was independently validated by real-time quantitative reverse transcription PCR. Individual corona cell transcriptomes were successfully generated using RNA sequencing. Key genes and signalling pathways were identified in association with implantation outcome after the transfer of a euploid blastocyst in a frozen embryo transfer. These data could provide novel biomarkers for the non-invasive assessment of embryo viability. Copyright © 2016 Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.
Evolution of HBV S-gene in the backdrop of HDV co-infection.

PubMed

Baig, Samina; Abidi, Syed H; Azam, Zahid; Majid, Shahid; Khan, Saeed; Khanani, Muhammad R; Ali, Syed

2018-04-16

HBV-HDV co-infected people have a higher chance of developing cirrhosis, fulminant hepatitis, and hepatocellular carcinoma (HCC) than those infected only with HBV. The present study was conducted to investigate HBV genotypes and phylogeny among HBV mono-infected and HBV-HDV co-infected patients, and to analyze mutations in the HBV surface gene in mono-infected and co-infected patients. A total of 100 blood samples (50 co-infected with HBV and HDV, and 50 mono-infected with HBV only) were collected for this study. HBV DNA was extracted from patient sera, and the partial surface antigen gene was amplified from the HBV genome using polymerase chain reaction. The HBV S gene was sequenced from 49 mono-infected and 36 co-infected patients and analyzed to identify HBV genotypes and phylogenetic patterns. Subsequently, HBV S amino acid sequences were analyzed for mutational differences between sequences from mono- and co-infected patients. HBV genotype D was predominant in both mono-infected and co-infected patients. Phylogenetic analysis showed the divergence of HBV sequences from mono- and co-infected patients into two distinct clusters. HBV S gene mutation analysis revealed certain mutations in HBV-HDV co-infected subjects that were distinct from those found in mono-infected patients. This might indicate the evolution of the HBV S gene under selection pressures generated by HDV co-infection. © 2018 Wiley Periodicals, Inc.
Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

PubMed Central

Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

2016-01-01

Targeted sequencing of PCR amplicons generated from bisulfite-deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single-CpG resolution and to perform subsequent multi-target, multi-sample comparisons. Currently, no platform-specific protocol, support, or analysis solution is provided for targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation patterns present in a sample. The obtained profiles are then summarized and compared between samples. To assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different bisulfite-sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation levels of the corresponding sites on an Illumina 450k methylation chip. The average Pearson correlation coefficient of 0.91 confirms the sequencing results against one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage.
TABSAT is freely available under a GNU General Public License version 3.0 (GPLv3) at https://github.com/tadkeys/tabsat/ and http://demo.platomics.com/. PMID:27467908
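Two calculations underlie the comparison described above. In bisulfite sequencing, unmethylated cytosines are converted and read as T while methylated cytosines remain C, so a per-CpG methylation level can be estimated as C/(C+T) from read counts; agreement with array values can then be summarized by a Pearson correlation coefficient. The sketch below shows both under those assumptions. It is not TABSAT code, and the function names, read counts, and array values are hypothetical.

```python
from math import nan, sqrt

def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction of reads reporting C (methylated) at one CpG; NaN if uncovered."""
    total = c_reads + t_reads
    return c_reads / total if total else nan

def pearson(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up (C, T) read counts for four CpG sites and made-up array beta values.
counts = [(90, 10), (50, 50), (20, 80), (5, 95)]
seq_levels = [methylation_level(c, t) for c, t in counts]
array_betas = [0.85, 0.55, 0.25, 0.10]
print(seq_levels, round(pearson(seq_levels, array_betas), 3))
```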
Study of time-lapse processing for dynamic hydrologic conditions. [electronic satellite image analysis console for Earth Resources Technology Satellites imagery]

NASA Technical Reports Server (NTRS)

Serebreny, S. M.; Evans, W. E.; Wiegman, E. J.

1974-01-01

The usefulness of dynamic display techniques in exploiting the repetitive nature of ERTS imagery was investigated. A specially designed Electronic Satellite Image Analysis Console (ESIAC) was developed and employed to process data for seven ERTS principal investigators studying dynamic hydrological conditions for diverse applications. These applications include measurement of snowfield extent and of sediment plumes from estuary discharge, Playa Lake inventory, and monitoring of phreatophyte and other vegetation changes. The ESIAC provides facilities for storing registered image sequences in a magnetic video disc memory for subsequent recall, enhancement, and animated display in monochrome or color. The most distinctive feature of the system is the capability to time-lapse the imagery and analytic displays of the imagery. Data products included quantitative measurements of distances and areas, binary thematic maps based on monospectral or multispectral decisions, radiance profiles, and movie loops. Applications of animation for uses other than creating time-lapse sequences are identified. Input to the ESIAC can be either digital or via photographic transparencies.
Analyses of pig genomes provide insight into porcine demography and evolution

PubMed Central

Groenen, Martien A. M.; Archibald, Alan L.; Uenishi, Hirohide; Tuggle, Christopher K.; Takeuchi, Yasuhiro; Rothschild, Max F.; Rogel-Gaillard, Claire; Park, Chankyu; Milan, Denis; Megens, Hendrik-Jan; Li, Shengting; Larkin, Denis M.; Kim, Heebal; Frantz, Laurent A. F.; Caccamo, Mario; Ahn, Hyeonju; Aken, Bronwen L.; Anselmo, Anna; Anthon, Christian; Auvil, Loretta; Badaoui, Bouabid; Beattie, Craig W.; Bendixen, Christian; Berman, Daniel; Blecha, Frank; Blomberg, Jonas; Bolund, Lars; Bosse, Mirte; Botti, Sara; Bujie, Zhan; Bystrom, Megan; Capitanu, Boris; Silva, Denise Carvalho; Chardon, Patrick; Chen, Celine; Cheng, Ryan; Choi, Sang-Haeng; Chow, William; Clark, Richard C.; Clee, Christopher; Crooijmans, Richard P. M. A.; Dawson, Harry D.; Dehais, Patrice; De Sapio, Fioravante; Dibbits, Bert; Drou, Nizar; Du, Zhi-Qiang; Eversole, Kellye; Fadista, João; Fairley, Susan; Faraut, Thomas; Faulkner, Geoffrey J.; Fowler, Katie E.; Fredholm, Merete; Fritz, Eric; Gilbert, James G. R.; Giuffra, Elisabetta; Gorodkin, Jan; Griffin, Darren K.; Harrow, Jennifer L.; Hayward, Alexander; Howe, Kerstin; Hu, Zhi-Liang; Humphray, Sean J.; Hunt, Toby; Hornshøj, Henrik; Jeon, Jin-Tae; Jern, Patric; Jones, Matthew; Jurka, Jerzy; Kanamori, Hiroyuki; Kapetanovic, Ronan; Kim, Jaebum; Kim, Jae-Hwan; Kim, Kyu-Won; Kim, Tae-Hun; Larson, Greger; Lee, Kyooyeol; Lee, Kyung-Tai; Leggett, Richard; Lewin, Harris A.; Li, Yingrui; Liu, Wansheng; Loveland, Jane E.; Lu, Yao; Lunney, Joan K.; Ma, Jian; Madsen, Ole; Mann, Katherine; Matthews, Lucy; McLaren, Stuart; Morozumi, Takeya; Murtaugh, Michael P.; Narayan, Jitendra; Nguyen, Dinh Truong; Ni, Peixiang; Oh, Song-Jung; Onteru, Suneel; Panitz, Frank; Park, Eung-Woo; Park, Hong-Seog; Pascal, Geraldine; Paudel, Yogesh; Perez-Enciso, Miguel; Ramirez-Gonzalez, Ricardo; Reecy, James M.; Zas, Sandra Rodriguez; Rohrer, Gary A.; Rund, Lauretta; Sang, Yongming; Schachtschneider, Kyle; Schraiber, Joshua G.; Schwartz, John; Scobie, Linda; Scott, Carol; Searle, Stephen; Servin, Bertrand; Southey, Bruce R.; Sperber, Goran; Stadler, Peter; Sweedler, Jonathan V.; Tafer, Hakim; Thomsen, Bo; Wali, Rashmi; Wang, Jian; Wang, Jun; White, Simon; Xu, Xun; Yerle, Martine; Zhang, Guojie; Zhang, Jianguo; Zhang, Jie; Zhao, Shuhong; Rogers, Jane; Churcher, Carol; Schook, Lawrence B.

2013-01-01

For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ~1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model. PMID:23151582
Genetic analysis of PAX3 for diagnosis of Waardenburg syndrome type I.

PubMed

Matsunaga, Tatsuo; Mutai, Hideki; Namba, Kazunori; Morita, Noriko; Masuda, Sawako

2013-04-01

PAX3 genetic analysis increased the diagnostic accuracy for Waardenburg syndrome type I (WS1). Analysis of the three-dimensional (3D) structure of PAX3 helped verify the pathogenicity of a missense mutation, and multiplex ligation-dependent probe amplification (MLPA) analysis of PAX3 increased the sensitivity of genetic diagnosis in patients with WS1. Clinical diagnosis of WS1 is often difficult in individual patients with isolated, mild, or non-specific symptoms. The objective of the present study was to facilitate the accurate diagnosis of WS1 through genetic analysis of PAX3 and to expand the spectrum of known PAX3 mutations. In two Japanese families with WS1, we conducted a clinical evaluation of symptoms and genetic analysis, which involved direct sequencing, MLPA analysis, quantitative PCR of PAX3, and analysis of the predicted 3D structure of PAX3. The normal-hearing control group comprised 92 subjects who had normal hearing according to pure tone audiometry. In one family, direct sequencing of PAX3 identified a heterozygous mutation, p.I59F. Analysis of PAX3 3D structures indicated that this mutation distorted the DNA-binding site of PAX3. In the other family, MLPA analysis and subsequent quantitative PCR detected a large heterozygous deletion spanning 1759-2554 kb that eliminated 12-18 genes, including the entire PAX3 gene.
An investigation of the role of current and future remote sensing data systems in numerical meteorology

NASA Technical Reports Server (NTRS)

Diak, George R.; Smith, William L.

1993-01-01

The goals of this research endeavor have been to develop a flexible and relatively complete framework for the investigation of current and future satellite data sources in numerical meteorology. To realistically model how satellite information might be used for these purposes, Observing System Simulation Experiments (OSSEs) must be as complete as possible. It is therefore desirable that these experiments simulate in entirety the sequence of steps involved in bringing satellite information from the radiance level through product retrieval to a realistic analysis and forecast sequence. In this project we have worked to make this sequence realistic by synthesizing raw satellite data from surrogate atmospheres, deriving satellite products from these data, and subsequently producing analyses and forecasts using the retrieved products. The accomplishments made in 1991 are presented. The emphasis was on examining atmospheric soundings and microphysical products which we expect to produce with the launch of the Advanced Microwave Sounding Unit (AMSU), slated for flight in mid-1994.

GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

PubMed Central

Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

2016-01-01

Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate such analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. We therefore mapped the time-consuming steps of GHOSTZ, a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented the result as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation using metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905
Bacteria of an anaerobic 1,2-dichloropropane-dechlorinating mixed culture are phylogenetically related to those of other anaerobic dechlorinating consortia.

PubMed

Schlötelburg, C; von Wintzingerode, F; Hauck, R; Hegemann, W; Göbel, U B

2000-07-01

A 16S-rDNA-based molecular study was performed to determine the bacterial diversity of an anaerobic, 1,2-dichloropropane-dechlorinating bioreactor consortium derived from sediment of the River Saale, Germany. Total community DNA was extracted, and bacterial 16S rRNA genes were subsequently amplified using conserved primers. A clone library was constructed and analysed by sequencing the 16S rDNA inserts of randomly chosen clones, followed by dot blot hybridization with labelled polynucleotide probes. The phylogenetic analysis revealed significant sequence similarities of several as yet uncultured bacterial species in the bioreactor to those found in other reductively dechlorinating freshwater consortia. In contrast, no close relationship was found with as yet uncultured bacteria from reductively dechlorinating consortia derived from marine habitats. One rDNA clone showed >97% sequence similarity to Dehalobacter species, known for reductive dechlorination of tri- and tetrachloroethene. These results suggest that reductive dechlorination in microbial freshwater habitats depends upon a specific bacterial community structure.
[Influence of antisense RNA and sequences of viral transactivators traps on RNA synthesis of HTLV-1 virus].

PubMed

Borisenko, A S; Kotus, E V; Kaloshin, A A

2008-01-01

A significant number of scientific publications devoted to the inhibition of viral replication by antisense RNA (asRNA) genes shows that this approach is useful for gene therapy of viral infections. To investigate whether HTLV-1 reproduction can be suppressed by asRNA, we constructed recombinant plasmids containing asRNA genes against the U3 long terminal repeat region and the X gene, either under the control of the myeloproliferative sarcoma virus (MPSV) promoter or without it. Using stable calcium-phosphate transfection with subsequent selection in the presence of G-418, we obtained RaHOS line-based cell clones carrying both asRNA genes and sequences able to bind HTLV-1 transactivator proteins (i.e., "traps" of viral transactivators, TVT). Dot-hybridization analysis of viral RNA extracted from the RaHOS cell clones showed that TVT sequences suppressed viral RNA synthesis by 90%, and asRNA against the X gene suppressed it by 50%.
Exome sequence reveals mutations in CoA synthase as a cause of neurodegeneration with brain iron accumulation.

PubMed

Dusi, Sabrina; Valletta, Lorella; Haack, Tobias B; Tsuchiya, Yugo; Venco, Paola; Pasqualato, Sebastiano; Goffrini, Paola; Tigano, Marco; Demchenko, Nikita; Wieland, Thomas; Schwarzmayr, Thomas; Strom, Tim M; Invernizzi, Federica; Garavaglia, Barbara; Gregory, Allison; Sanford, Lynn; Hamada, Jeffrey; Bettencourt, Conceição; Houlden, Henry; Chiapparini, Luisa; Zorzi, Giovanna; Kurian, Manju A; Nardocci, Nardo; Prokisch, Holger; Hayflick, Susan; Gout, Ivan; Tiranti, Valeria

2014-01-02

Neurodegeneration with brain iron accumulation (NBIA) comprises a clinically and genetically heterogeneous group of disorders with progressive extrapyramidal signs and neurological deterioration, characterized by iron accumulation in the basal ganglia. Exome sequencing revealed recessive missense mutations in COASY, encoding coenzyme A (CoA) synthase, in one NBIA-affected subject. A second, unrelated individual carrying mutations in COASY was identified by Sanger sequence analysis. CoA synthase is a bifunctional enzyme that catalyzes the final steps of CoA biosynthesis by coupling phosphopantetheine with ATP to form dephospho-CoA and by its subsequent phosphorylation to generate CoA. We demonstrate alterations in RNA and protein expression levels of CoA synthase, as well as in CoA amount, in fibroblasts derived from the two clinical cases and in yeast. This is the second inborn error of coenzyme A biosynthesis to be implicated in NBIA. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Analysis of the Type IV Fimbrial-Subunit Gene fimA of Xanthomonas hyacinthi: Application in PCR-Mediated Detection of Yellow Disease in Hyacinths

PubMed Central

van Doorn, J.; Hollinger, T. C.; Oudega, B.

2001-01-01

A sensitive and specific detection method was developed for Xanthomonas hyacinthi; this method was based on amplification of a subsequence of the type IV fimbrial-subunit gene fimA from strain S148. The fimA gene was amplified by PCR with degenerate DNA primers designed by using the N-terminal and C-terminal amino acid sequences of trypsin fragments of FimA. The nucleotide sequence of fimA was determined and compared with the nucleotide sequences coding for the fimbrial subunits in other type IV fimbria-producing bacteria, such as Xanthomonas campestris pv. vesicatoria, Neisseria gonorrhoeae, and Moraxella bovis. In a PCR, internal primers JAAN and JARA, designed by using the nucleotide sequences of the variable central and C-terminal regions of fimA, amplified a 226-bp DNA fragment in all X. hyacinthi isolates. This PCR was shown to be pathovar specific, as assessed by testing 71 Xanthomonas pathovars and bacterial isolates belonging to other genera, such as Erwinia and Pseudomonas. Southern hybridization experiments performed with the labelled 226-bp DNA amplicon as a probe suggested that there is only one structural type IV fimbrial-gene cluster in X. hyacinthi. Only two Xanthomonas translucens pathovars cross-reacted weakly in PCR. Primers amplifying a subsequence of the fimA gene of X. campestris pv. vesicatoria (T. Ojanen-Reuhs, N. Kalkkinen, B. Westerlund-Wikström, J. van Doorn, K. Haahtela, E.-L. Nurmiaho-Lassila, K. Wengelink, U. Bonas, and T. K. Korhonen, J. Bacteriol. 179: 1280–1290, 1997) were shown to be pathovar specific, indicating that the fimbrial-subunit sequences are more generally applicable in xanthomonads for detection purposes. Under laboratory conditions, approximately 1,000 CFU of X. hyacinthi per ml could be detected. In inoculated hyacinth leaves, the threshold was 5,000 CFU/ml. The results indicated that infected hyacinths with early symptoms could be successfully screened for X. hyacinthi with PCR. PMID:11157222
HERV-W group evolutionary history in non-human primates: characterization of ERV-W orthologs in Catarrhini and related ERV groups in Platyrrhini.

PubMed

Grandi, Nicole; Cadeddu, Marta; Blomberg, Jonas; Mayer, Jens; Tramontano, Enzo

2018-01-19

The genomes of all vertebrates harbor remnants of ancient retroviral infections that affected germ-line cells during the last 100 million years. These sequences, named Endogenous Retroviruses (ERVs), have been transmitted to offspring in a Mendelian fashion and remain relatively stable components of the host genome even long after their exogenous counterparts went extinct. Among human ERVs (HERVs), the HERV-W group is of particular interest for human physiology and pathology. A HERV-W provirus at locus 7q21.2 has been co-opted during evolution to play an essential role in the placenta, and expression of the group has been tentatively linked to multiple sclerosis and other diseases. Following up on a detailed analysis of 213 HERV-W insertions in the human genome, we investigated the genomic spread of the ERV-W group within primate lineages. We analyzed HERV-W orthologous loci in the genome sequences of 12 non-human primate species belonging to Simiiformes (parvorders Catarrhini and Platyrrhini), Tarsiiformes, and the most primitive Prosimians. Analysis of HERV-W orthologous loci in non-human Catarrhini primates revealed species-specific insertions in the genomes of Chimpanzee (3), Gorilla (4), Orangutan (6), Gibbon (2), and especially Rhesus Macaque (66). These sequences were acquired in a retroviral fashion and, in the majority of cases, by L1-mediated formation of processed pseudogenes. There were also a number of LTR-LTR homologous recombination events that occurred after the separation of Catarrhini sub-lineages. Moreover, we retrieved 130 sequences in Marmoset and Squirrel Monkeys (family Cebidae, Platyrrhini parvorder), identified as ERV1-1_CJa based on RepBase annotations, which appear closely related to the ERV-W group. Such sequences were also identified in Atelidae and Pitheciidae, representatives of the other Platyrrhini families. In contrast, no ERV-W-related sequences were found in genome sequence assemblies of Tarsiiformes and Prosimians.
Overall, our analysis now provides a detailed picture of the ERV-W sequences colonization of the primate lineages genomes, revealing the exact dynamics of ERV-W locus formations as well as novel insights into the evolution and origin of the group.
A phylogenetic analysis using full-length viral genomes of South American dengue serotype 3 in consecutive Venezuelan outbreaks reveals novel NS5 mutation

PubMed Central

Schmidt, DJ; Pickett, BE; Camacho, D; Comach, G; Xhaja, K; Lennon, NJ; Rizzolo, K; de Bosch, N; Becerra, A; Nogueira, ML; Mondini, A; da Silva, EV; Vasconcelos, PF; Muñoz-Jordán, JL; Santiago, GA; Ocazionez, R; Gehrke, L; Lefkowitz, EJ; Birren, BW; Henn, MR; Bosch, I

2013-01-01

Dengue virus currently causes 50-100 million infections annually. Comprehensive knowledge about the evolution of dengue in response to selection pressure is currently unavailable, but would greatly enhance vaccine design efforts. In the current study, we sequenced 187 new dengue virus serotype 3 (DENV-3) genotype III whole genomes isolated from Asia and the Americas. We analyzed them together with previously sequenced isolates to gain a more detailed understanding of the evolutionary adaptations existing in this prevalent American serotype. To analyze the phylogenetic dynamics of DENV-3 during outbreak periods, we incorporated datasets of 48 and 11 sequences spanning two major outbreaks in Venezuela during 2001 and 2007-2008, respectively. Our phylogenetic analysis of newly sequenced viruses shows that subsets of genomes cluster primarily by geographic location, and secondarily by time of virus isolation. DENV-3 genotype III sequences from Asia are significantly divergent from those from the Americas due to their geographical separation and subsequent speciation. We measured amino acid variation in the E protein by calculating the Shannon entropy at each position between Asian and American genomes. We found a cluster of 7 amino acid substitutions with high variability within E protein domain III, which has previously been implicated in serotype-specific neutralization escape mutants. No novel mutations were found in the E protein of sequences isolated during either Venezuelan outbreak. Shannon entropy analysis of the NS5 polymerase mature protein revealed that a G374E mutation, in a region that contributes to interferon resistance in other flaviviruses by interfering with JAK-STAT signaling, was present in both the Asian and American sequences from the 2007-2008 Venezuelan outbreak, but was absent in the sequences from the 2001 Venezuelan outbreak. 
In addition to E, several NS5 amino acid changes were unique to the 2007-2008 epidemic in Venezuela and may give additional insight into the adaptive response of DENV-3 at the population level. PMID:21964598
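Per-position Shannon entropy, as used above for the E and NS5 proteins, measures how variable each alignment column is: 0 bits means the position is fully conserved, and larger values mean more residue diversity. A minimal sketch, assuming a protein alignment supplied as equal-length strings with "-" for gaps and "X" for unknown residues (both skipped); the function names are hypothetical and this is not the authors' pipeline.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: Sequence[str], ignore: str = "-X") -> float:
    """Shannon entropy (bits) of one alignment column, skipping gaps/unknowns."""
    residues = [aa for aa in column if aa not in ignore]
    if not residues:
        return 0.0
    n = len(residues)
    # H = -sum(p * log2 p) over the residue frequencies observed in this column
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())


def positional_entropy(alignment: List[str]) -> List[float]:
    """Entropy of every column of an alignment of equal-length sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


# Toy alignment (hypothetical): column 1 is conserved, columns 0 and 2 vary.
print([round(h, 3) for h in positional_entropy(["GKE", "GKE", "EKE", "EKD"])])
# [1.0, 0.0, 0.811]
```

Ranking columns by entropy and inspecting the top positions is one simple way to flag variable sites such as the E protein domain III cluster.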
New tyrosinase inhibitory decapeptide: Molecular insights into the role of tyrosine residues.

PubMed

Ochiai, Akihito; Tanaka, Seiya; Imai, Yuta; Yoshida, Hisashi; Kanaoka, Takumi; Tanaka, Takaaki; Taniguchi, Masayuki

2016-06-01

Tyrosinase, a rate-limiting enzyme in melanin biosynthesis, catalyzes the hydroxylation of l-tyrosine to 3,4-dihydroxy-l-phenylalanine (l-dopa) (monophenolase reaction) and the subsequent oxidation of l-dopa to l-dopaquinone (diphenolase reaction). Tyrosinase inhibitors have therefore been proposed as skin-lightening agents; however, many existing inhibitors cannot be widely used in the cosmetic industry because of their high cytotoxicity and instability. By contrast, some tyrosinase inhibitory peptides have been reported to be safe. In this study, we found that the peptide TH10, whose sequence is similar to that of the characterized inhibitory peptide P4, strongly inhibits the monophenolase reaction with a half-maximal inhibitory concentration of 102 μM. Seven of the ten amino acid residues in TH10 are identical to those in P4; however, TH10 possesses a single N-terminal tyrosine, whereas P4 contains three tyrosine residues located at its N-terminus, center, and C-terminus. Subsequent analysis using sequence-shuffled variants indicated that the tyrosine residues located at the N-terminus and center of P4 contribute little or nothing to its inhibitory activity. Furthermore, docking simulation of these peptides with mushroom tyrosinase showed that the active tyrosine residue was positioned close to the copper ions, suggesting that TH10 and P4 bind to tyrosinase as substrate analogues. Copyright © 2015 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Genetic diversity and geographical structure of the pitcher plant Nepenthes vieillardii in New Caledonia: A chloroplast DNA haplotype analysis.

PubMed

Kurata, Kaoruko; Jaffré, Tanguy; Setoguchi, Hiroaki

2008-12-01

Among the many species that grow in New Caledonia, the pitcher plant Nepenthes vieillardii (Nepenthaceae) has a high degree of morphological variation. In this study, we present the patterns of genetic differentiation of pitcher plant populations based on chloroplast DNA haplotype analysis using the sequences of five spacers. We analyzed 294 samples from 16 populations covering the entire range of the species, using 4660 bp of sequence. Our analysis identified 17 haplotypes, including one that is widely distributed across the islands, as well as regional and private haplotypes. The greatest haplotype diversity was detected on the eastern coast of the largest island and included several private haplotypes, while haplotype diversity was low in the southern plains region. The parsimony network analysis of the 17 haplotypes suggested that the genetic divergence is the result of long-term isolation of individual populations. Results from a spatial analysis of molecular variance and a cluster analysis suggest that the plants once covered the entire serpentine area of New Caledonia and that subsequent regional fragmentation resulted in the isolation of each population and significantly restricted seed flow. This isolation may have been an important factor in the development of the morphological and genetic variation among pitcher plants in New Caledonia.
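The abstract compares haplotype diversity across regions without stating the estimator. A common choice is Nei's unbiased haplotype (gene) diversity, h = n/(n-1) * (1 - sum of p_i^2), where p_i is the frequency of haplotype i among n sampled individuals. The sketch below assumes that estimator; the function name and haplotype labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity for one population sample.

    Each element is the haplotype label of one sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0  # undefined for a single sample; report 0 by convention here
    freqs = [count / n for count in Counter(haplotypes).values()]
    return (n / (n - 1)) * (1.0 - sum(p * p for p in freqs))


# Hypothetical population: two individuals share H1, one each carry H2 and H3.
print(round(haplotype_diversity(["H1", "H1", "H2", "H3"]), 3))  # 0.833
```

A population fixed for one haplotype scores 0, which is how a low-diversity region like the southern plains would show up under this measure.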
AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.

PubMed

Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia

2017-03-14

Some applications, especially clinical applications that require highly accurate sequencing data, must contend with unavoidable sequencing errors. Several tools have been proposed to profile sequencing quality, but few of them can quantify or correct sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool that profiles sequencing errors and corrects most of them, and that also provides highly automated quality control and data filtering. Unlike most tools, AfterQC analyses the overlap of paired reads in paired-end sequencing data. Based on this overlap analysis, AfterQC can detect and cut adapters, and it also provides a novel function to correct wrong bases in the overlapping regions. Another new feature is the detection and visualisation of sequencing bubbles, which are commonly found on flowcell lanes and can cause sequencing errors. Besides the usual per-cycle quality and base content plots, AfterQC also provides polyX filtering (polyX being a long sub-sequence of the same base X), automatic trimming and k-mer-based strand bias profiling. For each single FastQ file or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates the sequencer's bubble effects, trims reads at the front and tail, detects sequencing errors and corrects some of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support; it accepts a single FastQ file, a single pair of FastQ files (for paired-end sequencing), or a folder whose FastQ files are all processed automatically. Based on overlap analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. 
Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help eliminate sequencing errors in paired-end sequencing data to provide much cleaner outputs, and consequently help reduce false-positive variants, especially low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all options automatically and requires no arguments in most cases.
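The overlap-based base correction described above can be pictured as follows: reverse-complement read 2, find where its head overlaps the tail of read 1, and at each mismatching position keep the base with the higher Phred quality. A minimal sketch under stated assumptions: it only searches non-negative offsets (insert at least as long as read 1, so adapter read-through is not handled), uses a simple higher-quality-wins rule with ties favouring read 1, and all names and thresholds are hypothetical. It is not AfterQC's actual code.

```python
from typing import List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA read."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1: str, r2rc: str, min_overlap: int = 30,
                 max_mismatch: int = 5) -> Optional[int]:
    """Smallest offset at which the tail of read 1 matches the head of the
    reverse-complemented read 2 with at most max_mismatch mismatches."""
    if len(r2rc) < min_overlap:
        return None
    for offset in range(len(r1) - min_overlap + 1):
        length = min(len(r1) - offset, len(r2rc))
        mismatches = sum(a != b for a, b in zip(r1[offset:offset + length], r2rc[:length]))
        if mismatches <= max_mismatch:
            return offset
    return None


def correct_overlap(r1: str, q1: List[int], r2: str, q2: List[int],
                    min_overlap: int = 30, max_mismatch: int = 5) -> Tuple[str, str, int]:
    """Return (read1, read2, n_corrected) after resolving overlap mismatches
    by keeping the higher-quality base (ties favour read 1)."""
    r2rc, q2rc = revcomp(r2), q2[::-1]
    offset = find_overlap(r1, r2rc, min_overlap, max_mismatch)
    if offset is None:
        return r1, r2, 0  # no confident overlap: leave both reads untouched
    b1, b2 = list(r1), list(r2rc)
    corrected = 0
    for i in range(min(len(r1) - offset, len(r2rc))):
        j = offset + i
        if b1[j] != b2[i]:
            if q1[j] >= q2rc[i]:
                b2[i] = b1[j]
            else:
                b1[j] = b2[i]
            corrected += 1
    # Return read 2 in its original (sequenced) orientation.
    return "".join(b1), revcomp("".join(b2)), corrected
```

Quality lists are integer Phred scores in read order; read 2's qualities are reversed internally so they line up with its reverse complement. Requiring a minimum overlap and capping mismatches keeps the search from "correcting" reads against a spurious alignment.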
Construction and Analysis of Functional Networks in the Gut Microbiome of Type 2 Diabetes Patients.

PubMed

Li, Lianshuo; Wang, Zicheng; He, Peng; Ma, Shining; Du, Jie; Jiang, Rui

2016-10-01

Although networks of microbial species have been widely used in the analysis of 16S rRNA sequencing data from microbiomes, constructing and analyzing a complete microbial gene network is generally problematic because of the large number of microbial genes in metagenomics studies. To overcome this limitation, we propose to map microbial genes to functional units, including KEGG orthologous groups and evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) orthologous groups, to enable the construction and analysis of a microbial functional network. We devised two statistical methods to infer pairwise relationships between microbial functional units based on a deep sequencing dataset of the gut microbiome from type 2 diabetes (T2D) patients and healthy controls. Networks containing these functional units and their significant interactions were then constructed. We conducted a variety of analyses of global properties, local properties, and functional modules in the resulting functional networks. Our data indicate that, besides observations consistent with current knowledge, this study provides novel biological insights into the gut microbiome associated with T2D. Copyright © 2016. Production and hosting by Elsevier Ltd.
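The abstract does not name its two statistical methods for inferring pairwise relationships, so the sketch below swaps in the simplest common stand-in: Pearson correlation of functional-unit abundances across samples, keeping an edge when the absolute correlation reaches a threshold. The unit names, abundances and the 0.8 threshold are hypothetical, and no significance testing or compositional correction is shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def functional_network(abundance: Dict[str, List[float]],
                       threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Edges (unit_u, unit_v, r) for every pair of functional units whose
    abundance profiles across samples correlate with |r| >= threshold."""
    edges = []
    for u, v in combinations(sorted(abundance), 2):
        r = pearson(abundance[u], abundance[v])
        if abs(r) >= threshold:
            edges.append((u, v, round(r, 3)))
    return edges


# Hypothetical abundances of three functional units across four samples.
table = {"unit_A": [1, 2, 3, 4], "unit_B": [2, 4, 6, 8], "unit_C": [4, 1, 3, 2]}
print(functional_network(table))  # [('unit_A', 'unit_B', 1.0)]
```

In a real analysis the threshold would be replaced by a significance test with multiple-testing correction, since thousands of functional units yield millions of candidate pairs.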
Next-generation sequencing: the future of molecular genetics in poultry production and food safety.

PubMed

Diaz-Sanchez, S; Hanning, I; Pendleton, Sean; D'Souza, Doris

2013-02-01

The era of molecular biology and the automation of the Sanger chain-terminator sequencing method have led to discoveries and advances in diagnostics and biotechnology. The Sanger methodology dominated research for over 2 decades, leading to significant accomplishments and technological improvements in DNA sequencing. Next-generation high-throughput sequencing (HT-NGS) technologies were subsequently developed to overcome the limitations of this first-generation technology, offering higher speed, less labor, and lower cost. Platforms that use different detection principles include 454 Life Sciences sequencing-by-synthesis, Illumina (Solexa) sequencing, SOLiD sequencing (among others), and Ion Torrent semiconductor sequencing. As technology advances, progress toward third-generation sequencing technologies is being reported, including nanopore sequencing and real-time monitoring of PCR activity through fluorescence resonance energy transfer. The advantages of these technologies include scalability, simplicity, increasing DNA polymerase performance and yields, lower error rates, and greater economic feasibility, with the eventual goal of obtaining real-time results. These technologies can be directly applied to improve poultry production and enhance food safety. For example, sequence-based metagenomic analysis (determination of the gut microbial community, genes for metabolic pathways, or presence of plasmids) and function-based metagenomic analysis (screening for functions such as antibiotic resistance or vitamin production) can be carried out. Gut microbial flora/communities of poultry can be sequenced to determine the changes that affect health and disease, along with the efficacy of methods to control pathogenic growth. Thus, the purpose of this review is to provide an overview of the principles of these current technologies and their potential application to improve poultry production and food safety as well as public health.
An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins.

PubMed

Harper, Angela F; Leuthaeuser, Janelle B; Babbitt, Patricia C; Morris, John H; Ferrin, Thomas E; Poole, Leslie B; Fetrow, Jacquelyn S

2017-02-01

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than using a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
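The self-identification criterion can be stated as a set equality: a candidate group passes if a search seeded with that group retrieves exactly its own members. A minimal sketch, in which the search is a hypothetical stand-in (a toy dictionary of functional-site signatures) rather than MISST's actual profile search of GenBank; it does not model the agglomerative and divisive iterations.

```python
from typing import Callable, Dict, Iterable, Set


def is_isofunctional(group: Set[str],
                     search: Callable[[Set[str]], Iterable[str]]) -> bool:
    """Self-identification check: the group's own search must return exactly
    the group's members, nothing more and nothing less."""
    return set(search(group)) == set(group)


# Hypothetical toy database: sequence ID -> functional-site signature.
TOY_DB: Dict[str, str] = {"s1": "sigA", "s2": "sigA", "s3": "sigB", "s4": "sigB"}


def toy_search(group: Set[str]) -> Iterable[str]:
    """Stand-in for a database search: return every entry whose signature
    occurs among the query group's signatures."""
    signatures = {TOY_DB[s] for s in group}
    return [sid for sid, sig in TOY_DB.items() if sig in signatures]


print(is_isofunctional({"s1", "s2"}, toy_search))  # True
print(is_isofunctional({"s1"}, toy_search))        # False: s2 is also retrieved
```

The appeal of this test is that it needs no global similarity cut-off: each group is judged only by whether its own search is closed over its membership.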
An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins

PubMed Central

Babbitt, Patricia C.; Ferrin, Thomas E.

2017-01-01

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method’s novelty lies in the manner in which isofunctional groups are selected; rather than using a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences. PMID:28187133
G-CNV: A GPU-Based Tool for Preparing Data to Detect CNVs with Read-Depth Methods.

PubMed

Manconi, Andrea; Manca, Emanuele; Moscatelli, Marco; Gnocchi, Matteo; Orro, Alessandro; Armano, Giuliano; Milanesi, Luciano

2015-01-01

Copy number variations (CNVs) are the most prevalent type of structural variation (SV) in the human genome and are involved in a wide range of common human diseases. Different computational methods have been devised to detect this type of SV and to study how CNVs are implicated in human diseases. Recently, computational methods based on high-throughput sequencing (HTS) have been increasingly used. Most of these methods map short-read sequences generated from a donor against a reference genome to detect signatures distinctive of CNVs. In particular, read-depth-based methods detect CNVs by identifying genomic regions whose read depth differs significantly from that of other regions. The analysis pipeline of these methods consists of four main stages: (i) data preparation, (ii) data normalization, (iii) identification of CNV regions, and (iv) copy number estimation. However, available tools do not support most of the operations required at the first two stages of this pipeline. Typically, they start the analysis by building the read-depth signal from pre-processed alignments. Therefore, third-party tools must be used to perform most of the preliminary operations required to build the read-depth signal. These data-intensive operations can be efficiently parallelized on graphics processing units (GPUs). In this article, we present G-CNV, a GPU-based tool devised to perform the common operations required at the first two stages of the analysis pipeline. G-CNV is able to filter low-quality read sequences, mask low-quality nucleotides, remove adapter sequences, remove duplicated read sequences, map the short reads, resolve multiple-mapping ambiguities, build the read-depth signal, and normalize it. G-CNV can be used efficiently as a third-party tool to prepare data for subsequent read-depth signal generation and analysis. Moreover, it can also be integrated into CNV detection tools to generate read-depth signals.
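G-CNV builds and normalizes the read-depth signal on GPUs. As a CPU-only illustration of what those two steps mean, the sketch below counts mapped read start positions in fixed-size bins and divides each bin by the median of the non-empty bins, so a value near 1.0 suggests normal copy number and clear departures suggest gains or losses. Median scaling, the 1 kb default bin size and the function names are assumptions; the abstract does not state which normalization G-CNV applies, and GC-content or mappability corrections are not shown.

```python
from statistics import median
from typing import List


def read_depth_signal(read_starts: List[int], chrom_length: int,
                      bin_size: int = 1000) -> List[int]:
    """Count mapped read start positions (0-based) in fixed-size bins."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    depth = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            depth[pos // bin_size] += 1
    return depth


def normalize(depth: List[int]) -> List[float]:
    """Scale bin counts by the median of non-empty bins."""
    nonzero = [d for d in depth if d > 0]
    if not nonzero:
        return [0.0] * len(depth)
    m = median(nonzero)
    return [d / m for d in depth]


# Hypothetical 300 bp contig with 100 bp bins.
starts = [5, 15, 25, 105, 115, 205, 215, 225, 235, 245]
depth = read_depth_signal(starts, chrom_length=300, bin_size=100)
print(depth)                                   # [3, 2, 5]
print([round(x, 2) for x in normalize(depth)])  # [1.0, 0.67, 1.67]
```

Excluding empty bins from the median keeps unmappable regions (for example assembly gaps) from dragging the reference level toward zero.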
Integrated whole-genome and transcriptome sequence analysis reveals the genetic characteristics of a riboflavin-overproducing Bacillus subtilis.

PubMed

Wang, Guanglu; Shi, Ting; Chen, Tao; Wang, Xiaoyue; Wang, Yongcheng; Liu, Dingyu; Guo, Jiaxin; Fu, Jing; Feng, Lili; Wang, Zhiwen; Zhao, Xueming

2018-06-02

Commercial riboflavin production with Bacillus subtilis has been developed for almost two decades by combining rational and classical strain development, but how an improved riboflavin producer can be created rationally is still not completely understood. In this study, we demonstrate the combined use of genomic and transcriptomic analysis to elucidate the genetic basis of riboflavin overproduction in B. subtilis. This methodology succeeded in identifying the positive mutations in the mutagenesis-derived riboflavin producer B. subtilis 24/pMX45 through whole-genome sequencing and transcriptome sequencing. These included RibC (G199D), ribD + (G+39A), PurA (P242L), CcpN (A44S), YvrH (R222Q) and two nonsense mutations, YhcF (R90*) and YwaA (Q68*). Reintroducing these specific mutations into the wild-type strain recovered the riboflavin overproduction phenotype, and subsequent metabolic engineering greatly improved riboflavin production, achieving up to a 3.4-fold increase in riboflavin titer over the sequenced producer. A novel mutation, YvrH (R222Q), in a typical two-component regulatory system deregulated the purine de novo synthesis pathway and increased the pool of intracellular purine metabolites, which in turn increased riboflavin production. Taken together, we present a case study of combining genome and transcriptome analysis to elucidate the genetic underpinnings of a complex cellular property, which enabled the transfer of beneficial mutations to engineer a reference strain into an overproducer. Copyright © 2018 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.
Asian affinities and continental radiation of the four founding Native American mtDNAs.

PubMed Central

Torroni, A; Schurr, T G; Cabell, M F; Brown, M D; Neel, J V; Larsen, M; Smith, D G; Vullo, C M; Wallace, D C

1993-01-01

The mtDNA variation of 321 individuals from 17 Native American populations was examined by high-resolution restriction endonuclease analysis. All mtDNAs were amplified from a variety of sources by using PCR. The mtDNA of a subset of 38 of these individuals was also analyzed by D-loop sequencing. The resulting data were combined with previous mtDNA data from five other Native American tribes, as well as with data from a variety of Asian populations, and were used to deduce the phylogenetic relationships between mtDNAs and to estimate sequence divergences. This analysis revealed the presence of four haplotype groups (haplogroups A, B, C, and D) in the Amerind, but only one haplogroup (A) in the Na-Dene, and confirmed the independent origins of the Amerinds and the Na-Dene. Further, each haplogroup appeared to have been founded by a single mtDNA haplotype, a result which is consistent with a hypothesized founder effect. Most of the variation within haplogroups was tribal specific, that is, it occurred as tribal private polymorphisms. These observations suggest that the process of tribalization began early in the history of the Amerinds, with relatively little intertribal genetic exchange occurring subsequently. The sequencing of 341 nucleotides in the mtDNA D-loop revealed that the D-loop sequence variation correlated strongly with the four haplogroups defined by restriction analysis, and it indicated that the D-loop variation, like the haplotype variation, arose predominantly after the migration of the ancestral Amerinds across the Bering land bridge. PMID:7688932
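The simplest divergence measure for aligned sequences, such as the 341-nucleotide D-loop fragments mentioned above, is the uncorrected p-distance: the proportion of compared sites at which two sequences differ. The sketch below assumes that measure and skips any site where either sequence has a gap; it is not necessarily the estimator used in the study, and the function name and example sequences are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    diffs = sum(1 for x, y in compared if x != y)
    return diffs / len(compared)


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```

For deeper divergences a substitution-model correction (for example Jukes-Cantor) is usually applied on top of p, because multiple hits at the same site make raw p underestimate the true distance.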
Molecular cloning, sequence analysis and homology modeling of the first caudata amphibian antifreeze-like protein in axolotl (Ambystoma mexicanum).

PubMed

Zhang, Songyan; Gao, Jiuxiang; Lu, Yiling; Cai, Shasha; Qiao, Xue; Wang, Yipeng; Yu, Haining

2013-08-01

Antifreeze proteins (AFPs) are a class of polypeptides produced by certain vertebrates, plants, fungi, and bacteria that allow these organisms to survive in subzero environments. In this study, we report the molecular cloning and sequence analysis of the axolotl antifreeze-like protein (AFLP), the first AFLP from a caudate amphibian, and predict its three-dimensional structure by homology modeling. We constructed a full-length spleen cDNA library of axolotl (Ambystoma mexicanum). An EST with the highest similarity (∼42%) to the freeze-responsive liver protein Li16 from Rana sylvatica was identified, and the full-length cDNA was subsequently obtained by RACE-PCR. The open reading frame of the axolotl antifreeze-like protein encodes a putative signal peptide and a mature protein of 93 amino acids. The calculated molecular mass and theoretical isoelectric point (pI) of this mature protein were 10128.6 Da and 8.97, respectively. The gene and its deduced protein were further characterized by detailed bioinformatics analysis. The three-dimensional structure of this AFLP was predicted by homology modeling, and the conserved residues required for functionality were identified. The homology model could be of use for effective drug design. This is the first report of an antifreeze-like protein identified from a caudate amphibian.
[Study of human immunodeficiency virus transmission chains in Andalusia: analysis from baseline antiretroviral resistance sequences].

PubMed

Pérez-Parra, Santiago; Chueca-Porcuna, Natalia; Álvarez-Estevez, Marta; Pasquau, Juan; Omar, Mohamed; Collado, Antonio; Vinuesa, David; Lozano, Ana Belen; García-García, Federico

2015-11-01

Protease and reverse transcriptase HIV-1 sequences provide useful information for patient clinical management, as well as information on resistance to antiretrovirals. The aim of this study is to evaluate transmission events and transmitted drug resistance, and to georeference subtypes, among newly diagnosed patients referred to our center. A study was conducted on 693 patients diagnosed between 2005 and 2012 in Southern Spain. Protease and reverse transcriptase sequences were obtained for cART resistance analysis with the Trugene(®) HIV Genotyping Kit (Siemens, NAD). MEGA 5.2, Neighbor-Joining, ArcGIS and REGA were used for subsequent analysis. The results showed 298 patients clustered into 77 different transmission events. Most of the clusters were formed by pairs (n=49), by men who have sex with men (n=26), by Spanish patients (n=37), and by patients below 45 years of age (73.5%). Urban areas of Granada and the coastal areas of Almeria and Granada showed the greatest subtype heterogeneity. Five clusters were formed by more than 10 patients, and 15 clusters had transmitted drug resistance. The study data demonstrate how the phylogenetic characterization of transmission clusters is a powerful tool to monitor the spread of HIV, and may help design appropriate preventive measures to minimize it. Copyright © 2015 Elsevier España, S.L.U. y Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
High prevalence of Hepatitis C virus genotype 6 in Vietnam.

PubMed

Pham, Duc Anh; Leuangwutiwong, Pornsawan; Jittmittraphap, Akanitt; Luplertlop, Nattanej; Bach, Hoa Khanh; Akkarathamrongsin, Srunthron; Theamboonlers, Apiradee; Poovorawan, Yong

2009-01-01

This study aimed to update the prevalence of the various Hepatitis C virus genotypes in Vietnamese blood donors. One hundred and three HCV antibody-positive plasma samples were collected from blood donors at the National Institute of Hematology and Blood Transfusion, Hanoi, Vietnam. All specimens were subjected to RT-PCR of the 5' untranslated region (UTR) to confirm the presence of HCV RNA. The core and NS5B regions of the positive samples were subsequently amplified by RT-PCR, followed by direct sequencing and phylogenetic analysis. Seventy of 103 samples (68.0%) were RNA positive. Core and NS5B were successfully amplified, and sequences were obtained for 70 and 65 samples, respectively. Phylogenetic analysis revealed that genotype 6a was predominant among Vietnamese blood donors, with a prevalence of 37.1% (26/70), followed by genotype 1a at 30.0% (21/70) and genotype 1b at 17.1% (12/70). The prevalences of two other genotype 6 variants, 6e and 6l, were 8.6% and 1.4%, respectively. Further analysis of recent studies showed that the geographic distribution of genotype 6 mainly covers southern China and mainland Southeast Asia, including Vietnam, Laos, Thailand, and Myanmar. The GenBank accession numbers for the sequences reported in this study are FJ768772-FJ768906.
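As a quick consistency check, the reported percentages can be recomputed from the counts stated in the abstract (26, 21 and 12 of the 70 genotyped samples, and 70 of 103 samples RNA positive). The helper name is hypothetical; only counts given explicitly in the abstract are used.

```python
from typing import Dict


def prevalence(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Percentage of `total` for each category, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(prevalence({"6a": 26, "1a": 21, "1b": 12}, 70))
# {'6a': 37.1, '1a': 30.0, '1b': 17.1}
print(prevalence({"RNA positive": 70}, 103))
# {'RNA positive': 68.0}
```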

Genetic population structure of marine viral haemorrhagic septicaemia virus (VHSV).

PubMed

Snow, M; Bain, N; Black, J; Taupin, V; Cunningham, C O; King, J A; Skall, H F; Raynard, R S

2004-10-21

The nucleotide sequences of a specific region of the nucleoprotein gene were compared in order to investigate the genetic population structure of marine viral haemorrhagic septicaemia virus (VHSV). Analysis of the sequences from 128 isolates of diverse geographic and host origin makes this the most comprehensive molecular epidemiological study of marine VHSV conducted to date. Phylogenetic analysis of nucleoprotein gene sequences confirmed the existence of the 4 major genotypes previously identified by N-gene and subsequent G-gene-based analyses. Genotype I was defined to include subgroups of isolates associated with rainbow trout aquaculture (Genotype Ia) and isolates from the Baltic marine environment (Genotype Ib), emphasising the relatively close genetic relationship between these isolates. The existence of an additional genotype circulating within the Baltic Sea (Genotype II) was also confirmed. Genotype III included marine isolates from around the British Isles in addition to those associated with turbot mariculture, highlighting a continued risk to the development of this industry. Genotype IV consisted of isolates from the marine environment in North America. Taken together, these findings suggest a marine origin of VHSV in rainbow trout aquaculture. The implications of these findings with respect to the future control of VHSV are discussed. The capacity of molecular phylogenetic analysis to resolve complex epidemiological problems is also demonstrated, and its likely future importance for disease management is highlighted.
Performance of amplicon-based next generation DNA sequencing for diagnostic gene mutation profiling in oncopathology.

PubMed

Sie, Daoud; Snijders, Peter J F; Meijer, Gerrit A; Doeleman, Marije W; van Moorsel, Marinda I H; van Essen, Hendrik F; Eijk, Paul P; Grünberg, Katrien; van Grieken, Nicole C T; Thunnissen, Erik; Verheul, Henk M; Smit, Egbert F; Ylstra, Bauke; Heideman, Daniëlle A M

2014-10-01

Next generation DNA sequencing (NGS) holds promise for diagnostic applications, yet implementation in routine molecular pathology practice requires performance evaluation on DNA derived from routine formalin-fixed paraffin-embedded (FFPE) tissue specimens. The current study presents a comprehensive analysis of TruSeq Amplicon Cancer Panel-based NGS using a MiSeq Personal sequencer (TSACP-MiSeq-NGS) for somatic mutation profiling. TSACP-MiSeq-NGS (testing 212 hotspot mutation amplicons of 48 genes) and a data analysis pipeline were evaluated in a retrospective learning/test set approach (n = 58/n = 45 FFPE-tumor DNA samples) against 'gold standard' high-resolution-melting (HRM)-sequencing for the genes KRAS, EGFR, BRAF and PIK3CA. Next, the performance of the validated test algorithm was assessed in an independent, prospective cohort of FFPE-tumor DNA samples (n = 75). In the learning set, a number of minimum parameter settings were defined to decide whether an FFPE-DNA sample qualifies for TSACP-MiSeq-NGS and for calling mutations. The resulting test algorithm showed 82% (37/45) compliance with the quality criteria and 95% (35/37) concordant assay findings for KRAS, EGFR, BRAF and PIK3CA with HRM-sequencing (kappa = 0.92; 95% CI = 0.81-1.03) in the test set. Subsequent application of the validated test algorithm to the prospective cohort yielded a success rate of 84% (63/75), and a high concordance with HRM-sequencing (95% (60/63); kappa = 0.92; 95% CI = 0.84-1.01). TSACP-MiSeq-NGS detected 77 mutations in 29 additional genes. TSACP-MiSeq-NGS is suitable for diagnostic gene mutation profiling in oncopathology.
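The kappa values above measure agreement between NGS and HRM-sequencing beyond what chance alone would produce: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from the marginal totals. The abstract does not give the underlying contingency table, so the 2x2 counts below are invented for illustration (35 concordant calls out of 37, split arbitrarily) and do not reproduce the reported 0.92. The function name is hypothetical, and confidence intervals are not computed.

```python
from typing import List


def cohens_kappa(table: List[List[int]]) -> float:
    """Cohen's kappa for a square contingency table where table[i][j] counts
    samples called category i by method A and category j by method B."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical split: rows = NGS (mutant, wild type), columns = HRM-sequencing.
print(round(cohens_kappa([[20, 1], [1, 15]]), 3))  # 0.89
```

Because p_e depends on the marginals, the same 35/37 raw agreement can yield different kappa values depending on how mutant and wild-type calls are distributed, which is why the exact table matters.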
An integrated approach to fast and informative morphological vouchering of nematodes for applications in molecular barcoding

PubMed Central

De Ley, Paul; De Ley, Irma Tandingan; Morris, Krystalynne; Abebe, Eyualem; Mundo-Ocampo, Manuel; Yoder, Melissa; Heras, Joseph; Waumann, Dora; Rocha-Olivares, Axayácatl; Jay Burr, A.H; Baldwin, James G; Thomas, W. Kelley

2005-01-01

Molecular surveys of meiofaunal diversity face some interesting methodological challenges when it comes to interstitial nematodes from soils and sediments. Morphology-based surveys are greatly limited in processing speed, while barcoding approaches for nematodes are hampered by difficulties in matching sequence data with traditional taxonomy. Intermediate technology is needed to bridge the gap between the two approaches. An example of such technology is video capture and editing microscopy, which records taxonomically informative multifocal series of microscopy images as digital video clips. The integration of multifocal imaging with sequence analysis of the D2D3 region of large subunit (LSU) rDNA is illustrated here in the context of a combined morphological and barcode sequencing survey of marine nematodes from Baja California and California. The resulting video clips and sequence data are made available online in the database NemATOL (http://nematol.unh.edu/). Analyses of 37 barcoded nematodes suggest that these represent at least 32 species, none of which matches available D2D3 sequences in public databases. The recorded multifocal vouchers allowed us to identify most specimens to genus, and will be used to match specimens with subsequent species identifications and descriptions of preserved specimens. Like molecular barcodes, multifocal voucher archives are part of a wider effort at structuring and changing the process of biodiversity discovery. We argue that data-rich surveys and phylogenetic tools for analysis of barcode sequences are an essential component of the exploration of phyla with a high fraction of undiscovered species. Our methods are also directly applicable to other meiofauna such as gastrotrichs and tardigrades. PMID:16214752
Rapid micro-scale proteolysis of proteins for MALDI-MS peptide mapping using immobilized trypsin

NASA Astrophysics Data System (ADS)

Gobom, Johan; Nordhoff, Eckhard; Ekman, Rolf; Roepstorff, Peter

1997-12-01

In this study we present a rapid method for tryptic digestion of proteins using micro-columns with enzyme immobilized on perfusion chromatography media. The performance of the method is exemplified with acyl-CoA-binding protein and reduced, carbamidomethylated bovine serum albumin. Compared with in-solution and on-target digestion, the method proved significantly faster and yielded better sequence coverage and an improved signal-to-noise ratio for the MALDI-MS peptide maps. Only a single sample transfer step is required, so sample loss due to adsorption to surfaces is reduced, which is a critical issue when handling low-picomole to femtomole amounts of protein. An example is shown with on-column proteolytic digestion and subsequent elution of the digest into a reversed-phase micro-column. This is useful if the sample contains large amounts of salt or is too dilute for MALDI-MS analysis. Furthermore, by step-wise elution from the reversed-phase column, a complex digest can be fractionated, which reduces signal suppression and facilitates data interpretation in the subsequent MS analysis. The method also proved useful for consecutive digestions with enzymes of different cleavage specificity. This is exemplified with on-column tryptic digestion, followed by reversed-phase step-wise elution, and subsequent on-target V8 protease digestion.
The near demise and subsequent revival of classical genetics for investigating Caenorhabditis elegans embryogenesis: RNAi meets next-generation DNA sequencing.

PubMed

Bowerman, Bruce

2011-10-01

Molecular genetic investigation of the early Caenorhabditis elegans embryo has contributed substantially to the discovery and general understanding of the genes, pathways, and mechanisms that regulate and execute developmental and cell biological processes. Initially, worm geneticists relied exclusively on a classical genetics approach, isolating mutants with interesting phenotypes after mutagenesis and then determining the identity of the affected genes. Subsequently, the discovery of RNA interference (RNAi) led to a much greater reliance on a reverse genetics approach: reducing the function of known genes with RNAi and then observing the phenotypic consequences. Now the advent of next-generation DNA sequencing technologies and the ensuing ease and affordability of whole-genome sequencing are reviving the use of classical genetics to investigate early C. elegans embryogenesis.
[Polymorphism of KPI-A genes from plants of the subgenus Potatoe (sect. Petota, Estolonifera and Lycopersicum) and subgenus Solanum].

PubMed

Krinitsyna, A A; Mel'nikova, N V; Belenikin, M S; Poltronieri, P; Santino, A; Kudriavtseva, A V; Savilova, A M; Speranskaia, A S

2013-01-01

Kunitz-type proteinase inhibitor proteins of group A (KPI-A) are involved in the protection of potato plants from pathogens and pests. Although sequences of a large number of KPI-A genes from different species of cultivated potato (Solanum tuberosum subsp. tuberosum) and a few genes from tomato (Solanum lycopersicum) are known to date, information about the allelic diversity of these genes in other species of the genus Solanum is lacking. In our work, the consensus sequences of the KPI-A genes were established in two species of subgenus Potatoe sect. Petota (Solanum tuberosum subsp. andigenum, 5 genes; Solanum stoloniferum, 2 genes) and in subgenus Solanum (Solanum nigrum, 5 genes) by amplification, cloning, sequencing and subsequent analysis. The determined sequences of KPI-A genes were 97-100% identical to known sequences of the cultivated potato of sect. Petota (Solanum tuberosum subsp. tuberosum) and of sect. Etuberosum (S. palustre). The interspecific variability of these genes did not exceed the intraspecific variability for any of the studied species except Solanum lycopersicum. The distribution of highly variable and conserved sequences in the mature protein-encoding regions was uniform for all investigated KPI-A genes. However, our attempts to amplify homologous genes from the genomes of Solanum dulcamarum, Solanum lycopersicum and Mandragora officinarum with the same primers yielded no product. Phylogenetic analysis of KPI-A diversity showed that the sequences of S. lycopersicum form an independent cluster, whereas the KPI-A genes of S. nigrum and of species of sect. Etuberosum and sect. Petota are closely related and do not form species-specific subclusters. Although Solanum nigrum is resistant to all known races of the oomycete Phytophthora infestans, the cause of one of the economically most important diseases of solanaceous plants, the amino acid sequences encoded by its KPI-A genes differ little or not at all from those encoded in the genomes of cultivated potatoes affected by P. infestans.
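Identity values such as the 97-100% reported above are computed over a pairwise alignment. A minimal sketch, assuming the two sequences have already been aligned to equal length with "-" marking gaps: percent identity is taken as matching columns over compared columns. The function name percent_identity is hypothetical, and published tools differ in how gaps enter the denominator, so numbers from different tools are not always directly comparable.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' = gap).

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch. This is one convention among several
    (an assumption, not the method used in the study above).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # gap-gap column carries no information
        compared += 1
        if x == y:  # both '-' was excluded above, so this is a residue match
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

For example, percent_identity("ACGTACGTAC", "ACGTACGTAA") gives 90.0, and a gap opposite a base, as in "AC-T" versus "ACGT", lowers identity to 75.0.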
The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats

PubMed Central

Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine

2007-01-01

Background In Archaea and Bacteria, the repeated elements called CRISPRs, for "clustered regularly interspaced short palindromic repeats", are believed to participate in the defence against viruses. Short sequences called spacers are stored between the repeated elements. In the current model, motifs comprising spacers and repeats may target invading DNA and lead to its degradation through a proposed mechanism similar to RNA interference. Analysis of intra-species polymorphism shows that new motifs (one spacer and one repeated element) are added in a polarised fashion. Although their principal characteristics have been described, much remains to be discovered about the way CRISPRs are created and evolve. As new genome sequences become available, it appears necessary to develop automated scanning tools that make CRISPR-related information available and facilitate additional investigations. Description We have produced a program, CRISPRFinder, which identifies CRISPRs and extracts the repeated and unique sequences. Using this software, a database is constructed that is automatically updated monthly from newly released genome sequences. Additional tools were created to allow the alignment of flanking sequences in search of similarities between different loci and to build dictionaries of unique sequences. To date, almost six hundred CRISPRs have been identified in 475 published genomes. Two of thirty-seven Archaea and about half of the Bacteria do not possess a CRISPR. Fine analysis of repeated sequences strongly supports the current view that new motifs are added at one end of the CRISPR, adjacent to the putative promoter. Conclusion It is hoped that the availability of a public, regularly updated database that can be queried on the web will help in further dissecting and understanding CRISPR structure and the evolution of flanking sequences. Subsequent analyses of intra-species CRISPR polymorphism will be facilitated by CRISPRFinder and the dictionary creator. 
CRISPRdb is accessible at PMID:17521438
FDSTools: A software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise.

PubMed

Hoogenboom, Jerry; van der Gaag, Kristiaan J; de Leeuw, Rick H; Sijen, Titia; de Knijff, Peter; Laros, Jeroen F J

2017-03-01

Massively parallel sequencing (MPS) is on the verge of broad-scale application in forensic research and casework. The improved capability to analyse evidentiary traces representing unbalanced mixtures is often mentioned as one of the major advantages of this technique. However, most of the available software packages that analyse forensic short tandem repeat (STR) sequencing data are not well suited for high-throughput analysis of such mixed traces. The largest challenge is the presence of stutter artefacts in STR amplifications, which are not readily distinguished from minor contributions. FDSTools is an open-source software solution developed for this purpose. The level of stutter formation is influenced by various aspects of the sequence, such as the length of the longest uninterrupted stretch occurring in an STR. When MPS is used, STRs are evaluated as sequence variants that each have particular stutter characteristics, which can be precisely determined. FDSTools uses a database of reference samples to determine stutter and other systematic PCR or sequencing artefacts for each individual allele. In addition, stutter models are created for each repeating element in order to predict stutter artefacts for alleles that are not included in the reference set. This information is subsequently used to recognise and compensate for the noise in a sequence profile. The result is a better representation of the true composition of a sample. Using Promega PowerSeq™ Auto System data from 450 reference samples and 31 two-person mixtures, we show that the FDSTools correction module decreases stutter ratios above 20% to below 3%. Consequently, much lower levels of contribution in the mixed traces are detected. FDSTools contains modules to visualise the data in an interactive format, allowing users to filter data with their own preferred thresholds. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
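In the usual forensic definition, a stutter ratio is the read count of a stutter product divided by the read count of its parent allele. The sketch below computes that ratio and subtracts the expected n-1 (one repeat shorter) stutter from a toy profile keyed by repeat count. It assumes a single uniform ratio for every allele, which is far simpler than FDSTools' per-allele and per-repeat-element models; it is not FDSTools' API, and all names are hypothetical.

```python
from typing import Dict


def stutter_ratio(stutter_reads: int, parent_reads: int) -> float:
    """Reads at the stutter position as a fraction of the parent allele's reads."""
    if parent_reads <= 0:
        raise ValueError("parent_reads must be positive")
    return stutter_reads / parent_reads


def correct_minus_one_stutter(profile: Dict[int, int], ratio: float) -> Dict[int, float]:
    """Subtract the expected n-1 stutter of every allele from its shorter neighbour.

    profile maps repeat count -> read count. Each allele n is assumed to
    produce ratio * reads[n] spurious reads at n - 1 (hypothetical uniform model).
    """
    corrected = {allele: float(reads) for allele, reads in profile.items()}
    for allele, reads in profile.items():  # use the uncorrected counts as parents
        target = allele - 1
        if target in corrected:
            corrected[target] = max(0.0, corrected[target] - ratio * reads)
    return corrected
```

With a 12-repeat allele at 1000 reads and a ratio of 0.25, a 300-read signal at 11 repeats drops to 50 reads after correction, which is the kind of residue that could then be assessed as a genuine minor contributor.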
Contribution of reactive and proactive control to children's working memory performance: Insight from item recall durations in response sequence planning.

PubMed

Chevalier, Nicolas; James, Tiffany D; Wiebe, Sandra A; Nelson, Jennifer Mize; Espy, Kimberly Andrews

2014-07-01

The present study addressed whether developmental improvement in working memory span task performance relies upon a growing ability to proactively plan response sequences during childhood. Two hundred thirteen children completed a working memory span task in which they used a touchscreen to reproduce orally presented sequences of animal names. Children were assessed longitudinally at 7 time points between 3 and 10 years of age. Twenty-one young adults also completed the same task. Proactive response sequence planning was assessed by comparing recall durations for the 1st item (preparatory interval) and subsequent items. At preschool age, the preparatory interval was generally shorter than subsequent item recall durations, whereas it was systematically longer during elementary school and in adults. Although children mostly approached the task reactively at preschool, they proactively planned response sequences with increasing efficiency from age 7 on, like adults. These findings clarify the nature of the changes in executive control that support working memory performance with age. (PsycINFO Database Record (c) 2014 APA, all rights reserved).
Evolutionary dynamics of selfish DNA explains the abundance distribution of genomic subsequences

PubMed Central

Sheinman, Michael; Ramisch, Anna; Massip, Florian; Arndt, Peter F.

2016-01-01

Since the sequencing of large genomes, many statistical features of their sequences have been found. One intriguing feature is that certain subsequences are much more abundant than others. In fact, abundances of subsequences of a given length are distributed with a scale-free power-law tail, resembling properties of human texts, such as Zipf’s law. Despite recent efforts, an understanding of this phenomenon is still lacking. Here we find that selfish DNA elements, such as those belonging to the Alu family of repeats, dominate the power-law tail. Interestingly, for the Alu elements the power-law exponent increases with the length of the considered subsequences. Motivated by these observations, we develop a model of selfish DNA expansion. The predictions of this model qualitatively and quantitatively agree with the empirical observations. This allows us to estimate parameters for the process of selfish DNA spreading in a genome during its evolution. The obtained results shed light on how evolution of selfish DNA elements shapes non-trivial statistical properties of genomes. PMID:27488939
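The abundance distribution discussed above starts from counting all subsequences of a fixed length k (k-mers). A minimal sketch of that counting step, assuming a plain DNA string and skipping windows that contain ambiguous symbols such as N; it does not implement the authors' selfish-DNA model, only the raw counts and the tail statistic one would inspect on log-log axes for a power law. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every length-k subsequence, skipping windows with non-ACGT symbols."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def abundance_tail(counts: Counter) -> Dict[int, int]:
    """For each observed abundance a, the number of distinct k-mers seen at least a times."""
    values = sorted(counts.values())
    n = len(values)
    tail: Dict[int, int] = {}
    for idx, a in enumerate(values):
        tail.setdefault(a, n - idx)  # first index of a in sorted order -> count of values >= a
    return tail
```

For "ACGTACGT" with k = 2, AC, CG and GT each occur twice and TA once, so the tail is {1: 4, 2: 3}. On a real genome, a heavy tail in this table would be the signature described in the abstract.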
Sequence stratigraphic principles applied to the Miocene Hawthorn Group, west-central Florida

DOE Office of Scientific and Technical Information (OSTI.GOV)

Norton, V.L.; Randazzo, A.F.

1993-03-01

Sequence boundaries for the Miocene Hawthorn Group in the ROMP 20 drill core from Osprey, Sarasota County, FL were generally delineated by lithologic variations recognized from core slabs, thin section analysis, and geophysical logs. At least six depositional sequences representing third-order sea level fluctuations were identified. Depositional environments were determined on the basis of characteristic lithologic constituents, including rip-up clasts, pellets, fossils, laminations, burrows, degree of induration, and grain sorting. The sequence boundaries appear to have formed when the rate of eustatic fall exceeded basin subsidence rates, producing a relative sea level fall at a depositional shoreline break. As a result of the basinward facies shift associated with this sequence type, peritidal facies may directly overlie deeper water facies. Subaerial exposure and erosion can be expected. A succession of facies representing progressively deeper water depositional environments, followed by progressive shallowing, was present between bounding surfaces. Among the six sequences recognized, four were clearly delineated as representative of regression, subaerial exposure, and subsequent transgression. Two sequences were less clearly defined and probably represent transitional facies on which exposure surfaces had developed. Comparison of the petrologically established sequence stratigraphy with published sea level curves showed a strong correlation between the number of sequences recognized and the number of coastal onlap/offlap cycles depicted for the early to middle Miocene. This correlation suggests that petrologic examination of core slabs, with supplemental thin section data, can provide useful information regarding the recognition of stratigraphic sequences and relative sea level fluctuations, particularly in situations where seismic data may not be available.
Comparative analysis of ribosomal protein L5 sequences from bacteria of the genus Thermus.

PubMed

Jahn, O; Hartmann, R K; Boeckh, T; Erdmann, V A

1991-06-01

The genes for the ribosomal 5S rRNA binding protein L5 have been cloned from three extremely thermophilic eubacteria, Thermus flavus, Thermus thermophilus HB8 and Thermus aquaticus (Jahn et al., submitted). Genes for protein L5 from the three Thermus strains display 95% G/C in third positions of codons. Amino acid sequences deduced from the DNA sequence were shown to be identical for T. flavus and T. thermophilus, although the corresponding DNA sequences differed by two T to C transitions in the T. thermophilus gene. Protein L5 sequences from T. flavus and T. thermophilus are 95% homologous to L5 from T. aquaticus and 56.5% homologous to the corresponding E. coli sequence. The lowest degrees of homology were found between the T. flavus/T. thermophilus L5 proteins and those of yeast L16 (27.5%), Halobacterium marismortui (34.0%) and Methanococcus vannielii (36.6%). From sequence comparison it becomes clear that thermostability of Thermus L5 proteins is achieved by an increase in hydrophobic interactions and/or by restriction of steric flexibility due to the introduction of amino acids with branched aliphatic side chains, such as leucine. Alignment of the nine protein sequences equivalent to Thermus L5 proteins led to identification of a conserved internal segment, rich in acidic amino acids, which shows homology to subsequences of E. coli L18 and L25. The occurrence of conserved sequence elements in 5S rRNA binding proteins and ribosomal proteins in general is discussed in terms of evolution and function.
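The 95% G/C figure refers to the third positions of codons, a quantity often called GC3. A minimal sketch of that calculation, assuming an in-frame coding sequence whose length is a multiple of three (no handling of introns, frameshifts or ambiguous bases); the function name gc3 is illustrative.

```python
def gc3(cds: str) -> float:
    """Fraction of codons whose third position is G or C."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("expected a non-empty in-frame coding sequence")
    third_positions = cds[2::3]  # every third base, starting at the first codon's 3rd position
    return sum(base in "GC" for base in third_positions) / len(third_positions)
```

For instance, gc3("ATGGCCAAG") returns 1.0 (third positions G, C, G), while gc3("ATGGCA") returns 0.5.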
Novel and canine genotypes of Giardia duodenalis in harbor seals (Phoca vitulina richardsi).

PubMed

Gaydos, J K; Miller, W A; Johnson, C; Zornetzer, H; Melli, A; Packham, A; Jeffries, S J; Lance, M M; Conrad, P A

2008-12-01

Feces of harbor seals (Phoca vitulina richardsi) and hybrid glaucous-winged/western gulls (Larus glaucescens/occidentalis) from Washington State's inland marine waters were examined for Giardia and Cryptosporidium spp. to determine whether the genotypes carried by these wildlife species were the same genotypes that commonly infect humans and domestic animals. Using immunomagnetic separation followed by direct fluorescent antibody detection, Giardia spp. cysts were detected in 42% of seal fecal samples (41/97). Giardia-positive samples came from 90% of the sites (9/10), and the prevalence of positive seal fecal samples differed significantly among study sites. Fecal samples collected from seal haulout sites with over 400 animals were 4.7 times more likely to have Giardia spp. cysts than samples collected at smaller haulout sites. In gulls, a single Giardia sp. cyst was detected in 4% of fecal samples (3/78). Cryptosporidium spp. oocysts were not detected in any of the seals or gulls tested. Sequence analysis of a 398 bp segment of G. duodenalis DNA at the glutamate dehydrogenase locus suggested that 11 isolates originating from seals throughout the region were a novel genotype and 3 isolates obtained from a single site in south Puget Sound were the G. duodenalis canine genotype D. Real-time TaqMan PCR amplification and subsequent sequencing of a 52 bp small subunit ribosomal DNA region from novel harbor seal genotype isolates showed sequence homology to canine genotypes C and D. Sequence analysis of the 52 bp small subunit ribosomal DNA products from the 3 canine genotype isolates from seals produced mixed sequences that could not be evaluated.
Genome Sequencing and Analysis of the Biomass-Degrading Fungus Trichoderma reesei (syn. Hypocrea jecorina)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Martinez, Antonio D.; Berka, Randy; Henrissat, Bernard

2008-05-01

A major thrust of the white biotechnology movement involves the development of enzyme systems that depolymerize biomass to simple sugars, which are subsequently converted to sustainable biofuels (e.g., ethanol) and chemical intermediates. The fungus Trichoderma reesei (syn. Hypocrea jecorina) represents a paradigm for the industrial production of highly efficient cellulases and hemicellulases needed for hydrolysis of biomass polysaccharides. Herein we describe intriguing attributes of the T. reesei genome in relation to the future of fuel biotechnology. The T. reesei genome sequence was derived using a whole genome shotgun approach combined with finishing work to generate an assembly comprising 89 scaffolds totaling 34 Mbp with few gaps. In total, 9,130 gene models were predicted using a combination of ab initio and sequence similarity-based methods and EST data. Considering the industrial utility and effectiveness of its enzymes, the T. reesei genome surprisingly encodes the fewest cellulases and hemicellulases of any sequenced fungus able to hydrolyze plant cell wall polysaccharides. Many genes encoding carbohydrate-active enzymes are distributed non-randomly in groups or clusters that, interestingly, lie between regions of synteny with other Sordariomycetes. Additionally, the T. reesei genome contains a multitude of genes encoding biosynthetic pathways for secondary metabolites (possibly antibacterial and antifungal compounds), which may promote successful competition and survival in the crowded and competitive soil habitat occupied by T. reesei. Our analysis, coupled with the availability of genome sequence data, provides a roadmap for the construction of enhanced T. reesei strains for industrial applications.
A segmentation method for lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise

PubMed Central

Zhang, Wei; Zhang, Xiaolong; Qiang, Yan; Tian, Qi; Tang, Xiaoxian

2017-01-01

The fast and accurate segmentation of lung nodule image sequences is the basis of subsequent processing and diagnostic analyses. However, previously published nodule segmentation algorithms cannot completely segment cavitary nodules, and the segmentation of juxta-vascular nodules is inaccurate and inefficient. To solve these problems, we propose a new method for the segmentation of lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise (DBSCAN). First, our method uses three-dimensional computed tomography image features of the average intensity projection combined with multi-scale dot enhancement for preprocessing. Hexagonal clustering and morphological optimized sequential linear iterative clustering (HMSLIC) for sequence image oversegmentation is then proposed to obtain superpixel blocks. An adaptive weight coefficient is then constructed to calculate the distance between superpixels, in order to position lung nodules precisely and to obtain the starting block for subsequent clustering. Moreover, by fitting the distance and detecting the change in slope, an accurate clustering threshold is obtained. Thereafter, a fast DBSCAN superpixel sequence clustering algorithm, optimized by clustering only the lung nodules and by the adaptive threshold, is used to obtain lung nodule mask sequences. Finally, the lung nodule image sequences are obtained. The experimental results show that our method rapidly, completely and accurately segments various types of lung nodule image sequences. PMID:28880916
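For readers unfamiliar with DBSCAN, the textbook algorithm is sketched here on plain 2-D points with Euclidean distance. This is not the authors' superpixel-level method: the adaptive weight coefficient, the slope-based threshold selection and the restriction to nodule regions are not modelled. eps (neighbourhood radius) and min_pts (minimum neighbours, counting the point itself, for a core point) are the standard DBSCAN parameters; all function names are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx], including itself."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster label (0, 1, ...) or NOISE for every point."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep growing
    return [NOISE if lab is None else lab for lab in labels]
```

On the points (0,0), (0,1), (1,0), (10,10), (10,11), (11,10), (50,50) with eps = 1.5 and min_pts = 3, this yields [0, 0, 0, 1, 1, 1, -1]: two dense groups and one isolated noise point. The naive neighbour search is O(n^2); speed-ups such as the one the abstract describes come from limiting which elements are clustered and how the threshold is chosen.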
The seismic stratigraphy of Okanagan Lake, British Columbia; a record of rapid deglaciation in a deep 'fiord-lake' basin

NASA Astrophysics Data System (ADS)

Eyles, Nicholas; Mullins, Henry T.; Hine, Albert C.

1991-09-01

This paper presents the first detailed data regarding the newly discovered deep infill of Okanagan Lake. Okanagan Lake (50°00'N, 119°30'W) is 120 km long, ~3-5 km wide and occupies a glacially overdeepened bedrock basin in the southern interior of British Columbia. This basin, and other elongate lakes of the region (e.g. Shuswap, Kootenay, Kalamalka, Canim and Mahood lakes), mark the site of westward flowing ice streams within successive Cordilleran ice sheets. An air gun seismic survey of Okanagan Lake shows that the bedrock floor is nearly 650 m below sea-level, more than 2000 m below the rim of the surrounding plateau. The maximum thickness of Pleistocene sediment in Okanagan Lake basin approaches 800 m. Forty-six seismic reflection traverses and an axial profile show a relatively simple stratigraphy composed of three seismic sequences argued to be no older than the last glacial cycle (< 30 ka). A discontinuous basal unit (sequence I), characterized by large-scale diffractions and up to 460 m thick, infills the narrow, V-shaped bedrock floor of the basin and is interpreted as a boulder gravel deposited by subglacial meltwaters. The overlying seismic sequence II is composed of two sub-sequences. Sub-sequence IIa is a chaotic to massive facies up to 736 m thick. Lakeshore exposures close to where this unit reaches lake level show deformed and chaotically bedded glaciolacustrine silts containing gravel lenses and large ice-rafted boulders. The surface topography of this sub-sequence is irregular and generally mimics the form of the underlying bedrock as a result of compaction. This sequence passes laterally into stratified facies (sub-sequence IIb) at the northern end of the basin. Seismic sequence II appears to record rapid ice-proximal dumping of glaciolacustrine silt as the Okanagan glacier backwasted upvalley in a deep lake. A thin (60 m max.) laminated seismic sequence (III) drapes the hummocky surface of sequence II and represents postglacial sedimentation from fan-deltas. The extreme thickness of sequences I and II in Okanagan Lake reflects the focussing of large volumes of meltwater and sediment into the basin during deglaciation; sediments that pre-date the last glacial cycle appear to have been completely eroded. Glaciological conditions during sedimentation may have been similar to those of marine-based outlet glaciers calving in deep water in fiord basins. In contrast to marine settings, where icebergs are free to disperse, large volumes of dead ice were trapped within the basin; structural evidence for sedimentation around dead ice blocks has previously been used to argue that the Cordilleran Ice Sheet downwasted in situ. We emphasize, in contrast, the trapping of dead ice left behind by rapidly calving lake-based outlet glaciers.
Exome-wide DNA capture and next generation sequencing in domestic and wild species.

PubMed

Cosart, Ted; Beja-Pereira, Albano; Chen, Shanyuan; Ng, Sarah B; Shendure, Jay; Luikart, Gordon

2011-07-05

Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses. We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus) to capture (enrich for), and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison). Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. We successfully sequenced and mapped exon sequences from across the 29 autosomes and X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes, in all three individuals, including hundreds of SNPs in our targeted candidate genes. This study shows exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.
RNA regulators responding to ribosomal protein S15 are frequent in sequence space

PubMed Central

Slinger, Betty L.; Meyer, Michelle M.

2016-01-01

There are several natural examples of distinct RNA structures that interact with the same ligand to regulate the expression of homologous genes in different organisms. One essential question regarding this phenomenon is whether such RNA regulators are the result of convergent or divergent evolution. Are the RNAs derived from some common ancestor and diverged to the point where we cannot identify the similarity, or have multiple solutions to the same biological problem arisen independently? A key variable in assessing these alternatives is how frequently such regulators arise within sequence space. Ribosomal protein S15 is autogenously regulated via an RNA regulator in many bacterial species; four apparently distinct regulators have been functionally validated in different bacterial phyla. Here, we explore how frequently such regulators arise within a partially randomized sequence population. We find many RNAs that interact specifically with ribosomal protein S15 from Geobacillus kaustophilus with biologically relevant dissociation constants. Furthermore, of the six sequences we characterize, four show regulatory activity in an Escherichia coli reporter assay. Subsequent footprinting and mutagenesis analysis indicates that protein binding proximal to regulatory features such as the Shine–Dalgarno sequence is sufficient to enable regulation, suggesting that regulation in response to S15 is relatively easily acquired. PMID:27580716
Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing.

PubMed

Hu, Jiazhi; Meyers, Robin M; Dong, Junchao; Panchakshari, Rohit A; Alt, Frederick W; Frock, Richard L

2016-05-01

Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those evaluating the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide translocation sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation, in cultured mammalian cells, to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina MiSeq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement, with a turnaround time of <1 week.
Mosaic CREBBP mutation causes overlapping clinical features of Rubinstein–Taybi and Filippi syndromes

PubMed Central

de Vries, Tamar I; R Monroe, Glen; van Belzen, Martine J; van der Lans, Christian A; Savelberg, Sanne MC; Newman, William G; van Haaften, Gijs; Nievelstein, Rutger A; van Haelst, Mieke M

2016-01-01

Rubinstein–Taybi syndrome (RTS, OMIM 180849) and Filippi syndrome (FLPIS, OMIM 272440) are both rare syndromes with multiple congenital anomalies and intellectual deficit (MCA/ID). We present a patient with intellectual deficit, short stature, bilateral syndactyly of hands and feet, broad thumbs, ocular abnormalities, and dysmorphic facial features. These clinical features suggest both RTS and FLPIS. Initial analysis of DNA isolated from blood did not identify variants confirming either of these diagnoses. Whole-exome sequencing identified a homozygous variant in C9orf173, which was novel at the time of analysis. However, further Sanger sequencing analysis of FLPIS cases that had tested negative for CKAP2L variants did not reveal any further variants. Subsequent analysis using DNA isolated from buccal mucosa revealed a mosaic variant in CREBBP. This report highlights the importance of excluding mosaic variants in patients with a strong but atypical clinical presentation of an MCA/ID syndrome if no disease-causing variants can be detected in DNA isolated from blood samples. As the striking syndactyly observed in the present case is typical of FLPIS, we suggest CREBBP analysis in saliva samples for FLPIS cases in which no causal CKAP2L variant is detected. PMID:26956253

Genome Editing in Human Pluripotent Stem Cells.

PubMed

Carlson-Stevermer, Jared; Saha, Krishanu

2017-01-01

Genome editing in human pluripotent stem cells (hPSCs) enables the generation of reporter lines and knockout cell lines. Zinc finger nucleases, transcription activator-like effector nucleases (TALENs), and CRISPR/Cas9 technology have recently increased the efficiency of proper gene editing by creating double strand breaks (DSB) at defined sequences in the human genome. These systems typically use plasmids to transiently transcribe nucleases within the cell. Here, we describe the process for preparing hPSCs for transient expression of nucleases via electroporation and subsequent analysis to create genetically modified stem cell lines.
Haemophilus parainfluenzae urethritis among homosexual men.

PubMed

Hsu, Meng-Shiuan; Wu, Mei-Yu; Lin, Tsui-Hsien; Liao, Chun-Hsing

2015-08-01

Haemophilus parainfluenzae is a common inhabitant of the human upper respiratory tract and part of the normal oral microflora. We report three men who had unprotected sex with men (MSM) and subsequently acquired H. parainfluenzae urethritis, which was confirmed by 16S rRNA gene sequencing analysis. Two men were treated with ceftriaxone and doxycycline, and the third man was treated with clarithromycin. All three patients responded to treatment. This case series highlights the potential role of H. parainfluenzae as a sexually transmitted genitourinary pathogen. Copyright © 2012. Published by Elsevier B.V.
Use of Landsat Thematic Mapper images in regional correlation of syntectonic strata, Colorado river extensional corridor, California and Arizona

NASA Technical Reports Server (NTRS)

Beratan, K. K.; Blom, R. G.; Crippen, R. E.; Nielson, J. E.

1990-01-01

Enhanced Landsat TM images were used in conjunction with field work to investigate the regional correlation of Miocene rocks in the Colorado River extensional corridor of California and Arizona. Based on field investigations, four sequences of sedimentary and volcanic strata could be recognized in the Mohave Mountains (Arizona) and the eastern Whipple Mountains (California), which display significantly different relative volumes and organization of lithologies. The four sequences were also found to have distinctive appearances on the TM image. The recognition criteria derived from field mapping and image interpretation in the Mohave Mountains and Whipple Mountains were applied to an adjacent area in which stratigraphic affinities were less well known. The results of subsequent field work confirmed the stratigraphic and structural relations suggested by the TM image analysis.
Violacein-producing Collimonas sp. from the sea surface microlayer of coastal waters in Trøndelag, Norway.

PubMed

Hakvåg, Sigrid; Fjaervik, Espen; Klinkenberg, Geir; Borgos, Sven Even F; Josefsen, Kjell D; Ellingsen, Trond E; Zotchev, Sergey B

2009-11-12

A new strain belonging to the genus Collimonas was isolated from the sea surface microlayer off the coast of Trøndelag, Norway. The bacterium, designated Collimonas CT, produced an antibacterial compound active against Micrococcus luteus. Subsequent studies using LC-MS identified this antibacterial compound as violacein, known to be produced by several marine-derived bacteria. Fragments of the violacein biosynthesis genes vioA and vioB were amplified by PCR from the Collimonas CT genome and sequenced. Phylogenetic analysis of these sequences demonstrated close relatedness of the Collimonas CT violacein biosynthetic gene cluster to those in Janthinobacterium lividum and Duganella sp., suggesting relatively recent horizontal gene transfer. Given the diverse biological activities of violacein, Collimonas CT should be studied further as a potential producer of this compound.
Metallo-β-lactamase-producing Pseudomonas aeruginosa in the Netherlands: the nationwide emergence of a single sequence type.

PubMed

Van der Bij, A K; Van der Zwan, D; Peirano, G; Severin, J A; Pitout, J D D; Van Westreenen, M; Goessens, W H F

2012-09-01

Recently, the first outbreak of clonally related VIM-2 metallo-β-lactamase (MBL)-producing Pseudomonas aeruginosa in a Dutch tertiary-care centre was described. Subsequently, a nationwide surveillance study was performed in 2010-2011, which identified the presence of VIM-2 MBL-producing P. aeruginosa in 11 different hospitals. Genotyping by multiple-locus variable-number tandem-repeat analysis (MLVA) showed that the majority of the 82 MBL-producing isolates found belonged to a single MLVA type (n = 70, 85%), identified as ST111 by multilocus sequence typing (MLST). As MBL-producing isolates cause serious infections that are difficult to treat, the presence of clonally related isolates in various hospitals throughout the Netherlands is of nationwide concern. © 2012 The Authors. Clinical Microbiology and Infection © 2012 European Society of Clinical Microbiology and Infectious Diseases.
Violacein-Producing Collimonas sp. from the Sea Surface Microlayer of Coastal Waters in Trøndelag, Norway

PubMed Central

Hakvåg, Sigrid; Fjærvik, Espen; Klinkenberg, Geir; Borgos, Sven Even F.; Josefsen, Kjell D.; Ellingsen, Trond E.; Zotchev, Sergey B.

2009-01-01

A new strain belonging to the genus Collimonas was isolated from the sea surface microlayer off the coast of Trøndelag, Norway. The bacterium, designated Collimonas CT, produced an antibacterial compound active against Micrococcus luteus. Subsequent studies using LC-MS identified this antibacterial compound as violacein, known to be produced by several marine-derived bacteria. Fragments of the violacein biosynthesis genes vioA and vioB were amplified by PCR from the Collimonas CT genome and sequenced. Phylogenetic analysis of these sequences demonstrated close relatedness of the Collimonas CT violacein biosynthetic gene cluster to those in Janthinobacterium lividum and Duganella sp., suggesting relatively recent horizontal gene transfer. Given the diverse biological activities of violacein, Collimonas CT should be studied further as a potential producer of this compound. PMID:20098599
Autosomal-dominant non-autoimmune hyperthyroidism presenting with neuromuscular symptoms.

PubMed

Elgadi, Aziz; Arvidsson, C-G; Janson, Annika; Marcus, Claude; Costagliola, Sabine; Norgren, Svante

2005-08-01

Neuromuscular presentations are common in thyroid disease, although the mechanism is unclear. In the present study, we investigated the pathogenesis in a boy with autosomal-dominant hyperthyroidism presenting with neuromuscular symptoms. The TSHr gene was investigated by direct sequencing. Functional properties of the mutant TSHr were investigated during transient expression in COS-7 cells. Family members were investigated by clinical and biochemical examinations. Sequence analysis revealed a previously reported heterozygous missense mutation Glycine 431 for Serine in the first transmembrane segment, leading to an increased specific constitutive activity. Three additional affected family members carried the same mutation. There was no indication of autoimmune disorder. All symptoms disappeared upon treatment with thacapzol and L-thyroxine and subsequent subtotal thyroidectomy. The data imply that neuromuscular symptoms can be caused by excessive thyroid hormone levels rather than by autoimmunity.
Combining stress transfer and source directivity: the case of the 2012 Emilia seismic sequence

PubMed Central

Convertito, Vincenzo; Catalli, Flaminia; Emolo, Antonio

2013-01-01

The Emilia seismic sequence (Northern Italy) started in May 2012 and caused 17 casualties, severe damage to dwellings, and the closure of several factories. The total number of events recorded in one month was about 2100, with local magnitude ranging between 1.0 and 5.9. We investigate potential mechanisms (static and dynamic triggering) that may describe the evolution of the sequence. We consider rupture directivity in the dynamic strain field and observe that, for each main earthquake, its aftershocks and the subsequent large event occurred in an area characterized by higher dynamic strains and corresponding to the dominant rupture direction. We find that static stress redistribution alone is not capable of explaining the locations of subsequent events. We conclude that dynamic triggering played a significant role in driving the sequence. This triggering was also associated with a variation in permeability and a pore pressure increase in an area characterized by a massive presence of fluids. PMID:24177982
Transcriptome Sequencing, and Rapid Development and Application of SNP Markers for the Legume Pod Borer Maruca vitrata (Lepidoptera: Crambidae)

PubMed Central

Margam, Venu M.; Coates, Brad S.; Bayles, Darrell O.; Hellmich, Richard L.; Agunbiade, Tolulope; Seufferheld, Manfredo J.; Sun, Weilin; Kroemer, Jeremy A.; Ba, Malick N.; Binso-Dabire, Clementine L.; Baoua, Ibrahim; Ishiyaku, Mohammad F.; Covas, Fernando G.; Srinivasan, Ramasamy; Armstrong, Joel; Murdock, Larry L.; Pittendrigh, Barry R.

2011-01-01

The legume pod borer, Maruca vitrata (Lepidoptera: Crambidae), is an insect pest species of crops grown by subsistence farmers in tropical regions of Africa. We present the de novo assembly of 3729 contigs from 454- and Sanger-derived sequencing reads for midgut, salivary, and whole adult tissues of this non-model species. Functional annotation predicted that 1320 M. vitrata protein coding genes are present, of which 631 have orthologs within the Bombyx mori gene model. A homology-based analysis assigned M. vitrata genes into a group of paralogs, but these were subsequently partitioned into putative orthologs following phylogenetic analyses. Following sequence quality filtering, a total of 1542 putative single nucleotide polymorphisms (SNPs) were predicted within M. vitrata contig assemblies. Seventy-one of 1078 designed molecular genetic markers were used to screen M. vitrata samples from five collection sites in West Africa. Population substructure may be present, which would have significant implications for insect resistance management recommendations pertaining to the release of biological control agents or transgenic cowpea that express Bacillus thuringiensis crystal toxins. Mutation data derived from transcriptome sequencing are an expeditious and economical source of genetic markers that allow evaluation of ecological differentiation. PMID:21754987
GDAP: a web tool for genome-wide protein disulfide bond prediction.

PubMed

O'Connor, Brian D; Yeates, Todd O

2004-07-01

The Genomic Disulfide Analysis Program (GDAP) provides web access to computationally predicted protein disulfide bonds for over one hundred microbial genomes, including both bacterial and archaeal species. In the GDAP process, sequences of unknown structure are mapped, when possible, to known homologous Protein Data Bank (PDB) structures, after which specific distance criteria are applied to predict disulfide bonds. GDAP also accepts user-supplied protein sequences and subsequently queries the PDB sequence database for the best matches, scans for possible disulfide bonds and returns the results to the client. These predictions are useful for a variety of applications and have previously been used to show a dramatic preference in certain thermophilic archaea and bacteria for disulfide bonds within intracellular proteins. Given the central role these stabilizing, covalent bonds play in such organisms, the predictions available from GDAP provide a rich data source for designing site-directed mutants with more stable thermal profiles. The GDAP web application is a gateway to this information and can be used to understand the role disulfide bonds play in protein stability both in these unusual organisms and in sequences of interest to the individual researcher. The prediction server can be accessed at http://www.doe-mbi.ucla.edu/Services/GDAP.
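The abstract does not state which distance criteria GDAP applies. Purely as an illustration of the idea, the sketch below flags cysteine pairs whose C-beta atoms fall inside an assumed distance window; the 3.4 to 4.5 Å bounds, the `candidate_disulfides` function and its input format are hypothetical and are not GDAP's actual rules.

```python
from itertools import combinations
from math import dist


def candidate_disulfides(cys_cb, lo=3.4, hi=4.5):
    """Return (res1, res2, distance) for cysteine pairs within [lo, hi] angstroms.

    cys_cb maps a residue number to the (x, y, z) coordinates of that
    cysteine's C-beta atom, e.g. taken from a homologous PDB structure.
    The default window is an illustrative assumption, not GDAP's criterion.
    """
    pairs = []
    for (r1, p1), (r2, p2) in combinations(sorted(cys_cb.items()), 2):
        d = dist(p1, p2)  # Euclidean distance between the two C-beta atoms
        if lo <= d <= hi:
            pairs.append((r1, r2, round(d, 2)))
    return pairs


# Hypothetical coordinates for three cysteines in a modelled structure.
cysteines = {12: (0.0, 0.0, 0.0), 45: (3.8, 0.0, 0.0), 80: (20.0, 0.0, 0.0)}
print(candidate_disulfides(cysteines))  # [(12, 45, 3.8)]
```

A real predictor would typically also check side-chain geometry (angles, S-S distance) rather than a single distance window.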
"Is It Worth Knowing?" Focus Group Participants' Perceived Utility of Genomic Preconception Carrier Screening.

PubMed

Schneider, Jennifer L; Goddard, Katrina A B; Davis, James; Wilfond, Benjamin; Kauffman, Tia L; Reiss, Jacob A; Gilmore, Marian; Himes, Patricia; Lynch, Frances L; Leo, Michael C; McMullen, Carmit

2016-02-01

As genome sequencing technology advances, research is needed to guide decision-making about what results can or should be offered to patients in different clinical settings. We conducted three focus groups with individuals who had prior preconception genetic testing experience to explore perceived advantages and disadvantages of genome sequencing for preconception carrier screening, compared to usual care. Using a discussion guide, a trained qualitative moderator facilitated the audio-recorded focus groups. Sixteen individuals participated. Thematic analysis of transcripts started with a grounded approach and subsequently focused on participants' perceptions of the value of genetic information. Analysis uncovered two orientations toward genomic preconception carrier screening: "certain" individuals desiring all possible screening information; and "hesitant" individuals who were more cautious about its value. Participants revealed valuable information about barriers to screening: fear/anxiety about results; concerns about the method of returning results; concerns about screening necessity; and concerns about partner participation. All participants recommended offering choice to patients to enhance the value of screening and reduce barriers. Overall, two groups of likely users of genome sequencing for preconception carrier screening demonstrated different perceptions of the advantages or disadvantages of screening, suggesting tailored approaches to education, consent, and counseling may be warranted with each group.
Rapid identification of causal mutations in tomato EMS populations via mapping-by-sequencing.

PubMed

Garcia, Virginie; Bres, Cécile; Just, Daniel; Fernandez, Lucie; Tai, Fabienne Wong Jun; Mauxion, Jean-Philippe; Le Paslier, Marie-Christine; Bérard, Aurélie; Brunel, Dominique; Aoki, Koh; Alseekh, Saleh; Fernie, Alisdair R; Fraser, Paul D; Rothan, Christophe

2016-12-01

The tomato is the model species of choice for fleshy fruit development and for the Solanaceae family. Ethyl methanesulfonate (EMS) mutants of tomato have already proven their utility for analysis of gene function in plants, leading to improved breeding stocks and superior tomato varieties. However, until recently, the identification of causal mutations that underlie particular phenotypes has been a very lengthy task that many laboratories could not afford because of spatial and technical limitations. Here, we describe a simple protocol for identifying causal mutations in tomato using a mapping-by-sequencing strategy. Plants displaying phenotypes of interest are first isolated by screening an EMS mutant collection generated in the miniature cultivar Micro-Tom. A recombinant F2 population is then produced by crossing the mutant with a wild-type (WT; non-mutagenized) genotype, and F2 segregants displaying the same phenotype are subsequently pooled. Finally, whole-genome sequencing and analysis of allele distributions in the pools allow for the identification of the causal mutation. The whole process, from the isolation of the tomato mutant to the identification of the causal mutation, takes 6-12 months. This strategy overcomes many previous limitations, is simple to use and can be applied in most laboratories with limited facilities for plant culture and genotyping.
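The allele-distribution step can be illustrated under one explicit assumption: the mutation is recessive, so every F2 segregant in the phenotype pool is homozygous for the causal allele and its read frequency approaches 1, while unlinked EMS-induced mutations stay near 0.5. The sketch below filters hypothetical per-SNP read counts on that basis; the `Snp` record and the depth and frequency thresholds are illustrative choices, not the protocol's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    chrom: str
    pos: int
    alt_reads: int    # pooled reads carrying the EMS-induced (mutant) allele
    total_reads: int  # total pooled read depth at this position


def mutant_allele_frequency(snp: Snp) -> float:
    """Fraction of pooled reads that carry the mutant allele."""
    return snp.alt_reads / snp.total_reads if snp.total_reads else 0.0


def candidate_causal_snps(snps, min_depth=10, min_freq=0.95):
    """Keep well-covered SNPs whose mutant allele is (nearly) fixed in the pool.

    Assumes a recessive causal mutation; unlinked mutations are expected
    near 0.5 and linked ones rise toward 1 around the causal site.
    """
    hits = [
        s for s in snps
        if s.total_reads >= min_depth and mutant_allele_frequency(s) >= min_freq
    ]
    return sorted(hits, key=mutant_allele_frequency, reverse=True)


# Hypothetical pooled read counts.
pool = [
    Snp("chr3", 1_200_000, 48, 50),  # nearly fixed: candidate
    Snp("chr3", 1_450_000, 30, 40),  # linked but recombining
    Snp("chr7", 5_000_000, 26, 51),  # about 0.5: unlinked
    Snp("chr9", 700_000, 4, 4),      # fixed but too shallow to trust
]
for s in candidate_causal_snps(pool):
    print(s.chrom, s.pos, round(mutant_allele_frequency(s), 2))  # chr3 1200000 0.96
```

In practice allele frequencies are usually plotted along each chromosome, since the peak region, not a single SNP, localizes the causal mutation.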
The Oxidosqualene Cyclase from the Oomycete Saprolegnia parasitica Synthesizes Lanosterol as a Single Product

PubMed Central

Dahlin, Paul; Srivastava, Vaibhav; Bulone, Vincent; McKee, Lauren S.

2016-01-01

The first committed step of sterol biosynthesis is the cyclisation of 2,3-oxidosqualene to form either lanosterol (LA) or cycloartenol (CA). This is catalyzed by an oxidosqualene cyclase (OSC). LA and CA are subsequently converted into various sterols by a series of enzyme reactions. The specificity of the OSC therefore determines the final composition of the end sterols of an organism. Despite the functional importance of OSCs, the determinants of their specificity are not well understood. In sterol-synthesizing oomycetes, recent bioinformatic and metabolite analyses suggest that LA is produced. However, this catalytic activity has never been experimentally demonstrated. Here, we show that the OSC of the oomycete Saprolegnia parasitica, a severe pathogen of salmonid fish, has an uncommon sequence in a conserved motif important for specificity. We present phylogenetic analysis revealing that this sequence is common to sterol-synthesizing oomycetes, as well as some plants, and hypothesize as to the evolutionary origin of some microbial sequences. We also demonstrate for the first time that a recombinant form of the OSC from S. parasitica produces LA exclusively. Our data pave the way for a detailed structural characterization of the protein and the possible development of specific inhibitors of oomycete OSCs for disease control in aquaculture. PMID:27881978
Use of Life Course Work–Family Profiles to Predict Mortality Risk Among US Women

PubMed Central

Guevara, Ivan Mejía; Glymour, M. Maria; Berkman, Lisa F.

2015-01-01

Objectives. We examined relationships between US women’s exposure to midlife work–family demands and subsequent mortality risk. Methods. We used data from women born 1935 to 1956 in the Health and Retirement Study to calculate employment, marital, and parenthood statuses for each age between 16 and 50 years. We used sequence analysis to identify 7 prototypical work–family trajectories. We calculated age-standardized mortality rates and hazard ratios (HRs) for mortality associated with work–family sequences, with adjustment for covariates and potentially explanatory later-life factors. Results. Married women staying home with children briefly before reentering the workforce had the lowest mortality rates. In comparison, after adjustment for age, race/ethnicity, and education, HRs for mortality were 2.14 (95% confidence interval [CI] = 1.58, 2.90) among single nonworking mothers, 1.48 (95% CI = 1.06, 1.98) among single working mothers, and 1.36 (95% CI = 1.02, 1.80) among married nonworking mothers. Adjustment for later-life behavioral and economic factors partially attenuated risks. Conclusions. Sequence analysis is a promising exposure assessment tool for life course research. This method permitted identification of certain lifetime work–family profiles associated with mortality risk before age 75 years. PMID:25713976
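The abstract does not say how the work–family trajectories were compared. A widely used technique in life-course sequence analysis is optimal matching, an edit distance between year-by-year state sequences whose pairwise values are then clustered into prototypical trajectories. The minimal sketch below assumes constant costs (indel = 1, substitution = 2) and hypothetical state labels; it is not necessarily the authors' method.

```python
def optimal_matching(a, b, indel=1.0, sub=2.0):
    """Optimal-matching (edit) distance between two state sequences.

    indel: cost of inserting or deleting one year's state (assumed)
    sub:   constant cost of substituting one state for another (assumed)
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,         # delete a's state
                          d[i][j - 1] + indel,         # insert b's state
                          d[i - 1][j - 1] + cost)      # match or substitute
    return d[n][m]


# Hypothetical yearly states (one entry per age) for two women.
woman_a = ["single", "single", "married", "married+child", "married+child+work"]
woman_b = ["single", "married", "married", "married+child", "married+child"]
print(optimal_matching(woman_a, woman_b))  # 4.0
```

Clustering the full pairwise distance matrix into a fixed number of groups (for example by hierarchical clustering) is a separate step not shown here.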
The Oxidosqualene Cyclase from the Oomycete Saprolegnia parasitica Synthesizes Lanosterol as a Single Product.

PubMed

Dahlin, Paul; Srivastava, Vaibhav; Bulone, Vincent; McKee, Lauren S

2016-01-01

The first committed step of sterol biosynthesis is the cyclisation of 2,3-oxidosqualene to form either lanosterol (LA) or cycloartenol (CA). This is catalyzed by an oxidosqualene cyclase (OSC). LA and CA are subsequently converted into various sterols by a series of enzyme reactions. The specificity of the OSC therefore determines the final composition of the end sterols of an organism. Despite the functional importance of OSCs, the determinants of their specificity are not well understood. In sterol-synthesizing oomycetes, recent bioinformatic and metabolite analyses suggest that LA is produced. However, this catalytic activity has never been experimentally demonstrated. Here, we show that the OSC of the oomycete Saprolegnia parasitica, a severe pathogen of salmonid fish, has an uncommon sequence in a conserved motif important for specificity. We present phylogenetic analysis revealing that this sequence is common to sterol-synthesizing oomycetes, as well as some plants, and hypothesize as to the evolutionary origin of some microbial sequences. We also demonstrate for the first time that a recombinant form of the OSC from S. parasitica produces LA exclusively. Our data pave the way for a detailed structural characterization of the protein and the possible development of specific inhibitors of oomycete OSCs for disease control in aquaculture.
SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects.

PubMed

Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice

2011-05-05

High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
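The diversity indices named in the SNiPlay description can be computed from an alignment with the standard textbook formulas: nucleotide diversity (pi, the mean number of pairwise differences), Watterson's theta (segregating sites divided by a1) and Tajima's D, which compares the two. The sketch below is not SNiPlay's code; it assumes gap-free, equal-length sequences and reports per-sequence rather than per-site values.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences (pi) over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)


def wattersons_theta(seqs):
    """Segregating sites divided by a1 = sum of 1/i for i = 1 .. n-1."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1


def tajimas_d(seqs):
    """Tajima's D; undefined (NaN) when there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (nucleotide_diversity(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment of four haplotypes with two segregating sites.
alignment = ["ACGTACGT", "ACGTACGA", "ACTTACGT", "ACGTACGT"]
print(segregating_sites(alignment), nucleotide_diversity(alignment))  # 2 1.0
print(round(wattersons_theta(alignment), 3), round(tajimas_d(alignment), 3))
```

A negative D, as in this toy case, means pi is smaller than Watterson's theta, i.e. an excess of rare variants relative to neutral expectations.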
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

PubMed Central

Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

2015-01-01

Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465
Weighted LCS

NASA Astrophysics Data System (ADS)

Amir, Amihood; Gotthilf, Zvi; Shalom, B. Riva

The Longest Common Subsequence (LCS) of two strings A and B is a well studied problem having a wide range of applications. When each symbol of the input strings is assigned a positive weight, the problem becomes the Heaviest Common Subsequence (HCS) problem. In this paper we consider a different version of weighted LCS on Position Weight Matrices (PWM). The Position Weight Matrix was introduced as a tool to handle a set of sequences that are not identical yet have many local similarities. Such a weighted sequence is a 'statistical image' of this set, where we are given the probability of every symbol's occurrence at every text location. We consider two possible definitions of LCS on PWM. For the first, we solve the weighted LCS problem of z sequences in time O(zn^{z+1}). For the second, we prove NP-hardness and provide an approximation algorithm.
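As a baseline for the weighted variants above, the Heaviest Common Subsequence of two plain strings with positive symbol weights can be computed by the usual O(nm) LCS dynamic program, adding the matched symbol's weight instead of 1. The sketch below covers only this HCS baseline; it does not implement the paper's PWM-based definitions, and the weight table is a hypothetical example.

```python
def heaviest_common_subsequence(a, b, weight):
    """Return (total weight, one heaviest common subsequence) of strings a and b.

    weight maps each symbol to a positive weight; with all weights equal
    to 1 this reduces to the classic LCS length.
    """
    n, m = len(a), len(b)
    # h[i][j] = heaviest common subsequence weight of a[:i] and b[:j]
    h = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(h[i - 1][j], h[i][j - 1])
            if a[i - 1] == b[j - 1]:
                best = max(best, h[i - 1][j - 1] + weight[a[i - 1]])
            h[i][j] = best
    # Trace back one optimal subsequence from the bottom-right cell.
    i, j, out = n, m, []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1] and h[i][j] == h[i - 1][j - 1] + weight[a[i - 1]]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif h[i - 1][j] >= h[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return h[n][m], "".join(reversed(out))


# Hypothetical weights that make the single shared 'C' very valuable.
weights = {"A": 1, "B": 1, "C": 5, "D": 1}
print(heaviest_common_subsequence("ABCBDAB", "BDCABA", weights))
```

Here the heaviest common subsequence has total weight 8 and contains the `C`, whereas with uniform weights the same strings give the familiar LCS length of 4.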
Movement initiation-locked activity of the anterior putamen predicts future movement instability in periodic bimanual movement.

PubMed

Aramaki, Yu; Haruno, Masahiko; Osu, Rieko; Sadato, Norihiro

2011-07-06

In periodic bimanual movements, anti-phase-coordinated patterns often change into in-phase patterns suddenly and involuntarily. Because behavior in the initial period of a sequence of cycles often does not show any obvious errors, it is difficult to predict subsequent movement errors in the later period of the cyclical sequence. Here, we evaluated performance in the later period of the cyclical sequence of bimanual periodic movements using human brain activity measured with functional magnetic resonance imaging as well as using initial movement features. Eighteen subjects performed a 30 s bimanual finger-tapping task. We calculated differences in initiation-locked transient brain activity between antiphase and in-phase tapping conditions. Correlation analysis revealed that the difference in anterior putamen activity during antiphase compared with in-phase tapping conditions was strongly correlated with future instability as measured by the mean absolute deviation of the left-hand intertap interval during antiphase movements relative to in-phase movements (r = 0.81). Among the initial movement features we measured, only the number of taps to establish the antiphase movement pattern exhibited a significant correlation. However, the correlation coefficient of 0.60 was not high enough to predict the characteristics of subsequent movement. There was no significant correlation between putamen activity and initial movement features. It is likely that initiating unskilled difficult movements requires increased anterior putamen activity, and this activity increase may facilitate the initiation of movement via the basal ganglia-thalamocortical circuit. Our results suggest that initiation-locked transient activity of the anterior putamen can be used to predict future motor performance.
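The instability measure and the correlation analysis described above can be sketched as follows. The abstract does not define exactly how the antiphase deviation is taken "relative to" the in-phase one; this sketch assumes a simple difference of the two mean absolute deviations, and all tap times and activity values are hypothetical.

```python
from statistics import mean


def intertap_intervals(tap_times):
    """Successive differences between tap times (seconds)."""
    return [t1 - t0 for t0, t1 in zip(tap_times, tap_times[1:])]


def mean_absolute_deviation(values):
    """Mean absolute deviation of values around their mean."""
    m = mean(values)
    return mean(abs(v - m) for v in values)


def instability(antiphase_taps, inphase_taps):
    """Assumed definition: antiphase interval MAD minus in-phase interval MAD."""
    return (mean_absolute_deviation(intertap_intervals(antiphase_taps))
            - mean_absolute_deviation(intertap_intervals(inphase_taps)))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical left-hand tap times for one subject in each condition.
anti = [0.0, 0.4, 1.0, 1.5, 1.9]
inph = [0.0, 0.5, 1.0, 1.5, 2.0]
print(round(instability(anti, inph), 3))

# Hypothetical per-subject putamen activity differences and instabilities.
putamen_diff = [0.2, 0.5, 0.1, 0.8]
instabilities = [0.03, 0.07, 0.02, 0.09]
print(round(pearson_r(putamen_diff, instabilities), 2))
```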
A parasitologic and molecular survey of Hepatozoon canis infection in stray dogs in northeast of Iran.

PubMed

Barati, Ali; Razmi, Gholamreza

2018-05-15

Canine hepatozoonosis, caused by H. canis, is a tick-borne disease of domestic and wild dogs that is transmitted by ingestion of Rhipicephalus sanguineus ticks. The aim of the study was to detect H. canis in stray dogs in Iran using blood smear examination and molecular techniques. From October 2014 to September 2015, 150 EDTA blood samples were collected from stray dogs in the northeast region of Iran. Blood smears were microscopically examined for the presence of Hepatozoon gamonts; whole blood was evaluated by PCR, with subsequent sequencing and phylogenetic analysis. Hepatozoon spp. gamonts were observed in the neutrophils in 5/150 (3.3%) blood smears, whereas Hepatozoon spp. 18S rDNA was detected in 12/150 (8.0%) blood samples from stray dogs. There was good agreement between microscopy and PCR (kappa = 0.756). The infection rate was highest in summer (p<0.05). Differences in the frequency of Hepatozoon spp. infection by sex and age were not significant (p>0.05). Alignment analysis of the sequenced samples showed ≥99% similarity with other Hepatozoon spp. nucleotide sequences in GenBank. The phylogenetic tree also revealed that the nucleotide sequences in this study clustered in the H. canis clade, distinct from the H. felis and H. americanum clades. These results indicate that H. canis infection is present among dogs in the northeastern region of Iran.
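The agreement statistic reported above is Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of samples on which the two methods agree and p_e is the agreement expected by chance from each method's positive rate. The abstract does not give the full microscopy-by-PCR cross-tabulation, so the counts below are hypothetical and only illustrate the calculation.

```python
def cohens_kappa(both_pos, only_a_pos, only_b_pos, both_neg):
    """Cohen's kappa for two binary tests, from the four cells of a 2x2 table."""
    n = both_pos + only_a_pos + only_b_pos + both_neg
    p_observed = (both_pos + both_neg) / n
    a_pos = (both_pos + only_a_pos) / n  # positive rate of method A
    b_pos = (both_pos + only_b_pos) / n  # positive rate of method B
    # Chance agreement: both positive or both negative, assuming independence.
    p_expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical table: 20 positive by both methods, 5 by microscopy only,
# 10 by PCR only, 65 negative by both.
print(round(cohens_kappa(20, 5, 10, 65), 3))  # 0.625
```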

Epitope mapping of the variable repetitive region with the MB antigen of Ureaplasma urealyticum.

PubMed Central

Zheng, X; Lau, K; Frazier, M; Cassell, G H; Watson, H L

1996-01-01

One of the major surface structures of Ureaplasma urealyticum recognized by antibodies of patients during infection is the MB antigen. Previously, we showed by Western blot (immunoblot) analysis that any one of the anti-MB monoclonal antibodies (MAbs) 3B1.5, 5B1.1, and 10C6.6 could block the binding of patient antibodies to MB. Subsequent DNA sequencing revealed that a unique six-amino-acid direct tandem repeat region made up the carboxy-terminal two-thirds of this antigen. In the present study, using antibody-reactive peptide scanning of this repeat region, we demonstrated that the amino acids defining the epitopes for MAbs 3B1.5, 5B1.1, and 10C6.6 are EQP, GK, and KEQPA, respectively. Peptide scanning analysis of an infected patient's serum antibody response showed that the dominant epitope was defined by the sequence PAGK. Mapping of these continuous epitopes revealed overlap between all MAb and patient polyclonal antibody binding sites, thus explaining the ability of a single MAb to apparently block all polyclonal antibody binding sites. We also show that a single amino acid difference in the sequence of the repeats of serovars 3 and 14 accounts for the lack of reactivity with serovar 14 of two of the serovar 3-specific MAbs. Finally, the data demonstrate the need to obtain the sequences of the mba genes of all serovars before an effective serovar-specific antibody detection method can be developed. PMID:8914774
Single haplotype assembly of the human genome from a hydatidiform mole.

PubMed

Steinberg, Karyn Meltz; Schneider, Valerie A; Graves-Lindsay, Tina A; Fulton, Robert S; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C; Church, Deanna M; Eichler, Evan E; Wilson, Richard K

2014-12-01

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. © 2014 Steinberg et al.; Published by Cold Spring Harbor Laboratory Press.
Single haplotype assembly of the human genome from a hydatidiform mole

PubMed Central

Steinberg, Karyn Meltz; Schneider, Valerie A.; Graves-Lindsay, Tina A.; Fulton, Robert S.; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A.; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C.; Church, Deanna M.; Eichler, Evan E.; Wilson, Richard K.

2014-01-01

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. PMID:25373144
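The abstract describes the CHM1_1.1 draft as having excellent contiguity but does not name the statistic behind that judgement. One widely used contiguity measure is N50; the sketch below computes it, assuming N50 is the kind of measure meant. The contig lengths are invented for illustration and are not CHM1 data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


# Hypothetical contig lengths (not from the CHM1 assembly).
example = [100, 80, 60, 40, 20]
print(n50(example))  # 80, because 100 + 80 = 180 >= 300 / 2
```

Tools differ slightly in how they treat the exact halfway point, so reported values can differ marginally between implementations.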
Prediction of glutathionylation sites in proteins using minimal sequence information and their experimental validation.

PubMed

Pal, Debojyoti; Sharma, Deepak; Kumar, Mukesh; Sandur, Santosh K

2016-09-01

S-glutathionylation of proteins plays an important role in various biological processes and is known to be a protective modification during oxidative stress. Since experimental detection of S-glutathionylation is labor-intensive and time-consuming, a bioinformatics-based approach is a viable alternative. Available methods require relatively long sequence information, which may prevent prediction if sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon the differential association of amino acids with glutathionylated and non-glutathionylated cysteines in a database of experimentally verified sequences. These data were used to calculate position-dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of a glutathionylation event. The glutathionylation score (G-score), indicating the propensity of a sequence to undergo glutathionylation, was calculated using the position-dependent F-scores for each amino acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with a Matthews correlation coefficient (MCC) of 0.165. On an independent dataset, our model outperformed the currently available model, despite needing much less sequence information. Pentapeptide motifs with high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences was obtained by assigning G-scores, and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. The outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation.
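The abstract describes scoring a pentapeptide with position-dependent F-scores, thresholding the resulting G-score, and evaluating the predictions with the Matthews correlation coefficient (MCC). It gives neither the F-score values nor how they are combined, so this is only a sketch: it assumes an additive G-score, a centrally placed cysteine, and an invented score table and cut-off (`F_SCORES` and `cutoff=0.3` are hypothetical). Only the MCC formula is standard.

```python
import math

# HYPOTHETICAL position-dependent scores F_SCORES[position][amino_acid] for a
# pentapeptide with the cysteine assumed at the centre (position 2). These
# values are invented; the published F-scores are not given in the abstract.
F_SCORES = [
    {"K": 0.4, "E": -0.2},
    {"R": 0.3, "D": -0.1},
    {"C": 0.0},
    {"K": 0.2, "P": -0.3},
    {"L": 0.1},
]


def g_score(pentapeptide):
    """Assumed additive G-score: sum of per-position scores (unlisted residues score 0)."""
    if len(pentapeptide) != 5:
        raise ValueError("expected a pentapeptide")
    return sum(table.get(aa, 0.0) for table, aa in zip(F_SCORES, pentapeptide))


def predict(pentapeptide, cutoff=0.3):
    """Predict glutathionylation when the G-score reaches a (hypothetical) cut-off."""
    return g_score(pentapeptide) >= cutoff


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

An MCC of 0 corresponds to chance-level prediction and 1 to perfect prediction, which puts the reported value of 0.165 in context.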
Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma

PubMed Central

2011-01-01

Background Mycoparasitism, a lifestyle where one fungus is parasitic on another fungus, has special relevance when the prey is a plant pathogen, providing a strategy for biological control of pests for plant protection. Probably the most studied biocontrol agents are species of the genus Hypocrea/Trichoderma. Results Here we report an analysis of the genome sequences of the two biocontrol species Trichoderma atroviride (teleomorph Hypocrea atroviridis) and Trichoderma virens (formerly Gliocladium virens, teleomorph Hypocrea virens), and a comparison with Trichoderma reesei (teleomorph Hypocrea jecorina). These three Trichoderma species display a remarkable conservation of gene order (78 to 96%), and a lack of active mobile elements probably due to repeat-induced point mutation. Several gene families are expanded in the two mycoparasitic species relative to T. reesei or other ascomycetes, and are overrepresented in non-syntenic genome regions. A phylogenetic analysis shows that T. reesei and T. virens are derived relative to T. atroviride. The mycoparasitism-specific genes thus arose in a common Trichoderma ancestor but were subsequently lost in T. reesei. Conclusions The data offer a better understanding of mycoparasitism, and thus facilitate the development of improved biocontrol strains for efficient and environmentally friendly protection of plants. PMID:21501500
Anatomy of major coal successions: Facies analysis and sequence architecture of a brown coal-bearing valley fill to lacustrine tract (Upper Valdarno Basin, Northern Apennines, Italy)

NASA Astrophysics Data System (ADS)

Ielpi, Alessandro

2012-07-01

A late Pliocene incised valley fill to lacustrine succession, which contains an interbedded brown coal seam (< 20 m thick), is examined in terms of facies analysis, physical stratigraphy and sequence architecture. The succession (< 50 m thick) constitutes the first depositional event of the Castelnuovo Synthem, which is the oldest unconformity-bounded stratigraphic unit of the nonmarine Upper Valdarno Basin, Northern Apennines (Italy). The integration of field surveys and borehole logs identified the following event sequence: first valley filling stages by coarse alluvial fan and channelised streams; the progressive setting of low gradient floodbasins with shallow floodplain lakes; subsequent major waterlogging and extensive peat mire development; and system drowning and establishment of permanent lacustrine conditions. The deposits are grouped in a set of nested valley fills and are arranged as high-frequency depositional sequences. The sequences are bounded by minor erosive truncations and have distinctive upward trends: lowstand system tract thinning; transgressive system tract thickening; highstand system tract thinning and eventual non-deposition; and the smoothing of along-sequence boundary sub-aerial incisions. Such features fit in with the notion of an idealised model where second-order (high-frequency) fluctuations, modulated by first-order (low-frequency) base-level rising, have short-lived standing and falling phases and prolonged transgressions, respectively. Furthermore, the general sequence architecture reveals how a mixed palustrine-siliciclastic system differs substantially from a purely siliciclastic one. In the transgressive phases, terrigenous starvation induces prevailing peat accumulation, generating abnormally thick transgressive system tracts that eventually come to occupy much of the same transgression-generated accommodation space. 
In the highstand phases, the development of thick highstand system tracts is then prevented by sediment upstream trapping due to retrogressive fluvial aggradations, probably coupled with low-accommodation settings inherited from the transgressive phases.
FASH: A web application for nucleotides sequence search.

PubMed

Veksler-Lublinksy, Isana; Barash, Danny; Avisar, Chai; Troim, Einav; Chew, Paul; Kedem, Klara

2008-05-27

FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text sequence (e.g., the human genome), FASH detects subsequences within the text that are remotely similar to the query. FASH offers an alternative approach to BLAST/FASTA for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed sequences in its initial detection phase. The FASH web server is user-friendly and very easy to operate. FASH can be accessed at https://fash.bgu.ac.il:8443/fash/default.jsp (secured website).
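FASH is described as FFT-based and as not requiring contiguous seeds, but the abstract does not give its algorithm. The sketch below shows only the general FFT technique that such tools build on: counting matching nucleotides between a query and a text at every ungapped offset, using one correlation per base. Scattered matches count as much as contiguous ones, which is why no seed is needed. It is not FASH's implementation; the function names are illustrative, and a small pure-Python FFT keeps the example dependency-free.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def match_counts(text, query):
    """Count, for each ungapped offset i, the positions j with text[i + j] == query[j].

    One FFT-based correlation is computed per nucleotide and the results are
    summed; characters other than A/C/G/T never count as matches.
    """
    n, m = len(text), len(query)
    size = 1
    while size < n + m:  # pad so the circular convolution does not wrap around
        size *= 2
    totals = [0.0] * size
    for base in "ACGT":
        a = [1.0 if c == base else 0.0 for c in text] + [0.0] * (size - n)
        b = [1.0 if c == base else 0.0 for c in reversed(query)] + [0.0] * (size - m)
        product = [x * y for x, y in zip(fft(a), fft(b))]
        conv = fft(product, invert=True)
        for i in range(size):
            totals[i] += conv[i].real / size  # inverse FFT normalisation
    # Convolving with the reversed query places offset i at index i + m - 1.
    return [round(totals[i + m - 1]) for i in range(n - m + 1)]


print(match_counts("ACGTACGT", "ACG"))  # [3, 0, 0, 0, 3, 0]
```

For genome-length texts, one would use an optimised FFT library and process the text in overlapping blocks rather than rely on this recursive implementation.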
Identification of characteristic oligonucleotides in the bacterial 16S ribosomal RNA sequence dataset

NASA Technical Reports Server (NTRS)

Zhang, Zhengdong; Willson, Richard C.; Fox, George E.

2002-01-01

MOTIVATION: The phylogenetic structure of the bacterial world has been intensively studied by comparing sequences of 16S ribosomal RNA (16S rRNA). This database of sequences is now widely used to design probes for the detection of specific bacteria or groups of bacteria one at a time. The success of such methods reflects the fact that there are local sequence segments that are highly characteristic of particular organisms or groups of organisms. It is not clear, however, to what extent such signature sequences exist in the 16S rRNA dataset. A better understanding of the numbers and distribution of highly informative oligonucleotide sequences may facilitate the design of hybridization arrays that can characterize the phylogenetic position of an unknown organism or serve as the basis for the development of novel approaches for use in bacterial identification. RESULTS: A computer-based algorithm that characterizes the extent to which any individual oligonucleotide sequence in 16S rRNA is characteristic of any particular bacterial grouping was developed. A measure of signature quality, Q(s), was formulated and subsequently calculated for every individual oligonucleotide sequence in the size range of 5-11 nucleotides and for 15mers with reference to each cluster and subcluster in a 929-organism representative phylogenetic tree. Subsequently, the perfect signature sequences were compared to the full set of 7322 sequences to see how common false positives were. The work completed here establishes beyond any doubt that highly characteristic oligonucleotides exist in the bacterial 16S rRNA sequence dataset in large numbers. Over 16,000 15mers were identified that might be useful as signatures. Signature oligonucleotides are available for over 80% of the nodes in the representative tree.
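The abstract does not define the signature quality measure Q(s), so this sketch covers only what it calls a perfect signature, assuming that means an oligonucleotide present in every sequence of a group and absent from every sequence outside it. The sequences and function names are hypothetical, and real 16S rRNA data also involve partial sequences and alignment gaps that this ignores.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def perfect_signatures(group, others, k):
    """k-mers found in every sequence of `group` and in no sequence of `others`.

    "Perfect" is interpreted here (an assumption) as 100% coverage of the
    target group with zero hits outside it.
    """
    if not group:
        raise ValueError("group must contain at least one sequence")
    shared = set.intersection(*(kmers(s, k) for s in group))
    for seq in others:
        shared -= kmers(seq, k)  # drop anything that would be a false positive
    return shared


# Toy example: two hypothetical group members and one outsider.
print(perfect_signatures(["AACGTT", "GACGTA"], ["TTTTTT"], 4))  # {'ACGT'}
```

Checking each candidate against the full sequence collection, as the abstract describes for the 7322 sequences, corresponds to the subtraction loop here.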
Bifiguratus adelaidae, gen. et sp. nov., a new member of Mucoromycotina in endophytic and soil-dwelling habitats

USDA-ARS's Scientific Manuscript database

Illumina amplicon sequencing of soil in a temperate pine forest in the southeastern United States detected an abundant, N-responsive fungal genotype of unknown phylogenetic affiliation. Two isolates with ribosomal sequences consistent with that genotype were subsequently obtained in culture. Examina...
Sleep Does Not Enhance Motor Sequence Learning

ERIC Educational Resources Information Center

Rickard, Timothy C.; Cai, Denise J.; Rieth, Cory A.; Jones, Jason; Ard, M. Colin

2008-01-01

Improvements in motor sequence performance have been observed after a delay involving sleep. This finding has been taken as evidence for an active sleep consolidation process that enhances subsequent performance. In a review of this literature, however, the authors observed 4 aspects of data analyses and experimental design that could lead to…
Interactions between Defining, Explaining and Classifying: The Case of Increasing and Decreasing Sequences

ERIC Educational Resources Information Center

Alcock, Lara; Simpson, Adrian

2017-01-01

This paper describes a study in which we investigated relationships between defining mathematical concepts--increasing and decreasing infinite sequences--explaining their meanings and classifying consistently with formal definitions. We explored the effect of defining, explaining or studying a definition on subsequent classification, and the…
Predicting Free Recalls

ERIC Educational Resources Information Center

Laming, Donald

2006-01-01

This article reports some calculations on free-recall data from B. Murdock and J. Metcalfe (1978), with vocal rehearsal during the presentation of a list. Given the sequence of vocalizations, with the stimuli inserted in their proper places, it is possible to predict the subsequent sequence of recalls--the predictions taking the form of a…
Procedural Memory Consolidation in the Performance of Brief Keyboard Sequences

ERIC Educational Resources Information Center

Duke, Robert A.; Davis, Carla M.

2006-01-01

Using two sequential key press sequences, we tested the extent to which subjects' performance on a digital piano keyboard changed between the end of training and retest on subsequent days. We found consistent, significant improvements attributable to sleep-based consolidation effects, indicating that learning continued after the cessation of…
Massively parallel sequencing of 17 commonly used forensic autosomal STRs and amelogenin with small amplicons.

PubMed

Kim, Eun Hye; Lee, Hwan Young; Yang, In Seok; Jung, Sang-Eun; Yang, Woo Ick; Shin, Kyoung-Jin

2016-05-01

The next-generation sequencing (NGS) method has been utilized to analyze short tandem repeat (STR) markers, which are routinely used for human identification purposes in the forensic field. Some researchers have demonstrated the successful application of the NGS system to STR typing, suggesting that NGS technology may be an alternative or additional method to overcome limitations of capillary electrophoresis (CE)-based STR profiling. However, no multiplex PCR system optimized for NGS analysis of forensic STR markers has been available. Thus, we constructed a multiplex PCR system for the NGS analysis of 18 markers (13 CODIS STRs, D2S1338, D19S433, Penta D, Penta E and amelogenin) by designing amplicons in the size range of 77-210 base pairs. PCR products were then generated from two single-source samples, mixed samples and artificially degraded DNA samples using the multiplex PCR system, and were prepared for sequencing on the MiSeq system through subsequent construction of a barcoded library. By performing NGS and analyzing the data, we confirmed that the resulting STR genotypes were consistent with those of CE-based typing. Moreover, sequence variations were detected in targeted STR regions. Through the use of small amplicons, the developed multiplex PCR system enables researchers to obtain successful STR profiles even from artificially degraded DNA, including at STR loci that are analyzed with large amplicons in CE-based commercial kits. In addition, successful profiles can be obtained from mixtures up to a 1:19 ratio. Consequently, the developed multiplex PCR system, which produces small amplicons, can be successfully applied to STR NGS analysis of forensic casework samples such as mixtures and degraded DNA samples. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
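Sequencing STR amplicons, rather than sizing them by CE, lets the repeat region be read base by base. A minimal sketch of one step in sequence-based allele calling, assuming a simple (non-compound) repeat motif: count the longest run of consecutive, in-frame copies of the motif in a read. Real allele designation also handles compound repeats, partial repeats (e.g. x.3 alleles) and flanking-region variants; the read, motif and function name below are hypothetical.

```python
def longest_repeat_run(read, motif):
    """Largest number of consecutive, in-frame copies of `motif` anywhere in `read`."""
    step = len(motif)
    best = 0
    for frame in range(step):  # try every reading frame of the motif
        run = 0
        for i in range(frame, len(read) - step + 1, step):
            if read[i:i + step] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Hypothetical read: 2 flanking bases, four TCTA repeats, 2 flanking bases.
read = "GG" + "TCTA" * 4 + "AG"
print(longest_repeat_run(read, "TCTA"))  # 4
```

Because the whole repeat sequence is read, two alleles with the same repeat count but different internal sequence can be told apart, which is how sequence variation within STR regions becomes visible.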
Characterization of the in vitro expressed autoimmune rippling muscle disease immunogenic domain of human titin encoded by TTN exons 248-249

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zelinka, L.; McCann, S.; Budde, J.

2011-08-05

Highlights: Affinity purification of the autoimmune rippling muscle disease immunogenic domain of titin. Partial sequence analysis confirms that the peptide is in the I-band region of titin. This region of human titin shows a high degree of homology to mouse titin N2-A. Abstract: Autoimmune rippling muscle disease (ARMD) is an autoimmune neuromuscular disease associated with myasthenia gravis (MG). Past studies in our laboratory recognized a very high molecular weight skeletal muscle protein antigen, identified by ARMD patient antisera, as the titin isoform. These past studies used antisera from ARMD and MG patients as probes to screen a human skeletal muscle cDNA library, and several pBluescript clones showed expression of immunoreactive peptides. This study characterizes the products of subcloning the titin immunoreactive domain into pGEX-3X and the resulting fusion protein. Sequence analysis of the fusion gene indicates that the cloned titin domain (GenBank ID: EU428784) is in frame and is derived from a sequence of N2-A spanning exons 248-250, an area that encodes the fibronectin III domain. PCR and EcoRI restriction mapping studies have demonstrated that the inserted cDNA is of the size predicted by bioinformatics analysis of the subclone. Expression of the fusion protein resulted in the isolation of a 52 kDa polypeptide, consistent with the inferred amino acid sequence. Immunoblot experiments on the fusion protein, using rippling muscle/myasthenia gravis antisera, demonstrate that only the titin domain is immunoreactive.
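The abstract cites EcoRI restriction mapping as evidence that the insert has the predicted size. A predicted digest can be computed by locating the EcoRI recognition site GAATTC, which is cut between G and A, and taking the distances between cuts. This sketch assumes a linear sequence, reports top-strand fragment lengths only (ignoring the 4-nt overhang), and uses a hypothetical example sequence.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition site, cut as G^AATTC


def ecori_fragments(seq):
    """Top-strand fragment lengths after an in silico EcoRI digest of a linear sequence."""
    seq = seq.upper()
    cuts = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        cuts.append(start + 1)  # cut falls just after the leading G
        start = seq.find(ECORI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


print(ecori_fragments("AAGAATTCAA"))  # [3, 7]
```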
A phylogenomic approach to bacterial subspecies classification: proof of concept in Mycobacterium abscessus.

PubMed

Tan, Joon Liang; Khang, Tsung Fei; Ngeow, Yun Fong; Choo, Siew Woh

2013-12-13

Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification. A data mining approach was used to rank and select informative genes based on the relative entropy metric for the construction of a phylogenetic tree. The resulting tree topology was similar to that generated using the concatenation of five classical housekeeping genes: rpoB, hsp65, secA, recA and sodA. Additional support for the reliability of the subspecies classification came from the analysis of erm41 and ITS gene sequences, single nucleotide polymorphism (SNP)-based classification, and strain clustering demonstrated by a variable number tandem repeat (VNTR) assay and a multilocus sequence analysis (MLSA). We subsequently found that the concatenation of a minimal set of three median-ranked genes, DNA polymerase III subunit alpha (polC), 4-hydroxy-2-ketovalerate aldolase (Hoa) and cell division protein FtsZ (ftsZ), is sufficient to recover the same tree topology. PCR assays designed specifically for these genes showed that all three genes could be amplified in the reference strain of M. abscessus ATCC 19977T. This study provides proof of concept that a whole-genome sequence-based data mining approach can provide confirmatory evidence of the phylogenetic informativeness of existing markers, as well as lead to the discovery of a more economical and informative set of markers that produces similar subspecies classification in M. abscessus. The systematic procedure used in this study to choose the informative minimal set of gene markers can potentially be applied to species or subspecies classification of other bacteria.
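The abstract reports that concatenating polC, Hoa and ftsZ recovers the same tree topology but does not say which tree-building method was used. This sketch therefore stops at the input a distance-based method (such as neighbour joining) would take: per-gene alignments concatenated in a fixed order, and an uncorrected p-distance matrix. The relative-entropy gene ranking is not reproduced, and the toy alignments and strain names are invented.

```python
from itertools import combinations


def concatenate(alignments, strains):
    """Join per-gene alignments (dicts of strain -> aligned sequence) in a fixed gene order."""
    return {s: "".join(aln[s] for aln in alignments) for s in strains}


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap ('-') in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(concatenated):
    """Pairwise p-distances keyed by (strain, strain) in sorted order."""
    return {(s, t): p_distance(concatenated[s], concatenated[t])
            for s, t in combinations(sorted(concatenated), 2)}


# Hypothetical aligned fragments of two genes for three strains.
gene1 = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
gene2 = {"A": "GG", "B": "GG", "C": "GC"}
print(distance_matrix(concatenate([gene1, gene2], ["A", "B", "C"])))
```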
The Use of a Combined Bioinformatics Approach to Locate Antibiotic Resistance Genes on Plasmids From Whole Genome Sequences of Salmonella enterica Serovars From Humans in Ghana.

PubMed

Kudirkiene, Egle; Andoh, Linda A; Ahmed, Shahana; Herrero-Fresno, Ana; Dalsgaard, Anders; Obiri-Danso, Kwasi; Olsen, John E

2018-01-01

In the current study, we identified plasmids carrying antimicrobial resistance genes in draft whole genome sequences of 16 selected Salmonella enterica isolates representing six different serovars from humans in Ghana. The plasmids and the location of resistance genes in the genomes were predicted using a combination of PlasmidFinder, ResFinder, plasmidSPAdes and BLAST genomic analysis tools. Subsequently, S1-PFGE was employed for analysis of plasmid profiles. Whole genome sequencing confirmed the presence of antimicrobial resistance genes in Salmonella isolates showing multidrug resistance phenotypically. ESBL genes, blaTEM52-B and blaCTX-M15, were present in two cephalosporin-resistant isolates of S. Virchow and S. Poona, respectively. The systematic genome analysis revealed the presence of different plasmids in different serovars, with or without insertion of antimicrobial resistance genes. In S. Enteritidis, resistance genes were carried predominantly on plasmids of IncN type, and in S. Typhimurium on plasmids of IncFII(S)/IncFIB(S)/IncQ1 type. In S. Virchow and in S. Poona, resistance genes were detected on plasmids of IncX1 and TrfA/IncHI2/IncHI2A type, respectively. The latter two plasmids were described for the first time in these serovars. The combination of genomic analytical tools allowed nearly full mapping of the resistance plasmids in all Salmonella strains analyzed. The results suggest that the improved analytical approach used in the current study may be used to identify plasmids that are specifically associated with resistance phenotypes in whole genome sequences. Such knowledge would allow the development of rapid multidrug resistance tracking tools in Salmonella populations using WGS.
Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks.

PubMed

Smoot, James C; Barbian, Kent D; Van Gompel, Jamie J; Smoot, Laura M; Chaussee, Michael S; Sylva, Gail L; Sturdevant, Daniel E; Ricklefs, Stacy M; Porcella, Stephen F; Parkins, Larye D; Beres, Stephen B; Campbell, David S; Smith, Todd M; Zhang, Qing; Kapur, Vivek; Daly, Judy A; Veasy, L George; Musser, James M

2002-04-02

Acute rheumatic fever (ARF), a sequela of group A Streptococcus (GAS) infection, is the most common cause of preventable childhood heart disease worldwide. The molecular basis of ARF and the subsequent rheumatic heart disease are poorly understood. Serotype M18 GAS strains have been associated for decades with ARF outbreaks in the U.S. As a first step toward gaining new insight into ARF pathogenesis, we sequenced the genome of strain MGAS8232, a serotype M18 organism isolated from a patient with ARF. The genome is a circular chromosome of 1,895,017 bp, and it shares 1.7 Mb of closely related genetic material with strain SF370 (a sequenced serotype M1 strain). Strain MGAS8232 has 178 ORFs absent in SF370. Phages, phage-like elements, and insertion sequences are the major sources of variation between the genomes. The genomes of strain MGAS8232 and SF370 encode many of the same proven or putative virulence factors. Importantly, strain MGAS8232 has genes encoding many additional secreted proteins involved in human-GAS interactions, including streptococcal pyrogenic exotoxin A (scarlet fever toxin) and two uncharacterized pyrogenic exotoxin homologues, all phage-associated. DNA microarray analysis of 36 serotype M18 strains from diverse localities showed that most regions of variation were phages or phage-like elements. Two epidemics of ARF occurring 12 years apart in Salt Lake City, UT, were caused by serotype M18 strains that were genetically identical, or nearly so. Our analysis provides a critical foundation for accelerated research into ARF pathogenesis and a molecular framework to study the plasticity of GAS genomes.
Genome Sequence and Analysis of the Oral Bacterium Fusobacterium nucleatum Strain ATCC 25586

PubMed Central

Kapatral, Vinayak; Anderson, Iain; Ivanova, Natalia; Reznik, Gary; Los, Tamara; Lykidis, Athanasios; Bhattacharyya, Anamitra; Bartman, Allen; Gardner, Warren; Grechkin, Galina; Zhu, Lihua; Vasieva, Olga; Chu, Lien; Kogan, Yakov; Chaga, Oleg; Goltsman, Eugene; Bernal, Axel; Larsen, Niels; D'Souza, Mark; Walunas, Theresa; Pusch, Gordon; Haselkorn, Robert; Fonstein, Michael; Kyrpides, Nikos; Overbeek, Ross

2002-01-01

We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to those of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H2S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth. PMID:11889109
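The genome is reported as 2.17 Mb with 27% GC content, which corresponds to roughly 0.59 Mb of G+C (2.17 × 0.27 ≈ 0.586 Mb). A minimal sketch of how GC content is computed from a sequence, assuming ambiguous bases such as N are excluded from the denominator (conventions vary between tools).

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous A/C/G/T bases (N and other codes skipped)."""
    counts = {b: 0 for b in "ACGT"}
    for base in seq.upper():
        if base in counts:
            counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return (counts["G"] + counts["C"]) / total


print(gc_content("ATGCATATAT"))  # 0.2
```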
Distinguishing Functional DNA Words; A Method for Measuring Clustering Levels

NASA Astrophysics Data System (ADS)

Moghaddasi, Hanieh; Khalifeh, Khosrow; Darooneh, Amir Hossein

2017-01-01

Functional DNA sub-sequences and genome elements are spatially clustered throughout the genome, just as keywords are in literary texts. Therefore, some of the methods for ranking words in texts can also be used to compare different DNA sub-sequences. By analogy with literary texts, we claim that the distribution of distances between successive sub-sequences (words) is a q-exponential, the distribution function of non-extensive statistical mechanics. The q-parameter can thus be used as a measure of a word's clustering level. We analyzed the distribution of distances between consecutive occurrences of the 16 possible dinucleotides in human chromosomes to obtain their corresponding q-parameters. We found that CG, a biologically important two-letter word because of its methylation, has the highest clustering level. This finding shows the predictive ability of the method in biology. We also propose that chromosome 18, which has the largest q-parameter for gene promoters, is more sensitive to diet and lifestyle. We extended our study to compare the genomes of selected organisms and concluded that the clustering level of CGs increases in evolutionarily higher organisms compared with lower ones.
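The method rests on the distances between consecutive occurrences of a word and on the q-exponential of non-extensive (Tsallis) statistics, e_q(x) = [1 + (1 - q)x]^(1/(1 - q)), which reduces to exp(x) as q approaches 1. The sketch below extracts those distances and evaluates the unnormalised decay e_q(-d/λ); q > 1 gives a heavier tail, i.e. stronger clustering. The abstract does not say whether overlapping occurrences are counted or how q is fitted, so both are left open here, and `lam` is a hypothetical scale parameter.

```python
import math


def occurrence_distances(seq, word):
    """Distances between start positions of consecutive (possibly overlapping) occurrences."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def q_exp_decay(d, q, lam):
    """Unnormalised q-exponential decay e_q(-d / lam) for d >= 0.

    e_q(x) = [1 + (1 - q) x]_+ ** (1 / (1 - q)); q -> 1 recovers exp(-d / lam),
    while q > 1 gives a heavier, power-law-like tail.
    """
    if abs(q - 1.0) < 1e-12:
        return math.exp(-d / lam)
    base = 1.0 - (1.0 - q) * d / lam
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0  # [.]_+ cut-off for q < 1


print(occurrence_distances("CGACGTTCG", "CG"))  # [3, 4]
```

Comparing a fitted q across dinucleotides (for example CG against the other 15) is what turns these distances into the clustering ranking the abstract describes.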

Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package

PubMed Central

Kumar, Yadhu; Westram, Ralf; Kipfer, Peter; Meier, Harald; Ludwig, Wolfgang

2006-01-01

Background The availability of high-resolution RNA crystal structures for the 30S and 50S ribosomal subunits, and the subsequent validation of comparative secondary structure models, have prompted biologists to use the three-dimensional structure of ribosomal RNA (rRNA) to evaluate sequence alignments of rRNA genes. Furthermore, the secondary and tertiary structural features of rRNA are highly useful and have been successfully employed in designing rRNA-targeted oligonucleotide probes intended for in situ hybridization experiments. RNA3D, a program that combines sequence alignment information with the three-dimensional structure of rRNA, was developed. Its integration into the ARB software package, which is used extensively by the scientific community for phylogenetic analysis and molecular probe design, has substantially extended the functionality of the ARB software suite with a 3D environment. Results The three-dimensional structure of rRNA is visualized in an OpenGL 3D environment, with the ability to change the display and overlay information onto the molecule dynamically. Phylogenetic information derived from multiple sequence alignments can be overlaid onto the molecular structure in real time. Both statistical and non-statistical sequence-associated information can be superimposed onto the rRNA 3D structure using a customizable color scheme, which is also applied to a textual sequence alignment for reference. Oligonucleotide probes designed by the ARB probe design tools can be mapped onto the 3D structure, along with probe accessibility models, for evaluation with respect to the secondary and tertiary structural conformations of rRNA. Conclusion Visualizing the three-dimensional structure of rRNA in an intuitive display gives biologists greater possibilities for carrying out structure-based phylogenetic analysis. Coupled with secondary structure models of rRNA, the RNA3D program aids in validating the sequence alignments of rRNA genes and evaluating probe target sites. 
Dynamically superimposing information derived from the multiple sequence alignment onto the molecule allows researchers to observe sequence-inherited characteristics (phylogenetic information) in a real-time environment. The extended ARB software package is made freely available for the scientific community via . PMID:16672074
MitoRes: a resource of nuclear-encoded mitochondrial genes and their products in Metazoa.

PubMed

Catalano, Domenico; Licciulli, Flavio; Turi, Antonio; Grillo, Giorgio; Saccone, Cecilia; D'Elia, Domenica

2006-01-24

Mitochondria are sub-cellular organelles that have a central role in energy production and in other metabolic pathways of all eukaryotic respiring cells. In the last few years, with more and more genomes being sequenced, a huge amount of data has been generated, providing an unprecedented opportunity to use the comparative analysis approach in studies of evolution and functional genomics with the aim of shedding light on molecular mechanisms regulating mitochondrial biogenesis and metabolism. In this context, the optimal extraction of representative datasets of genomic and proteomic data assumes crucial importance. Specialised resources for nuclear-encoded mitochondria-related proteins already exist; however, no mitochondrial database is currently available with the same features as MitoRes, which is an update of the MitoNuc database extensively modified in its structure, data sources and graphical interface. It contains data on nuclear-encoded mitochondria-related products for any metazoan species for which this type of data is available and also provides comprehensive sequence datasets (gene, transcript and protein) as well as useful tools for their extraction and export. MitoRes http://www2.ba.itb.cnr.it/MitoRes/ consolidates information from publicly available external sources and automatically annotates it into a relational database. It also clusters proteins on the basis of their sequence similarity and interconnects them with genomic data. The search engine and sequence management tools allow the query/retrieval of the database content and the extraction and export of sequences (gene, transcript, protein) and related sub-sequences (intron, exon, UTR, CDS, signal peptide and gene flanking regions) ready to be used for in silico analysis. The tool we describe here has been developed to support lab scientists and bioinformaticians alike in the characterization of molecular features and evolution of mitochondrial targeting sequences. 
Its approach to sequence retrieval and extraction allows users to overcome the obstacles encountered when integrating different bioinformatic resources, and the completeness of the sequence collection allows intra- and interspecies comparison at different biological levels (gene, transcript and protein).
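The sub-sequence extraction described above (exon, intron, UTR, CDS) essentially amounts to slicing a gene sequence by annotated coordinates. The sketch below is not MitoRes code and does not reflect its schema; it assumes plus-strand, 1-based inclusive coordinates and uses hypothetical feature labels.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    kind: str   # hypothetical label, e.g. "exon", "CDS", "5'UTR"
    start: int  # 1-based, inclusive (plus strand assumed)
    end: int    # 1-based, inclusive


def extract(seq: str, features: list, kind: str) -> list:
    """Return the sub-sequences of `seq` covered by features of one kind."""
    return [seq[f.start - 1:f.end] for f in features if f.kind == kind]


def introns(seq: str, features: list) -> list:
    """Infer introns as the gaps between consecutive exons (sorted by start)."""
    exons = sorted((f for f in features if f.kind == "exon"), key=lambda f: f.start)
    return [seq[a.end:b.start - 1] for a, b in zip(exons, exons[1:]) if b.start - a.end > 1]


def to_fasta(records: list) -> str:
    """Format (name, sequence) pairs as FASTA text ready for export."""
    return "".join(f">{name}\n{s}\n" for name, s in records)


gene = "AAAACCCCGGGGTTTT"  # toy sequence
feats = [Feature("exon", 1, 4), Feature("exon", 9, 12)]
print(extract(gene, feats, "exon"))  # ['AAAA', 'GGGG']
print(introns(gene, feats))          # ['CCCC']
```

Minus-strand features would additionally need reverse-complementing, which this sketch leaves out.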
Analysis of p53 gene mutations in human gliomas by polymerase chain reaction-based single-strand conformation polymorphism and DNA sequencing.

PubMed

Sarkar, F H; Kupsky, W J; Li, Y W; Sreepathi, P

1994-03-01

Mutations in the p53 gene have been recognized in brain tumors, and clonal expansion of p53 mutant cells has been shown to be associated with glioma progression. However, studies on the p53 gene have been limited by the need for frozen tissues. We have developed a method utilizing polymerase chain reaction (PCR) for the direct analysis of p53 mutation by single-strand conformation polymorphism (SSCP) and by direct DNA sequencing of the p53 gene using a single 10-micron paraffin-embedded tissue section. We applied this method to screen for p53 gene mutations in exons 5-8 in human gliomas utilizing paraffin-embedded tissues. Twenty paraffin blocks containing tumor were selected from surgical specimens from 17 different adult patients. Tumors included six anaplastic astrocytomas (AAs), nine glioblastomas (GBs), and two mixed malignant gliomas (MMGs). The stained tissue section on a glass slide was used to guide microdissection of an unstained adjacent section, ensuring that tumor cells made up > 90% of the material used for p53 mutational analysis. Normal tissue from adjacent areas was also microdissected as a control. Mutations in the p53 gene were identified in 3 of 17 (18%) patients by PCR-SSCP analysis and subsequently confirmed by PCR-based DNA sequencing. Mutations in exon 5 resulting in amino acid substitution were found in one thalamic AA (codon 158, CGC > CTT: Arg > Leu) and one cerebral hemispheric GB (codon 151, CCG > CTG: Pro > Leu). (ABSTRACT TRUNCATED AT 250 WORDS)
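The reported codon changes can be checked against the standard genetic code. The sketch below translates a reference and a mutant codon and classifies the change; it is an illustration only and models neither codon numbering nor the SSCP/sequencing workflow.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def describe(ref_codon: str, alt_codon: str) -> str:
    """Translate both codons and label the change as synonymous, missense or nonsense."""
    ref = CODON_TABLE[ref_codon.upper()]
    alt = CODON_TABLE[alt_codon.upper()]
    if ref == alt:
        kind = "synonymous"
    elif alt == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return f"{THREE_LETTER[ref]} > {THREE_LETTER[alt]} ({kind})"


print(describe("CGC", "CTT"))  # Arg > Leu (missense)
print(describe("CCG", "CTG"))  # Pro > Leu (missense)
```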
Cloning, Phylogenetic Analysis, and Distribution of Free Fatty Acid Receptor GPR120 Expression along the Gastrointestinal Tract of Housing versus Grazing Kid Goats.

PubMed

Ran, Tao; Li, Hengzhi; Liu, Yong; Zhou, Chuanshe; Tang, Shaoxun; Han, Xuefeng; Wang, Min; He, Zhixiong; Kang, Jinghe; Yan, Qiongxian; Tan, Zhiliang; Beauchemin, Karen A

2016-03-23

G-protein-coupled receptor 120 (GPR120) has been reported to be a long-chain fatty acid (LCFA) receptor that mediates free fatty acid (FFA) regulation of metabolic homeostasis. The study aimed to clone the goat gpr120 gene (g-GPR120) and then to analyze its phylogeny and its tissue distribution throughout the digestive tract of kid goats, as well as the effect of housing versus grazing (H vs G) feeding systems on GPR120 expression. A partial coding sequence (CDS) of g-GPR120 was cloned and submitted to NCBI (accession no. KU161270). Phylogenetic analysis revealed that g-GPR120 shared higher homology, in both mRNA and amino acid sequences, with ruminants than with nonruminants. Immunochemistry, real-time PCR, and Western blot analysis showed that g-GPR120 was expressed throughout the digestive tract of goats. The expression of g-GPR120 was affected by feeding system and age, with greater expression in the G group. It was concluded that the g-GPR120-mediated LCFA chemosensing mechanism is widely present in the tongue and gastrointestinal tract of goats and that its expression can be affected by feeding system and age.
Revisiting Robustness and Evolvability: Evolution in Weighted Genotype Spaces

PubMed Central

Partha, Raghavendran; Raman, Karthik

2014-01-01

Robustness and evolvability are highly intertwined properties of biological systems. The relationship between these properties determines how biological systems are able to withstand mutations and show variation in response to them. Computational studies have explored the relationship between these two properties using neutral networks of RNA sequences (genotype) and their secondary structures (phenotype) as a model system. However, these studies have assumed every mutation to a sequence to be equally likely; the differences in the likelihood of various mutations, and the consequences of the probabilistic nature of mutations in such a system, have previously been ignored. Associating probabilities with mutations essentially results in a weighting of genotype space. Here we perform a comparative analysis of weighted and unweighted neutral networks of RNA sequences, and then explore the relationship between robustness and evolvability. We show that assuming an equal likelihood for all mutations (as in an unweighted network) underestimates the robustness and overestimates the evolvability of a system. Even after discarding this assumption, we observe that a negative correlation between sequence (genotype) robustness and sequence evolvability persists, and that structure (phenotype) robustness promotes structure evolvability, as observed in earlier studies using unweighted networks. We also study the effects of base composition bias on robustness and evolvability. In particular, we explore the association between robustness and evolvability in an AU-rich sequence space (sequences with an AU content of 80% or higher), compared with a normal (unbiased) sequence space. We find that in the AU-rich space the evolvability of both sequences and structures is lower, and robustness higher, than in the normal space.
We also observe that AU-rich populations evolving on neutral networks of phenotypes can access less phenotypic variation than normal populations evolving on neutral networks. PMID:25390641
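How weighting genotype space changes a robustness estimate can be illustrated with a toy model. This is not the RNA secondary-structure model of the study. It assumes a hypothetical purine/pyrimidine "phenotype" and a hypothetical weight `kappa` that makes transitions `kappa` times as likely as transversions. With `kappa = 1`, every mutation is equally likely (the unweighted case). When the likelier mutations are also the neutral ones, the unweighted estimate understates robustness.

```python
BASES = "ACGU"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "U"), ("U", "C")}


def mutation_weight(ref: str, alt: str, kappa: float) -> float:
    # Hypothetical weighting: transitions are kappa times as likely as transversions.
    return kappa if (ref, alt) in TRANSITIONS else 1.0


def purine_pattern(seq: str) -> tuple:
    # Toy genotype-to-phenotype map (hypothetical): the purine (R) / pyrimidine (Y) pattern.
    return tuple("R" if b in "AG" else "Y" for b in seq)


def robustness(seq: str, phenotype, kappa: float = 1.0) -> float:
    """Weighted fraction of single-point mutants that keep the phenotype of `seq`.

    kappa = 1.0 reproduces the unweighted (all mutations equally likely) case.
    """
    target = phenotype(seq)
    neutral = total = 0.0
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            w = mutation_weight(ref, alt, kappa)
            total += w
            if phenotype(seq[:i] + alt + seq[i + 1:]) == target:
                neutral += w
    return neutral / total


print(robustness("ACGU", purine_pattern))             # ~0.33: kappa/(kappa+2) with kappa=1
print(robustness("ACGU", purine_pattern, kappa=4.0))  # ~0.67: kappa/(kappa+2) with kappa=4
```

In this toy map, every transition is neutral and every transversion is not, so robustness equals kappa/(kappa + 2) at every site.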
Identification of a novel MIP frameshift mutation associated with congenital cataract in a Chinese family by whole-exome sequencing and functional analysis.

PubMed

Long, Xigui; Huang, Yanru; Tan, Hu; Li, Zhuo; Zhang, Rui; Linpeng, Siyuan; Lv, Weigang; Cao, Yingxi; Li, Haoxian; Liang, Desheng; Wu, Lingqian

2018-04-26

To investigate the underlying pathogenesis of congenital cataract in a four-generation Chinese family, whole-exome sequencing (WES) of family members III:4, IV:4 and IV:6 was performed, followed by Sanger sequencing and bioinformatics analysis. Full-length WT-MIP or K228fs-MIP fused to an N-terminal HA tag was transfected into HeLa cells, and quantitative real-time PCR, western blotting and immunofluorescence confocal laser scanning were then performed. In male patients, nonsyndromic cataract appeared by 1 year of age, earlier than in female patients, whose onset was in adulthood. A novel c.682_683delAA (p.K228fs230X) mutation in main intrinsic protein (MIP) cosegregated with the cataract phenotype. Bioinformatics analysis predicted an increased instability index and more unfolded states for the truncated MIP. The mRNA transcription level of K228fs-MIP was reduced compared with that of WT-MIP, and K228fs-MIP protein expression was also lower than that of WT-MIP. Immunofluorescence images showed that WT-MIP principally localized to the plasma membrane, whereas the mutant protein was trapped in the cytoplasm. Our study provides genetic and primary functional evidence for a novel c.682_683delAA mutation in MIP, which expands the variant spectrum of MIP and helps us better understand the molecular basis of cataract.
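A two-nucleotide deletion such as c.682_683delAA shifts the reading frame, so downstream codons are read out of phase until a premature stop codon appears. The sketch below demonstrates this on a short hypothetical coding sequence (not the MIP sequence), using c.-style 1-based coordinates counted from the A of the ATG.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (first base varies slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate codon by codon from the first base, stopping after the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def delete(cds: str, start: int, end: int) -> str:
    """Apply a c.start_end deletion (1-based, inclusive) to a coding sequence."""
    return cds[:start - 1] + cds[end:]


wild_type = "ATGAAAGGTAACTGGCCCGGGTAA"  # hypothetical CDS, not MIP
mutant = delete(wild_type, 5, 6)        # hypothetical c.5_6delAA
print(translate(wild_type))  # MKGNWPG*
print(translate(mutant))     # MR*  (frameshift, premature stop)
```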
Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

PubMed Central

2011-01-01

Background Many bioinformatics studies begin with a multiple sequence alignment as the foundation for their research, because multiple sequence alignment is a useful technique for studying molecular evolution and analyzing sequence-structure relationships. Results In this paper, we propose a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied to the solutions of the initial generation and of each child generation within VDGA. We used two mechanisms to generate an initial population: the first generates guide trees with randomly selected sequences, and the second shuffles the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN and PILEUP8, and with other methods based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. Conclusions The experimental results showed that VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison with the other divisions considered. The experimental results also confirmed that VDGA outperformed the other methods considered in this research. PMID:21867510
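The split-solve-combine skeleton of vertical decomposition can be sketched as follows. This is a minimal sketch under stated assumptions: the cut points are taken proportionally to each sequence's length, and the per-block "aligner" is a hypothetical placeholder that only right-pads with gaps. VDGA itself aligns each block with a guide tree and wraps the whole procedure in a genetic algorithm, and neither of those parts is shown here.

```python
def split_vertically(seqs: list, k: int) -> list:
    """Cut every sequence into k contiguous pieces at proportional positions."""
    blocks = []
    for b in range(k):
        block = []
        for s in seqs:
            lo = len(s) * b // k
            hi = len(s) * (b + 1) // k
            block.append(s[lo:hi])
        blocks.append(block)
    return blocks


def pad_align(block: list) -> list:
    """Placeholder block aligner (hypothetical): right-pad with gaps to equal width.

    A real implementation would align each block, e.g. progressively along a guide tree.
    """
    width = max(len(s) for s in block)
    return [s + "-" * (width - len(s)) for s in block]


def combine(aligned_blocks: list) -> list:
    """Concatenate aligned blocks row by row into one multiple alignment."""
    n = len(aligned_blocks[0])
    return ["".join(block[i] for block in aligned_blocks) for i in range(n)]


def vertical_decomposition(seqs: list, k: int, align_block=pad_align) -> list:
    return combine([align_block(b) for b in split_vertically(seqs, k)])


print(vertical_decomposition(["ACGTAC", "ACGAC"], 2))  # ['ACGTAC', 'AC-GAC']
```

Because `align_block` is a parameter, a real block aligner can be substituted without changing the decomposition and recombination steps.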
Further Confirmation of Germline Glioma Risk Variant rs78378222 in TP53 and Its Implication in Tumor Tissues via Integrative Analysis of TCGA Data

PubMed Central

Wang, Zhaoming; Rajaraman, Preetha; Melin, Beatrice S.; Chung, Charles C.; Zhang, Weijia; McKean-Cowdin, Roberta; Michaud, Dominique; Yeager, Meredith; Ahlbom, Anders; Albanes, Demetrius; Andersson, Ulrika; Beane Freeman, Laura E.; Buring, Julie E.; Butler, Mary Ann; Carreón, Tania; Feychting, Maria; Gapstur, Susan M.; Gaziano, J. Michael; Giles, Graham G.; Hallmans, Goran; Henriksson, Roger; Hoffman-Bolton, Judith; Inskip, Peter D.; Kitahara, Cari M.; Le Marchand, Loic; Linet, Martha S.; Li, Shengchao; Peters, Ulrike; Purdue, Mark P.; Rothman, Nathaniel; Ruder, Avima M.; Sesso, Howard D.; Severi, Gianluca; Stampfer, Meir; Stevens, Victoria L.; Visvanathan, Kala; Wang, Sophia S.; White, Emily; Zeleniuch-Jacquotte, Anne; Hoover, Robert; Fraumeni, Joseph F.; Chatterjee, Nilanjan; Hartge, Patricia; Chanock, Stephen J.

2016-01-01

We confirmed the strong association of rs78378222:A>C (per-allele odds ratio [OR] = 3.14; P = 6.48 × 10−11), a rare germline single-nucleotide polymorphism (SNP) in TP53, with glioma via imputation of a genome-wide association study (1,856 cases and 4,955 controls). We then performed integrative analyses of The Cancer Genome Atlas (TCGA) data for GBM (glioblastoma multiforme) and LUAD (lung adenocarcinoma). Based on SNP data, we imputed genotypes for rs78378222 and selected individuals carrying the rare risk allele (C). Using RNA sequencing data, we observed aberrant transcripts ~3 kb longer than normal in those individuals. Using exome sequencing data, we further showed that somatic loss of the haplotype carrying the common protective allele (A) occurred in GBM but not in LUAD. Our bioinformatic analysis suggests that the rare risk allele (C) disrupts mRNA termination, and that allelic loss of a genomic region harboring the common protective allele (A) occurs during tumor initiation or progression in glioma. PMID:25907361
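A per-allele odds ratio compares the odds of carrying the risk allele in cases with the odds in controls. The study's estimate came from association analysis of imputed genotypes. The sketch below is only a simplified allele-count (2 x 2) illustration with hypothetical counts, together with a Woolf (log-scale) 95% confidence interval.

```python
import math


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int, z: float = 1.96):
    """Allelic odds ratio and Woolf confidence interval from allele counts."""
    if min(case_risk, case_other, ctrl_risk, ctrl_other) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not the study's data.
or_, lo, hi = allelic_odds_ratio(20, 180, 10, 290)
print(round(or_, 2))  # 3.22
```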
The first determination of Trichuris sp. from roe deer by amplification and sequencing of the ITS1-5.8S-ITS2 segment of ribosomal DNA.

PubMed

Salaba, O; Rylková, K; Vadlejch, J; Petrtýl, M; Scháňková, S; Brožová, A; Jankovská, I; Jebavý, L; Langrová, I

2013-03-01

Trichuris nematodes were isolated from roe deer (Capreolus capreolus). The nematodes were first identified using morphological and biometrical methods. Genomic DNA was then isolated, and the ITS1-5.8S-ITS2 segment of ribosomal DNA (rDNA) was amplified by PCR and sequenced. Using morphological and biometrical methods, the female nematodes were identified as Trichuris globulosa and the only male as Trichuris ovis. The females were classified into four morphotypes; however, analysis of the internal transcribed spacers (ITS1-5.8S-ITS2) of the specimens did not confirm this classification. Moreover, the female individuals morphologically determined as T. globulosa were molecularly identified as Trichuris discolor. For the only male, the molecular analysis matched the result of the morphological identification. Furthermore, a comparative phylogenetic study was carried out with the ITS1 and ITS2 sequences of Trichuris species from various hosts. Biometric data from the T. discolor individuals in this study were also compared.
Using the QCM Biosensor-Based T7 Phage Display Combined with Bioinformatics Analysis for Target Identification of Bioactive Small Molecule.

PubMed

Takakusagi, Yoichi; Takakusagi, Kaori; Sugawara, Fumio; Sakaguchi, Kengo

2018-01-01

Identification of the target proteins that directly bind to a bioactive small molecule is of great interest for clarifying the mode of action of the small molecule and for elucidating biological phenomena at the molecular level. Among the available experimental technologies, T7 phage display allows comprehensive screening of small molecule-recognizing amino acid sequences from peptide libraries displayed on the T7 phage capsid. Here, we describe a T7 phage display strategy that combines a quartz-crystal microbalance (QCM) biosensor as the affinity selection platform with bioinformatics analysis of small molecule-recognizing short peptides. This method dramatically enhances the efficacy and throughput of screening for small molecule-recognizing amino acid sequences without repeated rounds of selection. Subsequent execution of bioinformatics programs allows combinatorial and comprehensive discovery of the target proteins of small molecules, together with their binding sites. This works even when protein insolubility or instability, or inaccessibility of the immobilized small molecule to binding sites located inside larger target proteins, hampers conventional proteomics approaches.
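The bioinformatics step of mapping selected peptides back onto candidate target proteins can be pictured as an ungapped sliding-window comparison. The sketch below is a simplified stand-in that counts identities against a hypothetical `min_identity` threshold. It is not the program used by the authors, and real analyses typically use substitution scores and allow gaps.

```python
def best_matches(peptide: str, protein: str, min_identity: int = 3) -> list:
    """Slide the peptide along the protein without gaps.

    Returns (1-based start, identical residues) for windows meeting the threshold, best first.
    """
    k = len(peptide)
    hits = []
    for i in range(len(protein) - k + 1):
        identities = sum(a == b for a, b in zip(peptide, protein[i:i + k]))
        if identities >= min_identity:
            hits.append((i + 1, identities))
    return sorted(hits, key=lambda h: (-h[1], h[0]))


def rank_targets(peptides: list, proteins: dict, min_identity: int = 3) -> list:
    """Score each candidate protein by how many selected peptides hit it at least once."""
    scores = {
        name: sum(1 for p in peptides if best_matches(p, seq, min_identity))
        for name, seq in proteins.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Hypothetical peptides and proteins, for illustration only.
print(best_matches("KLMN", "AAKLMNAAKLQN"))  # [(3, 4), (9, 3)]
print(rank_targets(["KLMN", "WWWW"], {"p1": "AAKLMNAAKLQN", "p2": "GGGGGGG"}))
```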
Genome-Wide Analysis of A-to-I RNA Editing.

PubMed

Savva, Yiannis A; Laurent, Georges St; Reenan, Robert A

2016-01-01

Adenosine (A)-to-inosine (I) RNA editing is a fundamental posttranscriptional modification in which adenosines in double-stranded (ds) RNA molecules are deaminated to inosines. Intriguingly, the A-to-I RNA editing system is particularly active in the nervous system of higher eukaryotes, altering a plethora of noncoding and coding sequences. Abnormal RNA editing is highly associated with many neurological phenotypes and neurodevelopmental disorders. However, the molecular mechanisms underlying RNA editing-mediated pathogenesis remain enigmatic and have attracted increasing attention from researchers. Over the last decade, methods for genome-wide transcriptome analysis have evolved rapidly. Within the RNA editing field, researchers have adopted next-generation sequencing technologies to identify RNA-editing sites within genomes and to elucidate the underlying process. However, technical challenges associated with editing site discovery have hindered efforts to assemble comprehensive editing site datasets, resulting in the general perception that the collections of annotated editing sites represent only a small minority of the total number of sites in a given organism, tissue, or cell type of interest. In addition to doubts about sensitivity, existing RNA-editing site lists often contain high percentages of false positives, leading to uncertainty about their validity and usefulness in downstream studies. An accurate investigation of A-to-I editing requires properly validated datasets of editing sites with demonstrated and transparent levels of sensitivity and specificity. Here, we describe a high signal-to-noise method for RNA-editing site detection using single-molecule sequencing (SMS). With this method, authentic RNA-editing sites can be distinguished from artifacts. Machine learning approaches provide a procedure to improve upon and experimentally validate sequencing outcomes through computationally predicted, iterative feedback loops.
Subsequent use of extensive Sanger sequencing validations can generate accurate editing site lists. This approach has broad application and accurate genome-wide editing analysis of various tissues from clinical specimens or various experimental organisms is now a possibility.
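Inosine is read as guanosine during reverse transcription and sequencing, so candidate A-to-I sites appear as A-to-G mismatches at reference A positions. The minimal sketch below estimates an editing level, G/(A+G), from per-position base calls. The `min_depth` and `min_level` thresholds are hypothetical. Strand handling, SNP filtering and any SMS-specific error model are deliberately left out.

```python
from collections import Counter


def editing_sites(ref: str, pileups: dict, min_depth: int = 10, min_level: float = 0.1) -> dict:
    """Return {position: editing level} for candidate A-to-I sites.

    ref: reference sequence on the transcript's sense strand (0-based positions).
    pileups: {position: list of base calls observed in reads at that position}.
    Thresholds are hypothetical defaults.
    """
    sites = {}
    for pos, bases in pileups.items():
        if ref[pos] != "A":  # only reference adenosines can be A-to-I edited
            continue
        counts = Counter(bases)
        a, g = counts["A"], counts["G"]  # inosine is read as G
        depth = a + g
        if depth >= min_depth and g / depth >= min_level:
            sites[pos] = g / depth
    return sites


# Toy pileup, for illustration only.
pileups = {0: list("C" * 10), 1: list("A" * 6 + "G" * 4), 2: list("A" * 10)}
print(editing_sites("CAAT", pileups))  # {1: 0.4}
```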
Comparison of Molecular Typing Methods Useful for Detecting Clusters of Campylobacter jejuni and C. coli Isolates through Routine Surveillance

PubMed Central

Taboada, Eduardo; Grant, Christopher C. R.; Blakeston, Connie; Pollari, Frank; Marshall, Barbara; Rahn, Kris; MacKinnon, Joanne; Daignault, Danielle; Pillai, Dylan; Ng, Lai-King

2012-01-01

Campylobacter spp. may be responsible for unreported outbreaks of food-borne disease. The detection of these outbreaks is made more difficult by the fact that appropriate methods for detecting clusters of Campylobacter have not been well defined. We compared the characteristics of five molecular typing methods on Campylobacter jejuni and C. coli isolates obtained from human and nonhuman sources through sentinel site surveillance over a 3-year period. Comparative genomic fingerprinting (CGF) appears to be one of the optimal methods for detecting clusters of cases, and it could be supplemented by sequencing of the flaA gene short variable region (flaA SVR sequence typing), with or without subsequent multilocus sequence typing (MLST). Different methods may be optimal for uncovering different aspects of source attribution. Finally, using several different molecular typing or analysis methods to compare individuals within a population reveals much more about that population than a single method does. Similarly, comparing several different typing methods reveals a great deal about differences in how the methods group individuals within the population. PMID:22162562
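The abstract does not name the metric used to compare typing methods. One widely used measure of a typing method's discriminatory power is Simpson's index of diversity as adapted by Hunter and Gaston: the probability that two isolates drawn without replacement are assigned different types. The sketch below applies it to hypothetical type assignments. It is offered as an assumption, not as the study's method.

```python
from collections import Counter


def simpsons_diversity(types: list) -> float:
    """Hunter-Gaston discriminatory index: D = 1 - sum(n_j(n_j - 1)) / (N(N - 1))."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical type assignments of the same four isolates by two methods.
method_a = ["CGF-1", "CGF-1", "CGF-2", "CGF-3"]
method_b = ["ST-1", "ST-1", "ST-1", "ST-2"]
print(simpsons_diversity(method_a))  # 1 - 2/12, about 0.83
print(simpsons_diversity(method_b))  # 1 - 6/12 = 0.5
```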
Genotyping and Molecular Identification of Date Palm Cultivars Using Inter-Simple Sequence Repeat (ISSR) Markers.

PubMed

Ayesh, Basim M

2017-01-01

Molecular markers are reliable tools for discriminating genotypes and for estimating the extent of genetic diversity and relatedness in a set of genotypes. Inter-simple sequence repeat (ISSR) markers rapidly reveal highly polymorphic fingerprints and have frequently been used to determine the genetic diversity among date palm cultivars. This chapter describes the application of ISSR markers for genotyping date palm cultivars. The application involves extracting genomic DNA of reliable quality and quantity from the target cultivars. The extracted DNA then serves as a template for amplification, with a single primer, of genomic regions flanked by inverted simple sequence repeats. The similarity of each pair of samples is measured from the numbers of monomorphic and polymorphic bands revealed by gel electrophoresis. The resulting similarity and genetic distance matrices are used to build a phylogenetic tree and for cluster analysis, to determine the molecular relatedness of the cultivars. In the protocol described, 3 of 9 tested primers consistently amplified 31 loci in 6 date palm cultivars, 28 of which were polymorphic.
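The similarity and clustering steps can be sketched as follows. The sketch assumes a binary band-presence matrix, the Dice (Nei-Li) similarity coefficient and UPGMA clustering. The protocol summary does not name a specific coefficient or clustering algorithm, and the band scores below are hypothetical.

```python
def dice(x: list, y: list) -> float:
    """Dice similarity 2a / (2a + b + c) for two 0/1 band profiles."""
    a = sum(1 for p, q in zip(x, y) if p and q)   # bands shared by both samples
    bc = sum(1 for p, q in zip(x, y) if p != q)   # bands present in only one sample
    return 2 * a / (2 * a + bc) if (2 * a + bc) else 1.0


def distance_matrix(profiles: list) -> list:
    """Genetic distance taken as 1 - Dice similarity."""
    return [[1 - dice(x, y) for y in profiles] for x in profiles]


def upgma(labels: list, d: list) -> str:
    """UPGMA on a square distance matrix; returns a Newick string without branch lengths."""
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    dist = {(i, j): d[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (li, ni), (lj, nj) = clusters.pop(i), clusters.pop(j)
        new = next_id
        next_id += 1
        for k in clusters:  # size-weighted average distance to the merged cluster
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, new)] = (ni * dik + nj * djk) / (ni + nj)
        del dist[(i, j)]
        clusters[new] = (f"({li},{lj})", ni + nj)
    return next(iter(clusters.values()))[0] + ";"


# Hypothetical band scores (1 = band present) for three cultivars at four loci.
profiles = {"c1": [1, 1, 0, 1], "c2": [1, 1, 0, 0], "c3": [0, 0, 1, 1]}
tree = upgma(list(profiles), distance_matrix(list(profiles.values())))
print(tree)  # (c3,(c1,c2));
```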
Grapevine fleck virus-like viruses in Vitis.

PubMed

Sabanadzovic, S; Abou-Ghanem, N; Castellano, M A; Digiaro, M; Martelli, G P

2000-01-01

Two sets of degenerate primers for the specific amplification of 572-575 nt and 386 nt segments of the methyltransferase and RNA-dependent RNA polymerase (RdRp) cistrons of members of the genera Tymovirus and Marafivirus and of the unassigned virus Grapevine fleck virus (GFkV) were designed on the basis of available sequences. These primers were used to amplify, clone and sequence part of open reading frame 1 of the genomes of GFkV, Grapevine asteroid mosaic-associated virus (GAMaV) and another previously unreported virus, for which the name Grapevine red globe virus (GRGV) is proposed. Computer-assisted analysis of the amplified genome portions showed that the three grapevine viruses are phylogenetically related to one another and to sequenced tymoviruses and marafiviruses. The relationship with tymoviruses was confirmed by the type of ultrastructural modifications induced in the host cells. RdRp-specific degenerate primers were successfully used for the generic (not virus-specific) detection of the three viruses in crude grapevine sap extracts. Specific virus identification was obtained by RT-PCR using antisense virus-specific primers.
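A degenerate primer uses IUPAC ambiguity codes so that a single oligonucleotide mixture can anneal to related sequences from several viruses. The sketch below computes a primer's degeneracy, expands it into its concrete sequences, and checks whether a target site is matched. The primer shown is hypothetical and is not one of the primers from the study.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by the degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer: str) -> list:
    """All concrete sequences in the primer mixture."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


def matches(primer: str, target: str) -> bool:
    """True if every target base is allowed by the primer base at that position."""
    return len(primer) == len(target) and all(
        t in IUPAC[p] for p, t in zip(primer.upper(), target.upper())
    )


primer = "GAYCCNTGG"  # hypothetical degenerate primer
print(degeneracy(primer))             # 8
print(matches(primer, "GATCCATGG"))   # True
print(matches(primer, "GAACCATGG"))   # False (Y allows only C or T)
```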
African Swine Fever Virus Isolate, Georgia, 2007

PubMed Central

Rowlands, Rebecca J.; Michaud, Vincent; Heath, Livio; Hutchings, Geoff; Oura, Chris; Vosloo, Wilna; Dwarka, Rahana; Onashvili, Tinatin; Albina, Emmanuel

2008-01-01

African swine fever (ASF) is widespread in Africa but is rarely introduced to other continents. In June 2007, ASF was confirmed in the Caucasus region of Georgia, and it has since spread to neighboring countries. DNA fragments amplified from the genome of the isolates from domestic pigs in Georgia in 2007 were sequenced and compared with other ASF virus (ASFV) isolates to establish the genotype of the virus. Sequences were obtained from 4 genome regions: part of the B646L gene, which encodes the p72 capsid protein; the complete E183L and CP204L genes, which encode the p54 and p30 proteins, respectively; and the variable region of the B602L gene. Analysis of these sequences indicated that the Georgia 2007 isolate is closely related to isolates belonging to genotype II, which is circulating in Mozambique, Madagascar, and Zambia. One possibility for the spread of disease to Georgia is that pigs were fed ASFV-contaminated pork brought in on ships and that the disease was subsequently disseminated throughout the region. PMID:19046509
Whole-exome sequencing, without prior linkage, identifies a mutation in LAMB3 as a cause of dominant hypoplastic amelogenesis imperfecta.

PubMed

Poulter, James A; El-Sayed, Walid; Shore, Roger C; Kirkham, Jennifer; Inglehearn, Chris F; Mighell, Alan J

2014-01-01

The conventional approach to identifying the defective gene in a family with an inherited disease is to find the disease locus through family studies. However, the rapid development and decreasing cost of next-generation sequencing facilitate a more direct approach. Here, we report the identification of a frameshift mutation in LAMB3 as a cause of dominant hypoplastic amelogenesis imperfecta (AI). Whole-exome sequencing of three affected family members and subsequent filtering of shared variants, without prior genetic linkage, sufficed to identify the pathogenic variant. Simultaneous analysis of multiple family members confirms segregation, enhancing the power to filter the genetic variation found and leading to rapid identification of the pathogenic variant. LAMB3 encodes a subunit of Laminin-5, one of a family of basement membrane proteins with essential functions in cell growth, movement and adhesion. Homozygous LAMB3 mutations cause junctional epidermolysis bullosa (JEB), and enamel defects are seen in JEB cases. However, to our knowledge, this is the first report of dominant AI due to a LAMB3 mutation in the absence of JEB.
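The filtering of shared variants can be sketched as a set intersection across affected relatives. This assumes a dominant model in which the causal variant is called in every affected member. The optional removal of variants seen in unaffected relatives, the population-frequency filter and its `max_freq` cutoff are hypothetical additions, and the variant keys are toy examples.

```python
def filter_shared(affected: list, unaffected=(), pop_freq=None, max_freq=0.001) -> list:
    """Variants (chrom, pos, ref, alt) called in every affected member.

    Optionally drops variants seen in unaffected relatives, or more common than
    max_freq in a population-frequency table (both hypothetical filters).
    """
    shared = set.intersection(*map(set, affected))
    for calls in unaffected:
        shared -= set(calls)
    if pop_freq is not None:
        shared = {v for v in shared if pop_freq.get(v, 0.0) <= max_freq}
    return sorted(shared)


# Toy variant calls for three affected relatives (hypothetical coordinates).
a1 = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
a2 = {("1", 100, "A", "G"), ("1", 200, "C", "T")}
a3 = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("3", 10, "T", "C")}
freqs = {("1", 200, "C", "T"): 0.2}  # a common polymorphism
print(filter_shared([a1, a2, a3], pop_freq=freqs))  # [('1', 100, 'A', 'G')]
```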
Analysis of Microbe-Associated Molecular Pattern-Responsive Synthetic Promoters with the Parsley Protoplast System.

PubMed

Kanofsky, Konstantin; Lehmeyer, Mona; Schulze, Jutta; Hehl, Reinhard

2016-01-01

Plants recognize pathogens by microbe-associated molecular patterns (MAMPs) and subsequently induce an immune response. The regulation of gene expression during the immune response depends largely on cis-sequences conserved in the promoters of MAMP-responsive genes. These cis-sequences can be analyzed by constructing synthetic promoters linked to a reporter gene and by testing these constructs in transient expression systems. Here, the use of the parsley (Petroselinum crispum) protoplast system for analyzing MAMP-responsive synthetic promoters is described. The synthetic promoter consists of four copies of a potential MAMP-responsive cis-sequence cloned upstream of a minimal promoter and the uidA reporter gene. The reporter plasmid contains a second reporter gene, which is constitutively expressed and hence eliminates the need for a second plasmid as a transformation control. The reporter plasmid is transformed into parsley protoplasts that are elicited by the MAMP Pep25. MAMP responsiveness is validated by comparing the reporter gene activity of MAMP-treated and untreated cells and by normalizing reporter gene activity to that of the constitutively expressed reporter gene.
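The normalization step can be sketched as the ratio of the MAMP-responsive (uidA/GUS) reporter activity to the constitutive reporter's activity, averaged over replicates. Fold induction is then the treated mean divided by the untreated mean. The identity of the constitutive reporter is not given in the summary, and the activity values below are hypothetical.

```python
from statistics import mean


def normalized(responsive: list, constitutive: list) -> list:
    """Per-replicate ratio of MAMP-responsive to constitutive reporter activity."""
    return [r / c for r, c in zip(responsive, constitutive)]


def fold_induction(treated: list, untreated: list) -> float:
    """treated/untreated: lists of (responsive, constitutive) activity pairs per replicate."""
    t = mean(normalized(*zip(*treated)))
    u = mean(normalized(*zip(*untreated)))
    return t / u


# Hypothetical replicate activities (arbitrary units).
pep25 = [(10.0, 2.0), (12.0, 3.0)]  # elicited with Pep25
mock = [(2.0, 2.0), (1.0, 1.0)]     # untreated
print(fold_induction(pep25, mock))  # 4.5
```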
Whole-exome sequencing for diagnosis of hereditary ichthyosis.

PubMed

Sitek, J C; Kulseth, M A; Rypdal, K B; Skodje, T; Sheng, Y; Retterstøl, L

2018-02-14

Hereditary ichthyosis constitutes a diverse group of cornification disorders. Identification of the molecular cause facilitates optimal patient care. We wanted to estimate the diagnostic yield of applying whole-exome sequencing (WES) in the routine genetic workup of inherited ichthyosis. During a 3-year period, all ichthyosis patients consecutively admitted to a university hospital clinic, except those with X-linked or mild ichthyosis vulgaris, were offered WES with subsequent analysis of ichthyosis-related genes as a first-line genetic investigation. Clinical and molecular data were collected retrospectively. Genetic variants causative for the ichthyosis were identified in 27 of 34 investigated patients (79.4%). In all, 31 causative mutations across 13 genes were disclosed, including 12 novel variants. TGM1 was the most frequently mutated gene, accounting for 43.7% of patients with autosomal recessive congenital ichthyosis (ARCI). Whole-exome sequencing appears to be an effective tool for disclosing the molecular cause of hereditary ichthyosis in clinical practice and should be considered a first-tier genetic test in these patients. © 2018 European Academy of Dermatology and Venereology.
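Restricting WES analysis to ichthyosis-related genes works as a "virtual panel" filter on the exome variant list. The gene set below is an illustrative subset of genes associated with ichthyosis, not the panel used in the study, and the variant records are hypothetical. The yield calculation reproduces the reported 27 of 34 patients.

```python
# Illustrative subset of ichthyosis-associated genes (NOT the study's panel).
ICHTHYOSIS_GENES = {"TGM1", "ALOX12B", "ALOXE3", "NIPAL4", "CYP4F22", "ABCA12", "KRT1", "KRT10"}


def virtual_panel(variants: list, genes=ICHTHYOSIS_GENES) -> list:
    """Keep only variant records (dicts with a 'gene' key) that fall in panel genes."""
    return [v for v in variants if v["gene"] in genes]


def diagnostic_yield(solved: int, total: int) -> float:
    """Percentage of investigated patients with a causative variant identified."""
    return 100.0 * solved / total


calls = [{"gene": "TGM1", "id": "v1"}, {"gene": "BRCA2", "id": "v2"}]  # hypothetical calls
print(virtual_panel(calls))              # [{'gene': 'TGM1', 'id': 'v1'}]
print(round(diagnostic_yield(27, 34), 1))  # 79.4
```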
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery

PubMed Central

Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

2012-01-01

Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer, and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases, our tool detected the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run without any prior programming or scripting knowledge. We therefore propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in high-throughput transcriptome sequencing data. PMID:22570408
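A basic signal used to detect fusions in paired-end transcriptome data is the discordant read pair, whose two mates map to different genes. The sketch below counts such pairs per gene pair and keeps candidates above a hypothetical `min_support` threshold. It is not FusionAnalyser's algorithm. It ignores 5'/3' orientation, split reads and breakpoint reconstruction, and the read assignments are made up.

```python
from collections import Counter


def fusion_candidates(read_pairs, min_support: int = 3) -> dict:
    """Count read pairs whose mates map to two different genes.

    read_pairs: iterable of (gene_of_mate1, gene_of_mate2); None marks an unassigned mate.
    Gene pairs are stored in sorted order, so orientation is not distinguished.
    """
    counts = Counter()
    for g1, g2 in read_pairs:
        if g1 and g2 and g1 != g2:
            counts[tuple(sorted((g1, g2)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical mate-to-gene assignments.
pairs = ([("ETS2", "ERG")] * 3 + [("ERG", "ETS2")] + [("ABL1", "ABL1")] * 5
         + [("GAPDH", "ACTB")] + [(None, "ERG")])
print(fusion_candidates(pairs))  # {('ERG', 'ETS2'): 4}
```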
Analysis of BAC-end sequences (BESs) and development of BES-SSR markers for genetic mapping and hybrid purity assessment in pigeonpea (Cajanus spp.)

PubMed Central

2011-01-01

Background Pigeonpea [Cajanus cajan (L.) Millsp.] is an important legume crop of rainfed agriculture. Despite concerted research efforts directed at pigeonpea improvement, the stagnant productivity of pigeonpea over the last several decades may be attributed to the prevalence of various biotic and abiotic constraints, and the situation is exacerbated by the inadequate genomic resources available for undertaking molecular breeding programmes for accelerated crop improvement. With the objective of enhancing genomic resources for pigeonpea, this study reports, for the first time, large-scale development of SSR markers from BAC-end sequences and their subsequent use for genetic mapping and hybridity testing in pigeonpea. Results A set of 88,860 BAC (bacterial artificial chromosome)-end sequences (BESs) was generated after constructing two BAC libraries using the HindIII (34,560 clones) and BamHI (34,560 clones) restriction enzymes. Clustering based on sequence identity of BESs yielded a set of >52K non-redundant sequences, comprising 35 Mbp or >4% of the pigeonpea genome. These sequences were analyzed to develop annotation lists and to subdivide the BESs into genome fractions (e.g., genes, retroelements, transposons and non-annotated sequences). Parallel analysis of BESs for microsatellites or simple sequence repeats (SSRs) identified 18,149 SSRs, from which a set of 6,212 SSRs was selected for further analysis. A total of 3,072 novel SSR primer pairs were synthesized and tested for length polymorphism on a set of 22 parental genotypes of 13 mapping populations segregating for traits of interest. In total, we identified 842 polymorphic SSR markers that will have utility in pigeonpea improvement. Based on these markers, the first SSR-based genetic map, comprising 239 loci, was developed for this previously uncharacterized genome.
Utility of developed SSR markers was also demonstrated by identifying a set of 42 markers each for two hybrids (ICPH 2671 and ICPH 2438) for genetic purity assessment in commercial hybrid breeding programme. Conclusion In summary, while BAC libraries and BESs should be useful for genomics studies, BES-SSR markers, and the genetic map should be very useful for linking the genetic map with a future physical map as well as for molecular breeding in pigeonpea. PMID:21447154
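Microsatellite mining of BESs can be sketched with a regular expression that finds a short motif followed by copies of itself. The minimum repeat numbers below are assumed thresholds of the kind commonly used by SSR-mining tools, not necessarily those used in this study. Only perfect repeats are reported, and motifs that are themselves repeats of a shorter unit (e.g. "ATAT") are skipped so that a run is reported once, at its shortest unit.

```python
import re

# Minimum number of repeat units per motif length (assumed thresholds).
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False for motifs that are repeats of a shorter unit, e.g. 'ATAT' or 'AA'."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_repeats=None) -> list:
    """Return (1-based start, motif, repeat count) for perfect SSRs in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer captured as group 1, followed by at least n - 1 further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start() + 1, motif, len(m.group(0)) // k))
    return sorted(hits)


bes = "TT" + "AG" * 8 + "C" + "ACG" * 5 + "T"  # toy BAC-end sequence
print(find_ssrs(bes))  # [(3, 'AG', 8), (20, 'ACG', 5)]
```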
Insights into the fold organization of TIM barrel from interaction energy based structure networks.

PubMed

Vijayabaskar, M S; Vishveshwara, Saraswathi

2012-01-01

There are many well-known examples of proteins with low sequence similarity that adopt the same structural fold. This aspect of the sequence-structure relationship has been extensively studied both experimentally and theoretically, but with limited success. Most studies consider remote homology or "sequence conservation" as the basis for their understanding. Recently, an "interaction energy" based network formalism (Protein Energy Networks, PENs) was developed to understand the determinants of protein structures. In this paper we use PENs to investigate the common non-covalent interactions, and their collective features, that stabilize the TIM barrel fold. We have also developed a method of aligning PENs in order to understand the spatial conservation of interactions in the fold. We have identified key common interactions responsible for the conservation of the TIM fold despite high sequence dissimilarity. For instance, the central beta barrel of the TIM fold is stabilized by long-range high-energy electrostatic interactions and by low-energy contiguous vdW interactions in certain families. The other interfaces, such as the helix-sheet and helix-helix interfaces, seem to be devoid of any high-energy conserved interactions. Conserved interactions in the loop regions around the catalytic site of the TIM fold have also been identified, pointing to their significance in both structural and functional evolution. Based on these investigations, we have developed a novel network-based phylogenetic analysis for remote homologues, which can perform better than sequence-based phylogeny. Such an analysis is more meaningful from both structural and functional evolutionary perspectives. We believe that the information obtained through the "interaction conservation" viewpoint, and the subsequently developed method of structure network alignment, can shed new light on fold organization and de novo computational protein design.
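A protein energy network can be pictured as a graph whose nodes are residues and whose edges join residue pairs with sufficiently favourable (negative) interaction energy. The sketch below makes several assumptions: a precomputed pairwise energy table, a hypothetical energy cutoff, and a residue correspondence between the two structures supplied from elsewhere (for example, a structural alignment). From these it reports the interactions conserved between two networks. It is not the authors' PEN construction or alignment method.

```python
def energy_network(energies: dict, cutoff: float = -2.5) -> set:
    """Edges (i, j), i < j, for residue pairs with interaction energy <= cutoff.

    energies: {(i, j): pairwise interaction energy}; units and cutoff are hypothetical.
    """
    return {tuple(sorted(pair)) for pair, e in energies.items() if e <= cutoff}


def conserved_edges(net_a: set, net_b: set, mapping: dict) -> set:
    """Edges of net_a whose mapped residue pair is also an edge in net_b.

    mapping: residue index in protein A -> equivalent residue index in protein B.
    """
    conserved = set()
    for i, j in net_a:
        if i in mapping and j in mapping:
            if tuple(sorted((mapping[i], mapping[j]))) in net_b:
                conserved.add((i, j))
    return conserved


# Toy energy tables for two homologous structures (hypothetical values).
net_a = energy_network({(1, 5): -10.0, (2, 8): -1.0, (3, 9): -4.0})
net_b = energy_network({(11, 15): -12.0, (13, 19): -0.5})
print(conserved_edges(net_a, net_b, {1: 11, 5: 15, 3: 13, 9: 19}))  # {(1, 5)}
```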
Global DNA methylation analysis using methyl-sensitive amplification polymorphism (MSAP).

PubMed

Yaish, Mahmoud W; Peng, Mingsheng; Rothstein, Steven J

2014-01-01

DNA methylation is a crucial epigenetic process which helps control gene transcription activity in eukaryotes. Information regarding the methylation status of a regulatory sequence of a particular gene provides important knowledge of this transcriptional control. DNA methylation can be detected using several methods, including sodium bisulfite sequencing and restriction digestion using methylation-sensitive endonucleases. Methyl-Sensitive Amplification Polymorphism (MSAP) is a technique used to study the global DNA methylation status of an organism and hence to distinguish between two individuals based on the DNA methylation status determined by the differential digestion pattern. Therefore, this technique is a useful method for DNA methylation mapping and positional cloning of differentially methylated genes. In this technique, genomic DNA is first digested with a methylation-sensitive restriction enzyme such as HpaII, and the DNA fragments are then ligated to adaptors in order to facilitate their amplification. Digestion with MspI, a methylation-insensitive isoschizomer of HpaII, is performed in a parallel reaction as a loading control in the experiment. Subsequently, these fragments are selectively amplified by fluorescently labeled primers. PCR products from different individuals are compared, and once an interesting polymorphic locus is recognized, the desired DNA fragment can be isolated from a denaturing polyacrylamide gel, sequenced, and identified based on DNA sequence similarity to other sequences available in the database. We will use the analysis of met1, ddm1, and atmbd9 mutants and of wild-type plants treated with a cytidine analogue (5-azaC or zebularine) to demonstrate how to assess the genetic modulation of DNA methylation in Arabidopsis. 
It should be noted that despite the fact that MSAP is a reliable technique used to fish for polymorphic methylated loci, its power is limited to the restriction recognition sites of the enzymes used in the genomic DNA digestion.
The evolutionary implications of knox-I gene duplications in conifers: correlated evidence from phylogeny, gene mapping, and analysis of functional divergence.

PubMed

Guillet-Claude, Carine; Isabel, Nathalie; Pelgas, Betty; Bousquet, Jean

2004-12-01

Class I knox genes code for transcription factors that play an essential role in plant growth and development as central regulators of meristem cell identity. Based on the analysis of new cDNA sequences from various tissues and genomic DNA sequences, we identified a highly diversified group of class I knox genes in conifers. Phylogenetic analyses of complete amino acid sequences from various seed plants indicated that all conifer sequences formed a monophyletic group. Within conifers, four subgroups, here named genes KN1 to KN4, were well delineated, each grouping pine and spruce sequences. KN4 was sister group to KN3, which was sister group to KN1 and KN2. Genetic mapping on the genomes of two divergent Picea species indicated that KN1 and KN2 are located close to each other on the same linkage group, whereas KN3 and KN4 mapped on different linkage groups, consistent with the more ancient divergence of these two genes. The proportion of synonymous and nonsynonymous substitutions suggested intense purifying selection for the four genes. However, rates of substitution per year indicated an evolution in two steps: faster rates were noted after gene duplications, followed by lower rates. Positive directional selection was detected for most of the internal branches harboring an accelerated rate of evolution. In addition, many sites with highly significant amino acid rate shifts were identified between these branches. However, the tightly linked KN1 and KN2 did not diverge as much from each other. The implications of the correlation between phylogenetic, structural, and functional information are discussed in relation to the diversification of the knox-I gene family in conifers.
Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach

PubMed Central

Pal Choudhury, Pabitra

2017-01-01

Periplasmic c7 type cytochrome A (PpcA) protein is found in Geobacter sulfurreducens along with its four homologs (PpcB-E). Crystal structure data show that PpcA can bind deoxycholate (DXCA), while its homologs do not. However, the reason for this has yet to be established with certainty from primary protein sequence information. This study is primarily based on primary protein sequence analysis through the chemical basis of the embedded amino acids. First, we look for the chemical group specific score of amino acids. Along with this, we have developed a new methodology for phylogenetic analysis based on chemical group dissimilarities of amino acids. This new methodology is applied to the cytochrome c7 family members to pinpoint how a particular sequence differs from the others. Second, we build a graph-theoretic model from the amino acid sequences, which is also applied to the cytochrome c7 family members, and highlight some unique characteristics and their domains. Third, we search for unique patterns, as subsequences, that are common to the group or specific to an individual member. In all cases, we are able to show distinct features of PpcA that set it apart from its homologs and that relate to its binding with deoxycholate. Similarly, some notable features of the structurally dissimilar protein PpcD compared with the other homologs are also brought out. Further, because the five members of the cytochrome family are homologous proteins, they must share some common significant features, which are also enumerated in this study. PMID:28362850
A first linkage map and downy mildew resistance QTL discovery for sweet basil (Ocimum basilicum) facilitated by double digestion restriction site associated DNA sequencing (ddRADseq).

PubMed

Pyne, Robert; Honig, Josh; Vaiciunas, Jennifer; Koroch, Adolfina; Wyenandt, Christian; Bonos, Stacy; Simon, James

2017-01-01

Limited understanding of sweet basil (Ocimum basilicum L.) genetics and genome structure has reduced the efficiency of breeding strategies. This is evidenced by the rapid, worldwide dissemination of basil downy mildew (Peronospora belbahrii) in the absence of resistant cultivars. In an effort to improve available genetic resources, expressed sequence tag simple sequence repeat (EST-SSR) and single nucleotide polymorphism (SNP) markers were developed and used to genotype the MRI x SB22 F2 mapping population, which segregates for response to downy mildew. SNP markers were generated from genomic sequences derived from double digestion restriction site associated DNA sequencing (ddRADseq). Disomic segregation was observed in both SNP and EST-SSR markers, providing evidence of an O. basilicum allotetraploid genome structure and allowing subsequent analysis of the mapping population as a diploid intercross. A dense linkage map was constructed using 42 EST-SSR and 1,847 SNP markers spanning 3,030.9 cM. Multiple quantitative trait loci (QTL) model (MQM) analysis identified three QTL that explained 37-55% of phenotypic variance associated with downy mildew response across three environments. A single major QTL, dm11.1, explained 21-28% of phenotypic variance and demonstrated dominant gene action. Two minor QTL, dm9.1 and dm14.1, explained 5-16% and 4-18% of phenotypic variance, respectively. Evidence is provided for an additive effect between the two minor QTL and the major QTL dm11.1 increasing downy mildew susceptibility. Results indicate that ddRADseq-facilitated SNP and SSR marker genotyping is an effective approach for mapping the sweet basil genome.
A first linkage map and downy mildew resistance QTL discovery for sweet basil (Ocimum basilicum) facilitated by double digestion restriction site associated DNA sequencing (ddRADseq)

PubMed Central

Honig, Josh; Vaiciunas, Jennifer; Koroch, Adolfina; Wyenandt, Christian; Bonos, Stacy; Simon, James

2017-01-01

Limited understanding of sweet basil (Ocimum basilicum L.) genetics and genome structure has reduced the efficiency of breeding strategies. This is evidenced by the rapid, worldwide dissemination of basil downy mildew (Peronospora belbahrii) in the absence of resistant cultivars. In an effort to improve available genetic resources, expressed sequence tag simple sequence repeat (EST-SSR) and single nucleotide polymorphism (SNP) markers were developed and used to genotype the MRI x SB22 F2 mapping population, which segregates for response to downy mildew. SNP markers were generated from genomic sequences derived from double digestion restriction site associated DNA sequencing (ddRADseq). Disomic segregation was observed in both SNP and EST-SSR markers, providing evidence of an O. basilicum allotetraploid genome structure and allowing subsequent analysis of the mapping population as a diploid intercross. A dense linkage map was constructed using 42 EST-SSR and 1,847 SNP markers spanning 3,030.9 cM. Multiple quantitative trait loci (QTL) model (MQM) analysis identified three QTL that explained 37–55% of phenotypic variance associated with downy mildew response across three environments. A single major QTL, dm11.1, explained 21–28% of phenotypic variance and demonstrated dominant gene action. Two minor QTL, dm9.1 and dm14.1, explained 5–16% and 4–18% of phenotypic variance, respectively. Evidence is provided for an additive effect between the two minor QTL and the major QTL dm11.1 increasing downy mildew susceptibility. Results indicate that ddRADseq-facilitated SNP and SSR marker genotyping is an effective approach for mapping the sweet basil genome. PMID:28922359
Object detection and tracking system

DOEpatents

Ma, Tian J.

2017-05-30

Methods and apparatuses for analyzing a sequence of images for an object are disclosed herein. In a general embodiment, the method identifies a region of interest in the sequence of images within which the object is likely to move. The method divides the region of interest into sections and calculates signal-to-noise ratios for a section. The signal-to-noise ratio for the section is calculated using the section in the current image, a section in a prior image, and a section in a subsequent image. The signal-to-noise ratios correspond to potential velocities of the object in the section. The method then selects, as the velocity of the object in the section, the potential velocity having the highest signal-to-noise ratio.
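The abstract does not say how each signal-to-noise ratio is computed. As a purely hypothetical illustration of "one SNR per candidate velocity, keep the best", the Python/NumPy sketch below uses a simple shift-and-add scheme, which is not claimed to be the patented method: for each candidate velocity (in pixels per frame), the prior and subsequent frames are shifted so that an object moving at that velocity would line up with the current frame, the three frames are summed, and a peak-over-median SNR is taken inside the section. The function names, the SNR formula, and the use of `np.roll` (which wraps at image borders) are all assumptions.

```python
import numpy as np


def shift(frame, dy, dx):
    # np.roll wraps pixels around the border; acceptable for a toy example.
    return np.roll(frame, shift=(dy, dx), axis=(0, 1))


def section_snr(prev, cur, nxt, section, velocity):
    """Hypothetical SNR of a velocity-compensated three-frame sum in one section.

    section:  (row_slice, col_slice) selecting the section within each image
    velocity: (vy, vx) candidate motion in pixels per frame
    """
    vy, vx = velocity
    # Move the prior frame forward one step and the subsequent frame back one
    # step, so an object moving at (vy, vx) lands on the same pixels in all three.
    stacked = shift(prev, vy, vx) + cur + shift(nxt, -vy, -vx)
    patch = stacked[section]
    noise = patch.std()
    if noise == 0:
        return 0.0
    # Peak above the median background, in units of the patch standard deviation.
    return float((patch.max() - np.median(patch)) / noise)


def select_velocity(prev, cur, nxt, section, candidates):
    """Return the candidate velocity with the highest SNR, plus all SNRs."""
    snrs = {v: section_snr(prev, cur, nxt, section, v) for v in candidates}
    best = max(snrs, key=snrs.get)
    return best, snrs


# Toy usage: a single bright pixel moving 2 px/frame to the right.
prev, cur, nxt = (np.zeros((20, 20)) for _ in range(3))
prev[10, 8] = cur[10, 10] = nxt[10, 12] = 1.0
section = (slice(0, 20), slice(0, 20))
best, snrs = select_velocity(prev, cur, nxt, section, [(0, 0), (0, 1), (0, 2), (1, 2)])
print(best)  # (0, 2)
```

Only the matching velocity stacks all three detections onto one pixel, so its summed peak stands out most against the background; mismatched velocities leave three separate, weaker peaks.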
Middle Pleistocene protein sequences from the rhinoceros genus Stephanorhinus and the phylogeny of extant and extinct Middle/Late Pleistocene Rhinocerotidae

PubMed Central

Smith, Geoff M.; Hutson, Jarod M.; Kindler, Lutz; Garcia-Moreno, Alejandro; Villaluenga, Aritza; Turner, Elaine

2017-01-01

Background Ancient protein sequences are increasingly used to elucidate the phylogenetic relationships between extinct and extant mammalian taxa. Here, we apply these recent developments to Middle Pleistocene bone specimens of the rhinoceros genus Stephanorhinus. No biomolecular sequence data is currently available for this genus, leaving phylogenetic hypotheses on its evolutionary relationships to extant and extinct rhinoceroses untested. Furthermore, recent phylogenies based on Rhinocerotidae (partial or complete) mitochondrial DNA sequences differ in the placement of the Sumatran rhinoceros (Dicerorhinus sumatrensis). Therefore, studies utilising ancient protein sequences from Middle Pleistocene contexts have the potential to provide further insights into the phylogenetic relationships between extant and extinct species, including Stephanorhinus and Dicerorhinus. Methods ZooMS screening (zooarchaeology by mass spectrometry) was performed on several Late and Middle Pleistocene specimens from the genus Stephanorhinus, subsequently followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) to obtain ancient protein sequences from a Middle Pleistocene Stephanorhinus specimen. We performed parallel analysis on a Late Pleistocene woolly rhinoceros specimen and extant species of rhinoceroses, resulting in the availability of protein sequence data for five extant species and two extinct genera. Phylogenetic analysis additionally included all extant Perissodactyla genera (Equus, Tapirus), and was conducted using Bayesian (MrBayes) and maximum-likelihood (RAxML) methods. Results Various ancient proteins were identified in both the Middle and Late Pleistocene rhinoceros samples. Protein degradation and proteome complexity are consistent with an endogenous origin of the identified proteins. 
Phylogenetic analysis of informative proteins resolved the Perissodactyla phylogeny in agreement with previous studies with regard to the placement of the families Equidae, Tapiridae, and Rhinocerotidae. Stephanorhinus is shown to be most closely related to the genera Coelodonta and Dicerorhinus. The protein sequence data further place the Sumatran rhino in a clade together with the genus Rhinoceros, as opposed to forming a clade with the black and white rhinoceros species. Discussion The first biomolecular dataset available for Stephanorhinus places this genus together with the extinct genus Coelodonta and the extant genus Dicerorhinus. This is in agreement with morphological studies, although we are unable to resolve the order of divergence between these genera based on the protein sequences available. Our data support the placement of the genus Dicerorhinus in a clade together with extant Rhinoceros species. Finally, the availability of protein sequence data for both extinct European rhinoceros genera allows future investigations into their geographic distribution and extinction chronologies. PMID:28316883
Middle Pleistocene protein sequences from the rhinoceros genus Stephanorhinus and the phylogeny of extant and extinct Middle/Late Pleistocene Rhinocerotidae.

PubMed

Welker, Frido; Smith, Geoff M; Hutson, Jarod M; Kindler, Lutz; Garcia-Moreno, Alejandro; Villaluenga, Aritza; Turner, Elaine; Gaudzinski-Windheuser, Sabine

2017-01-01

Ancient protein sequences are increasingly used to elucidate the phylogenetic relationships between extinct and extant mammalian taxa. Here, we apply these recent developments to Middle Pleistocene bone specimens of the rhinoceros genus Stephanorhinus. No biomolecular sequence data is currently available for this genus, leaving phylogenetic hypotheses on its evolutionary relationships to extant and extinct rhinoceroses untested. Furthermore, recent phylogenies based on Rhinocerotidae (partial or complete) mitochondrial DNA sequences differ in the placement of the Sumatran rhinoceros (Dicerorhinus sumatrensis). Therefore, studies utilising ancient protein sequences from Middle Pleistocene contexts have the potential to provide further insights into the phylogenetic relationships between extant and extinct species, including Stephanorhinus and Dicerorhinus. ZooMS screening (zooarchaeology by mass spectrometry) was performed on several Late and Middle Pleistocene specimens from the genus Stephanorhinus, subsequently followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) to obtain ancient protein sequences from a Middle Pleistocene Stephanorhinus specimen. We performed parallel analysis on a Late Pleistocene woolly rhinoceros specimen and extant species of rhinoceroses, resulting in the availability of protein sequence data for five extant species and two extinct genera. Phylogenetic analysis additionally included all extant Perissodactyla genera (Equus, Tapirus), and was conducted using Bayesian (MrBayes) and maximum-likelihood (RAxML) methods. Various ancient proteins were identified in both the Middle and Late Pleistocene rhinoceros samples. Protein degradation and proteome complexity are consistent with an endogenous origin of the identified proteins. 
Phylogenetic analysis of informative proteins resolved the Perissodactyla phylogeny in agreement with previous studies with regard to the placement of the families Equidae, Tapiridae, and Rhinocerotidae. Stephanorhinus is shown to be most closely related to the genera Coelodonta and Dicerorhinus. The protein sequence data further place the Sumatran rhino in a clade together with the genus Rhinoceros, as opposed to forming a clade with the black and white rhinoceros species. The first biomolecular dataset available for Stephanorhinus places this genus together with the extinct genus Coelodonta and the extant genus Dicerorhinus. This is in agreement with morphological studies, although we are unable to resolve the order of divergence between these genera based on the protein sequences available. Our data support the placement of the genus Dicerorhinus in a clade together with extant Rhinoceros species. Finally, the availability of protein sequence data for both extinct European rhinoceros genera allows future investigations into their geographic distribution and extinction chronologies.
A case report of Fanconi anemia diagnosed by genetic testing followed by prenatal diagnosis.

PubMed

Lee, Hwa Jeen; Park, Seungman; Kang, Hyoung Jin; Jun, Jong Kwan; Lee, Jung Ae; Lee, Dong Soon; Park, Sung Sup; Seong, Moon-Woo

2012-09-01

Fanconi anemia (FA) is a rare genetic disorder affecting multiple body systems. Genetic testing, including prenatal testing, is a prerequisite for the diagnosis of many clinical conditions. However, genetic testing for FA is complicated because many genes are associated with its development, and large deletions, duplications, or sequence variations are frequently found in some of these genes. This study describes successful genetic testing for the molecular diagnosis, and subsequent prenatal diagnosis, of FA in a patient and his family in Korea. We analyzed all exons and flanking regions of the FANCA, FANCC, and FANCG genes for mutation identification and subsequent prenatal diagnosis. Multiplex ligation-dependent probe amplification analysis was performed to detect large deletions or duplications in the FANCA gene. Molecular analysis revealed two mutations in the FANCA gene: a frameshift mutation, c.2546delC, and a novel splice-site mutation, c.3627-1G>A. The FANCA mutations were inherited separately from each parent: c.2546delC was derived from the father, whereas c.3627-1G>A originated from the mother. The amniotic fluid cells were c.3627-1G>A heterozygotes, suggesting that the fetus was unaffected. This is the first report of genetic testing that was successfully applied to the molecular diagnosis of a patient and the subsequent prenatal diagnosis of FA in a family in Korea.
Genomic Epidemiology of Vibrio cholerae O1 Associated with Floods, Pakistan, 2010

PubMed Central

Shah, Muhammad Ali; Mutreja, Ankur; Thomson, Nicholas; Baker, Stephen; Parkhill, Julian; Dougan, Gordon; Bokhari, Habib

2014-01-01

In August 2010, Pakistan experienced major floods and a subsequent cholera epidemic. To clarify the population dynamics and transmission of Vibrio cholerae in Pakistan, we sequenced the genomes of all V. cholerae O1 El Tor isolates and compared the sequences to a global collection of 146 V. cholerae strains. Within the global phylogeny, all isolates from Pakistan formed 2 new subclades (PSC-1 and PSC-2), lying in the third transmission wave of the seventh-pandemic lineage that could be distinguished by signature deletions and their antimicrobial susceptibilities. Geographically, PSC-1 isolates originated from the coast, whereas PSC-2 isolates originated from inland areas flooded by the Indus River. Single-nucleotide polymorphism accumulation analysis correlated river flow direction with the spread of PSC-2. We found at least 2 sources of cholera in Pakistan during the 2010 epidemic and illustrate the value of a global genomic data bank in contextualizing cholera outbreaks. PMID:24378019
Genomic epidemiology of Vibrio cholerae O1 associated with floods, Pakistan, 2010.

PubMed

Shah, Muhammad Ali; Mutreja, Ankur; Thomson, Nicholas; Baker, Stephen; Parkhill, Julian; Dougan, Gordon; Bokhari, Habib; Wren, Brendan W

2014-01-01

In August 2010, Pakistan experienced major floods and a subsequent cholera epidemic. To clarify the population dynamics and transmission of Vibrio cholerae in Pakistan, we sequenced the genomes of all V. cholerae O1 El Tor isolates and compared the sequences to a global collection of 146 V. cholerae strains. Within the global phylogeny, all isolates from Pakistan formed 2 new subclades (PSC-1 and PSC-2), lying in the third transmission wave of the seventh-pandemic lineage that could be distinguished by signature deletions and their antimicrobial susceptibilities. Geographically, PSC-1 isolates originated from the coast, whereas PSC-2 isolates originated from inland areas flooded by the Indus River. Single-nucleotide polymorphism accumulation analysis correlated river flow direction with the spread of PSC-2. We found at least 2 sources of cholera in Pakistan during the 2010 epidemic and illustrate the value of a global genomic data bank in contextualizing cholera outbreaks.
The genetic architecture of long QT syndrome: A critical reappraisal.

PubMed

Giudicessi, John R; Wilde, Arthur A M; Ackerman, Michael J

2018-03-30

Collectively, the completion of the Human Genome Project and the subsequent development of high-throughput next-generation sequencing methodologies have revolutionized genomic research. However, the rapid sequencing and analysis of thousands upon thousands of human exomes and genomes has taught us that most genes, including those known to cause heritable cardiovascular disorders such as long QT syndrome, harbor an unexpected background rate of rare, and presumably innocuous, non-synonymous genetic variation. In this Review, we aim to reappraise the genetic architecture underlying both the acquired and congenital forms of long QT syndrome by examining how the clinical phenotype associated with, and the background genetic variation in, long QT syndrome-susceptibility genes affect the clinical validity of existing gene-disease associations and the variant classification and reporting strategies that serve as the foundation for diagnostic long QT syndrome genetic testing. Copyright © 2018 Elsevier Inc. All rights reserved.
High quality draft genome sequence and analysis of Pontibacter roseus type strain SRC-1T (DSM 17521T) isolated from muddy waters of a drainage system in Chandigarh, India

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mukherjee, Supratim; Lapidus, Alla; Shapiro, Nicole

2015-01-01

Pontibacter roseus Suresh et al. 2006 is a member of the genus Pontibacter, family Cytophagaceae, class Cytophagia. While the type species of the genus, Pontibacter actiniarum, was isolated in 2005 from a marine environment, subsequent species of the same genus have been found in different types of habitats, including seawater, sediment, desert soil, rhizosphere, contaminated sites, solar saltern, and muddy water. Here we describe the features of Pontibacter roseus strain SRC-1T along with its complete genome sequence and annotation from a culture of DSM 17521T. The 4,581,480 bp long draft genome consists of 12 scaffolds with 4,003 protein-coding and 50 RNA genes and is a part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project.
Inferring action structure and causal relationships in continuous sequences of human action.

PubMed

Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare

2015-02-01

In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries. Copyright © 2014. Published by Elsevier Inc.
High quality draft genome sequence and analysis of Pontibacter roseus type strain SRC-1T (DSM 17521T) isolated from muddy waters of a drainage system in Chandigarh, India

DOE PAGES

Mukherjee, Supratim; Lapidus, Alla; Shapiro, Nicole; ...

2015-02-09

Pontibacter roseus is a member of the genus Pontibacter, family Cytophagaceae, class Cytophagia. While the type species of the genus, Pontibacter actiniarum, was isolated in 2005 from a marine environment, subsequent species of the same genus have been found in different types of habitats, including seawater, sediment, desert soil, rhizosphere, contaminated sites, solar saltern, and muddy water. Here we describe the features of Pontibacter roseus strain SRC-1T along with its complete genome sequence and annotation from a culture of DSM 17521T. The 4,581,480 bp long draft genome consists of 12 scaffolds with 4,003 protein-coding and 50 RNA genes and is a part of the Genomic Encyclopedia of Type Strains: KMG-I project.
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

PubMed Central

Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

2007-01-01

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes are feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
Characterizing the rapid spread of porcine epidemic diarrhea virus (PEDV) through an animal food manufacturing facility.

PubMed

Schumacher, Loni L; Huss, Anne R; Cochrane, Roger A; Stark, Charles R; Woodworth, Jason C; Bai, Jianfa; Poulsen, Elizabeth G; Chen, Qi; Main, Rodger G; Zhang, Jianqiang; Gauger, Phillip C; Ramirez, Alejandro; Derscheid, Rachel J; Magstadt, Drew M; Dritz, Steve S; Jones, Cassandra K

2017-01-01

New regulatory and consumer demands highlight the importance of animal feed as a part of our national food safety system. Porcine epidemic diarrhea virus (PEDV) is the first viral pathogen confirmed to be widely transmissible in animal food. Because the potential for viral contamination in animal food is not well characterized, the objectives of this study were to 1) observe the magnitude of virus contamination in an animal food manufacturing facility, and 2) investigate a proposed method, feed sequencing, to decrease virus contamination on animal food-contact surfaces. A U.S. virulent PEDV isolate was used to inoculate 50 kg swine feed, which was mixed, conveyed, and discharged into bags using pilot-scale feed manufacturing equipment. Surfaces were swabbed and analyzed for the presence of PEDV RNA by quantitative real-time polymerase chain reaction (qPCR). Environmental swabs indicated complete contamination of animal food-contact surfaces (0/40 vs. 48/48, positive baseline samples/total baseline samples, positive subsequent samples/total subsequent samples, respectively; P < 0.05) and near complete contamination of non-animal food-contact surfaces (0/24 vs. 16/18, positive baseline samples/total baseline samples, positive subsequent samples/total subsequent samples, respectively; P < 0.05). Flushing animal food-contact surfaces with low-risk feed is commonly used to reduce cross-contamination in animal feed manufacturing. Thus, four subsequent 50 kg batches of virus-free swine feed were manufactured using the same system to test its impact on decontaminating animal food-contact surfaces. Even after 4 subsequent sequences, animal food-contact surfaces retained viral RNA (28/33 positive samples/total samples), with the conveying system more contaminated than the mixer. A bioassay to test infectivity of dust from animal food-contact surfaces failed to produce infectivity. 
This study demonstrates the potential widespread viral contamination of surfaces in an animal food manufacturing facility and the difficulty of removing contamination using conventional feed sequencing, which underscores the importance of preventing viruses from entering and contaminating such facilities.
Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers.

PubMed

Girardot, Charles; Scholtalbers, Jelle; Sauer, Sajoscha; Su, Shu-Yi; Furlong, Eileen E M

2016-10-08

The yield obtained from next generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifier or UMIs) are often used to distinguish between PCR duplicates and transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end to either increase the number of multiplexed samples in the library or to use one of the barcodes as UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must take read duplicates and sample identity into account, by identifying UMIs. Existing tools do not support these complex barcoding configurations and custom code development is frequently required. Here, we present Je, a suite of tools that accommodates complex barcoding strategies, extracts UMIs and filters read duplicates taking UMIs into account. Using Je on publicly available scRNA-seq and iCLIP data containing UMIs, the number of unique reads increased by up to 36%, compared to when UMIs are ignored. Je is implemented in Java and uses the Picard API. Code, executables and documentation are freely available at http://gbcs.embl.de/Je . Je can also be easily installed in Galaxy through the Galaxy toolshed.
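Je itself is written in Java on top of the Picard API; the following is not Je's implementation but a minimal Python sketch of the general idea behind UMI-aware duplicate filtering. It assumes reads are already aligned and that each read's UMI has already been extracted; the `Read` record and the `dedup` function are hypothetical names chosen for illustration. It ignores complications such as sequencing errors within the UMI itself.

```python
from collections import namedtuple

# Hypothetical minimal read record: mapping coordinates plus an extracted UMI.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])


def dedup(reads, use_umi=True):
    """Keep the first read for each duplicate key.

    Without UMIs, reads sharing chrom/pos/strand are treated as PCR duplicates.
    With UMIs, they count as duplicates only if their UMIs also match, so
    distinct molecules that happen to map to the same position are retained.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi if use_umi else None)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    Read("r1", "chr1", 100, "+", "AAT"),
    Read("r2", "chr1", 100, "+", "AAT"),  # PCR duplicate of r1
    Read("r3", "chr1", 100, "+", "CGG"),  # same position, different molecule
    Read("r4", "chr1", 100, "+", "TTA"),
    Read("r5", "chr2", 500, "-", "AAT"),
]
print(len(dedup(reads)), len(dedup(reads, use_umi=False)))  # 4 2
```

In this toy input, ignoring UMIs collapses three distinct molecules at chr1:100 into one read, which is the kind of undercounting the abstract reports UMI-aware filtering avoids.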

A plasma membrane sucrose-binding protein that mediates sucrose uptake shares structural and sequence similarity with seed storage proteins but remains functionally distinct.

PubMed

Overvoorde, P J; Chao, W S; Grimes, H D

1997-06-20

Photoaffinity labeling of a soybean cotyledon membrane fraction identified a sucrose-binding protein (SBP). Subsequent studies have shown that the SBP is a unique plasma membrane protein that mediates the linear uptake of sucrose in the presence of up to 30 mM external sucrose when ectopically expressed in yeast. Analysis of the SBP-deduced amino acid sequence indicates it lacks sequence similarity with other known transport proteins. Data presented here, however, indicate that the SBP shares significant sequence and structural homology with the vicilin-like seed storage proteins that organize into homotrimers. These similarities include a repeated sequence that forms the basis of the reiterated domain structure characteristic of the vicilin-like protein family. In addition, analytical ultracentrifugation and nonreducing SDS-polyacrylamide gel electrophoresis demonstrate that the SBP appears to be organized into oligomeric complexes with a Mr indicative of the existence of SBP homotrimers and homodimers. The structural similarity shared by the SBP and vicilin-like proteins provides a novel framework to explore the mechanistic basis of SBP-mediated sucrose uptake. Expression of the maize Glb protein (a vicilin-like protein closely related to the SBP) in yeast demonstrates that a closely related vicilin-like protein is unable to mediate sucrose uptake. Thus, despite sequence and structural similarities shared by the SBP and the vicilin-like protein family, the SBP is functionally divergent from other members of this group.
Effectiveness of a cloning and sequencing exercise on student learning with subsequent publication in the National Center for Biotechnology Information GenBank.

PubMed

Lau, Joann M; Robinson, David L

2009-01-01

With rapid advances in biotechnology and molecular biology, instructors are challenged to not only provide undergraduate students with hands-on experiences in these disciplines but also to engage them in the "real-world" scientific process. Two common topics covered in biotechnology or molecular biology courses are gene-cloning and bioinformatics, but to provide students with a continuous laboratory-based research experience in these techniques is difficult. To meet these challenges, we have partnered with Bio-Rad Laboratories in the development of the "Cloning and Sequencing Explorer Series," which combines wet-lab experiences (e.g., DNA extraction, polymerase chain reaction, ligation, transformation, and restriction digestion) with bioinformatics analysis (e.g., evaluation of DNA sequence quality, sequence editing, Basic Local Alignment Search Tool searches, contig construction, intron identification, and six-frame translation) to produce a sequence publishable in the National Center for Biotechnology Information GenBank. This 6- to 8-wk project-based exercise focuses on a pivotal gene of glycolysis (glyceraldehyde-3-phosphate dehydrogenase), in which students isolate, sequence, and characterize the gene from a plant species or cultivar not yet published in GenBank. Student achievement was evaluated using pre-, mid-, and final-test assessments, as well as with a survey to assess student perceptions. Student confidence with basic laboratory techniques and knowledge of bioinformatics tools were significantly increased upon completion of this hands-on exercise.
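Six-frame translation is one of the bioinformatics steps listed above. It translates a DNA sequence in the three reading frames of each strand. Below is a minimal Python sketch that uses the standard genetic code (NCBI translation table 1). The function names and the example sequence are illustrative. Codons containing ambiguous bases are rendered as `X`. Frame labels follow one common convention, in which reverse frames are counted from the 5' end of the reverse complement.

```python
from typing import Dict

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; '*' = stop, 'X' = codon with an ambiguous base."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Frames +1..+3 on the given strand, -1..-3 on its reverse complement."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


example = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # arbitrary test sequence
for frame, protein in six_frame_translation(example).items():
    print(frame, protein)
```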
Synteny of Prunus and other model plant species

PubMed Central

Jung, Sook; Jiwan, Derick; Cho, Ilhyung; Lee, Taein; Abbott, Albert; Sosinski, Bryon; Main, Dorrie

2009-01-01

Background Fragmentary conservation of synteny has been reported between map-anchored Prunus sequences and Arabidopsis. With the availability of genome sequence for fellow rosid I members Populus and Medicago, we analyzed the synteny between Prunus and the three model genomes. Eight Prunus BAC sequences and map-anchored Prunus sequences were used in the comparison. Results We found a well conserved synteny across the Prunus species – peach, plum, and apricot – and Populus using a set of homologous Prunus BACs. Conversely, we could not detect any synteny with Arabidopsis in this region. Other peach BACs also showed extensive synteny with Populus. The syntenic regions detected were up to 477 kb in Populus. Two syntenic regions between Arabidopsis and these BACs were much shorter, around 10 kb. We also found syntenic regions that are conserved between the Prunus BACs and Medicago. The array of synteny corresponded with the proposed whole genome duplication events in Populus and Medicago. Using map-anchored Prunus sequences, we detected many syntenic blocks with several gene pairs between Prunus and Populus or Arabidopsis. We observed a more complex network of synteny between Prunus and Arabidopsis, indicative of multiple genome duplications and subsequent gene loss in Arabidopsis. Conclusion Our results show the striking microsynteny between the Prunus BACs and the genomes of Populus and Medicago. In macrosynteny analysis, more distinct Prunus regions were syntenic to Populus than to Arabidopsis. PMID:19208249
The complete sequence of the mitochondrial genome of the African Penguin (Spheniscus demersus).

PubMed

Labuschagne, Christiaan; Kotzé, Antoinette; Grobler, J Paul; Dalton, Desiré L

2014-01-15

The complete mitochondrial genome of the African Penguin (Spheniscus demersus) was sequenced using next generation sequencing and primer walking. The genome is 17,346 bp in length. It was compared with the mitochondrial DNA of the two other penguin species reported so far, namely the Little blue penguin (Eudyptula minor) and the Rockhopper penguin (Eudyptes chrysocome). This analysis made it possible to identify common penguin mitochondrial DNA characteristics. The S. demersus mtDNA genome is very similar in both composition and length to the E. chrysocome and E. minor genomes. The gene content of the African penguin mitochondrial genome is typical of vertebrates, and all three penguin species have the standard gene order originally identified in the chicken. The control region for S. demersus is located between tRNA-Glu and tRNA-Phe, and all three penguin species contain two sets of similar repeats with varying copy numbers towards the 3' end of the control region, which accounts for the size variance. This is the first report of the complete nucleotide sequence for the mitochondrial genome of the African penguin, S. demersus. These results can subsequently be used to provide information for penguin phylogenetic studies and insights into the evolution of genomes. © 2013 Elsevier B.V. All rights reserved.
Structural analysis of key gap junction domains--Lessons from genome data and disease-linked mutants.

PubMed

Bai, Donglin

2016-02-01

A gap junction (GJ) channel is formed by docking of two GJ hemichannels and each of these hemichannels is a hexamer of connexins. All connexin genes have been identified in human, mouse, and rat genomes and their homologous genes in many other vertebrates are available in public databases. The protein sequences of these connexins align well with high sequence identity in the same connexin across different species. Domains in closely related connexins and several residues in all known connexins are also well-conserved. These conserved residues form signatures (also known as sequence logos) in these domains and are likely to play important biological functions. In this review, the sequence logos of individual connexins, groups of connexins with common ancestors, and all connexins are analyzed to visualize natural evolutionary variations and the hot spots for human disease-linked mutations. Several gap junction domains are homologous, likely forming similar structures essential for their function. The availability of a high resolution Cx26 GJ structure and the subsequently-derived homology structure models for other connexin GJ channels elevated our understanding of sequence logos at the three-dimensional GJ structure level, thus facilitating the understanding of how disease-linked connexin mutants might impair GJ structure and function. This knowledge will enable the design of complementary variants to rescue disease-linked mutants. Copyright © 2015 Elsevier Ltd. All rights reserved.
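Sequence logos of this kind scale each alignment column by its information content, which is log2 of the alphabet size minus the column's Shannon entropy. That height is then divided among residues in proportion to their frequency. The Python sketch below assumes a protein alphabet of 20 and skips gap characters. It also omits the small-sample correction that logo tools commonly apply. The four-sequence alignment is a toy example, not connexin data.

```python
import math
from collections import Counter
from typing import Dict, List


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content in bits: log2(alphabet_size) minus the Shannon entropy of the column."""
    counts = Counter(ch for ch in column if ch != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def logo_heights(alignment: List[str], alphabet_size: int = 20) -> List[Dict[str, float]]:
    """Per-column letter heights: column information scaled by each residue's frequency."""
    heights = []
    for i in range(len(alignment[0])):
        column = "".join(seq[i] for seq in alignment)
        counts = Counter(ch for ch in column if ch != "-")
        total = sum(counts.values())
        info = column_information(column, alphabet_size)
        heights.append({res: info * n / total for res, n in counts.items()} if total else {})
    return heights


toy_alignment = ["CKDV", "CKEV", "CRDV", "CKDI"]  # hypothetical, not connexin sequences
for i, column_heights in enumerate(logo_heights(toy_alignment)):
    print(i, {res: round(h, 2) for res, h in column_heights.items()})
```

A fully conserved column reaches the maximum height of log2(20), about 4.32 bits. A column split evenly between two residues loses exactly 1 bit.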
High resolution melting (HRM) analysis of DNA--its role and potential in food analysis.

PubMed

Druml, Barbara; Cichna-Markl, Margit

2014-09-01

DNA based methods play an increasing role in food safety control and food adulteration detection. Recent papers show that high resolution melting (HRM) analysis is an interesting approach. It involves amplification of the target of interest by the polymerase chain reaction (PCR) in the presence of a saturation dye, followed by melting of the amplicons as the temperature is gradually increased. Since the melting profile depends on the GC content, length, sequence and strand complementarity of the product, HRM analysis is highly suitable for the detection of single-base variants and small insertions or deletions. The review gives an introduction to HRM analysis, covers important aspects in the development of an HRM analysis method and describes how HRM data are analysed and interpreted. Then we discuss the potential of HRM analysis-based methods in food analysis, i.e., for the identification of closely related species and cultivars and the identification of pathogenic microorganisms. Copyright © 2014 Elsevier Ltd. All rights reserved.
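One common way to read HRM data is to plot the negative first derivative of fluorescence with respect to temperature, -dF/dT. The peak of this curve marks the melting temperature (Tm) of the amplicon, so a sequence variant that shifts Tm also shifts the peak. The sketch below uses synthetic logistic melt curves in place of instrument data. The curve shape, the temperature grid and the two Tm values are assumptions chosen for illustration. Real HRM software typically adds normalization and difference-plot steps that are not shown here.

```python
import math
from typing import List, Tuple


def melt_curve(tm: float, width: float, temps: List[float]) -> List[float]:
    """Synthetic normalized fluorescence: near 1 for dsDNA, dropping toward 0 as the amplicon melts."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


def negative_derivative(temps: List[float], fluor: List[float]) -> List[Tuple[float, float]]:
    """-dF/dT by central differences; the two endpoints are dropped."""
    return [
        (temps[i], -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1]))
        for i in range(1, len(temps) - 1)
    ]


def estimate_tm(temps: List[float], fluor: List[float]) -> float:
    """Temperature at the peak of the -dF/dT curve."""
    peak_temp, _ = max(negative_derivative(temps, fluor), key=lambda point: point[1])
    return peak_temp


temps = [round(70.0 + 0.1 * i, 1) for i in range(251)]  # 70.0 to 95.0 degrees C
reference = melt_curve(tm=82.5, width=0.4, temps=temps)
variant = melt_curve(tm=82.1, width=0.4, temps=temps)  # hypothetical single-base variant

print("reference Tm", estimate_tm(temps, reference))
print("variant Tm", estimate_tm(temps, variant))
```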
Pushing the Limits of Imagination: Mental Practice for Learning Sequences

ERIC Educational Resources Information Center

Wohldmann, Erica L.; Healy, Alice F.; Bourne, Lyle E., Jr.

2007-01-01

In 2 experiments, the efficacy of motor imagery for learning to type number sequences was examined. Adults practiced typing 4-digit numbers. Then, during subsequent training, they either typed in the same or a different location, imagined typing, merely looked at each number, or performed an irrelevant task. Repetition priming (faster responses…
Deconstructing Learning in Science--Young Children's Responses to a Classroom Sequence on Evaporation.

ERIC Educational Resources Information Center

Tytler, Russell; Peterson, Suzanne

2001-01-01

Tracks five-year-old children's ideas by a range of means during and subsequent to a classroom sequence on evaporation. Explores the relationship between social and individual perspectives on learning, and questions some assumptions underlying conceptual change research. Analyzes the children's explanations of various evaporation phenomena over…
Programming and Reprogramming Sequence Timing Following High and Low Contextual Interference Practice

ERIC Educational Resources Information Center

Wright, David L.; Magnuson, Curt E.; Black, Charles B.

2005-01-01

Individuals practiced two unique discrete sequence production tasks that differed in their relative time profile in either a blocked or random practice schedule. Each participant was subsequently administered a "precuing" protocol to examine the cost of initially compiling or modifying the plan for an upcoming movement's relative timing. The…
ATLAS, an integrated structural analysis and design system. Volume 1: ATLAS user's guide

NASA Technical Reports Server (NTRS)

Dreisbach, R. L. (Editor)

1979-01-01

Some of the many analytical capabilities provided by the ATLAS Version 4.0 System are described in the logical sequence in which model-definition data are prepared and the subsequent computer job is executed. The example data presented and the fundamental technical considerations highlighted can be used as guides during the problem-solving process. This guide does not describe the ATLAS capabilities in detail. Rather, it introduces new users to ATLAS at a level from which the complete array of capabilities described in the ATLAS User's Manual can be fully exploited.
HiCUP: pipeline for mapping and processing Hi-C data.

PubMed

Wingett, Steven; Ewels, Philip; Furlan-Magaril, Mayra; Nagano, Takashi; Schoenfelder, Stefan; Fraser, Peter; Andrews, Simon

2015-01-01

HiCUP is a pipeline for processing sequence data generated by Hi-C and Capture Hi-C (CHi-C) experiments, which are techniques used to investigate three-dimensional genomic organisation. The pipeline maps data to a specified reference genome and removes artefacts that would otherwise hinder subsequent analysis. HiCUP also produces an easy-to-interpret yet detailed quality control (QC) report that assists in refining experimental protocols for future studies. The software is freely available and has already been used for processing Hi-C and CHi-C data in several recently published peer-reviewed studies.
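A central filtering step in Hi-C processing is deciding whether a read pair reflects a genuine ligation between distant fragments, based on the restriction fragments its two ends map to. The Python sketch below is a simplified illustration of that idea, not HiCUP's actual rule set, which also considers read orientation and di-tag size. It assigns each mapped position to a fragment by binary search over sorted cut sites. Pairs on the same fragment or on adjacent fragments are labelled as likely artefacts. The cut-site coordinates and category names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List


def fragment_index(cut_sites: List[int], pos: int) -> int:
    """Index of the restriction fragment containing pos; cut_sites must be sorted."""
    return bisect_right(cut_sites, pos)


def classify_pair(cut_sites: Dict[str, List[int]],
                  chrom1: str, pos1: int, chrom2: str, pos2: int) -> str:
    """Simplified labels: 'same_fragment', 'religation' (adjacent fragments) or 'valid'."""
    if chrom1 != chrom2:
        return "valid"  # trans (inter-chromosomal) pair
    f1 = fragment_index(cut_sites[chrom1], pos1)
    f2 = fragment_index(cut_sites[chrom1], pos2)
    if f1 == f2:
        return "same_fragment"
    if abs(f1 - f2) == 1:
        return "religation"
    return "valid"


cut_sites = {"chr1": [1000, 2500, 4000, 9000]}  # hypothetical restriction sites

pairs = [
    ("chr1", 1200, "chr1", 2000),  # same_fragment
    ("chr1", 1200, "chr1", 3000),  # religation
    ("chr1", 1200, "chr1", 9500),  # valid
    ("chr1", 1200, "chr2", 50),    # valid
]
for pair in pairs:
    print(pair, classify_pair(cut_sites, *pair))
```

Adjacency alone cannot separate re-ligation from a genuine very-short-range contact, which is one reason real pipelines apply additional criteria.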
Isolation and molecular characterization of an H5N1 swine influenza virus in China in 2015.

PubMed

Wu, Haibo; Yang, Fan; Lu, Rufeng; Xu, Lihua; Liu, Fumin; Peng, Xiuming; Wu, Nanping

2018-03-01

In 2015, an H5N1 influenza virus was isolated from a pig in Zhejiang Province, Eastern China. This strain was characterized by whole-genome sequencing with subsequent phylogenetic analysis. Phylogenetic analysis showed that all segments from this strain belonged to clade 2.3.2 and that it had received its genes from poultry influenza viruses in China. A Glu627Lys mutation associated with pathogenicity was observed in the PB2 protein. This strain was moderately pathogenic in mice and was able to replicate without prior adaptation. These results suggest that active surveillance of swine influenza should be used as an early warning system for influenza outbreaks in mammals.
Mutation analysis of the MECP2 gene in patients of Slavic origin with Rett syndrome: novel mutations and polymorphisms.

PubMed

Zahorakova, Daniela; Rosipal, Robert; Hadac, Jan; Zumrova, Alena; Bzduch, Vladimir; Misovicova, Nadezda; Baxova, Alice; Zeman, Jiri; Martasek, Pavel

2007-01-01

Rett syndrome (RTT), an X-linked dominant neurodevelopmental disorder in females, is caused mainly by de novo mutations in the methyl-CpG-binding protein 2 gene (MECP2). Here we report a mutation analysis of the MECP2 gene in 87 patients with RTT from the Czech and Slovak Republics and Ukraine. The patients, all girls with classical RTT, were investigated for mutations using bi-directional DNA sequencing and conformation sensitive gel electrophoresis analysis of the coding sequence and exon/intron boundaries of the MECP2 gene. Restriction fragment length polymorphism analysis was performed to confirm mutations that create or abolish a restriction site. Mutation-negative cases were subsequently examined by multiplex ligation-dependent probe amplification (MLPA) to identify large deletions. Mutation screening revealed 31 different mutations in 68 patients and 12 non-pathogenic polymorphisms. Six mutations have not been previously published: two point mutations (323T>A, 904C>T), three deletions (189_190delGA, 816_832del17, 1069delAGC) and one deletion/inversion (1063_1236del174;1189_1231inv43). MLPA analysis revealed large deletions in two patients. The detection rate was 78.16%. Our results confirm the high frequency of MECP2 mutations in females with RTT and provide data concerning the mutation heterogeneity in the Slavic population.
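The numbers in this abstract can be cross-checked with a few lines of Python. For range-style variant descriptions such as `816_832del17`, the stated size should equal end minus start plus one. The detection rate is 68 of 87 patients. The regular expression below is a hypothetical helper that understands only the `start_end` + `del`/`inv` + length pattern. It is not a general HGVS parser, and it does not handle forms such as `189_190delGA` or `1069delAGC`.

```python
import re

# Hypothetical pattern for "start_end" + "del"/"inv" + stated length, e.g. "816_832del17".
RANGE_VARIANT = re.compile(r"^(\d+)_(\d+)(del|inv)(\d+)$")


def stated_length_matches(notation: str) -> bool:
    """True if the stated length equals the inclusive span end - start + 1."""
    match = RANGE_VARIANT.match(notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation}")
    start, end, _kind, stated = match.groups()
    return int(end) - int(start) + 1 == int(stated)


for variant in ["816_832del17", "1063_1236del174", "1189_1231inv43"]:
    print(variant, stated_length_matches(variant))

detection_rate = 100 * 68 / 87  # patients with a mutation / patients screened
print(f"{detection_rate:.2f}%")  # 78.16%
```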
Developing a post-fire flood chronology and recurrence probability from alluvial stratigraphy in the Buffalo Creek watershed, Colorado, USA

USGS Publications Warehouse

Elliott, J.G.; Parker, R.S.

2001-01-01

Stratigraphic and geomorphic evidence indicates that floods occurring soon after forest fires have been intermittent but common events in many mountainous areas during the past several thousand years. The magnitude and recurrence of these post-fire flood events reflect the joint probability of the recurrence of fires and the recurrence of subsequent rainfall events of varying magnitude and intensity. Following the May 1996 Buffalo Creek, Colorado, forest fire, precipitation amounts and intensities that generated very little surface runoff outside the burned area resulted in severe hillslope erosion, floods, and streambed sediment entrainment in the rugged, severely burned, 48 km² area. These floods added sediment to many existing alluvial fans, while simultaneously incising other fans and alluvial deposits. Incision of older fans revealed multiple sequences of fluvially transported sandy gravel that grade upward into charcoal-rich, loamy horizons. We interpret these sequences to represent periods of high sediment transport and aggradation during floods, followed by intervals of quiescence and relative stability in the watershed until a subsequent fire occurred. An alluvial sequence near the mouth of a tributary draining a 0.82 km² area indicated several previous post-fire flood cycles in the watershed. Dendrochronologic and radiocarbon ages of material in this deposit span approximately 2900 years and define three aggradational periods. The three general aggradational periods are separated by intervals of approximately nine to ten centuries and reflect a 'millennium-scale' geomorphic response to a closely timed sequence of events: severe and intense, watershed-scale, stand-replacing fires and subsequent rainstorms and flooding. 
Millennium-scale aggradational units at the study site may have resulted from a scenario in which the initial runoff from the burned watershed transported and deposited large volumes of sediment on downstream alluvial surfaces and tributary fans. Subsequent storm runoff may have produced localized incision and channelization, preventing additional vertical aggradation on the sampled alluvial deposit for several centuries. Two of the millennium-scale aggradational periods at the study site consist of multiple gravel and loam sequences with similar radiocarbon ages. These closely dated sequences may reflect a 'multidecade-scale' geomorphic response to more frequent, but areally limited and less severe fires, followed by rainstorms of relatively common recurrence. Published in 2001 by John Wiley and Sons, Ltd.
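The joint-probability argument above can be made concrete with a deliberately simple model. Assume that fires and storms occur independently, that the annual fire probability is 1/(fire recurrence interval), and that a burned watershed stays flood-prone for a fixed number of years. The annual chance of a post-fire flood is then the fire probability times the chance of at least one flood-producing storm within that window, P = (1/R_f) · (1 − (1 − 1/R_s)^w). All parameter values below are hypothetical and are not estimates from the Buffalo Creek study.

```python
def prob_storm_in_window(storm_recurrence_years: float, window_years: int) -> float:
    """Chance of at least one storm of the given recurrence interval within the window."""
    annual = 1.0 / storm_recurrence_years
    return 1.0 - (1.0 - annual) ** window_years


def annual_post_fire_flood_prob(fire_recurrence_years: float,
                                storm_recurrence_years: float,
                                window_years: int) -> float:
    """Joint probability, assuming fire and storm occurrence are independent."""
    return (1.0 / fire_recurrence_years) * prob_storm_in_window(storm_recurrence_years, window_years)


# Hypothetical inputs: 100-year fires, 10-year storms, 3-year post-fire vulnerability window.
p = annual_post_fire_flood_prob(100, 10, 3)
print(round(p, 5), round(1 / p))  # about 0.0027 per year, i.e. roughly once in 370 years
```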
Sequencing of mitochondrial genomes of nine Aspergillus and Penicillium species identifies mobile introns and accessory genes as main sources of genome size variability.

PubMed

Joardar, Vinita; Abrams, Natalie F; Hostetler, Jessica; Paukstelis, Paul J; Pakala, Suchitra; Pakala, Suman B; Zafar, Nikhat; Abolude, Olukemi O; Payne, Gary; Andrianopoulos, Alex; Denning, David W; Nierman, William C

2012-12-12

The genera Aspergillus and Penicillium include some of the most beneficial as well as the most harmful fungal species such as the penicillin-producer Penicillium chrysogenum and the human pathogen Aspergillus fumigatus, respectively. Their mitochondrial genomic sequences may hold vital clues into the mechanisms of their evolution, population genetics, and biology, yet only a handful of these genomes have been fully sequenced and annotated. Here we report the complete sequence and annotation of the mitochondrial genomes of six Aspergillus and three Penicillium species: A. fumigatus, A. clavatus, A. oryzae, A. flavus, Neosartorya fischeri (A. fischerianus), A. terreus, P. chrysogenum, P. marneffei, and Talaromyces stipitatus (P. stipitatum). The accompanying comparative analysis of these and related publicly available mitochondrial genomes reveals wide variation in size (25-36 kb) among these closely related fungi. The sources of genome expansion include group I introns and accessory genes encoding putative homing endonucleases, DNA and RNA polymerases (presumed to be of plasmid origin) and hypothetical proteins. The two smallest sequenced genomes (A. terreus and P. chrysogenum) do not contain introns in protein-coding genes, whereas the largest genome (T. stipitatus) contains a total of eleven introns. All of the sequenced genomes have a group I intron in the large ribosomal subunit RNA gene, suggesting that this intron is fixed in these species. Subsequent analysis of several A. fumigatus strains showed low intraspecies variation. This study also includes a phylogenetic analysis based on 14 concatenated core mitochondrial proteins. The phylogenetic tree has a different topology from published multilocus trees, highlighting the challenges still facing Aspergillus systematics. 
The study expands the genomic resources available to fungal biologists by providing mitochondrial genomes with consistent annotations for future genetic, evolutionary and population studies. Despite the conservation of the core genes, the mitochondrial genomes of the Aspergillus and Penicillium species examined here exhibit a significant amount of interspecies variation. Most of this variation can be attributed to accessory genes and mobile introns, presumably acquired by horizontal gene transfer of mitochondrial plasmids and intron homing.
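A phylogeny based on concatenated core proteins starts from a supermatrix. Per-gene alignments are joined end to end for each taxon, so every row has the same length. The Python sketch below shows that step under stated assumptions. The inputs are dictionaries mapping taxon to an already aligned sequence, and taxa missing from a gene are padded with gaps. The gene fragments are toy strings, not real mitochondrial proteins, and this is not necessarily how the authors built their matrix.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix.

    Taxa missing from a gene's alignment are padded with gaps so every
    concatenated row has the same length.
    """
    taxa = sorted(set().union(*alignments))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("all sequences in one alignment must have the same length")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Toy aligned fragments for two hypothetical genes (not real data).
gene_a = {"A_fumigatus": "MKLV", "P_chrysogenum": "MRLV"}
gene_b = {"A_fumigatus": "GTA", "T_stipitatus": "GSA"}

for taxon, row in concatenate_alignments([gene_a, gene_b]).items():
    print(taxon, row)
```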
Genome sequence and analysis of a stress-tolerant, wild-derived strain of Saccharomyces cerevisiae used in biofuels research

DOE Office of Scientific and Technical Information (OSTI.GOV)

McIlwain, Sean J.; Peris, Davis; Sardi, Maria

The genome sequences of more than 100 strains of the yeast Saccharomyces cerevisiae have been published. Unfortunately, most of these genome assemblies contain dozens to hundreds of gaps at repetitive sequences, including transposable elements, tRNAs, and subtelomeric regions, which is where novel genes generally reside. Relatively few strains have been chosen for genome sequencing based on their biofuel production potential, leaving an additional knowledge gap. Here, we describe the nearly complete genome sequence of GLBRCY22-3 (Y22-3), a strain of S. cerevisiae derived from the stress-tolerant wild strain NRRL YB-210 and subsequently engineered for xylose metabolism. After benchmarking several genome assembly approaches, we developed a pipeline to integrate Pacific Biosciences (PacBio) and Illumina sequencing data and achieved one of the highest quality genome assemblies for any S. cerevisiae strain. Specifically, the contig N50 is 693 kbp, and the sequences of most chromosomes, the mitochondrial genome, and the 2-micron plasmid are complete. Our annotation predicts 92 genes that are not present in the reference genome of the laboratory strain S288c, over 70% of which were expressed. We predicted functions for 43 of these genes, 28 of which were previously uncharacterized and unnamed. Remarkably, many of these genes are predicted to be involved in stress tolerance and carbon metabolism and are shared with a Brazilian bioethanol production strain, even though the strains differ dramatically at most genetic loci. Lastly, the Y22-3 genome sequence provides an exceptionally high-quality resource for basic and applied research in bioenergy and genetics.
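The contig N50 quoted above is the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. Below is a minimal Python sketch of that definition. The contig lengths in the example are made up and do not come from the Y22-3 assembly.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs >= L hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return lengths[-1]  # unreachable: the loop always returns once running == total


print(n50([100, 80, 60, 40, 20]))  # 80
```

In the example, the two longest contigs (100 + 80 = 180) already cover more than half of the 300 assembled bases, so N50 is 80.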
Genome sequence and analysis of a stress-tolerant, wild-derived strain of Saccharomyces cerevisiae used in biofuels research

DOE PAGES

McIlwain, Sean J.; Peris, Davis; Sardi, Maria; ...

2016-04-20

The genome sequences of more than 100 strains of the yeast Saccharomyces cerevisiae have been published. Unfortunately, most of these genome assemblies contain dozens to hundreds of gaps at repetitive sequences, including transposable elements, tRNAs, and subtelomeric regions, which is where novel genes generally reside. Relatively few strains have been chosen for genome sequencing based on their biofuel production potential, leaving an additional knowledge gap. Here, we describe the nearly complete genome sequence of GLBRCY22-3 (Y22-3), a strain of S. cerevisiae derived from the stress-tolerant wild strain NRRL YB-210 and subsequently engineered for xylose metabolism. After benchmarking several genome assembly approaches, we developed a pipeline to integrate Pacific Biosciences (PacBio) and Illumina sequencing data and achieved one of the highest quality genome assemblies for any S. cerevisiae strain. Specifically, the contig N50 is 693 kbp, and the sequences of most chromosomes, the mitochondrial genome, and the 2-micron plasmid are complete. Our annotation predicts 92 genes that are not present in the reference genome of the laboratory strain S288c, over 70% of which were expressed. We predicted functions for 43 of these genes, 28 of which were previously uncharacterized and unnamed. Remarkably, many of these genes are predicted to be involved in stress tolerance and carbon metabolism and are shared with a Brazilian bioethanol production strain, even though the strains differ dramatically at most genetic loci. Lastly, the Y22-3 genome sequence provides an exceptionally high-quality resource for basic and applied research in bioenergy and genetics.
Carrot Juice Fermentations as Man-Made Microbial Ecosystems Dominated by Lactic Acid Bacteria.

PubMed

Wuyts, Sander; Van Beeck, Wannes; Oerlemans, Eline F M; Wittouck, Stijn; Claes, Ingmar J J; De Boeck, Ilke; Weckx, Stefan; Lievens, Bart; De Vuyst, Luc; Lebeer, Sarah

2018-06-15

Spontaneous vegetable fermentations, with their rich flavors and postulated health benefits, are regaining popularity. However, their microbiology is still poorly understood, which raises concerns about food safety. In addition, such spontaneous fermentations form interesting cases of man-made microbial ecosystems. Here, samples from 38 carrot juice fermentations were collected through a citizen science initiative, in addition to three laboratory fermentations. Culturing showed that Enterobacteriaceae were outcompeted by lactic acid bacteria (LAB) between 3 and 13 days of fermentation. Metabolite-target analysis showed that lactic acid and mannitol were highly produced, as was the biogenic amine cadaverine. High-throughput 16S rRNA gene sequencing revealed that the fermentations were mediated mainly by species of Leuconostoc and, subsequently, Lactobacillus (as identified by 8 and 20 amplicon sequence variants [ASVs], respectively). The analyses at the DNA level still detected a high number of Enterobacteriaceae, but their relative abundance was low when RNA-based sequencing was performed to detect presumptive metabolically active bacterial cells. In addition, this method greatly reduced host read contamination. Phylogenetic placement indicated a high LAB diversity, with ASVs from nine different phylogenetic groups of the Lactobacillus genus complex. However, fermentation experiments with isolates showed that only strains belonging to the most prevalent phylogenetic groups preserved the fermentation dynamics. The carrot juice fermentation thus forms a robust man-made microbial ecosystem suitable for studies on LAB diversity and niche specificity. IMPORTANCE The use of fermented food products by professional chefs is steadily growing worldwide. Meanwhile, this interest has also increased at the household level. However, many of these artisanal food products remain understudied. 
Here, an extensive microbial analysis was performed on spontaneously fermented carrot juices, which are used as nonalcoholic alternatives to wine in a Belgian Michelin-starred restaurant. Samples were collected through an active citizen science approach with 38 participants, in addition to three laboratory fermentations. Identification of the main microbial players revealed that the fermentations were mediated mainly by species of Leuconostoc and, subsequently, Lactobacillus. In addition, a high diversity of lactic acid bacteria was found; however, fermentation experiments with isolates showed that only strains belonging to the most prevalent lactic acid bacteria preserved the fermentation dynamics. Finally, this study showed that the use of RNA-based 16S rRNA amplicon sequencing greatly reduces host read contamination. Copyright © 2018 American Society for Microbiology.
Molecular diversity of arbuscular mycorrhizal fungi and their distribution patterns related to host-plants and habitats in a hot and arid ecosystem, southwest China.

PubMed

Li, Ling-Fei; Li, Tao; Zhang, Yan; Zhao, Zhi-Wei

2010-03-01

The communities of arbuscular mycorrhizal fungi (AMF) colonizing the roots of Bothriochloa pertusa, Cajanus cajan and Heteropogon contortus in a fallow land (FL) and an undisturbed land (UL) were characterized. The large subunit rDNA genes of AMF from roots were amplified and cloned. A total of 2353 clones were screened by restriction fragment length polymorphism, and 428 clones were subsequently sequenced. A total of 393 AMF sequences, grouped into 100 operational taxonomic units, were obtained. Phylogenetic analysis revealed that the AMF sequences belonged to Glomus, Acaulospora and Scutellospora, and that Glomus was the dominant genus. Of the 393 AMF sequences, 81% were novel. The diversity of AMF colonizing the same plant species was higher in the UL than in the FL, providing strong molecular evidence that soil disturbance reduced AMF populations and species richness. The results revealed that AMF communities differed significantly among host-plant species and between the two habitats. The similarity of AMF communities colonizing different plant species within a habitat was higher than that of the same plant species from different habitats. The molecular evidence supported our previous hypothesis, based on morphological analyses, that AMF communities are more strongly influenced by habitat than by host preference.
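Community diversity comparisons such as the UL versus FL contrast are often summarized with the Shannon index H' = −Σ p_i ln p_i, taken over OTU proportions. The abstract does not say which diversity measure was used. The Python sketch below is therefore only an illustration, using hypothetical clone-to-OTU assignments.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_index(otu_assignments: Iterable[str]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTU proportions p_i."""
    counts = Counter(otu_assignments)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical clone-to-OTU assignments for one host plant in each habitat.
undisturbed = ["OTU1", "OTU2", "OTU3", "OTU4", "OTU1", "OTU5"]
fallow = ["OTU1", "OTU1", "OTU1", "OTU2", "OTU1", "OTU2"]

print("UL richness", len(set(undisturbed)), "H'", round(shannon_index(undisturbed), 3))
print("FL richness", len(set(fallow)), "H'", round(shannon_index(fallow), 3))
```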
Insights into three whole-genome duplications gleaned from the Paramecium caudatum genome sequence.

PubMed

McGrath, Casey L; Gout, Jean-Francois; Doak, Thomas G; Yanagi, Akira; Lynch, Michael

2014-08-01

Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequences of P. biaurelia and P. sexaurelia suggest that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequence of P. caudatum, a species closely related to the P. aurelia species group. P. caudatum shares only the most ancient of the three WGDs with the aurelia complex. We found that P. caudatum maintains twice as many paralogs from this early event as the P. aurelia species, suggesting that post-WGD gene retention is influenced by subsequent WGDs and supporting the importance of selection for dosage in gene retention. The availability of P. caudatum as an outgroup allows an expanded analysis of the aurelia intermediate and recent WGD events. Both the Guanine+Cytosine (GC) content and the expression level of preduplication genes are significant predictors of duplicate retention. We find widespread asymmetrical evolution among aurelia paralogs, which is likely caused by gradual pseudogenization rather than by neofunctionalization. Finally, cases of divergent resolution of intermediate WGD duplicates between aurelia species implicate this process as an ongoing reinforcement mechanism of reproductive isolation long after a WGD event. Copyright © 2014 by the Genetics Society of America.

Identification of a novel MYO7A mutation in Usher syndrome type 1.

PubMed

Cheng, Ling; Yu, Hongsong; Jiang, Yan; He, Juan; Pu, Sisi; Li, Xin; Zhang, Li

2018-01-05

Usher syndrome (USH) is an autosomal recessive disease characterized by deafness and retinitis pigmentosa. In view of the high phenotypic and genetic heterogeneity in USH, performing genetic screening with traditional methods is impractical. In the present study, we carried out targeted next-generation sequencing (NGS) to uncover the underlying gene in an USH family (2 USH patients and 15 unaffected relatives). One hundred and thirty-five genes associated with inherited retinal degeneration were selected for deep exome sequencing. Subsequently, variant analysis, Sanger validation and segregation tests were utilized to identify the disease-causing mutations in this family. All affected individuals had a classic USH type I (USH1) phenotype which included deafness, vestibular dysfunction and retinitis pigmentosa. Targeted NGS and Sanger sequencing validation suggested that USH1 patients carried an unreported splice site mutation, c.5168+1G>A, as a compound heterozygous mutation with c.6070C>T (p.R2024X) in the MYO7A gene. A functional study revealed decreased expression of the MYO7A gene in the individuals carrying heterozygous mutations. In conclusion, targeted next-generation sequencing provided a comprehensive and efficient diagnosis for USH1. This study revealed the genetic defects in the MYO7A gene and expanded the spectrum of clinical phenotypes associated with USH1 mutations.
rpoB gene mutations among Mycobacterium tuberculosis isolates from extrapulmonary sites.

PubMed

Khosravi, Azar Dokht; Meghdadi, Hossein; Ghadiri, Ata A; Alami, Ameneh; Sina, Amir Hossein; Mirsaeidi, Mehdi

2018-03-01

The aim of this study was to analyze mutations occurring in the rpoB gene of Mycobacterium tuberculosis (MTB) isolates from clinical samples of extrapulmonary tuberculosis (EPTB). Seventy formalin-fixed, paraffin-embedded samples and fresh tissue samples from confirmed EPTB cases were analyzed. Nested PCR targeting the rpoB gene was performed on the extracted DNA, followed by cloning and sequencing. Sixty-seven (95.7%) samples were positive by nested PCR. Sequence analysis of the 81-bp region of the rpoB gene revealed mutations in 41 (61.2%) of the 67 sequenced samples. Several point mutations were seen, including deletion mutations at codons 510, 512, 513 and 515, with 45% and 51% of the mutations in codons 512 and 513, respectively, along with replacement mutations (26%) at codons 509, 513, 514, 518, 520, 524 and 531. The most common alteration was Gln → His at codon 513, present in 30 (75.6%) isolates. This study identified sequence alterations in codon 513 of the 81-bp region of the rpoB gene as the most common mutation, occurring in 75.6% of molecularly confirmed rifampin-resistant strains. In addition, simultaneous mutations at codons 512 and 513 were found in 34.3% of the isolates. © 2018 APMIS. Published by John Wiley & Sons Ltd.
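Codon numbers such as 513 and 531 refer to positions inside the 81-bp region. A minimal sketch, assuming the conventional E. coli rpoB numbering in which this region spans codons 507 to 533 (27 codons), maps a nucleotide offset within the region to its codon number and codon position; the constant names are illustrative.

```python
RRDR_FIRST_CODON = 507  # assumption: E. coli rpoB numbering, region = codons 507-533
RRDR_LENGTH_BP = 81     # 27 codons x 3 nt


def rrdr_codon(offset: int) -> tuple:
    """Map a 0-based nucleotide offset within the 81-bp region to (codon number, position 1-3)."""
    if not 0 <= offset < RRDR_LENGTH_BP:
        raise ValueError("offset lies outside the 81-bp region")
    return RRDR_FIRST_CODON + offset // 3, offset % 3 + 1


# A substitution at offset 20 falls in the third position of codon 513
print(rrdr_codon(20))
```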
Hairpin Bisulfite Sequencing: Synchronous Methylation Analysis on Complementary DNA Strands of Individual Chromosomes.

PubMed

Giehr, Pascal; Walter, Jörn

2018-01-01

The accurate and quantitative detection of 5-methylcytosine is of great importance in the field of epigenetics. The method of choice is usually bisulfite sequencing because of its high resolution and the possibility of combining it with next-generation sequencing. Nevertheless, this method also has its limitations. Following bisulfite treatment, the DNA strands are no longer complementary, so in a subsequent PCR amplification the methylation pattern information of only one of the two DNA strands is preserved. Several years ago, Hairpin Bisulfite sequencing was developed as a method to obtain the pattern information on complementary DNA strands. The method requires fragmentation (usually by enzymatic cleavage) of genomic DNA, followed by covalent linking of both DNA strands through ligation of a short DNA hairpin oligonucleotide to both strands. The covalently linked dsDNA products are then subjected to a conventional bisulfite treatment, during which all unmodified cytosines are converted to uracils. During the treatment the DNA is denatured, forming noncomplementary ssDNA circles. These circles serve as a template for a locus-specific PCR to amplify chromosomal patterns of the region of interest. As a result, one obtains a linearized product that contains the methylation information of both complementary DNA strands.
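The logic of reading both strands from one hairpin-linked molecule can be sketched as follows, assuming idealized, complete conversion in which unmethylated C reads as T after PCR and 5-methylcytosine reads as C. The linker sequence, locus and methylation positions are hypothetical; real analyses must also handle incomplete conversion and sequencing errors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq: str, methylated: set) -> str:
    """Idealized conversion: unmethylated C reads as T after PCR; 5mC (indices in `methylated`) stays C."""
    return "".join("T" if b == "C" and i not in methylated else b for i, b in enumerate(seq))


def hairpin_read(top: str, linker: str, top_meth: set, bottom_meth: set) -> str:
    """Top strand + hairpin linker + bottom strand on one molecule, read 5'->3'."""
    bottom = revcomp(top)  # bottom strand written 5'->3'; bottom_meth indexes this string
    return (bisulfite_convert(top, top_meth)
            + bisulfite_convert(linker, set())  # linker assumed unmethylated
            + bisulfite_convert(bottom, bottom_meth))


def cpg_strand_calls(top_ref: str, read: str, linker_len: int) -> dict:
    """For each CpG in the top-strand reference, return (top methylated?, bottom methylated?)."""
    n = len(top_ref)
    top_read, bottom_read = read[:n], read[n + linker_len:]
    calls = {}
    for i in range(n - 1):
        if top_ref[i:i + 2] == "CG":
            j = n - 2 - i  # index of the partner C on the bottom strand
            calls[i] = (top_read[i] == "C", bottom_read[j] == "C")
    return calls


# Hypothetical locus: the CpG at top index 1 is methylated on both strands,
# the CpG at top index 5 only on the bottom strand (hemimethylated).
top = "ACGTTCGA"
read = hairpin_read(top, "GGATTTTCC", top_meth={1}, bottom_meth={1, 5})
print(cpg_strand_calls(top, read, linker_len=9))
```

Because both strands are read from the same molecule, a hemimethylated CpG shows up directly as a (False, True) or (True, False) pair, which conventional single-strand bisulfite reads cannot reveal.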
Discovery of novel virus sequences in an isolated and threatened bat species, the New Zealand lesser short-tailed bat (Mystacina tuberculata)

PubMed Central

Wang, Jing; Moore, Nicole E.; Murray, Zak L.; McInnes, Kate; White, Daniel J.; Tompkins, Daniel M.

2015-01-01

Bats harbour a diverse array of viruses, including significant human pathogens. Extensive metagenomic studies of material from bats, in particular guano, have revealed a large number of novel or divergent viral taxa that were previously unknown. New Zealand has only two extant indigenous terrestrial mammals, which are both bats, Mystacina tuberculata (the lesser short-tailed bat) and Chalinolobus tuberculatus (the long-tailed bat). Until the human introduction of exotic mammals, these species had been isolated from all other terrestrial mammals for over 1 million years (potentially over 16 million years for M. tuberculata). Four bat guano samples were collected from M. tuberculata roosts on the isolated offshore island of Whenua Hou (Codfish Island) in New Zealand. Metagenomic analysis revealed that this species still hosts a plethora of divergent viruses. Whilst the majority of viruses detected were likely to be of dietary origin, some putative vertebrate virus sequences were identified. Papillomavirus, polyomavirus, calicivirus and hepevirus were found in the metagenomic data and subsequently confirmed using independent PCR assays and sequencing. The new hepevirus and calicivirus sequences may represent new genera within these viral families. Our findings may provide an insight into the origins of viral families, given their detection in an isolated host species. PMID:25900137
Discovery of novel virus sequences in an isolated and threatened bat species, the New Zealand lesser short-tailed bat (Mystacina tuberculata).

PubMed

Wang, Jing; Moore, Nicole E; Murray, Zak L; McInnes, Kate; White, Daniel J; Tompkins, Daniel M; Hall, Richard J

2015-08-01

Bats harbour a diverse array of viruses, including significant human pathogens. Extensive metagenomic studies of material from bats, in particular guano, have revealed a large number of novel or divergent viral taxa that were previously unknown. New Zealand has only two extant indigenous terrestrial mammals, which are both bats, Mystacina tuberculata (the lesser short-tailed bat) and Chalinolobus tuberculatus (the long-tailed bat). Until the human introduction of exotic mammals, these species had been isolated from all other terrestrial mammals for over 1 million years (potentially over 16 million years for M. tuberculata). Four bat guano samples were collected from M. tuberculata roosts on the isolated offshore island of Whenua Hou (Codfish Island) in New Zealand. Metagenomic analysis revealed that this species still hosts a plethora of divergent viruses. Whilst the majority of viruses detected were likely to be of dietary origin, some putative vertebrate virus sequences were identified. Papillomavirus, polyomavirus, calicivirus and hepevirus were found in the metagenomic data and subsequently confirmed using independent PCR assays and sequencing. The new hepevirus and calicivirus sequences may represent new genera within these viral families. Our findings may provide an insight into the origins of viral families, given their detection in an isolated host species.
Discovery of three novel coccidian parasites infecting California sea lions (Zalophus californianus), with evidence of sexual replication and interspecies pathogenicity.

PubMed

Colegrove, Kathleen M; Grigg, Michael E; Carlson-Bremer, Daphne; Miller, Robin H; Gulland, Frances M D; Ferguson, David J P; Rejmanek, Daniel; Barr, Bradd C; Nordhausen, Robert; Melli, Ann C; Conrad, Patricia A

2011-10-01

Enteric protozoal infection was identified in 5 stranded California sea lions (Zalophus californianus). Microscopically, the apical cytoplasm of distal jejunal enterocytes contained multiple stages of coccidian parasites, including schizonts with merozoites and spherical gametocytes, which were morphologically similar to coccidians. By histopathology, organisms appeared to be confined to the intestine and accompanied by only mild enteritis. Using electron microscopy, both sexual (microgametocytes, macrogamonts) and asexual (schizonts, merozoites) coccidian stages were identified in enterocytes within parasitophorous vacuoles, consistent with apicomplexan development in a definitive host. Serology was negative for tissue cyst-forming coccidians, and immunohistochemistry was inconclusive for Toxoplasma gondii and negative for Neospora caninum and Sarcocystis neurona. Analysis of ITS-1 sequences amplified from frozen or formalin-fixed, paraffin-embedded intestinal sections identified DNA sequences with closest homology (80%) to Neospora sp.; these novel sequences were designated as belonging to coccidian parasites "A," "B," and "C." Subsequent molecular analyses of a neonatal harbor seal (Phoca vitulina) with protozoal lymphadenitis, hepatitis, myocarditis, and encephalitis showed that it was infected with a coccidian parasite bearing the "C" sequence type. Our results indicate that sea lions likely serve as definitive hosts for 3 newly described coccidian parasites, at least 1 of which is pathogenic in a marine mammal intermediate host species.
Validation of the GILLS score for tongue-lip adhesion in Robin sequence patients.

PubMed

Abramowicz, Shelly; Bacic, Janine D; Mulliken, John B; Rogers, Gary F

2012-03-01

The GILLS score comprises five risk factors: gastroesophageal reflux disease, preoperative intubation, late surgical intervention, low birth weight, and syndromic diagnosis. The purpose of this study was to test the validity of the GILLS score in predicting success of tongue-lip adhesion (TLA) in managing Robin sequence. Infants with Robin sequence were included in the study if they underwent TLA for airway compromise after the GILLS scoring system was formulated, that is, they were not included in the original GILLS analysis. The patients were prospectively assessed for the presence of the 5 factors that constitute the GILLS score. A score of ≤ 2 predicts success of TLA. Twenty patients met the inclusion criteria. Tongue-lip adhesion managed the compromised airway in 18 (90%) of 20 patients. Overall, the GILLS score had a sensitivity of 83%, specificity of 50%, positive predictive value of 94%, and negative predictive value of 25%. The GILLS score accurately predicts a successful outcome for TLA in infants with Robin sequence. For infants with a score of 2 or less, TLA is the procedure of choice. Infants with a GILLS score of 3 or greater were 5 times more likely to fail TLA than those with a score of 2 or less. In these patients, other methods of managing the airway should be considered.
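The scoring rule and the reported test statistics can be sketched as below. The factor keys are hypothetical names, and "positive" is taken to mean a predicted TLA success (score of 2 or less). The 2x2 counts are not given in the abstract: TP = 15, FN = 3, TN = 1, FP = 1 is back-calculated as the only split of 18 successes and 2 failures consistent with the reported 83%, 50%, 94% and 25%.

```python
GILLS_FACTORS = ("gerd", "preop_intubation", "late_surgery",
                 "low_birth_weight", "syndromic")  # hypothetical keys


def gills_score(patient: dict) -> int:
    """One point per GILLS risk factor present (range 0-5)."""
    return sum(bool(patient.get(f, False)) for f in GILLS_FACTORS)


def predicts_tla_success(score: int) -> bool:
    """Reported rule: a score of 2 or less predicts TLA success."""
    return score <= 2


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard 2x2 metrics, with 'positive' = predicted TLA success."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts back-calculated from the reported percentages (an inference, not stated in the abstract)
metrics = diagnostic_metrics(tp=15, fp=1, tn=1, fn=3)
print({k: round(v, 2) for k, v in metrics.items()})
```

With only two TLA failures in the cohort, specificity and NPV each rest on a single patient, which is why they are far less stable than sensitivity and PPV.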
Grasshopper, a long terminal repeat (LTR) retroelement in the phytopathogenic fungus Magnaporthe grisea.

PubMed

Dobinson, K F; Harris, R E; Hamer, J E

1993-01-01

The fungal phytopathogen Magnaporthe grisea parasitizes a wide variety of gramineous hosts. In the course of investigating the genetic relationship between pathogen genotype and host specificity, we identified a retroelement that is present in some strains of M. grisea that infect finger millet and goosegrass (members of the plant genus Eleusine). The element, designated grasshopper (grh), is present in multiple copies dispersed throughout the genome. DNA sequence analysis showed that grasshopper contains 198-bp direct long terminal repeats (LTRs) with features characteristic of retroviral and retrotransposon LTRs. Within the element we identified an open reading frame with sequences homologous to the reverse transcriptase, RNaseH, and integrase domains of retroelement pol genes. Comparison of the open reading frame with sequences from other retroelements showed that grh is related to the gypsy family of retrotransposons. Comparisons of the distribution of the grasshopper element with other dispersed repeated DNA sequences in M. grisea indicated that grasshopper was present in a broadly dispersed subgroup of Eleusine pathogens, suggesting that the element was acquired after the evolution of this host-specific form. We present arguments that the amplification of different retroelements within populations of M. grisea is a consequence of the clonal organization of the fungal populations.
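"Direct" terminal repeats, such as the 198-bp LTRs of grasshopper, mean the element begins and ends with the same sequence in the same orientation. A naive sketch, assuming exact identity between the two copies (real LTRs often differ by a few substitutions, so practical tools use approximate matching), finds the longest such repeat in a candidate element; the `min_len` cut-off and the toy sequence are hypothetical.

```python
def terminal_direct_repeat(element: str, min_len: int = 20) -> int:
    """Length of the longest exact sequence shared, in the same orientation, by both ends.

    Repeats may not overlap (k <= len(element) // 2); returns 0 if none of at
    least `min_len` bases exists. min_len is a hypothetical cut-off.
    """
    element = element.upper()
    for k in range(len(element) // 2, max(min_len, 1) - 1, -1):
        if element[:k] == element[-k:]:
            return k
    return 0


# Toy element: a 22-bp "LTR" flanking a 10-bp internal region
ltr = "TGACGTACGATCGATCGTACCA"
print(terminal_direct_repeat(ltr + "G" * 10 + ltr))
```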
Enhancing Next-Generation Sequencing-Guided Cancer Care Through Cognitive Computing.

PubMed

Patel, Nirali M; Michelini, Vanessa V; Snell, Jeff M; Balu, Saianand; Hoyle, Alan P; Parker, Joel S; Hayward, Michele C; Eberhard, David A; Salazar, Ashley H; McNeillie, Patrick; Xu, Jia; Huettner, Claudia S; Koyama, Takahiko; Utro, Filippo; Rhrissorrakrai, Kahn; Norel, Raquel; Bilal, Erhan; Royyuru, Ajay; Parida, Laxmi; Earp, H Shelton; Grilley-Olson, Juneko E; Hayes, D Neil; Harvey, Stephen J; Sharpless, Norman E; Kim, William Y

2018-02-01

Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation performed mainly by human "molecular tumor boards" (MTBs). The purpose of this study was to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB. One thousand eighteen patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. Using a WfG-curated actionable gene list, we identified additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. The majority of these additional genomic events were considered actionable based upon their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 month prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took <3 minutes per case. These results demonstrate that the interpretation and actionability of somatic NGS results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing could potentially improve patient care by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials. The results of this study demonstrate that the interpretation and actionability of somatic next-generation sequencing results are evolving too rapidly to rely solely on human curation. 
Molecular tumor boards empowered by cognitive computing can significantly improve patient care by providing a fast, cost-effective, and comprehensive approach for data analysis in the delivery of precision medicine. Patients and physicians who are considering enrollment in clinical trials may benefit from the support of such tools applied to genomic data. © AlphaMed Press 2017.
Gene Editing Vectors for Studying Nicotinic Acetylcholine Receptors in Cholinergic Transmission.

PubMed

Peng, Can; Yan, Yijin; Kim, Veronica J; Engle, Staci E; Berry, Jennifer N; McIntosh, J Michael; Neve, Rachael L; Drenan, Ryan M

2018-05-19

Nicotinic acetylcholine receptors (nAChRs), prototype members of the Cys-loop ligand-gated ion channel family, are key mediators of cholinergic transmission in the central nervous system. Despite their importance, technical gaps exist in our ability to dissect the function of individual subunits in the brain. To overcome these barriers, we designed CRISPR/Cas9 small guide RNA sequences (sgRNAs) for production of loss-of-function alleles in mouse nAChR genes. These sgRNAs were validated in vitro via deep sequencing. We subsequently targeted candidate nAChR genes in vivo by creating herpes simplex virus (HSV) vectors delivering sgRNAs and Cas9 expression to mouse brain. Production of loss-of-function insertions or deletions (indels) by these "all-in-one" HSV vectors was confirmed using brain slice patch clamp electrophysiology coupled with pharmacological analysis. Next, we developed a scheme for cell type-specific gene editing in mouse brain. Knockin mice expressing Cas9 in a Cre-dependent manner were validated using viral microinjections and genetic crosses to common Cre-driver mouse lines. We subsequently confirmed functional Cas9 activity by targeting the ubiquitous neuronal protein NeuN, using adeno-associated virus (AAV) delivery of sgRNAs. Finally, the mouse β2 nAChR gene was successfully targeted in dopamine transporter (DAT)-positive neurons via CRISPR/Cas9. The sgRNA sequences and viral vectors, including our scheme for Cre-dependent gene editing, should be generally useful to the scientific research community. These tools could lead to new discoveries related to the function of nAChRs in neurotransmission and behavioral processes. This article is protected by copyright. All rights reserved.
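A first step in designing sgRNAs, assuming the commonly used Streptococcus pyogenes Cas9 (the abstract says only "Cas9"), is to locate 20-nt protospacers immediately 5' of an NGG protospacer-adjacent motif (PAM) on either strand. A minimal sketch that only enumerates candidate sites; it does not score on-target efficiency or off-target risk and is not the authors' design procedure.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_protospacers(seq: str, spacer_len: int = 20) -> list:
    """Return (strand, start, spacer) for every spacer followed by an NGG PAM.

    For '-' hits, start is a 0-based position on the reverse complement.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1)))
    return hits


# Toy target: one 20-nt spacer followed by the PAM "AGG"
print(find_spcas9_protospacers("ACGTACGTACGTACGTACGT" + "AGG" + "TTTT"))
```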
Modified electrokinetic sample injection method in chromatography and electrophoresis analysis

DOEpatents

Davidson, J. Courtney; Balch, Joseph W.

2001-01-01

A sample injection method for horizontally configured multiple chromatography or electrophoresis units, each containing a number of separation/analysis channels, that enables efficient introduction of analyte samples. When taken in conjunction with horizontal microchannels, this loading method allows much reduced sample volumes and provides a means of sample stacking that greatly reduces the sample concentration required. This reduction in the amount of sample can lead to great cost savings in sample preparation, particularly in massively parallel applications such as DNA sequencing. The essence of this method lies in preparation of the input of the separation channel, the physical sample introduction, and subsequent removal of excess material. By this method, sample volumes of 100 nanoliters to 2 microliters have been used successfully, compared to the typical 5 microliters of sample required by the prior separation/analysis method.
Systemic Edwardsiella tarda infection in a Western African lungfish (Protopterus annectens) with cytologic observation of heterophil projections.

PubMed

Rousselet, Estelle; Stacy, Nicole I; Rotstein, David S; Waltzek, Tom B; Griffin, Matt J; Francis-Floyd, Ruth

2018-06-08

This report describes a case of systemic bacterial infection caused by Edwardsiella tarda in a Western African lungfish (Protopterus annectens) exposed to poor environmental and husbandry conditions. The fish presented with a large, external ulcerative lesion and died 2 weeks after developing anorexia. Histological evaluation revealed multifocal areas of necrosis and heterophilic and histiocytic inflammation throughout multiple tissues. Gram stain identified small numbers of intra- and extracellular, monomorphic, Gram-negative, rod-shaped bacilli 1 to 2 μm long. Cytology of lung granuloma, kidney and testes imprints identified heterophilic inflammation with phagocytosis of small monomorphic bacilli, and some heterophils exhibited cytoplasmic projections indicative of heterophil extracellular traps (HETs). Initial phenotypic analysis of isolates from coelomic fluid cultures identified E. tarda. Subsequent molecular analysis of spleen, liver and intestine DNA using an E. tarda-specific endpoint PCR assay targeting the bacterial fimbrial subunit yielded a 115-bp band. Sequencing and BLASTN search revealed that the sequence was identical (76/76) to E. tarda strain FL95-01 (GenBank acc. CP011359) and displayed 93% sequence identity (66/71) to Edwardsiella hoshinae strain ATCC 35051 (GenBank acc. CP011359). This is the first report of systemic edwardsiellosis in a lungfish with concurrent cytologically identified structures suggestive of HETs. © 2018 John Wiley & Sons Ltd.
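The "76/76" and "66/71" figures are identity fractions over a pairwise alignment. A minimal sketch, assuming the BLAST-style convention that divides identical columns by the full alignment length (so gap columns count against identity); the toy alignment is illustrative only.

```python
def blast_style_identity(aln_a: str, aln_b: str) -> tuple:
    """Return (identical columns, alignment length, percent identity); gap columns count as non-identical."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aln_a.upper(), aln_b.upper()))
    length = len(aln_a)
    return matches, length, (100.0 * matches / length if length else 0.0)


# Toy pairwise alignment with one gap column: 5 identities over 6 columns
print(blast_style_identity("ACGT-A", "ACGTTA"))
```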
Response of the hepatic transcriptome to aflatoxin B1 in domestic turkey (Meleagris gallopavo).

PubMed

Monson, Melissa S; Settlage, Robert E; McMahon, Kevin W; Mendoza, Kristelle M; Rawal, Sumit; El-Nezami, Hani S; Coulombe, Roger A; Reed, Kent M

2014-01-01

Dietary exposure to aflatoxin B1 (AFB1) is detrimental to avian health and leads to major economic losses for the poultry industry. AFB1 is especially hepatotoxic in domestic turkeys (Meleagris gallopavo), since these birds are unable to detoxify AFB1 by glutathione conjugation. The impacts of AFB1 on the turkey hepatic transcriptome and the potential protection from pretreatment with a Lactobacillus-based probiotic mixture were investigated through RNA sequencing. Animals were divided into four treatment groups, and RNA was subsequently recovered from liver samples. Four pooled RNA-seq libraries were sequenced to produce over 322 M reads totaling 13.8 Gb of sequence. Approximately 170,000 predicted transcripts were de novo assembled, of which 803 had significant differential expression in at least one pair-wise comparison between treatment groups. Functional analysis linked many of the transcripts significantly affected by AFB1 exposure to cancer, apoptosis, the cell cycle or lipid regulation. Most notable were transcripts from the genes encoding E3 ubiquitin-protein ligase Mdm2, osteopontin, S-adenosylmethionine synthase isoform type-2, and lipoprotein lipase. Expression was modulated by the probiotics, but treatment did not completely mitigate the effects of AFB1. Genes identified through transcriptome analysis provide candidates for further study of AFB1 toxicity and targets for efforts to improve the health of domestic turkeys exposed to AFB1.
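The abstract does not name the differential-expression method. As a descriptive illustration only (not a statistical test, and not the authors' method), a log2 fold change between two libraries can be computed from library-size-normalized counts (counts per million) with a pseudocount; the transcript IDs and counts are hypothetical.

```python
import math


def cpm(counts: dict) -> dict:
    """Counts per million within one library."""
    total = sum(counts.values())
    return {t: c * 1e6 / total for t, c in counts.items()}


def log2_fold_change(treated: dict, control: dict, pseudocount: float = 1.0) -> dict:
    """log2((CPM_treated + pc) / (CPM_control + pc)) for transcripts present in both libraries."""
    t_cpm, c_cpm = cpm(treated), cpm(control)
    return {
        t: math.log2((t_cpm[t] + pseudocount) / (c_cpm[t] + pseudocount))
        for t in t_cpm.keys() & c_cpm.keys()
    }


# Hypothetical counts for two transcripts in an AFB1-exposed and a control library
lfc = log2_fold_change({"tx1": 300, "tx2": 700}, {"tx1": 100, "tx2": 900})
print({t: round(v, 2) for t, v in sorted(lfc.items())})
```

The pseudocount keeps the ratio defined when a transcript has zero reads in one library; it is a convention, and the choice of its value shifts fold changes for low-count transcripts.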
Uprobe: a genome-wide universal probe resource for comparative physical mapping in vertebrates.

PubMed

Kellner, Wendy A; Sullivan, Robert T; Carlson, Brian H; Thomas, James W

2005-01-01

Interspecies comparisons are important for deciphering the functional content and evolution of genomes. The expansive array of >70 public vertebrate genomic bacterial artificial chromosome (BAC) libraries can provide a means of comparative mapping, sequencing, and functional analysis of targeted chromosomal segments that is independent of and complementary to whole-genome sequencing. However, at present, no complementary resource exists for the efficient targeted physical mapping of the majority of these BAC libraries. Universal overgo-hybridization probes, designed from regions of sequenced genomes that are highly conserved between species, have been demonstrated to be an effective resource for the isolation of orthologous regions from multiple BAC libraries in parallel. Here we report the application of the universal probe design principle across entire genomes, and the subsequent creation of a complementary probe resource, Uprobe, for screening vertebrate BAC libraries. Uprobe currently consists of whole-genome sets of universal overgo-hybridization probes designed for screening mammalian or avian/reptilian libraries. Retrospective analysis, experimental validation of the probe design process on a panel of representative BAC libraries, and estimates of probe coverage across the genome indicate that the majority of all eutherian and avian/reptilian genes or regions of interest can be isolated using Uprobe. Future implementation of the universal probe design strategy will be used to create an expanded number of whole-genome probe sets that will encompass all vertebrate genomes.
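The design principle, drawing probes from regions that are highly conserved between species, can be sketched as a sliding-window scan over a multispecies alignment. The window length and identity threshold are hypothetical parameters, not Uprobe's actual design rules, and the toy alignment is illustrative.

```python
def conserved_windows(aligned: list, window: int = 40, min_identity: float = 0.9):
    """Yield (start, fraction_conserved) for qualifying windows of a multispecies alignment.

    A column is conserved when no sequence has a gap and all share the same base.
    window and min_identity are hypothetical parameters.
    """
    n_cols = len(aligned[0])
    conserved = []
    for i in range(n_cols):
        column = {seq[i].upper() for seq in aligned}
        conserved.append(len(column) == 1 and "-" not in column)
    for start in range(n_cols - window + 1):
        frac = sum(conserved[start:start + window]) / window
        if frac >= min_identity:
            yield start, frac


# Toy three-species alignment; column 4 differs in the third species
aln = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
print(list(conserved_windows(aln, window=5, min_identity=1.0)))
```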
Papio cynocephalus endogenous retrovirus among old world monkeys: evidence for coevolution and ancient cross-species transmissions.

PubMed

Mang, R; Maas, J; van Der Kuyl, A C; Goudsmit, J

2000-02-01

To study the evolutionary history of Papio cynocephalus endogenous retrovirus (PcEV), we analyzed the distribution and genetic characteristics of PcEV among 17 different species of primates. The viral pol-env and long terminal repeat and untranslated region (LTR-UTR) sequences could be recovered from all Old World species of the papionin tribe, which includes baboons, macaques, geladas, and mangabeys, but not from the New World monkeys and hominoids we tested. The Old World genera Cercopithecus and Miopithecus hosted either a PcEV variant with an incomplete genome or a virus with substantial mismatches in the LTR-UTR. A complete PcEV was found in the genome of Colobus guereza (but not in Colobus badius), with a copy number of 44 to 61 per diploid genome, comparable to that seen in papionins, and with a sequence most closely related to a virus of the papionin tribe. Analysis of evolutionary distances among PcEV sequences for synonymous and nonsynonymous sites indicated that purifying selection was operational during PcEV evolution. Phylogenetic analysis suggested that possibly two subtypes of PcEV entered the germ line of a common ancestor of the papionins and subsequently coevolved with their hosts. One strain of PcEV was apparently transmitted from a papionin ancestor to an ancestor of the central African lowland C. guereza.
Papio cynocephalus Endogenous Retrovirus among Old World Monkeys: Evidence for Coevolution and Ancient Cross-Species Transmissions

PubMed Central

Mang, Rui; Maas, Jolanda; van der Kuyl, Antoinette C.; Goudsmit, Jaap

2000-01-01

To study the evolutionary history of Papio cynocephalus endogenous retrovirus (PcEV), we analyzed the distribution and genetic characteristics of PcEV among 17 different species of primates. The viral pol-env and long terminal repeat and untranslated region (LTR-UTR) sequences could be recovered from all Old World species of the papionin tribe, which includes baboons, macaques, geladas, and mangabeys, but not from the New World monkeys and hominoids we tested. The Old World genera Cercopithecus and Miopithecus hosted either a PcEV variant with an incomplete genome or a virus with substantial mismatches in the LTR-UTR. A complete PcEV was found in the genome of Colobus guereza—but not in Colobus badius—with a copy number of 44 to 61 per diploid genome, comparable to that seen in papionins, and with a sequence most closely related to a virus of the papionin tribe. Analysis of evolutionary distances among PcEV sequences for synonymous and nonsynonymous sites indicated that purifying selection was operational during PcEV evolution. Phylogenetic analysis suggested that possibly two subtypes of PcEV entered the germ line of a common ancestor of the papionins and subsequently coevolved with their hosts. One strain of PcEV was apparently transmitted from a papionin ancestor to an ancestor of the central African lowland C. guereza. PMID:10627573
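Purifying selection is usually inferred when the nonsynonymous distance is smaller than the synonymous one (dN/dS < 1). A minimal sketch, assuming difference and site counts are already available (for example from Nei-Gojobori-style counting) and applying the Jukes-Cantor correction; the abstract does not state which estimator was used, and the counts in the example are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """omega = dN / dS; values well below 1 are usually read as purifying selection."""
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0:
        raise ZeroDivisionError("no synonymous differences; dN/dS is undefined")
    return d_n / d_s


# Hypothetical counts for one pair of PcEV sequences
print(round(dn_ds(n_diffs=5, n_sites=500, s_diffs=20, s_sites=150), 3))
```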
Biclustering as a method for RNA local multiple sequence alignment.

PubMed

Wang, Shu; Gutell, Robin R; Miranker, Daniel P

2007-12-15

Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; this is precisely the kind of dual situation biclustering is intended to address. We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension of that suite to larger problem sizes. Alignments of two large datasets of current biological interest, T box sequences and group IC1 introns, were also evaluated. The results were compared with alignments computed by the ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using the sum-of-pairs score (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or more sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations; MAFFT and PROBCONS do not. On the intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/
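The sum-of-pairs score used in benchmarks of this kind is commonly defined as the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. A minimal sketch of that definition (BRAliBase's own scoring tools may differ in details such as gap handling); the toy alignments are illustrative.

```python
from itertools import combinations


def aligned_pairs(msa: list) -> set:
    """Set of residue pairs placed in the same column.

    Each residue is identified by (sequence index, ungapped residue index); '-' marks a gap.
    """
    next_index = [0] * len(msa)
    pairs = set()
    for col in range(len(msa[0])):
        residues = []
        for s, seq in enumerate(msa):
            if seq[col] != "-":
                residues.append((s, next_index[s]))
                next_index[s] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def sum_of_pairs_score(test_msa: list, ref_msa: list) -> float:
    """Fraction of reference-aligned residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(ref_msa)
    if not ref:
        return 0.0
    return len(aligned_pairs(test_msa) & ref) / len(ref)


reference = ["AC-GT", "ACAGT"]
candidate = ["ACG-T", "ACAGT"]
print(sum_of_pairs_score(candidate, reference))  # 3 of the 4 reference pairs are recovered
```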
Incomplete Timothy syndrome secondary to a mosaic mutation of the CACNA1C gene diagnosed using next-generation sequencing.

PubMed

Baurand, Amandine; Falcon-Eicher, Sylvie; Laurent, Gabriel; Villain, Elisabeth; Bonnet, Caroline; Thauvin-Robinet, Christel; Jacquot, Caroline; Eicher, Jean-Christophe; Gourraud, Jean-Baptiste; Schmitt, Sébastien; Bézieau, Stéphane; Giraud, Mathilde; Dumont, Solenne; Kuentz, Paul; Probst, Vincent; Burguet, Antoine; Kyndt, Florence; Faivre, Laurence

2017-02-01

Autosomal dominant genetic diseases can occur de novo and in the form of somatic mosaicism, which can give rise to a less severe phenotype and make diagnosis more difficult given the sensitivity limits of the methods used. We report the case of a female child with a history of surgery for syndactyly of the hands and feet, who was admitted at 6 years of age to a pediatric intensive care unit following cardiac arrest. The electrocardiogram (ECG) showed a long QT interval that on occasion reached 500 ms. Despite the absence of facial dysmorphism and the presence of normal psychomotor development, a diagnosis of Timothy syndrome was made given the association of syndactyly and the ECG features. Sanger sequencing of the CACNA1C gene, followed by sequencing of the KCNQ1, KCNH2, KCNE1 and KCNE2 genes, was negative. Subsequent analysis of a panel of genes responsible for hereditary cardiac rhythm disorders using Haloplex technology revealed a recurrent mosaic p.Gly406Arg missense mutation of the CACNA1C gene in 18% of the cells. This mosaicism can explain the negative Sanger analysis and the less complete phenotype in this patient. Given the other cases in the literature, mosaic mutations in Timothy syndrome appear more common than previously thought. This case demonstrates the importance of using next-generation sequencing to identify mosaic mutations when the clinical picture supports a specific mutation that is not identified using conventional testing. © 2016 Wiley Periodicals, Inc.
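Why a low-level mosaic variant can escape Sanger sequencing but be found by deep sequencing can be sketched with a binomial read-sampling model. Assuming the 18% figure is a fraction of cells that are heterozygous at a diploid locus, the expected variant allele fraction is about 9%; the sequencing depth and alternate-read threshold in the usage line are hypothetical.

```python
from math import comb


def expected_vaf(cell_fraction: float, heterozygous: bool = True) -> float:
    """Expected variant allele fraction at a diploid locus for a mosaic variant."""
    return cell_fraction / 2 if heterozygous else cell_fraction


def prob_detect(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) under Binomial(depth, vaf) sampling."""
    return sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


vaf = expected_vaf(0.18)  # about 0.09 under the stated assumption
print(round(prob_detect(100, vaf, min_alt_reads=5), 3))  # hypothetical depth and threshold
```

The model ignores sequencing error and allelic bias in capture, so it gives an optimistic view of detection; its point is only that detection probability rises steeply with depth.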
DOE Office of Scientific and Technical Information (OSTI.GOV)

Andersen, G.L.; He, Z.; DeSantis, T.Z.

Microarrays have proven to be a useful and high-throughput method to provide targeted DNA sequence information for up to many thousands of specific genetic regions in a single test. A microarray consists of multiple DNA oligonucleotide probes that, under high stringency conditions, hybridize only to specific complementary nucleic acid sequences (targets). A fluorescent signal indicates the presence and, in many cases, the abundance of genetic regions of interest. In this chapter we will look at how microarrays are used in microbial ecology, especially with the recent increase in microbial community DNA sequence data. Of particular interest to microbial ecologists, phylogenetic microarrays are used for the analysis of phylotypes in a community, and functional gene arrays are used for the analysis of functional genes and, by inference, phylotypes in environmental samples. A phylogenetic microarray that has been developed by the Andersen laboratory, the PhyloChip, will be discussed as an example of a microarray that targets the known diversity within the 16S rRNA gene to determine microbial community composition. Using multiple, confirmatory probes to increase the confidence of detection and a mismatch probe for every perfect match probe to minimize the effect of cross-hybridization by non-target regions, the PhyloChip is able to simultaneously identify any of thousands of taxa present in an environmental sample. The PhyloChip is shown to reveal greater diversity within a community than rRNA gene sequencing, because the entire gene product is placed on the microarray, whereas traditional sequencing methods analyze at most thousands of individual molecules. A functional gene array that has been developed by the Zhou laboratory, the GeoChip, will be discussed as an example of a microarray that dynamically identifies functional activities of multiple members within a community.
The recent version of GeoChip contains more than 24,000 50mer oligonucleotide probes and covers more than 10,000 gene sequences in 150 gene categories involved in carbon, nitrogen, sulfur, and phosphorus cycling, metal resistance and reduction, and organic contaminant degradation. GeoChip can be used as a generic tool for microbial community analysis and can also link microbial community structure to ecosystem functioning. Examples of the application of both arrays to different environmental samples will be described in the two subsequent sections.
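The perfect-match/mismatch probe design described above can be illustrated with a minimal Python sketch of probe-set scoring. The `ProbePair` type, the `min_ratio`, `min_diff`, and `min_fraction` thresholds, and the detection rule itself are hypothetical placeholders chosen for illustration; they are not the PhyloChip's published scoring criteria.

```python
from dataclasses import dataclass


@dataclass
class ProbePair:
    pm: float  # perfect-match probe intensity
    mm: float  # mismatch (control) probe intensity


def pair_is_positive(pair, min_ratio=1.3, min_diff=100.0):
    """Hypothetical rule: the PM signal must clearly exceed its MM control,
    both as a ratio and as an absolute difference."""
    return pair.pm >= min_ratio * pair.mm and (pair.pm - pair.mm) >= min_diff


def taxon_detected(pairs, min_fraction=0.9):
    """Call a taxon present only when enough of its confirmatory probe pairs are positive."""
    if not pairs:
        return False
    positives = sum(pair_is_positive(p) for p in pairs)
    return positives / len(pairs) >= min_fraction
```

Requiring most of a taxon's confirmatory probe pairs to agree means that one cross-hybridizing probe, on its own, does not trigger a detection.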
Genetic and structural analyses of cytochrome P450 hydroxylases in sex hormone biosynthesis: Sequential origin and subsequent coevolution.

PubMed

Goldstone, Jared V; Sundaramoorthy, Munirathinam; Zhao, Bin; Waterman, Michael R; Stegeman, John J; Lamb, David C

2016-01-01

Biosynthesis of steroid hormones in vertebrates involves three cytochrome P450 hydroxylases, CYP11A1, CYP17A1 and CYP19A1, which catalyze sequential steps in steroidogenesis. These enzymes are conserved in the vertebrates, but their origin and existence in other chordate subphyla (Tunicata and Cephalochordata) have not been clearly established. In this study, selected protein sequences of CYP11A1, CYP17A1 and CYP19A1 were compiled and analyzed using multiple sequence alignment and phylogenetic analysis. Our analyses show that cephalochordates have sequences orthologous to vertebrate CYP11A1, CYP17A1 or CYP19A1, and that echinoderms and hemichordates possess CYP11-like but not CYP19 genes. While the cephalochordate sequences have low identity with the vertebrate sequences, reflecting evolutionary distance, the data show an apparent origin of CYP11 prior to the evolution of CYP19 and possibly CYP17, indicating a sequential origin of these functionally related steroidogenic CYPs. Co-occurrence of the three CYPs in early chordates suggests that the three genes may have coevolved thereafter, and that functional conservation should be reflected in functionally important residues in the proteins. CYP19A1 has the largest number of conserved residues, while CYP11A1 sequences are less conserved. Structural analyses of human CYP11A1, CYP17A1 and CYP19A1 show that critical substrate binding site residues are highly conserved in each enzyme family. The results emphasize that the steroidogenic pathways producing glucocorticoids and reproductive steroids are several hundred million years old and that the catalytic structural elements of the enzymes have been conserved over the same period of time. Analysis of these elements may help to identify when precursor functions linked to these enzymes first arose. Copyright © 2015 Elsevier Inc. All rights reserved.
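Counting conserved residues in a multiple sequence alignment, as in the comparison of CYP11A1, CYP17A1 and CYP19A1 conservation, can be sketched as below. This minimal Python illustration assumes a strict definition (a column counts as conserved only if every sequence carries the same residue and none has a gap); the study may have used a different conservation measure, and the toy alignment in the test is made up.

```python
def conserved_columns(alignment):
    """Return 0-based indices of alignment columns in which every sequence
    has the same residue and no sequence has a gap ('-')."""
    if not alignment:
        return []
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i, column in enumerate(zip(*alignment)):
        if "-" not in column and len(set(column)) == 1:
            conserved.append(i)
    return conserved


def fraction_conserved(alignment):
    """Fraction of alignment columns that are strictly conserved (non-empty alignment)."""
    return len(conserved_columns(alignment)) / len(alignment[0])
```

A stricter or looser notion (for example, allowing physicochemically similar substitutions) would change only the column test inside the loop.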

Population genomic analysis of strain variation in Leptospirillum group II bacteria involved in acid mine drainage formation.

PubMed

Simmons, Sheri L; Dibartolo, Genevieve; Denef, Vincent J; Goltsman, Daniela S Aliaga; Thelen, Michael P; Banfield, Jillian F

2008-07-22

Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth approximately 20×). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (approximately 94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains.
Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination.
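The identity thresholds quoted above (>99.5% for abundant variants, about 94% for recombined blocks) are percent-identity values between aligned sequences. A minimal Python sketch, assuming gapped positions are excluded from the denominator (one of several common conventions; the study's exact definition is not given in the abstract):

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length,
    counting only positions where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns under this convention
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return 100.0 * matches / compared
```

Under this convention, a variant at >99.5% identity to the dominant type differs at fewer than 1 in 200 compared positions.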
Transcriptome analysis of Houttuynia cordata Thunb. by Illumina paired-end RNA sequencing and SSR marker discovery.

PubMed

Wei, Lin; Li, Shenghua; Liu, Shenggui; He, Anna; Wang, Dan; Wang, Jie; Tang, Yulian; Wu, Xianjin

2014-01-01

Houttuynia cordata Thunb. is an important traditional medicinal herb in China and other Asian countries, with high medicinal and economic value. However, the lack of available genomic information has limited research on this species. We therefore carried out high-throughput transcriptomic sequencing of H. cordata to generate a large transcriptome sequence dataset for gene discovery and molecular marker development. Illumina paired-end sequencing produced over 56 million sequencing reads from H. cordata mRNA. Subsequent de novo assembly yielded 63,954 unigenes, of which 39,982 (62.52%) and 26,122 (40.84%) had significant similarity to proteins in the NCBI nonredundant protein and Swiss-Prot databases (E-value < 10^-5), respectively. Of these annotated unigenes, 30,131 and 15,363 were assigned to gene ontology categories and clusters of orthologous groups, respectively. In addition, 24,434 (38.21%) unigenes were mapped onto 128 pathways using the KEGG pathway database, and 17,964 (44.93%) unigenes showed homology to Vitis vinifera (Vitaceae) genes in BLASTx analysis. Furthermore, 4,800 cDNA SSRs were identified as potential molecular markers. Fifty primer pairs were randomly selected to detect polymorphism among 30 samples of H. cordata; 43 (86%) produced fragments of the expected size, suggesting that the unigenes were of high quality and suitable for specific primer design, and that the SSR markers could be widely used in marker-assisted selection and molecular breeding of H. cordata in the future. This is the first application of Illumina paired-end sequencing technology to investigate the whole transcriptome of H. cordata and to assemble RNA-seq reads without a reference genome. These data should help researchers investigating the evolution and biological processes of this species. The SSR markers developed can be used to construct high-resolution genetic linkage maps and for gene-based association analyses in H. cordata.
This work will enable future functional genomic research and research into the distinctive active constituents of this genus.
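Identifying simple sequence repeats (SSRs) in assembled unigenes amounts to scanning for short motifs repeated in tandem. The Python sketch below finds perfect SSRs with regular-expression backreferences. The minimum repeat counts in `MIN_REPEATS` are assumed, MISA-style values; the search parameters actually used in the study are not stated in the abstract.

```python
import re

# Assumed minimum repeat counts per motif length (1 = mononucleotide, ... 6 = hexanucleotide).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def _is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT' = 'AT' x 2)."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in a DNA sequence."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{n - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if _is_compound(motif):
                continue  # already reported under its shorter period
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

Compound and imperfect repeats (for example, an (AC)n run interrupted by a single substitution) are deliberately not handled in this sketch.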
Modeling backbone flexibility to achieve sequence diversity: The design of novel alpha-helical ligands for Bcl-xL

PubMed Central

Fu, Xiaoran; Apgar, James R.; Keating, Amy E.

2007-01-01

Computational protein design can be used to select sequences that are compatible with a fixed-backbone template. This strategy has been used in numerous instances to engineer novel proteins. However, the fixed-backbone assumption severely restricts the sequence space that is accessible via design. For challenging problems, such as the design of functional proteins, this may not be acceptable. In this paper, we present a method for introducing backbone flexibility into protein design calculations and apply it to the design of diverse helical BH3 ligands that bind to the anti-apoptotic protein Bcl-xL, a member of the Bcl-2 protein family. We demonstrate how normal mode analysis can be used to sample different BH3 backbones, and show that this leads to a larger and more diverse set of low-energy solutions than can be achieved using a native high-resolution Bcl-xL complex crystal structure as a template. We tested several of the designed solutions experimentally and found that this approach worked well when normal mode calculations were used to deform a native BH3 helix structure, but less well when they were used to deform an idealized helix. A subsequent round of design and testing identified a likely source of the problem as inadequate sampling of the helix pitch. In all, we tested seventeen designed BH3 peptide sequences, including several point mutants. Of these, eight bound well to Bcl-xL and four others showed weak but detectable binding. The successful designs showed a diversity of sequences that would have been difficult or impossible to achieve using only a fixed backbone. Thus, introducing backbone flexibility via normal mode analysis effectively broadened the set of sequences identified by computational design, and provided insight into positions important for binding Bcl-xL. PMID:17597151
Population Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation

PubMed Central

Denef, Vincent J; Goltsman, Daniela S. Aliaga; Thelen, Michael P; Banfield, Jillian F

2008-01-01

Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth ∼20×). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (∼94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. 
Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination. PMID:18651792
Insufficient Chunk Concatenation May Underlie Changes in Sleep-Dependent Consolidation of Motor Sequence Learning in Older Adults

ERIC Educational Resources Information Center

Bottary, Ryan; Sonni, Akshata; Wright, David; Spencer, Rebecca M. C.

2016-01-01

Sleep enhances motor sequence learning (MSL) in young adults by concatenating subsequences ("chunks") formed during skill acquisition. To examine whether this process is reduced in aging, we assessed performance changes on the MSL task following overnight sleep or daytime wake in healthy young and older adults. Young adult performance…
Analysis Method for Non-Nominal First Acquisition

NASA Technical Reports Server (NTRS)

Sieg, Detlef; Mugellesi-Dow, Roberta

2007-01-01

First, this paper describes a method for modelling the launcher trajectory for contingency analysis without much information about the launch vehicle itself. From a dense sequence of state vectors, a velocity profile is derived that is accurate enough for the Flight Dynamics Team to integrate parts of the launcher trajectory on its own and to simulate contingency cases by modifying the velocity profile. The paper then focuses on the thorough visibility analysis that must follow the contingency-case or burn-performance simulations. Ideally, a ground station can be identified that is able to acquire the satellite independently of the burn performance. The correlations between burn performance and pointing at subsequent ground stations are derived with the aim of establishing simple guidelines that can be applied quickly and that significantly improve the chance of acquisition at subsequent ground stations. The method is applied to the Soyuz/Fregat launch of the MetOp satellite. Overall, the paper shows that modelling the launcher trajectory and simulating contingency cases, combined with a ground station visibility analysis, leads to a proper selection of ground stations and acquisition methods. In the MetOp case, this ensured successful contact at all ground stations during the first hour after separation, without relying on any early orbit determination result or state vector update.
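Deriving a velocity profile from a dense sequence of state vectors, and perturbing it to represent a contingency case, might look like the minimal Python sketch below. The state-vector layout `(t, x, y, z, vx, vy, vz)`, linear interpolation between samples, and the simple speed scaling in the hypothetical `underperforming` helper are all assumptions for illustration; the paper's actual profile derivation and trajectory integration are not reproduced here.

```python
import bisect
import math


def velocity_profile(states):
    """states: (t, x, y, z, vx, vy, vz) tuples sorted by t -> list of (t, speed)."""
    return [(s[0], math.sqrt(s[4] ** 2 + s[5] ** 2 + s[6] ** 2)) for s in states]


def speed_at(profile, t):
    """Linearly interpolate speed at time t, clamped to the profile's time range."""
    times = [p[0] for p in profile]
    if t <= times[0]:
        return profile[0][1]
    if t >= times[-1]:
        return profile[-1][1]
    i = bisect.bisect_right(times, t)
    (t0, v0), (t1, v1) = profile[i - 1], profile[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def underperforming(profile, t_start, factor):
    """Hypothetical contingency: scale the speed by `factor` from t_start onward."""
    return [(t, v * factor if t >= t_start else v) for t, v in profile]
```

A real contingency model would perturb the burn itself and re-integrate the trajectory; scaling the speed samples is only the crudest stand-in.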
Common and Rare EGFR and KRAS Mutations in a Dutch Non-Small-Cell Lung Cancer Population and Their Clinical Outcome

PubMed Central

Kerner, Gerald S. M. A.; Schuuring, Ed; Sietsma, Johanna; Hiltermann, Thijo J. N.; Pieterman, Remge M.; de Leede, Gerard P. J.; van Putten, John W. G.; Liesker, Jeroen; Renkema, Tineke E. J.; van Hengel, Peter; Platteel, Inge; Timens, Wim; Groen, Harry J. M.

2013-01-01

Introduction: In randomly assigned studies with EGFR TKIs, only a minor proportion of patients with NSCLC have genetically profiled biopsies. Guidelines provide evidence for performing EGFR and KRAS mutation analysis in non-squamous NSCLC. We explored the quality of tumor biopsies offered for mutation testing, the distribution of different mutations, and outcome with EGFR TKI treatment. Patients and Methods: Clinical data from 8 regional hospitals were studied for patient and tumor characteristics, treatment, and overall survival. Biopsies sent to the central laboratory were evaluated for DNA quality and subsequently analyzed for mutations in exons 18–21 of EGFR and exon 2 of KRAS by bidirectional sequence analysis. Results: Tumors from 442 consecutive patients were analyzed. For 74 patients (17%), tumors were unsuitable for mutation analysis. Thirty-eight patients (10.9%) had EGFR mutations, 79% of which were known activating mutations. One hundred eight patients (30%) had functional KRAS mutations. The mutation spectrum was comparable to that in the COSMIC database. After first- or second-line treatment with EGFR TKI, median overall survival for patients with EGFR mutations (n = 14), KRAS mutations (n = 14), and wild-type EGFR/KRAS (n = 31) was not reached, 20 months, and 9 months, respectively. Conclusion: One out of every 6 tumor samples was inadequate for mutation analysis. Patients with EGFR-activating mutations treated with EGFR TKI had the longest survival. PMID:23922984
A Review of Subsequence Time Series Clustering

PubMed Central

Teh, Ying Wah

2014-01-01

Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One useful application of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and background related to subsequence time series clustering. The reviewed literature is divided into three groups: the preproof, interproof, and postproof periods. Moreover, various state-of-the-art approaches to subsequence time series clustering are discussed under each of these categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies. PMID:25140332
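The basic setting reviewed here, clustering the subsequences of a single time series, can be sketched in plain Python: extract sliding windows, z-normalize each one, and cluster them with k-means. The window length `w`, the number of clusters `k`, and the choice of k-means with random initial centroids are assumptions for illustration, not a method endorsed by the review.

```python
import math
import random


def sliding_windows(series, w):
    """All contiguous subsequences of length w."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


def z_normalize(sub):
    """Zero-mean, unit-variance copy of a subsequence (all zeros if it is constant)."""
    mean = sum(sub) / len(sub)
    std = math.sqrt(sum((x - mean) ** 2 for x in sub) / len(sub))
    if std == 0:
        return [0.0] * len(sub)
    return [(x - mean) / std for x in sub]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm) with k randomly sampled initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[c])  # keep an empty cluster's previous centroid
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def cluster_subsequences(series, w, k, seed=0):
    subs = [z_normalize(s) for s in sliding_windows(series, w)]
    return kmeans(subs, k, seed=seed)
```

Overlapping windows are highly correlated with their neighbours, which is one reason the meaningfulness of naive sliding-window clustering like this has been questioned; treat the resulting centroids with caution.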
A review of subsequence time series clustering.

PubMed

Zolhavarieh, Seyedjamal; Aghabozorgi, Saeed; Teh, Ying Wah

2014-01-01

Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One useful application of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and background related to subsequence time series clustering. The reviewed literature is divided into three groups: the preproof, interproof, and postproof periods. Moreover, various state-of-the-art approaches to subsequence time series clustering are discussed under each of these categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.
The effects of rest interval length manipulation of the first upper-body resistance exercise in sequence on acute performance of subsequent exercises in men and women.

PubMed

Ratamess, Nicholas A; Chiarello, Christina M; Sacco, Anthony J; Hoffman, Jay R; Faigenbaum, Avery D; Ross, Ryan E; Kang, Jie

2012-11-01

The purpose of the present study was to investigate the effects of manipulating rest interval (RI) length of the first upper-body exercise in sequence on subsequent resistance exercise performance. Twenty-two men and women with at least 1 year of resistance training experience performed resistance exercise protocols on 3 occasions in random order. Each protocol consisted of performing 4 barbell upper-body exercises in the same sequence (bench press, incline bench press, shoulder press, and bent-over row) for 3 sets of up to 10 repetitions with 75% of 1 repetition maximum. Bench press RIs were 1, 2, or 3 minutes, whereas other exercises were performed with a standard 2-minute rest interval. The number of repetitions completed, average power, and velocity for each set of each exercise were recorded. Gender differences were observed during the bench press and incline press as women performed significantly (p ≤ 0.05) more repetitions than men during all RIs. The magnitude of decline in velocity and power over 3 sets of the bench press and incline press was significantly higher in men than women. Manipulation of RI length during the bench press did not affect performance of the remaining exercises in men. However, significantly more repetitions were performed by women during the first set of the incline press using 3-minute rest interval than 1-minute rest interval. In men and women, performance of the incline press and shoulder press was compromised compared with baseline performances. Manipulation of RI length of the first exercise affected performance of only the first set of 1 subsequent exercise in women. All RIs led to comparable levels of fatigue in men, indicating that reductions in load are necessary for subsequent exercises performed in sequence that stress similar agonist muscle groups when 10 repetitions are desired.
A Sensitive TLRH Targeted Imaging Technique for Ultrasonic Molecular Imaging

PubMed Central

Hu, Xiaowen; Zheng, Hairong; Kruse, Dustin E.; Sutcliffe, Patrick; Stephens, Douglas N.; Ferrara, Katherine W.

2010-01-01

The primary goals of ultrasound molecular imaging are the detection and imaging of ultrasound contrast agents (microbubbles) that are bound to specific vascular surface receptors. Imaging methods that can sensitively and selectively detect and distinguish bound microbubbles from freely circulating microbubbles (free microbubbles) and surrounding tissue are critically important for the practical application of ultrasound contrast molecular imaging. Microbubbles excited by low-frequency acoustic pulses emit wideband echoes with a bandwidth extending beyond 20 MHz; we refer to this technique of transmission at a low frequency and reception at a high frequency as TLRH. Using this wideband, transient echo, we have developed and implemented a targeted imaging technique incorporating a multi-frequency co-linear array and the Siemens Antares® imaging system. The multi-frequency co-linear array integrates a central 5.4 MHz array, used to receive echoes and produce radiation force, and two outer 1.5 MHz arrays used to transmit low-frequency incident pulses. The targeted imaging technique uses an acoustic radiation force sub-sequence to enhance accumulation and a TLRH imaging sub-sequence to detect bound microbubbles. The radiofrequency (RF) data obtained from the TLRH imaging sub-sequence are processed to separate the echo signatures of tissue, free microbubbles, and bound microbubbles. By imaging biotin-coated microbubbles targeted to avidin-coated cellulose tubes, we demonstrate that the proposed method has a high contrast-to-tissue ratio (up to 34 dB) and a high sensitivity to bound microbubbles (with the ratio of echoes from bound versus free microbubbles reaching up to 23 dB). The effects of the imaging pulse acoustic pressure, the radiation force sub-sequence, and various slow-time filters on targeted imaging quality are studied.
The TLRH targeted imaging method is demonstrated in this study to provide sensitive and selective detection of bound microbubbles for ultrasound molecularly-targeted imaging. PMID:20178897
Environmental RNAi in herbivorous insects.

PubMed

Ivashuta, Sergey; Zhang, Yuanji; Wiggins, B Elizabeth; Ramaseshadri, Partha; Segers, Gerrit C; Johnson, Steven; Meyer, Steve E; Kerstetter, Randy A; McNulty, Brian C; Bolognesi, Renata; Heck, Gregory R

2015-05-01

Environmental RNAi (eRNAi) is the sequence-specific regulation of endogenous gene expression in a receptive organism by exogenous double-stranded RNA (dsRNA). Although eRNAi has been demonstrated in several herbivorous insects under artificial dietary conditions and via transgenic plant presentations, the magnitude and consequences of exogenous dsRNA uptake and the role of eRNAi remain unknown under natural insect living conditions. Our analysis of eRNAi-sensitive coleopteran insects fed on wild-type plants revealed uptake of plant endogenous long dsRNAs, but not small RNAs. The insects subsequently processed these dsRNAs into 21-nt siRNAs, which accumulated in high quantities in insect cells. No accumulation of host plant-derived siRNAs was observed in lepidopteran larvae that are recalcitrant to eRNAi. Stability of ingested dsRNA in the coleopteran larval gut, followed by uptake and transport from the gut to distal tissues, appeared to be enabling factors for eRNAi. Although a relatively large number of distinct plant-derived siRNAs processed by coleopteran insects had sequence complementarity to insect transcripts, the vast majority of these siRNAs were present in relatively low abundance, and RNA-seq analysis did not detect a significant effect of plant-derived siRNAs on the insect transcriptome. In summary, we observed broad genome-wide uptake of plant endogenous dsRNA and subsequent processing of ingested dsRNA into 21-nt siRNAs in eRNAi-sensitive insects under natural feeding conditions. In addition to dsRNA stability in the gut lumen and uptake, the dosage of siRNAs targeting a given insect transcript is likely an important factor for achieving measurable eRNAi-based regulation in eRNAi-competent insects that lack an apparent silencing amplification mechanism. © 2015 Ivashuta et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
On the Regularities of the Polar Profiles of Proteins Related to Ebola Virus Infection and their Functional Domains.

PubMed

Polanco, Carlos; Samaniego Mendoza, José Lino; Buhse, Thomas; Uversky, Vladimir N; Bañuelos Chao, Ingrid Paola; Bañuelos Cedano, Marcela Angola; Tavera, Fernando Michel; Tavera, Daniel Michel; Falconi, Manuel; Ponce de León, Abelardo Vela

2018-03-06

The number of fatalities and economic losses caused by Ebola virus infection worldwide culminated in the havoc that occurred between August and November 2014. However, little is known about the molecular protein profile of this devastating virus. This work presents a thorough bioinformatics analysis of the regularities of charge distribution (polar profiles) in two groups of proteins, and their functional domains, associated with Ebola virus disease: Ebola virus proteins and human proteins interacting with Ebola virus. Our analysis reveals that each of these proteins contains a fragment, named the "functional domain", whose polar profile is similar to that of the whole protein. Each protein is formed by a group of short subsequences, each with a different and distinctive polar profile, and the polar profile changes orderly and gradually between adjacent subsequences until it coincides with the polar profile of the whole protein. When charge distribution was used as a metric, it effectively discriminated the proteins from their functional domains. As a counterexample, the same test was applied to a set of synthetic proteins built for that purpose, revealing that none of the regularities reported here for the Ebola virus proteins and human proteins interacting with Ebola virus were present in the synthetic proteins. Our results indicate that the polar profile of each protein studied is similar to that of its corresponding functional domain. Thus, when each protein was built from its functional domain, adding one amino acid at a time and plotting its polar profile at each step, the resulting graphs could be divided into groups with similar polar profiles.
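The "build the protein from its functional domain, one amino acid at a time" procedure can be illustrated with a simplified charge-composition profile. This Python sketch is a stand-in, not the authors' polar-profile definition: it assumes a three-class charge assignment at neutral pH (K and R positive, D and E negative, everything else, including H, neutral), a Euclidean distance between profiles, and a hypothetical extension order (C-terminal side first, then N-terminal).

```python
# Assumed simplified charge classes; His is treated as neutral.
CHARGE_CLASS = {**{aa: "positive" for aa in "KR"}, **{aa: "negative" for aa in "DE"}}
CLASSES = ("positive", "negative", "neutral")


def charge_profile(seq):
    """Fraction of residues in each charge class (seq must be non-empty)."""
    counts = dict.fromkeys(CLASSES, 0)
    for aa in seq.upper():
        counts[CHARGE_CLASS.get(aa, "neutral")] += 1
    return tuple(counts[c] / len(seq) for c in CLASSES)


def profile_distance(p, q):
    """Euclidean distance between two charge profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def growth_trace(protein, start, end):
    """Distance to the whole-protein profile as the fragment protein[start:end]
    grows one residue at a time until it spans the full protein."""
    target = charge_profile(protein)
    trace = []
    s, e = start, end
    while True:
        trace.append(profile_distance(charge_profile(protein[s:e]), target))
        if e < len(protein):
            e += 1
        elif s > 0:
            s -= 1
        else:
            break
    return trace
```

The last value of the trace is always 0.0, because the fully extended fragment is the whole protein; the interesting part is how the distance falls off along the way.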
Comprehensive Analysis of Protein Modifications by Top-down Mass Spectrometry

PubMed Central

Zhang, Han; Ge, Ying

2012-01-01

Mass spectrometry (MS)-based proteomics is playing an increasingly important role in cardiovascular research. Proteomics includes not only the identification and quantification of proteins but also the characterization of protein modifications, such as post-translational modifications and sequence variants. The conventional bottom-up approach, involving proteolytic digestion of proteins into small peptides prior to MS analysis, is routinely used for protein identification and quantification with high throughput and automation. Nevertheless, it has limitations in the analysis of protein modifications, mainly due to partial sequence coverage and the loss of connections among modifications on disparate portions of a protein. An alternative approach, top-down MS, has emerged as a powerful tool for the analysis of protein modifications. The top-down approach analyzes whole proteins directly, providing a “bird’s eye” view of all existing modifications. Subsequently, each modified protein form can be isolated and fragmented in the mass spectrometer to locate the modification site. The incorporation of non-ergodic dissociation methods such as electron capture dissociation (ECD) greatly enhances top-down capabilities. ECD is especially useful for mapping labile post-translational modifications, which are well preserved during the ECD fragmentation process. Top-down MS with ECD has been successfully applied to cardiovascular research, with unique advantages in unraveling molecular complexity, quantifying modified protein forms, completely mapping modifications with full sequence coverage, discovering unexpected modifications, identifying and quantifying positional isomers, and determining the order of multiple modifications. Nevertheless, top-down MS still needs to overcome some technical challenges to realize its full potential. Herein, we review the advantages and challenges of top-down methodology with a focus on its application in cardiovascular research.
PMID:22187450
Characterization of Metarhizium viride Mycosis in Veiled Chameleons (Chamaeleo calyptratus), Panther Chameleons (Furcifer pardalis), and Inland Bearded Dragons (Pogona vitticeps).

PubMed

Schmidt, Volker; Klasen, Linus; Schneider, Juliane; Hübel, Jens; Pees, Michael

2017-03-01

Metarhizium viride has been associated with fatal systemic mycoses in chameleons, but subsequent data on mycoses caused by this fungus in reptiles are lacking. The aim of this investigation was therefore to obtain information on the presence of M. viride in reptiles kept as pets in captivity and its association with clinical signs and pathological findings, as well as to improve diagnostic procedures. Besides 18S ribosomal DNA (rDNA) (small subunit [SSU]) and internal transcribed spacer region 1 (ITS-1), a fragment of the large subunit (LSU) of 28S rDNA, including domain 1 (D1) and D2, was sequenced for identification of the fungus and phylogenetic analysis. Cultural isolation and histopathological examinations, as well as the pattern of antifungal drug resistance determined by agar diffusion testing, were additionally used to compare the isolates. In total, 20 isolates from eight inland bearded dragons (Pogona vitticeps), six veiled chameleons (Chamaeleo calyptratus), and six panther chameleons (Furcifer pardalis) were examined. Most of the lizards suffered from fungal glossitis, stomatitis, and pharyngitis or died due to visceral mycosis. Treatment with different antifungal drugs according to resistance patterns was unsuccessful in all three lizard species. Sequence analysis revealed four different genotypes of M. viride based on differences in the LSU fragment, whereas the SSU and ITS-1 were identical in all isolates. Sequence analysis of the SSU fragment provided the first valid large fragment of the SSU of M. viride. According to statistical analysis, genotypes did not correlate with differences in pathogenicity, antifungal susceptibility, or species specificity. Copyright © 2017 American Society for Microbiology.
Association analysis for udder index and milking speed with imputed whole-genome sequence variants in Nordic Holstein cattle.

PubMed

Jardim, Júlia Gazzoni; Guldbrandtsen, Bernt; Lund, Mogens Sandø; Sahana, Goutam

2018-03-01

Genome-wide association testing facilitates the identification of genetic variants associated with complex traits. Mapping genes that promote genetic resistance to mastitis could reduce the cost of antibiotic use and enhance animal welfare and milk production by improving the outcomes of breeding for udder health. Using imputed whole-genome sequence variants, we carried out association studies in Nordic Holstein cattle for 2 traits related to udder health: udder index and milking speed. A total of 4,921 bulls genotyped with the BovineSNP50 BeadChip array were imputed to high-density genotypes (Illumina BovineHD BeadChip, Illumina, San Diego, CA) and, subsequently, to whole-genome sequence variants. Association analysis was carried out using a linear mixed model. The phenotypes used in the association analyses were deregressed breeding values. Multitrait meta-analysis was carried out for these 2 traits. We identified 10 and 8 chromosomes harboring markers that were significantly associated with udder index and milking speed, respectively. The strongest association signals were observed on chromosome 20 for udder index and chromosome 19 for milking speed. Multitrait meta-analysis identified 13 chromosomes harboring associated markers for the combination of udder index and milking speed. The associated region on chromosome 20 overlapped with previously reported quantitative trait loci for similar traits in other cattle populations. Moreover, this region was located close to the FYB gene, which is involved in platelet activation and controls IL-2 expression; FYB is a strong candidate gene for udder health and worthy of further investigation. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Characterization of Metarhizium viride Mycosis in Veiled Chameleons (Chamaeleo calyptratus), Panther Chameleons (Furcifer pardalis), and Inland Bearded Dragons (Pogona vitticeps)

PubMed Central

Klasen, Linus; Schneider, Juliane; Hübel, Jens; Pees, Michael

2016-01-01

Metarhizium viride has been associated with fatal systemic mycoses in chameleons, but subsequent data on mycoses caused by this fungus in reptiles are lacking. The aim of this investigation was therefore to obtain information on the presence of M. viride in reptiles kept as pets in captivity and its association with clinical signs and pathological findings, and to improve diagnostic procedures. In addition to the 18S ribosomal DNA (rDNA) (small subunit [SSU]) and internal transcribed spacer region 1 (ITS-1), a fragment of the large subunit (LSU) of 28S rDNA, including domain 1 (D1) and D2, was sequenced for the identification of the fungus and phylogenetic analysis. Culture isolation and histopathological examinations, as well as the pattern of antifungal drug resistance determined by agar diffusion testing, were additionally used to compare the isolates. In total, 20 isolates from eight inland bearded dragons (Pogona vitticeps), six veiled chameleons (Chamaeleo calyptratus), and six panther chameleons (Furcifer pardalis) were examined. Most of the lizards suffered from fungal glossitis, stomatitis, and pharyngitis or died due to visceral mycosis. Treatment with different antifungal drugs according to resistance patterns was unsuccessful in all three lizard species. Sequence analysis resulted in four different genotypes of M. viride based on differences in the LSU fragment, whereas the SSU and ITS-1 were identical in all isolates. Sequence analysis of the SSU fragment revealed the first presentation of a valid large fragment of the SSU of M. viride. According to statistical analysis, genotypes did not correlate with differences in pathogenicity, antifungal susceptibility, or species specificity. PMID:28003420
Experimental Investigations on Subsequent Yield Surface of Pure Copper by Single-Sample and Multi-Sample Methods under Various Pre-Deformation.

PubMed

Liu, Gui-Long; Huang, Shi-Hong; Shi, Che-Si; Zeng, Bin; Zhang, Ke-Shi; Zhong, Xian-Ci

2018-02-10

Using copper thin-walled tubular specimens, the subsequent yield surfaces under pre-tension, pre-torsion and pre-combined tension-torsion are measured, with the single-sample and multi-sample methods applied respectively to determine the yield stresses at a specified offset strain. The rules and characteristics of the evolution of the subsequent yield surface are investigated. Under different pre-strains, the influence of the number of test points, the test sequence and the specified offset strain on the measured subsequent yield surface, as well as the concavity of the measured yield surfaces, is studied. Moreover, the feasibility and validity of the two methods are compared. The main conclusions are as follows: (1) for both the single-sample and the multi-sample method, the measured subsequent yield surfaces differ remarkably from the cylindrical yield surfaces proposed by classical plasticity theory; (2) the results of the two methods differ appreciably: the multi-sample method is not influenced by the number of test points, the test order or the cumulative residual plastic strain from other test points, whereas these factors strongly influence the single-sample method; and (3) the measured subsequent yield surface may appear concave, which for the single-sample method can be made convex by changing the test sequence, whereas for the multi-sample method the concavity disappears when a larger offset strain is specified.
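Both methods determine yield at a specified offset strain. A minimal one-dimensional sketch of that offset rule, assuming a monotonic loading curve and a known elastic modulus E: the yield stress is taken where the plastic strain eps - sigma/E first reaches the chosen offset, interpolating linearly between the bracketing data points. The function name and the toy data are illustrative; the paper's tension-torsion tests would need an equivalent-strain definition that this sketch does not model.

```python
def offset_yield_stress(strains, stresses, modulus, offset):
    """Stress at which plastic strain (eps - sigma/E) first reaches `offset`.

    Returns None if the offset is never reached in the supplied data.
    """
    prev = None  # (stress, plastic strain) at the previous data point
    for eps, sig in zip(strains, stresses):
        plastic = eps - sig / modulus
        if plastic >= offset:
            if prev is None:
                return sig
            p_sig, p_plastic = prev
            # Linear interpolation between the two points bracketing the offset.
            frac = (offset - p_plastic) / (plastic - p_plastic)
            return p_sig + frac * (sig - p_sig)
        prev = (sig, plastic)
    return None


# Hypothetical uniaxial data in arbitrary units, with E = 1000.
strains = [0.0, 0.1, 0.2, 0.3]
stresses = [0.0, 100.0, 110.0, 120.0]
print(offset_yield_stress(strains, stresses, modulus=1000.0, offset=0.045))
```

In a single-sample test, each probe leaves behind residual plastic strain of roughly the offset size, which is consistent with the abstract's finding that probe number and order matter for that method but not for the multi-sample method.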
Geologic and Fossil Locality Maps of the West-Central Part of the Howard Pass Quadrangle and Part of the Adjacent Misheguk Mountain Quadrangle, Western Brooks Range, Alaska

USGS Publications Warehouse

Dover, James H.; Tailleur, Irvin L.; Dumoulin, Julie A.

2004-01-01

The map depicts the field distribution and contact relations between stratigraphic units, the tectonic relations between major stratigraphic sequences, and the detailed internal structure of these sequences. The stratigraphic sequences formed in a variety of continental-margin depositional environments and subsequently underwent a complex deformational history of imbricate thrust faulting and folding. A compilation of microfossil and macrofossil identifications is included in this data set.
NH4+ ad-/desorption in sequencing batch reactors: simulation, laboratory and full-scale studies.

PubMed

Schwitalla, P; Mennerich, A; Austermann-Haun, U; Müller, A; Dorninger, C; Daims, H; Holm, N C; Rönner-Holm, S G E

2008-01-01

Significant NH4-N balance deficits were found during the measurement campaigns conducted to collect data for dynamic simulation studies at five full-scale sequencing batch reactor (SBR) wastewater treatment plants (WWTPs), as well as during subsequent calibrations at these plants. Subsequent lab-scale investigations provided strong evidence for dynamic, cycle-specific NH4+ ad-/desorption to the activated flocs as one reason for this balance deficit. This dynamic was investigated at five full-scale SBR plants to identify the general underlying mechanisms. The general mechanism found was NH4+ desorption from the activated flocs at the end of the nitrification phase, with subsequent nitrification, and chemical NH4+ adsorption onto the flocs during the filling phases. This NH4+ ad-/desorption corresponds to an antiparallel K+ ad-/desorption. One reasonable full-scale application, a controlled filling phase at the beginning of the sedimentation phase, was investigated at three SBR plants. The results indicate that this kind of filling event must be specifically hydraulically controlled and optimised to prevent excessive wastewater breakthrough into the clear-water phase, which will subsequently be discarded. IWA Publishing 2008.

[Cloning and sequence analysis of tomato fruit-specific E8 promoter from Lycopersicon esculentum (Zhongshu No.5)].

PubMed

Zhou, Xiao-hong; Chen, Xiao-guang; Zhang, Xiao-dong; Wang, Ya-nan; Li, Lin; Xi, Jia-fei; Hu, Jian-jun

2003-01-01

The aim was to clone the gene encoding the tomato fruit-specific E8 promoter in preparation for exogenous gene transcription and expression in transgenic tomato fruit. Cotyledons of tomato Lycopersicon esculentum (Zhongshu No.5) were collected to extract genomic DNA. The fruit-specific E81.1 and E82.2 promoter DNA were then amplified by PCR, and the products were subcloned into the pGEM-T vector. After identification by restriction enzymes, the recombinant T-vectors were subjected to sequence analysis. The PCR-amplified promoter fragments were of the predicted length. Digestion with Xba I and Hind III/BamH I confirmed correct insertion of the target fragments of expected length into the recombinant T-vectors. Homology analysis indicated that the tomato fruit-specific E8 promoter was highly conserved: the E82.2 promoter of Zhongshu No.5 (GenBank accession AF515784) shared 99% homology with the E82.2 promoter of Zhongshu No.5 Cherry as reported by Deikman J. The tomato fruit-specific E8 promoter of Zhongshu No.5 has been successfully cloned, making possible subsequent research on oral vaccines in transgenic tomato.
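The 99% figure compares two aligned promoter sequences. Strictly, homology is a yes/no statement about common ancestry, and such percentages usually report sequence identity or similarity. A minimal sketch of percent identity over a pre-computed pairwise alignment (gaps written as '-'), assuming identity is counted over every column that is not a gap in both sequences; conventions differ between tools, and the alignment step itself is not shown.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns that are gaps in both sequences are skipped; every other column
    counts, and it is a match only when both characters are the same base.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no informative columns")
    return 100.0 * matches / columns


# Hypothetical aligned fragments: one gap column out of six.
print(percent_identity("ACGT-A", "ACGTTA"))
```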
Microbial diversity of supra- and subgingival biofilms on freshly colonized titanium implant abutments in the human mouth.

PubMed

Heuer, W; Stiesch, M; Abraham, W R

2011-02-01

Supra- and subgingival biofilm formation is considered mainly responsible for early implant failure caused by inflammation of peri-implant tissues. Nevertheless, little is known about the complex microbial diversity around dental implants and its similarity between individuals. The diversity of microbial communities around titanium implants was assessed atraumatically by single-strand conformation polymorphism (SSCP) analysis of 16S rRNA gene amplicons followed by sequence analysis. Samples of adherent supra- and subgingival peri-implant biofilms were collected from ten patients. Additionally, samples of sulcus fluid were taken at titanium implant abutments and remaining teeth. The bacteria in the samples were characterized by SSCP and sequence analysis. A high diversity of bacteria was found, varying between patients and within one patient at different locations. Bacteria characteristic of sulcus fluid and of supra- and subgingival biofilm communities were identified. Sulcus fluid at the abutments showed a higher abundance of Streptococcus species than sulcus fluid at the remaining teeth. Prevotella and Rothia species, frequently reported from the oral cavity, were not detected at the abutments, suggesting a role as late colonizers. Different niches in the human mouth are characterized by specific groups of bacteria. Implant abutments are a very valuable approach to study dental biofilm development in vivo.
Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma.

PubMed

Kubicek, Christian P; Herrera-Estrella, Alfredo; Seidl-Seiboth, Verena; Martinez, Diego A; Druzhinina, Irina S; Thon, Michael; Zeilinger, Susanne; Casas-Flores, Sergio; Horwitz, Benjamin A; Mukherjee, Prasun K; Mukherjee, Mala; Kredics, László; Alcaraz, Luis D; Aerts, Andrea; Antal, Zsuzsanna; Atanasova, Lea; Cervantes-Badillo, Mayte G; Challacombe, Jean; Chertkov, Olga; McCluskey, Kevin; Coulpier, Fanny; Deshpande, Nandan; von Döhren, Hans; Ebbole, Daniel J; Esquivel-Naranjo, Edgardo U; Fekete, Erzsébet; Flipphi, Michel; Glaser, Fabian; Gómez-Rodríguez, Elida Y; Gruber, Sabine; Han, Cliff; Henrissat, Bernard; Hermosa, Rosa; Hernández-Oñate, Miguel; Karaffa, Levente; Kosti, Idit; Le Crom, Stéphane; Lindquist, Erika; Lucas, Susan; Lübeck, Mette; Lübeck, Peter S; Margeot, Antoine; Metz, Benjamin; Misra, Monica; Nevalainen, Helena; Omann, Markus; Packer, Nicolle; Perrone, Giancarlo; Uresti-Rivera, Edith E; Salamov, Asaf; Schmoll, Monika; Seiboth, Bernhard; Shapiro, Harris; Sukno, Serenella; Tamayo-Ramos, Juan Antonio; Tisch, Doris; Wiest, Aric; Wilkinson, Heather H; Zhang, Michael; Coutinho, Pedro M; Kenerley, Charles M; Monte, Enrique; Baker, Scott E; Grigoriev, Igor V

2011-01-01

Mycoparasitism, a lifestyle where one fungus is parasitic on another fungus, has special relevance when the prey is a plant pathogen, providing a strategy for biological control of pests for plant protection. Probably the most studied biocontrol agents are species of the genus Hypocrea/Trichoderma. Here we report an analysis of the genome sequences of the two biocontrol species Trichoderma atroviride (teleomorph Hypocrea atroviridis) and Trichoderma virens (formerly Gliocladium virens, teleomorph Hypocrea virens), and a comparison with Trichoderma reesei (teleomorph Hypocrea jecorina). These three Trichoderma species display a remarkable conservation of gene order (78 to 96%), and a lack of active mobile elements, probably due to repeat-induced point mutation. Several gene families are expanded in the two mycoparasitic species relative to T. reesei or other ascomycetes, and are overrepresented in non-syntenic genome regions. A phylogenetic analysis shows that T. reesei and T. virens are derived relative to T. atroviride. The mycoparasitism-specific genes thus arose in a common Trichoderma ancestor but were subsequently lost in T. reesei. The data offer a better understanding of mycoparasitism and should thus support the development of improved biocontrol strains for efficient and environmentally friendly protection of plants. © 2011 Kubicek et al.; licensee BioMed Central Ltd.
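The abstract reports gene-order conservation of 78 to 96% without defining the metric. One simple way to quantify it, assumed here and not necessarily the authors' method, is the fraction of neighbouring gene pairs in one genome that are also neighbours in the other. The sketch assumes orthologs already share IDs and that each genome is one linear list; real assemblies span many scaffolds, which would be handled per scaffold.

```python
def adjacent_pairs(order):
    """Unordered pairs of neighbouring genes in a single linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def conserved_adjacency_fraction(order_a, order_b):
    """Fraction of neighbouring gene pairs in A that are also neighbours in B.

    Pair orientation is ignored, so swapping two adjacent genes keeps
    their shared pair counted as conserved.
    """
    pairs_a = adjacent_pairs(order_a)
    if not pairs_a:
        return 0.0
    return len(pairs_a & adjacent_pairs(order_b)) / len(pairs_a)


# Hypothetical ortholog orders: the last two genes are swapped in genome B.
genome_a = ["g1", "g2", "g3", "g4", "g5"]
genome_b = ["g1", "g2", "g3", "g5", "g4"]
print(conserved_adjacency_fraction(genome_a, genome_b))
```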
Culture-dependent and culture-independent diversity of Actinobacteria associated with the marine sponge Hymeniacidon perleve from the South China Sea.

PubMed

Sun, Wei; Dai, Shikun; Jiang, Shumei; Wang, Guanghua; Liu, Guohui; Wu, Houbo; Li, Xiang

2010-06-01

In this report, the diversity of Actinobacteria associated with the marine sponge Hymeniacidon perleve collected from a remote island of the South China Sea was investigated using classical cultivation and characterization, 16S rDNA library construction, 16S rDNA restriction fragment length polymorphism (rDNA-RFLP) and phylogenetic analysis. A total of 184 strains were isolated using seven different media, and 24 isolates were selected according to their morphological characteristics for phylogenetic analysis based on their 16S rRNA gene sequences. The 24 isolates were assigned to six genera: Salinispora, Gordonia, Mycobacterium, Nocardia, Rhodococcus and Streptomyces. This is the first report of Salinispora in a marine sponge from the South China Sea. Subsequently, 26 rDNA clones were selected by RFLP from 191 clones in an Actinobacteria-specific 16S rDNA library of the H. perleve sample for sequencing and phylogenetic analysis. In total, 26 phylotypes were clustered in eight known genera of Actinobacteria: Mycobacterium, Amycolatopsis, Arthrobacter, Brevibacterium, Microlunatus, Nocardioides, Pseudonocardia and Streptomyces. This study contributes to our understanding of actinobacterial diversity in the marine sponge H. perleve from the South China Sea.
Identification of DEP domain-containing proteins by a machine learning method and experimental analysis of their expression in human HCC tissues

NASA Astrophysics Data System (ADS)

Liao, Zhijun; Wang, Xinrui; Zeng, Yeting; Zou, Quan

2016-12-01

The Dishevelled/EGL-10/Pleckstrin (DEP) domain-containing (DEPDC) proteins have seven members. However, whether this superfamily can be distinguished from other proteins based only on amino acid sequences remains unknown. Here, we describe a computational method to separate DEPDCs from non-DEPDCs. First, we examined the Pfam numbers of the known DEPDCs and used the longest sequence for each Pfam to construct a phylogenetic tree. Subsequently, we extracted 188-dimensional (188D) and 20D features of DEPDCs and non-DEPDCs and classified them with a random forest classifier. We also mined the motifs of human DEPDCs to find the related domains. Finally, we designed experimental methods to verify human DEPDC expression at the mRNA level in hepatocellular carcinoma (HCC) and adjacent normal tissues. The phylogenetic analysis showed that the DEPDC superfamily can be divided into three clusters. Moreover, the 188D and 20D features can both be used to effectively distinguish the two protein types. Motif analysis revealed that the DEP and RhoGAP domains were common in human DEPDCs. DEPDCs were widely expressed in human HCC and adjacent tissues; however, their regulation was not identical. In conclusion, we successfully constructed a binary classifier for DEPDCs and experimentally verified their expression in human HCC tissues.
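The abstract does not define the 20D feature set. A frequent choice for 20-dimensional protein features is amino acid composition (the relative frequency of each of the 20 standard residues), which this sketch assumes; the 188D set is not reproduced. The resulting vectors could be passed to a random forest such as scikit-learn's RandomForestClassifier, omitted here to keep the example dependency-free.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed feature order


def aa_composition(seq):
    """20D amino acid composition; non-standard letters (e.g. X) are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical peptide; one feature vector per protein forms a row of the training matrix.
features = aa_composition("MKTAYIAKQR")
print(len(features), round(sum(features), 6))
```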
Genome analysis of Hibiscus syriacus provides insights of polyploidization and indeterminate flowering in woody plants

PubMed Central

Kim, Yong-Min; Kim, Seungill; Koo, Namjin; Shin, Ah-Young; Yeom, Seon-In; Seo, Eunyoung; Park, Seong-Jin; Kang, Won-Hee; Kim, Myung-Shin; Park, Jieun; Jang, Insu; Kim, Pan-Gyu; Byeon, Iksu; Kim, Min-Seo; Choi, JinHyuk; Ko, Gunhwan; Hwang, JiHye; Yang, Tae-Jin; Choi, Sang-Bong; Lee, Je Min; Lim, Ki-Byung; Lee, Jungho; Choi, Ik-Young; Park, Beom-Seok; Kwon, Suk-Yoon; Choi, Doil

2017-01-01

Hibiscus syriacus (L.) (rose of Sharon) is one of the most widespread garden shrubs in the world. We report a draft of the H. syriacus genome comprising a 1.75 Gb assembly that covers 92% of the genome with only 1.7% (33 Mb) gap sequences. Gene prediction identified 87,603 genes, most of them supported by deep RNA sequencing data. To define gene family distribution among relatives of H. syriacus, orthologous gene sets containing 164,660 genes in 21,472 clusters were identified by OrthoMCL analysis of five plant species, including H. syriacus, Arabidopsis thaliana, Gossypium raimondii, Theobroma cacao and Amborella trichopoda. We inferred their evolutionary relationships based on divergence times among Malvaceae plant genes and found that gene families involved in flowering regulation and disease resistance were more highly divergent and expanded in H. syriacus than in its close relatives, G. raimondii (DD) and T. cacao. Clustered gene families and gene collinearity analysis revealed that two recent rounds of whole-genome duplication (WGD) were followed by diploidization of the H. syriacus genome after speciation. Copy number variation and phylogenetic divergence indicate that WGDs and subsequent diploidization led to unequal duplication and deletion of flowering-related genes in H. syriacus and may affect its unique floral morphology. PMID:28011721
Transcriptome profile and unique genetic evolution of positively selected genes in yak lungs.

PubMed

Lan, DaoLiang; Xiong, XianRong; Ji, WenHui; Li, Jian; Mipam, Tserang-Donko; Ai, Yi; Chai, ZhiXin

2018-04-01

The yak (Bos grunniens), a unique bovine species distributed mainly on the Qinghai-Tibetan Plateau, is considered a good model for studying plateau adaptability in mammals. The lungs are important functional organs that enable animals to adapt to their external environment. However, the genetic mechanism underlying the adaptability of yak lungs to harsh plateau environments remains unknown. To explore the unique evolutionary process and genetic mechanism of yak adaptation to plateau environments, we performed transcriptome sequencing of yak and cattle (Bos taurus) lungs using RNA-Seq technology, followed by a comparative analysis to identify positively selected genes in the yak. After deep sequencing, a normal transcriptome profile of yak lung containing a total of 16,815 expressed genes was obtained, and the characteristics of the yak lung transcriptome were described by functional analysis. Furthermore, Ka/Ks comparison identified 39 strongly positively selected genes in yak lungs. GO and KEGG analyses were then conducted for functional annotation of these genes. The results of this study provide valuable data for further exploration of the unique evolutionary process of high-altitude hypoxia adaptation in yaks on the Tibetan Plateau and of the underlying genetic mechanism at the molecular level.
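Ka/Ks compares nonsynonymous substitutions per nonsynonymous site (Ka) with synonymous substitutions per synonymous site (Ks); values above 1 are conventionally read as evidence of positive selection. A minimal sketch of the last step of a Nei-Gojobori-style calculation, assuming the site and difference counts have already been obtained from aligned codons (that counting is omitted) and applying the Jukes-Cantor correction. The counts are hypothetical, and the abstract states neither the software used nor the threshold behind "strongly" selected.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ka/Ks from pre-counted differences and sites; None when Ks is zero."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        return None  # ratio undefined without synonymous divergence
    return ka / ks


# Hypothetical yak-versus-cattle counts for one orthologous gene.
ratio = ka_ks(nonsyn_diffs=10, nonsyn_sites=300.0, syn_diffs=2, syn_sites=100.0)
if ratio is not None and ratio > 1.0:
    print(f"Ka/Ks = {ratio:.2f}: candidate for positive selection")
```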
History of antibiotic adaptation influences microbial evolutionary dynamics during subsequent treatment

PubMed Central

Papin, Jason A.

2017-01-01

Antibiotic regimens often include the sequential changing of drugs to limit the development and evolution of resistance of bacterial pathogens. It remains unclear how history of adaptation to one antibiotic can influence the resistance profiles when bacteria subsequently adapt to a different antibiotic. Here, we experimentally evolved Pseudomonas aeruginosa to six 2-drug sequences. We observed drug order–specific effects, whereby adaptation to the first drug can limit the rate of subsequent adaptation to the second drug, adaptation to the second drug can restore susceptibility to the first drug, or final resistance levels depend on the order of the 2-drug sequence. These findings demonstrate that resistance depends not only on the current drug regimen but also on the history of past regimens. These order-specific effects may allow for rational forecasting of the evolutionary dynamics of bacteria given knowledge of past adaptations and provide support for the need to consider the history of past drug exposure when designing strategies to mitigate resistance and combat bacterial infections. PMID:28792497
Genotyping of the fish rhabdovirus, viral haemorrhagic septicaemia virus, by restriction fragment length polymorphisms

USGS Publications Warehouse

Einer-Jensen, Katja; Winton, James R.; Lorenzen, Niels

2005-01-01

The aim of this study was to develop a standardized molecular assay that used limited resources and equipment for routine genotyping of isolates of the fish rhabdovirus, viral haemorrhagic septicaemia virus (VHSV). Computer-generated restriction maps, based on 62 unique full-length (1524 nt) sequences of the VHSV glycoprotein (G) gene, were used to predict restriction fragment length polymorphism (RFLP) patterns, which were subsequently grouped and compared with a phylogenetic analysis of the G-gene sequences of the same set of isolates. Digestion of PCR amplicons of the full-length G-gene by a set of three restriction enzymes was predicted to enable accurate assignment of the VHSV isolates to the four major genotypes discovered to date. Further sub-typing of the isolates into the recently described sub-lineages of genotype I was possible by applying three additional enzymes. Experimental evaluation of the method consisted of three steps: (i) RT-PCR amplification of the G-gene of VHSV isolates using purified viral RNA as template, (ii) digestion of the PCR products with a panel of restriction endonucleases and (iii) interpretation of the resulting RFLP profiles. The RFLP analysis was shown to approximate the level of genetic discrimination obtained by other, more labour-intensive, molecular techniques such as the ribonuclease protection assay or sequence analysis. In addition, 37 previously uncharacterised isolates from diverse sources were assigned to specific genotypes. While the assay was able to distinguish between marine and continental isolates of VHSV, the differences did not correlate with the pathogenicity of the isolates.
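The computer-generated restriction maps described above amount to locating recognition sites in each G-gene sequence and turning cut positions into fragment lengths. A minimal sketch under stated assumptions: the enzymes shown (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT, each cutting after the first base) are standard examples chosen for illustration, not the panel used in the study; amplicons are treated as linear; each enzyme is a separate single-enzyme digest; and only the top strand is scanned, which suffices because these sites are palindromic. Isolates are then grouped by identical predicted profiles.

```python
# Illustrative enzymes only: recognition site and cut offset within the site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
}


def cut_positions(seq, site, cut_offset):
    """Ascending cut positions for every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, enzyme):
    """Fragment lengths (largest first) from a single-enzyme digest of a linear amplicon."""
    site, offset = ENZYMES[enzyme]
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def rflp_profile(seq, enzymes):
    """Per-enzyme fragment patterns, one single-enzyme digest per enzyme."""
    return tuple((name, tuple(digest(seq, name))) for name in enzymes)


def group_by_profile(amplicons, enzymes):
    """Group isolate IDs whose predicted RFLP profiles are identical."""
    groups = {}
    for isolate, seq in amplicons.items():
        groups.setdefault(rflp_profile(seq, enzymes), []).append(isolate)
    return groups


# Hypothetical short amplicons standing in for full-length G-gene sequences.
amplicon = "AAAA" + "GAATTC" + "CCCC" + "GGATCC" + "TT"
print(rflp_profile(amplicon, ["EcoRI", "BamHI", "HindIII"]))
```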
Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling

PubMed Central

2013-01-01

Background Isatis indigotica is a widely used herb for the clinical treatment of colds, fever, and influenza in Traditional Chinese Medicine (TCM). Various structural classes of compounds have been identified as effective ingredients. However, little is known at the genetic level about these active metabolites. In the present study, we performed de novo transcriptome sequencing of I. indigotica for the first time to produce a comprehensive dataset. Results A database of 36,367 unigenes (average length = 1,115.67 bases) was generated by transcriptome sequencing. Based on the gene annotation of the transcriptome, 104 unigenes were identified covering most of the catalytic steps in the general biosynthetic pathways of indole, terpenoid, and phenylpropanoid. Subsequently, the organ-specific expression patterns of the genes involved in these pathways, and their responses to methyl jasmonate (MeJA) induction, were investigated. Metabolite profiling of the effective phenylpropanoids showed that the accumulation patterns of secondary metabolites were mostly correlated with the transcription of their biosynthetic genes. Based on the analysis of the UDP-dependent glycosyltransferase (UGT) family, several flavonoids were predicted to exist in I. indigotica and were further identified by metabolic profiling using UPLC/Q-TOF. Moreover, transcriptome co-expression analysis suggested nine new putative UGTs as flavonol glycosyltransferases and lignan glycosyltransferases. Conclusions This database provides a pool of candidate genes involved in the biosynthesis of effective metabolites in I. indigotica. Furthermore, the comprehensive analysis and characterization of the significant pathways are expected to give better insight into the diversity of chemical composition, the synthetic characteristics, and the regulatory mechanisms operating in this medicinal herb. PMID:24308360
Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

PubMed

Davenport, Colin F; Neugebauer, Jens; Beckmann, Nils; Friedrich, Benedikt; Kameri, Burim; Kokott, Svea; Paetow, Malte; Siekmann, Björn; Wieding-Drewes, Matthias; Wienhöfer, Markus; Wolf, Stefan; Tümmler, Burkhard; Ahlers, Volker; Sprengel, Frauke

2012-01-01

Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and to the reliable identification of microbial species as opposed to higher-level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short-read sequencing technologies. Our approach was first verified on two simulated metagenomic short-read datasets, detecting 100% and 94% of the bacterial species included, with few false positives or false negatives. Subsequent comparative benchmarking against three popular metagenomic algorithms on an Illumina human gut dataset showed that Genometa attributed the most reads to bacteria at the species level (i.e. including all strains of that species), with similar or better accuracy than the other programs. Lastly, Genometa was shown to be many times faster than BLAST owing to its use of modern short-read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence, but it cannot find species absent from the reference. This method is one of the most user-friendly and resource-efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step-by-step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.
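Attributing reads "at species level (i.e. including all strains of that species)" means pooling alignments to strain-level reference genomes by species. A minimal sketch of that pooling step only, assuming each read already has a single best reference hit from a short-read aligner (or None if unaligned); the genome IDs, species map, and function name are hypothetical, and Genometa's actual Java implementation is not reproduced.

```python
from collections import Counter


def species_counts(best_hits, genome_to_species):
    """Collapse per-read best reference hits into read counts per species.

    best_hits maps read ID -> reference genome ID (None if the read did not align).
    genome_to_species maps reference genome ID -> species name, so reads hitting
    different strains of one species are pooled.
    """
    counts = Counter()
    for genome in best_hits.values():
        if genome is None:
            # Reads from species absent from the reference end up here.
            counts["unassigned"] += 1
        else:
            counts[genome_to_species.get(genome, "unassigned")] += 1
    return counts


# Hypothetical strain-level references and alignment results.
genome_to_species = {
    "Ecoli_K12": "Escherichia coli",
    "Ecoli_O157": "Escherichia coli",
    "Bfragilis_NCTC9343": "Bacteroides fragilis",
}
best_hits = {"r1": "Ecoli_K12", "r2": "Ecoli_O157", "r3": None, "r4": "Bfragilis_NCTC9343"}
print(species_counts(best_hits, genome_to_species))
```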
Nullomers and High Order Nullomers in Genomic Sequences

PubMed Central

Vergni, Davide; Santoni, Daniele

2016-01-01

A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers: whether their absence from genomes has a purely statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless, antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content, but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built for eleven species based both on the similarities between dinucleotide frequencies and on the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally, the study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that these sequences have some peculiar structural features. The results show that nullomers are a consequence of the peculiar structure of DNA (including biased CpG frequency and CpG islands), so that the hypermutability model, even taking CpG islands into account, does not seem sufficient to explain the nullomer phenomenon. In addition, high order nullomers could emphasize those features that already make simple nullomers useful in several applications. PMID:27906971
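The definitions above can be made concrete with a brute-force sketch. It assumes that the "mutated sequences" of a high order nullomer are all single-base substitutions, a simplifying reading of the abstract, and it scans only the forward strand; many nullomer studies also count the reverse complement. Enumerating all 4^k words is practical only for small k.

```python
from itertools import product

BASES = "ACGT"


def present_kmers(seq, k):
    """All k-mers that occur in `seq` (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nullomers(seq, k):
    """All ACGT words of length k that never occur in `seq`."""
    present = present_kmers(seq, k)
    return {"".join(w) for w in product(BASES, repeat=k)} - present


def single_mutants(word):
    """Every word differing from `word` by exactly one base substitution."""
    for i, original in enumerate(word):
        for b in BASES:
            if b != original:
                yield word[:i] + b + word[i + 1:]


def high_order_nullomers(seq, k):
    """Nullomers all of whose single-substitution mutants are also nullomers."""
    absent = nullomers(seq, k)
    return {w for w in absent if all(m in absent for m in single_mutants(w))}


print(sorted(high_order_nullomers("ACGT", 2)))  # ['TA']
```

In the toy sequence ACGT, 13 of the 16 dinucleotides are nullomers, but only TA is at least one substitution away from every word that does occur.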
Identification of A Novel Missense Mutation in The Norrie Disease Gene: The First Molecular Genetic Analysis and Prenatal Diagnosis of Norrie Disease in An Iranian Family.

PubMed

Talebi, Farah; Ghanbari Mardasi, Farideh; Mohammadi Asl, Javad; Lashgari, Ali; Farhadi, Freidoon

2018-07-01

Norrie disease (ND) is a rare X-linked recessive disorder characterized by congenital blindness and, in several cases, accompanied by mental retardation and deafness. ND is caused by mutations in NDP, located on the proximal short arm of the X chromosome (Xp11.3). The disease has been observed in many ethnic groups worldwide; however, no case has been reported from Iran. In this study, we present the molecular analysis of two patients with ND and the subsequent prenatal diagnosis. Screening of NDP identified a hemizygous missense mutation (p.Ser133Cys) in the affected male siblings of the family. The mother was a carrier of the mutation (p.Ser133Cys). In a subsequent chorionic amniotic pregnancy, we carried out prenatal diagnosis by sequencing NDP in a chorionic villus sample at 11 weeks of gestation. The fetus was carrying the mutation and thus unaffected. This is the first mutation report and prenatal diagnosis of ND in an Iranian family, and it highlights the importance of prenatal diagnostic screening for this congenital disorder and of relevant genetic counseling. Copyright© by Royan Institute. All rights reserved.
Exploring the influence of encoding format on subsequent memory.

PubMed

Turney, Indira C; Dennis, Nancy A; Maillet, David; Rajah, M Natasha

2017-05-01

Distinctive encoding is greatly influenced by gist-based processes and has been shown to suffer when highly similar items are presented in close succession. Thus, elucidating how presentation format affects gist processing is essential for determining the factors that influence these encoding processes. The current study used multivariate partial least squares (PLS) analysis to identify encoding networks directly associated with retrieval performance in blocked and intermixed presentation conditions. Subsequent memory analysis of successfully encoded items indicated no significant differences in reaction time or retrieval performance between presentation formats. Despite the absence of significant behavioural differences, behaviour PLS revealed differences in brain-behaviour correlations and mean condition activity in brain regions associated with gist-based vs. distinctive encoding. Specifically, the intermixed format encouraged more distinctive encoding, showing increased activation of regions associated with strategy use and visual processing (e.g., frontal and visual cortices, respectively). Alternatively, the blocked format exhibited increased gist-based processing, accompanied by increased activity in the right inferior frontal gyrus. Together, the results suggest that the sequence in which information is presented during encoding affects the degree to which distinctive encoding is engaged. These findings extend our understanding of Fuzzy Trace Theory and the role of presentation format in encoding processes.
A novel deletion of SNURF/SNRPN exon 1 in a patient with Prader-Willi-like phenotype.

PubMed

Cao, Yang; AlHumaidi, Susan S; Faqeih, Eissa A; Pitel, Beth A; Lundquist, Patrick; Aypar, Umut

2017-08-01

Here we report the smallest deletion involving SNURF/SNRPN that causes major symptoms of Prader-Willi syndrome (PWS), including hypotonia, dysmorphic features, intellectual disability, and obesity. A female patient with the aforementioned and additional features was referred to the Mayo Clinic Cytogenetics laboratory for genetic testing. Chromosomal microarray analysis and subsequent Sanger sequencing identified a de novo 6.4 kb deletion at 15q11.2, containing exon 1 of the SNURF gene and exon 1 of the shortest isoform of the SNRPN gene. SNURF/SNRPN exon 1, which is methylated on the silent maternal allele, is associated with acetylated histones on the expressed paternal allele. This region also overlaps with the PWS imprinting center (IC). Subsequent molecular methylation analysis using methylation-specific MLPA (MS-MLPA) showed that the deletion of SNURF/SNRPN exon 1 was paternal in origin, consistent with the PWS-like phenotype. Since the SNURF/SNRPN gene and the PWS-IC are known to regulate snoRNAs, the PWS-like phenotype observed in patients with paternal SNURF/SNRPN deletion is likely due to disrupted expression of SNORD116 snoRNAs. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Analysis of the flight dynamics of the Solar Maximum Mission (SMM) off-sun scientific pointing

NASA Technical Reports Server (NTRS)

Pitone, D. S.; Klein, J. R.

1989-01-01

Algorithms are presented which were created and implemented by the Goddard Space Flight Center's (GSFC's) Solar Maximum Mission (SMM) attitude operations team to support large-angle spacecraft pointing at scientific objectives. The mission objective of the post-repair SMM satellite was to study solar phenomena. However, because the scientific instruments, such as the Coronagraph/Polarimeter (CP) and the Hard X-ray Burst Spectrometer (HXRBS), were able to view objects other than the Sun, attitude operations support for attitude pointing at large angles from the nominal solar-pointing attitudes was required. Subsequently, attitude support for SMM was provided for scientific objectives such as Comet Halley, Supernova 1987A, Cygnus X-1, and the Crab Nebula. In addition, the analysis was extended to include the reverse problem, computing the right ascension and declination of a body given the off-Sun angles. This analysis led to the computation of the orbits of seven new solar comets seen in the field-of-view (FOV) of the CP. The activities necessary to meet these large-angle attitude-pointing sequences, such as slew sequence planning, viewing-period prediction, and tracking-bias computation, are described. Analysis is presented for the computation of maneuvers and pointing parameters relative to the SMM-unique, Sun-centered reference frame. Finally, science data and independent attitude solutions are used to evaluate the large-angle pointing performance.
Analysis of the flight dynamics of the Solar Maximum Mission (SMM) off-sun scientific pointing

NASA Technical Reports Server (NTRS)

Pitone, D. S.; Klein, J. R.; Twambly, B. J.

1990-01-01

Algorithms are presented which were created and implemented by the Goddard Space Flight Center's (GSFC's) Solar Maximum Mission (SMM) attitude operations team to support large-angle spacecraft pointing at scientific objectives. The mission objective of the post-repair SMM satellite was to study solar phenomena. However, because the scientific instruments, such as the Coronagraph/Polarimeter (CP) and the Hard X-ray Burst Spectrometer (HXRBS), were able to view objects other than the Sun, attitude operations support for attitude pointing at large angles from the nominal solar-pointing attitudes was required. Subsequently, attitude support for SMM was provided for scientific objectives such as Comet Halley, Supernova 1987A, Cygnus X-1, and the Crab Nebula. In addition, the analysis was extended to include the reverse problem, computing the right ascension and declination of a body given the off-Sun angles. This analysis led to the computation of the orbits of seven new solar comets seen in the field-of-view (FOV) of the CP. The activities necessary to meet these large-angle attitude-pointing sequences, such as slew sequence planning, viewing-period prediction, and tracking-bias computation, are described. Analysis is presented for the computation of maneuvers and pointing parameters relative to the SMM-unique, Sun-centered reference frame. Finally, science data and independent attitude solutions are used to evaluate the large-angle pointing performance.
Exome analysis in clinical practice: expanding the phenotype of Bartsocas-Papas syndrome.

PubMed

Gripp, Karen W; Ennis, Sara; Napoli, Joseph

2013-05-01

Exome analysis has had a dramatic impact on genetic research. We present the application of such newly generated information to patient care. The patient was a female born with normal growth parameters to nonconsanguineous parents after an uneventful pregnancy. She had bilateral cleft lip/palate and ankyloblepharon. Sparse hair, dysplastic nails and hypohidrosis were subsequently noted. With the exception of speech-related issues, her development was normal. A clinical diagnosis of ankyloblepharon-ectodermal defects-cleft lip/palate or Hay-Wells syndrome prompted TP63 sequence analysis. TP63 sequence and deletion/duplication analysis of all coding exons gave normal results, as did chromosome and SNP array analysis. Diagnostic exome analysis revealed a heterozygous nonsense mutation in KRT83 categorized as deleterious and associated with monilethrix. In addition, a homozygous missense variant of unknown clinical significance was reported in RIPK4. Just a few months earlier, research-based exome analysis had identified RIPK4 as pathogenic for Bartsocas-Papas syndrome. While the clinical diagnostic report suggested the KRT83 mutation as the more likely cause of the patient's phenotype, clinical correlation, literature review and computerized mutation analysis programs allowed us to identify the homozygous RIPK4 (c.488G>A; p.Gly163Asp) mutation as the underlying pathogenic change. Consequently, we expand the phenotype of Bartsocas-Papas syndrome to an attenuated presentation resembling Hay-Wells syndrome, lacking lethality and pterygia. In contrast to the autosomal dominant Hay-Wells syndrome, Bartsocas-Papas syndrome is autosomal recessive, implying a 25% recurrence risk. Copyright © 2013 Wiley Periodicals, Inc.
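The 25% recurrence figure follows from simple Mendelian segregation at one locus. The sketch below is a hedged illustration that assumes both unaffected parents are heterozygous carriers of the recessive variant (the abstract does not report parental genotyping); the allele labels `A`/`a` and the function name `offspring_genotypes` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Punnett square for one biallelic locus.

    Each parent is written as two allele letters (e.g. "Aa") and passes
    either allele with probability 1/2, so each of the 4 combinations
    has probability 1/4.
    """
    probs: dict[str, Fraction] = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "aA" and "Aa" are the same genotype
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


# Hypothetical labels: "A" = normal allele, "a" = recessive pathogenic allele.
carriers = offspring_genotypes("Aa", "Aa")
print(carriers["aa"])  # 1/4: chance that each further child is homozygous (affected)
```

With two carrier parents the expected genotype ratio is 1 AA : 2 Aa : 1 aa, and only the `aa` class is affected under recessive inheritance.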
Morphological and molecular identification of cryptic species in the Sergentomyia bailyi (Sinton, 1931) complex in Sri Lanka.

PubMed

Tharmatha, T; Gajapathy, K; Ramasamy, R; Surendran, S N

2017-02-01

The correct identification of sand fly vectors of leishmaniasis is important for controlling the disease. Genetic data, particularly DNA sequence data, have lately become an important adjunct to morphological criteria for this purpose. A recent DNA sequencing study revealed the presence of two cryptic species in the Sergentomyia bailyi species complex in India. The present study was undertaken to ascertain the presence of cryptic species in the Se. bailyi complex in Sri Lanka using morphological characteristics and DNA sequences from cytochrome c oxidase subunits. Sand flies were collected from leishmaniasis-endemic and non-endemic dry zone districts of Sri Lanka. A total of 175 Se. bailyi specimens were initially screened for morphological variation, and the identified samples formed two groups, tentatively termed Se. bailyi species A and B, based on the relative length of the sensilla chaeticum and antennal flagellomere. DNA sequences from the mitochondrial cytochrome c oxidase subunit I (COI) and subunit II (COII) genes of morphologically identified Se. bailyi species A and B were subsequently analyzed. The two species showed differences in their COI and COII gene sequences and were placed in two separate clades by phylogenetic analysis. An allele-specific polymerase chain reaction assay based on sequence variation in the COI gene accurately differentiated species A and B. The study therefore provides the first morphological and genetic evidence for the presence of two cryptic species within the Se. bailyi complex in Sri Lanka, together with a DNA-based laboratory technique for differentiating them.
UVnovo: A De Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry

PubMed Central

Robotham, Scott A.; Horton, Andrew P.; Cannon, Joe R.; Cotham, Victoria C.; Marcotte, Edward M.; Brodbelt, Jennifer S.

2016-01-01

De novo peptide sequencing by mass spectrometry represents an important strategy for characterizing novel peptides and proteins, in which a peptide’s amino acid sequence is inferred directly from the precursor peptide mass and tandem mass spectrum (MS/MS or MS3) fragment ions, without comparison to a reference proteome. This method is ideal for organisms or samples lacking a complete or well-annotated reference sequence set. One of the major barriers to de novo spectral interpretation arises from confusion of N- and C-terminal ion series due to the symmetry between b and y ion pairs created by collisional activation methods (or c and z ions for electron-based activation methods). This is known as the ‘antisymmetric path problem’ and leads to inverted amino acid subsequences within a de novo reconstruction. Here, we combine several key strategies for de novo peptide sequencing into a single high-throughput pipeline: high-efficiency carbamylation blocks lysine side chains, and subsequent tryptic digestion and N-terminal peptide derivatization with the ultraviolet chromophore AMCA yield peptides susceptible to 351 nm ultraviolet photodissociation (UVPD). UVPD-MS/MS of the AMCA-modified peptides then predominantly produces y ions, specifically addressing the antisymmetric path problem. Finally, the program UVnovo applies a random forest algorithm to automatically learn from and then interpret UVPD mass spectra, passing results to a hidden Markov model for de novo sequence prediction and scoring. We show that this combined strategy provides high-performance de novo peptide sequencing, enabling the de novo sequencing of thousands of peptides from an E. coli lysate at high confidence. PMID:26938041
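The b/y symmetry behind the ‘antisymmetric path problem’ can be made concrete with standard monoisotopic residue masses. The plain-Python sketch below is illustrative only: all function names are hypothetical and are not part of UVnovo, and it ignores the carbamylation and AMCA modifications, charge states above 1+, missing peaks and noise. It computes singly charged b and y ladders for an unmodified peptide, confirms that each complementary pair b_i + y_(n-i) sums to the same value, and reads a sequence from a single complete y-ion series, which is the situation the AMCA/UVPD strategy aims to create.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def neutral_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1): N-terminal fragments (residues + proton)."""
    out, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        out.append(total + PROTON)
    return out


def y_ions(peptide: str) -> list[float]:
    """Singly charged y1..y(n-1): C-terminal fragments (residues + water + proton)."""
    out, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        out.append(total + WATER + PROTON)
    return out


def match_residue(delta: float, tol: float = 0.01) -> str:
    """First residue within tol of delta (I and L share a mass; L is returned)."""
    for aa, m in RESIDUE_MASS.items():
        if abs(m - delta) <= tol:
            return aa
    raise ValueError(f"no residue matches mass difference {delta:.4f}")


def decode_from_y(y_series: list[float], precursor_neutral: float, tol: float = 0.01) -> str:
    """Read a sequence from a complete y1..y(n-1) ladder plus the precursor mass."""
    residues = []
    prev = WATER + PROTON      # notional "y0": water plus proton, no residues
    for y in y_series:         # successive differences give residues n, n-1, ..., 2
        residues.append(match_residue(y - prev, tol))
        prev = y
    # Residue 1 is not covered by any y ion; recover it from the precursor mass.
    residues.append(match_residue(precursor_neutral + PROTON - prev, tol))
    return "".join(reversed(residues))


pep = "PEPTDEK"
M = neutral_mass(pep)
b, y = b_ions(pep), y_ions(pep)
n = len(pep)
# Every complementary pair b_i + y_(n-i) sums to M + 2 protons.
print(all(abs(b[i - 1] + y[n - i - 1] - (M + 2 * PROTON)) < 1e-6 for i in range(1, n)))  # True
print(decode_from_y(y, M))  # PEPTDEK
```

Because b_i and y_(n-i) always sum to M + 2 × proton, an unlabeled peak at mass m is equally consistent with a b ion or with the complementary y ion, and a spectrum dominated by one series removes that choice. The 0.01 Da tolerance is an arbitrary assumption for the example.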

Piroplasms in brown hyaenas (Parahyaena brunnea) and spotted hyaenas (Crocuta crocuta) in Namibia and South Africa are closely related to Babesia lengau.

PubMed

Burroughs, Richard E J; Penzhorn, Barend L; Wiesel, Ingrid; Barker, Nancy; Vorster, Ilse; Oosthuizen, Marinda C

2017-02-01

The objective of our study was the identification and molecular characterization of piroplasms and rickettsias occurring in brown (Parahyaena brunnea) and spotted hyaenas (Crocuta crocuta) from various localities in Namibia and South Africa. Whole blood (n = 59) and skin (n = 3) specimens from brown (n = 15) and spotted hyaenas (n = 47) were screened for the presence of Babesia, Theileria, Ehrlichia and Anaplasma species using the reverse line blot (RLB) hybridization technique. PCR products of 52/62 (83.9%) of the specimens hybridized only with the Theileria/Babesia genus-specific probes and not with any of the species-specific probes, suggesting the presence of a novel species or a variant of a species. No Ehrlichia or Anaplasma species DNA could be detected. The parasite 18S ribosomal RNA gene from brown (n = 3) and spotted hyaena (n = 6) specimens was subsequently amplified and cloned, and the recombinants were sequenced. Homologous sequence searches of databases indicated that the obtained sequences were most closely related to Babesia lengau, originally described from cheetahs (Acinonyx jubatus). The observed sequence similarities were confirmed by phylogenetic analyses, which showed that the obtained hyaena sequences formed a monophyletic group with B. lengau, Babesia conradae and sequences previously isolated from humans and wildlife in the western USA. Within the B. lengau clade, the obtained sequences and the published B. lengau sequences grouped into six distinct groups, of which groups I to V represented novel B. lengau genotypes and/or gene variants. We suggest that these genotypes cannot be classified as new Babesia species, but rather as variants of B. lengau. This is the first report of the occurrence of piroplasms in brown hyaenas.
The intrinsic combinatorial organization and information theoretic content of a sequence are correlated to the DNA encoded nucleosome organization of eukaryotic genomes.

PubMed

Utro, Filippo; Di Benedetto, Valeria; Corona, Davide F V; Giancarlo, Raffaele

2016-03-15

Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence-specific. The first is based on elegant, easy-to-compute, closed-form mathematical formulas that make no assumptions about the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, as opposed to the statistical model, it lacks the same type of closed-form formulas that, in this case, should be based on the DNA sequence only. We help close this important methodological gap between the two models by providing three very simple formulas for the sequence-specific model. They are all based on well-known formulas in Computer Science and Bioinformatics, and they give different quantifications of how complex a sequence is. Given how remarkably well they perform, it is very surprising that measures of sequence complexity have not even been considered as candidates to close this gap. We provide experimental evidence that the intrinsic level of combinatorial organization and information-theoretic content of subsequences within a genome are strongly correlated with the level of DNA-encoded nucleosome organization discovered by Kaplan et al. Our results establish an important connection between the intrinsic complexity of subsequences in a genome and the intrinsic, i.e. DNA-encoded, nucleosome organization of eukaryotic genomes. It is a first step towards a mathematical characterization of this latter 'encoding'. Supplementary data are available at Bioinformatics online. futro@us.ibm.com. © The Author 2015. Published by Oxford University Press. All rights reserved.
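As a hedged illustration of what a closed-form, training-free sequence-complexity measure can look like, the sketch below computes the empirical Shannon entropy of overlapping k-mers, optionally in sliding windows along a sequence. This is a generic textbook measure chosen for illustration; the abstract does not say that it is one of the paper's three formulas, and the function names, window size and step are hypothetical.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Empirical Shannon entropy (bits) of the overlapping k-mers in seq.

    Low values indicate low-complexity (repetitive) sequence; the maximum
    is 2*k bits when all 4**k k-mers are equally frequent.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    # sum of p * log2(1/p) over observed k-mers (all terms are non-negative)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sliding_entropy(seq: str, window: int = 50, step: int = 10, k: int = 2) -> list[float]:
    """k-mer entropy of each full window, profiling complexity along a sequence."""
    return [kmer_entropy(seq[s:s + window], k)
            for s in range(0, len(seq) - window + 1, step)]
```

For example, a dinucleotide repeat such as `"AT" * 25` scores lower than a sequence using many distinct dinucleotides; a per-window profile of this kind is the sort of signal one could then correlate with nucleosome occupancy scores.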
Comparative transcriptome analysis of microsclerotia development in Nomuraea rileyi.

PubMed

Song, Zhangyong; Yin, Youping; Jiang, Shasha; Liu, Juanjuan; Chen, Huan; Wang, Zhongkang

2013-06-19

Nomuraea rileyi is used as an environmentally friendly biopesticide. However, mass production and commercialization of this organism are limited by its fastidious growth and sporulation requirements. When cultured in amended medium, we found that N. rileyi could produce microsclerotia bodies, replacing conidiophores as the infectious agent. However, little is known about the genes involved in microsclerotia development. In the present study, the transcriptomes were analyzed using next-generation sequencing technology to find the genes involved in microsclerotia development. A total of 4.69 Gb of clean nucleotides comprising 32,061 sequences was obtained, and 20,919 sequences (about 65%) were annotated. Among the annotated sequences, only 5928 were annotated with 34 gene ontology (GO) functional categories, and 12,778 sequences were mapped to 165 pathways by searching against the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Furthermore, we assessed the transcriptomic differences between cultures grown in minimal and amended medium. In total, 4808 sequences were found to be differentially expressed; 719 differentially expressed unigenes were assigned to 25 GO classes and 1888 differentially expressed unigenes were assigned to 161 KEGG pathways, including 25 enrichment pathways. Subsequently, we examined the up-regulated or uniquely expressed genes following amended medium treatment that were also expressed on the enrichment pathways, and found that most of them participated in mediating oxidative stress homeostasis. To elucidate the role of oxidative stress in microsclerotia development, we analyzed the diversification of unigenes using quantitative reverse transcription-PCR (RT-qPCR). Our findings suggest that oxidative stress occurs during microsclerotia development, along with a broad change in metabolic activity. Our data provide the most comprehensive sequence resource available for the study of N. rileyi. 
We believe that the transcriptome datasets will serve as an important public information platform to accelerate studies on N. rileyi microsclerotia.
Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach

Treesearch

D. Lee Taylor; Michael G. Booth; Jack W. McFarland; Ian C. Herriott; Niall J. Lennon; Chad Nusbaum; Thomas G. Marr

2008-01-01

High throughput sequencing methods are widely used in analyses of microbial diversity but are generally applied to small numbers of samples, which precludes characterization of patterns of microbial diversity across space and time. We have designed a primer-tagging approach that allows pooling and subsequent sorting of numerous samples, which is directed to...
acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data

DOE PAGES

Lux, Markus; Kruger, Jan; Rinke, Christian; ...

2016-12-20

A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically well-founded confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. 
In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
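The "oligonucleotide signatures" mentioned above are, in composition-based methods generally, vectors of k-mer frequencies computed per sequence. The sketch below is a hedged illustration under that assumption: the abstract does not state acdc's choice of k, normalization, dimensionality reduction or clustering algorithm, so k = 4 (tetranucleotides, a common choice) and the function names are hypothetical, and only the feature-extraction step is shown.

```python
import math
from itertools import product


def oligo_signature(seq: str, k: int = 4) -> list[float]:
    """Relative frequencies of all 4**k k-mers, in a fixed lexicographic order."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other ambiguity codes
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def signature_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two signatures; small values suggest similar composition."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

In a pipeline of the kind described, one such 256-dimensional vector per contig or read fragment would be passed to dimensionality reduction and clustering, and fragments whose signatures fall in a separate cluster would be flagged as candidate contaminants.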
acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lux, Markus; Kruger, Jan; Rinke, Christian

A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically well-founded confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. 
In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
Exome sequencing identifies a DNAJB6 mutation in a family with dominantly-inherited limb-girdle muscular dystrophy.

PubMed

Couthouis, Julien; Raphael, Alya R; Siskind, Carly; Findlay, Andrew R; Buenrostro, Jason D; Greenleaf, William J; Vogel, Hannes; Day, John W; Flanigan, Kevin M; Gitler, Aaron D

2014-05-01

Limb-girdle muscular dystrophy primarily affects the muscles of the hips and shoulders (the "limb-girdle" muscles), although it is a heterogeneous disorder that can present with varying symptoms. There is currently no cure. We sought to identify the genetic basis of limb-girdle muscular dystrophy type 1 in an American family of Northern European descent using exome sequencing. Exome sequencing was performed on DNA samples from two affected siblings and one unaffected sibling and identified eleven candidate mutations that co-segregated with the disease. Notably, this list included a previously reported mutation in DNAJB6, p.Phe89Ile, which was recently identified as a cause of limb-girdle muscular dystrophy type 1D. Additional family members were Sanger sequenced, and the DNAJB6 mutation was found only in affected individuals. Subsequent haplotype analysis indicated that this DNAJB6 p.Phe89Ile mutation likely arose independently of the previously reported mutation. Because other published mutations are located nearby in the G/F domain of DNAJB6, this area may represent a mutational hotspot. Exome sequencing provided an unbiased and effective method for identifying the genetic etiology of limb-girdle muscular dystrophy type 1 in a previously genetically uncharacterized family. This work further confirms the causative role of DNAJB6 mutations in limb-girdle muscular dystrophy type 1D. Copyright © 2014 Elsevier B.V. All rights reserved.
History of CRISPR-Cas from Encounter with a Mysterious Repeated Sequence to Genome Editing Technology.

PubMed

Ishino, Yoshizumi; Krupovic, Mart; Forterre, Patrick

2018-04-01

Clustered regularly interspaced short palindromic repeat (CRISPR)-Cas systems are well-known acquired immunity systems that are widespread in archaea and bacteria. The RNA-guided nucleases from CRISPR-Cas systems are currently regarded as the most reliable tools for genome editing and engineering. The first hint of their existence came in 1987, when an unusual repetitive DNA sequence, which subsequently was defined as a CRISPR, was discovered in the Escherichia coli genome during an analysis of genes involved in phosphate metabolism. Similar sequence patterns were then reported in a range of other bacteria as well as in halophilic archaea, suggesting an important role for such evolutionarily conserved clusters of repeated sequences. A critical step toward functional characterization of the CRISPR-Cas systems was the recognition of a link between CRISPRs and the associated Cas proteins, which were initially hypothesized to be involved in DNA repair in hyperthermophilic archaea. Comparative genomics, structural biology, and advanced biochemistry could then work hand in hand, not only culminating in the explosion of genome editing tools based on CRISPR-Cas9 and other class II CRISPR-Cas systems but also providing insights into the origin and evolution of this system from mobile genetic elements denoted casposons. To celebrate the 30th anniversary of the discovery of CRISPR, this minireview briefly discusses the fascinating history of CRISPR-Cas systems, from the original observation of an enigmatic sequence in E. coli to genome editing in humans. Copyright © 2018 American Society for Microbiology.
Structural analysis of the human U3 ribonucleoprotein particle reveal a conserved sequence available for base pairing with pre-rRNA.

PubMed Central

Parker, K A; Steitz, J A

1987-01-01

The human U3 ribonucleoprotein (RNP) has been analyzed to determine its protein constituents, sites of protein-RNA interaction, and RNA secondary structure. By using anti-U3 RNP antibodies and extracts prepared from HeLa cells labeled in vivo, the RNP was found to contain four nonphosphorylated proteins of 36, 30, 13, and 12.5 kilodaltons and two phosphorylated proteins of 74 and 59 kilodaltons. U3 nucleotides 72-90, 106-121, 154-166, and 190-217 must contain sites that interact with proteins since these regions are immunoprecipitated after treatment of the RNP with RNase A or T1. The secondary structure was probed with specific nucleases and by chemical modification with single-strand-specific reagents that block subsequent reverse transcription. Regions that are single stranded (and therefore potentially able to interact with a substrate RNA) include an evolutionarily conserved sequence at nucleotides 104-112 and nonconserved sequences at nucleotides 65-74, 80-84, and 88-93. Nucleotides 159-168 do not appear to be highly accessible, thus making it unlikely that this U3 sequence base pairs with sequences near the 5.8S rRNA-internal transcribed spacer II junction, as previously proposed. Alternative functions of the U3 RNP are discussed, including the possibility that U3 may participate in a processing event near the 3' end of 28S rRNA. PMID:2959855
Identification, cloning, and sequencing of a fragment of Amsacta moorei entomopoxvirus DNA containing the spheroidin gene and three vaccinia virus-related open reading frames.

PubMed Central

Hall, R L; Moyer, R W

1991-01-01

Entomopoxvirus virions are frequently contained within crystalline occlusion bodies, which are composed primarily of a single protein, spheroidin, analogous to the polyhedrin protein of baculovirus. The spheroidin gene of Amsacta moorei entomopoxvirus was identified following the microsequencing of polypeptides generated from cyanogen bromide treatment of spheroidin and the subsequent synthesis of oligonucleotide hybridization probes. DNA sequencing of a 6.8-kb region of DNA containing the spheroidin gene showed that the spheroidin protein is derived from a 3.0-kb open reading frame potentially encoding a protein of 115 kDa. Three copies of the heptanucleotide TTTTTNT, a sequence associated with early gene transcription in the vertebrate poxviruses, and four in-frame translational termination signals were found within 60 bp upstream of the putative spheroidin gene promoter (TAAATG). The spheroidin gene promoter region contains the sequence TAAATG, which is found in many late promoters of the vertebrate poxviruses and which serves as the site of transcriptional initiation, as shown by primer extension. Primer extension experiments also showed that spheroidin gene transcripts contain 5' poly(A) sequences typical of vertebrate poxvirus late transcripts. The 92 bases upstream of the initiating TAAATG are unusually A + T rich and contain only 7 G or C residues. An analysis of open reading frames around the spheroidin gene suggests that the colinear core of "essential genes" typical of the vertebrate poxviruses is absent in A. moorei entomopoxvirus. PMID:1942245
Sturgeon conservation genomics: SNP discovery and validation using RAD sequencing.

PubMed

Ogden, R; Gharbi, K; Mugue, N; Martinsohn, J; Senn, H; Davey, J W; Pourkazemi, M; McEwing, R; Eland, C; Vidotto, M; Sergeev, A; Congiu, L

2013-06-01

Caviar-producing sturgeons belonging to the genus Acipenser are considered to be one of the most endangered species groups in the world. Continued overfishing in spite of increasing legislation, zero catch quotas and extensive aquaculture production has led to the collapse of wild stocks across Europe and Asia. The evolutionary relationships among Adriatic, Russian, Persian and Siberian sturgeons are complex because of past introgression events and remain poorly understood. Conservation management, traceability and enforcement suffer from a lack of appropriate DNA markers for the genetic identification of sturgeon at the species, population and individual level. This study employed RAD sequencing to discover and characterize single nucleotide polymorphism (SNP) DNA markers for use in sturgeon conservation in these four tetraploid species over three biological levels, using a single sequencing lane. Four population meta-samples and eight individual samples from one family were barcoded separately before sequencing. Analysis of 14.4 Gb of paired-end RAD data focused on the identification of SNPs in the paired-end contig, with subsequent in silico and empirical validation of candidate markers. Thousands of putatively informative markers were identified, including, for the first time, SNPs that show population-wide differentiation between Russian and Persian sturgeons, representing an important advance in our ability to manage these cryptic species. The results highlight the challenges of genotyping-by-sequencing in polyploid taxa, while establishing the potential genetic resources for developing a new range of caviar traceability and enforcement tools. © 2013 John Wiley & Sons Ltd.
A semi-Markov model for mitosis segmentation in time-lapse phase contrast microscopy image sequences of stem cell populations.

PubMed

Liu, An-An; Li, Kang; Kanade, Takeo

2012-02-01

We propose a semi-Markov model trained in a max-margin learning framework for mitosis event segmentation in large-scale time-lapse phase contrast microscopy image sequences of stem cell populations. Our method consists of three steps. First, we apply a constrained-optimization-based microscopy image segmentation method that exploits phase contrast optics to extract candidate subsequences of the input image sequence that contain mitosis events. Then, we apply a max-margin hidden conditional random field (MM-HCRF) classifier, learned from human-annotated mitotic and nonmitotic sequences, to classify each candidate subsequence as mitotic or nonmitotic. Finally, a max-margin semi-Markov model (MM-SMM) trained on manually segmented mitotic sequences is used to reinforce the mitosis classification results and to further segment each mitosis into four predefined temporal stages. The proposed method outperforms the event-detection CRF model recently reported by Huh, as well as several other competing methods, in very challenging image sequences of multipolar-shaped C3H10T1/2 mesenchymal stem cells. For mitosis detection, an overall precision of 95.8% and a recall of 88.1% were achieved. For mitosis segmentation, the mean and standard deviation of the localization errors of the start and end points of all mitosis stages were well below 1 and 2 frames, respectively. In particular, an overall temporal location error of 0.73 ± 1.29 frames was achieved for locating daughter cell birth events.
Chromosomal Organization and Sequence Diversity of Genes Encoding Lachrymatory Factor Synthase in Allium cepa L.

PubMed Central

Masamura, Noriya; McCallum, John; Khrustaleva, Ludmila; Kenel, Fernand; Pither-Joyce, Meegham; Shono, Jinji; Suzuki, Go; Mukai, Yasuhiko; Yamauchi, Naoki; Shigyo, Masayoshi

2012-01-01

Lachrymatory factor synthase (LFS) catalyzes the formation of lachrymatory factor, one of the most distinctive traits of bulb onion (Allium cepa L.). Therefore, we used LFS as a model for a functional gene in a huge genome, and we examined the chromosomal organization of LFS in A. cepa by multiple approaches. The first-level analysis completed the chromosomal assignment of the LFS gene to chromosome 5 of A. cepa via the use of a complete set of A. fistulosum–shallot (A. cepa L. Aggregatum group) monosomic addition lines. Subsequent use of an F2 mapping population from the interspecific cross A. cepa × A. roylei confirmed the assignment of an LFS locus to this chromosome. Sequence comparison of two BAC clones bearing LFS genes, LFS amplicons from diverse germplasm, and expressed sequences from a doubled haploid line revealed variation consistent with duplicated LFS genes. Furthermore, the BAC-FISH study using the two BAC clones as probes showed that LFS genes are localized in the proximal region of the long arm of the chromosome. These results suggested that LFS in A. cepa is transcribed from at least two loci and that they are localized on chromosome 5. PMID:22690373
The crystal structure of a bacterial Sufu-like protein defines a novel group of bacterial proteins that are similar to the N-terminal domain of human Sufu

PubMed Central

Das, Debanu; Finn, Robert D; Abdubek, Polat; Astakhova, Tamara; Axelrod, Herbert L; Bakolitsa, Constantina; Cai, Xiaohui; Carlton, Dennis; Chen, Connie; Chiu, Hsiu-Ju; Chiu, Michelle; Clayton, Thomas; Deller, Marc C; Duan, Lian; Ellrott, Kyle; Farr, Carol L; Feuerhelm, Julie; Grant, Joanna C; Grzechnik, Anna; Han, Gye Won; Jaroszewski, Lukasz; Jin, Kevin K; Klock, Heath E; Knuth, Mark W; Kozbial, Piotr; Sri Krishna, S; Kumar, Abhinav; Lam, Winnie W; Marciano, David; Miller, Mitchell D; Morse, Andrew T; Nigoghossian, Edward; Nopakun, Amanda; Okach, Linda; Puckett, Christina; Reyes, Ron; Tien, Henry J; Trame, Christine B; van den Bedem, Henry; Weekes, Dana; Wooten, Tiffany; Xu, Qingping; Yeh, Andrew; Zhou, Jiadong; Hodgson, Keith O; Wooley, John; Elsliger, Marc-André; Deacon, Ashley M; Godzik, Adam; Lesley, Scott A; Wilson, Ian A

2010-01-01

Sufu (Suppressor of Fused), a two-domain protein, plays a critical role in regulating Hedgehog signaling and is conserved from flies to humans. A few bacterial Sufu-like proteins have previously been identified based on sequence similarity to the N-terminal domain of eukaryotic Sufu proteins, but none have been structurally or biochemically characterized and their function in bacteria is unknown. We have determined the crystal structure of a more distantly related Sufu-like homolog, NGO1391 from Neisseria gonorrhoeae, at 1.4 Å resolution, which provides the first biophysical characterization of a bacterial Sufu-like protein. The structure revealed a striking similarity to the N-terminal domain of human Sufu (r.m.s.d. of 2.6 Å over 93% of the NGO1391 protein), despite an extremely low sequence identity of ∼15%. Subsequent sequence analysis revealed that NGO1391 defines a new subset of smaller, Sufu-like proteins that are present in ∼200 bacterial species and has resulted in expansion of the SUFU (PF05076) family in Pfam. PMID:20836087
Culturable bacteria present in the fluid of the hooded-pitcher plant Sarracenia minor based on 16S rDNA gene sequence data.

PubMed

Siragusa, Alex J; Swenson, Janice E; Casamatta, Dale A

2007-08-01

The culturable microbial community within the pitcher fluid of 93 Sarracenia minor carnivorous plants was examined over a 2-year study. Many aspects of the plant/bacterial/insect interaction within the pitcher fluid are minimally understood because the bacterial taxa present in these pitchers have not been identified. Thirteen isolates were characterized by 16S rDNA sequencing and subsequent phylogenetic analysis. The Proteobacteria were the most abundant taxa and included representatives from Serratia, Achromobacter, and Pantoea. The Actinobacteria Micrococcus was also abundant while Bacillus, Lactococcus, Chryseobacterium, and Rhodococcus were infrequently encountered. Several isolates conformed to species identifiers (>98% rDNA gene sequence similarity) including Serratia marcescens (isolates found in 27.5% of pitchers), Achromobacter xylosoxidans (37.6%), Micrococcus luteus (40.9%), Bacillus cereus (isolates found in 10.2%), Bacillus thuringiensis (5.4%), Lactococcus lactis (17.2%), and Rhodococcus equi (2.2%). Species-area curves suggest that sampling efforts were sufficient to recover a representative culturable bacterial community. The bacteria present represent a diverse community probably as a result of introduction by insect vectors, but the ecological significance remains underexplored.
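Species-area (accumulation) curves plot how many distinct taxa have been recovered as more samples are pooled. A curve that levels off suggests that further sampling would add few new taxa. The following is a minimal sketch of a randomized accumulation curve, assuming each pitcher is represented as the set of taxa isolated from it. The function name and the example data are hypothetical and are not the study's observations.

```python
import random
from typing import List, Set


def accumulation_curve(samples: List[Set[str]], permutations: int = 200, seed: int = 0) -> List[float]:
    """Mean number of distinct taxa recovered after pooling 1, 2, ..., n samples.

    Averaging over random sample orders removes dependence on the order in
    which samples happened to be collected.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(permutations):
        order = list(samples)
        rng.shuffle(order)
        seen: Set[str] = set()
        for i, taxa in enumerate(order):
            seen |= taxa  # pool this sample's taxa with everything seen so far
            totals[i] += len(seen)
    return [t / permutations for t in totals]


# Hypothetical per-pitcher isolate lists (not the study's data)
pitchers = [
    {"Serratia", "Micrococcus"},
    {"Micrococcus"},
    {"Achromobacter", "Serratia"},
    {"Micrococcus", "Bacillus"},
]
print([round(v, 2) for v in accumulation_curve(pitchers)])
```

The last point of the curve always equals the total number of distinct taxa. The shape of the earlier points shows how quickly that total is approached.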
Evolutionary history of versatile-lipases from Agaricales through reconstruction of ancestral structures.

PubMed

Barriuso, Jorge; Martínez, María Jesús

2017-01-03

Fungal "Versatile carboxylic ester hydrolases" are enzymes of great biotechnological interest. Here we carried out a bioinformatic screening to find these proteins in genomes from Agaricales by searching for conserved motifs, sequence and phylogenetic analysis, and three-dimensional modeling. Moreover, we reconstructed the molecular evolution of these enzymes over time by inferring and analyzing the sequences of ancestral intermediate forms. The properties of the ancestral candidates are discussed on the basis of their three-dimensional structural models, the hydrophobicity of the lid, and the substrate-binding intramolecular tunnel, revealing that all of them have the characteristic properties of these enzymes. The evolutionary history of the putative lipases revealed an increase in the length and hydrophobicity of the lid region, as well as in the size of the substrate-binding pocket, over evolutionary time. These facts suggest the enzymes' specialization towards certain substrates and their subsequent loss of promiscuity. These results bring to light the presence of different pools of lipases in fungi with different habitats and lifestyles. Despite the consistency of the data gathered from reconstruction of ancestral sequences, heterologous expression of some of these candidates would be essential to corroborate the enzymes' activities.
Genomic identification of regulatory elements by evolutionary sequence comparison and functional analysis.

PubMed

Loots, Gabriela G

2008-01-01

Despite remarkable recent advances in genomics that have enabled us to identify most of the genes in the human genome, comparable efforts to define transcriptional cis-regulatory elements that control gene expression are lagging behind. The difficulty of this task stems from two equally important problems: our knowledge of how regulatory elements are encoded in genomes remains elementary, and there is a vast genomic search space for regulatory elements, since most of the mammalian genome is noncoding. Comparative genomic approaches are having a remarkable impact on the study of transcriptional regulation in eukaryotes and currently represent the most efficient and reliable methods of predicting noncoding sequences likely to control the patterns of gene expression. By subjecting eukaryotic genomic sequences to computational comparisons and subsequent experimentation, we are inching our way toward a more comprehensive catalog of common regulatory motifs that lie behind fundamental biological processes. We are still far from comprehending how the transcriptional regulatory code is encrypted in the human genome and providing an initial global view of regulatory gene networks, but collectively, the continued development of comparative and experimental approaches will rapidly expand our knowledge of the transcriptional regulome.
In silico analysis of β-1,3-glucanase from a psychrophilic yeast, Glaciozyma antarctica PI12

NASA Astrophysics Data System (ADS)

Mohammadi, Salimeh; Bakar, Farah Diba Abu; Rabu, Amir; Murad, Abdul Munir Abdul

2014-09-01

1,3-beta-glucanase is an industrially important enzyme with a wide range of applications, especially in the food industry. It is crucial to understand the structural and functional aspects of the various beta-1,3-glucanases produced from diverse sources. In this study, a cDNA encoding β-1,3-glucanase (GaExg55) was isolated from a psychrophilic yeast, Glaciozyma antarctica PI12. The cDNA sequence has been submitted to GenBank under accession number KJ436377. Subsequently, the predicted protein was analyzed using various bioinformatics tools to explore its properties. GaExg55 consists of 1,440 bp encoding 480 amino acid residues. Alignment of the deduced amino acid sequence of GaExg55 with other exo-β-1,3-glucanases available in the NCBI database indicated that they share a consensus motif, NEP, which is the signature pattern of GH5 hydrolases. The predicted molecular weight of GaExg55 is 53.66 kDa. The GaExg55 sequence possesses a signal peptide and is highly conserved with other fungal exo-beta-1,3-glucanases.
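As a quick consistency check on the reported figures (simple arithmetic, not part of the original analysis): a 1,440 bp coding sequence contains 480 codons, and dividing the predicted mass by 480 residues gives a mean residue mass close to the commonly quoted average of about 110 Da. If the 1,440 bp included the stop codon, the encoded chain would be 479 residues; the abstract does not specify which.

```latex
\frac{1440\ \text{bp}}{3\ \text{bp/codon}} = 480\ \text{codons},
\qquad
\frac{53\,660\ \text{Da}}{480\ \text{residues}} \approx 111.8\ \text{Da per residue}
```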
Seasonal and regional diversity of maple sap microbiota revealed using community PCR fingerprinting and 16S rRNA gene clone libraries.

PubMed

Filteau, Marie; Lagacé, Luc; LaPointe, Gisèle; Roy, Denis

2010-04-01

An arbitrarily primed community PCR fingerprinting technique based on capillary electrophoresis was developed to study maple sap microbial community characteristics among 19 production sites in Québec over the tapping season. Presumptive fragment identification was made with corresponding fingerprint profiles of bacterial isolate cultures. Maple sap microbial communities were subsequently compared using a representative subset of 13 16S rRNA gene clone libraries followed by gene sequence analysis. Results from both methods indicated that all maple sap production sites and flow periods shared common microbiota members, but distinctive features also existed. Changes over the season in relative abundance of predominant populations showed evidence of a common pattern. Pseudomonas (64%) and Rahnella (8%) were the most abundantly and frequently represented genera of the 2239 sequences analyzed. Janthinobacterium, Leuconostoc, Lactococcus, Weissella, Epilithonimonas and Sphingomonas were revealed as occasional contaminants in maple sap. Maple sap microbiota showed a low level of deep diversity along with a high variation of similar 16S rRNA gene sequences within the Pseudomonas genus. Predominance of Pseudomonas is suggested as a typical feature of maple sap microbiota across geographical regions, production sites, and sap flow periods.
The PLAID graphics analysis impact on the space program

NASA Technical Reports Server (NTRS)

Nguyen, Jennifer P.; Wheaton, Aneice L.; Maida, James C.

1994-01-01

An ongoing project design often requires visual verification at various stages. These requirements are critically important because the subsequent phases of that project might depend on the complete verification of a particular stage. Currently, there are several software packages at JSC that provide such simulation capabilities. We present the simulation capabilities of the PLAID modeling system used in the Flight Crew Support Division for human factors analyses. We summarize some ongoing studies in kinematics, lighting, EVA activities, and discuss various applications in the mission planning of the current Space Shuttle flights and the assembly sequence of the Space Station Freedom with emphasis on the redesign effort.

Image denoising and deblurring using multispectral data

NASA Astrophysics Data System (ADS)

Semenishchev, E. A.; Voronin, V. V.; Marchuk, V. I.

2017-05-01

Decision-making systems are currently becoming widespread. These systems are based on the analysis of video sequences and additional data, such as volume, change in size, the behavior of one object or a group of objects, temperature gradient, the presence of local areas with strong differences, and others. Security and control systems are the main areas of application. Image noise strongly influences subsequent processing and decision making. This paper considers the problem of primary signal processing for image denoising and deblurring of multispectral data. The additional information from multispectral channels can improve the efficiency of object classification. In this paper we use a method of combining information about the objects obtained by cameras in different frequency bands. We apply a method based on simultaneous minimization of the L2 norm and of the first-order squared differences of the sequence of estimates to denoise the images and restore blurred edges. Where information is lost, we apply an approach based on interpolating data taken from the analysis of objects located in other areas and information obtained from the multispectral camera. The effectiveness of the proposed approach is shown on a set of test images.
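For a one-dimensional sequence of observations y (for example, one image row), the denoising criterion described above can be written as minimizing the sum of (x_i - y_i)^2 plus λ times the sum of (x_{i+1} - x_i)^2. Setting the gradient to zero gives the linear system (I + λ DᵀD) x = y, where D is the first-difference operator. This is a sketch under stated assumptions: the weight `lam` and the function name are hypothetical, it handles a single channel in 1-D, and it does not reproduce the paper's multispectral fusion, 2-D formulation or edge deblurring.

```python
from typing import List


def smooth_first_difference(y: List[float], lam: float) -> List[float]:
    """Minimize sum((x - y)**2) + lam * sum((x[i+1] - x[i])**2) for a 1-D signal.

    The normal equations (I + lam * D^T D) x = y are tridiagonal, so they are
    solved exactly in O(n) with the Thomas algorithm.
    """
    n = len(y)
    if n <= 1 or lam == 0:
        return list(y)
    # Diagonal of I + lam * D^T D: end points have one neighbour, interior points two
    diag = [1 + 2 * lam] * n
    diag[0] = diag[-1] = 1 + lam
    off = -lam  # constant sub- and super-diagonal entry
    # Forward elimination
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom if i < n - 1 else 0.0
        d[i] = (y[i] - off * d[i - 1]) / denom
    # Back substitution
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


noisy_row = [10.0, 12.0, 9.0, 11.0, 30.0, 31.0, 29.0, 30.0]  # hypothetical pixel values
print([round(v, 2) for v in smooth_first_difference(noisy_row, lam=1.0)])
```

Larger `lam` gives smoother output. The quadratic penalty also softens genuine edges, such as the step in the example row, which is one reason edge restoration needs separate treatment.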
Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection.

PubMed

Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui

2012-11-07

RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, an automatic procedure using the Gene Ontology Annotation (GOA) and Structural Classification of Proteins (SCOP) databases was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) to analyze and classify RNA-binding protein domains based on a comprehensive set of sequence and structural features. In this study, we compared the prediction accuracy of three different state-of-the-art predictor methods. Our results show that TMCSVM outperforms the other methods, suggesting its potential as a useful tool for the multi-class prediction of RNA-binding protein domains. On the other hand, MCRLR, by elucidating the contribution of each feature to the predictive accuracy for RNA-binding protein domain subclasses, provides some biological insights into the roles of sequences and structures in protein-RNA interactions.
Characterization and in-vivo evaluation of potential probiotics of the bacterial flora within the water column of a healthy shrimp larviculture system

NASA Astrophysics Data System (ADS)

Xue, Ming; Liang, Huafang; He, Yaoyao; Wen, Chongqing

2016-05-01

A thorough understanding of the normal bacterial flora associated with shrimp larviculture systems contributes to probiotic screening and disease control. The bacterial community of the water column over a commercial Litopenaeus vannamei larval rearing run was characterized with both culture-dependent and culture-independent methods. A total of 27 phylotypes at the species level were isolated and identified based on 16S rDNA sequence analysis. Denaturing gradient gel electrophoresis (DGGE) analysis of the V3-V5 region of 16S rRNA genes showed a dynamic bacterial community, with major changes occurring from the zoea to the mysis stage during the rearing run. The sequences retrieved were affiliated with four phyla, Proteobacteria, Actinobacteria, Bacteroidetes, and Firmicutes, with the family Rhodobacteraceae being the most frequently recovered. Subsequently, 13 representative strains conferred higher larval survival than the control when evaluated in in-vivo experiments; in particular, three candidates, assigned to Phaeobacter sp., Arthrobacter sp., and Microbacterium sp., significantly improved larval survival (P < 0.05). Therefore, the healthy shrimp larviculture system harbored a diverse and favorable bacterial flora, which contributes to larval development and is of great importance for exploiting novel probiotics.
Molecular characterization of influenza B virus outbreak on a cruise ship in Brazil 2012.

PubMed

Borborema, Samanta Etel Treiger; Silva, Daniela Bernardes Borges da; Silva, Kátia Corrêa Oliveira; Pinho, Margarete Aparecida Benega; Curti, Suely Pires; Paiva, Terezinha Maria de; Santos, Cecília Luiza Simões

2014-01-01

In February 2012, an outbreak of respiratory illness occurred on the cruise ship MSC Armonia in Brazil. A 31-year-old female crew member was hospitalized with respiratory failure and subsequently died. To study the etiology of the respiratory illness, tissue taken at necropsy from the deceased woman and respiratory specimens from thirteen passengers and crew members with respiratory symptoms were analyzed. Influenza real-time RT-PCR assays were performed, and the full-length hemagglutinin (HA) gene of influenza-positive samples was sequenced. Influenza B virus was detected in samples from seven of the individuals, suggesting that it was the cause of this respiratory illness outbreak. The sequence analysis of the HA gene indicated that the virus was closely related to the B/Brisbane/60/2008-like virus, Victoria lineage, a virus contained in the 2011-12 influenza vaccine for the Southern Hemisphere. Since the recommended composition of the influenza vaccine for use during the 2013 season changed, intensive surveillance of viruses circulating worldwide is crucial. Molecular analysis is an important tool to characterize the pathogen responsible for an outbreak such as this. In addition, laboratory disease surveillance contributes to the control measures for vaccine-preventable influenza.
Small-scale enzymatic digestion of glycoproteins and proteoglycans for analysis of oligosaccharides by LC-MS and FACE gel electrophoresis.

PubMed

Estrella, Ruby P; Whitelock, John M; Roubin, Rebecca H; Packer, Nicolle H; Karlsson, Niclas G

2009-01-01

Structural characterization of oligosaccharides from proteoglycans and other glycoproteins is greatly enhanced through the use of mass spectrometry and gel electrophoresis. Sample preparation for these sensitive techniques often requires enzymatic treatments to produce oligosaccharide sequences for subsequent analysis. This chapter describes several small-scale methods for in-gel, on-blot, and in-solution enzymatic digestions in preparation for graphitized carbon liquid chromatography-mass spectrometry (LC-MS) analysis, with specific applications indicated for glycosaminoglycans (GAGs) and N-linked oligosaccharides. In addition, accompanying procedures for oligosaccharide reduction by sodium borohydride, sample desalting via carbon microcolumn, desialylation by sialidase enzyme treatment, and small-scale oligosaccharide species fractionation are included. Fluorophore-assisted carbohydrate electrophoresis (FACE) is another useful method to isolate derivatized oligosaccharides. Overall, the modularity of these techniques provides ease and flexibility for use in conjunction with mass spectrometric and electrophoretic tools for glycomic research studies.
Characterizing the rapid spread of porcine epidemic diarrhea virus (PEDV) through an animal food manufacturing facility

PubMed Central

Schumacher, Loni L.; Huss, Anne R.; Cochrane, Roger A.; Stark, Charles R.; Woodworth, Jason C.; Bai, Jianfa; Poulsen, Elizabeth G.; Chen, Qi; Main, Rodger G.; Zhang, Jianqiang; Gauger, Phillip C.; Ramirez, Alejandro; Derscheid, Rachel J.; Magstadt, Drew M.; Dritz, Steve S.

2017-01-01

New regulatory and consumer demands highlight the importance of animal feed as a part of our national food safety system. Porcine epidemic diarrhea virus (PEDV) is the first viral pathogen confirmed to be widely transmissible in animal food. Because the potential for viral contamination in animal food is not well characterized, the objectives of this study were to 1) observe the magnitude of virus contamination in an animal food manufacturing facility, and 2) investigate a proposed method, feed sequencing, to decrease virus contamination on animal food-contact surfaces. A U.S. virulent PEDV isolate was used to inoculate 50 kg swine feed, which was mixed, conveyed, and discharged into bags using pilot-scale feed manufacturing equipment. Surfaces were swabbed and analyzed for the presence of PEDV RNA by quantitative real-time polymerase chain reaction (qPCR). Environmental swabs indicated complete contamination of animal food-contact surfaces (0/40 vs. 48/48, positive baseline samples/total baseline samples, positive subsequent samples/total subsequent samples, respectively; P < 0.05) and near complete contamination of non-animal food-contact surfaces (0/24 vs. 16/18, positive baseline samples/total baseline samples, positive subsequent samples/total subsequent samples, respectively; P < 0.05). Flushing animal food-contact surfaces with low-risk feed is commonly used to reduce cross-contamination in animal feed manufacturing. Thus, four subsequent 50 kg batches of virus-free swine feed were manufactured using the same system to test its impact on decontaminating animal food-contact surfaces. Even after four subsequent sequences, animal food-contact surfaces retained viral RNA (28/33 positive samples/total samples), with the conveying system being more contaminated than the mixer. A bioassay to test infectivity of dust from animal food-contact surfaces failed to produce infectivity. 
This study demonstrates the potential widespread viral contamination of surfaces in an animal food manufacturing facility and the difficulty of removing contamination using conventional feed sequencing, which underscores the importance of preventing viruses from entering and contaminating such facilities. PMID:29095859
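The contamination comparisons in this abstract are 2×2 tables of positive and negative swabs before and after inoculated feed was processed. The abstract does not name the statistical test used. As an assumption, the following sketch applies a two-sided Fisher exact test, a common choice for small counts with zero cells, to the reported counts. The function name is hypothetical and the result is illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # P(top-left cell == x) with all row and column totals fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))


# Rows: baseline swabs, post-inoculation swabs; columns: positive, negative
contact = fisher_exact_two_sided(0, 40, 48, 0)      # 0/40 vs 48/48
non_contact = fisher_exact_two_sided(0, 24, 16, 2)  # 0/24 vs 16/18
print(contact, non_contact)  # both far below 0.05
```

With counts this lopsided, the observed table is essentially the most extreme possible, so the p-value is dominated by the probability of the observed table itself.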
A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions

PubMed Central

Abnousi, Armen; Broschat, Shira L.; Kalyanaraman, Ananth

2016-01-01

Background: Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges. Methods: In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (k-mers) to quickly detect conserved regions, and uses machine learning to improve the accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable. Results: We have compared NADDA with the Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set, an average accuracy of 63% is achieved when compared to Pfam. Improved results in comparison with InterPro demonstrate the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground truth. On average NADDA shows comparable accuracy and more balanced sensitivity and specificity, and, being alignment-free, is significantly faster. 
Excluding the one-time cost of training, runtimes on a single processor were 49 s, 10,566 s, and 456 s for NADDA, ADDA, and MKDOM2, respectively, for a data set of approximately 2500 sequences. PMID:27552220
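The core idea of exploiting shared exact k-mers can be illustrated with a deliberately simplified sketch. It is not NADDA. Where NADDA applies machine learning (and a MapReduce implementation) to decide which regions are conserved, this version uses a fixed, hypothetical threshold `min_seqs` on the number of sequences sharing a k-mer, then merges overlapping hits into regions. The function name and toy sequences are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def conserved_regions(seqs: List[str], k: int, min_seqs: int) -> Dict[int, List[Tuple[int, int]]]:
    """Return, per sequence, merged (start, end) regions covered by shared k-mers.

    A k-mer counts as shared if it occurs in at least `min_seqs` distinct sequences.
    Regions are half-open intervals; overlapping or adjacent hits are merged.
    """
    # Record which sequences contain each k-mer
    occurs_in: Dict[str, Set[int]] = defaultdict(set)
    for idx, s in enumerate(seqs):
        for i in range(len(s) - k + 1):
            occurs_in[s[i:i + k]].add(idx)
    shared = {kmer for kmer, ids in occurs_in.items() if len(ids) >= min_seqs}

    regions: Dict[int, List[Tuple[int, int]]] = {}
    for idx, s in enumerate(seqs):
        intervals: List[Tuple[int, int]] = []
        for i in range(len(s) - k + 1):
            if s[i:i + k] in shared:
                if intervals and i <= intervals[-1][1]:
                    intervals[-1] = (intervals[-1][0], i + k)  # extend current region
                else:
                    intervals.append((i, i + k))  # start a new region
        regions[idx] = intervals
    return regions


# Toy protein fragments sharing the segment "WCDEFG"
seqs = ["AAWCDEFGAA", "KKWCDEFGKK", "PPWCDEFGPP"]
print(conserved_regions(seqs, k=3, min_seqs=3))  # {0: [(2, 8)], 1: [(2, 8)], 2: [(2, 8)]}
```

Counting k-mers needs only a single pass over each sequence and no pairwise alignment. That property is what makes alignment-free approaches attractive at the scale of millions of sequences.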
Visually driven chaining of elementary swim patterns into a goal-directed motor sequence: a virtual reality study of zebrafish prey capture.

PubMed

Trivedi, Chintan A; Bollmann, Johann H

2013-01-01

Prey capture behavior critically depends on rapid processing of sensory input in order to track, approach, and catch the target. When using vision, the nervous system faces the problem of extracting relevant information from a continuous stream of input in order to detect and categorize visible objects as potential prey and to select appropriate motor patterns for approach. For prey capture, many vertebrates exhibit intermittent locomotion, in which discrete motor patterns are chained into a sequence, interrupted by short periods of rest. Here, using high-speed recordings of full-length prey capture sequences performed by freely swimming zebrafish larvae in the presence of a single paramecium, we provide a detailed kinematic analysis of first and subsequent swim bouts during prey capture. Using Fourier analysis, we show that individual swim bouts represent an elementary motor pattern. Changes in orientation are directed toward the target on a graded scale and are implemented by an asymmetric tail bend component superimposed on this basic motor pattern. To further investigate the role of visual feedback on the efficiency and speed of this complex behavior, we developed a closed-loop virtual reality setup in which minimally restrained larvae recapitulated interconnected swim patterns closely resembling those observed during prey capture in freely moving fish. Systematic variation of stimulus properties showed that prey capture is initiated within a narrow range of stimulus size and velocity. Furthermore, variations in the delay and location of swim-triggered visual feedback showed that the reaction time of secondary and later swims is shorter for stimuli that appear within a narrow spatio-temporal window following a swim. This suggests that the larva may generate an expectation of stimulus position, which enables accelerated motor sequencing if the expectation is met by appropriate visual feedback.
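To make the Fourier step concrete: a swim bout recorded as a sequence of tail-bend angles can be summarized by its dominant oscillation frequency and its mean offset. The sketch below is a minimal, assumption-laden illustration, not the authors' analysis pipeline. It uses a naive discrete Fourier transform, assumes a uniformly sampled tail-angle trace, and treats the mean angle as a crude proxy for an asymmetric bend component. The function name and the synthetic 30 Hz, 500 frames-per-second example are hypothetical.

```python
import cmath
import math
from typing import List, Tuple


def bout_spectrum_summary(tail_angles: List[float], sample_rate_hz: float) -> Tuple[float, float]:
    """Return (dominant frequency in Hz, mean tail angle) for one swim bout.

    Uses a naive O(n^2) discrete Fourier transform over positive frequencies,
    after removing the mean (DC) component.
    """
    n = len(tail_angles)
    mean = sum(tail_angles) / n
    centered = [v - mean for v in tail_angles]  # remove the constant offset
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    # Bin k corresponds to k * sample_rate / n Hz
    return best_k * sample_rate_hz / n, mean


# Hypothetical bout: 30 Hz tail beat of 10 degrees around a 5-degree offset, 500 frames/s
fps = 500.0
bout = [5.0 + 10.0 * math.sin(2 * math.pi * 30.0 * t / fps) for t in range(100)]
print(bout_spectrum_summary(bout, fps))
```

Frequency resolution is the sample rate divided by the number of samples (5 Hz here), so short bouts give coarse estimates. Real recordings would usually be windowed or analyzed with an FFT library.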
Transcriptome landscape of Lactococcus lactis reveals many novel RNAs including a small regulatory RNA involved in carbon uptake and metabolism.

PubMed

van der Meulen, Sjoerd B; de Jong, Anne; Kok, Jan

2016-01-01

RNA sequencing has revolutionized genome-wide transcriptome analyses, and the identification of non-coding regulatory RNAs in bacteria has thus increased concurrently. Here we reveal the transcriptome map of the lactic acid bacterial paradigm Lactococcus lactis MG1363 by employing differential RNA sequencing (dRNA-seq) and a combination of manual and automated transcriptome mining. This resulted in a high-resolution genome annotation of L. lactis and the identification of 60 cis-encoded antisense RNAs (asRNAs), 186 trans-encoded putative regulatory RNAs (sRNAs) and 134 novel small ORFs. Based on the putative targets of asRNAs, a novel classification is proposed. Several transcription factor DNA binding motifs were identified in the promoter sequences of (a)sRNAs, providing insight into the interplay between lactococcal regulatory RNAs and transcription factors. The presence and lengths of 14 putative sRNAs were experimentally confirmed by differential Northern hybridization, including the abundant RNA 6S that is differentially expressed depending on the available carbon source. For another sRNA, LLMGnc_147, functional analysis revealed that it is involved in carbon uptake and metabolism. L. lactis contains 13% leaderless mRNAs (lmRNAs) that, from an analysis of overrepresentation in GO classes, seem predominantly involved in nucleotide metabolism and DNA/RNA binding. Moreover, an A-rich sequence motif immediately following the start codon was uncovered, which could provide novel insight into the translation of lmRNAs. Altogether, this first experimental genome-wide assessment of the transcriptome landscape of L. lactis and subsequent sRNA studies provide an extensive basis for the investigation of regulatory RNAs in L. lactis and related lactococcal species.
A functional analysis of the spacer of V(D)J recombination signal sequences.

PubMed

Lee, Alfred Ian; Fugmann, Sebastian D; Cowell, Lindsay G; Ptaszek, Leon M; Kelsoe, Garnett; Schatz, David G

2003-10-01

During lymphocyte development, V(D)J recombination assembles antigen receptor genes from component V, D, and J gene segments. These gene segments are flanked by a recombination signal sequence (RSS), which serves as the binding site for the recombination machinery. The murine Jbeta2.6 gene segment is a recombinationally inactive pseudogene, but examination of its RSS reveals no obvious reason for its failure to recombine. Mutagenesis of the Jbeta2.6 RSS demonstrates that the sequences of the heptamer, nonamer, and spacer are all important. Strikingly, changes solely in the spacer sequence can result in dramatic differences in the level of recombination. The subsequent analysis of a library of more than 4,000 spacer variants revealed that spacer residues of particular functional importance are correlated with their degree of conservation. Biochemical assays indicate distinct cooperation between the spacer and heptamer/nonamer along each step of the reaction pathway. The results suggest that the spacer serves not only to ensure the appropriate distance between the heptamer and nonamer but also regulates RSS activity by providing additional RAG:RSS interaction surfaces. We conclude that while RSSs are defined by a "digital" requirement for absolutely conserved nucleotides, the quality of RSS function is determined in an "analog" manner by numerous complex interactions between the RAG proteins and the less-well conserved nucleotides in the heptamer, the nonamer, and, importantly, the spacer. Those modulatory effects are accurately predicted by a new computational algorithm for "RSS information content." The interplay between such binary and multiplicative modes of interactions provides a general model for analyzing protein-DNA interactions in various biological systems.
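The abstract reports that a computational measure of "RSS information content" predicts these modulatory effects but does not describe the algorithm. As a generic stand-in only, the sketch below scores a candidate sequence against a position weight matrix (per-position log2 odds versus a uniform background) built from aligned example sites. This treats positions independently, which the authors' model may not. The example sites, pseudocount and function names are hypothetical, apart from CACAGTG, the canonical heptamer consensus.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Per-position log2 odds of each base versus a uniform (0.25) background."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}  # pseudocounts avoid log(0)
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / 0.25) for b in BASES})
    return pwm


def score(seq: str, pwm: List[Dict[str, float]]) -> float:
    """Sum of per-position log odds; higher means closer to the example sites."""
    return sum(column[base] for column, base in zip(pwm, seq))


# Hypothetical aligned heptamers (CACAGTG is the consensus RSS heptamer)
sites = ["CACAGTG", "CACAGTG", "CACTGTG", "CACAGCG"]
pwm = build_pwm(sites)
for candidate in ("CACAGTG", "CACAGCA", "TTTTTTT"):
    print(candidate, round(score(candidate, pwm), 2))
```

A position-independent score like this captures the "analog" idea that many partially conserved positions contribute graded amounts. It cannot represent interactions between spacer, heptamer and nonamer positions.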
Two COWP-like cysteine rich proteins from Eimeria nieschulzi (coccidia, apicomplexa) are expressed during sporulation and involved in the sporocyst wall formation.

PubMed

Jonscher, Ernst; Erdbeer, Alexander; Günther, Marie; Kurth, Michael

2015-07-25

The family of cysteine-rich proteins of the oocyst wall (COWPs), originally described in Cryptosporidium, can also be found in Toxoplasma gondii (TgOWPs), where they are likewise localised to the oocyst wall. Genome sequence analysis of Eimeria suggests that these proteins may also exist in this genus and led us to assume that they may also play a role in oocyst wall formation. In this study, COWP-like encoding sequences were identified in Eimeria nieschulzi. The predicted gene sequences were subsequently used in reporter gene assays to observe the time of expression and the localisation of the reporter protein in vivo. Both investigated proteins, EnOWP2 and EnOWP6, were expressed during sporulation. mCherry driven by the EnOWP2 promoter was found in the cytoplasm, whereas EnOWP2 and EnOWP6 fused to mCherry were initially observed in the extracytoplasmic space between the sporoblast and the oocyst wall. This previously unnamed compartment was designated the circumplasm. Later, the mCherry reporter co-localised with the sporocyst wall of sporulated oocysts. This observation was confirmed by confocal microscopy, excystation experiments and IFA. Transcript analysis revealed the intron-exon structure of these genes and confirmed the expression of EnOWP2 and EnOWP6 during sporogony. Our results suggest a role for both investigated EnOWP proteins in sporocyst wall formation in E. nieschulzi. Data mining and sequence comparisons to T. gondii and other Eimeria species allow us to hypothesise a conserved process within the coccidia. A role in oocyst wall formation was not observed in E. nieschulzi.
Viral Diversity Threshold for Adaptive Immunity in Prokaryotes

PubMed Central

Weinberger, Ariel D.; Wolf, Yuri I.; Lobkovsky, Alexander E.; Gilmore, Michael S.; Koonin, Eugene V.

2012-01-01

Bacteria and archaea face continual onslaughts of rapidly diversifying viruses and plasmids. Many prokaryotes maintain adaptive immune systems known as clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (Cas). CRISPR-Cas systems are genomic sensors that serially acquire viral and plasmid DNA fragments (spacers), which are used to target and cleave matching viral and plasmid DNA in subsequent genomic invasions, offering critical immunological memory. Only 50% of sequenced bacteria possess CRISPR-Cas immunity, in contrast to over 90% of sequenced archaea. To probe why half of bacteria lack CRISPR-Cas immunity, we combined comparative genomics and mathematical modeling. Analysis of hundreds of diverse prokaryotic genomes shows that CRISPR-Cas systems are substantially more prevalent in thermophiles than in mesophiles. Because sequenced bacteria are disproportionately mesophilic and sequenced archaea are mostly thermophilic, the presence of CRISPR-Cas appears to depend more on environmental temperature than on bacterial-archaeal taxonomy. Mutation rates are typically severalfold higher in mesophilic prokaryotes than in thermophilic prokaryotes. To quantitatively test whether accelerated viral mutation leads microbes to lose CRISPR-Cas systems, we developed a stochastic model of virus-CRISPR coevolution. The model competes CRISPR-Cas-positive (CRISPR-Cas+) prokaryotes against CRISPR-Cas-negative (CRISPR-Cas−) prokaryotes, continually weighing the antiviral benefits conferred by CRISPR-Cas immunity against its fitness costs. Tracking this cost-benefit analysis across parameter space reveals viral mutation rate thresholds beyond which CRISPR-Cas cannot provide sufficient immunity and is purged from host populations. These results offer a simple, testable viral diversity hypothesis to explain why mesophilic bacteria disproportionately lack CRISPR-Cas immunity. More generally, fundamental limits on the adaptability of biological sensors (Lamarckian evolution) are predicted. PMID:23221803
A novel recurrent mutation in MITF predisposes to familial and sporadic melanoma

PubMed Central

Yokoyama, Satoru; Woods, Susan L.; Boyle, Glen M.; Aoude, Lauren G.; MacGregor, Stuart; Zismann, Victoria; Gartside, Michael; Cust, Anne E.; Haq, Rizwan; Harland, Mark; Taylor, John C.; Duffy, David L.; Holohan, Kelly; Dutton-Regester, Ken; Palmer, Jane M.; Bonazzi, Vanessa; Stark, Mitchell S.; Symmons, Judith; Law, Matthew H.; Schmidt, Christopher; Lanagan, Cathy; O’Connor, Linda; Holland, Elizabeth A.; Schmid, Helen; Maskiell, Judith A.; Jetann, Jodie; Ferguson, Megan; Jenkins, Mark A.; Kefford, Richard F.; Giles, Graham G.; Armstrong, Bruce K.; Aitken, Joanne F.; Hopper, John L.; Whiteman, David C.; Pharoah, Paul D.; Easton, Douglas F.; Dunning, Alison M.; Newton-Bishop, Julia A.; Montgomery, Grant W.; Martin, Nicholas G.; Mann, Graham J.; Bishop, D. Timothy; Tsao, Hensin; Trent, Jeffrey M.; Fisher, David E.; Hayward, Nicholas K.; Brown, Kevin M.

2012-01-01

So far, two familial melanoma genes have been identified, accounting for a minority of genetic risk in families. Mutations in CDKN2A account for approximately 40% of familial cases [1], and predisposing mutations in CDK4 have been reported in a very small number of melanoma kindreds [2]. To identify other familial melanoma genes, we conducted whole-genome sequencing of probands from several melanoma families and identified one individual carrying a novel germline variant (coding DNA sequence c.G1075A; protein sequence p.E318K; rs149617956) in the melanoma-lineage-specific oncogene microphthalmia-associated transcription factor (MITF). Although the variant co-segregated with melanoma in some but not all cases in the family, linkage analysis of 31 families subsequently found to carry the variant generated a logarithm of the odds (lod) score of 2.7 under a dominant model, indicating that E318K may be an intermediate-risk variant. Consistent with this, the E318K variant was significantly associated with melanoma in a large Australian case–control sample, and it was similarly associated in an independent case–control sample from the United Kingdom. In the Australian sample, the variant allele was significantly over-represented in cases with a family history of melanoma, multiple primary melanomas, or both. The variant allele was also associated with increased naevus count and non-blue eye colour. Functional analysis of E318K showed that MITF encoded by the variant allele had impaired sumoylation and differentially regulated several MITF targets. These data indicate that MITF is a melanoma-predisposition gene and highlight the utility of whole-genome sequencing for identifying novel rare variants associated with disease susceptibility. PMID:22080950
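For readers unfamiliar with lod scores, here is a minimal sketch of the definition in its simplest setting: phase-known meioses scored as recombinant or non-recombinant at a recombination fraction theta. The score is the base-10 log of the likelihood at theta divided by the likelihood under no linkage (theta = 0.5). This is not how the study's value of 2.7 was obtained. That figure came from a dominant-model analysis of full pedigrees, which needs dedicated linkage software. The function name `lod_score` is hypothetical. By convention, a lod of 3 (odds of about 1000:1) is the usual threshold for declaring linkage.

```python
import math

def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD = log10[ L(theta) / L(0.5) ] for phase-known meioses.

    L(theta) = theta**R * (1 - theta)**NR, where R = recombinants and NR = non-recombinants.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    log_unlinked = n * math.log10(0.5)
    if theta == 0.0:
        # Any observed recombinant is impossible under complete linkage.
        if recombinants > 0:
            return float("-inf")
        return -log_unlinked  # equals n * log10(2)
    log_linked = recombinants * math.log10(theta) + nonrecombinants * math.log10(1.0 - theta)
    return log_linked - log_unlinked
```

For example, ten informative meioses with no recombinants give a lod of 10·log10(2), or about 3.01, at theta = 0. At theta = 0.5 the score is 0 by construction.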
Development of an Analysis Pipeline Characterizing Multiple Hypervariable Regions of 16S rRNA Using Mock Samples.

PubMed

Barb, Jennifer J; Oler, Andrew J; Kim, Hyung-Suk; Chalmers, Natalia; Wallen, Gwenyth R; Cashion, Ann; Munson, Peter J; Ames, Nancy J

2016-01-01

There is much speculation about which hypervariable region provides the highest bacterial specificity in 16S rRNA sequencing. The optimal way to prevent bias and obtain a comprehensive view of complex bacterial communities would be to sequence the entire 16S rRNA gene; however, this is not possible with second-generation standard library design and short-read next-generation sequencing technology. This paper examines a new process that uses seven hypervariable (V) regions of the 16S rRNA (six amplicons: V2, V3, V4, V6-7, V8, and V9) processed simultaneously on the Ion Torrent Personal Genome Machine (Life Technologies, Grand Island, NY). Four mock samples were amplified using the 16S Ion Metagenomics Kit™ (Life Technologies), and their sequencing data were processed with a novel analytical pipeline. Results are presented at the family and genus levels. The Kullback-Leibler divergence (DKL), a measure of how far the computed bacterial distribution departs from the nominal distribution in the mock samples, was used to infer which region performed best at the family and genus levels. Three hypervariable regions, V2, V4, and V6-7, produced the lowest divergence from the known mock sample composition. The V9 region gave the highest (worst) average DKL, while V4 gave the lowest (best). In addition to having a high DKL, the V9 region performed worst in both the forward and reverse directions, finding only 17% and 53% of the known family-level bacteria and 12% and 47% of the genus-level bacteria, whereas the forward and reverse V4 results identified all 17 family-level bacteria. These results show that our sequencing method using six hypervariable-region amplicons of the 16S rRNA, together with the subsequent analysis, is valid. The method also allowed us to assess how well each variable region performs when all are sequenced simultaneously. Our findings will provide the basis for future work intended to assess microbial abundance at different time points throughout a clinical protocol.
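The divergence used above can be sketched as D_KL(P || Q) = Σ p_i log2(p_i / q_i), summed over taxa with p_i > 0. This is a minimal sketch, not the authors' pipeline, and it rests on several assumptions. P is taken to be the nominal mock composition and Q the computed composition; the abstract does not state the direction. A small hypothetical `pseudocount` keeps taxa missing from the computed profile from producing a division by zero. Base 2 is also an arbitrary choice here.

```python
import math

def kl_divergence(nominal: dict, observed: dict, pseudocount: float = 1e-6) -> float:
    """D_KL(P || Q) in bits, with P = nominal mock composition, Q = computed composition.

    Both inputs map taxon names to non-negative abundances (counts or proportions);
    each is normalised to sum to 1 before comparison.
    """
    p_total = sum(nominal.values())
    if p_total <= 0:
        raise ValueError("nominal composition must have positive total abundance")
    taxa = set(nominal) | set(observed)
    # Smooth Q so that a taxon the pipeline missed gives a large but finite penalty.
    q_raw = {t: observed.get(t, 0.0) + pseudocount for t in taxa}
    q_total = sum(q_raw.values())
    divergence = 0.0
    for t in taxa:
        p = nominal.get(t, 0.0) / p_total
        if p > 0.0:  # terms with p = 0 contribute nothing
            divergence += p * math.log2(p / (q_raw[t] / q_total))
    return divergence
```

A computed profile that matches the mock composition gives a divergence near 0. A region that misses a known family, as V9 did, is penalised heavily. With the made-up counts `kl_divergence({"A": 1, "B": 1}, {"A": 100})`, the missing taxon B pushes the value above 12 bits.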
Complete genome sequence of a natural Escherichia coli O145:H11 isolate that belongs to Phylo-group A

USDA-ARS's Scientific Manuscript database

Escherichia coli O145:H11 strain RM14721 was originally isolated from wildlife feces near a leafy-greens-growing region in Yuma, Arizona. This strain initially tested positive for stx1; however, stx1 was not detected by PCR in subsequent cultures. Here we report the complete genome sequence and ann...
NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

PubMed Central

Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten

2011-01-01

Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale, thereby enabling entirely new “omics”-based approaches to the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools that allow non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data are available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign that allows non-expert end-users to submit their data (optionally adjusting method parameters) and in return receive a trained method (including a visual representation of the identified motif) that can subsequently be used as a prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets, including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign. PMID:22073191
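NNAlign itself trains artificial neural networks. The sketch below only illustrates the underlying idea of aligning peptides and learning a value-associated motif at the same time, and it swaps the network for a much simpler weighted log-odds matrix. Each peptide's quantitative readout serves as its weight. The procedure alternates between picking each peptide's best-scoring core of length `k` and re-estimating the matrix from those cores. It is seeded from every `k`-mer of the highest-valued peptide and keeps the solution with the highest weighted score. All names, the seeding strategy and the uniform amino-acid background are hypothetical choices, not NNAlign's method. Values must be non-negative, and peptides must use the 20 standard one-letter codes.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_matrix(cores, weights, pseudocount=0.1):
    """Weighted log-odds matrix (uniform background) from equal-length core sequences."""
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for i in range(len(cores[0])):
        counts = Counter()
        for core, w in zip(cores, weights):
            counts[core[i]] += w  # high-value peptides pull the motif harder
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        matrix.append({aa: math.log((counts[aa] + pseudocount) / total / background)
                       for aa in AMINO_ACIDS})
    return matrix

def best_core(peptide, matrix):
    """Alignment step: return (score, core) for the best window (first one wins ties)."""
    k = len(matrix)
    windows = [peptide[s:s + k] for s in range(len(peptide) - k + 1)]
    return max(((sum(col[aa] for col, aa in zip(matrix, w)), w) for w in windows),
               key=lambda t: t[0])

def align_motif(peptides, values, k, max_iter=20):
    """Return (objective, cores, matrix) for the best seeded alternating alignment."""
    if any(v < 0 for v in values):
        raise ValueError("values are used as weights and must be non-negative")
    if any(len(p) < k for p in peptides):
        raise ValueError("every peptide must be at least k residues long")
    top = peptides[max(range(len(peptides)), key=lambda i: values[i])]
    best = None
    for s in range(len(top) - k + 1):
        cores, weights = [top[s:s + k]], [1.0]  # seed with one k-mer
        for _ in range(max_iter):
            matrix = build_matrix(cores, weights)
            new_cores = [best_core(p, matrix)[1] for p in peptides]
            if new_cores == cores:
                break  # alignment is stable
            cores, weights = new_cores, list(values)
        matrix = build_matrix(cores, values)
        objective = sum(v * sum(col[aa] for col, aa in zip(matrix, core))
                        for v, core in zip(values, cores))
        if best is None or objective > best[0]:
            best = (objective, cores, matrix)
    return best
```

On the toy input `align_motif(["WLKGG", "GWLKG", "GGWLK"], [1.0, 0.9, 0.8], k=3)`, the three peptides are aligned on the shared core `WLK`. The returned matrix can then be applied to new peptides with `best_core`, loosely mirroring how the trained NNAlign method is applied to unknown sequences.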
A model-based approach for detection of runways and other objects in image sequences acquired using an on-board camera

NASA Technical Reports Server (NTRS)

Kasturi, Rangachar; Devadiga, Sadashiva; Tang, Yuan-Liang

1994-01-01

This research was initiated as part of the Advanced Sensor and Imaging System Technology (ASSIST) program at NASA Langley Research Center. The primary goal of this research is the development of image analysis algorithms for the detection of runways and other objects using an on-board camera. Initial effort was concentrated on images acquired using a passive millimeter wave (PMMW) sensor. Images obtained with PMMW sensors under poor visibility conditions due to atmospheric fog have very low spatial resolution but good image contrast compared with images obtained using sensors operating in the visible spectrum. Algorithms developed for analyzing these images using a model of the runway and other objects are described in Part 1 of this report. Experimental verification of these algorithms was limited to a sequence of images simulated from a single PMMW image frame. Subsequent development and evaluation of algorithms used video image sequences, which have better spatial and temporal resolution than PMMW images. Algorithms for reliable recognition of runways and accurate estimation of the spatial position of stationary objects on the ground have been developed and evaluated using several image sequences. These algorithms are described in Part 2 of this report. A list of all publications resulting from this work is also included.
The complete genome sequencing of Prevotella intermedia strain OMA14 and a subsequent fine-scale, intra-species genomic comparison reveal an unusual amplification of conjugative and mobile transposons and identify a novel Prevotella-lineage-specific repeat

PubMed Central

Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji

2016-01-01

Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium, along with the results of a comparative genome analysis with strain 17 of the same species, whose genome has also been sequenced but not yet fully analysed. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit high overall linearity of gene organization, whereas the smaller chromosomes show significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, are involved in their extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and is specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. PMID:26645327
Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements

PubMed Central

Jühling, Frank; Pütz, Joern; Bernt, Matthias; Donath, Alexander; Middendorf, Martin; Florentz, Catherine; Stadler, Peter F.

2012-01-01

Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit ‘bizarre’ secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, allows one to distinguish between remaining functional genes and degrading ‘pseudogenes’, even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of mitochondrial tRNA structures as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and widespread conserved overlaps of tRNAs in opposite reading directions. We obtain direct evidence for several recent Tandem Duplication-Random Loss events, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders. PMID:22139921
